The GEP metric is based on 15 attributes, which are retrieved from ConceptNet.
By probing gender indicators in the input text (e.g., "a woman" or "a man"), we quantify the frequency differences of the presentation-centric attributes (see the table above) as the GEP vector (one 15-dimensional vector per model per setting):
The GEP vectors for three models in the neutral setting (top) and the explicit setting (bottom). The y-axes show presentation differences ("woman" - "man") on a symmetric log scale.
For instance, the frequency difference for "boots" is calculated by subtracting the frequency of "boots" in images generated from "A man" from the frequency of "boots" in images generated from "A woman", which matches the "woman" - "man" convention used in the figure.
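The per-attribute calculation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each generated image has already been reduced to a set of detected attribute names, it follows the "woman" - "man" convention of the figure, and the function names and toy data (including the "dress" and "suit" attributes) are hypothetical placeholders.

```python
from typing import List, Sequence, Set


def attribute_frequency(detections: Sequence[Set[str]], attribute: str) -> float:
    """Fraction of images in which `attribute` was detected."""
    if not detections:
        raise ValueError("need at least one image per prompt")
    return sum(attribute in d for d in detections) / len(detections)


def gep_vector(
    woman_images: Sequence[Set[str]],
    man_images: Sequence[Set[str]],
    attributes: Sequence[str],
) -> List[float]:
    """Frequency difference ("woman" - "man") for each attribute."""
    return [
        attribute_frequency(woman_images, a) - attribute_frequency(man_images, a)
        for a in attributes
    ]


# Toy data (made up): one set of detected attributes per generated image.
attributes = ["boots", "dress", "suit"]
woman_images = [{"boots", "dress"}, {"dress"}, {"boots"}, set()]
man_images = [{"suit"}, {"boots", "suit"}, set(), {"suit"}]

vec = gep_vector(woman_images, man_images, attributes)
print(dict(zip(attributes, vec)))
```

In the real metric the vector would have 15 entries, one per attribute, and would be computed once per model and setting.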
The GEP score is the normalized l norm of the GEP vector, which reduces each vector to a single number and makes models easier to compare (see the table below). By definition, the GEP score ranges from 0 to 1, and a lower GEP score indicates a smaller presentation difference across the predefined attributes.
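As a rough sketch of how a GEP vector might be reduced to a score, the snippet below assumes that the norm is the ℓ1 norm and that "normalized" means dividing by the number of attributes. Both are assumptions made for illustration, and `gep_score` is a hypothetical name. Under these assumptions every frequency difference lies in [-1, 1], so the score stays within [0, 1].

```python
from typing import Sequence


def gep_score(gep_vec: Sequence[float]) -> float:
    """Mean absolute frequency difference (assumed: l1 norm / dimension)."""
    if not gep_vec:
        raise ValueError("GEP vector must not be empty")
    if any(abs(d) > 1 for d in gep_vec):
        raise ValueError("frequency differences must lie in [-1, 1]")
    return sum(abs(d) for d in gep_vec) / len(gep_vec)


# Toy 3-dimensional vector (made up); the real GEP vector has 15 entries.
example_score = gep_score([0.25, 0.5, -0.75])
print(example_score)
```

A score of 0 would mean every attribute appears equally often for both prompts, while 1 would mean every attribute appears in all images for one prompt and in none for the other.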
CS: CLIP Score, GEP: GEP Score. CogView refers to CogView2, DALLE refers to DALLE-2, Stable refers to Stable Diffusion.