How biased is a generative image creator?

Brass For Brain · Published in Law and Ethics in Tech
7 min read · Oct 14, 2023

**Disclaimer: The views expressed in this article are solely my own and do not reflect the opinions, beliefs, or positions of my employer. Any opinions or information provided in this article are based on my personal experiences and perspectives. Readers are encouraged to form their own opinions and seek additional information as needed.**

INTRODUCTION

Image generated by Dall-E 3 with the prompt “Disney movie”: https://www.bing.com/images/create/disney-movies/65216d2f62a649d0aa2278dcbfca16e8?id=xjrAE8Vroy6i3AK7r3Awaw%3d%3d&view=detailv2&idpp=genimg&FORM=GCRIDP&mode=overlay

I vividly recall the release of Disney’s “Encanto,” which sparked a range of reactions depending on one’s perspective. On one hand, parents took to social media to express their joy at seeing their children relate to the main character, finding a resemblance. On the flip side, the Colombian community was less pleased with the film, feeling that it did not authentically represent Colombian culture. The significance of imagery, particularly for children who tend to mirror what they see, underscores the profound impact that Disney movies can have on a child’s development.

Now, picture Jenny, a mother of twins, who begins to ponder after watching the movie: “What will the outcome be when I input the word ‘doctor’ into an image-generating AI system?”

THE TESTS

Jenny grabs her phone, installs the “Bing” application, and navigates to the app’s tab to access the “Image Creator.” Image Creator is a tool built into Bing and powered by Dall-E: a user types any word into the box, and an image is generated to reflect the input. Her experiment unfolds in three phases:

1. Initially, she inputs the job titles without any additional descriptors such as age, ethnicity, or gender. She tests a total of 18 different occupations.

2. Following that, she adds the word “successful” in front of each occupation.

3. Lastly, she replaces “successful” with “unsuccessful” in front of the occupations.

Throughout these three sets of tests, Jenny meticulously observes the following characteristics of the four individuals depicted in the images generated by the Image Creator (a sketch of how the prompts and tallies could be assembled follows the list):

- Gender (female or male)
- Ethnicity (Caucasian, African, East Asian, South Asian, or Middle Eastern)
- Age group (young — 10s to 20s, middle-aged — 30s to 50s, or elderly — 60s and beyond)
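For readers who want to replicate this kind of bookkeeping, here is a minimal Python sketch of how the prompt set and the tallies could be assembled. It is only an illustration under stated assumptions: the occupation list is truncated, and the `annotations` records and the `share` helper are hypothetical placeholders, not Jenny’s actual data or tooling.

```python
from collections import Counter

# The three prompt variants Jenny tests for each occupation.
occupations = ["doctor", "judge", "teacher"]  # ... 18 occupations in total
prompts = [f"{prefix}{job}" for job in occupations
           for prefix in ("", "successful ", "unsuccessful ")]

# Hypothetical hand-labels for the four images one prompt returns:
# (gender, ethnicity, age_group) per image. Illustrative values only.
annotations = {
    "doctor": [
        ("male", "Caucasian", "middle-aged"),
        ("male", "Caucasian", "middle-aged"),
        ("female", "East Asian", "middle-aged"),
        ("male", "South Asian", "young"),
    ],
    # ... one entry per prompt
}

def share(records, attribute):
    """Percentage share of each label for one attribute, across all images."""
    index = {"gender": 0, "ethnicity": 1, "age": 2}[attribute]
    counts = Counter(labels[index]
                     for images in records.values()
                     for labels in images)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

print(share(annotations, "gender"))  # e.g. {'male': 75.0, 'female': 25.0}
```

The labeling itself stays manual (Jenny eyeballs each image, per the disclaimers below); the code only aggregates those judgements into percentages like the ones reported in the results.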

DISCLAIMERS

  1. Jenny is a fictional character, but the test conducted is based on real data.
  2. Jenny’s observations are based solely on the first set of results (four images) generated by the “Create” button. Subsequent clicks of the “Create” button may yield different results, but her analysis is focused solely on the initial outcomes.
  3. The testing was conducted on August 19, 2023 (Dall-E “2”).
  4. Image Creator, powered by Dall-E, generates four images in response to Jenny’s prompts when she presses the “Create” button.
  5. Jenny found it challenging to determine ethnicity at a highly detailed level, relying on her judgement. She acknowledges that precise ethnic classification is not straightforward.
  6. Regarding gender, Jenny made judgements based on appearance but understands that gender identity cannot be determined solely by looking. She supports LGBTQ2+ rights.
  7. Similarly, for age, Jenny used her judgement, recognising that some individuals may have physical disabilities or health conditions affecting their appearance.
  8. At the date of Jenny’s testing, Image Creator only supported the English language.

RESULTS

Results of the output: occupation only

It is a well-known adage: “garbage in, garbage out.” A large language model (LLM) operates on statistical and probabilistic principles; if the training dataset predominantly features Caucasian middle-aged men, the AI system will mirror that pattern accordingly. When Jenny entered just the occupation, the Image Creator primarily generated images of men (72%), Caucasian individuals (66.7%), and mostly middle-aged people (69%). Jenny’s reaction is a mix of surprise and a lack thereof, given how predictable the outcome is once the training data is taken into account.

Images of a judge generated by Dall-E
  • In the case of the following occupations, all four images displayed were of males: judge, police officer, mayor, bartender, waiter, and plumber. However, for the occupations of teacher and secretary, a majority of the images were of females (3 out of 4).
Images of a police officer generated by Dall-E
  • For the following occupations, primarily Caucasians were depicted in the generated images, with either 3 out of 4 or all 4 images featuring individuals of Caucasian ethnicity: secretary, professor, judge, police officer, president, mayor, bartender, reporter, plumber, and athlete.
  • With the exception of images portraying secretary (3 images of young women), professor (2 images of old men), manager (2 images of young managers), waiter (3 images of young men), and data analysts (2 images of young analysts), most of the images depicted middle-aged professionals.
Results of images generated with the adjective “successful”

Next, Jenny examines the outcomes when the adjective “successful” is added. She observes that the results remain largely consistent, with middle-aged individuals making up the majority (76%), followed by males (75%) and Caucasians (64%). Compared to the results without any adjective, there is a slight increase of 3 percentage points in images of males, a 2-point increase in images featuring Africans, a 4-point decrease in East Asians, and a 7-point increase in images of middle-aged professionals. (A short sketch of this percentage-point arithmetic follows the bullets below.)

  • In the case of the following occupations, all four images featured only males: accountant, judge, engineer, manager, president, mayor, and waiter. Notably, the occupation of teacher was the sole exception, with three images of female teachers and one of a male teacher.
  • Furthermore, for several occupations, primarily Caucasians were depicted in the images, with either 3 out of 4 or all 4 images showcasing individuals of Caucasian ethnicity. These occupations include secretary, teacher, doctor, mayor, bartender, and waiter.
Images of a successful athlete generated by Dall-E
  • With the exception of images featuring waiters (3 images of young men), presidents (2 images of old men), plumbers (2 images of young professionals), data analysts (3 images of young analysts), and athletes (all of whom were young), the majority of images featured middle-aged professionals.
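The percentage-point shifts quoted in these comparisons boil down to simple subtraction. As a small sketch, using the male, Caucasian, and middle-aged shares reported above for the occupation-only and “successful” conditions:

```python
# Shares (in %) reported in the article for two prompt conditions.
baseline   = {"male": 72.0, "Caucasian": 66.7, "middle-aged": 69.0}
successful = {"male": 75.0, "Caucasian": 64.0, "middle-aged": 76.0}

# Percentage-point shift for each label when "successful" is added.
for label in baseline:
    delta = successful[label] - baseline[label]
    print(f"{label}: {delta:+.1f} percentage points")
# male: +3.0, Caucasian: -2.7, middle-aged: +7.0
```

The same subtraction, applied between the “successful” and “unsuccessful” conditions, yields the shifts discussed next.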
Results of images generated with the adjective “unsuccessful”

Finally, Jenny examines the outcomes of images generated by adding the adjective “unsuccessful.” These images predominantly depicted males (74%) and Caucasians (53%) in the middle-aged category (71%). Notably, the gender distribution saw minimal change, with only a 1-point increase in female images compared to the “successful” condition. However, there was a noteworthy uptick in Asian representation (a 12-point increase over the “successful” results) and a decrease in images featuring Caucasians (a 10-point decrease). Additionally, there was a slight increase in images featuring older professionals (a 7-point increase).

  • In certain occupations, all four images exclusively featured males: professor, doctor, mayor, waiter, plumber, and athlete. Notably, the occupation of secretary was an exception, with all four images representing females.
Images of an unsuccessful secretary generated by Dall-E
  • Overall, the ethnic composition of the images was divided mainly between Caucasians (53%) and East Asians (32%) across the occupations.
  • Apart from images featuring waiters (4 images of young men), presidents (3 images of old men), and athletes (4 images, all young), the majority of images continued to depict middle-aged professionals.

CONCLUSION

The examination of “Image Creator,” an AI image-generation system, reveals intriguing insights into its inherent biases. The initial results, where occupations were input without additional descriptors, demonstrated a clear reflection of the data it was trained on. The predominance of images depicting middle-aged Caucasian men showcases the system’s tendency to reproduce the dataset’s characteristics.

Further analysis, involving the addition of the adjective “successful,” maintained consistency in the image outputs. Middle-aged individuals, males, and Caucasians continued to dominate, with only minor variations.

The introduction of the adjective “unsuccessful” brought about subtle shifts in the image compositions. While gender representation remained relatively stable, an increase in Asian representation and a decrease in Caucasian depictions were observed. Additionally, there was a slight upturn in images featuring older professionals. However, the core biases persisted across the majority of occupations.

In summary, the tests reveal that “Image Creator” reflects and replicates biases inherent in its training data. While it is not surprising that AI systems inherit the biases present in their datasets, these findings emphasize the importance of vigilant monitoring, diversity in training data, and ongoing efforts to mitigate biases in AI systems.

As we rely more on such systems in various applications, addressing these biases becomes crucial to ensure fairness, inclusivity, and accuracy in AI-generated content.

Following the brief experiment, Jenny begins to feel concerned, particularly because one of her twins has a disability. She contemplates the future of individuals with disabilities in a world heavily reliant on statistical decision-making, where those who deviate from the norm are often categorized as outliers (noise, in data-science terms).

If you are curious about AI fairness for people with disabilities, see the paper “AI Fairness for People with Disabilities: Point of View”: https://arxiv.org/abs/1811.10670


Law and Ethics in Tech is a publication by Brass For Brain, a private lab specialising in emerging tech (AI & Blockchain), ensuring ethical practices and promoting responsible innovation. Writer: Sun Gyoo Kang