New Media, Old Problems: Racial Stereotypes in AI Image Generation
Like many people, I’ve spent the last few months playing with the new wave of consumer-oriented (what I call “over-the-counter”) AI tools like ChatGPT. As an artist, I’ve been especially fascinated by AI image generators such as Dall-E, Midjourney, and Stable Diffusion, which have gone from fantasy to photorealism in just a year.
Because these models are trained on existing images, they offer a kind of meta-narrative on the way we create, consume, and analyze images as a society — highlighting patterns, perceptions, and biases in interesting ways.
For example, I was generating images of members of Congress in Midjourney: not specific members, but what the AI thought a member of Congress would look like. I did this partly as a reflection on my frustration with the age of Congress and its disconnect from the impacts of new technologies, but in a lot of ways these images are a perfect dataset to work with: there are many of them, and they are nearly identical in style, in both composition and subject. This means the results can be fairly predictable.
Initially it was purely humorous — the subtle change of making them hold an object such as a cat or rock quickly makes these utilitarian portraits absurd (unfortunately “holding a gun” does not look so absurd).
As I was generating these, I quickly noticed that I was getting primarily white-looking men. If we can think of AI largely as a summary of existing media, this shouldn’t be surprising: while the last two Congresses have been the most diverse ever, they are still mostly white men. If AI is meant to reflect reality, we should expect about 25% of the images it generates of members of Congress to be of either women or racial minorities, a much higher share than the model was returning.
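To make that comparison concrete, here is a minimal sketch of how one could check a batch of generated portraits against the rough 25% baseline. Everything in it is an assumption for illustration: the labels are hypothetical and assigned by eye, the counts are made up rather than measured, and the helper names `observed_share` and `prob_at_most` are my own.

```python
from math import comb


def observed_share(labels, group):
    """Fraction of hand-labeled images whose label falls in `group`."""
    return sum(1 for label in labels if label in group) / len(labels)


def prob_at_most(k, n, p):
    """Exact binomial probability of k or fewer hits in n draws at rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical labels for 40 generated portraits, assigned by eye.
# These counts are invented for illustration, not measured results.
labels = ["white man"] * 37 + ["white woman"] * 2 + ["Black man"] * 1

# Rough share the essay says a faithful mirror of Congress would show.
baseline = 0.25

# Every label other than "white man" counts toward the comparison group.
not_white_men = {label for label in labels if label != "white man"}
hits = sum(1 for label in labels if label in not_white_men)

share = observed_share(labels, not_white_men)
p_value = prob_at_most(hits, len(labels), baseline)

print(f"observed share: {share:.3f}, expected: {baseline:.2f}")
print(f"chance of {hits} or fewer of {len(labels)} at the baseline rate: {p_value:.4f}")
```

A very small probability in the last line would suggest the gap is unlikely to be luck of the draw, though labeling race and gender by eye is itself a subjective step.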
It turns out the diversity of representation is contingent on any additional context that is placed in the prompt. For example, some contexts are gendered in obvious ways, like “holding a purse,” which returns almost exclusively images of women. But some are gendered in less-obvious ways. The term “congressperson” generated more women than “member of congress,” suggesting that the term is more likely to be used by women than men.
But across all of the prompts I tried, Black people were largely absent, even though they are the largest racial minority in Congress. The most consistent way to get Midjourney to generate Black members of Congress? Ask it to make them hold buckets of fried chicken. Another way was to ask it to make them wear a basketball jersey (curiously, basketball jerseys did not even appear in most of the resulting images).
This isn’t the only bias I found: other prompts, such as “interracial couple,” return almost exclusively images of a Black man with a white woman. The prompt “gay couple” returns almost exclusively images of two young white men (“boyfriend twins,” as they’re called).
Of course these models reflect the data they are given, and so it’s no surprise that existing stereotypes and biases show up in their output. It could be that these models were trained on older images, excluding the more recent and more diverse Congresses. The problem, however, is that these images exist in the present, and are often thought of as the future.
As Safiya Umoja Noble, whose landmark book Algorithms of Oppression shows how so-called “neutral” search engines perpetuate racial stereotypes, says:
“Knowledge management reflects the same social biases that exist in society, because human beings are at the epicenter of information curation. These practices of the past are part of the present, and only committed and protracted investments in repairing knowledge stores to reflect and recenter all communities can cause a shift toward equality and inclusion in the future. This includes reconciling our brutal past rather than obscuring or minimizing it. In this way, we have yet to fully confront our histories and reconstitute libraries and museums toward reconciliation and reparation.”
That someone, even at the highest levels of government, can still be reduced to a stereotype based on their skin color should be seen as a failure of the model — a model I believe should be thought of as aspirational and not simply reflective of the present.
The strength of generative images is not that they can replace existing image-making tools like the camera, but that they can create new types of images. They will allow us to imagine new futures and new ways of being. But when these models begin training on their own output, creating generated images based on generated images, any existing biases in the model will create a feedback loop, perpetuating societal flaws that should be abandoned.
Without knowledge of how these models were trained, or even how they work, there is no way to understand why this is happening, or what needs to change in order to stop the perpetuation of stereotypes. Midjourney’s terms of service say, “we are not a democracy.” Maybe they should be.
Ryan Aasen is an artist, educator, and researcher broadly interested in the politics of media technologies. He has taught art, design, and technology courses at MIT, Parsons School of Design, and Stevens Institute of Technology. Follow him on Instagram for more tech interrogations.