Technology: Machine Learning

Purging Problematic Content

What Are We Filtering?

OpenSexism
Aug 31, 2022
“Purging problematic content in the style of Robert Mapplethorpe” responses generated by Stable Diffusion

Yesterday, I was looking at DreamStudio, an image-generating system developed by Stability AI and collaborators, and came across the following line: “The safety filter is activated by default as this model is still very raw and broadly trained.” ‘Raw’ and ‘broadly trained’ apply to so many of the machine learning systems I’ve come across that I decided to look at this line more closely.

Fortunately, Stable Diffusion links to a model card, where I learned that the model was trained on data from LAION-5B. I recognized the name of this dataset because a few months ago I read a paper about its predecessor, LAION-400M. At 14x the size of that earlier version, LAION-5B also contains the kind of disturbing content Abeba Birhane et al. identified in LAION-400M:

“We found that the dataset contains, troublesome and explicit images and text pairs of rape, pornography, malign stereotypes, racist and ethnic slurs, and other extremely problematic content.”

Per the LAION-5B announcement, nearly 3% of images are ‘unsafe.’

The model card also notes that the safety checker works by comparing generated images against “known hard-coded NSFW (‘Not safe for work’) concepts.” These concepts are “intentionally hidden” so that we won’t reverse-engineer the filter. However, concealing the filtering criteria also means that most of us using the model don’t know whether we agree with those criteria, or with what they block.
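The model card doesn’t publish the concept list, but the general shape of such a checker is easy to sketch: embed each generated image, compare it against embeddings of the blocked concepts, and withhold anything that scores too similar. The Python sketch below is only an illustration of that idea, with placeholder embeddings, a made-up threshold, and invented helper names; it is not the actual Stable Diffusion safety checker or its hidden concept list.

```python
# Minimal sketch of a concept-based safety filter (illustrative, not the real one).
# Assumes image and concept embeddings come from a CLIP-style encoder; random
# vectors stand in for them here.
import numpy as np

def concept_scores(image_embedding: np.ndarray, concept_embeddings: np.ndarray) -> np.ndarray:
    """Cosine similarity between one image embedding and each blocked-concept embedding."""
    img = image_embedding / np.linalg.norm(image_embedding)
    concepts = concept_embeddings / np.linalg.norm(concept_embeddings, axis=-1, keepdims=True)
    return concepts @ img

def is_withheld(image_embedding: np.ndarray,
                concept_embeddings: np.ndarray,
                threshold: float = 0.3) -> bool:
    """Withhold the image if it is 'too similar' to any blocked concept.

    The caller only sees a yes/no answer, so the criteria (which concepts,
    which threshold) stay opaque, which is the property the model card describes.
    """
    return bool(np.any(concept_scores(image_embedding, concept_embeddings) > threshold))

# Hypothetical usage with stand-in embeddings:
rng = np.random.default_rng(0)
generated_image = rng.normal(size=512)        # stand-in for an encoded generated image
hidden_concepts = rng.normal(size=(17, 512))  # stand-in for the hidden concept bank
print("blocked" if is_withheld(generated_image, hidden_concepts) else "returned")
```

Whether a checker like this is fair depends entirely on what sits in that hidden concept bank and where the threshold is set, which is exactly the part we aren’t allowed to see.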

“Woman in the style of Imogen Cunningham” responses generated by Stable Diffusion

Previous studies have shown that filtering harmful content can also cause harm. For example, Jesse Dodge et al. found evidence that filtering text using a banned-word list disproportionately filters out “documents associated with Black and Hispanic authors and documents mentioning sexual orientations” and that “many of the excluded documents contained non-offensive or non-sexual content.”
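To make concrete how blunt this kind of filtering is, here is a minimal sketch of a blocklist filter of the sort Dodge et al. describe: if any word in a document matches the banned list, the whole document is discarded. The word list and documents below are placeholders I made up, not the actual list used to build the corpus they studied.

```python
# Toy sketch of blocklist-based document filtering (placeholder word list).
import re

BANNED_WORDS = {"slurword", "explicitterm"}  # illustrative entries only

def keep_document(text: str) -> bool:
    """Drop the entire document if any token matches the blocklist."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return not any(token in BANNED_WORDS for token in tokens)

documents = [
    "A health pamphlet that discusses sexual orientation in neutral, clinical terms.",
    "A forum post that mentions explicitterm once, in passing.",
]
kept = [doc for doc in documents if keep_document(doc)]
print(kept)  # the second document is removed wholesale, whatever its overall content
```

Because the unit of removal is the whole document, one flagged word is enough to erase everything around it, which is how non-offensive and non-sexual content ends up excluded.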

Gururangan et al., who examined quality filtering, and specifically the question of whose language counts as ‘high quality,’ found that the corpora used “tend to be less inclusive of the voices of women and members of marginalized groups.”

“Many filters use Wikipedia, books, and newswire to represent high quality text. But what texts are excluded as a result?”
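Those reference corpora enter the pipeline through a classifier: text resembling Wikipedia, books, or newswire is labeled ‘high quality,’ everything else is scored against it, and low-scoring documents are dropped. The toy sketch below shows that selection logic with invented training snippets and an arbitrary threshold; real filters of the kind Gururangan et al. analyze are trained on far larger corpora, but the shape is the same.

```python
# Toy sketch of a reference-corpus "quality" filter (invented snippets, arbitrary threshold).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reference_like = [   # stand-ins for Wikipedia / books / newswire text
    "The encyclopedia article summarizes the treaty's historical context and aftermath.",
    "The newswire report cited officials by name and attributed every quotation.",
]
other_text = [       # stand-ins for everything else on the web
    "omg that show was sooo good lol, u have to watch it!!",
    "ngl this recipe slaps, my whole block was asking for it",
]

# Positive class = text resembling the reference corpora; negative = the rest.
quality_filter = make_pipeline(TfidfVectorizer(), LogisticRegression())
quality_filter.fit(reference_like + other_text, [1, 1, 0, 0])

def keep(document: str, threshold: float = 0.5) -> bool:
    """Keep only documents the classifier scores as 'high quality'."""
    return quality_filter.predict_proba([document])[0][1] >= threshold

# Registers far from the reference corpora tend to score low and get filtered out.
print(keep("The committee's findings were published in the annual report."))
print(keep("we been knew, that take is straight fire fr"))
```

Whatever such a classifier learns, ‘high quality’ here just means ‘resembles the reference corpora,’ which is why the question of what those corpora leave out matters.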

As I looked at the images returned by Stable Diffusion and the black boxes representing the ones withheld, I became curious about what wasn’t there. I was also reminded of Vienna — and how, in 2018, the city’s tourist board decided to advertise a retrospective of Viennese modernism with posters of Egon Schiele’s nudes.

“Woman in the style of Egon Schiele” responses generated by Stable Diffusion

Response to the Schiele images varied. For example, as the Guardian reported, “German authorities agreed to display Schiele’s female nude with torso uncovered, TfL [Transport for London] didn’t.” In New York, the Schiele subway ads were censored, but a building-sized nude, a mural, was allowed. The controversy around the campaign stirred international attention.

“[It] sparked a global debate about artistic censorship, double standards for advertising, the appropriateness of the human body and the need for shared spaces to be inclusive of all.”

Humans don’t always agree about what’s safe, or ethical. Not even ethics reviewers. Our systems shouldn’t generate images that glorify violence against women and perpetuate injustices and harms. But we disagree about exactly what counts as safe and unsafe.

Maybe you think Schiele’s work is subway-worthy; maybe you think it’s inappropriate. But our individual views on that debate have no bearing on which images the safety filter returns and which it withholds. There’s only one voice on that: the algorithm’s.

Works Cited

Beaumont, Romain. “LAION-5B: A New Era of Open Large-Scale Multi-Modal Datasets.” (2022).

Bengio, Samy, Inioluwa Deborah Raji, Alina Beygelzimer, Yann Dauphin, Percy Liang, and Jennifer Wortman Vaughan. “A Retrospective on the NeurIPS 2021 Ethics Review Process.” (2021).

Birhane, Abeba, Vinay Uday Prabhu, and Emmanuel Kahembwe. “Multimodal datasets: misogyny, pornography, and malignant stereotypes.” arXiv preprint arXiv:2110.01963 (2021).

Dodge, Jesse, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, and Matt Gardner. “Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus.” arXiv preprint arXiv:2104.08758 (2021).

Gururangan, Suchin, Dallas Card, Sarah K. Dreier, Emily K. Gade, Leroy Z. Wang, Zeyu Wang, Luke Zettlemoyer, and Noah A. Smith. “Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection.” arXiv preprint arXiv:2201.10474 (2022).
