Identifying Machine Learning Bias With Updated Data Sets

Published in Jigsaw, Nov 24, 2021

Well-moderated online platforms and comment sections offer a unique forum for people to engage with and learn from one another. However, the sheer volume of content created on many of these platforms can render efforts to keep hate, harassment, and abuse at bay incredibly difficult. Online toxicity has been a focus for Jigsaw since our inception and we continue to explore how technology can help improve conversations online.

Today, we are excited to release the Sentence Templates dataset, a dataset we use to test Perspective API for unintended bias. Perspective employs machine learning to identify toxic commentary, assisting human moderators in sifting through millions of user-generated posts across hundreds of platforms every day. Perspective analyzes the text of comments and assigns them a probability between 0 (very unlikely) and 1 (very likely) that readers will find them toxic. Toxic, in this context, is defined as rude, disrespectful, or unreasonable language that is likely to make an individual leave a discussion.
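As a rough illustration of what scoring a comment looks like in practice, the sketch below builds a request body and reads the toxicity probability out of a response. The field names follow our understanding of Perspective's public `comments:analyze` endpoint and should be checked against the current API documentation. The sample response is made up purely to show the shape, and no network call is made here.

```python
# Sketch only: request/response shapes as we understand the public
# Perspective API; verify against the official documentation before use.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text):
    """Build the JSON body asking for a TOXICITY score for one comment."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }


def extract_toxicity(response):
    """Return the summary toxicity probability (0 to 1) from a response dict."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


# Hypothetical response, invented for illustration (not real model output).
SAMPLE_RESPONSE = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.12, "type": "PROBABILITY"}}
    }
}

if __name__ == "__main__":
    print(build_request("Thanks for the thoughtful reply."))
    print(extract_toxicity(SAMPLE_RESPONSE))
```

In a real integration the body would be sent as a POST to `ANALYZE_URL` with an API key, and the returned value compared against a threshold chosen by the moderation team.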

While Perspective is designed to help improve inclusivity online, we’ve seen that our models sometimes attribute high levels of toxicity to posts containing certain identity terms, regardless of the sentiment of the post. This is due to the frequency with which the names of often-maligned identity groups, including “gay” and “muslim,” appear in toxic and attacking comments. As a result, the models more often misclassify comments that mention these groups, flagging non-toxic comments as potentially toxic and toxic comments as non-toxic.
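To make the mechanism concrete, here is a deliberately simplistic sketch of how skewed training frequencies alone can push a neutral sentence toward a high score. It is not how Perspective works: the per-token model is a toy, and every count in `TOY_COUNTS` is invented for illustration.

```python
# Toy illustration only. Counts are hypothetical, not drawn from any dataset:
# token -> (appearances in toxic comments, appearances in non-toxic comments)
TOY_COUNTS = {
    "i": (50, 50),
    "am": (50, 50),
    "gay": (9, 1),       # assumed to appear mostly in toxic training comments
    "straight": (1, 9),  # assumed to appear mostly in non-toxic ones
}


def token_toxic_rate(token, counts, smoothing=1.0):
    """Smoothed fraction of a token's appearances that were in toxic comments."""
    toxic, clean = counts.get(token, (0, 0))
    return (toxic + smoothing) / (toxic + clean + 2 * smoothing)


def naive_score(sentence, counts=TOY_COUNTS):
    """Average the per-token toxic rates; the sentence's meaning is ignored."""
    tokens = sentence.lower().split()
    return sum(token_toxic_rate(t, counts) for t in tokens) / len(tokens)


if __name__ == "__main__":
    for sentence in ("I am gay", "I am straight"):
        print(sentence, round(naive_score(sentence), 3))
```

Neither sentence is toxic, yet the toy model scores them differently because it has only learned which words tend to co-occur with toxic labels.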

Previous research has found unintended bias of this kind to be common in language models. In one study, an off-the-shelf language model returned a 0.915 probability that the statement “I am gay” was toxic, while the statement “I am straight” was assigned a probability of just 0.085 of being toxic.

We’ve continued to develop strategies to mitigate this bias in Perspective, balancing training data to reduce identity-term bias, publishing Model Cards with metrics on unintended bias, and more recently, launching a Kaggle competition to identify and reduce model bias.

An essential step in reducing unintended bias is to test for bias. The Sentence Templates dataset will allow researchers and developers of other machine learning models to test for biases that may be undermining their own work.

The Sentence Templates dataset has been generated by plugging identity terms, occupations, and modifiers into a set of templates, e.g., “I am a <modifier> <identity>,” to form test sentences. As only the identity term varies, examples using the same template — e.g., “I am a kind American” and “I am a kind Muslim” — should return similar toxicity scores. Scores that vary significantly may indicate identity term bias within the model.
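The idea can be sketched in a few lines of Python. This is a minimal illustration rather than the code used to build the released dataset: the helper names are hypothetical, and `toy_biased_scorer` is a stand-in with a deliberately planted bias where a real model's scoring function would go.

```python
from itertools import product


def fill_templates(templates, modifiers, identities):
    """Fill every template with every (modifier, identity) combination."""
    rows = []
    for template, modifier, identity in product(templates, modifiers, identities):
        sentence = template.replace("<modifier>", modifier).replace("<identity>", identity)
        rows.append({"template": template, "modifier": modifier,
                     "identity": identity, "sentence": sentence})
    return rows


def score_gaps(rows, score_fn):
    """For each (template, modifier) pair, return max minus min score across identities."""
    groups = {}
    for row in rows:
        key = (row["template"], row["modifier"])
        groups.setdefault(key, []).append(score_fn(row["sentence"]))
    return {key: max(scores) - min(scores) for key, scores in groups.items()}


def toy_biased_scorer(sentence):
    """Hypothetical stand-in for a model, biased on purpose for the demo."""
    return 0.8 if "Muslim" in sentence else 0.1


if __name__ == "__main__":
    rows = fill_templates(["I am a <modifier> <identity>"],
                          ["kind"], ["American", "Muslim"])
    for key, gap in score_gaps(rows, toy_biased_scorer).items():
        print(key, gap)
```

A gap near zero means the model treats the sentences alike. A large gap within one template and modifier points at the identity term as the cause, since nothing else in the sentence changed.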

Sentence Templates extends prior work on identity bias mitigation, offering researchers and technologists more options to evaluate and identify biases in their models before they impact users.

In collaboration with linguists and the Google localization team, we’ve translated the templates and generated multilingual lists of words, including locally-relevant identity terms, to generate Sentence Templates datasets for 11 different languages.

Sentence Templates is not a silver bullet for eliminating identity bias in machine learning language models. Identity terms and their connotations evolve rapidly, and their use can carry strikingly different meanings not only across languages, cultures, and localities, but also when used by in-group and out-group speakers. These are nuances that no language model can yet capture effectively.

However, we believe that Sentence Templates will serve as a valuable tool for testing language models over time. We intend to keep expanding our roster of languages and to add datasets containing increasingly natural and nuanced language for testers of language models to draw on.

The team at Jigsaw remains committed to supporting and learning from the people who rely on APIs like Perspective to sort through millions of comments every day, to keep online conversational spaces open, safe, and healthy.

Visit our GitHub repository for Sentence Templates to apply them to your own models.

Authors: Lucy Vasserman, Tin Acosta, Lucas Dos Santos, Alyssa Chvasta, Roelle Thorpe, Raquel Saxe
