Scaling machine learning fairness with societal context

Jigsaw · Dec 13, 2022

Since our launch in 2017, our priority for Perspective, a free API that uses machine learning to identify harmful, unreasonable, offensive, or otherwise “toxic” comments, has been to make our machine learning models more accessible and useful to people all over the world. Part of that work means ensuring that although these models will inevitably make some mistakes, those errors do not disproportionately impact marginalized communities. Over the years, we have actively worked to reduce unintended bias in our models through dataset augmentation and balancing. More recently, we’ve partnered with Google’s Societal Context Understanding Tools and Solutions (SCOUTS) research team to expand the set of identity terms used in these processes, covering categories such as ethnicity, religion, age, gender, and sexual orientation. We’ve found that the combination of these efforts has significantly improved the fairness of our models and helped us stay true to the mission of our work.

The challenges of scaling proactive bias mitigation

Last fall, we released the Sentence Templates dataset, which we use to test Perspective API for unintended bias on terms that can have both toxic and neutral connotations depending on context. For instance, terms for often-marginalized identity groups, such as “gay” and “muslim,” can appear in toxic and aggressive comments but can also be neutral descriptors, depending on how they are used. We insert different identity terms into predefined toxic and non-toxic sentence templates to evaluate how the model performs for each identity term, and we use the same set of identity terms to supplement and balance our training datasets. By balancing data prior to model training and using bias metrics to evaluate progress, we’re able to reduce unintended model bias toward comments containing those identity terms.
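To make the approach concrete, here is a minimal sketch of how identity terms can be slotted into paired toxic and non-toxic templates to build an evaluation set. The templates, terms, and field names below are illustrative only, not the actual Sentence Templates dataset or Perspective’s internal code.

```python
# Illustrative templates and terms; the real dataset is far larger.
TEMPLATES = [
    ("I am a {term} person, ask me anything.", 0),  # non-toxic template
    ("All {term} people are disgusting.", 1),       # toxic template
]
IDENTITY_TERMS = ["gay", "muslim", "elderly", "blind"]

def build_eval_set(templates, terms):
    """Expand every (template, label) pair with every identity term."""
    return [
        {"text": template.format(term=term), "label": label, "term": term}
        for template, label in templates
        for term in terms
    ]

eval_set = build_eval_set(TEMPLATES, IDENTITY_TERMS)
# Each example is then scored by the toxicity model, and per-term bias
# metrics flag identity terms that the model treats unevenly.
```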

Creating identity term lists for this effort proved to be a significant barrier to adding new languages to Perspective. List curation is hard to scale: it is a time-consuming, manual process that can reflect the curator’s own biases, and the lists need to be customized by a domain expert for each language and locale. This limited the size of the identity term lists in every language Perspective supports, which in turn constrained the coverage of our bias evaluation and mitigation. Our collaboration with the SCOUTS team made a more automated process possible, allowing us to scale our lists and better support more languages.

Partnering with SCOUTS to improve identity term coverage and reduce selection bias

For the past three years, the SCOUTS team within Google’s Responsible AI and Human Centered Technology organization has been working to gather and understand the broader context associated with identity terms. Their Societal Context Repository (SCR) is a continuously updated database with significant coverage of terms across multiple identity facets (e.g. ethnicity, religious belief, age, profession, gender identity, sexual orientation). The database, used internally at Google, provides additional context such as the connotation of each identity term (e.g. neutral, pejorative) and whether the term has a prevalent non-identity meaning. Selection bias is minimized because the terms and associated context in the SCR are collected according to a peer- and community-reviewed ontology, methodology, and set of principles. We recognized that by augmenting our existing, limited list of terms with a list constructed from the SCR, we could achieve a number of positive outcomes: increase the coverage and freshness of terms used for bias mitigation, reduce the selection bias introduced by manual curation, and, most importantly, make the entire process more scalable for both existing and new languages, since automated, modifiable database queries require less human involvement.
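As a rough illustration of why a structured repository scales better than hand-curated lists, the sketch below filters a tiny SCR-style table by language, identity facet, and connotation. The SCR itself is internal to Google and its schema is not public, so the rows and field names here are assumptions.

```python
# Hypothetical SCR-style rows; the real repository's schema is not public.
SCR_ROWS = [
    {"term": "muslim", "facet": "religious belief", "connotation": "neutral",
     "language": "en", "non_identity_meaning": False},
    {"term": "gay", "facet": "sexual orientation", "connotation": "neutral",
     "language": "en", "non_identity_meaning": False},
    {"term": "pan", "facet": "sexual orientation", "connotation": "neutral",
     "language": "en", "non_identity_meaning": True},  # also a common noun
]

def select_terms(rows, language, facets, allowed_connotations=("neutral",)):
    """Return identity terms matching a language, facet set, and connotation."""
    return sorted({
        row["term"] for row in rows
        if row["language"] == language
        and row["facet"] in facets
        and row["connotation"] in allowed_connotations
    })

en_terms = select_terms(SCR_ROWS, "en", {"religious belief", "sexual orientation"})
```

Changing the language or facet arguments regenerates the term list without a manual curation pass, which is the scalability property described above.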

In addition to our experiments with the SCR, we tested a second hypothesis: that mitigation of unfair bias can be more effective when applied in the pre-training step, the phase where the machine learning model learns about language in general, rather than only in the later fine-tuning step, where the model specifically learns about “toxic” language. Until this point, we had only performed bias mitigation at the fine-tuning step, so this hypothesis offered hope that adding mitigation earlier in training would further reduce unintended bias.
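The outline below sketches how such a comparison might be structured, with a single dataset-level mitigation hook that can be switched on at either stage. The function names and the pass-through mitigation body are stand-ins; the post does not describe Perspective’s actual training code.

```python
def mitigate(dataset, identity_terms):
    """Stand-in for identity-term balancing/augmentation of a dataset."""
    return dataset  # a real implementation would rebalance the examples

def build_training_data(pretrain_corpus, toxicity_data, identity_terms,
                        mitigate_pretraining=True, mitigate_finetuning=True):
    """Choose the stage(s) at which dataset-level bias mitigation is applied."""
    if mitigate_pretraining:   # hypothesis tested: act while the model learns language
        pretrain_corpus = mitigate(pretrain_corpus, identity_terms)
    if mitigate_finetuning:    # previous practice: act at the toxicity fine-tuning stage
        toxicity_data = mitigate(toxicity_data, identity_terms)
    return pretrain_corpus, toxicity_data
```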

Results: Improved bias evaluation at scale

Each of the experiments showed improvements in the areas tested. We also trained a new machine learning model with the best-performing experimental parameters, using the SCR and early-stage mitigation of unfair bias. For this model, the overall AUC-ROC scores, which summarize the model’s performance on a scale from 0 to 1, improved on a number of test sets, including Dutch (by 0.07) and English (by 0.01, on top of already strong performance). For Dutch, the bias evaluation improvements were especially pronounced for terms in the sexual orientation and disability categories, whose BPSN AUC scores improved by 0.14 and 0.16 respectively, indicating that the newer model is less biased for these terms.
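For reference, the BPSN (“background positive, subgroup negative”) AUC mentioned above can be computed as in the sketch below, following Jigsaw’s published unintended-bias metrics: restrict the evaluation set to non-toxic examples that mention the identity term plus toxic examples that do not, then compute ROC AUC on that slice. The substring matching and example dictionaries are simplifications.

```python
from sklearn.metrics import roc_auc_score

def bpsn_auc(examples, term):
    """examples: dicts with 'text', 'label' (1 = toxic), and a model 'score'."""
    subset = [
        ex for ex in examples
        if (term in ex["text"] and ex["label"] == 0)       # subgroup negative
        or (term not in ex["text"] and ex["label"] == 1)   # background positive
    ]
    return roc_auc_score([ex["label"] for ex in subset],
                         [ex["score"] for ex in subset])
```

A low BPSN AUC for a term means the model tends to rank non-toxic comments that mention the term above genuinely toxic comments that do not, which is exactly the failure mode the balancing work targets.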

We remain excited about these new advancements to Perspective and will continue exploring additional applications of the SCR, starting by expanding the number of languages the tool can support. We also plan to research how this data can help Perspective detect more nuanced discussions about identity, for example, conversations where participants use in-group jargon.

Reducing unintended model bias is a continuous process. While we believe these latest advancements in bias evaluation and mitigation will significantly improve the safety of online conversations, we know there’s more to be done. We remain steadfast and focused on ensuring that everyone can use Perspective’s models effectively to promote safe online discussions all over the world.

Contributors: Rachel Rosen, Alyssa Chvasta, Lucy Vasserman, Sameer Sethi, Emmanuel Klu, Donald Martin

Jigsaw is a unit within Google that explores threats to open societies, and builds technology that inspires scalable solutions.