Q&A: Mark Díaz on Intersectional Thinking for ML Fairness

Published in People + AI Research, Jun 1, 2022

Illustrated portrait of Mark Díaz, an Afrolatino man in his early 30s with medium-length black curly hair and a beard, wearing pink translucent glasses. Illustration for Google by Harry Woodgate.

Mark Díaz is a Research Scientist in Google’s Responsible AI organization. He works on ways of accounting for social bias in dataset development and human subjectivity in data labeling. This Q&A has been collaboratively edited with David Weinberger.

Q: Let’s start from the beginning: What is intersectionality and why does it matter?

Mark: I can describe myself as a man, gay, Latino, or Black, each as a separate term. But it would be impossible to fully understand my social experiences as, say, a gay person while ignoring how my other identities also shape these experiences. My experience in the world is shaped by how these identities interplay and in particular how people respond and react to these identities altogether.

Intersectionality is a way of understanding what all those social identities mean when it comes to structural oppression. The inspiration for intersectionality came to Kimberlé Crenshaw when she was examining legal cases in which Black women were discriminated against in their workplace not just because they were Black or women, but because they were Black women. The combination of social identities can create a type of discrimination that you would miss if you considered them apart.

Q: Why does this matter to machine learning (ML) and machine learning models?

Mark: For information to be legible to a machine learning system, it has to be quantified. But social experience is often difficult to pin down and quantify.

For example, when I was growing up, the Census questionnaires made me pick either Black or Latino as my race. There was no “and” allowed. From a dataset perspective, that produces information about people that’s artificially separated, and makes it hard to understand intragroup differences among people who might share a category. That artificial restriction on data can lead to machine learning casting a distorting lens on intersectional problems.

But with the right data, machine learning can be a powerful tool for intersectional analyses. After all, machine learning is about discovering patterns, and with intersectional data it could reveal important and unexpected patterns.

I wouldn’t ever say that an ML algorithm can learn all the nuances of intersectional social experience, but with the right data it can help us to think about things through the lens of intersectionality.

Q: When you say “with the right data,” what do you have in mind?

Mark: That’s a huge challenge. There’s been great work by researchers that highlights intersectional approaches. For example, Joy Buolamwini and Timnit Gebru conducted an analysis of skin tone representation in standard image datasets used to evaluate the performance of facial analysis algorithms, such as those used to detect faces in images or classify gender. When evaluating several algorithms used for gender classification, they found disparities in algorithm accuracy between women and men, as well as between people with darker and lighter skin tones. Accuracy was worst for dark-skinned women. As another example, Alex Hanna, Emily Denton, Andrew Smart, and Jamila Smith-Loud offered suggestions about including disaggregated data that lets you analyze subcategories of other categories. The idea is to include subcategories within broader categories you are interested in (e.g., race, gender, or religion) so that you can reveal differences or trends among subgroups.

Q: How does that help?

Mark: Typically metrics are averaged across groups, but that misses important differences. An example I often give is looking at socio-economic status metrics for Asians in the United States. Often data is used to describe the broad category of “Asian”, but that misses variations in the economic realities across groups. For example, Hmong Americans and other Southeast Asians tend to have lower incomes and face different challenges than other Asian Americans, but that wouldn’t be obvious from looking at an averaged metric. Including subgroups enables you to check if there’s representation across different subcategories. It also helps with fairness testing, and understanding how systems work when they have different categories as inputs.
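As a minimal sketch of the point about averaged metrics, the Python below compares a single mean for a broad category with the means of its subgroups. The records, the labels "A", "A1", and "A2", and the values are invented purely for illustration and are not real statistics; `disaggregate` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Invented toy records: (broad_category, subgroup, value). Not real data.
records = [
    ("A", "A1", 90), ("A", "A1", 110),
    ("A", "A2", 40), ("A", "A2", 60),
]

def disaggregate(rows):
    """Return the overall mean and the mean for each subgroup."""
    by_subgroup = defaultdict(list)
    for _category, subgroup, value in rows:
        by_subgroup[subgroup].append(value)
    overall = mean(value for _category, _subgroup, value in rows)
    per_subgroup = {name: mean(vals) for name, vals in by_subgroup.items()}
    return overall, per_subgroup

overall, per_subgroup = disaggregate(records)
print(overall)       # one number for the whole category
print(per_subgroup)  # the spread that the single number hides
```

In this toy case the overall mean sits between the two subgroup means, so reporting only the aggregate would hide the fact that one subgroup's value is half the other's.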

Q: How do you pick the categories?

Mark: That’s one of the golden questions. When I think about this, I try to start with: Where is this system going to be used? Why is it going to be used? Who are the people who stand to be most impacted if the system doesn’t work properly?

Q: How does a team figure that out?

Mark: Often it’s a matter of learning from past failures or iterating over different versions of a model. Or, when you’re testing a system, it’s a matter of identifying weak points. At Google there’s work on making fairness testing part of routine robustness testing that checks if the system will still be able to perform consistently across a wide range of inputs. For example, a face detection algorithm should perform consistently across lighting conditions, skin tones, and eyewear. Robustness testing aims to identify a broad set of potential weaknesses, but testing how a system performs with intersectional data can help us understand if the system will work similarly with different populations. For example, it could help ensure that image recognition works well for everyone. Or imagine how helpful this could be for ensuring that a speech recognition system produces quality automated captions for different languages or dialects.

Then there’s the challenge that social identity is always in flux. For example, the U.S. Census forms’ checkboxes for race and ethnicity have changed significantly over the years. And even when the boxes have stayed the same, what it means to be a member of a racial group changes over time. For example, there was a time in the U.S. when Irish-Americans weren’t considered to be white. There was also a time when the legally codified “one-drop” rule meant that if you had a single Black ancestor, you were unambiguously counted as Black.

Then there are the cultural differences. If you’re doing intersectional work in India, caste is important to consider as a part of social identity and experience, but caste may not show up or make sense in data from other cultures, and if it does, it probably has very different cultural meanings and significance.

Q: So, how does a machine learning team decide which categories count?

Mark: It depends on the purposes of the machine learning project, and who it will affect.

Q: How is the development team involved in those decisions?

Mark: It’s up to the development team to decide on what data and categories to use, ideally in collaboration with experts in both technical and nontechnical fields, including members from stakeholder groups. Machine learning can actually help with analyzing categories. For example, topic modeling is a technique used to cluster things that are similar. But even then, humans have to decide how many clusters would give the best outcomes, and what the threshold should be for saying that these things have enough in common that we ought to count them as a cluster. That’s another human decision.
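To make the point about thresholds concrete, here is a small sketch in which the same texts fall into a different number of clusters depending on a similarity cutoff chosen by a person. It is not topic modeling proper: it swaps in a simple greedy grouping by word overlap (Jaccard similarity) to keep the example short. The texts, function names, and threshold values are all made up for illustration.

```python
def jaccard(a, b):
    """Share of distinct words two texts have in common (0.0 to 1.0)."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def cluster(texts, threshold):
    """Greedy grouping: a text joins the first cluster whose first member
    is at least `threshold` similar; otherwise it starts a new cluster."""
    clusters = []
    for text in texts:
        for group in clusters:
            if jaccard(text, group[0]) >= threshold:
                group.append(text)
                break
        else:
            clusters.append([text])
    return clusters

# Invented example texts.
texts = [
    "captions for spoken dialects",
    "captions for spoken languages",
    "face detection in dim lighting",
    "face detection with eyewear",
]

# The data is identical in both calls; only the human-chosen cutoff differs.
print(len(cluster(texts, threshold=0.25)))
print(len(cluster(texts, threshold=0.7)))
```

With the looser cutoff the texts fall into two groups, and with the stricter one every text stands alone. Neither answer is given by the data; someone has to decide which grouping is more useful for the task.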

Q: Isn’t that a place that human bias can sneak into a model?

Mark: Inevitably yes. But machine learning developers are trying to build systems that work well enough to accomplish the task at hand and support human decision-making. While we know that we can’t build systems that perfectly represent human experience and eliminate human bias, we also know that we can audit and reduce bias within those systems.

This is true for trying to take an intersectional approach because “perfect” data doesn’t exist. All datasets have limitations. So, at a lot of turns there are very practical choices: We do have this data and we don’t have that data, so how can we best work with the data that we do have and what does that mean for which goals we should or should not pursue? Creatively addressing this is just part of the challenge.

Q: But going forward you hope that datasets will more frequently include intersectional data when appropriate?

Mark: Having an intersectional approach to creating and evaluating machine learning systems is a critical part of producing and using systems fairly, or at least more fairly. An intersectional approach allows you to identify which people, groups, or subgroups are experiencing unfair outcomes, and to see that within context. It’s super important to ask people how they identify themselves so we can make outcomes more useful and relevant to them, and so we can identify disparate treatment within larger groups. Without intersectionality, you could miss health disparities among Latino subgroups, or remain unaware of the particular forms of discrimination experienced by people who are, say, transgender, older-aged, and disabled. ML systems will be less accurate and less useful, and will generate less fair outcomes, if their data and models don’t account for the varied aspects of people’s identities.

Written by People + AI Research @ Google