Q&A: Courtney Heldreth and Michal Lahav on Addressing Inequitable Speech Recognition: It takes a community

People + AI Research @ Google
Mar 8, 2022
Expressive hand-drawn portrait of Courtney Heldreth, a woman with braids, and Michal Lahav, a woman with long wavy hair. Illustration for Google by Julianna Brion.

Courtney Heldreth and Michal Lahav lead the Equitable ASR program at Google, which aims to identify and eradicate biases in voice technologies. Michal is a Senior User Experience Researcher who leads a program to identify and address the communication needs of underrepresented communities. Courtney is a social psychologist and Staff User Experience Researcher at Google who leads a program focused on reducing racial inequalities in artificial intelligence. In this interview, we discuss Courtney and Michal’s work on equitable speech recognition, especially AI products’ historical difficulties in parsing African American language. This Q&A was collaboratively edited with David Weinberger.

Q: What inspired you to focus specifically on addressing issues of equity in voice technologies?

Michal: This was all catalyzed by a recent article in the Proceedings of the National Academy of Sciences showing that African Americans experience almost twice as many errors as white people when using technology that turns spoken language into text, known as Automated Speech Recognition, or ASR. Reading these findings inspired us to investigate the psychological and behavioral effects of these errors. We reached out to one of the authors of the paper, Zion Mengesha, a PhD candidate at Stanford who studies sociolinguistics and language attitudes toward African American Vernacular English. Given her expertise, we asked her to join our team as our subject matter expert.

Courtney: Michal and I are passionate about building products that everyone can use, so understanding how these errors occur and the impact that they have on African American users will help us design ASR systems that are more inclusive.

Michal: So we set out on a research plan. As a first step, we conducted a two-week diary study in which we asked thirty African American users of voice technology, recruited from cities with historically high African American populations, for ongoing feedback. They provided us in-the-moment feedback on the errors they experienced with voice tech and what they did behaviorally in response to those errors. This work was recently published in Frontiers in Artificial Intelligence, in a special issue on Bias, Subjectivity and Perspectives in Natural Language Processing.

Courtney: Our research found that participants experienced many negative emotions when errors occurred: they reported feeling disappointed, frustrated, angry, and even self-conscious. As a result of these errors, people then behaviorally modified their speech in an attempt to be better understood by the technology. They accommodated to the tech.

Michal: Errors may occur for people irrespective of race (you have probably encountered errors yourself when using voice technologies), but what was unique was that African American participants felt that voice technology wasn’t made for people like them; they explicitly stated that they felt it was made for white people.

Courtney: And they attributed the technology’s errors to their race. As a result, they felt they had to change their identity and change their voice to sound “whiter” (i.e., code-switch) to be understood by the technology.

Q: There’s so much packed into what some may view as just a “technical” problem.

Courtney: Yes, because this problem is not just a technical one. African American Vernacular English (AAVE), or African American Language (AAL), describes the language varieties spoken by many African Americans. AAVE/AAL is an umbrella term for a range of language varieties with roots in West African languages and close ties to Caribbean creoles. AAVE is a source of pride, resilience, and celebration for African Americans, but it’s often misconstrued as bad English or slang. As a result, African Americans have faced dialect discrimination in employment, education, and even in finding housing. For example, one housing study showed that the very process of obtaining housing is biased against AAVE speakers: leasing agents were more likely to tell a person speaking AAVE on the phone that an apartment was unavailable, but when the same person spoke standard English, the apartment would suddenly be available. In education, there is literature showing that teachers who heard audio samples of students rated a student speaking AAVE as less educated, less intelligent, and less likely to attend college than when the same individual used standard American English. There is also evidence that teachers misdiagnose students who speak AAVE with speech disorders.

Michal: This accommodation of speech relates to a broader issue, one that we now see extends to technology: African Americans have historically needed to style-shift in order to evade unjust treatment, so these voice tech “errors” trigger a strong response… a response rooted in centuries of painful discrimination.

Q: How do these so-called errors occur?

Courtney: A major component of it has to do with the data these systems are trained on. Even if available data are broadly representative of different races and ethnicities (which often isn’t the case), that still might not be enough. There are challenges to collecting labels that indicate the race of the speaker in a voice snippet the system is being trained on, whether because of legal restrictions, privacy efforts to minimize data collection, or a lack of trust among users who may (understandably) not want to share such information without understanding how it will or won’t be used. Also, the data that exist today often don’t represent how a person would casually speak to a member of their own community.

Michal: Current transcription practices default to standard English conventions. That means that the service that transcribes the audio samples will write an African American’s speech with the spelling and grammar typical of the majority culture.
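As a purely illustrative aside, disparities like the one reported in the PNAS study are typically quantified by comparing word error rates (WER) across groups of speakers. The sketch below shows what that comparison might look like using the open-source jiwer Python package; jiwer, the group names, and the sample transcripts are our own assumptions for demonstration and are not drawn from the study or from Google’s systems.

```python
# Illustrative sketch: comparing ASR word error rate (WER) across speaker groups.
# Assumes the open-source `jiwer` package (pip install jiwer). The sample data
# below is invented for demonstration only.
import jiwer

# Hypothetical (reference transcript, ASR hypothesis) pairs, grouped by speaker cohort.
samples = {
    "group_a": [
        ("turn the living room lights off", "turn the living room lights off"),
        ("remind me to call my sister tonight", "remind me to call my sister tonight"),
    ],
    "group_b": [
        ("turn the living room lights off", "turn the living room light of"),
        ("remind me to call my sister tonight", "remind me to call my system tonight"),
    ],
}

for group, pairs in samples.items():
    references = [ref for ref, _ in pairs]
    hypotheses = [hyp for _, hyp in pairs]
    # jiwer.wer computes the aggregate word error rate over the paired lists.
    error_rate = jiwer.wer(references, hypotheses)
    print(f"{group}: WER = {error_rate:.2f}")
```

A gap in WER between cohorts, like the near-2x difference the PNAS authors reported, is the kind of signal that motivates the data and transcription work Courtney and Michal describe here.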

Q: What can be done to avoid or address such challenges?

Michal: These transcription conventions affect the data and in turn introduce a systemic bias. We need to do more research to understand how African American people want their voices transcribed, and how this changes with context. For example, how do people’s preferences when dictating a text to a friend differ from how they want dictated speech to appear in a message to their boss? How can we build technologies that capture dialect features and let people control how they are transcribed?

Courtney: And so we’re researching what control over those preferences users would like. More specifically, we are exploring ways to represent dialects in text and transcription that are better aligned with users’ preferences and the ways they naturally speak.

Community-based Participatory Research

Q: What are some outcomes of this research you would both like to see?

Courtney: For one thing, we need to create more representative datasets, so Google is exploring ways to responsibly broaden our datasets and improve our data collection practices among underrepresented communities more holistically. Given that the foundation of any machine learning model is the data, datasets like these will be critical to improving the ASR experience for underrepresented users.

Michal: To act on our commitment to improving the ASR experience, we’re also engaging in community-based participatory research (CBPR), partnering with communities that have lived experience with this problem to help us gain a better understanding of its broader context. We’re also partnering with academic institutions, such as UC Davis and Carnegie Mellon University, that have expertise in ASR and underrepresented communities.

Courtney: CBPR stands in contrast to historically extractive research practices in which researchers parachute in, get the data, and leave. Working with communities, we hope to develop long-term relationships and to weave their voices into all stages of the research cycle: co-creating hypotheses with the community, co-publishing with our partners, and inviting them to be part of the entire loop while making meaningful product changes along the way.

Q: How do you work with the community members?

Michal: It’s exciting! We are working with organizations that are community-powered and specialize in putting underrepresented voices at the forefront — making sure every stage of the research incorporates community feedback, interests and expertise.

Courtney: We also realized that we needed to educate ourselves and our partners on how to engage in this research approach responsibly. In graduate school, I used community-based participatory research to examine the effects of childhood racism on postpartum depressive symptoms in African American mothers, but I didn’t have experience applying community-based practices to an industry context. To better understand how to do this responsibly, Ned Cooper from ANU, Gillian Hayes from UC Irvine, and a team of us at Google led by Lauren Wilcox conducted a systematic literature review and thematic analysis that will be presented at the CHI 2022 conference. The team looked at all the CBPR programs that have been done with tech over the last two decades. This research contributes a better understanding of the challenges, benefits, and implications of applying community-based participatory research in the machine learning context.

Q: How much of your time has been spent on this community-based approach?

Michal: Both of us have a background in this, but setting up this program and establishing the academic and community partnerships have probably occupied 70% of our time.

Courtney: Part of that has meant educating our teams and our product partners on why we believe this is the right approach to take. CBPR doesn’t work at the pace at which tech companies generally operate. CBPR is intentionally slow. We need to build trust with these communities. We need to design our research questions differently and intentionally. But we believe the anticipated payoff of centering this research in these communities is worth it in the long term.

We’re also trying to make it scalable. What would it mean for other research groups to plug into this program and revisit whether their research questions change if they put communities at the center?

Michal: To encourage this and change the paradigm, our team held an internal CBPR research summit where we brought together everyone at Google interested in this approach — and this resulted in a tremendous feeling of solidarity and promise. We’re also giving workshops and talks. We’re trying to make this process of engaging with communities normal!

People + AI Research (PAIR) is a multidisciplinary team at Google that explores the human side of AI.