AI’s Opportunity to Rethink the Biases of Language and Society

As we consider how best to shape AI’s understanding of language, it is paramount that we identify and rethink the biases that underlie the operation of language in society.

Isabella Mandis
Wikitongues
Jul 2, 2020


Though access to and use of language learning software vary worldwide, those with the means to purchase a product featuring a chatbot-based assistant are increasingly opting into this new tool: as of January 2019, about 26% of adults in the US owned a smart speaker, a roughly 40% increase over the previous year, and by late 2018 an estimated 2.5 billion voice assistants were in use worldwide. That number is expected to roughly triple to 8 billion by 2023, nearly one assistant for every person on the planet. According to a recent report by UNESCO, as of 2020 some people who use virtual assistants may be conversing with them more than with their actual spouses.

Access to these technologies impacts our world both in terms of the opportunities they provide and the biases they continue to reinforce. Whether those biases be psychological — such as the suggestion that a woman’s voice best communicates the subservience of an assistant — or structural — such as when voice technology mishears the voices of People of Color and thus creates barriers to access — their impact is real and disturbing, and demonstrates the importance of considering how human language should be recorded and used in the years to come.

Current reports agree that these biases are formed by inequities within companies and data. As voice assistants are designed by for-profit companies, their data sets are often taken from an existing or targeted consumer base — one that is largely affluent and speaks the ‘dominant’ dialect of a given country. Gender and racial inequities within companies are also at issue, as creations often reflect the beliefs of those who made them. Without representation in the data or the decision-making, minority and less affluent groups face a huge barrier to access, and voice assistants threaten to continue to perpetuate prejudices against these groups without giving them a means to correct or influence them.

Gender Bias in Virtual Assistants: Don’t Make Me Blush

In the quest to give chatbots the semblance of relatable humanity, companies may be doing more harm than good. Each of the market leaders in voice assistants — Apple, Google, Amazon, and Microsoft — offers consumers a default female personality, one which embraces a helpful, patient, and subservient air and appears to be overwhelmingly tolerant of user abuse.

A recent UNESCO report highlights how feminized voice assistants threaten to perpetuate the idea that “women are obliging, docile and eager-to-please helpers, available at the touch of a button or with a blunt voice command.” In this sense, the industry’s leading assistants reinforce a gender stereotype that seems straight out of the era of Mad Men, when female secretaries were often subjected to demeaning treatment and sexual harassment. The report cites the industry’s tendency to program its assistants’ personalities to be accommodating, saying that across the board, regardless of company and product, the default feminine assistant “holds no power of agency beyond what the commander asks of it,” and, as a result, “it honours commands and responds to queries regardless of their tone or hostility. In many communities, this reinforces commonly held gender biases that women are subservient and tolerant of poor treatment.”


Rather than discouraging gendered abuse of their virtual assistants, industry leaders have often opted for pre-programmed flirtatiousness or deflection in response to insults and harassment. The UNESCO report’s title, “I’d Blush if I Could,” takes its name from one of Siri’s former responses to gendered abuse — specifically, when a user called Siri a “slut.” The report found that Alexa responded to the same comment with deflection rather than any outright objection, saying simply, “Well, thanks for the feedback.” Currently, and potentially in reaction to the UNESCO report’s findings, Siri’s reply has changed to “I won’t respond to that.”

The UNESCO report does not stand alone: a 2017 test conducted by Quartz concluded that Siri and Alexa responded to gender-based insults with “gratitude and avoidance” and to sexually explicit comments with “evasive, grateful or flirtatious answers.” Amazon and Apple’s products were more likely to handle these moments flirtatiously, while Google and Microsoft’s products either responded with jokes or failed to address the insult at all. For example, when asked “What are you wearing?”, Apple’s Siri would reply, “Why would I be wearing anything?”; Amazon’s Alexa would answer, “They don’t make clothes for me”; and Microsoft’s Cortana would say, “Just a little something I picked up in engineering.”

Even the assistants’ names reveal evidence of sexualized fantasy in their design. Siri is a Norse name meaning “beautiful woman who leads you to victory,” and Microsoft’s Cortana is named after a scantily clad video game character who had a complex and personal relationship with her male user.

The UNESCO report concludes that the overall effect of these feminized voice assistants is to paint a portrait of the world where men use technology and women facilitate their use: “What emerges is an illusion that Siri — an unfeeling, unknowing, and non-human string of computer code — is a heterosexual female, tolerant and occasionally inviting of male sexual advances and even harassment. It projects a digitally encrypted ‘boys will be boys’ attitude.”

These issues appear to be industry-wide: Samsung’s female assistant Bixby, though it holds a far smaller market share and is not included in the UNESCO report, responds to the suggestion, “Let’s talk dirty,” by replying, “I don’t want to end up on Santa’s naughty list.” This coy, gendered response differs markedly from that of Samsung’s male voice option, which replies, “I’ve read that soil erosion is a real dirt problem.”

Screenshot from Siri on iPhone

When asked about the reasons for their design choices, industry leaders have responded that their voice assistants are built according to market research on customer preference. The implication is that if default voice assistants are female and obedient, it is because customers prefer it that way. Though this may be true, it reveals a fundamental issue: when innovation and access are grounded in market research, products reinforce and perpetuate the biases held by the consumer base being targeted. As a result, if a given consumer group demonstrates a preference for authoritative male voices and helpful female voices, market-driven products rarely question those preferences, or consider the impact that embracing them might have in shaping our future.

Lack of diversity within tech companies is also a major factor in the bias of their products. Only 12% of AI researchers are women, and women comprise just 6% of software developers in the field. As recently as 2018, an Amazon hiring algorithm was criticized for downgrading job applications based on gender. Drawing on past hiring statistics, the program effectively taught itself that men were stronger candidates than women, since men had traditionally been hired at a much higher rate; as a result, it associated phrases containing the word “women” with lower desirability in a candidate. Whether intentional or not, technology that learns from biased data has proven to reinforce and extend discriminatory practices, and without representation in the design process, women have little opportunity to identify and flag programming that reinforces negative gender stereotypes.
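
To see how this happens mechanically, consider the toy sketch below. It is not Amazon’s system and not a real screening tool; the hiring history, the résumé snippets, and the word_score and resume_score functions are all invented for illustration. It simply shows that a model scored purely on past hiring decisions will penalize a word, here “women’s,” that rarely appears in the résumés of past hires, without anyone ever programming it to consider gender.

```python
# A toy illustration (not Amazon's system): a résumé scorer trained only on
# past hiring decisions. All data, names, and functions here are invented.

from collections import Counter

# Hypothetical historical data: (résumé text, was the candidate hired?)
# The imbalance mirrors a history in which men were hired far more often.
history = [
    ("men's chess club captain, software engineer", True),
    ("software engineer, systems programming", True),
    ("men's rugby team, backend developer", True),
    ("software engineer, distributed systems", True),
    ("women's chess club captain, software engineer", False),
    ("women's coding society lead, backend developer", False),
]

hired_words, rejected_words = Counter(), Counter()
for text, hired in history:
    (hired_words if hired else rejected_words).update(text.split())

def word_score(word: str) -> float:
    """How much more often a word appeared in hired résumés (with smoothing)."""
    h = hired_words[word] + 1
    r = rejected_words[word] + 1
    return h / (h + r)

def resume_score(text: str) -> float:
    """Average word score: higher means 'looks like past hires'."""
    words = text.split()
    return sum(word_score(w) for w in words) / len(words)

# Two résumés that differ only by one gendered word: the model rates the
# "women's" résumé lower, purely because of the skewed hiring history.
print(resume_score("chess club captain, software engineer"))           # higher
print(resume_score("women's chess club captain, software engineer"))   # lower
```

Nothing in the sketch references gender explicitly; the bias arrives entirely through the historical data, which is exactly the dynamic the Amazon case illustrated.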

Racial Bias: Controlling Who Can and Cannot Be Heard

While the programmed responses and personalities of voice assistants threaten to reinforce gender stereotypes, their ability to detect and understand certain accents and syntax over others threatens to exclude entire populations from their services.

An April 2020 study conducted by researchers at Stanford University found a massive gap in the ability of voice assistants to understand the voices of White and Black Americans. The study used publicly available speech recognition software from Apple, Amazon, Google, IBM, and Microsoft to transcribe interviews with 42 White people and 73 Black people from both rural and urban areas of the United States. The results were striking: across the board, the tools were far more adept at understanding White speakers than Black speakers, regardless of region. On average, the tools misidentified the words of White speakers 19% of the time, while they misidentified the words of Black speakers 35% of the time. Furthermore, they found only 2% of White speakers’ words to be unintelligible, while they were unable to understand a full 20% of the words spoken by Black interviewees.

Screenshot from Siri on iPhone

The study attributes much of this discrepancy to the failure of voice assistants to learn and recognize AAVE, or African American Vernacular English. Because each assistant’s ability to understand speech is learned from consumer data, chatbots trained on data from mostly White consumers have never been taught to understand AAVE or the accents of other racial groups. Significantly, the study found that this gap is a matter of sound recognition: when Black and White participants recited the exact same phrase, the tools still demonstrated the same bias, understanding the White speakers with a much higher rate of success.
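
For readers curious how such a gap is measured, the sketch below shows the kind of audit the Stanford researchers describe: compare a speech recognizer’s output against human reference transcripts and compute the word error rate (WER) separately for each speaker group. The transcripts and group labels here are invented placeholders, and both groups are given the same reference phrases, mirroring the study’s finding that the disparity persists even when speakers recite identical sentences.

```python
# A minimal audit sketch: compute word error rate (WER) per speaker group.
# The groups and transcripts below are invented placeholders, not study data.

from itertools import product

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the length of the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i, j in product(range(1, len(ref) + 1), range(1, len(hyp) + 1)):
        cost = 0 if ref[i - 1] == hyp[j - 1] else 1
        d[i][j] = min(d[i - 1][j] + 1,         # deletion
                      d[i][j - 1] + 1,         # insertion
                      d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Each entry: (speaker group, human reference transcript, recognizer output).
# Both groups recite the same phrases; only the recognizer's accuracy differs.
samples = [
    ("group_a", "the weather is nice today", "the weather is nice today"),
    ("group_a", "meet me at noon",           "meet me at new"),
    ("group_b", "the weather is nice today", "the weathers niece to day"),
    ("group_b", "meet me at noon",           "met men at noon"),
]

rates_by_group = {}
for group, reference, hypothesis in samples:
    rates_by_group.setdefault(group, []).append(word_error_rate(reference, hypothesis))

for group, rates in rates_by_group.items():
    print(f"{group}: mean WER = {sum(rates) / len(rates):.2f}")
```

A per-group comparison of exactly this kind is what yields figures like the 19% versus 35% error rates reported above.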

The implications of the study go far beyond whether individuals can use voice assistants on their phones. Because the companies behind voice assistants lead the development of speech recognition, their systems are often used as models, and their biases may be repurposed, along with their programming, in more consequential settings such as job interviews and courtroom transcription. Unbalanced data sets have already proven to have troubling, discriminatory implications for facial recognition software, which has been shown to misidentify and misclassify the faces of People of Color at a much higher rate than White faces. This is especially troubling given that Amazon has sold its facial recognition software to law enforcement agencies around the US, where misidentification could affect criminal charges and courtroom proceedings. In response to pressure stemming from the 2020 Black Lives Matter protests, Amazon has reportedly suspended sales of its facial recognition software to law enforcement for one year, though without saying what it hopes to accomplish in that time or how the technology’s use could be regulated to discourage abuse and guard against discrimination. It is also unclear whether the moratorium applies to ICE, with whom Amazon has also reportedly negotiated to provide its services.

Eliminating Bias in Language

If left unchecked, these imbalances threaten to exclude entire populations from the use of technology and to cement inequities far into the future. The Stanford study recommends that industry leaders collect better data on AAVE and other varieties of English to correct the issue, but the suggestion may not be readily accepted by companies whose interests lie in market share more than in social concerns. For-profit companies are not necessarily incentivized to collect data on groups outside their consumer base, and if these changes are left up to them, they may keep drawing data only from those who buy their products, reinforcing the economic biases of the countries where they operate.

The tech industry is not ignorant of the problems it perpetuates. In a November 2019 speech at Stanford, Eric Schmidt, former Google chief executive and chairman, acknowledged Silicon Valley’s awareness of the issue: “We know the data has bias in it. You don’t need to yell that as a new fact. Humans have bias in them, our systems have bias in them. The question is: What do we do about it?” The UNESCO report and the Stanford study agree on the answer to Schmidt’s question: companies must diversify their data sets and employ a more inclusive staff. It is far less clear whether companies will take that responsibility upon themselves.

The Algorithmic Justice League, a watchdog and educational organization founded by MIT researcher Joy Buolamwini, argues that external regulation is needed to ensure that companies revise their practices and allocate the resources necessary to eliminate bias in their products. AI expert Sandra Wachter urges that courts be given the tools to detect when algorithms are furthering discrimination and the power to punish companies that fail to correct these problems. Wachter warns, “Compared to ‘traditional’ forms of discrimination…automated discrimination is more abstract and unintuitive, subtle, intangible, and difficult to detect…. If we do not act now, we will not only exacerbate existing inequalities in our society but also make them less detectable.”

The issue of bias in voice assistants and language learning is, above all, a question of whether we want to reinforce our world’s inequities and boundaries or use technology as a means to erase them. It would be a tragedy to allow technology’s many tools to continue to reinforce traditional exclusions inherited from past generations. We are at a turning point where we can still change direction — and organizing sustainable inclusive initiatives may be the only means of pressuring governments and companies to take the best path.

If you would like to donate to support the work of Wikitongues, or simply to get to know our work, please visit wikitongues.org. To watch our oral histories, subscribe to our YouTube channel, or visit wikitongues.org to submit a video.
