How AI became biased and exclusionary

Jan Hinrichs
Published in Beluga-team
Aug 20, 2017

In April, a story broke about how AI learns to be gender-prejudiced, either through human programming or simply by picking up language patterns that already exist on the internet. It followed a massive study that analyzed millions of words online and looked closely at how different terms related to each other in text. Male names, for example, turned out to be more strongly associated with career-related topics, while female names were associated with family. Unfortunately, the list goes on…

Fast-forward a couple of months and it turns out AI is also prejudiced against dialects, in some cases actively filtering vernacular language and slang out of its language processing.

This, of course, raises the question of whether such bias has a broader societal impact or can be downplayed as an isolated incident. To get a clearer picture, we need to look at a couple of aspects. Firstly, as interaction with AI and personal assistants such as Alexa and co. becomes more mainstream, some parts of the population might not receive the same treatment, or might be ignored by bots altogether. Consider all the domains that natural-language technology is powering: automated customer service, phone systems, and the mining of the web and social media for information. All of this while effectively discriminating against entire minorities.

Let’s look at how it discriminates against minorities. The language fed into many NLP systems is Standard English, a generally accepted norm and the most formal version of the language. Across the English-speaking world there is a large group of native speakers, roughly 400 million, and an even bigger group who speak it as a second or third language. Once you account for the many dialects and regions where English is spoken with local influences, the picture becomes convoluted: there is no clear way to say how many of these people actually speak “Standard English”.

When building language into a program, picking the standard norm might seem like the obvious choice: it is highly regulated and therefore unlikely to cause trouble at runtime. But what works perfectly for a service provider or a telephone company doesn’t necessarily work for social media or everyday internet interaction. The internet doesn’t hold itself to those conventions; it offers a more colorful and richer interpretation of language, composed of the millions of voices and tonalities of its users across the globe. In some cases of AI running unchecked, the rich voices that contribute to this worldwide dialogue are simply filtered out.

The racist AI

Assistant Professor Brendan O’Connor and his student Su Lin Blodgett, from the University of Massachusetts Amherst, took it upon themselves to analyze how NLP interprets dialects. As a first step, the team looked at language usage on Twitter, using demographic filtering to collect about 60 million tweets from the black community. They then tested a number of natural-language processing tools to see how these would handle the collected statements. The result was quite astonishing: one of the tools classified the black community’s postings as being of Danish origin. The duo also tested a number of popular machine-learning APIs that analyze the sentiment of a text, only to find that these struggled just as much. O’Connor concluded that the problem affects so many different systems that it extends to any tool which uses language, including search engines. A valid point Prof. O’Connor makes is that:

“If you analyze Twitter for people’s opinions on a politician and you’re not even considering what African-Americans are saying or young adults are saying, that seems problematic.”

This could be one of the most incendiary findings with regard to how AI interprets language and its exclusionary, even racist, character.
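To get a feel for the kind of misclassification the study describes, here is a minimal sketch, not the researchers’ actual pipeline, that runs an off-the-shelf language identifier over two invented tweets: one in standard English and one in vernacular spelling, using the open-source langid.py library.

```python
# A minimal sketch, not the study's actual pipeline: run an off-the-shelf
# language identifier over two made-up tweets and compare the results.
# Requires: pip install langid
import langid

tweets = {
    "standard": "I am going to the store later, does anyone need anything?",
    "vernacular": "bout to head to the sto, y'all need sumn?",  # invented example
}

for label, text in tweets.items():
    lang, score = langid.classify(text)  # returns (language code, confidence score)
    print(f"{label:>10}: detected language = {lang} (score {score:.1f})")

# Identifiers trained mostly on formal, standard text can assign non-English
# labels or low confidence to dialectal spellings; that systematic error is
# what Blodgett and O'Connor measured at scale on ~60 million tweets.
```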

Prof. O’Connor goes on to say: “If you purchase a sentiment analyzer from some company, you don’t even know what biases it has in it,” adding that we do not yet have the means to properly audit how such a system affects minorities.
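In the same spirit, one simple way to start probing a sentiment analyzer for dialect bias is to score paired sentences that express the same feeling in standard and vernacular English and compare the outputs. The sketch below uses the open-source VADER analyzer and invented sentence pairs purely for illustration; it is not the audit method O’Connor’s team used.

```python
# A rough bias probe, assuming the vaderSentiment package is installed
# (pip install vaderSentiment). Sentence pairs are invented for illustration.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Each pair expresses roughly the same sentiment in standard and vernacular form.
pairs = [
    ("This movie was really good, I loved it.",
     "this movie was hella good, i was lovin it"),
    ("I am very tired of waiting for the bus.",
     "im so tired of waitin on this bus fr"),
]

for standard, vernacular in pairs:
    s = analyzer.polarity_scores(standard)["compound"]   # compound score in [-1, 1]
    v = analyzer.polarity_scores(vernacular)["compound"]
    print(f"standard:   {s:+.2f}  {standard}")
    print(f"vernacular: {v:+.2f}  {vernacular}")
    print(f"gap:        {abs(s - v):.2f}\n")

# Large, systematic gaps for equivalent meanings would be one sign that the
# analyzer treats the two varieties of English differently.
```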

Research efforts into the matter, such as those of the AI Now initiative, are becoming more attuned to these biases and are working across disciplines to understand both the social and the economic implications of artificial intelligence in everyday life. Since AI can only “see” what is in the data it is fed, it becomes necessary to ask who stands behind that data and how its bias arises. AI Now researches and measures the nature of such bias and its impact on diverse populations.

Programmers and computer scientists will have to become more accountable and face the new responsibilities that come with their work. They will have to grapple with the social problems we face in “real life” and stay aware of them when working with AI and language-processing systems.

First published on LinkedIn


Jan Hinrichs

Founder & CEO of Beluga Linguistics, Citizen, Activist, Papa...