A Is for AI, B Is for Bias

How Artificial Intelligence Facilitates an Extension of Human Bias, and What We Can Do About It

Aminah Aliu
The Startup
8 min read · Jan 9, 2021


As a person of color and child of immigrants, I’ve had first-hand experience with the prevalence of discrimination in society: from the neighborhoods we inhabit (housing discrimination) and the schools we attend (educational discrimination), to the jobs we occupy (employment discrimination) and the wages we receive for those jobs (pay discrimination), those in marginalized communities are often subjected to the stereotypes and prejudices of others.

As a younger kid, I might have thought that I could just escape this all by popping in a couple of earbuds, avoiding people, and living on the internet. But, WHAT FUN, it turns out that discrimination exists on the internet, too 🥳. Things like the prices we pay for online goods, the ads that target us, and the information we consume are all–you guessed it!–influenced by someone else’s stereotypes and prejudices.

🤖 AI & Discrimination

* For the purposes of this article, the words ‘bias’ and ‘discrimination’ are used interchangeably.

You might be wondering, what does discrimination have to do with artificial intelligence, Aminah?

Why does any of this matter??

Well, it’s easy enough to establish the salience and relevance of artificial intelligence in today’s society. Everyone from AI expert Dr. Kai-Fu Lee¹ to entrepreneur and billionaire investor Mark Cuban² agrees that AI is arguably one of the few technologies–if not the only one–that will completely disrupt and redefine life as we currently know it.

Here’s the thing, though: while training computer algorithms and collecting the data used to train said algorithms, we may end up projecting our own biases–intentionally or otherwise–onto them.

Now hold up a minute, you might say, how can AI discriminate? Isn’t it based on data, not someone’s opinions?

Well…I’m sorry to break even more bad news to you, but data-based does not mean bias-free³.

🎲 Types of Bias in AI

I know, I know, you must be horribly devastated… FIRST I tell you that discrimination exists, and THEN I tell you that the internet is no escape.

But, wait.

IT. GETS. WORSE…

Not only does bias exist on the internet, but there are also different flavors of it–just like ice cream*.

*Please note that this list is not exhaustive, just some of the main types of bias.

🤝 Interaction Bias

Many artificial intelligence algorithms have the ability to self-learn through unsupervised machine learning⁵. While supervised learning trains AI from pre-labeled and pre-sorted data, testing its ability to draw the correct conclusions, unsupervised learning allows AI to draw its own patterns and conclusions from unstructured and unlabeled data⁶.
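
To make the distinction concrete, here’s a minimal sketch of both styles using scikit-learn (the tiny dataset is invented purely for illustration):

```python
# A minimal sketch of supervised vs. unsupervised learning with
# scikit-learn. The six 2-D points below are invented for illustration.
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],   # one loose group...
     [0.9, 0.8], [0.85, 0.9], [0.95, 0.85]]  # ...and another

# Supervised: we hand the algorithm pre-labeled, pre-sorted data.
y = [0, 0, 0, 1, 1, 1]
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.12, 0.18]]))  # -> [0]

# Unsupervised: no labels at all; the algorithm draws its own
# conclusions about how the data clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # its own grouping of the six points
```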

Giving AI the ability to draw its own conclusions from data is great and all, but it leaves the algorithm vulnerable to interaction bias⁷, which occurs when AI essentially absorbs the biases of the users it interacts with through the data it collects from them.
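
To see how that absorption happens mechanically, here’s a deliberately naive toy bot (to be clear, this is not how any real chatbot is built–just an illustration of unfiltered learning from user input):

```python
# A toy "self-learning" bot (NOT how Tay actually worked) that absorbs
# whatever users say, with no filtering, and echoes it back later.
import random

class EchoBot:
    def __init__(self, seed_phrases):
        self.phrases = list(seed_phrases)  # starts out innocent

    def chat(self, user_message):
        # "Learning" here is just adding user input straight into the
        # corpus: no moderation, no filtering. Garbage in, garbage out.
        self.phrases.append(user_message)
        return random.choice(self.phrases)

bot = EchoBot(["Hi there!", "Tell me more!"])
bot.chat("<something hateful>")  # the bot has now "learned" this...
print(bot.phrases)               # ...and may repeat it to anyone
```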

The Twitter chatbot Tay, created innocently enough by Microsoft in 2016, is a perfect example. Designed to “engage and entertain people…through casual and playful conversation,” Microsoft’s plan was that “The more [Twitter users] chat with Tay the smarter she gets⁸.”

Image: one of Tay’s less offensive tweets; many of her tweets have since been deleted (source)

Unfortunately, the Twitter users taking Tay up on her offer of polite banter were of the racist, homophobic, and ignorant variety. In less than 24 hours, Tay’s tweeting algorithm “learned” from these users, and began tweeting statements in support of Hitler (to mention just one of the horrific examples).

👨🏽‍⚕️ Latent Bias

The good thing about datasets is that they allow machines to tap into human knowledge…the bad thing about datasets is that they may unknowingly allow for the perpetuation of harmful human biases.

Unlike interaction bias, which depends on users’ engagement with the algorithm, latent bias comes from correlations inherent in the dataset the algorithm is trained on.⁹ Also unlike interaction bias, latent bias is more prevalent in supervised learning models (which, as I said above, train AI from pre-labeled and pre-sorted data).

Let’s look at an example to better understand this bias.

Imagine an AI algorithm called 🔍DoctorFinder🔎. This algorithm is given a pre-labeled dataset with pictures of doctors, and its job is to correctly label new pictures.

Unfortunately, there’s a high chance 🔍DoctorFinder🔎 would have a latent bias skewed towards men, owing to the persistent stereotype in the medical field that doctors are men and nurses are women. Such an algorithm would thus be more likely to inaccurately label someone like Rebecca Lee Crumpler¹⁰ (the first African American woman to earn an MD in the US) as ‘NOT_A_DOCTOR’.
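
Here’s what that might look like in code: a hypothetical, deliberately over-simplified 🔍DoctorFinder🔎 built on two made-up features, showing how a skewed training set becomes a skewed model:

```python
# A hypothetical, over-simplified 🔍DoctorFinder🔎. Each "image" is
# reduced to two made-up features: [wears_scrubs, presents_as_male].
# The labels mirror the stereotype baked into the dataset, not reality.
from sklearn.tree import DecisionTreeClassifier

X_train = [[1, 1], [1, 1], [1, 1], [1, 1],  # men in scrubs -> DOCTOR
           [1, 0], [1, 0],                  # women in scrubs -> labeled as non-doctors
           [0, 1], [0, 0]]                  # non-medical people
y_train = ["DOCTOR", "DOCTOR", "DOCTOR", "DOCTOR",
           "NOT_A_DOCTOR", "NOT_A_DOCTOR",
           "NOT_A_DOCTOR", "NOT_A_DOCTOR"]

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A woman in scrubs (say, Dr. Rebecca Lee Crumpler) gets misclassified,
# because the skewed dataset taught the model that "doctor" means "male".
print(model.predict([[1, 0]]))  # -> ['NOT_A_DOCTOR']
```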

🍎 Selection Bias

Selection bias occurs when datasets overrepresent a certain group and underrepresent another.⁹

Image: wedding photos mislabeled by an image classifier (Source: Google AI)

In the image above, an algorithm trained mostly on images of Western-style weddings was unable to accurately label an African-style wedding (as a Nigerian-American, this is just painful).
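
One cheap defense against selection bias is simply measuring representation before training. A minimal sketch, with invented metadata:

```python
# A minimal audit for selection bias: count how each group is
# represented before training on the data. Metadata is invented.
from collections import Counter

wedding_images = ["western"] * 940 + ["nigerian"] * 25 + ["indian"] * 35

counts = Counter(wedding_images)
total = sum(counts.values())
for style, n in counts.most_common():
    print(f"{style:>8}: {n:4d} images ({n / total:.1%})")

#  western:  940 images (94.0%)  <- overrepresented
#   indian:   35 images (3.5%)   <- a model trained on this will
# nigerian:   25 images (2.5%)      struggle with these weddings
```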

🏃🏽‍♀️ Has There Been Progress?

Thankfully there have been strides made towards dismantling bias in algorithms. Here is a recent example:

👓 Computer Vision with the ImageNet Database¹¹

In 2009, computer scientists from Princeton and Stanford released ImageNet, a database with labeled images of objects, places, and people that could be used for academic research.

In order to scale this database, however, much of the image labeling was crowdsourced, which led to labels that were biased, inappropriate, or offensive. According to Kaiyu Yang, the lead author of the research paper this group published, “When you ask people to verify images by selecting the correct ones from a large set of candidates, people feel pressured to select some images and those images tend to be the ones with distinctive or stereotypical features.”

After reassessing ImageNet, the scientists recognized two sources of bias:

  1. ImageNet’s categories were based on a much older database of words called–you guessed it!–WordNet. Some of the words in WordNet did not transfer well to the visual nature of the ImageNet database; more specifically, words used to describe a person’s faith or culture were more prone to returning only the most striking of images, leading to algorithms that perpetuated stereotypes (see Kaiyu Yang’s explanation above).
  2. ImageNet’s images are sourced from online image search engines like Flickr, and these types of search engines have been shown to return images that are biased towards the demographic of young, white males.

Image source: Towards Fairer Datasets

In order to counteract this bias, the researchers first identified offensive words and removed a little over half of the 2,932 categories they had previously used to label people. After that, they removed words rated below 4 out of 5 for “imageability” (see below).

Image source: Towards Fairer Datasets
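
A rough sketch of that two-stage filter (this is not the researchers’ actual code, and the category names and ratings are invented):

```python
# A sketch (not the researchers' actual code) of the two-stage filter:
# drop categories flagged as offensive, then drop anything rated below
# 4 out of 5 for "imageability". Both collections are invented examples.
offensive = {"offensive_category"}     # flagged by human annotators
imageability = {                       # 1-5 crowd ratings
    "basketball player": 4.6,
    "philanthropist": 2.1,             # hard to depict in a photo
    "offensive_category": 3.0,
}

kept = {
    word: score
    for word, score in imageability.items()
    if word not in offensive and score >= 4
}
print(kept)  # {'basketball player': 4.6}
```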

The resulting refined and filtered database still had over 133,000 images!

After this process, the researchers created a tool that lets users request sets of images that are demographically balanced, according to the user’s needs, along the attributes of skin color, gender expression, and age (see below).

Image source: Towards Fairer Datasets
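
A rough sketch in the spirit of that tool, downsampling every group to the size of the smallest so the returned set is balanced along one attribute (all records and attribute names are invented):

```python
# A rough sketch in the spirit of the researchers' balancing tool:
# downsample every group to the size of the smallest one, so the
# returned set is balanced along a chosen attribute. All records and
# attribute names below are invented.
import random
from collections import defaultdict

def balanced_sample(records, attribute, rng=random.Random(0)):
    groups = defaultdict(list)
    for record in records:
        groups[record[attribute]].append(record)
    k = min(len(group) for group in groups.values())  # smallest group
    return [r for group in groups.values() for r in rng.sample(group, k)]

images = ([{"id": i, "gender_expression": "masculine"} for i in range(80)]
          + [{"id": i, "gender_expression": "feminine"} for i in range(80, 100)])

balanced = balanced_sample(images, "gender_expression")
print(len(balanced))  # 40 -> 20 per group instead of an 80/20 split
```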

Way to go, researchers! You get a gold star 🌟.

This is just one of many examples–see the sources below for more¹² ¹³ ¹⁴.

🔦 Spotlight: Gender Bias¹⁵

A research paper published in 2018 corroborated the idea that computer algorithms reflect the values of their creators, leaving them vulnerable to perpetuating bias.

This paper focused specifically on gender bias in textual applications of machine learning, and found specific linguistic features in written text that reveal gender bias.

Image: the BNC is the British National Corpus (Source: Gender Bias in Artificial Intelligence)

Previous research found that:

  • Women’s occupations were often modified by their gender (e.g., “female lawyer”), emphasizing their roles as unexpected and deviating from societal norms.
  • Women were described as girls (in order to portray them as weak, immature, or dependent) more often than men were described as boys.
  • Women were described more often in terms of their relationship to others.
  • Men were described more in terms of their behavior, while women were described in terms of their appearance and sexuality.
  • Women were mentioned less often than men in texts.

In order to prevent this bias, the paper suggested incorporating a computational and linguistic understanding of gender theory during the ‘feature extraction’ stage of machine learning¹⁸, as well as cultivating more diverse developer teams in machine learning.
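
As a taste of what such feature-level checks could look like, here’s a very rough sketch that flags one of the markers listed above–occupations modified by gender. A real system would use proper NLP tooling; this regex version is purely illustrative, and the word lists are invented:

```python
# A very rough check for one marker above: occupations modified by
# gender ("female lawyer", "male nurse"). The word lists are invented;
# a real system would use proper NLP tooling, not a regex.
import re

GENDERED_OCCUPATION = re.compile(
    r"\b(female|male|woman|lady)\s+(lawyer|doctor|engineer|nurse|pilot)\b",
    re.IGNORECASE,
)

def flag_gendered_modifiers(text):
    return [match.group(0) for match in GENDERED_OCCUPATION.finditer(text)]

print(flag_gendered_modifiers(
    "The female lawyer met a doctor and a male engineer."
))  # -> ['female lawyer', 'male engineer']
```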


💥 What Can I Do?

Whether you’re a full-fledged computer scientist or not, you can do something to counteract bias in artificial intelligence.

For starters¹⁶:

  1. Think more critically about the answers you get from AI algorithms. Instead of accepting them at face value, understand why the answers were given.
  2. On the flip side of that, write algorithms that are more transparent! Bias can’t be removed from algorithms if it’s hidden under hundreds of obscure layers.
  3. Check for discrimination by examining protected classes such as race, gender, etc. If your algorithm has trouble analyzing data from these classes or returns outputs that are offensive, then something needs to change (see the sketch after this list).
  4. Ensure that your algorithms are trained on diverse data in order to mitigate the chances of said discrimination.
  5. Develop algorithms with a diverse array of people–their voices are critical and their perspectives are essential to counteracting bias.
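
To make item 3 concrete, here’s a minimal sketch that compares a model’s accuracy across a protected attribute; large gaps between groups are a red flag (the predictions, labels, and group memberships are invented):

```python
# A minimal audit for step 3: compare a model's accuracy across a
# protected attribute. The predictions, labels, and group memberships
# below are invented for illustration.
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    hits, totals = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {group: hits[group] / totals[group] for group in totals}

y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(accuracy_by_group(y_true, y_pred, groups))
# {'A': 0.75, 'B': 0.5} <- group B is served noticeably worse
```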

“Who codes matters. How we code matters. Why we code matters.” — Joy Buolamwini¹⁷

💭 Final Thoughts

In the end, we cannot hope to dismantle inequity and discrimination by hiding from them. We must be proactive about having these conversations–honestly, openly, and productively–in order to make any progress.

✍🏽 Main Takeaways:

  1. Discrimination exists. On the Internet, too.
  2. There are three main types of bias in AI: interaction bias, latent bias, and selection bias.
  3. Progress has been made to identify and remove these biases, but it is not enough. Everyone must be more conscious of bias in their own algorithms or the algorithms of others in order to counteract this problem.

Sources:

  1. https://builtin.com/artificial-intelligence/artificial-intelligence-future
  2. https://www.inc.com/jeff-haden/mark-cuban-worlds-first-trillionaire-is-learning-1-skill-discovering-how-to-use-it-in-now-unimaginable-ways.html
  3. https://www.youtube.com/watch?v=59bMh59JQDo
  4. https://www.youtube.com/watch?v=gV0_raKR2UQ
  5. https://www.wired.com/brandlab/2020/05/reeducation-ai-self-learning-approach/
  6. https://towardsdatascience.com/unsupervised-learning-and-data-clustering-eeecb78b422a
  7. https://techcrunch.com/2016/12/10/5-unexpected-sources-of-bias-in-artificial-intelligence/
  8. https://www.theguardian.com/technology/2016/mar/24/tay-microsofts-ai-chatbot-gets-a-crash-course-in-racism-from-twitter
  9. https://www.forbes.com/sites/cognitiveworld/2020/02/07/biased-algorithms/?sh=2213136176fc
  10. https://www.aamc.org/news-insights/celebrating-10-women-medical-pioneers
  11. https://www.sciencedaily.com/releases/2020/02/200214105246.htm
  12. https://www.sciencedaily.com/releases/2020/09/200930144424.htm
  13. https://www.sciencedaily.com/releases/2020/07/200707113229.htm
  14. https://www.sciencedaily.com/releases/2020/10/201001200236.htm
  15. https://doi.org/10.1145/3195570.3195580
  16. https://www.youtube.com/watch?v=gV0_raKR2UQ
  17. https://www.youtube.com/watch?v=UG_X_7g63rY
  18. https://medium.com/@mehulved1503/feature-selection-and-feature-extraction-in-machine-learning-an-overview-57891c595e96

One Last Thing–

I sincerely hope you got value out of this article! For more, check out the other articles on my Medium page. Have more questions or thoughts? Let’s connect on LinkedIn!


Aminah Aliu is a writer and a junior at Princeton University. She enjoys conversation with others, writing poetry, and learning about STEM.