From Data to Discrimination: How AI is Perpetuating Biases

Hayley Cho
SI 410: Ethics and Information Technology
7 min read · Feb 17, 2023

As Artificial Intelligence (AI) continues to advance at a rapid pace, it has reshaped the way we live and interact with the world around us. From virtual assistants such as Siri and Alexa to chatbots that can provide personalized responses to just about any question we ask, AI has become an integral part of our daily lives. Instead of opening up my phone to check the weather, send a text message, or play music, I’ve learned to simply speak one command and accomplish these tasks automatically.

The ultimate goal of AI is to ease people’s lives and create a more accessible world for everyone. However, as this technology rapidly evolves, so does its capacity to learn from and make decisions based on biased data. Biased data used in AI systems ends up magnifying discrimination against marginalized groups, despite the intended purpose of creating a more equitable society.

The Root Cause of the Issue

In their book Data Feminism, Catherine D’Ignazio and Lauren Klein argue that the key cause of gender and racial bias in our information systems is the data that shapes them. AI systems are designed to learn and make decisions based on the patterns in the data they were trained on. When those patterns are inherited from biased data, the results will inevitably be biased as well.

D’Ignazio and Klein state that “The data that is being used for these products is essentially created by small groups of people, but then the products that function based on this data are scaled up to users around the world”. These small groups of people are nowhere close to representative of the many social, racial, and ethnic groups around the globe. It is difficult for them to imagine life outside of their own experiences, which limits their ability to detect biases in the systems they are designing. Only when a system is exposed to groups of people it was not trained on do the shortcomings of the design process become apparent.

Additionally, this issue of biased data illustrates D’Ignazio and Klein’s concept of the “privilege hazard”: when data teams are composed primarily of people from dominant groups, their perspectives unintentionally shape the decisions being made. Because their demographic background contributes to their social status, their ignorance stems from their self-perceived position at the “top” of society.

Revealing the Faults of AI in Everyday Life

I have witnessed firsthand the struggles and consequences that bias in algorithms can cause. During my software engineering internship at a robotics startup, I was tasked with finding the best solution for implementing a facial recognition system in one of the company’s self-driving robots. This involved extensive research and testing of several facial recognition systems, specifically open source projects and SaaS (Software as a Service) options. After researching which options had the highest reported accuracy, I chose three platforms to train and test on our data. The internship took place in Seoul, South Korea, and the entire team belonged to the same ethnic group, which meant the data I was testing consisted of people of the same racial background, specifically Korean. I found that the software had major difficulty distinguishing among my team members and would often mistake one person for another.

My experience at the internship supports the broader finding that facial recognition software is far less accurate for people of color. This can be seen as a form of racial discrimination, since the software’s inherent biases give one group of people an advantage over another. Although this surely was not intentional, it is a consequence of the biased data these systems were trained on.

A research study called the “Gender Shades” project, which evaluated facial analysis systems across four groups defined by gender and skin type, revealed large discrepancies in classification accuracy. All of the algorithms tested performed worst on darker-skinned females, with error rates up to 34% higher than those for lighter-skinned males (Racial Discrimination in Face Recognition Technology). In the case of my internship, the bias of the facial recognition systems meant the software was not equally accessible to all groups of people. Because the algorithms were trained primarily on lighter-skinned individuals, they struggled to detect people of color.
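To make this concrete, below is a minimal sketch of the kind of disaggregated evaluation the Gender Shades project popularized: instead of reporting one overall accuracy number, error rates are computed separately for each demographic subgroup. The records here are invented for illustration and are not data from the actual study.

```python
from collections import defaultdict

# Invented example records: (subgroup, true label, predicted label).
# A real audit would use a benchmark labeled by skin type and gender.
results = [
    ("darker_female",  "female", "male"),
    ("darker_female",  "female", "female"),
    ("darker_male",    "male",   "male"),
    ("lighter_female", "female", "female"),
    ("lighter_male",   "male",   "male"),
    ("lighter_male",   "male",   "male"),
]

errors, totals = defaultdict(int), defaultdict(int)
for group, true_label, predicted in results:
    totals[group] += 1
    errors[group] += int(true_label != predicted)

# Reporting error rates per subgroup exposes gaps that a single
# overall number would hide.
for group in sorted(totals):
    rate = errors[group] / totals[group]
    print(f"{group:15} error rate: {rate:.0%} (n={totals[group]})")
```

In this toy data the overall error rate looks low, while the per-group breakdown shows the errors are concentrated in a single subgroup, which is exactly the pattern the study documented.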

Another prevalent example of bias and discrimination in AI is hiring algorithms. Amazon built computer models that could scan hundreds of resumes at once and choose the top few candidates to move on to the next round of the hiring process. It was not long before it was revealed that the system was not rating candidates in a gender-neutral way (Amazon scraps secret AI recruiting tool that was biased against women). The system was trained on resumes submitted by prior applicants, most of whom were men due to male dominance in the tech industry. From this data, the system taught itself that male candidates were preferable. It downgraded resumes that contained the word “women’s” or came from candidates who attended women’s colleges, resulting in fewer female candidates being chosen. This created an unfair advantage for men, who would be ranked higher simply because they were male.
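To illustrate how such a system can “teach itself” a preference, here is a toy sketch built on a tiny invented dataset of past hiring outcomes, not anything from Amazon: a scorer that simply favors words associated with historically successful resumes ends up penalizing the word “women’s”, even though it says nothing about ability.

```python
from collections import defaultdict

# Invented historical hiring records: (resume text, 1 = hired, 0 = rejected).
# In this toy history, resumes mentioning "women's" were rejected more often.
historical = [
    ("captain of chess club, software internship", 1),
    ("led robotics team, hackathon winner", 1),
    ("women's coding club president, software internship", 0),
    ("captain of women's debate team, research assistant", 0),
    ("teaching assistant, software internship", 1),
]

hires, counts = defaultdict(int), defaultdict(int)
for text, hired in historical:
    for token in set(text.replace(",", "").split()):
        counts[token] += 1
        hires[token] += hired

# A naive "learned" score for a token is simply its historical hire rate.
hire_rate = {token: hires[token] / counts[token] for token in counts}

print(hire_rate["women's"])   # 0.0   -> the word gets penalized
print(hire_rate["software"])  # ~0.67 -> the word gets rewarded
```

The point of the sketch is not the scoring rule itself but the pipeline: any model optimized to reproduce a biased hiring history will reproduce that history’s bias.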

These types of situations are prevalent not only in hiring, but also in everyday life. Take, for example, someone who is trying to buy a home and applies for a mortgage loan. AI has long been used in credit scoring and loan approval algorithms. Similar to the Amazon resume screening incident, these algorithms are “fed with data from previous years that are essentially based on historical discrimination” (The Secret Bias Hidden in Mortgage Approval Algorithms). This all but guarantees that today’s algorithms will reproduce racial discrimination. The investigation showed that Black applicants were 50–120% more likely to be denied a loan than White applicants with the exact same credit score. Bias exists across the many industries that use AI, and the root cause is the skewed data being fed into these systems.

Graph showing the denial rate for loans based on credit score and race/ethnicity from The Markup
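As a rough illustration of what a disparity of that size means in practice, here is a small sketch with invented counts (not The Markup’s underlying data), comparing denial rates for two groups of applicants in the same credit score band.

```python
# Invented counts for applicants in the same credit score band.
applications = {"Black applicants": 200, "White applicants": 200}
denials = {"Black applicants": 44, "White applicants": 20}

denial_rate = {g: denials[g] / applications[g] for g in applications}
baseline = denial_rate["White applicants"]

# Express each group's denial rate relative to the White applicant baseline.
for group, rate in denial_rate.items():
    relative = (rate / baseline - 1) * 100
    print(f"{group}: {rate:.0%} denied ({relative:+.0f}% vs. White applicants)")
```

With these hypothetical numbers, the Black applicant group is denied at 22% versus 10%, a 120% relative difference, which is the kind of gap the reporting describes at the upper end.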

Creating an Equitable Society Through AI

It is evident from these examples that AI has the potential to sustain and even amplify existing prejudices if it is not carefully monitored. The power and benefit that AI brings can be life changing, but only if it is applied in an equitable manner. Technological advancement carries many advantages; however, the risks that come with it must be addressed before its use cases expand further. The core issue lies not in the technology itself, but in the data used to train these systems. As long as the training data is biased, the decisions the system makes will undoubtedly be biased as well.

It is important to regulate the advancement of AI, especially in high-risk use cases such as education recommendations, credit scoring, employment, and criminal surveillance (The Problem With Biased AIs). This regulation is necessary to protect consumers and to reduce the chance of unintended bias. The article discusses several methods for creating less biased AI, such as educating data scientists and programmers on what ethical and responsible data practices look like. It also mentions the importance of being transparent with customers, which gives insight into how these algorithms are created. One straightforward suggestion for building more equitable AI systems is to train algorithms on diverse and representative datasets (Racial Discrimination in Face Recognition Technology).
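As one hedged example of what “training on representative data” can look like in practice, the sketch below checks how demographic groups are represented in a hypothetical training set and downsamples over-represented groups so that none dominates. The group labels and sizes are invented, and balancing by downsampling is only one of several possible techniques.

```python
import random
from collections import Counter, defaultdict

# Hypothetical training examples: (example id, demographic group label).
dataset = (
    [(f"img_a_{i}", "group_A") for i in range(800)]
    + [(f"img_b_{i}", "group_B") for i in range(150)]
    + [(f"img_c_{i}", "group_C") for i in range(50)]
)

print("Before:", dict(Counter(group for _, group in dataset)))

# Group examples by demographic label, then downsample every group
# to the size of the smallest one.
by_group = defaultdict(list)
for example_id, group in dataset:
    by_group[group].append((example_id, group))

smallest = min(len(examples) for examples in by_group.values())
balanced = [ex for examples in by_group.values()
            for ex in random.sample(examples, smallest)]

print("After: ", dict(Counter(group for _, group in balanced)))
```

Even a simple audit like the “Before” line makes the imbalance visible before any model is trained, which is often where the problem starts.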

By acknowledging and confronting the biases that come with the further development of AI, we are one step closer to creating a more equitable future that benefits all individuals, regardless of skin color or background. It is important that we take these measures to prevent discrimination against any group of people. I hope that collective efforts to confront this issue will lessen the harm it is causing society today. The promise of AI can still be reached, but only if we take the steps to prioritize ethical approaches to its development and deployment.
