Diving into Machine Learning’s Systemic Bias Problem

Cal Schurman
Jan 18, 2022 · 5 min read



As the use of Machine Learning and Artificial Intelligence spreads throughout many industries, it's important not to overlook the unconscious biases being programmed into these algorithms. Machine Learning plays an important role in areas such as the Healthcare system, the Criminal Justice system, and Finance, just to name a few. But the systemic biases being fed into these models are harmful to our society, specifically to those who belong to marginalized groups based on their gender, race, ethnicity, and socioeconomic status.

What are unconscious biases?

Unconscious biases, also known as implicit biases, are prejudices or social stereotypes that occur subconsciously against certain groups of people. Researchers suggest that these biases are ingrained and stem from the human tendency to categorize.

Evolutionary behavior gives humans unconscious biases, whether we notice them or not, and these thought patterns end up programmed into our Machine Learning models.


Where does this bias stem from?

Machine Learning is a component of the data science field that focuses on programming models to imitate the way humans carry out tasks. Machine Learning uses algorithms that are trained on relevant datasets to generate insights or predictions about certain problems, which are then used for decision making.

Machine Learning models generally follow a specific workflow involving data collection, data preparation, model training, and testing. Depending on how well the model scores, improvements are made and the model is re-tested until an optimized version has been created.
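
To make that workflow concrete, here is a minimal sketch using scikit-learn. The dataset is synthetic and purely illustrative, a stand-in for whatever real data a project would actually collect:

```python
# A minimal sketch of the workflow described above, using scikit-learn
# and a synthetic dataset (all names and numbers here are illustrative).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data collection (here: synthetic data standing in for a real dataset)
X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)

# 2. Data preparation: split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Model training
model = LogisticRegression(max_iter=1_000)
model.fit(X_train, y_train)

# 4. Testing / evaluation; in practice steps 2 and 3 are repeated
#    until the score is acceptable
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```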

Prejudices in our models stem from the datasets used to train them. If our data is inherently biased, then the algorithms we train on that data are going to be biased. As a result, our decision making based on those models will be biased, all without our awareness.


Let's dive deeper! As you can see from the Machine Learning workflow above, the first step is to gather data. Say we gather data that has some inherent biases, such as the relationship between mortgage loan approval and race (you can read more about this issue from this article on Forbes). Those biases now have the potential to be exacerbated during the cleaning and manipulating step.

Now we move on to training the model and testing our data. If we feed biased data into our model and train it to inherit those biases, then as we try to improve the model it will keep re-training on those biases and exacerbate them even further. So what started out as unconscious micro-biases has now been inherited and trained even deeper into our Machine Learning models, resulting in 80% of Black applicants' mortgage loans being rejected.
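
To see how that feedback can play out, here is a toy sketch, not real lending data: the column names, groups, and rates are hypothetical. Historical approvals encode a racial gap, and a model trained on those labels reproduces the gap through a correlated proxy feature even though the protected attribute itself is never fed in:

```python
# Toy illustration: a model trained on biased historical decisions
# reproduces them. All columns, groups, and rates here are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "income": rng.normal(60_000, 15_000, n),
    "group": rng.choice(["A", "B"], n),  # stand-in for a protected attribute
})
# A proxy feature (think zip code) that correlates with group membership.
df["zip_flag"] = np.where(df["group"] == "B",
                          rng.binomial(1, 0.8, n),
                          rng.binomial(1, 0.2, n))
# Historical approvals encode bias: group B is approved far less often
# at the same income level.
qualified = (df["income"] > 55_000).astype(int)
df["approved"] = np.where(df["group"] == "B",
                          qualified * rng.binomial(1, 0.4, n),
                          qualified * rng.binomial(1, 0.9, n))

# The model never sees "group" directly, yet learns the gap via the proxy.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(df[["income", "zip_flag"]], df["approved"])
df["predicted"] = model.predict(df[["income", "zip_flag"]])

# Approval rates by group: group B's predicted rate stays far below group A's.
print(df.groupby("group")[["approved", "predicted"]].mean())
```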

Because White men outnumber every other group in the technology industry, the models we create from our training data reflect the way they perceive the world. Any unconscious biases within our Artificial Intelligence are a representation of our own personal biases.


What are some more examples of these biases?

The Healthcare system and the Criminal Justice system are two areas that suffer greatly from implicit biases in algorithms.

The Healthcare system historically suffers from racial, gender, and socioeconomic biases. Black patients have historically been given limited access to pain medication, and doctors have shown underlying prejudices about pain tolerance based on race. Similarly, medical professionals have commonly been shown to dismiss women in pain and have been reluctant to treat them, which has produced unreliable data. Research also shows that doctors tend to avoid providing treatments for low-income patients based on affordability, which has also led to unreliable and inequitable data. The accumulation of these underlying biases within the data is what causes Machine Learning biases within the Healthcare system.

One major example of bias within the Criminal Justice system is the use of Risk Assessment Algorithms, which estimate the likelihood of a defendant re-offending and inform the sentence the defendant receives. But these algorithms are trained on historical crime data, which is extremely prejudiced because that data reflects populations that have been disproportionately targeted by law enforcement.

These are just a few examples of underlying biases within our Artificial Intelligence — there are many more examples throughout a wide range of industries.


What are ways in which we can mitigate these biases in our data?

First and foremost, we have to be intentional about gathering data. We have to start by asking ourselves: Are there any imbalances in our data? Where is this data coming from and who gathered it? Are we doing anything to make sure there are no prejudices in our data before training our models?
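
As a sketch of what that kind of pre-training audit could look like, here are a few checks in pandas. The file and column names below ("applications.csv", "race", "approved") are hypothetical placeholders; substitute whatever your dataset actually uses:

```python
# A pre-training audit sketch: is each group well represented, and do
# historical outcomes already differ sharply by group?
import pandas as pd

df = pd.read_csv("applications.csv")

# Are there imbalances in how well each group is represented?
print(df["race"].value_counts(normalize=True))

# Do historical outcomes already differ sharply by group?
rates = df.groupby("race")["approved"].mean()
print(rates)

# A crude disparate-impact check: ratio of each group's approval rate
# to the highest group's rate (values well below 0.8 are a warning sign).
print(rates / rates.max())
```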

It is also extremely important to have a diverse team of people involved in all aspects of processing data and training models, so we can reduce our chances of creating a biased system. Additionally, we have to strike a balance between having inclusive and diverse data and making sure that data is still an accurate, unskewed representation of our population.
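
One possible way, among many, to act on that balance is to reweight training examples so under-represented groups are not drowned out, without discarding or distorting any data. Again, the file and column names are hypothetical placeholders:

```python
# Reweighting sketch: each row is weighted inversely to its group's
# frequency, so every group contributes equally to the training loss.
# "applications.csv", "race", "income", "loan_amount", and "approved"
# are hypothetical names.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("applications.csv")
X, y = df[["income", "loan_amount"]], df["approved"]

group_freq = df["race"].map(df["race"].value_counts(normalize=True))
sample_weight = 1.0 / group_freq

model = LogisticRegression(max_iter=1_000)
model.fit(X, y, sample_weight=sample_weight)
```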

Not all biases are bad though…

Humans are not all intrinsically the same, so some forms of bias are helpful to keep. Let's take Autism for example. People who identify as male are diagnosed with Autism at four times the rate of those who identify as female. This isn't because men are more likely to have Autism, but because the symptoms present differently in women than they do in men. Therefore, differences in how Autism symptoms present in women versus how they present in men should remain a part of algorithmic decision making.
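
A small sketch of why that matters in practice: a single overall score can hide the fact that a model misses most positive cases in one group, which is exactly what happens when differences in how symptoms present are ignored. The tiny dataset below is a made-up placeholder, not clinical data:

```python
# Group-aware evaluation sketch: overall recall looks fine, but recall
# computed per group reveals that most female cases are being missed.
# The "sex", "actual", and "pred" values are illustrative placeholders.
import pandas as pd
from sklearn.metrics import recall_score

results = pd.DataFrame({
    "sex":    ["M"] * 6 + ["F"] * 6,
    "actual": [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "pred":   [1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0],  # most female cases missed
})

print("Overall recall:", recall_score(results["actual"], results["pred"]))
for sex, grp in results.groupby("sex"):
    print(sex, "recall:", recall_score(grp["actual"], grp["pred"]))
```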

It is important not to eradicate all biases within our Artificial Intelligence, but to determine which to keep and to train with intentionality. Knowing which biases are harmful and which are helpful is a necessary step in our efforts toward equitable Artificial Intelligence.
