Cognitive Bias in Machine Learning

The High Stakes Game of Digital Discrimination

--

Companies across a wide range of industries use machine learning to do everyday business. From consumer marketing and workforce management to healthcare treatment decisions and public safety and policing, whether you realize it or not, your life is increasingly affected by the outcomes of machine learning algorithms. These algorithms help decide who gets a bonus, who gets a job interview, whether your credit card limit (or interest rate) is raised, and who gets into a clinical trial. They even help decide who gets parole and who languishes in prison.

The result is that people’s lives and livelihoods are affected by the decisions made by machines. And these algorithms have demonstrated that they fall short.

Photo by Timon Studler on Unsplash

Machine learning models are built from training data, and they inherit the assumptions baked into that data. When given new input, a trained model generates values, a step called scoring, or prediction, and those outputs are wholly dependent on the set of training examples the model was given. Without proper attention, cognitive biases that are common in society will inevitably bleed into the results. Training data that doesn’t account for variances in race, sexual orientation or identity, or age can produce outcomes that very negatively affect people’s lives.
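
To make that dependence concrete, here is a minimal sketch, using scikit-learn and a synthetic, deliberately skewed “hiring” dataset invented for illustration, of how a model reproduces the skew in its training data at scoring time:

```python
# A minimal sketch: a classifier can only reflect the data it was trained on.
# The "hiring" data below is synthetic and deliberately skewed for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 1000
group = rng.integers(0, 2, n)        # e.g. a protected attribute, 0 or 1
skill = rng.normal(0, 1, n)          # the signal we actually care about

# Skewed historical labels: group 1 was hired less often at the same skill level.
hired = (skill + rng.normal(0, 0.5, n) - 0.8 * group > 0).astype(int)

X = np.column_stack([skill, group])
model = LogisticRegression().fit(X, hired)

# Scoring (prediction) on two identical candidates who differ only by group:
candidates = np.array([[0.5, 0], [0.5, 1]])
print(model.predict_proba(candidates)[:, 1])  # group 1 scores lower despite equal skill
```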

Here are some examples of ways in which machine learning algorithms fall short and how the companies that built them responded to these findings.

Facial Recognition Tested by ACLU

In a July 2018 report, the American Civil Liberties Union (ACLU) found that Amazon’s facial recognition technology, called “Rekognition,” falsely matched 28 members of Congress with arrest mugshots. The members of Congress who were falsely matched against the mugshot database used in the test included Republicans and Democrats, men and women, and legislators of all ages from all across the country. The false matches were disproportionately people of color, even though people of color make up only 20% of Congress. The ACLU is using this as an opportunity to call on Congress for a moratorium on the use of facial recognition technology in law enforcement.

UPDATE: Amazon says clients should use a confidence threshold of 95% or higher for “facial recognition or law enforcement activities.” But the ACLU points out that Amazon’s website, right now, recommends an 80% confidence threshold for recognizing human faces.
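
To see why the threshold matters, here is a hedged sketch of post-filtering candidate matches at 80% versus 95% confidence; the names and similarity scores are invented, and the dictionary fields are my own, not the response format of any particular API:

```python
# A sketch of how the choice of confidence threshold changes which face matches
# are accepted. The similarity scores below are invented for illustration; real
# systems return a similarity/confidence score per candidate match that is
# filtered the same way.
matches = [
    {"name": "person_a", "similarity": 81.2},
    {"name": "person_b", "similarity": 88.7},
    {"name": "person_c", "similarity": 96.4},
]

def accepted(matches, threshold):
    return [m["name"] for m in matches if m["similarity"] >= threshold]

print(accepted(matches, threshold=80))  # ['person_a', 'person_b', 'person_c']
print(accepted(matches, threshold=95))  # ['person_c'] -- weaker matches no longer survive
```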

Sentiment Analysis

Google’s Cloud Natural Language API was launched in 2016. In the fall of 2017, Andrew Thompson from Motherboard experimented with the product and found disturbing biases in the algorithm. When he fed it the statement “I’m Christian,” the result was positive. But when he entered “I’m a Jew” and “I’m a gay black woman,” the result was negative. Sentiment analyzers are generally trained on news stories and books, so they pick up the same biases found in society.
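
A simple way to probe for this kind of bias is to score templated sentences that differ only in an identity term and compare the results. The sketch below uses a placeholder `score_sentiment` function standing in for whichever sentiment model or API is under test:

```python
# A sketch of a basic bias probe: score sentences that differ only in an
# identity term and compare the results. `score_sentiment` is a placeholder
# for whatever sentiment model or API you are testing.
def score_sentiment(text: str) -> float:
    """Placeholder: return a sentiment score in [-1, 1] from the model under test."""
    raise NotImplementedError("plug in the sentiment service being probed")

template = "I'm {}."
identity_terms = ["Christian", "a Jew", "a gay black woman"]

def probe(terms):
    return {term: score_sentiment(template.format(term)) for term in terms}

# A large spread in scores across these otherwise-identical sentences is a red flag.
# print(probe(identity_terms))
```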

Google’s former head of Artificial Intelligence, John Giannandrea, has said, “It’s important that we be transparent about the training data that we are using, and are looking for hidden biases in it, otherwise we are building biased systems,” but at the time this article was written, Google declined to shed light on how the API is trained.

A Google spokesperson issued the following statement: “We dedicate a lot of efforts to making sure the NLP (Natural Language Processing) API avoids bias, but we don’t always get it right. This is an example of one of those times, and we are sorry. We take this seriously and are working on improving our models. We will correct this specific case, and, more broadly, building more inclusive algorithms is crucial to bringing the benefits of machine learning to everyone.”

ProPublica Study of At-Risk Criminal Data

In courtrooms across the nation, a product called Correctional Offender Management Profiling for Alternative Sanctions, or COMPAS, uses machine learning algorithms to estimate the risk that a defendant will reoffend. That information is then used to determine who can be set free at every stage of the criminal justice system, from assigning bond amounts to fundamental decisions about defendants’ freedom. In 2016, ProPublica studied the program’s output because bias was suspected, and found two disturbing things. First, the formula was particularly likely to falsely flag black defendants as future criminals, mislabeling them this way at almost twice the rate of white defendants. Second, white defendants were mislabeled as low risk more often than black defendants.
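
The core of that analysis can be expressed as a false positive rate comparison across groups. Here is a minimal sketch with invented records (the real study used COMPAS scores and two-year recidivism outcomes):

```python
# A sketch of the kind of audit ProPublica ran: compare false positive rates
# (labeled high risk but did not reoffend) across groups. These records are
# invented; the real analysis used COMPAS scores and two-year recidivism data.
records = [
    # (group, predicted_high_risk, reoffended)
    ("black", True, False), ("black", True, True), ("black", False, False),
    ("white", False, False), ("white", True, True), ("white", False, True),
]

def false_positive_rate(records, group):
    non_reoffenders = [r for r in records if r[0] == group and not r[2]]
    if not non_reoffenders:
        return float("nan")
    return sum(r[1] for r in non_reoffenders) / len(non_reoffenders)

for group in ("black", "white"):
    print(group, false_positive_rate(records, group))
```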

A for-profit company, Northpointe, developed the algorithm that produces these risk scores. The company disputed ProPublica’s analysis, criticizing its methodology and defending the accuracy of its own test: “Northpointe does not agree that the results of your analysis, or the claims being made based upon that analysis, are correct or that they accurately reflect the outcomes from the application of the model.”

The Gender Shades Project

Joy Buolamwini, a PhD student at the MIT Media Lab, completed a research study in early 2018 called the Gender Shades Project, which evaluated facial recognition software from three major companies (IBM, Microsoft, and Face++) and found that all of them demonstrate both skin-type and gender biases. Across all three, the error rates for gender classification were consistently higher for females than for males, and higher for darker-skinned subjects than for lighter-skinned subjects.

The Gender Shades Project pilots an intersectional approach to inclusive product testing for AI. Video produced by Joy Buolamwini and Jimmy Day.

Buolamwini said, “We have entered the age of automation overconfident, yet underprepared. If we fail to make ethical and inclusive artificial intelligence we risk losing gains made in civil rights and gender equity under the guise of machine neutrality.”

While all three companies showed a bias, IBM had the largest gap in accuracy, with a difference of 34 percentage points in error rates between lighter-skinned males and darker-skinned females.
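
An audit in this spirit breaks error rates out by intersectional subgroup and reports the gap between the best- and worst-served groups. Here is a minimal sketch with invented per-image results (the study used its own balanced benchmark and the three commercial APIs):

```python
# A sketch of an intersectional error-rate audit in the spirit of Gender Shades.
# The per-image results below are invented for illustration.
from itertools import product

# (skin type, gender, classified correctly) for each test image
results = [
    ("lighter", "male", True), ("lighter", "male", True),
    ("lighter", "female", True), ("lighter", "female", False),
    ("darker", "male", True), ("darker", "male", False),
    ("darker", "female", False), ("darker", "female", False),
]

def error_rate(results, skin, gender):
    subgroup = [ok for s, g, ok in results if s == skin and g == gender]
    return 1 - sum(subgroup) / len(subgroup)

rates = {(s, g): error_rate(results, s, g)
         for s, g in product(("lighter", "darker"), ("male", "female"))}
for subgroup, rate in rates.items():
    print(subgroup, f"{rate:.0%}")
print("gap:", f"{max(rates.values()) - min(rates.values()):.0%}")
```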

“This is an area where the data sets have a large influence on what happens to the model,” says Ruchir Puri, chief architect of IBM’s Watson artificial-intelligence system. “We have a new model now that we brought out that is much more balanced in terms of accuracy across the benchmark that Joy was looking at. It has a half a million images with balanced types, and we have a different underlying neural network that is much more robust.”

“It takes time for us to do these things,” he adds. “We’ve been working on this roughly eight to nine months. The model isn’t specifically a response to her paper, but we took it upon ourselves to address the questions she had raised directly, including her benchmark. She was bringing up some very important points, and we should look at how our new work stands up to them.”

Real World Consequences

The result of all this negative press about modern machine learning’s shortcomings is that the general public is increasingly uncomfortable with being subject to these outcomes. Many are calling for stricter legislation to ensure that their rights are not infringed upon by biases in machine learning algorithms.

So what is being done to ensure that cognitive bias is accounted for in these algorithms? Many computer science departments are scrambling to introduce required courses on ethics so that future data scientists do a better job of accounting for edge cases in human existence.

Despite the continued shortcomings in the accuracy of artificial intelligence technology, public service organizations still seek it out. Just this month, San Francisco’s BART is looking into installing facial recognition technology on all 4,000 of its security cameras to aid in identifying criminal activity.

If anything can be learned from all of this, it’s that machines should only aid, NOT REPLACE, humans in bias-sensitive decision-making applications.

Is Open Source the Answer?

Transparency around the data that machine learning models are trained on could help identify cognitive bias. Our team at the Center for Open-Source Data and AI Technologies (CODAIT) at IBM is building a credible source of machine learning models for AI developers: the Model Asset eXchange, or MAX. It gives developers a one-stop marketplace for free, community-sourced models built on multiple frameworks, along with the metadata and provenance needed to understand the origin and quality of each model. Our hope is that this project creates transparency around how these models are built and trained, and ignites a community of open source contributions toward solving the big problems in machine learning algorithm building. By building open source models and training data, we increase transparency, providing more opportunity for hidden biases to be uncovered. This doesn’t solve the problem of cognitive bias in machine learning as a whole, but it opens the door to collaboration and innovation in this space.
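
As a hedged example of what using MAX looks like in practice: each model is packaged as a Docker image that serves a REST API, so after starting a container such as `codait/max-object-detector` on port 5000 (image name and port are from the MAX catalog as I recall it; check the specific model’s README), it can be queried with a few lines of Python:

```python
# A sketch of querying a locally running MAX model over its REST API.
# The endpoint path and form field are typical of MAX models, but verify them
# against the specific model's README before relying on them.
import requests

url = "http://localhost:5000/model/predict"
with open("example.jpg", "rb") as f:
    response = requests.post(url, files={"image": f})

response.raise_for_status()
print(response.json())  # predictions plus metadata; exact format varies per model
```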
