Inclusivity in Machine Learning

Saasha Mor
Inclusify by Design
Apr 8, 2021 · 5 min read

Data science, machine learning (ML), and artificial intelligence (AI) have become popular buzzwords in the last few years. The technical world has shifted from programmers writing every piece of technology by hand to machines making many of those decisions for us. Since technology is often perceived to be unbiased, I was interested in exploring the biases that may be present in some ML models.

ML models are very opaque

Machine learning is already being leveraged in domains like recruiting (screening job applicants), banking (credit ratings/loan approvals), judiciary (recidivism risk assessments), welfare (welfare benefit eligibility), journalism (news recommender systems), and many more. Given the scale and impact of these industries, it is crucial that we take legal and technical measures to ensure that these systems operate fairly.

But despite the need for transparency, these models are largely opaque. ML models are highly sophisticated and cannot easily be understood by most people. Moreover, the data used to train many of these models is unknown or cannot be disclosed, and most people do not realize how their personal data is being used. This often happens when websites or applications have lengthy terms and conditions that most people don't read, as the recent controversy surrounding WhatsApp's new privacy policy demonstrated. Additionally, the use of personal data to make predictions gives rise to the possibility of confounding factors. A confounding variable is an unmeasured third variable that influences both the supposed cause and the supposed effect. For example, recidivism rates (the tendency of a convicted criminal to re-offend) are higher among the Black population than among other races, but this is a correlation, not evidence of causation; a third, confounding factor, such as systemic racism in the justice system, may be driving it.
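To make the idea concrete, here is a minimal sketch (my own illustration, with purely made-up variables) of how a single unmeasured confounder can make two otherwise unrelated quantities look strongly correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical confounder, e.g. level of exposure to over-policing.
confounder = rng.normal(size=n)

# Neither of these causes the other; both are driven by the confounder.
prior_arrests = 0.8 * confounder + rng.normal(scale=0.5, size=n)
risk_score = 0.8 * confounder + rng.normal(scale=0.5, size=n)

# They still appear strongly correlated (roughly 0.7), which a model
# could easily mistake for a causal signal.
print(np.corrcoef(prior_arrests, risk_score)[0, 1])
```

If a model only ever sees the two measured variables, it has no way to tell this spurious correlation apart from a real causal relationship.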

You can learn more about how confounding factors can affect models here.

How has this affected society?

Let’s delve into the risk-assessment algorithms used by the justice system. These are used to inform decisions about who can be set free at every stage of the criminal justice system, from assigning bond amounts to even more fundamental decisions about defendants’ freedom. In Arizona, Colorado, Delaware, Kentucky, Louisiana, Oklahoma, Virginia, Washington, and Wisconsin, the results of such assessments are given to judges during criminal sentencing.

ProPublica, a well-known investigative journalism newsroom, set out to analyze COMPAS, a recidivism risk algorithm built by Northpointe, Inc., a consulting and software firm. This is how they evaluated the algorithm:

  • They looked at more than 10,000 criminal defendants in Broward County, Florida.
  • They compared the recidivism risk categories predicted by the COMPAS tool with the rates of recidivism that actually occurred in the two years after defendants were scored.

This is what they found:

  • The score correctly predicted an offender’s recidivism 61 percent of the time, but its predictions of violent recidivism were correct only 20 percent of the time.
  • According to ProPublica’s analysis, Black defendants were often predicted to be at a higher risk of recidivism than they actually were. Black defendants who did not recidivate over a two-year period were nearly twice as likely as their white counterparts to be misclassified as “higher risk” (45% vs. 23%). A rough sketch of this kind of group-wise error check follows this list.
  • You can read more about this analysis here.
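As an illustration only (this is not ProPublica's actual code, and the column names are hypothetical), the core of such an audit can be expressed in a few lines of pandas: among defendants who did not reoffend, compare how often each group was still labeled high risk.

```python
import pandas as pd

# Hypothetical data with illustrative column names:
#   high_risk  - 1 if the tool labeled the defendant medium/high risk
#   reoffended - 1 if the defendant was re-arrested within two years
#   race       - the defendant's recorded race
df = pd.read_csv("risk_scores.csv")  # placeholder file name

# Among defendants who did NOT reoffend, how often was each group
# nevertheless labeled high risk? (a false-positive-style comparison)
did_not_reoffend = df[df["reoffended"] == 0]
print(did_not_reoffend.groupby("race")["high_risk"].mean())
```

A large gap between groups in this table is exactly the kind of disparity ProPublica reported.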

This kind of bias is everywhere. MIT grad student Joy Buolamwini was working with facial analysis software when she noticed a problem: the software didn’t detect her face, because the people who coded the algorithm hadn’t taught it to identify a broad range of skin tones and facial structures. Learn about her experience here. There are several pieces of technology like this that are not inclusive of or representative of the real world; for example, emojis weren’t customizable by skin tone until very recently.

The bottom line is that, whatever its source, the bias in an ML algorithm’s recommendations has a real impact on individuals and groups. Models such as this risk-assessment algorithm make life-changing decisions for many individuals. I implore you to ask: is this a fair system?

Should we steer away from ML models?

In my experience as an Informatics student, learning about these ML models has taught me that the quality of a model’s output is highly dependent on the quality of the input it receives. This phenomenon is often summed up as “garbage in, garbage out.” The problem is that algorithms are trained on data produced by humans. If the data itself is tainted with historic and structural biases, that taint will invariably be carried over into an algorithm’s output.

The good news is that there has been significant progress toward fairness in ML. In fact, Google has published a guide to mitigating such biases, which encourages practices like questioning the source of the data, making algorithms transparent, and building diverse teams, among others. The EU’s General Data Protection Regulation (GDPR) also introduces accountability requirements for models that make automated decisions about people.

I would love for you to experiment with how machines learn to be racist here. I had a lot of fun with this website, entering different keywords to see how different news outlets frame the same search term and how algorithms pick up on this. For example, when I searched for “women”, the model trained on articles from left-leaning outlets returned “bimbos” and “grandiloquent” as synonyms, while the model trained on right-leaning outlets returned “feminists” and “orgasms”.

With these insights, it is my opinion that if ML programmers are educated about and wary of these subtle biases, they can build better and more effective algorithms.

What can I do?

If you are someone who creates such algorithms, you have already taken the first step by being here. It is not easy to detect bias in real-world data, and there is no one-size-fits-all solution. Civic Analytics, a data science and analytics company, conducted a case study that examines the ability of six different fairness metrics to detect unfair bias in predictions generated by models trained on datasets containing known, artificial bias.
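To give a flavour of what such metrics measure, here is a small, self-contained sketch (my own illustration, not Civic Analytics’ code) of two common ones: demographic parity, which compares positive-prediction rates across groups, and equal opportunity, which compares true-positive rates.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Gap in positive-prediction rates between groups (0 = parity)."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equal_opportunity_difference(y_true, y_pred, group):
    """Gap in true-positive rates between groups (0 = parity)."""
    tprs = [y_pred[(group == g) & (y_true == 1)].mean() for g in np.unique(group)]
    return max(tprs) - min(tprs)

# Toy predictions for two groups, "a" and "b".
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 1])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

print(demographic_parity_difference(y_pred, group))         # 0.25
print(equal_opportunity_difference(y_true, y_pred, group))  # ~0.33
```

Note that the two metrics report gaps of different sizes for the same predictions, which is exactly why no single number can certify a model as fair.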

If you are interested in the fairness of models, you can do some research into the models that affect your personal life and judge whether their results are something you would trust.

Some of the best practices that can be employed when building an ML model are:

  • Think about the inputs and the outputs of the model.
  • Use the available fairness metrics wisely; they only help up to a point.
  • Use a diverse team to create models.
  • Always study the data and its sources, and check the predictions (a minimal audit sketch follows this list).
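As a rough sketch of the last point (with illustrative column names, not a real dataset), a basic data and prediction audit can start with a few group-by summaries:

```python
import pandas as pd

# Hypothetical training data with illustrative columns "group" and "label".
train = pd.read_csv("training_data.csv")  # placeholder file name

# 1. Is every group represented, and in what proportion?
print(train["group"].value_counts(normalize=True))

# 2. Do the labels themselves already encode a disparity between groups?
print(train.groupby("group")["label"].mean())

# 3. After training, do the model's predictions widen or narrow that gap?
# train["prediction"] = model.predict(features)  # assumes a fitted model
# print(train.groupby("group")["prediction"].mean())
```

None of these checks proves a model is fair, but they surface the most obvious imbalances before the model ever makes a decision about a real person.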

The University of Washington hosts many talks and presentations on this topic. Feel free to join the UW AI mailing list to get notifications about upcoming events. You can do so by sending an email to uw-ai-request@cs.washington.edu, with the line “subscribe listname” in the body of the message.
