Machine Learning Intern Journal — Algorithmic Bias

As the title indicates, this is the journal of a Machine Learning (ML) intern at the impactIA Foundation. I’ll be attempting to keep a weekly journal of my activities in the Foundation to keep track of my progress and leave a roadmap for the interns who come after me.

Léo de Riedmatten
impactIA
Nov 9, 2020 · 5 min read


Tomorrow I celebrate the end of my third month at the impactIA Foundation, marking the halfway point of my internship. I think I’ve done enough reflecting on the previous months, so I won’t bore you with more. Instead, I would like to write about algorithmic bias. Last week, I took part in an online seminar titled “Race, Surveillance, and Technologies of Resistance”. The premise was the following: “In a world where BIPOC (Black, Indigenous, People of Colour) are disproportionately surveilled by technological means, what hope will they gain by using the same mechanisms for their own agency and well-being?”.

As usual with these blog posts, I’d just like to remind you that these are journal entries meant to track my progress and thoughts throughout my internship and are not meant to be deeply researched and complete articles.

There is a common (albeit false) impression that while humans are error-prone and biased, machines can’t be biased. “They’re just 1s and 0s, how can they be biased?”. Well, the flaw in that reasoning is that even if the machines themselves are (for now) objective, the humans who build them are not. Enter algorithmic bias. The lack of transparency surrounding most algorithms in use today makes it very difficult to peer inside the black box and figure out what is really going on under the hood.

When discussing bias in algorithms, we are usually referring to Machine Learning, and more specifically ‘supervised’ and ‘unsupervised’ learning, where a machine learns to make judgements, or predictions, about the data it is fed based on the patterns it has learnt during training. You may have already noticed a little (not so little) red flag: “based on the patterns it has learnt during training”. That doesn’t sound very objective, does it? What exactly these algorithms learn is still quite mysterious, and a lot of work is going into visualising the actual patterns they spot. How does an algorithm learn patterns? By being fed a lot of data. And that is where the main bias comes from: the data fed to the algorithm during training. All of this might sound a bit abstract, so let’s explore some real-world examples to build a better picture of how data can lead to algorithmic bias.
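Before the real-world examples, here is a deliberately tiny sketch in Python of ‘bias in, bias out’. Everything in it is invented (the groups, the scores, the ‘historical’ hiring decisions); the point is simply that a perfectly standard model, trained on skewed decisions, reproduces the skew without anyone asking it to.

```python
# A made-up illustration of "bias in, bias out".
# Each candidate has a qualification score and a group membership flag.
# The historical labels are skewed: equally qualified group-B candidates
# were hired less often, and the model learns exactly that pattern.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
group_b = rng.integers(0, 2, n)        # 0 = group A, 1 = group B
quality = rng.normal(0, 1, n)          # same distribution for both groups
# Historical decisions: quality matters, but group B is penalised.
hired = (quality - 1.5 * group_b + rng.normal(0, 0.5, n)) > 0

X = np.column_stack([quality, group_b])
model = LogisticRegression().fit(X, hired)

# Two candidates with identical qualifications, differing only in group:
print(model.predict_proba([[1.0, 0], [1.0, 1]])[:, 1])
# The group-B candidate gets a noticeably lower 'hire' probability.
```

Nothing in this code is malicious; the skew was already in the data, and the model simply found it.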

There are so many examples, it’s hard to choose. Aylin Caliskan, a computer scientist, explains that Turkish (one of her native languages) has no gender pronouns. However, when she translates from Turkish to English using Google Translate, it “always ends up as ‘he’s a doctor’”. The original sentence didn’t reveal the gender of the doctor, but the training data the system was exposed to clearly biased the algorithm towards assuming that if you’re a doctor, you’re a man. Another example is Amazon’s résumé-screening tool. The company set out to create a system capable of mechanising the search for top talent by reading résumés. A bias that escaped the researchers was that most of the résumés the company had collected over time came from men. This led to the system learning to discriminate against women. Now, don’t get me wrong, the system itself doesn’t have an abstract notion of what a ‘man’ and a ‘woman’ are, but as I pointed out above, the algorithm’s task is to find patterns, many of which escape human attention. The system learned to penalise résumés that included the word ‘women’, for example in ‘women’s tennis captain’.
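To give a flavour of how that kind of word-level penalty can emerge, here is a toy sketch. To be clear, this is not Amazon’s system; the résumés and labels below are entirely invented. A standard bag-of-words classifier, trained on a handful of ‘historical’ decisions that happen to disfavour one group, ends up assigning a negative weight to the word ‘women’ all on its own.

```python
# A made-up sketch of how a résumé screener can learn to penalise a word
# like "women's" without anyone programming that rule in.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

resumes = [
    "software engineer tennis captain",            # hired in the historical data
    "software engineer chess club president",      # hired
    "data analyst tennis team member",             # hired
    "software engineer women's tennis captain",    # rejected
    "data analyst women's chess club president",   # rejected
]
hired = [1, 1, 1, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(resumes)
model = LogisticRegression().fit(X, hired)

# Inspect what the model learned: the token "women" gets a strongly
# negative weight, because it only ever appears in rejected résumés.
for word, weight in zip(vec.get_feature_names_out(), model.coef_[0]):
    print(f"{word:10s} {weight:+.2f}")
```

Five résumés is obviously a caricature, but the mechanism is the same at scale: the model optimises for matching past decisions, whatever those decisions encoded.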

Are you starting to see the bigger picture? Algorithms are already being used all around us: where we go to school, whether we get a car loan, how much we pay for health insurance. But most of these algorithms are opaque, unregulated and incontestable, and this can cause significant harm. Let’s look at another example: the crime prediction software made by PredPol, a Big Data start-up based in Santa Cruz, California. This program processed historical crime data (red flag) and calculated, hour by hour, where crimes were most likely to occur. Now, this doesn’t sound as bad as the crime-stoppers from Steven Spielberg’s Minority Report, but there are serious flaws in this system. Jeffrey Brantingham, the founder of PredPol, stressed that the model is blind to race and ethnicity. Instead of focusing on individuals, it focuses on geography. And it’s fair to say that if cops spend more time patrolling high-risk zones, they’ll discourage burglars and car thieves, which would be beneficial to the community. However, there’s a big oversight here. Most crimes aren’t as serious as burglary and grand theft auto. By including less-serious crimes such as vagrancy, aggressive panhandling, and selling and consuming small quantities of drugs, the model is skewed. “These nuisance crimes are endemic to many impoverished neighbourhoods” says Cathy O’Neil, a mathematician and researcher. “Once the nuisance data flows into a predictive model, more police are drawn into those neighbourhoods, where they’re more likely to arrest more people. This creates a pernicious feedback loop. The policing itself spawns new data, which justifies more policing.”
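O’Neil’s feedback loop is easy to reproduce in a few lines of code. The simulation below is entirely my own invention (the numbers, the patrol allocation rule and the detection rates are assumptions for illustration, not PredPol’s actual model), but it captures the dynamic she describes: two neighbourhoods with identical underlying behaviour, where a small initial difference in recorded incidents decides who gets patrolled, and patrolling is what generates the records.

```python
# A back-of-the-envelope simulation of the policing feedback loop.
# Two neighbourhoods with identical underlying rates of nuisance offences;
# whichever looks "hotter" in the recorded data gets most of the patrols,
# and more patrols means more incidents get recorded. All numbers invented.
true_offences = {"A": 100, "B": 100}   # identical underlying behaviour
recorded = {"A": 40, "B": 60}          # B starts out looking slightly "hotter"

for week in range(1, 9):
    flagged = max(recorded, key=recorded.get)   # the predicted hot spot
    patrols = {h: (15 if h == flagged else 5) for h in recorded}
    # Recorded incidents scale with patrol presence, not with behaviour:
    recorded = {h: true_offences[h] * 0.03 * patrols[h] for h in recorded}
    print(f"week {week}: patrols={patrols}, recorded={recorded}")

# A modest initial gap hardens into a 3:1 disparity in recorded crime,
# which in turn justifies keeping the patrols exactly where they already are.
```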

This is just one of many ways that BIPOC are disproportionately affected by these biased technologies. There is a notorious issue with face recognition systems: they perform far worse on darker-skinned faces, and on darker-skinned women in particular. The Algorithmic Justice League is an organisation whose mission is to raise awareness about the impacts of AI, equip advocates with empirical research, build the voice and choice of the most impacted communities, and galvanise researchers, policy makers, and industry practitioners to mitigate AI harms and biases. I highly recommend you check them out and familiarise yourself with the incredible work they are doing.

I could keep writing for hours, but I’m going to try to keep this post relatively short and concise. I have barely scratched the surface when it comes to algorithmic bias and its implications. I am disappointed that my university didn’t spend more time discussing this in our course. At the end of the article I will recommend some books that helped me understand this issue, and some I want to read next.

An important note is that a lot of algorithmic bias isn’t intentional. But what is very clear is that a major way of reducing algorithmic bias is diversity. The tech world is unfortunately still largely dominated by white men, who will not necessarily be able to identify their own blind spots. A diverse team drawn from all backgrounds (race, gender and more) makes it far more likely that each person’s blind spots are covered by someone else.

BOOKS/RESOURCES:

I’d like to conclude this post with a quote by O’Neil from her book (that I highly recommend) ‘Weapons of Math Destruction’:

“Data is not going away. Nor are computers — much less mathematics. (…) these models are constructed not just from data but from the choices we make about which data to pay attention to — and which to leave out.”

