AI is not just learning our biases; it is amplifying them.

After my presentation “Bias: Statistical and Significant” last week at the REWORK Women in AI in Healthcare Dinner in London, I was asked if I could write something on the topic aimed at a less technical audience. I hope this article gives enough technical intuition about the causes of bias in algorithms, while offering an accessible take on how we are inadvertently amplifying existing social and cognitive biases through machine learning, and what we can do to stop it.

What we mean by Bias

When we talk about bias we mean the same thing whatever our discipline. Whether we are talking about cognitive bias, social bias, statistical bias or any other sort, bias is an error that is systematically incorrect in the same direction. As a quick example, consider a recruiter who consistently underestimates the ability of women. The recruiter is not just being unfair; they are actually hurting their own success at hiring good candidates. If the recruiter were an algorithm, the bias would be reducing its accuracy at test time. Bias is not just an ethical issue; it primarily hurts the success of the person or algorithm that contains it.
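For readers who like to see this in numbers, here is a minimal sketch of the difference between random error and bias. Everything in it is an assumption made up for illustration: the two "recruiters", the ability scale from 0 to 10, and the size of the penalty are hypothetical and do not come from any real data.

```python
import random
import statistics


def noisy_recruiter(true_ability, rng):
    # Hypothetical unbiased recruiter: errors are random and go both ways.
    return true_ability + rng.gauss(0, 1.0)


def biased_recruiter(true_ability, rng, penalty=0.5):
    # Hypothetical biased recruiter: same random noise, plus a systematic
    # underestimate (the made-up `penalty`) that always points the same way.
    return true_ability - penalty + rng.gauss(0, 1.0)


def mean_error(estimator, true_abilities, rng):
    # Average of (estimate - truth). Random errors cancel out over many
    # candidates; a systematic error does not.
    return statistics.mean(estimator(a, rng) - a for a in true_abilities)


rng = random.Random(0)
true_abilities = [rng.uniform(0, 10) for _ in range(10_000)]

noise_error = mean_error(noisy_recruiter, true_abilities, rng)
bias_error = mean_error(biased_recruiter, true_abilities, rng)

print(f"average error, noisy but unbiased recruiter: {noise_error:+.2f}")
print(f"average error, biased recruiter:             {bias_error:+.2f}")
```

Over many candidates the first recruiter's average error stays close to zero, while the second recruiter's average error stays close to the penalty. That persistent offset is what "systematically incorrect in the same direction" means.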

Biased algorithms have already made it into production

What feels like an arms race around AI has recently created huge pressure on researchers to publish quickly and on companies to release their products before someone else does. This has left little time to step back and actually analyse the biases. That, along with a focus purely on accuracy against past datasets, has meant that biased algorithms have been scaled up to production. The most famous examples involve race and gender, but of course bias isn’t just about race and gender; those are just some of the easiest places to notice obvious errors and injustices.

How Discriminative Models amplify bias

Discriminative models (in contrast to Generative Models) are the main cause of bias amplification and are also far more common in production. In a different post I will discuss the merits of Generative Models. In a nutshell, Generative Models learn a specific model of the problem and of how its elements interact with each other, which makes the model more interpretable. Discriminative models, on the other hand, are more of a “black box” and learn to answer only specific questions.
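To make "amplification" concrete, here is a toy sketch of one commonly described mechanism: a predictor that always returns the single most likely answer turns a skew in the data into a certainty in its output. The labels and the 66/34 split are invented for illustration, the function names are hypothetical, and the sampling predictor is included only as a point of comparison; it is not a full generative model.

```python
import random
from collections import Counter


def train_label_frequencies(labels):
    # "Training" here is just counting how often each label appears.
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def predict_most_likely(probs):
    # Accuracy-maximising rule: always answer with the most frequent label.
    return max(probs, key=probs.get)


def predict_by_sampling(probs, rng):
    # Comparison rule: answer in proportion to the learned frequencies.
    labels = list(probs)
    weights = [probs[label] for label in labels]
    return rng.choices(labels, weights=weights)[0]


rng = random.Random(0)

# Invented training data: a 66% / 34% skew, with no other information
# available to tell the two cases apart.
training_labels = ["woman"] * 660 + ["man"] * 340
probs = train_label_frequencies(training_labels)

n = 10_000
argmax_preds = [predict_most_likely(probs) for _ in range(n)]
sampled_preds = [predict_by_sampling(probs, rng) for _ in range(n)]

argmax_share = argmax_preds.count("woman") / n
sampled_share = sampled_preds.count("woman") / n

print(f"share of 'woman' in training data:      {probs['woman']:.2f}")
print(f"share of 'woman' in most-likely output: {argmax_share:.2f}")
print(f"share of 'woman' in sampled output:     {sampled_share:.2f}")
```

The most-likely rule gets the best accuracy score on this data, yet its predictions contain only one label: the skew in the data has been amplified rather than merely copied.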

So what can we do about it?

Firstly, we researchers and data scientists should be comfortable with the fact that many datasets are not perfectly unbiased, and that optimising our algorithms to fit a dataset perfectly may not be what we need. There may be logical ways to reduce the bias in the dataset, but if not then, as mentioned in this article already, there are many ways to combat this bias technically in the algorithm. There is no single solution for every model, but there will be many methods to remove or at least reduce bias amplification. One solution that I am particularly interested in is to use Generative (instead of Discriminative) models where possible. They cannot be used on every problem, but they have the advantage of not amplifying bias in the way described earlier. They also increase the interpretability of the model, making it easier to debug any bias in your dataset. This is what we do at Babylon Health, and I would like to discuss (Bayesian) Generative Models further in a separate post.

Co-founder & CEO, myLevels (@laurahdouglas). Currently on Entrepreneur First's 10th London Cohort. Ex AI Researcher, Cambridge Mathematician.
