DeepMind Relies on this Old Statistical Method to Build Fair Machine Learning Models

Causal Bayesian Networks are used to model the influence of fairness attributes in a dataset.

Jesus Rodriguez
Published in DataSeries
6 min read · Oct 22, 2020


Source: http://sitn.hms.harvard.edu/uncategorized/2020/fairness-machine-learning/


One of the arguments regularly made in favor of machine learning systems is that they can arrive at decisions without being vulnerable to human subjectivity. However, that argument is only partially true. While machine learning systems don’t make decisions based on feelings or emotions, they do inherit many human biases through their training datasets. Bias matters because it leads to unfairness. In the last few years, there has been a lot of progress in developing techniques that can mitigate the impact of bias and improve the fairness of machine learning systems. A few months ago, DeepMind published a research paper that proposes using an old statistical technique known as Causal Bayesian Networks (CBNs) to build fairer machine learning systems.

How can we define fairness in the context of machine learning systems? Humans often define fairness in terms of subjective criteria. In the context of machine learning models, fairness can be represented as the relationship between a sensitive attribute (race, gender…) and the output of the model. While directionally correct, that definition is incomplete, because it is impossible to evaluate fairness without considering how the data behind the model was generated. Most fairness definitions express properties of the model output with respect to sensitive information, without considering the relations among the relevant variables underlying the data-generation mechanism. As different relations would require a model to satisfy different properties in order to be…
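To make the point about data-generation mechanisms concrete, here is a minimal Python sketch. It is not taken from the DeepMind paper. It assumes a toy causal Bayesian network with three binary variables: a sensitive attribute `A`, a hypothetical mediator `D` (think of something like the department an applicant chooses), and a model output `Y`. All probabilities are made up for illustration. The sketch builds two networks, one where `A` affects `Y` only through `D` and one where `A` affects `Y` directly, and compares an output-only fairness measure (the gap in positive-outcome rates between groups) across the two.

```python
from itertools import product


def joint(p_a, p_d_given_a, p_y_given_a_d):
    """Enumerate P(A, D, Y) for a binary network with edges A -> D, A -> Y, D -> Y.

    p_a: P(A=1); p_d_given_a[a]: P(D=1 | A=a); p_y_given_a_d[(a, d)]: P(Y=1 | A=a, D=d).
    All numbers passed in below are illustrative assumptions, not real data.
    """
    table = {}
    for a, d, y in product([0, 1], repeat=3):
        pa = p_a if a == 1 else 1 - p_a
        pd = p_d_given_a[a] if d == 1 else 1 - p_d_given_a[a]
        py = p_y_given_a_d[(a, d)] if y == 1 else 1 - p_y_given_a_d[(a, d)]
        table[(a, d, y)] = pa * pd * py
    return table


def prob_y1(table, a, d=None):
    """P(Y=1 | A=a), or P(Y=1 | A=a, D=d) when d is given."""
    rows = [(k, p) for k, p in table.items()
            if k[0] == a and (d is None or k[1] == d)]
    total = sum(p for _, p in rows)
    return sum(p for k, p in rows if k[2] == 1) / total


def parity_gap(table):
    """Output-only measure: P(Y=1 | A=1) - P(Y=1 | A=0)."""
    return prob_y1(table, 1) - prob_y1(table, 0)


def stratum_gap(table, d):
    """Gap between groups among individuals with the same value of D."""
    return prob_y1(table, 1, d) - prob_y1(table, 0, d)


# Scenario 1 (assumed): A influences Y only through the mediator D.
mediated = joint(
    p_a=0.5,
    p_d_given_a={0: 0.2, 1: 0.8},
    p_y_given_a_d={(0, 0): 0.3, (1, 0): 0.3, (0, 1): 0.7, (1, 1): 0.7},
)

# Scenario 2 (assumed): D is unrelated to A, and A influences Y directly.
direct = joint(
    p_a=0.5,
    p_d_given_a={0: 0.5, 1: 0.5},
    p_y_given_a_d={(0, 0): 0.38, (0, 1): 0.38, (1, 0): 0.62, (1, 1): 0.62},
)

for name, table in [("mediated", mediated), ("direct", direct)]:
    print(f"{name}: parity gap = {parity_gap(table):.2f}, "
          f"gap within D=0 = {stratum_gap(table, 0):.2f}, "
          f"gap within D=1 = {stratum_gap(table, 1):.2f}")
```

With these made-up numbers, both scenarios produce the same overall gap of 0.24 between groups, yet in the first one the gap vanishes once `D` is held fixed, while in the second it persists. A measure that looks only at the output and the sensitive attribute cannot tell the two apart, which is the gap in output-only definitions that the paragraph above describes.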

Jesus Rodriguez
DataSeries

CEO of IntoTheBlock, President of Faktory, I write The Sequence Newsletter, Guest lecturer at Columbia University and Wharton, Angel Investor, Author, Speaker.