Technical Solutions to Reduce Algorithmic Bias: The Variational Fair Autoencoder

One of our core values is to Love People. We mean this in every sense, as we shared in this post, but the sense most pertinent to our machine learning team is building models ethically, considering how their output may impact people’s lives.

This image has a fascinating history behind it that speaks to different perspectives on equality, equity, and fairness

We work with large enterprises and apply machine learning to consumer behavioural data to help our clients increase the overall lifetime value of their customers, i.e., to guide them to the next best action or product to create stronger relationships with the business. There are different attributes and behaviours that we could consider when building our models and some of them are socially sensitive: for many use cases, we don’t want things like gender, ethnicity, or marital status to influence our predictions.

As much as we Love People, we also love cake (did anybody say red velvet?). And we really love having our cake and eating it too. When it comes to managing potential sources of bias in models, Variational Fair Autoencoders (VFAE) allow us to do just that (Louizos et al. 2016).


Generally, machine learning (ML) models take a set of inputs and are trained to learn how to predict a target. Inputs could be images, in which case the target could be the classification of objects in the images. Inputs could be text, with the target being corresponding text in a different language, if we were interested in creating a model that would translate for us. And the list goes on. ML models are very flexible in what inputs they accept and what targets they can be trained to predict.

Autoencoders are a unique type of ML model in which the input and output are very similar to one another, and in some cases identical. Autoencoders typically have two channels. The first performs an “encoding” of the input into a latent space, a new mathematical representation of the original data. This “encoded” data then feeds into a “decoding” channel, which attempts to transform the latent representation back into the original space. Why would we do this? One powerful application is helping an algorithm work better on real-world data: we inject noise into the input and train the autoencoder to reconstruct the “clean” dataset, which yields a model that is robust to messier, and hence more realistic, data. Another application is reducing the dimensionality of the original dataset. The more features the input data includes, the more resources an ML model needs to train. By encoding the data into a lower-dimensional latent space, we can save on computational resources while still retaining the most valuable components of the data.
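To make the two channels concrete, here is a minimal sketch of an untrained denoising autoencoder in NumPy. The layer sizes, the tanh activation and all function names are assumptions chosen for illustration; this is not the network used in this post.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_autoencoder(n_features, n_latent):
    """Randomly initialise a one-layer encoder and decoder (untrained)."""
    return {
        "W_enc": rng.normal(scale=0.1, size=(n_features, n_latent)),
        "W_dec": rng.normal(scale=0.1, size=(n_latent, n_features)),
    }

def encode(params, x):
    # Encoding channel: map the input into the lower-dimensional latent space.
    return np.tanh(x @ params["W_enc"])

def decode(params, z):
    # Decoding channel: map the latent representation back to the original space.
    return z @ params["W_dec"]

def reconstruction_loss(params, x_noisy, x_clean):
    # Denoising objective: encode the noisy input, compare with the clean data.
    x_hat = decode(params, encode(params, x_noisy))
    return float(np.mean((x_hat - x_clean) ** 2))

# Toy data (hypothetical): 5 rows with 20 features, compressed to 4 latent dimensions.
x_clean = rng.normal(size=(5, 20))
x_noisy = x_clean + rng.normal(scale=0.1, size=x_clean.shape)
params = init_autoencoder(n_features=20, n_latent=4)
z = encode(params, x_noisy)
loss = reconstruction_loss(params, x_noisy, x_clean)
```

Training would adjust the two weight matrices to minimise this loss. The latent array z is what a downstream model would consume in place of the original columns.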

The word variational refers to Bayesian techniques for approximating the posterior, i.e. the probability distribution of the output the model is trying to predict. In practice, this means we make assumptions about the probability distributions of our target and latent representation, so that instead of attempting to predict the exact probability distributions, we aim only for an approximation. This results in significant computational simplification, with a minor reduction in ultimate precision (so long as the assumed probability distributions are reasonable).
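As an illustration of what such an assumption buys us, the sketch below assumes a Gaussian latent representation with a standard normal prior, a common choice in variational autoencoders, though this post does not state which distributions were used. Under that assumption, sampling and the penalty for deviating from the prior both have simple closed forms. The function names are hypothetical.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Draw z ~ N(mu, exp(log_var)) using the reparameterisation trick."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, I) ), summed over the latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
mu = np.zeros((3, 4))        # assumed encoder output: means
log_var = np.zeros((3, 4))   # assumed encoder output: log-variances
z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)  # zero when the approximation equals the prior
```

The KL term is what keeps the approximate distribution close to the assumed one during training.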

Most interesting, and novel, is the fair component. In VFAEs, we explicitly tell the model which properties of the data we do not want influencing our predictions. The latent version of a dataset generated by this model retains much of the valuable detail of the original dataset, but with a reduced influence of the sensitive information. We can then train models on this latent representation, and obtain good predictions that are less biased by sensitive properties of the people included in the dataset.
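One way Louizos et al. (2016) encourage this is a Maximum Mean Discrepancy (MMD) penalty, which measures how different the latent codes of the protected group look from everyone else's. The sketch below is a plain biased MMD estimate with an RBF kernel on toy data. The kernel choice, the gamma value and the function names are assumptions for illustration, not the exact formulation or code used in this post.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Pairwise RBF kernel exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd_squared(z_protected, z_other, gamma=1.0):
    """Biased estimate of the squared MMD between two sets of latent codes."""
    k_pp = rbf_kernel(z_protected, z_protected, gamma).mean()
    k_oo = rbf_kernel(z_other, z_other, gamma).mean()
    k_po = rbf_kernel(z_protected, z_other, gamma).mean()
    return float(k_pp + k_oo - 2.0 * k_po)

rng = np.random.default_rng(0)
z_same_a = rng.normal(size=(200, 2))            # toy latent codes, group s=1
z_same_b = rng.normal(size=(200, 2))            # toy codes, s=0, same distribution
z_shifted = rng.normal(loc=2.0, size=(200, 2))  # toy codes, s=0, shifted distribution

mmd_similar = mmd_squared(z_same_a, z_same_b)     # small: groups look alike
mmd_different = mmd_squared(z_same_a, z_shifted)  # larger: groups are separable
```

Adding such a term to the training loss pushes the encoder towards latent codes from which the sensitive attribute is harder to recover.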

In our models we will use neural networks to transform the data through the encoding and decoding channels.

Real data with real people

Let’s explore how this might work using the Ohio data from the Stanford Open Policing Project, which aggregates anonymized records of police stops, both pedestrian and traffic, in the state of Ohio. This dataset includes sensitive information, such as the race of the people involved, which is something we definitely do not want to influence our model’s predictions. We will use whether the person was arrested as the target variable.

(As a side note, we believe a use case like predicting if someone should be arrested shouldn’t be fully automated by an algorithm; this kind of sensitive analysis requires human judgment (at least for the foreseeable future)).

In a perfect scenario, we would train a model that has both high precision and high recall. Precision is the fraction of people the model flags for arrest who were actually arrested, whereas recall is the fraction of people who were actually arrested that the model manages to flag. In our test case, high precision means predicting that a person should be arrested only if the model is very confident. High recall, on the other hand, means predicting as many of the arrests as possible, so as not to miss any. There is a tradeoff between the two. Given the sensitivity of the task, we choose to optimize for precision in correctly classifying those who should be arrested. This comes at the expense of missing people, but we are OK with that.
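The two metrics can be computed directly from the counts of true positives, false positives and false negatives. The following is a small sketch on made-up labels; the function name and the toy data are illustrative only.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 means 'arrested'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or nothing is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy example: 4 actual arrests, the model flags 2 people and is right about 1.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)  # 0.5 and 0.25
```

A cautious model like this one flags few people, so it misses most arrests (low recall) in exchange for fewer false accusations.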

This dataset is highly imbalanced, with only about 0.6% of cases being arrests. Furthermore, it does not include many features (26 overall, with 16 being either of no use or redundant). Another challenge is that once the data is processed, which will be described below, the number of columns significantly expands due to categorical features being transformed into dummy variables. This results in a very sparse dataframe.
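To see why dummy variables widen the dataframe and make it sparse, consider this small pure-Python sketch. The rows and the is_arrested column name are made up for illustration; a real pipeline would more likely use a library routine for this step.

```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 dummy column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded

# Hypothetical rows: one categorical feature plus the target.
rows = [
    {"driver_race": "Black", "is_arrested": 0},
    {"driver_race": "White", "is_arrested": 0},
    {"driver_race": "Hispanic", "is_arrested": 1},
]
encoded = one_hot(rows, "driver_race")
# Each row now has three dummy columns, exactly one of which is 1.
```

One column with three categories becomes three columns that are mostly zeros, and the effect compounds with every categorical feature.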

The first thing to do is to clean up the features. The following table lists all the features in the raw dataset, whether the feature will be kept, and reason for doing so:

Let’s clean and prepare the data

For more details on the cleaning and processing of the data, please take a look at the source code, which can be found here.

Here we will mention only the violations column, which we found interesting. It includes lists of strings, each denoting a violation (allegedly) committed by the person. For model simplicity we want to focus on only one, and we want to make sure it is the most severe one. Overall there are 14 unique violations included in the dataset, ordered here with #1 being the worst (which is a subjective definition):

  1. dui
  2. speeding
  3. stop sign/light
  4. license
  5. cell phone
  6. paperwork
  7. registration/plates
  8. safe movement
  9. seat belt
  10. equipment
  11. lights
  12. truck
  13. other
  14. other (non-mapped)

We replace the violations column with an integer column representing the worst violation attributed to the person.
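The replacement step can be sketched as follows, using the ordering listed above. The function name and the handling of unknown or empty entries are assumptions; the actual processing lives in the source code mentioned earlier.

```python
# Severity order as listed above: rank 1 is the most severe.
SEVERITY_ORDER = [
    "dui", "speeding", "stop sign/light", "license", "cell phone",
    "paperwork", "registration/plates", "safe movement", "seat belt",
    "equipment", "lights", "truck", "other", "other (non-mapped)",
]
SEVERITY_RANK = {name: rank for rank, name in enumerate(SEVERITY_ORDER, start=1)}

def worst_violation(violations):
    """Return the rank (1 = most severe) of the worst violation in the list."""
    # Assumption: unknown labels and empty lists count as the least severe rank.
    fallback = len(SEVERITY_ORDER)
    ranks = (SEVERITY_RANK.get(v.strip().lower(), fallback) for v in violations)
    return min(ranks, default=fallback)

example_rank = worst_violation(["lights", "speeding", "seat belt"])  # speeding -> 2
```

Because lower numbers mean more severe violations, taking the minimum rank picks out the worst one.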

The result is a data frame with 82 columns, which is now ready to be analyzed using a machine learning algorithm.


We will use a Random Forest classifier. After training it on the dataset, we get the following results:

Precision = 0.55
Recall = 0.13

However, a deeper look reveals an issue. Consider the distribution of the races of the people in the dataset:

We can see that after modelling, the distribution of the groups of people changes. In particular, the share of African Americans increases significantly among those who are predicted to be arrested, whereas the share of Caucasians decreases. As we mentioned, we need to define a feature to use in our VFAE as the sensitive feature. For simplicity we will choose just one, even though VFAEs do accept multiple, and we will use ‘driver_race_Black’, which is how it is recorded in the dataset, given the significant bias the model shows against this group of people. We can quantify this result by looking at a modified version of the discrimination parameter proposed by Zemel et al. (2013). The original is defined as follows:

This parameter compares the probability our model predicts for arrest (a=1) of people in the protected class (s=1), with the corresponding probability for the non-protected class (s=0, i.e. everyone else). We want the probabilities to be as close to one another as possible, therefore we are looking for a discrimination parameter close to zero. However, there is a challenge with using this parameter in the case of a severely imbalanced dataset, which we find ourselves in. The issue is that the probability of arrests is very small, therefore the difference between two small numbers is likely to be small as well. To remedy this we consider instead the ratio between the two probabilities
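Both quantities can be sketched in a few lines. This assumes the original parameter is the absolute difference between the two predicted arrest rates, as described above, and that the ratio places the protected class in the numerator, so that a value above 1 indicates bias against that group. The function names and the toy numbers are illustrative.

```python
def arrest_rate(predictions, sensitive, group):
    """Fraction of people with s == group who are predicted as arrested (a == 1)."""
    in_group = [a for a, s in zip(predictions, sensitive) if s == group]
    return sum(in_group) / len(in_group)  # assumes the group is not empty

def discrimination_difference(predictions, sensitive):
    # Absolute difference between P(a=1 | s=1) and P(a=1 | s=0).
    return abs(arrest_rate(predictions, sensitive, 1)
               - arrest_rate(predictions, sensitive, 0))

def discrimination_ratio(predictions, sensitive):
    # Ratio P(a=1 | s=1) / P(a=1 | s=0); assumes the denominator is non-zero.
    return arrest_rate(predictions, sensitive, 1) / arrest_rate(predictions, sensitive, 0)

# Toy data: 1000 people per group, 6 vs 2 predicted arrests.
sensitive = [1] * 1000 + [0] * 1000
predictions = [1] * 6 + [0] * 994 + [1] * 2 + [0] * 998

difference = discrimination_difference(predictions, sensitive)  # about 0.004
ratio = discrimination_ratio(predictions, sensitive)            # about 3
```

The difference looks negligible even though one group is flagged three times as often, which is why the ratio is more informative for rare outcomes.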

Ideally, discrimination_ratio would be as close to 1 as possible. We get the following result:

Discrimination_ratio = 3.26

Clearly this is problematic, and this is where VFAE steps in.

Variational Fair Autoencoder time!

By transforming the dataset using a VFAE, which involves injecting information regarding the race of the drivers into the model, we obtain a dataset which contains most of the useful information, but removes the dependence on race. Let us see what happens when we retrain the model.

Precision = 0.54
Recall = 0.15

We see that performance is comparable to before. Looking at the discrimination ratio and the distribution of predicted arrests within the groups, we get:

Discrimination_ratio = 1.86

The plot on the left shows the results we discussed earlier, and the plot on the right shows the results after VFAE is applied.

We can see that the model becomes significantly less biased. Unfortunately, some information regarding race does leak through. There are several possible reasons for this. One could be a strong correlation between various features in the dataset, which makes it hard for the autoencoder to remove the dependence on the sensitive feature. For example, a bias might exist in determining whether people get stopped by a police officer, whether they are searched for contraband, and whether they are ultimately arrested. Also, the limited feature set and the imbalanced nature of the dataset reduce the signal available in the data, leaving the autoencoder less useful information to extract.

These sorts of techniques are very exciting because they allow us to train great models while making sure we are not accounting for information we neither need nor want to impact our predictions.

There are, of course, limitations to these models. For one, we need to explicitly specify what features we consider sensitive. This may not always be obvious. Also, given that we transform the dataset into a latent space, which is essentially abstract, we will have difficulty explaining the reasons for our model results. For example, in the above analysis although we can be confident that race plays a limited role in predicting whether a person should be arrested, we would have trouble saying what is an important factor. This is where techniques such as FairML, for example, can help.

Loving people means doing the best we can to ensure every person affected by our model is treated fairly. Doing so is a complicated task, especially when dealing with complex non-linear models and data that inherently contains bias, but one that we proudly undertake. Members of our team have published about this extensively (e.g. Tyler Schnoebelen’s post on ethics in data, Kathryn Hume’s discussion of bias in AI, and several Slack chats, such as this one and this one). Using VFAE is just one of the tools we will employ to achieve this goal. We are investing engineering effort to ensure our infrastructure is secure, are creating partnerships with the Vector Institute to explore technical solutions with leading researchers in the field, and are making deliberate choices about the projects we accept from clients so we can say with integrity that we are applying AI to make the world a better place.

Yevgeni Kissin is a Data Scientist. Yevgeni has a PhD in Astrophysics from the University of Toronto. In his former life, he spent his time studying the stars, but now he’s all about machine learning. He loves talking about anything space and ethics.