Visualizing ML model bias with dalex

Jakub Wiśniewski
Published in ResponsibleML
5 min read · Jan 12, 2021

Using the Python packages dalex and fairtorch

In this blog, I will show you how to easily visualize model bias with the Python package dalex. To do that, I will be using the COMPAS dataset, which contains criminal history, jail and prison time, demographics, and COMPAS risk scores for defendants from Broward County. As you can probably guess, this data is biased against Black people. The model that we train on this data will inherit that problem and will be discriminatory. However, bias detection will not be the topic of this blog, because I have already written about it here. We will be comparing 2 neural network models created with PyTorch. One of them will be trained with the amazing package fairtorch, which enables training neural networks that minimize both the cost and the bias. There is a lot of code involved, so if you are interested, you can find the Jupyter notebook with the code here. Let's get to work!
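To give a flavour of what fairtorch-style training does, here is a minimal conceptual sketch of a training step that adds a demographic-parity-style penalty to the usual loss. This is not the fairtorch API (the package wraps such constraint losses for you); the network shape, the penalty weight, and all variable names are assumptions made for illustration.

import torch
import torch.nn as nn

# Toy binary classifier; the input size of 8 is an arbitrary assumption.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
bce = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
alpha = 1.0  # weight of the fairness penalty

def training_step(X, y, group):
    # X: float tensor of features, y: float tensor of 0/1 labels,
    # group: 0/1 tensor marking membership in the protected subgroup.
    logits = model(X).squeeze(1)
    probs = torch.sigmoid(logits)
    # Demographic-parity-style penalty: the gap in the mean predicted
    # probability between the two subgroups.
    penalty = (probs[group == 1].mean() - probs[group == 0].mean()).abs()
    loss = bce(logits, y) + alpha * penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()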

Using plots to determine if an ML model is fair

After making two models, called pytorch_before (a default neural net) and pytorch_after (a "fair" neural net trained with fairtorch), we use the dalex package to obtain an Explainer object for each of them. Calling the model_fairness() method then gives us an object in which many fairness metrics are already calculated and ready to be visualized!
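For context, building those objects looks roughly like the sketch below. The prediction wrapper, the variable names, and the choice of protected attribute (here the sex column, with "Male" as the privileged level) are assumptions made for illustration; the exact setup is in the linked notebook.

import dalex as dx
import torch

# A small wrapper so the torch models return probabilities as a flat
# numpy array, which is what dalex expects from a binary classifier.
def predict(model, data):
    tensor = torch.tensor(data.values, dtype=torch.float32)
    with torch.no_grad():
        return torch.sigmoid(model(tensor)).numpy().ravel()

exp_before = dx.Explainer(pytorch_before, X_test, y_test,
                          predict_function=predict, label="pytorch_before")
exp_after = dx.Explainer(pytorch_after, X_test, y_test,
                         predict_function=predict, label="pytorch_after")

# protected: array with the protected attribute of each defendant,
# privileged: the subgroup used as the reference level (both assumed here)
mf_before = exp_before.model_fairness(protected=sex_test, privileged="Male")
mf_after = exp_after.model_fairness(protected=sex_test, privileged="Male")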

mf_after.plot([mf_before])

As you can see, pytorch_before exceeds the default boundaries and therefore cannot be deemed fair. pytorch_after, however, does a very good job of minimizing the fairness metrics, as it fits right into the green (fair) zone. More about this plot can be found in the previous blog.

Using plots to visualize bias

Parity loss

To show how big the bias difference really is, we need something that reduces the bias over many subgroups (for example races, genders, nationalities, etc.) to a single number, where 0 denotes no bias and the bigger the value gets, the more serious the disparities are. We call this value parity_loss.

parity_loss(M) = Σᵢ | ln( Mᵢ / M_privileged ) |

parity loss for metric M, summed over the subgroups i ∈ {a, b, …}

The formula for parity_loss may seem complex at first, but the intuition behind it is simple: the bigger the ratio of some metric M between a subgroup (a, b, …) and the privileged group, the worse. With this knowledge, we are ready to look at the discriminatory models from different perspectives.
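To make the definition concrete, here is a minimal sketch of the idea (not the exact dalex internals); the true positive rates per subgroup below are made up for illustration.

import numpy as np

def parity_loss(metric_per_subgroup, privileged):
    # Sum of absolute log-ratios of a metric against the privileged subgroup.
    reference = metric_per_subgroup[privileged]
    return sum(
        abs(np.log(value / reference))
        for subgroup, value in metric_per_subgroup.items()
        if subgroup != privileged
    )

# Hypothetical true positive rates per subgroup.
tpr = {"Male": 0.70, "Female": 0.62}
print(parity_loss(tpr, privileged="Male"))  # 0 would mean no disparity at all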

Stacked Parity Loss Plot

The idea behind the plot below is to show all the metrics cumulated (stacked), so that the overall bias hidden in the models can be compared.

mf_after.plot([mf_before], type="stacked")

As we can see, pytorch_after carries only a little bias compared to its predecessor.

Fairness Radar

The next plot is a bit more conservative in how it presents the metrics. For each metric, we get a scale on which the models are "spread". Here I took only 3 metrics to compare instead of the default 5: FPR (false positive rate), TPR (true positive rate), and STP (statistical parity).

mf_after.plot([mf_before], type="radar", metrics=["FPR", "TPR", "STP"])

Fairness heatmap

A great way to compare lots of metrics (and models!) is to use a heatmap. The idea here is the same as before: the higher the parity loss of a metric, the more biased the model.

mf_after.plot([mf_before], type="heatmap")

Here we can see all the metrics available in the package. As expected, the model without constraints on the fairness metrics (pytorch_before) has bigger parity_loss values.

Performance and Fairness

There is a known phenomenon: when an ML engineer tries to lower the bias according to certain metrics, the performance also drops! To show how big the problem is, we will use the plot below. Please note that we use the FPR metric, which is one of the most important ones for this dataset.

mf_after.plot([mf_before], type="performance_and_fairness", fairness_metric="FPR", performance_metric="auc")

The problem does not seem to be big here! The AUC dropped by only around 0.01, which, considering the big decrease in FPR parity loss, is a great tradeoff. The plot is constructed so that the best models (considering both the performance and the fairness metric) end up in the top right corner.

Ceteris Paribus Cutoff

Using this kind of plot is somewhat controversial. The idea behind it is that we choose one subgroup (in our case Female) and change its cutoff, so that people from this subgroup are assigned to a class based on a different probability threshold (the default is, of course, 0.5). I said it is controversial because it might not be considered fair to have 2 different thresholds for different subgroups; individual fairness is violated here. As stated in Dwork et al. (2012), the idea is simple:

Treating similar individuals similarly.

We break that rule when we set the cutoffs to different values. But it can be argued that, under the right circumstances, it can be a good and valuable idea.

mf_before.plot([mf_after], type="ceteris_paribus_cutoff", subgroup="Female")

For each model, we check what would happen if we changed the cutoff for Female to a different value. As we can see, to minimize the parity loss of all metrics, the cutoff for this subgroup should be set to 0.39 in pytorch_before. In pytorch_after, however, we don't have to do anything, because the optimal cutoff matches the default one, which is 0.5.
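If we did want to apply such a subgroup-specific threshold, model_fairness() takes a cutoff argument, which, as far as I remember, can also be given per subgroup as a dict (treat this as an assumption and check the dalex documentation); a sketch:

# Recompute the fairness object for pytorch_before, lowering the cutoff
# only for the Female subgroup (assumed dict form of the cutoff argument).
mf_before_adjusted = exp_before.model_fairness(
    protected=sex_test,
    privileged="Male",
    cutoff={"Female": 0.39},
    label="pytorch_before_cutoff",
)
mf_before_adjusted.fairness_check()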

Summary

The fairness module in dalex is a great way to visualize and detect bias. You can install dalex with:

pip install dalex -U

If you want to learn more about fairness, I really recommend:


My name is Jakub Wiśniewski. I am a data science student and a research software engineer at MI2 DataLab.