Know if your ML model is the Champion of Justice using “Aequitas.”
“The only two times we think about the Fairness of a Machine Learning model are either in an ML class or when someone files a lawsuit against our company”.
Is there even a problem?
As Data Scientists, we often fixate on the accuracy of our model or how much $ our company makes from it but forget how it might impact the people influenced by it.
It is common that the data we feed into our model is biased, but is our model furthering those societal biases? Are we doing anything to balance them out?
And trust me, it happens to the best of us. A recent example is the outrage on Twitter over a possible gender bias in Apple Card’s Credit Limit model.
WAIT! This raises the question: before we try to fix our models, do we even know if they are biased?
AEQUITAS to the rescue!
Aequitas is the Latin concept of Justice, Equality, or Fairness and is the origin of the English word “Equity”.¹
But the Aequitas I am about to introduce you to is the Bias and Fairness Audit toolkit developed by the Center for Data Science & Public Policy at the University of Chicago. Fun fact: the founder of the CDSPP, Rayid Ghani, is currently a distinguished professor of Machine Learning and Public Policy at CMU.
Aequitas is an open-source bias audit toolkit, available both as a Web app and a Python package. It is intended for ML developers, analysts, and policymakers who want to audit ML models for discrimination and bias, helping them make more informed and equitable decisions.²
What I will walk you through here is the extremely easy-to-use Aequitas Webapp, which generates “The Bias Report” from an uploaded CSV file. It is a great No-Code tool: feed it the predicted outputs of your classification model, follow the steps shown below, and Voila! In a few clicks, you get a report that tells you whether your model passes or fails the Bias/Fairness check!
A quick Trip back to the Classroom!
Before we move on, let’s revisit a few terms we encounter in evaluating ML models and will see pop up here again!
Remember the Confusion Matrix? Yes, the one that confuses us every time we are asked to calculate a model metric during a Data Science interview.
From this, we need to recall three terms in the context of a Binary Classifier:
False Positive Rate: FPR = FP / (FP + TN)
False Negative Rate: FNR = FN / (FN + TP)
False Discovery Rate: FDR = FP / (FP + TP)
The denominator of FDR is all the Predicted Positives.
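To make these three rates concrete, here is a minimal sketch that computes them from confusion-matrix counts. The function name and the counts are made up for illustration and are not taken from the demo dataset.

```python
def classification_rates(tp, fp, tn, fn):
    """Return FPR, FNR and FDR from binary confusion-matrix counts."""
    fpr = fp / (fp + tn)  # share of actual negatives predicted positive
    fnr = fn / (fn + tp)  # share of actual positives predicted negative
    fdr = fp / (fp + tp)  # share of predicted positives that are wrong
    return {"fpr": fpr, "fnr": fnr, "fdr": fdr}


# Hypothetical counts, purely for illustration.
rates = classification_rates(tp=40, fp=10, tn=30, fn=20)
print(rates)
```

Note that the sketch does not guard against a zero denominator, which happens when a group has no actual negatives, no actual positives, or no predicted positives.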
How to use Aequitas to get the Bias Report?
For the purpose of the demo, we will be using a MovieLens-like dataset that I am currently working on with a team to develop a production-grade recommendation system.
The sampled dataset has close to 2M ratings (Scale of 0–5) for movies given by users, the user info, and the movie information.
This dataset shows why Bias and Fairness Audits matter even in low-stakes scenarios; later we will touch on why they are essential in high-stakes scenarios as well.
Aequitas Input CSV Requirements
The Aequitas Webapp requires a specific format of the input CSV to build the report. The input CSV needs to be of the form:
Here the label_value column is the True Labels for the input dataset, and the score column refers to the prediction made by our model based on the feature set of attributes_1…n.
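As a rough sketch of that layout, the snippet below writes a tiny CSV with a score column, a label_value column, and a few attribute columns using only Python's standard library. The rows, the attribute names (age, gender, occupation) and the helper name to_aequitas_csv are hypothetical, and the 0/1 check assumes a binary classifier like the one used in this post.

```python
import csv
import io

# Hypothetical rows: model prediction, true label, then the attribute columns.
rows = [
    {"score": 1, "label_value": 1, "age": "18-24", "gender": "F", "occupation": "student"},
    {"score": 0, "label_value": 1, "age": "25-34", "gender": "M", "occupation": "engineer"},
    {"score": 1, "label_value": 0, "age": "35-44", "gender": "M", "occupation": "artist"},
]
fieldnames = ["score", "label_value", "age", "gender", "occupation"]


def to_aequitas_csv(rows, fieldnames):
    """Serialize rows to CSV text after checking score/label_value are 0 or 1."""
    for row in rows:
        if row["score"] not in (0, 1) or row["label_value"] not in (0, 1):
            raise ValueError("score and label_value are assumed to be binary 0/1")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_aequitas_csv(rows, fieldnames)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice you would write the same text to a .csv file and upload that.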
To adhere to the input requirements of the tool, we build a tree-based binary classification model that predicts whether a movie, given its Genre information, would be rated High (binary 1 if rating ≥ 4) or Low (binary 0 if rating < 4), using the User features of Age, Gender (Male/Female), and Occupation. The purpose of this post, however, is to audit the model; hence, we can treat the model as any other Black-box model we might encounter.
For more details on the data, data cleaning, feature engineering, model development, and tuning, feel free to go through the GitHub repo!
Upload CSV and Select Protected Groups
Upload the correctly formatted CSV on the Aequitas Upload page.
Once uploaded, we must choose the Protected Attributes for which we want the tool to evaluate the bias and the reference groups within them that the model should consider as the baseline. For our case, we use the following and click Next!:
Select the Fairness Metrics
Next, we select the Fairness Metrics that we would like to compute and the Disparity Intolerance %, i.e., how close each subpopulation's metric value must be to that of the reference group for the audit to pass. We select the following and then press Generate Fairness Report:
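To build intuition for the Disparity Intolerance setting, here is a minimal sketch of a ratio-based parity check. It assumes that a disparity is the group's metric divided by the reference group's metric, and that a tolerance of 80% means the ratio must fall between 0.8 and 1/0.8 = 1.25. The function names, the inclusive bounds and the numbers are my own illustration, not Aequitas code.

```python
def disparity(group_value, reference_value):
    """Ratio of a group's metric to the reference group's metric."""
    return group_value / reference_value


def passes_parity(group_value, reference_value, tolerance=0.8):
    """Assumed rule: pass if the ratio lies within [tolerance, 1 / tolerance]."""
    ratio = disparity(group_value, reference_value)
    return tolerance <= ratio <= 1 / tolerance


# Hypothetical False Positive Rates, with 0.25 as the reference group's FPR.
close_group_ok = passes_parity(group_value=0.30, reference_value=0.25)
far_group_ok = passes_parity(group_value=0.15, reference_value=0.25)
print(close_group_ok, far_group_ok)
```

Under this assumed rule, a stricter audit corresponds to a tolerance closer to 100%, which narrows the band around the reference group's value.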
For more details on what the Fairness Metrics mean, you can refer to the Fairness Tree.
The Bias Report
Aequitas generates an easily accessible and well-formatted public report. Do not worry about the dataset: it gets deleted (or at least that’s what they claim), and only the report remains in perpetuity.
For our test, the entire Bias report can be viewed here. Let’s first see the summary of how our model did:
Audit Results Summary
Recall the terms we discussed earlier? They appear again here. The summary appears almost at the top of the report, letting us know which Fairness Metrics our model passed and which it failed.
Audit Results: Details by Fairness Measures
Aequitas does a great job at giving us more granularity of what failed and by how much along with a detailed description of what the metric is and when it should matter to us!
Let’s look at a couple of examples of how Aequitas displays the results for when our model passes a test and when it fails:
Audit Results: Bias Metric Values
If we additionally want to know each value for our selected groups and corresponding Bias Metrics, Aequitas provides us with an easily interpretable table:
Audit Results: Group Metric Values
Another aspect of Aequitas that I love is that it simply calculates the Group Size Ratios for all our possible sub-populations. This can easily help us grasp what the population split is in our dataset.
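If you want to sanity-check such a split yourself, a group size ratio can be sketched as each group's row count divided by the total row count. This is my reading of the column, not a quote from the Aequitas source, and the gender values below are made up.

```python
from collections import Counter


def group_size_ratios(values):
    """Map each group to its share of all rows."""
    counts = Counter(values)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


# Hypothetical values of one attribute column, one entry per row.
genders = ["M", "M", "F", "M", "F", "M", "M", "F"]
ratios = group_size_ratios(genders)
print(ratios)
```

A very small ratio for a group is a hint that its metrics are estimated from few rows and should be read with care.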
Why I urge all fellow Data Scientists to use Aequitas!
I hope I was able to show, using a simple Movie Rating Prediction scenario, why Aequitas is a tool that you need to include in your ML/DS pipeline. If you are still not convinced, consider a higher-stakes scenario: Predicting Recidivism. I believe all (or at least most) of us would agree that when predicting the probability that a convicted criminal will re-offend, we should consider features like criminal history, the severity of the crime, etc., and not base the prediction on race. Thus, we would want a model that is free of Bias and is Fair across all Races. Even if the model is highly accurate, I hope that anyone building such a model in the future would run it through Aequitas and, with just a few clicks, either be confident that it is not biased or, if it is, revisit the modeling process.
Does Aequitas give us everything we need?
What if I were to tell you that the model we used in the demonstration above was just a simple model that predicted a high rating for every entry in our test set? Even such a model maintains FP and FN Parity amongst all groups because it never discriminates between them. This showcases a limitation of relying on a tool like Aequitas alone. Though a couple of our Fairness Metrics might pass, Aequitas deals with Bias and Fairness in a silo. It does not comment on or hint at the accuracy and reliability of our model’s performance at all. Hence, one should use Aequitas as a Bias and Fairness Audit of one’s models while also evaluating their accuracy, striking a balance between the two based on one’s own use case.
Another limitation is that the Webapp requires a specifically formatted CSV, and the time taken to generate the report grows with the size of the dataset. The Python package, however, does give you more freedom and Low-Code avenues of analysis, as shown below:
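The always-predict-high case is easy to reproduce in a few lines. The sketch below uses made-up labels and a hypothetical helper, rates_by_group, to show that such a model has the same False Positive Rate and False Negative Rate in every group, while its accuracy simply mirrors each group's share of high ratings.

```python
def rates_by_group(records):
    """records: iterable of (group, score, label_value) with binary 0/1 values."""
    counts = {}
    for group, score, label in records:
        c = counts.setdefault(group, {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
        if score == 1 and label == 1:
            c["tp"] += 1
        elif score == 1 and label == 0:
            c["fp"] += 1
        elif score == 0 and label == 0:
            c["tn"] += 1
        else:
            c["fn"] += 1
    result = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        result[group] = {
            # None marks a rate that is undefined for this group.
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
            "accuracy": (c["tp"] + c["tn"]) / (positives + negatives),
        }
    return result


# Hypothetical true labels per group; the "model" predicts 1 for everyone.
labels = [("F", 1), ("F", 0), ("F", 1), ("M", 1), ("M", 0), ("M", 0), ("M", 1), ("M", 1)]
records = [(group, 1, label) for group, label in labels]
by_group = rates_by_group(records)
print(by_group)
```

Both groups end up with an FPR of 1.0 and an FNR of 0.0, so the ratio between groups is perfectly balanced even though every actual negative is misclassified.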
fpr_disparity_fairness = aqp.plot_fairness_disparity(fdf,
That’s it for now folks!
May the Fairness be with your Model!
Special thanks to Professor Christian Kästner for introducing me to the topic of Bias and Fairness in Machine Learning, pushing me to get my hands dirty on a new tool, and sharing my experience with everyone!