Measuring and Mitigating Bias: Introducing Holistic AI’s Open-Source Library
Artificial intelligence (AI) is increasingly present in our lives and becoming a fundamental part of many systems and applications. However, like any technology, it is important to ensure that AI-based solutions are trustworthy and fair. That’s where the Holistic AI library comes in.
The Holistic AI library is an open-source tool that contains metrics and mitigation strategies to make AI systems safer. Currently, the library offers a set of techniques to easily measure and mitigate Bias across numerous tasks and includes graphics to visualise the analysis. In the future, it will be extended to include tools for Efficacy, Robustness, Privacy and Explainability as well. This will allow a comprehensive and holistic assessment of AI systems.
The advantages of using the Holistic AI library include:
- Easy to use: The library is designed to be easy to use, even for those without technical knowledge of AI.
- Open-source: As an open-source library, Holistic AI is accessible to everyone and allows the community to contribute to its development and improvement.
- Improving the reliability of AI systems: By using the library, you can ensure that your AI systems are reliable and fair, which is especially important in critical applications.
- Holistic approach: The library allows for a comprehensive assessment of AI systems, including measures of bias, efficacy, robustness, privacy, and explainability.
In this blog post, we provide an overview of Holistic AI’s Bias analysis and mitigation framework, defining bias and how it can be mitigated before giving an overview of the bias metrics and mitigations available in the Holistic AI library.
What is Bias?
Bias in data can alter our perception of the world, leading to incomplete or inaccurate conclusions. It arises from consistent errors, such as inappropriate sampling or data collection tools, and personal beliefs that influence how we interpret results. To ensure fair and reliable data, it’s crucial to detect and address bias, particularly in decision-making or machine learning.
Why is it important to measure bias?
The advances in the use of artificial intelligence systems bring numerous possibilities and challenges to the field of bias evaluation. End users (governments, companies, consumers, etc.) need confidence that the results generated by this type of technology will not reproduce the prejudices and discriminatory behaviours observed in society at large, since these can be transferred to the data. Through bias metrics, we can measure whether a dataset is unbalanced with respect to a particular race, gender, sexual orientation, religion, age, salary, and so on.
To demonstrate a bit of what can be done to measure bias, let’s do a case study with the UCI Adult dataset. This dataset is widely used in machine learning exercises and is well suited to applying bias metrics. It has categorical features (work class, education, marital status, occupation, relationship, race, sex, and native country) and integer features (age, years of study, capital gain, capital loss, and work hours per week). The prediction task is to determine whether a person makes over 50K a year (a binary classification task).
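For reference, one way to load the dataset is straight from the UCI repository with pandas. This is a minimal sketch, and the column names below are our own choice (following the UCI documentation); adjust the path if you work from a local copy.
import pandas as pd

# Column names following the UCI documentation; the income label is named "class"
columns = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "class",
]

url = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "adult/adult.data"
)
df = pd.read_csv(url, names=columns, skipinitialspace=True)
print(df.shape)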
Two important pieces of information can be observed about this dataset. The first is the imbalance between the number of men and women. The pie chart shows that 67% of observations in the dataset are men and only 33% are women; that is, only one-third of the dataset contains information related to women. This gives us a clear visualisation of the participation of men and women in the dataset.
On the other hand, comparing the age distributions of people who earned more or less than 50K in the year shows that the average age of people who earned more than 50K is around 44, while people who earned less than 50K have an average age of around 36. In other words, for the analysed dataset, people who earn more than 50K are, on average, older than people who earn less than 50K. It is reasonable to expect that older people have higher incomes associated with greater experience.
Figure 1: Percentage of men and women in the dataset and age distribution according to the classification (>50K and <=50K).
It’s worth noting that creating this type of visualisation with the Holistic AI library is super simple. You just need to use the group_pie_plot and histogram_plot functions.
import matplotlib.pyplot as plt
from holisticai.bias.plots import group_pie_plot, histogram_plot

# Group proportions (left) and age distribution per income class (right)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
group_pie_plot(df['sex'], ax=axes[0])
histogram_plot(df['age'], df['class'], ax=axes[1])
plt.tight_layout()
In addition, we can analyse bias in the dataset in a simple and objective way. For example, we can generate results for five bias metrics (you can learn more about bias metrics in our Roadmaps for Risk Mitigation) with just three lines of code, and thus measure whether the predictions made by a machine learning model are biased with respect to gender. Here we use the classification_bias_metrics function, but the Holistic AI library contains functions for various other problems. In this example, a Four Fifths Rule value of less than 0.8 indicates bias in favour of group_a.
import numpy as np
from holisticai.bias.metrics import classification_bias_metrics
# Protected-group membership vectors: sex encoded as 1 (group_a) and 0 (group_b)
group_a = np.array(X_test['sex'] == 1)
group_b = np.array(X_test['sex'] == 0)
classification_bias_metrics(group_a, group_b, y_pred, metric_type='equal_outcome')
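For context, the y_pred vector above comes from whatever model you are auditing. A minimal sketch with a scikit-learn baseline, assuming X holds the encoded features (with sex mapped to 1/0) and y the binary over-50K label:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Hold out a test set, scale the features, and fit a simple baseline model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
scaler = StandardScaler().fit(X_train)
model = LogisticRegression(max_iter=1000)
model.fit(scaler.transform(X_train), y_train)
y_pred = model.predict(scaler.transform(X_test))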
What can we do to mitigate bias in AI?
Bias in AI can be addressed at different stages of the model life cycle, and the choice of mitigation strategy depends on factors such as data access and model parameters. The Holistic AI library offers three approaches for mitigating bias: pre-processing, in-processing, and post-processing. Pre-processing approaches transform the data before it is fed into the model, in-processing modifies the algorithm without changing the input data, and post-processing adjusts the outputs of the model.
Figure 2: A set of visualisation graphs generated using the Holistic AI library to depict bias mitigation results. These graphs represent different aspects of bias in the data.
These strategies help to improve the fairness and trustworthiness of AI systems and can be applied to a variety of model types, such as binary classification, multiclass classification, regression, clustering, and recommender systems. For example, the reweighing pre-processing approach adjusts the importance of data points to mitigate bias. Adversarial training, on the other hand, can be used in-processing to adjust predictors associated with bias, and calibration can be used post-processing to ensure that positive outcomes are more evenly distributed across subgroups. An overview of the mitigation strategies in the Holistic AI library, and the models they are suitable for, can be seen in the table below.
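To build intuition for what reweighing does, here is the textbook idea from Kamiran and Calders (not the library’s internal implementation): each sample is weighted by the expected frequency of its (group, label) combination under independence divided by its observed frequency, so under-represented combinations are up-weighted. The function name below is hypothetical.
import pandas as pd

def reweighing_weights(group, label):
    """Illustrative reweighing: weight each sample by
    P(group) * P(label) / P(group, label), so every (group, label)
    combination carries a balanced influence after weighting."""
    df = pd.DataFrame({"group": group, "label": label})
    n = len(df)
    p_group = df["group"].value_counts() / n
    p_label = df["label"].value_counts() / n
    p_joint = df.groupby(["group", "label"]).size() / n
    weights = df.apply(
        lambda row: p_group[row["group"]] * p_label[row["label"]]
        / p_joint[(row["group"], row["label"])],
        axis=1,
    )
    return weights.to_numpy()

# Toy example: the (group=0, label=1) combination is under-represented,
# so those samples receive a weight above 1
print(reweighing_weights(group=[1, 1, 1, 0, 0, 0], label=[1, 1, 0, 1, 0, 0]))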
For instance, for the Adult dataset, if we are at a stage where we have access to the training data, we can employ reweighing pre-processing to guide the model training, or we can conduct a more exploratory search into the importance of each example with in-processing Grid Search. And if retraining the model is not an option, post-processing techniques like Calibrated Equalized Odds can still be used to improve fairness. The best part? With the Holistic AI library, testing out these variants can be done with only a few lines of code, making it super simple to use.
This allows us to perform rapid analyses and even experiment with integrating pre- and post-processing strategies in the same pipeline. We can then compare all our results using the Holistic AI metric functions.
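For illustration, a minimal sketch of such a comparison, where y_pred_baseline and y_pred_reweighing are hypothetical prediction vectors obtained from the baseline and reweighing pipelines defined further below:
import pandas as pd
from holisticai.bias.metrics import classification_bias_metrics

# Collect one metrics table per variant, then place them side by side
results = {
    "Baseline": classification_bias_metrics(
        group_a, group_b, y_pred_baseline, metric_type="equal_outcome"
    ),
    "Reweighing": classification_bias_metrics(
        group_a, group_b, y_pred_reweighing, metric_type="equal_outcome"
    ),
}
comparison = pd.concat(results, axis=1)
print(comparison)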
We can go deeper and try more strategies, testing different parameter settings, and then visualise our results. Holistic AI has several visualisation methods to enrich your analysis of bias mitigation results. For the Adult dataset (a classification problem), using reweighing pre-processing, below are some of the visualisations you can create with the Holistic AI library.
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from holisticai.pipeline import Pipeline
from holisticai.bias.mitigation import Reweighing, CalibratedEqualizedOdds

# Pipeline Baseline: no bias mitigation
pipeline = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('model', LogisticRegression())
])

# Pipeline Pre-processing: reweigh the training data before fitting the model
pipeline = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('bm_preprocessing', Reweighing()),
    ('model', LogisticRegression()),
])

# Pipeline In-processing: `model` here is an in-processing mitigator
# (e.g. a grid-search or reduction method wrapping an estimator)
pipeline = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('bm_inprocessing', model),
])

# Pipeline Post-processing: adjust the fitted model's outputs
pipeline = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('model', LogisticRegression()),
    ('bm_postprocessing', CalibratedEqualizedOdds())
])
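A pipeline is then fitted and evaluated much like any scikit-learn estimator, with the protected groups passed as fit and predict parameters. A minimal sketch follows; the keyword names bm__group_a and bm__group_b are based on the Holistic AI tutorials and may differ between library versions, so check the tutorial linked below.
# Fit the mitigated pipeline, passing group membership for the training split
pipeline.fit(
    X_train, y_train,
    bm__group_a=group_a_train, bm__group_b=group_b_train,
)

# Predict on the test split and measure bias on the mitigated predictions
y_pred_reweighing = pipeline.predict(
    X_test, bm__group_a=group_a_test, bm__group_b=group_b_test
)
classification_bias_metrics(
    group_a_test, group_b_test, y_pred_reweighing, metric_type="equal_outcome"
)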
You can find the full tutorial for this here.
Check it out
Holistic AI’s library is a valuable tool for ensuring the reliability and fairness of AI systems. With its easy-to-use interface and graphics to analyse bias, the library offers a comprehensive approach to AI assessment. If you are interested in ensuring the quality of your AI-based solutions, you should look at the Holistic AI library.
Appendix
HAI Bias Metrics
There are several metrics that can be used to measure bias depending on the type of model being used. The Holistic AI library offers a range of metrics that are suitable for these different systems, as can be seen in the table below.
HAI Mitigation Strategies
There are several strategies that can be used to mitigate bias depending on the type of model being used:
Written by Cristian Muñoz, Machine Learning Researcher at Holistic AI, and Kleyton da Costa, Researcher at Holistic AI.
Originally published at https://www.holisticai.com.