Uncovering model bias using Explainable AI

Tadhg Davey
Credera Engineering
Sep 5, 2022

Authored by Tadhg Davey and Lewis P Battersby

Decision makers are increasingly demanding accountability and interpretability from the tools they use. How much trust we can place in machine learning models is now a crucial part of building and using AI.

In this blog, we’ll look at a technical example of using three algorithms to detect bias and understand decisions made by our model.

The scenario

A bank uses machine learning to automate its loan approval process. The model classifies whether someone should be accepted or rejected for a loan based on a set of parameters such as age, employment status, or marital status. The bank wants to ensure that this model is making fair, unbiased decisions in order to build trust with its customers and comply with incoming EU regulations.

This is where Explainable AI (XAI) comes in. Using techniques like LIME, SHAP, and DiCE, we can generate human-interpretable explanations of a model’s decision-making process, as well as inform users of changes they could make to be accepted for a loan in future.

These kinds of explanations are called “local” explanations because they explain a single classification event — a rejected or accepted loan. “Global” explanations describe the model as a whole, picking up trends or patterns that emerge across all of its predictions.

The model and dataset

We will start with an open source dataset from Kaggle. This dataset contains 614 loan acceptance/rejection outcomes based on a set of input parameters. To show how bias can be discovered using Explainable AI, we will tamper with the dataset by swapping the “has_credit_history” and “gender” columns, making our model extremely biased towards a certain gender.
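
For illustration, here is roughly how that tampering could look in pandas. The file name and exact column names are assumptions and will depend on which copy of the Kaggle dataset you use:

import pandas as pd
# Load the Kaggle loan dataset (file name is an assumption).
data = pd.read_csv("loan_data.csv")
# Swap the two column labels so that the highly predictive credit-history
# values now sit under the "gender" name, and vice versa. Depending on the
# dtypes in your copy of the data, you may also need to cast the swapped columns.
data = data.rename(columns={"gender": "has_credit_history",
                            "has_credit_history": "gender"})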

The model tries to determine whether a person should be accepted or rejected for a loan application, and does so with 70% accuracy.

This model was built using sklearn. If you’re interested in finding out how the model was built, more details can be found here.
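
The remaining snippets in this post assume a handful of objects from that notebook: the encoded training data (X_train, X_test), the fitted ordinal_encoder, the categorical and numerical column lists, and the trained classifier clf. Here is a minimal sketch of how those could be produced — the preprocessing details and classifier choice are assumptions, not necessarily what the original notebook does:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
# Crude preprocessing, just for this sketch: drop the ID column (if present)
# and any rows with missing values.
data = data.drop(columns=["Loan_ID"], errors="ignore").dropna()
# Capture the column groups from the raw data before encoding anything.
categoricalCols = data.select_dtypes(["object"]).columns.drop("Loan_Status")
numericalCols = data.select_dtypes(["number"]).columns
# Ordinally encode the categorical features in place and binarise the target
# (target values assumed to be 'Y'/'N').
ordinal_encoder = OrdinalEncoder()
data[categoricalCols] = ordinal_encoder.fit_transform(data[categoricalCols])
data["Loan_Status"] = data["Loan_Status"].map({"N": 0, "Y": 1})
X = data.drop(columns=["Loan_Status"])
y = data["Loan_Status"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(random_state=42)  # classifier choice is an assumption
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))  # roughly 0.7 on our split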

LIME

LIME stands for Local Interpretable Model-agnostic Explanations. It explains individual predictions and, as the name suggests, works regardless of the underlying model.

LIME decides which features are most important by generating synthetic data that deviates slightly from the original datapoint, running it through the model, and observing the change in prediction. It repeats this many times to understand which features impact the prediction the most. For LIME to generate synthetic data correctly, it needs to know which of our features are categorical and which are continuous. Since the categorical columns are ordinally encoded, we also have to tell LIME the original names of the categories by passing the categorical_names parameter.

# Map the categorical feature names (categoricalCols, captured before the
# ordinal encoding) to their column indices and original category names, so
# LIME can display readable values such as "Gender=Female".
categorical_columns = [X_train.columns.get_loc(column_name) for column_name in categoricalCols]
encoded_feature_names = {x: ordinal_encoder.categories_[i] for i, x in enumerate(categorical_columns)}
from lime.lime_tabular import LimeTabularExplainer
# Create the explainer object
lime = LimeTabularExplainer(X_train.values,
                            categorical_features=categorical_columns,
                            categorical_names=encoded_feature_names,
                            feature_names=X_train.columns)
# Explain a specific instance
exp = lime.explain_instance(X_test.values[5], clf.predict_proba)
# Show the result in the notebook
exp.show_in_notebook()

What this shows us is the feature attributions — i.e. how each feature affected the decision. In our binary classification, 0 means a loan has been rejected, whilst 1 means it has been accepted. We can see that ‘Married=Yes’ made the model more likely to accept the application, and ‘Gender=Female’ made the model much more likely to reject the loan (note: this is only because we swapped this column with has_credit_history, but it could feasibly happen in a real situation). In other words, LIME has correctly flagged our model as biased.
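
If you are not working in a notebook, the same attributions can also be pulled out programmatically, for example:

# Feature attributions as (feature, weight) pairs. Positive weights push the
# prediction towards class 1 (accepted), negative weights towards class 0 (rejected).
for feature, weight in exp.as_list():
    print(f"{feature}: {weight:+.3f}")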

DiCE

Next, we’re going to use DiCE (Diverse Counterfactual Explanations). This algorithm answers the question: “What would I need to change in the input in order to change the outcome of the model?” This is perfect for someone who wants to know what to change about their circumstances in order to be accepted for their loan.

DiCE makes perturbations to the data that change the outcome of the model, and input parameters control the diversity (how different the counterfactuals are from one another) and proximity (how close each counterfactual is to the original input) of the counterfactuals generated. In addition, we can put constraints on features to ensure any counterfactuals generated are feasible. For example, we could constrain the Age feature so that it doesn’t change, as this is not something an applicant can act on (we’ll sketch this constraint after the examples below).

Using these parameters, DiCE constructs a loss function and searches for counterfactuals that lie within a certain threshold. More information on how DiCE works can be found in the paper: https://arxiv.org/pdf/1905.07697.pdf
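
Roughly, the objective from the paper looks like this (our paraphrase): the k counterfactuals c_1, …, c_k for an input x are chosen to flip the prediction, stay close to x, and differ from each other:

\arg\min_{c_1,\dots,c_k} \; \frac{1}{k}\sum_{i=1}^{k} \mathrm{yloss}\big(f(c_i), y\big) \;+\; \frac{\lambda_1}{k}\sum_{i=1}^{k} \mathrm{dist}(c_i, x) \;-\; \lambda_2\, \mathrm{dpp\_diversity}(c_1,\dots,c_k)

where lambda_1 and lambda_2 trade off proximity against diversity.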

We have to specify which features are continuous and which are categorical so that DiCE can perturb the data. Unlike LIME, DiCE is framework-dependent, meaning we have to specify that our model was built with the sklearn framework (TensorFlow and PyTorch are also supported). Lastly, we specify the method used to generate counterfactuals. We’ve chosen the ‘random’ method, but there are alternative methods available which suit different use cases.

import dice_ml
from dice_ml.utils import helpers
backend = 'sklearn'
# Initialize the model object
m = dice_ml.Model(model=clf, backend=backend)
# Initialize the data object
d = dice_ml.Data(dataframe=data,
                 continuous_features=list(numericalCols),
                 outcome_name='Loan_Status')
# Initialize the explainer interface
exp_random = dice_ml.Dice(d, m, method="random")
# Generate counterfactuals
dice_exp_random = exp_random.generate_counterfactuals(X_train[17:18],
                                                      total_CFs=3,
                                                      desired_class="opposite",
                                                      verbose=False,
                                                      random_seed=123)
# Show as a dataframe in the notebook
dice_exp_random.visualize_as_dataframe(show_only_changes=True)

The datapoint above shows someone who was accepted for a loan. The first line shows the person’s attributes, and the three lines below show what would have to change for the model’s decision to change. For example, the highlighted line shows that if the gender were 0 rather than 1 (Female rather than Male), then the loan application would be rejected rather than accepted.

Let’s take a look at a different example — someone who was predicted to be rejected for a loan.

dice_exp_random = exp_random.generate_counterfactuals(X_train[19:20],
                                                      total_CFs=3,
                                                      desired_class="opposite",
                                                      verbose=False,
                                                      random_seed=123)
dice_exp_random.visualize_as_dataframe(show_only_changes=True)

This datapoint shows a woman who was rejected for a loan. In the first and third counterfactuals, we see that if the gender were switched and the loan term increased, the application would be accepted!
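
As mentioned above, we can also constrain which features DiCE is allowed to change, so that its suggestions are things an applicant could actually act on. Here is a quick sketch using the same explainer; the column names listed in features_to_vary are assumptions about our dataset:

# Only allow DiCE to vary actionable features; everything else (gender, age,
# marital status, ...) is held fixed. Column names here are assumptions.
dice_exp_constrained = exp_random.generate_counterfactuals(
    X_train[19:20],
    total_CFs=3,
    desired_class="opposite",
    features_to_vary=["LoanAmount", "Loan_Amount_Term", "ApplicantIncome"],
    random_seed=123)
dice_exp_constrained.visualize_as_dataframe(show_only_changes=True)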

SHAP

So far, we’ve looked at local explanations - i.e. why a specific decision was made. We can also look at this on a global level, and ask questions such as: “What is the most important feature for the whole dataset?” To demonstrate this, we will use the SHAP algorithm.

The SHAP algorithm has its roots in game theory. It uses Shapley values to calculate the importance of each feature in the model. In game theory, Shapley values are used to work out how much each player contributes to a game’s outcome. In our case, SHAP calculates the most important features (analogous to the players) of the model (analogous to the game). It works by creating synthetic data, running it through the model, and observing the change in model output. In a similar way to LIME, we just need to pass it some real data that it can use to generate synthetic data, along with the model’s prediction method.

import shap
# Build an explainer around the model's prediction function, using X_test as
# the background data, then compute SHAP values for every test datapoint.
explainer = shap.Explainer(clf.predict, X_test)
shap_values = explainer(X_test)
# Plot a beeswarm: one row per feature, one dot per datapoint.
shap.plots.beeswarm(shap_values)

In this graph, the colour denotes the feature value. For example, a loan amount of $100,000 might appear bright red, whereas $100 might appear dark blue. Each row of the graph represents a different feature, and the x-axis shows how much that feature affected the prediction for each of our datapoints. The feature at the bottom is the least important overall, and the feature at the top is the most important. Most of these match our intuition — one would expect the loan outcome to be strongly affected by the amount of money being borrowed, and education to matter much less. But we can see that “gender” is the most important feature in our model — even more important than the loan amount, which is not a good sign!
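
If you prefer a single global ranking over the per-datapoint beeswarm, SHAP can also plot the mean absolute SHAP value for each feature:

# Mean |SHAP value| per feature: one bar per feature, which makes the
# outsized contribution of "gender" even easier to spot.
shap.plots.bar(shap_values)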

Conclusion

Although our dataset was deliberately tampered with, you can easily see how these techniques could highlight biases in a situation where you genuinely have a biased model. With LIME, we saw that “gender” was contributing the most to our model’s decisions, and our counterfactuals asked us to change “gender” in order to be accepted for a loan. This all points to the data, or how we train our model, needing some work!

Interested in joining us?

Credera is currently hiring! View our open positions and apply here.

Got a question?

Please get in touch to speak to a member of our team.
