Explainable AI: Diverse Counterfactual Explanations (DiCE)

A blog series where each blog introduces practical tools for explaining machine learning models

Bijil Subhash
6 min read · Mar 14, 2022

Before we dive into DiCE, let’s build an intuitive understanding of what counterfactuals are.

Counterfactuals are one of the most widely used tools in the world of explainable AI, or XAI.

A counterfactual explanation describes a causal situation in the form: “If X had not occurred, Y would not have occurred”. For example: “If I hadn’t taken a sip of this hot coffee, I wouldn’t have burned my tongue”. Event Y is that I burned my tongue; cause X is that I had a hot coffee. - Susanne Dandl & Christoph Molnar

Let’s break that down in the context of machine learning.

Say you are a data-savvy real estate investor interested in finding the attributes of apartments in a region that maximize rental yield. To do this, you build a model that, given a set of apartment attributes (features), can predict the monthly rental return with reasonably high accuracy. In this hypothetical situation, let’s say you found a property with 3 bedrooms, a study, and an east-facing balcony, but no air conditioning. When you feed those features into your model, the predicted monthly yield is $1,500, which is not ideal as you were expecting at least $2,000. This is where counterfactuals come into play: a counterfactual explanation generates a new set of instances by altering the features, with the objective of reaching a monthly rent of $2,000. Let’s say your counterfactual explanation tells you that an apartment with 3 bedrooms, a study, and a north-facing balcony with air conditioning will yield a rental income of $2,000. Actionable insights like this are useful because now you, as an investor, can go out and look for a property that fulfills these criteria. The use of counterfactuals is not limited to this use case; they are widely applicable across a range of sectors.

Counterfactuals are also quite useful for identifying biases in a model. Say, for instance, that in the scenario discussed above we add ‘Sex’ as an additional feature. If, with all other features held constant, the predicted rental yield increases or decreases when we go from ‘M’ to ‘F’, that highlights the presence of gender bias in our model.

A drawback of counterfactuals is that they suffer from the ‘Rashomon effect’. In other words, there can be multiple, equally valid counterfactual explanations. In the example above, we could also find that a 5-bedroom apartment with no study and no balcony, but with air conditioning, yields $2,000 in rental income. There are some practical tools you can use to address this, which I mention in passing towards the end of this article.

Visual representation of how counterfactuals work in a model trained to classify loan approval status. The orange circle represents a counterfactual instance. Source: github.com/interpretml/DiCE

Building on this high-level understanding of counterfactuals, we can now proceed to the implementation. If you are interested in the math that sits behind it, you can find a summary here [1] and more detailed explanations in these papers [2, 3].

Diverse Counterfactual Explanations (DiCE)

DiCE is a Python library implemented by Mothilal et al. [4] that can be used to generate counterfactual explanations.

Before we use DiCE, we need a model. To keep things simple, we’ll use the Titanic dataset. As always in this blog series, I will not go into the details of how the model is built, as that is outside the scope; hopefully, the comments are informative enough to follow the logic.

About the dataset

The Titanic dataset is one of the most widely used, beginner-friendly datasets; the task is to predict which passengers survived the Titanic shipwreck. In total, we have 10 features in the dataset. In our model, we will only be using the following 7 columns (the last being the target we predict):

  • Pclass (categorical variable) - ticket class, which takes the value 1, 2, or 3
  • Sex (categorical variable) - male or female
  • Age (numerical variable) - passenger age, continuous
  • SibSp (categorical variable) - number of siblings or spouses on board
  • Parch (categorical variable) - number of parents or children on board
  • Fare (numerical variable) - ticket fare, continuous
  • Survived (categorical variable, target) - 0 if the passenger did not survive, 1 if they did

Model building

We have our baseline model, which we will use in the following section to calculate the counterfactuals.
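
For reference, here is a minimal sketch of how such a baseline model might be built, assuming scikit-learn and a local copy of Kaggle’s train.csv; the file name, preprocessing, and choice of classifier are illustrative rather than prescriptive.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Load the Titanic training data and keep only the columns used in this post
    df = pd.read_csv('train.csv')
    df = df[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Survived']]

    # Simple preprocessing: drop rows with missing values, encode 'Sex' numerically
    df = df.dropna()
    df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})

    # Split features and target, then hold out a test set
    X = df.drop(columns='Survived')
    y = df['Survived']
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Fit a baseline classifier
    model = RandomForestClassifier(random_state=42)
    model.fit(x_train, y_train)
    print(f'Test accuracy: {model.score(x_test, y_test):.2f}')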

Counterfactual Calculations

Initializing DiCE can be done in just 3 lines. You need a dataset, a model, and the target label. You also need to specify the continuous features, as they are perturbed differently from categorical ones. There are a few other library-specific generation methods you can choose from as well. Here is an example script of a DiCE initialization for our trained model.
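
The sketch below assumes the dice_ml package and reuses the variables from the model-building step above; ‘random’ is one of the generation methods the library offers.

    import dice_ml

    # Wrap the data: DiCE needs the dataframe (including the target column),
    # the list of continuous features, and the name of the target
    d = dice_ml.Data(dataframe=pd.concat([x_train, y_train], axis=1),
                     continuous_features=['Age', 'Fare'],
                     outcome_name='Survived')

    # Wrap the trained scikit-learn model
    m = dice_ml.Model(model=model, backend='sklearn')

    # Create the explainer; 'random' is one of several generation methods
    exp = dice_ml.Dice(d, m, method='random')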

Once we have our DiCE instance initialized, we can run queries to generate counterfactuals. Let’s look at our first set. Here, we request 5 counterfactual explanations for the first entry in our test dataset, and we are interested in the changes to feature values that would make the model classify the chosen instance as the opposite class. The output only displays the features that change, because we set the show_only_changes option to True.
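
Roughly, assuming the exp explainer from the initialization above:

    # Explain the first row of the test set
    query_instance = x_test[0:1]

    # Request 5 counterfactuals that flip the predicted class
    dice_exp = exp.generate_counterfactuals(query_instance,
                                            total_CFs=5,
                                            desired_class='opposite')

    # Show only the feature values that changed relative to the original instance
    dice_exp.visualize_as_dataframe(show_only_changes=True)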

Counterfactual explanations

There you go: we are starting to see how changing a few feature values, while keeping everything else the same, affects the value of ‘Survived’.

What if we are interested in the effect of perturbing just one feature, say ‘Age’? We can implement that like so.
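
In dice_ml this is done with the features_to_vary argument; a sketch, continuing from the query above:

    # Only allow 'Age' to change; all other features stay fixed
    dice_exp = exp.generate_counterfactuals(query_instance,
                                            total_CFs=5,
                                            desired_class='opposite',
                                            features_to_vary=['Age'])
    dice_exp.visualize_as_dataframe(show_only_changes=True)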

Counterfactual explanations with just ‘Age’ being varied

We are starting to notice a trend with ‘Age’ here. Survival was poorer among older passengers, perhaps? It’s not conclusive, as we are only looking at a single data point, but you get the idea.

In our first set of counterfactual explanations, we can see that ‘Fare’ is all over the place. What if we want to constrain ‘Fare’, meaning we only want it to vary between 10 and 50? DiCE has a tool for that.
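
The permitted_range argument handles this; a sketch:

    # Constrain 'Fare' to stay within [10, 50] in the generated counterfactuals
    dice_exp = exp.generate_counterfactuals(query_instance,
                                            total_CFs=5,
                                            desired_class='opposite',
                                            permitted_range={'Fare': [10, 50]})
    dice_exp.visualize_as_dataframe(show_only_changes=True)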

Counterfactual explanations with ‘Fare’ restricted to between 10 and 50

As you can see, we no longer have ‘Fare’ values above 50. It may not seem like much here, but this can be quite handy. Say you are working with a housing price prediction dataset that has the number of bedrooms as one of the features. You do not want to generate unrealistic counterfactual explanations, such as a house with 100+ bedrooms. That is an extreme example, but the point is that, as a domain expert, you can use this functionality to constrain the counterfactual explanations to realistic solutions.

In this article, I covered only a few basic operations of DiCE by way of introducing the library. In addition to those mentioned above, you can also calculate local and global feature importance scores, play around with the proximity and diversity weights of the features, and more. If you are interested in going deeper, I would highly recommend checking out the documentation.
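
For instance, the importance scores can be computed along these lines; this sketch assumes dice_ml’s local_feature_importance and global_feature_importance helpers, which (per the library’s docs) expect at least 10 counterfactuals per instance and, for global scores, a set of query instances.

    # Local importance: how often each feature changes across the
    # counterfactuals generated for a single instance
    imp = exp.local_feature_importance(query_instance, total_CFs=10)
    print(imp.local_importance)

    # Global importance: aggregated over a set of query instances
    imp = exp.global_feature_importance(x_test[0:10], total_CFs=10)
    print(imp.summary_importance)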

References

  1. Molnar, C. Interpretable Machine Learning: Counterfactual Explanations. Available from: https://christophm.github.io/interpretable-ml-book/counterfactual.html.
  2. Wachter, S., Mittelstadt, B. & Russell, C. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology 31, 841 (2017).
  3. Dandl, S., Molnar, C., Binder, M. & Bischl, B. Multi-objective counterfactual explanations. In International Conference on Parallel Problem Solving from Nature, 448–469 (Springer, 2020).
  4. Mothilal, R. K., Sharma, A. & Tan, C. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 607–617 (2020).


Bijil Subhash

Data engineer from Sydney, Australia. I write about data and automation.