Crash Course in Causality: Diverse Counterfactual Explanations (DiCE)

Published in AI Skunks · Apr 27, 2023

By Swathi Sharma & Saketh Ram Gangam

How counterfactuals work in a model trained to classify loan approval status. The orange circle represents a counterfactual instance. Source — github.com/interpretml/DiCE

With the rapid adoption of recent advances in Generative AI and Deep Learning, the need to make predictive models more explainable has become even more pressing. Neural networks are not always welcomed for sensitive decision making in industries such as healthcare, largely because they are black boxes whose predictions are hard to interpret.

Counterfactuals are one way to bring explainability to machine learning models, and the concept is widely adopted by industry and academia in the field of Explainable AI.

What is a counterfactual?

A counterfactual is a hypothetical statement that asserts that something would have happened if a different circumstance had been true. For example, “If I had studied harder, I would have gotten a better grade on the test.” Counterfactuals are often used in thought experiments and in discussions about hypothetical situations.

Diverse counterfactual explanations are a type of post-hoc explanation that can be used to understand and act on algorithmic predictions. They are hypothetical examples that show people how to obtain a different prediction, by changing just a few of the input features.

Diversity vs Proximity

Here is a more detailed explanation of each of these terms:

  • Proximity: This refers to how close the counterfactual explanation is to the original input. For example, if the original input is a sentence, a counterfactual explanation that is close to the original input might be a sentence that is very similar to the original sentence.
  • Diversity: This refers to the range of different counterfactual explanations that are generated. For example, if you generate 10 counterfactual explanations for a single input, the diversity of the explanations would be high if the explanations are all very different from each other.

The DiCE GitHub README provides more information on these topics, as well as how to use DiCE to generate counterfactual explanations.
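To make these two notions concrete, here is a minimal, self-contained sketch (with made-up feature vectors, not DiCE's actual scoring) that measures proximity as the average distance from each counterfactual to the original point, and diversity as the average pairwise distance among the counterfactuals:

import numpy as np

# Hypothetical original instance and three candidate counterfactuals
# (made-up numbers, purely for illustration).
original = np.array([0.2, 0.7, 1.0])
counterfactuals = np.array([
    [0.2, 0.9, 1.0],  # very close to the original (high proximity)
    [0.8, 0.1, 0.0],  # far from the original
    [0.3, 0.8, 1.0],
])

# Proximity: mean distance from each counterfactual to the original
# (smaller distance means the explanations are easier to act on).
proximity = np.mean(np.linalg.norm(counterfactuals - original, axis=1))

# Diversity: mean pairwise distance among the counterfactuals
# (larger distance means the set covers more distinct "what ifs").
pairs = [(i, j) for i in range(len(counterfactuals)) for j in range(i + 1, len(counterfactuals))]
diversity = np.mean([np.linalg.norm(counterfactuals[i] - counterfactuals[j]) for i, j in pairs])

print(f"mean distance to original (lower = more proximate): {proximity:.3f}")
print(f"mean pairwise distance (higher = more diverse): {diversity:.3f}")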

Putting it into context:

Visualization by Mahantesh Pattadkal from the article “Codeless Counterfactuals for Deep Learning”

Let’s say that we have a machine learning model that predicts whether or not a patient will have a heart attack. The model is trained on a dataset of patients who have had heart attacks and patients who have not had heart attacks. The model is able to predict whether or not a patient will have a heart attack with 80% accuracy.

Let’s say that we have a patient who the model predicts will have a heart attack. We can use counterfactuals to understand why the model made that prediction. For example, we might find that the patient has high blood pressure, high cholesterol, and smokes cigarettes. This information can be used to help the patient improve their health and reduce their risk of having a heart attack.

Here are some specific counterfactuals that we could generate for this patient:

  • If the patient’s blood pressure were lower, the model would predict a lower risk of a heart attack.
  • If the patient’s cholesterol were lower, the model would predict a lower risk of a heart attack.
  • If the patient did not smoke, the model would predict a lower risk of a heart attack.

These counterfactuals can be used to help the patient make changes to their lifestyle that will reduce their risk of having a heart attack.
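To make this concrete in code, here is a minimal sketch using synthetic patient records (made-up data and coefficients, not a real clinical model): we fit a logistic regression and compare the predicted risk for a patient before and after lowering their blood pressure.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Synthetic features: systolic blood pressure, cholesterol, smoker flag (illustrative only).
bp = rng.normal(130, 20, n)
chol = rng.normal(200, 40, n)
smoker = rng.integers(0, 2, n)

# Synthetic label: risk increases with all three features.
logit = 0.04 * (bp - 130) + 0.02 * (chol - 200) + 1.0 * smoker - 0.5
heart_attack = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([bp, chol, smoker])
model = LogisticRegression(max_iter=1000).fit(X, heart_attack)

patient = np.array([[160, 260, 1]])          # the original high-risk patient
counterfactual = np.array([[120, 260, 1]])   # same patient, but with lower blood pressure

print("predicted risk (original):      ", model.predict_proba(patient)[0, 1])
print("predicted risk (counterfactual):", model.predict_proba(counterfactual)[0, 1])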

Applications of Counterfactual Explanations:

Counterfactual explanations can be useful in a variety of applications. Below are some of the applications:

  1. Improving trust and transparency in machine learning systems. By providing users with an understanding of how a system made a decision, counterfactual explanations can help to build trust and transparency. This is especially important in applications where the stakes for users are high, such as loan approval or criminal sentencing.
  2. Helping users make better decisions. Counterfactual explanations can help users to make better decisions by providing them with insights into how their choices can affect the outcome. For example, a user who is applying for a loan may be able to use counterfactual explanations to see how increasing their income or reducing their debt would impact their chances of approval.
  3. Debugging machine learning systems. Counterfactual explanations can be used to debug machine learning systems by identifying features that are driving the system’s predictions. This can be helpful in identifying biases or errors in the system.

Different methods for generating Counterfactual Explanations:

Image from ServiceNow Research published in the article

There are a number of different methods for generating counterfactual explanations. Some of the most common methods include:

  • Feature perturbation: This method involves perturbing the input features of a data point until the model’s prediction changes.
  • Inverse optimization: This method involves optimizing a loss function to find a set of feature values that would change the model’s prediction.
  • Generative modeling: This method involves using a generative model to create new data points that are similar to the original data point but would have a different prediction.

The choice of method for generating counterfactual explanations depends on the specific application. For example, feature perturbation is a simple and efficient method, but it may not be able to generate counterfactuals that are feasible or diverse. Inverse optimization is a more powerful method, but it can be computationally expensive. Generative modeling is a promising new approach, but it is still under development.
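To make the feature-perturbation idea concrete, here is a minimal, naive sketch (not how DiCE itself searches): fit a toy classifier on synthetic data, then nudge one feature at a time until the prediction flips.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy classifier on synthetic data (illustration only).
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

x = X[0].copy()
original_pred = clf.predict([x])[0]

# Naive search: nudge each feature up or down in small steps until the prediction flips.
found = None
for feature in range(X.shape[1]):
    for direction in (+1.0, -1.0):
        candidate = x.copy()
        for _ in range(200):
            candidate[feature] += direction * 0.05
            if clf.predict([candidate])[0] != original_pred:
                found = (feature, candidate[feature])
                break
        if found:
            break
    if found:
        break

if found:
    print(f"prediction flips when feature {found[0]} is moved to {found[1]:.2f}")
else:
    print("no single-feature flip found within the search range")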

Diverse counterfactual explanations can be a valuable tool for understanding and acting on algorithmic predictions. They can help to build trust and transparency, improve decision-making, and debug machine learning systems.

Worked Examples

The Titanic dataset contains records of passengers on the Titanic, labeled by whether they survived or died in the disaster.

In our model, we will use the following columns (the code below also keeps Embarked, the port of embarkation, as an additional label-encoded feature):

  • Pclass (categorical variable) — ticket class, which takes the value 1, 2 or 3
  • Sex (categorical variable)- Male or Female
  • Age (numerical variable) — continuous variable
  • SibSp (categorical variable) — number of siblings or spouses on board
  • Parch (categorical variable) — number of parents or children on board
  • Fare (numerical variable) — continuous variable
  • Survived (categorical variable) — not survived or survived, which takes the value 0 or 1

Model Building

We can use the Titanic dataset to train a machine learning model to predict whether a passenger survived or died. For example, we could train a logistic regression model to predict the probability of survival based on the passenger’s age, gender, and socioeconomic status.

import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv('titanic.csv') #reading data
df = df.drop(['PassengerId', 'Name', 'Ticket', 'Cabin'], axis=1) #dropping columns that are not useful for classification
df = df.dropna(axis=0) #dropping rows with missing values

le = preprocessing.LabelEncoder() #encoding the categorical variables as numericals
df['Sex'] = le.fit_transform(df['Sex']) #{'female': 0, 'male': 1}
df['Embarked'] = le.fit_transform(df['Embarked']) #{'C': 0, 'Q': 1, 'S': 2}

y = df['Survived'] #label

#train/test split; we keep the full dataframe (including the label) because DiCE needs it later
train_dataset, test_dataset, y_train, y_test = train_test_split(df, y, test_size=0.2, random_state=42, stratify=y)
X_train = train_dataset.drop('Survived', axis=1)
X_test = test_dataset.drop('Survived', axis=1)

#model training
model = LogisticRegression(max_iter=500)
model.fit(X_train, y_train)

Once we have a model, we can use DiCE to generate counterfactual explanations for the model’s predictions. For example, we could use DiCE to generate counterfactual explanations for why a particular passenger survived or died.

Counterfactual Calculations

You need a dataset, a model, and a target label. You will also need to specify the continuous features, since they are perturbed differently from categorical ones. There are a few other library-specific explanation methods you can use as well. Here is an example of initializing DiCE with our trained model.

import dice_ml

d = dice_ml.Data(dataframe=train_dataset, continuous_features=['Age', 'Fare'],
                 outcome_name='Survived')
m = dice_ml.Model(model=model, backend="sklearn")
exp = dice_ml.Dice(d, m, method="random")

Once we have initialized a DiCE instance, we can run queries to generate counterfactuals. Let’s look at our first set. Here, we are requesting 5 counterfactual explanations for the first entry in our test dataset. We are interested in the changes to feature values that would flip the model’s prediction for the chosen instance to the opposite class. Our counterfactuals will only display the features that change because we set the option show_only_changes to True.

e = exp.generate_counterfactuals(X_test[0:1], total_CFs=5, desired_class="opposite")
e.visualize_as_dataframe(show_only_changes=True)

We are starting to see how changing a few feature values and keeping everything else the same affects the value of ‘Survived’.
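If you prefer to work with the counterfactuals programmatically rather than just printing them, the returned explanation object also exposes them as a DataFrame (the attribute names below follow the dice-ml documentation; treat the exact accessors as an assumption of this sketch):

#counterfactuals for the first (and only) query instance as a pandas DataFrame
cf_df = e.cf_examples_list[0].final_cfs_df
print(cf_df)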

What if we are only interested in seeing the effect of perturbing one feature, say ‘Age’? We can implement that like so:

e = exp.generate_counterfactuals(X_test[0:1], total_CFs=5, desired_class="opposite",
                                 features_to_vary=['Age'])
e.visualize_as_dataframe(show_only_changes=True)

Counterfactual explanations with just ‘Age’ being varied

We are starting to notice a trend with age here. The survival rate seems poorer among older passengers, perhaps? It’s not conclusive, as we are only looking at a single data point here, but you get the idea.
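One way to move beyond a single data point is to summarize counterfactuals over a batch of test rows. dice-ml ships counterfactual-based feature importance helpers for this; the snippet below follows the library’s documented interface, but treat the exact names and requirements (global importance typically needs 10 or more query instances) as assumptions of this sketch:

#which features get changed most often across counterfactuals for many instances?
imp = exp.global_feature_importance(X_test[0:20], total_CFs=10)
print(imp.summary_importance) #per-feature fraction of counterfactuals in which the feature changed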

In our first counterfactual explanations, we can see that the fare is all over the place. What if we want to constrain the fare so that it can only vary between 10 and 50? DiCE has a tool for that.

DiCE allows us to constrain the values of the features that we are changing. This is useful for a number of reasons. First, it can help us to generate more realistic counterfactual explanations. Second, it can help us to focus on the features that are most important for affecting the model’s predictions.

To constrain the values of the features that we are changing, we can use the permitted_range parameter. For example, to constrain the fare between 10 and 50, we would use the following code:

e = exp.generate_counterfactuals(X_test[0:1], total_CFs=5, desired_class="opposite",
                                 permitted_range={'Fare': [10, 50]})
e.visualize_as_dataframe(show_only_changes=True)

Counterfactual explanation with ‘Fare’ restricted between 10 and 50

Here are some other examples of when you might want to constrain the values of the features that you are changing:

  • You might want to constrain the values of the features to be within a certain range. For example, you might want to constrain the age of a person to be between 18 and 65.
  • You might want to constrain the values of the features to be realistic. For example, you might want to constrain the number of bedrooms in a house to be between 1 and 10.
  • You might want to constrain the values of the features to be consistent with other features. For example, you might want to constrain the age of a person to be consistent with their income.

By constraining the values of the features that you are changing, you can generate more realistic and informative counterfactual explanations.
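These constraints compose; for example, we can limit the search to a couple of actionable features and bound their values at the same time, reusing the parameters shown above (a sketch, assuming the two options can be combined in one call):

e = exp.generate_counterfactuals(X_test[0:1], total_CFs=5, desired_class="opposite",
                                 features_to_vary=['Age', 'Fare'],
                                 permitted_range={'Age': [18, 65], 'Fare': [10, 50]})
e.visualize_as_dataframe(show_only_changes=True)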

Conclusion

Diverse counterfactual explanations are a powerful tool that can be used to improve the accuracy, fairness, and explainability of machine learning models. They can help people understand why a machine learning model made a particular prediction, identify features that are important for making accurate predictions, and generate new data points that can be used to train the model.

However, there are also challenges associated with using diverse counterfactual explanations: they can be difficult to generate, hard to interpret, and they can themselves encode bias.

Despite these challenges, diverse counterfactual explanations are a promising tool for improving the accuracy, fairness, and explainability of machine learning models. As research on diverse counterfactual explanations continues, we can expect to see even more powerful and effective techniques for generating these explanations.

References

[1] DiCE: Diverse Counterfactual Explanations for Hotel Cancellations | by Michael Grogan | Towards Data Science

[2] Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations

[3] Interpretable ML Book by Christoph Molnar

[4] DiCE -ML models with counterfactual explanations for the sunk Titanic

[5] Explainable AI: Diverse Counterfactual Explanations (DiCE) by Bijil Subash
