DiCE -ML models with counterfactual explanations for the sunken Titanic-

Yuya Sugano
Published in Analytics Vidhya
Mar 11, 2020 · 12 min read

Introduction

There have been some foreseen and ongoing trends in AI/ML for 2020, such as AutoML, MLOps, and AI ethics, which aim to democratize AI/ML across industries. AutoML can automate deploying and optimizing an ML model, orchestrated within the AI/ML lifecycle by MLOps. This reflects one recurring aspect of IT: a certain area gets abstracted into an unconscious layer of the system, and people no longer need to care about that underlying layer once it is embedded. For example, with the emergence of cloud services, some people no longer need to care about networks, servers, and storage to some extent. In the same way, as AI/ML spreads in general, the word “AI” itself will eventually fade from the trend scene. “AI”, “IoT”, and “Blockchain” are clearly today’s buzzwords, but in the next decade these words may not appear as frequently as we use and hear them in daily life now.

Another big trend is obviously “Explainable AI” (XAI), which refers to methods and techniques in the application of AI whose results can be understood and interpreted by humans. While recent progressive techniques such as deep learning (deep neural networks) are said to produce “black-box” models, relatively classical methods such as decision trees, linear regression, and some statistical approaches are called “white-box” models, since they give understandable reasons for how given features influence the result. Deep learning (deep neural networks), boosting, and random forests, being highly non-linear models, have not been so transparent to humans, which means people cannot easily understand, in an interpretable way, why a given model produces a particular result. [1]

Image by adriano7492 from Pixabay

“Explainable AI” (XAI) has become a trend and might be essential.

Why has it been in focus and considered crucial for AI/ML recently? There might be two aspects: one falls under ethical reasons, when rules applied by a model seem to produce unfair and undesirable results by our ethical and moral standards; the other falls under business reasons, where AI/ML should reveal why its regression or classification (or whatever the output is) produced that particular answer. These have become more essential as AI adoption spreads widely through business and gets embedded everywhere in our society. Let’s take a look at those aspects in detail.

  • Ethical reason— Unconscious bias caused by AI/ML

We’re usually unaware of how “black-box” models produce their results and of the algorithmic bias and unfairness they carry. The Cambridge Analytica scandal and Amazon scrapping its secret AI recruiting tool that showed bias against women are famous incidents worth recalling here. It would be worse if we did not notice such bias and unfairness that had brought harmful disparities and inequalities to certain groups of people. In the Amazon case, it was found that the system rated male candidates higher in the hiring process, in a way that was far from gender-neutral, because it reflected the high dominance of men in the tech industry at that time. From this angle, any system can look suspect for such biases, because nowadays systems can devise other ways of evaluating people or things automatically from collected data in the AI/ML lifecycle. Fairlearn is a project for assessing your system’s fairness and mitigating observed unfairness in the development of AI/ML. [4]

  • Business reason—Unknown reasons caused by AI/ML

Let’s think about a retirement prediction case (or a churn case). We’ve got a bunch of data about our employees and need to predict who will leave the company, to prevent the company from letting its talented people go. Suppose we generated a model from the data and the model revealed that one of our engineers might leave the company soon. But wait, why? OK, maybe he will leave, but what we need to know here is not whether he will leave but why he will leave the company. The model can easily predict who is likely to leave the company from the given data; however, at a glance the model doesn’t tell us what measures we can take to prevent him from leaving. To circumvent the possible consequence, we would want to ask the model “What if” he had a pay raise within 2 years, or “What if” his overtime work were less than 30 hours monthly, and so on.

Harnessing AI/ML models, including the “black-box” ones generated by deep learning (deep neural networks), boosting, and random forests, was expected to reduce reliance on subjective human opinions, but there is a distinct hurdle we have to overcome when we adopt such kinds of AI: explainable and interpretable reasons have been invisible in the outputs of “black-box” models. Recent research has made it feasible to obtain the conditions that would have flipped a model’s prediction, using counterfactual explanations: the “What if” hypothetical examples we wanted to ask about for the same input. We’ll go through one such implementation, DiCE (Diverse Counterfactual Explanations), in what follows to pursue counterfactual explanations for ML models.

DiCE (Diverse Counterfactual Explanations)

Microsoft Research’s Ramaravind Kommiya Mothilal, Amit Sharma, and Chenhao Tan published their recent study “Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations” together with the DiCE implementation on GitHub in January 2020. What is DiCE, first of all? DiCE is a counterfactual explanations implementation derived from their research, which generates diverse counterfactual explanations for any ML model. [5]

DiCE implements counterfactual (CF) explanations that provide such information by showing feature-perturbed versions of the same case… In other words, it provides "what-if" explanations for model output and can be a useful complement to other explanation methods, both for end-users and model developers.

Let’s delve into it further. DiCE is a useful, nice-to-have library implementing counterfactual (CF) explanations that provide those “What if” cases by showing feature-perturbed versions of the same case that would have received a different prediction. The implementation also supports feasibility of the counterfactual actions given user context and constraints, and diversity among the presented counterfactuals, through tunable parameters that generate different kinds of explanations. There are several arguments we can pass to control feasibility and diversity in counterfactual explanations, which is nice.

  • proximity_weight and diversity_weight

proximity_weight (default: 0.5) and diversity_weight (default: 1.0) are values we can change when generating counterfactual examples (sample usage is covered later). It’s a little vague how far from the defaults (0.5 and 1.0 respectively) proximity and diversity should be configured. One idea is simply to put an iteration variable in those arguments and use a loop to generate different sets of counterfactual explanations and see how they vary; a rough sketch of that loop follows the example below.

# change proximity_weight from default value of 0.5 to 1.5
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4, desired_class="opposite", proximity_weight=1.5, diversity_weight=1.0)
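As a rough sketch of that idea (not from the original notebook), the loop below sweeps proximity_weight over a few values. It assumes exp and query_instance are already defined, as in the DiCE setup shown later in this article.

# sweep proximity_weight and inspect how the counterfactual set changes
for pw in [0.5, 1.0, 1.5, 2.0]:
    dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4,
                                            desired_class="opposite",
                                            proximity_weight=pw,
                                            diversity_weight=1.0)
    print('proximity_weight =', pw)
    dice_exp.visualize_as_dataframe()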
  • feature_weights

feature_weights is a dictionary argument we can pass to configure, for each numerical feature, how difficult it is to change that feature’s value in counterfactual explanations. By default, DiCE computes the inverse of the MAD internally and divides the distance between continuous feature values by the MAD of the feature’s values in the training set (a quick way to compute the MAD yourself is sketched after the example below).

# assigning new weights
feature_weights = {'age': 10, 'hours_per_week': 5}
# Now generating explanations using the new feature weights
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4,
                                        desired_class="opposite",
                                        feature_weights=feature_weights)
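To see what that default weighting corresponds to, you can compute the MAD of each continuous feature yourself. This is a minimal sketch, assuming a pandas DataFrame named train_df with the same column names as above.

# median absolute deviation (MAD) per continuous feature
mad = {col: (train_df[col] - train_df[col].median()).abs().median()
       for col in ['age', 'hours_per_week']}
# the inverse MAD approximates DiCE's internal default feature weights
default_like_weights = {col: 1.0 / m for col, m in mad.items()}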
  • features_to_vary list

Some of the generated explanations suggest changes in features that cannot be varied easily (such as age), or in sensitive attributes like race or gender. Hence, DiCE allows feeding in a list of features that are allowed to vary through the features_to_vary parameter. This list restricts the output to realistic counterfactual explanations: actionable alternative profiles for the same case.

# assign varying features as a list
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4,
                                        desired_class="opposite",
                                        features_to_vary=['age', 'workclass', 'education', 'occupation', 'hours_per_week'])

I built an Anaconda3 container with some additional packages in a past article. Let’s wrap that container with DiCE and the additional required packages again for this purpose. Anaconda3 has a lot of benefits for data scientists to develop, test, and train an ML model without tedious installation work. It’s an all-in-one package that is kept maintained and usable. You don’t need to get lost in the maze of library dependencies as long as the installed libraries satisfy your needs. [6]

Directly from the platform and without involving DevOps, data scientists can develop and deploy AI and machine learning models rapidly into production. Anaconda provides the tools needed to easily.

A Docker container is a loosely isolated environment on the host machine, running directly at the host kernel level. Docker enables us to deliver infrastructure in the same way we manage applications. This context has brought CI/CD, the DockerHub registry (like a code repository), and versioning into the infrastructure world. With container technology it became possible to manage and maintain our infrastructure the way we do our applications, which is amazing. Docker images can be shared among developers for re-use, and they don’t need to build an environment by hand from scratch anymore.

  • Docker containers are lightweight and portable because they don’t need an extra hypervisor layer, and they are reproducible as a Docker image
  • The Docker platform provides tooling to manage containers, and containers are a good fit for CI/CD (continuous integration and continuous delivery)
  • Docker images are portable and can be pulled from repositories (DockerHub or cloud/private registries) to reproduce the same environment
  • Docker images can be defined in a text file called a “Dockerfile”, which means the file itself can be managed and maintained like code
  • Docker images are customizable; an image consists of layers based on other images, or it can be created from scratch with a Dockerfile

Containers are disposable in the context of orchestration or a CI/CD process. For instance, containers used for the automated test and validation phase in a test environment can be built and pushed to the production environment in the deployment pipeline. This is valid because testing and validation passed in that test container’s environment. The consistent platform and isolated environment make Docker a good fit for application development, streamlining the lifecycle of both infrastructure and applications. In this article we treat a Docker container as a portable environment that is reproducible anywhere, such as a local PC, remote servers, or client environments. As a prerequisite, the code (Dockerfile) or the image should be stored in a repository or registry beforehand.

Create Dockerfile

The Dockerfile is based on the official Anaconda3 Dockerfile with a bootstrapped DiCE installation. If you’re not interested in creating the Dockerfile yourself, you can just run these commands to build an image locally and start a container with the example notebooks. Please note that the default port for Jupyter Notebook is set to 8888 in this image. [7]

$ git clone https://github.com/yuyasugano/dice-test
$ cd dice-test
$ docker build -t dice .
$ docker run -p 3000:8888 -v ${PWD}/notebooks:/opt/notebooks dice
DiCE container image based on ContinuumIO/docker-images

The image invokes the docker-entrypoint.sh entrypoint shell script to run Jupyter Notebook, as shown in the snippet below. You can change the port number with the --port= option.

DiCE docker-entrypoint.sh

You should see in the console that the dice-ml library is installed successfully when running the docker build -t dice . command.

Successfully built dice-ml
Installing collected packages: numpy, dice-ml
Found existing installation: numpy 1.16.4
Uninstalling numpy-1.16.4:
Successfully uninstalled numpy-1.16.4
Successfully installed dice-ml-0.2 numpy-1.16.0
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

conda list in the container shows the library versions below as of this writing.

numpy                     1.16.0                   pypi_0    pypi
scikit-learn              0.21.2           py37hd81dba3_0
scikit-image              0.15.0           py37he6710b0_0
pandas                    0.24.2           py37he6710b0_0
h5py                      2.9.0            py37h7918eee_0
tensorboard               2.1.1                    pypi_0    pypi
tensorflow                2.1.0                    pypi_0    pypi
tensorflow-estimator      2.1.0                    pypi_0    pypi

If you’d like to use these fixed versions of the required libraries, don’t build the image yourself; pull the prebuilt image from DockerHub instead. [8]

Jupyter Notebook should now be accessible locally or on a remote server at the specified port. At minimum, the three example notebooks should be available.

DiCE Notebook examples

How could a “Mr” survive the sinking of the Titanic?

The Titanic accident happened in 1912. The RMS Titanic sank after colliding with an iceberg in the North Atlantic Ocean, four days into the ship’s maiden voyage from Southampton to New York City. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the deaths of 1,502 of an estimated 2,224 passengers and crew. It’s interesting to apply DiCE to this “Titanic” train set and verify what sorts of people were more likely to survive the sinking ship. We use only train.csv of the dataset because it has the survived column. [9]

You can sign up for Kaggle to get the train.csv data yourself.

As you might already know, the survival rate for the title Mr is relatively low compared to other titles such as Mrs and Miss. So my question here is what conditions could have saved one Mr, with “what if” hypothetical examples you could consider, especially if you were a man (for instance, if you were the first-row person Braund, Mr. Owen Harris, a 22-year-old man in pclass 3 with 1 sibling). See the raw CSV data in dataframe format below.

Titanic train.csv with survived column
Survival rates for each title

With typical feature engineering we can cope with the missing values and add a “family_size” column from the siblings and parents information, as sketched below. After training a model with Keras, we give the trained model to DiCE’s model object as shown afterwards. Then we’re ready to do DiCEing, with data object d and model object m, to instantiate a DiCE class for generating counterfactual explanations.
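Here is a minimal sketch of that preprocessing, assuming a pandas DataFrame df with lowercased train.csv column names; the exact imputation in the notebook may differ.

# fill missing values and derive family_size from siblings/spouses and parents/children
df['age'] = df['age'].fillna(df['age'].median())
df['embarked'] = df['embarked'].fillna(df['embarked'].mode()[0])
df['family_size'] = df['sibsp'] + df['parch'] + 1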

# provide the trained ML model to DiCE's model object
backend = 'TF'+tf.__version__[0]
m = dice_ml.Model(model=ann_model, backend=backend)
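For completeness, here is a sketch of the data object d and the DiCE class instantiation that pair with the model object above; the continuous feature list and outcome name are assumptions based on this dataset.

# provide the preprocessed dataframe to DiCE's data object
d = dice_ml.Data(dataframe=df,
                 continuous_features=['age', 'fare', 'family_size'],
                 outcome_name='survived')
# instantiate DiCE with the data and model objects
exp = dice_ml.Dice(d, m)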

Please note that you need to give the backend argument according to the type of model library: TensorFlow 1.x, TensorFlow 2.x, or PyTorch respectively. [10]

The variable backend below indicates the implementation type of DiCE we want to use. We use TensorFlow 1.x in the notebooks with backend=’TF1'. You can set backend to ‘TF2’ or ‘PYT’ to use DiCE with TensorFlow 2.x or with PyTorch respectively. We want to note that the time required to find counterfactuals with Tensorflow 2.x’s eager style of execution is significantly greater than that with TensorFlow 1.x’s graph execution.

It’s quite straightforward to instantiate the DiCE class and generate counterfactual samples; a sketch of the call follows below. In this random result, the first and second examples remained male with the same age as the sample person, “Braund, Mr. Owen Harris”, himself. The third case was contradictory, with a misleading combination of sex and title: it shows that if his title had been Master, or if he had had 8 siblings, he might have survived the sinking of the ship. Rather than changing the title or the number of siblings/parents in the counterfactual set, let’s look for more similar alternative profiles of him by using the features_to_vary argument, because the number of Master samples was relatively low and having 8 siblings looks too extreme to treat those results as useful hypotheses.
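As a sketch (not the exact notebook code), the default generation for the first-row passenger could look like this; the query_instance values mirror Mr. Braund’s row, and the feature names and encodings are assumptions about the notebook.

# query instance for Braund, Mr. Owen Harris (22-year-old male, pclass 3, 1 sibling)
query_instance = {'pclass': 3, 'sex': 'male', 'age': 22, 'fare': 7.25,
                  'embarked': 'S', 'title': 'Mr', 'family_size': 2}
# generate 4 counterfactuals that flip the prediction to "survived"
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4,
                                        desired_class="opposite")
dice_exp.visualize_as_dataframe()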

Generated counterfactual set

I changed proximity_weight to 1.5 from the default value of 0.5 and restricted features_to_vary to only 4 features (pclass, age, fare, embarked), as sketched below. As a result, the counterfactual examples leave sex, family-related features, and title unchanged. This optimization seems to work to some extent. The third example shows that if he had paid a higher fare of 14.01 (which is close to the pclass 2 median) and had embarked at Cherbourg instead of Southampton, he might have been saved. The remaining three cases show that a baby or child was, unsurprisingly, a likely rescuee under such circumstances.
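The call for this restricted run might look like the following, reusing the same query_instance; again, a sketch rather than the exact notebook code.

# raise proximity_weight and restrict the features DiCE may perturb
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4,
                                        desired_class="opposite",
                                        proximity_weight=1.5,
                                        features_to_vary=['pclass', 'age', 'fare', 'embarked'])
dice_exp.visualize_as_dataframe()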

Optimized counterfactual set

Here’s the notebook of this Titanic DiCE counterfactual set implementation.

https://github.com/yuyasugano/dice-test
