Interpretable ML

Making Machine Learning Models Interpretable

Methods to interpret any Machine Learning model

Benedict Neo
bitgrit Data Science Publication

--

The real world is complex, with a flood of variables interacting with each other. This is why most machine learning models applied in the real world are made up of complex architectures compared to a simple model like linear regression.

Most of the time, depending on the data, more complex models have higher predictive power and are more accurate than simple models. As a result, deep learning models have been widely used in many applications due to their accuracy, such as self-driving cars, facial recognition, and stock price prediction.

However, the cost of having higher accuracy is lower interpretability. This means we don’t know why the model generated certain results.

Below is a graph that portrays the trade-off between interpretability and accuracy.

Accuracy and Interpretability trade-off (source)

You see neural networks way up on the accuracy axis, but they have the least interpretability.

This is why many call neural networks black-box models: we know the inputs and can see the outputs, but everything in between isn’t interpretable.

The dangers of black-box models

source

Machine learning models are increasingly ubiquitous, and their black-box nature poses some risks.

In some cases, where the model is applied in a low-risk environment, such as recommending movies to users, this isn’t a big issue, since the cost of an erroneous prediction is just a bad recommendation.

However, in high-risk environments (determining whether a person has cancer, should get a loan, or is a potential criminal, or even driving cars), a mistake can have huge consequences.

If an individual or an organization cannot explain the model’s conclusion, people won’t trust that system due to the lack of transparency.

If a machine learning model performs well, why don’t we just trust the model and ignore why it made a certain decision? “The problem is that a single metric, such as classification accuracy, is an incomplete description of most real-world tasks.” — Doshi-Velez and Kim 2017

Another danger of black-box models is that if they pick up biases or prejudices from the humans who designed them or from the data, they will produce unfair and prejudiced decisions.

One quintessential example of this is the issue of racial discrimination in face recognition technology.

The main idea is that the model should tell us not only the what but also the why.

This engendered a need for interpretability.

Interpretable machine learning

What is interpretability?

The paper “Towards A Rigorous Science of Interpretable Machine Learning” defines it as:

the ability to explain or to present in understandable terms to a human

Why interpretability matters

  • Fairness — An interpretable model can tell you why a decision was made, stating which factors were critical in making that decision, ultimately helping ensure unbiased predictions.
  • Trust — Compared to black-box models, it is easier for humans to trust a transparent model that explains its decisions.
  • Decision making — When insight is more important than a prediction, having interpretable models helps inform decision-making.
  • Debugging — Peeking into the inner workings and reasoning of the black box model allows engineers to identify whether the model is aligned with their expectations. If there exist any inconsistencies, they can fix them accordingly.

Now that you have an idea of why interpretability matters, how do we actually achieve it?

Two main ways to achieve interpretability

  • Using intrinsically interpretable models such as linear regression, GLMs, decision trees, etc.
  • Using model-agnostic methods, which are interpretation methods that can be applied to any trained model (i.e., post hoc).

In this article, we’ll take a brief look at a couple of model-agnostic methods that allow us to interpret machine learning models and an example of their visualizations.


1. Partial Dependence Plots (PDP)

PDPs are plots that show the marginal effect of one or two features on a model’s predicted outcome (the response variable).

That means taking one or two features in your dataset, controlling for the other features, and measuring how they affect the outcome.

PDP is handy when determining whether a relationship between a specific feature and a target is linear or complex.

Recipe

  1. Start with a feature
  2. For each value of the selected feature, set that value in every row of the dataset and make predictions.
  3. Average the predictions for each value of the feature, then plot those averages against the feature values.

For categorical features, things are a little different. Let’s say you have categories A, B, and C; partial dependence would mean substituting every value of the feature with A, calculating the average prediction, and then doing the same with B and C. The result would be three average predictions.
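To make the recipe concrete in code, here is a minimal sketch for a single numeric feature, assuming a fitted model `model` with a `predict` method and a pandas DataFrame `X` of features (the feature name "temperature" is a hypothetical placeholder):

import numpy as np
import matplotlib.pyplot as plt

def partial_dependence(model, X, feature, grid_size=20):
    # Grid of values spanning the observed range of the feature
    grid = np.linspace(X[feature].min(), X[feature].max(), grid_size)
    averaged = []
    for value in grid:
        X_mod = X.copy()
        X_mod[feature] = value                        # set the feature to this value in every row
        averaged.append(model.predict(X_mod).mean())  # average the predictions over all rows
    return grid, np.array(averaged)

# Usage: plot the average prediction against the feature values
# grid, avg_pred = partial_dependence(model, X, "temperature")
# plt.plot(grid, avg_pred); plt.xlabel("temperature"); plt.ylabel("average prediction"); plt.show()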

Let’s make this concrete with an example.

Example:

source

Here we have a dataset of the number of bike rentals on a given day. The plot above shows how three weather features affect the predicted number of bike rentals and gives us useful information.

There’s a huge increase in bike rentals as the temperature approaches roughly 15 degrees Celsius, and the number then plateaus as the weather gets hotter. This tells us that bike rentals are higher on average in good weather (not too warm and not too cold).

We also understand that as humidity exceeds around 62.5% and as wind speed increases, the average predicted number of bike rentals falls.

An interesting observation is that the predicted number stays constant beyond a certain wind speed, even though we would expect higher wind speeds to mean fewer bike rentals.

We’ve only touched on partial dependence for a single feature. What happens when we use two features?

source

Here’s a different dataset containing two features, weight and height. A two-dimensional PDP gives us information about how the two features interact to affect predictions.

An alternative to PDP is Accumulated Local Effects (ALE) plots, which are faster and unbiased. Read more about them here.

Code
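The original snippet isn’t reproduced here, but scikit-learn ships a ready-made implementation. A minimal sketch, assuming a fitted estimator `model` and a feature DataFrame `X` whose columns include the hypothetical names "temp", "humidity", and "windspeed":

import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# One-dimensional PDPs for three features, plus a two-feature interaction plot
PartialDependenceDisplay.from_estimator(
    model,                     # any fitted scikit-learn estimator
    X,                         # the data the partial dependence is averaged over
    features=["temp", "humidity", "windspeed", ("temp", "humidity")],
    kind="average",            # "average" is the classic PDP
)
plt.show()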

2. Individual conditional expectation (ICE) plot

ICE plots are similar to PDPs, but what separates the two is that ICE plots not only show the average effect of the feature of interest, they also visualize the dependence of the prediction on that feature for each sample separately.

That means the plot contains one curve per instance, showing how that instance’s prediction changes as the feature changes, instead of only the overall effect shown by the PDP.

Another difference is that ICE plots support only one feature at a time.

What is the benefit of the ICE plot, you ask? It can help us catch heterogeneous effects created by feature interactions. That basically means we can tell whether individual instances behave differently from the overall average.

This is obscured in a PDP since it takes an average. Thus, ICE plots provide us with more insight into the relationship between the feature and the target.

Example

source

Above is an example of ICE plots. The thick blue line shows the average (the PDP), and the thinner lines are the individual instances.

To showcase why ICE plots may be useful, let’s take a look at the average occupancy (AveOccup) plot. On average, we see a negative linear relationship between this feature and the house price. However, looking at some of the ICE curves, the effect is much smaller for individual instances, and some lines even show the house price staying constant.

Another, more obvious example is the HouseAge plot. The average PDP shows house age having a weak influence on house price, but the ICE curves reveal a positive relationship for some instances around the 20-year mark.

Thus, ICE plots show us the individual effects of different feature values, effects that taking the average would otherwise diminish.

Code
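The original snippet isn’t reproduced here either, but the same scikit-learn helper draws ICE curves. A minimal sketch on the California housing data (the dataset behind the plot above), with a gradient boosting model chosen purely for illustration:

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="both" overlays the individual ICE curves with the averaged PDP line
PartialDependenceDisplay.from_estimator(
    model,
    X,
    features=["AveOccup", "HouseAge"],
    kind="both",               # "individual" would draw only the ICE curves
    subsample=100,             # plot a random subset of curves to keep the figure readable
    random_state=0,
)
plt.show()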

3. Permutation Feature Importance

Permutation feature importance measures the increase in the model’s prediction error after shuffling a feature’s values, repeating the process for each feature in turn.

The word permutation means rearrangement, which in our case applies to the rearrangement of a column’s values.

Recipe

  1. Shuffle values in a specific column, and make predictions.
  2. Use the predictions to calculate the evaluation metric. A large drop in performance signals an important variable, and vice versa.
  3. Restore the column to its original order, move on to the next column, and repeat steps 1 and 2 until you have the importance of every column.
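A minimal sketch of this recipe, assuming a fitted classifier `model`, a validation DataFrame `X_val` with labels `y_val`, and accuracy as the evaluation metric (all hypothetical names):

import numpy as np
from sklearn.metrics import accuracy_score

def manual_permutation_importance(model, X_val, y_val, n_repeats=5, random_state=0):
    rng = np.random.default_rng(random_state)
    baseline = accuracy_score(y_val, model.predict(X_val))     # score with nothing shuffled
    importances = {}
    for col in X_val.columns:
        drops = []
        for _ in range(n_repeats):
            X_shuffled = X_val.copy()
            # Shuffle just this column, leaving all the others untouched
            X_shuffled[col] = rng.permutation(X_shuffled[col].values)
            score = accuracy_score(y_val, model.predict(X_shuffled))
            drops.append(baseline - score)                     # bigger drop = more important
        importances[col] = (np.mean(drops), np.std(drops))     # mean drop ± variation
    return importances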

Example:

source

Above is a sample output from performing permutation feature importance on a football player dataset. The goal is to determine which features are most important for predicting the quality of a player.

The weights here tell us the drop in accuracy caused by randomly shuffling each feature, and the features are ordered from most important at the top to least important at the bottom.

You might be wondering why there’s a ± sign for each of them. This is because the shuffling process for each column is repeated to produce a more accurate and less biased estimate of the weight. Since each shuffle won’t produce exactly the same value, the number after the ± shows the variation from one shuffle to the next.

That said, for this dataset, the feature Reaction was the most important feature, which makes sense in a fast-paced sport like football.

Code
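The library behind the output above isn’t shown, but scikit-learn’s permutation_importance produces the same kind of “weight ± spread” numbers. A minimal sketch, again assuming a fitted model `model` and a validation DataFrame `X_val` with labels `y_val`:

from sklearn.inspection import permutation_importance

result = permutation_importance(
    model, X_val, y_val,
    n_repeats=10,              # repeat the shuffling to estimate the ± variation
    random_state=0,
)

# Print the features from most to least important
for i in result.importances_mean.argsort()[::-1]:
    print(f"{X_val.columns[i]:<30} {result.importances_mean[i]:.4f} ± {result.importances_std[i]:.4f}")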

4. Local interpretable model-agnostic explanations (LIME)

LIME is a technique that trains local surrogate models around the predictions of a black-box model.

What are local surrogate models? They are interpretable models, e.g., linear regression or decision trees, that can explain the predictions of a black-box model.

This means you can use models like decision trees as explanations without actually having to use them to make predictions.

Compared to the previous methods, which are global interpretations, LIME provides local interpretability for a single prediction. This means you zoom in on a single instance, examine what the model predicts for this particular input, and explain why.

Recipe (source)

  • Select your instance of interest for which you want to have an explanation of its black box prediction.
  • Perturb your dataset and get the black box predictions for these new points.
  • Weight the new samples according to their proximity to the instance of interest.
  • Train a weighted, interpretable model on the dataset with the variations.
  • Explain the prediction by interpreting the local model.
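A heavily simplified sketch of this recipe for tabular data, assuming a fitted black-box classifier `model` with a `predict_proba` method, a NumPy feature matrix `X`, and a single row `x0` to explain (all hypothetical names; the real LIME library handles categorical features, sampling, and feature selection far more carefully):

import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(model, X, x0, n_samples=5000, kernel_width=0.75, random_state=0):
    rng = np.random.default_rng(random_state)
    scale = X.std(axis=0) + 1e-12                        # feature scales for sampling and distances
    # 1. Perturb the data: sample new points around the instance of interest
    Z = x0 + rng.normal(size=(n_samples, X.shape[1])) * scale
    # 2. Get the black-box predictions for these new points
    y_z = model.predict_proba(Z)[:, 1]
    # 3. Weight the samples by their proximity to x0 (exponential kernel on scaled distance)
    dist = np.sqrt((((Z - x0) / scale) ** 2).sum(axis=1))
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Train a weighted, interpretable (linear) model on the perturbed data
    surrogate = Ridge(alpha=1.0).fit(Z, y_z, sample_weight=weights)
    # 5. Its coefficients are the local explanation of this one prediction
    return surrogate.coef_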

Example

source

The dataset used is a mushroom dataset, and the task is to predict if a mushroom is edible or poisonous based on categorical features.

For a particular row in the dataset, performing LIME will give us something like the above visualization.

It tells us that the model is 100% confident the mushroom is poisonous, that the features odor, stalk-surface-above-ring, spore-print-color, and stalk-surface-below-ring (with their particular values) increase the chance of the mushroom being classified as poisonous, and that gill-size is the only feature decreasing it.

This was LIME for tabular data. LIME also works for text and images, where it perturbs words and image regions (superpixels) instead of tabular feature values.

Read the paper that introduces the concept of LIME — “Why Should I Trust You?” Explaining the Predictions of Any Classifier by M. T. Ribeiro, S. Singh, and C. Guestrin, SIGKDD 2016.

Code
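The original snippet isn’t reproduced here, but the lime package covers tabular data out of the box. A minimal sketch, assuming a fitted classifier `model`, training and validation arrays `X_train` and `X_val`, and lists `feature_names` and `class_names`; all of these names are placeholders:

from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train,                          # training data used to generate perturbations
    feature_names=feature_names,
    class_names=class_names,          # e.g. ["edible", "poisonous"]
    mode="classification",
)

# Explain a single row of the validation data
exp = explainer.explain_instance(
    X_val[0],                         # the instance of interest
    model.predict_proba,              # the black-box prediction function
    num_features=6,                   # show the six most influential features
)
exp.show_in_notebook()                # or exp.as_list() outside a notebook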

5. Shapley Additive Explanation (SHAP)

SHAP and LIME are both surrogate-based explanation methods, but the difference lies in how they work.

SHAP is a game-theoretic approach that uses Shapley values, which measure each feature’s contribution to a prediction.

How it works

To understand how SHAP works, you should definitely read up about Shapley values first.

But the main idea is to compute the marginal contribution of a single feature across all possible orderings of the features and then average them. In other words, we calculate the contribution each feature makes to the prediction using Shapley values.

Shapley values come from coalitional game theory, where a game has players and a payout. When applied to ML, the players are the features (or groups of features), and the payout, the prediction, is fairly distributed among them through their Shapley values.

Notice the word additive in SHAP; this comes from the property of Shapley values, where they always sum up to the difference between the game outcome when all players are present and when no players are present. In ML context, this means the SHAP values of all input features will always sum up to the difference between baseline (expected model output) and the actual model output.
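To make the “average over all orderings” idea and the additive property concrete, here is a toy brute-force sketch, only feasible for a handful of features. It treats an “absent” player as a feature replaced by a baseline value such as the column mean; `model`, `x`, and `baseline` are hypothetical NumPy-array names, and real SHAP implementations estimate this far more efficiently:

import numpy as np
from math import factorial
from itertools import permutations

def brute_force_shapley(model, x, baseline):
    n = len(x)
    phi = np.zeros(n)

    def value(present):
        z = baseline.copy()
        idx = list(present)
        z[idx] = x[idx]                                   # "present" players keep their real values
        return model.predict(z.reshape(1, -1))[0]

    for order in permutations(range(n)):                  # every possible ordering of the players
        present = []
        for feature in order:
            before = value(present)
            present.append(feature)
            after = value(present)
            phi[feature] += after - before                # marginal contribution in this ordering
    return phi / factorial(n)                             # average over all orderings

# Additive property: the values sum to f(x) minus the prediction at the baseline
# phi = brute_force_shapley(model, x, baseline)
# assert np.isclose(phi.sum(),
#                   model.predict(x.reshape(1, -1))[0] - model.predict(baseline.reshape(1, -1))[0])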

There are more technical details to it, but let’s move on to an example, so you get an idea of what it does.

Example

source

The above shows a sample visualization with SHAP.

f(x) is our prediction for a specific observation, and E[f(x)] is our base value, which is the average of all predictions over the training data.

Each feature’s contribution is colored blue or red, corresponding to a negative or positive SHAP value, respectively. Negative and positive values are forces that decrease or increase the prediction, and their size reflects the magnitude of the effect.

Together, the features push the prediction from the baseline towards the final model output.

Looking at the values, it’s evident that the feature LSTAT, with a value of 4.98, has the biggest impact. Its SHAP value is positive, meaning it pushes the prediction above the baseline.

On the other hand, the feature RM has a negative effect on the prediction, pushing it towards the left.

SHAP isn’t limited to explanations on an individual level. It can also combine the explanations into global interpretations to provide feature dependence, feature importance, summary plots, etc.

source

Code
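The original snippet isn’t reproduced here, but the shap package produces plots like the ones above. A minimal sketch, assuming a tree-based regressor `model` trained on a housing DataFrame `X` with columns such as LSTAT and RM (a hypothetical setup echoing the example):

import shap

# shap.Explainer picks an appropriate algorithm (e.g. TreeExplainer for tree ensembles)
explainer = shap.Explainer(model, X)
shap_values = explainer(X)              # one set of SHAP values per row of X

# Local explanation: waterfall plot for a single prediction, like the figure above
shap.plots.waterfall(shap_values[0])

# Global view: beeswarm summary plot built from all the local explanations
shap.plots.beeswarm(shap_values)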

You now have the tools to open up the black boxes that are the complex machine learning models powering many operations throughout the world.

This article barely scratched the surface of these methods. If you want a technical dive into how these methods work behind the scenes, I highly recommend reading Chapter 5 of Interpretable Machine Learning.

Thanks for reading!

