AI Explainability — Explained

Artificial Intelligence is often referred to as “the next big thing”, but buzzwords like “Machine Learning” say little about what these systems actually do, or why they make the predictions they make. This blog post looks at one important part of that picture, model interpretability (often called AI explainability), in simple terms that everyone can understand.

Machine learning (ML) is powerful. As models have grown more capable and more widely used over the last few years, their interpretability has received increasing attention. With the right data, a model can predict new data extremely well while offering little or no insight into how it does so, and that insight matters for many reasons.

Model interpretability allows us to address some of our most fundamental questions about the predictions that a model makes: What features did you learn? Why did you make this prediction? What are your assumptions? What do your results tell us about the world, and what conclusions can we draw from them? Why is this solution better than another?

Interpreting Machine Learning Models: “Black Box” Methods

How does interpretability work in practice, especially for deep neural networks (DNNs)? There are several categories of interpretability methods, each with its own trade-offs. We begin with the “black box” approaches: methods that explain a model’s behavior purely from its inputs and outputs, without looking inside it.

Model-agnostic interpretability methods

The term “model-agnostic interpretability” refers to approaches that are not tied to a particular algorithm or network architecture. It is an umbrella under which several interpretability techniques fall, including Local Interpretable Model-Agnostic Explanations (LIME) and partial dependence plots (PDPs).

Problem Definition: How interpretable are our models?

1. Which model-agnostic methods can provide insights into a model regardless of the algorithm it uses?

2. How do we interpret the results of methods used to evaluate interpretability (for example, an ROC curve measuring how faithfully a surrogate reproduces the original model)?

Interpretability tools generally explain a model’s predictions in terms of its input features. The trouble is that many problems do not come with human-meaningful features. Consider an image recognition problem: individual pixels carry no semantic meaning, and with thousands of classes it is impractical to hand-craft descriptive attributes for each one. In such settings, interpretability is only possible after constructing higher-level, interpretable features, which is often quite expensive.

Different Methods of Interpretability

Two of the most common methods are local interpretable model-agnostic explanations (LIME) and partial dependence plots (PDPs). Both are model-agnostic techniques that work with any model, but they differ in the kinds of insight they provide.

Method 1: Partial Dependence Plots (PDP)

The idea behind PDPs is relatively simple: we want to know how the model’s prediction changes, on average, as one feature (or a small set of features) is varied. To compute it, we sweep the feature of interest over a grid of values and, at each value, average the model’s predictions over the rest of the dataset. In this way, PDPs show which variables matter for the model’s predictions, and where and by how much they matter.
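Here is a minimal sketch of that computation, written by hand so the averaging step is explicit. The synthetic dataset and gradient-boosted model are illustrative stand-ins; scikit-learn’s sklearn.inspection module offers the same functionality ready-made.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data and a gradient-boosted model as a stand-in black box.
X, y = make_friedman1(n_samples=1000, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Partial dependence of the prediction on feature 0, computed by hand:
# sweep feature 0 over a grid and average predictions over the rest of the data.
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
for value in grid:
    X_mod = X.copy()
    X_mod[:, 0] = value  # force feature 0 to this value for every row
    print(f"feature_0 = {value:.2f} -> average prediction = {model.predict(X_mod).mean():.3f}")
```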


Method 2: Individual Conditional Expectation (ICE)

ICE is closely related to PDP: instead of averaging a feature’s effect over the whole dataset, ICE draws one curve per instance, showing how that particular instance’s prediction changes as the feature is varied (the PDP is simply the average of these curves). It works directly on the black-box model, so no interpretable re-implementation is required, and because each instance gets its own curve, different observations can receive visibly different explanations, which helps reveal interactions that a PDP averages away.
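A minimal sketch of the idea, reusing the same illustrative setup as the PDP example (scikit-learn’s PartialDependenceDisplay with kind="individual" will draw these curves for you):

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# ICE: for a few individual rows, sweep feature 0 over a grid and record how
# that particular row's prediction changes (one curve per instance).
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 10)
for i in (0, 1, 2):                        # three example instances
    rows = np.tile(X[i], (len(grid), 1))   # copies of instance i
    rows[:, 0] = grid                      # vary only feature 0
    print(f"instance {i}:", np.round(model.predict(rows), 3))

# Averaging these curves over the whole dataset recovers the PDP.
```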

Method 3: Permutation Feature Importance

Permutation feature importance randomly shuffles the values of one feature at a time across the dataset, breaking that feature’s relationship with the target, and then re-computes the model’s loss (or score) on the shuffled data. The larger the degradation, the more the model relies on that feature. This is a quick way to separate the features the model actually uses from those it effectively ignores.
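A short sketch using scikit-learn’s permutation_importance helper on an illustrative model and synthetic dataset:

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and measure how much the
# model's score drops; a large drop means the model relies on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in range(X.shape[1]):
    print(f"feature {i}: importance = {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```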

Method 4: Global Surrogate

Global surrogate models (GS) take the opposite approach to LIME and ICE, which interpret one new input at a time: a single interpretable model, such as a linear model or a shallow decision tree, is trained to reproduce the black-box model’s predictions across the entire training set. How faithfully the surrogate tracks the black box can be measured by how well it predicts the black box’s outputs, for example with an R² score for regression, or an ROC curve and its area under the curve (AUC) for classification. Because the surrogate covers the whole input space, it yields an explanation for every prediction the original model makes.

This is powerful because a linear surrogate decomposes each prediction into a weighted sum of features, just like the linear equations you might have learned in your high school math class. Crucially, the surrogate needs only the black box’s inputs and outputs, not its code or weights. That means we can interpret, and even improve, a black-box model by engineering new, more meaningful input features without having to modify the black box itself.

GS also requires relatively little engineering effort: we do not need a special, interpretable version of the underlying model, only query access to the black box (an xgboost model, say). The caveat is that the explanations are only as good as the surrogate’s fidelity, that is, how closely it actually reproduces the black box’s behavior.
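A sketch of the whole recipe, using a shallow decision tree as the surrogate and a gradient-boosted ensemble standing in for the black box (both choices are illustrative):

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = make_friedman1(n_samples=1000, n_features=5, random_state=0)

# The "black box": any strong model would do here (xgboost, a neural network, ...).
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)
black_box_preds = black_box.predict(X)

# The global surrogate: a shallow tree trained to mimic the black box's
# predictions (not the original labels) over the whole training set.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, black_box_preds)

# Fidelity: how well does the surrogate reproduce the black box?
fidelity = r2_score(black_box_preds, surrogate.predict(X))
print(f"surrogate fidelity (R^2 against black-box predictions): {fidelity:.3f}")
print(export_text(surrogate))  # the surrogate itself is human-readable
```

Swapping in a small linear model for the tree would give coefficient-style explanations instead of decision rules.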

Method 5: Local Interpretable Model-Agnostic Explanations (LIME)

LIME is an approach that explains each individual prediction of a black-box model using the local region around that example in feature space. It samples perturbed points near the input, asks the black box for its predictions on those points, and fits a small interpretable model (typically a sparse linear one) to that neighborhood. Because it needs only query access to the model, it can be used on top of essentially any model, including deep networks and gradient-boosted trees.
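The sketch below re-implements LIME’s core loop from scratch to show the moving parts; the kernel width, sample count, and data are illustrative choices, and the lime package (for example its LimeTabularExplainer) wraps the same procedure with sensible defaults.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge

X, y = make_friedman1(n_samples=1000, n_features=5, random_state=0)
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)

x0 = X[0]                                 # the prediction we want to explain
rng = np.random.default_rng(0)

# 1. Sample perturbed points in a neighborhood of x0.
neighbors = x0 + rng.normal(scale=0.3, size=(500, X.shape[1]))

# 2. Ask the black box what it predicts at those points.
preds = black_box.predict(neighbors)

# 3. Weight each sample by its proximity to x0 (an exponential kernel).
distances = np.linalg.norm(neighbors - x0, axis=1)
weights = np.exp(-(distances ** 2) / 0.5)

# 4. Fit a small weighted linear model to the neighborhood.
local_model = Ridge(alpha=1.0).fit(neighbors, preds, sample_weight=weights)
print("local feature weights:", np.round(local_model.coef_, 3))
```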

Method 6: Shapley Values (SHAP)

SHAP (SHapley Additive exPlanations) is similar to GS in that each prediction ends up with a simple additive explanation. Instead of fitting a surrogate, SHAP decomposes each output into per-feature contributions, Shapley values borrowed from cooperative game theory, that sum to the difference between the prediction and the average prediction. Because each feature’s contribution is averaged over many coalitions of the other features, the attributions take interactions between features into account.

SHAP produces one such explanation for every output of the black-box model, so every prediction from the original model can be explained, and the per-prediction attributions can be aggregated into a global picture of feature importance, all without modifying the black box.
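A short sketch, assuming the shap package is installed; TreeExplainer computes Shapley values efficiently for tree-based models, and the base value plus the per-feature attributions adds back up to the prediction.

```python
import shap  # assumes the shap package is installed (pip install shap)
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])  # one row of attributions per prediction

# Base value + per-feature contributions should add back up to the prediction.
print("attributions for the first instance:", shap_values[0].round(3))
print("base value + contributions:", float(explainer.expected_value + shap_values[0].sum()))
print("model prediction:           ", float(model.predict(X[:1])[0]))
```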

It helps to contrast these additive explanations with ICE, which needs no surrogate at all: it shows how each feature affects each individual prediction by querying the black box directly, and averaging its per-instance curves recovers the partial dependence plot. ICE therefore also provides one explanation per prediction, but in the form of curves rather than feature attributions.

Global Surrogate vs. Shapley: Which one is better?

Both produce interpretable output, but they trade off differently. A global surrogate is a single simple model, which makes it easy to read, but a linear or shallow surrogate may miss interactions between features and approximate the black box poorly. Shapley values do account for interactions, yet computing them exactly requires evaluating every possible coalition of features, and the number of coalitions grows exponentially (2^n). In practice this is made tractable by sampling coalitions (as KernelSHAP does) or by exploiting model structure (as TreeSHAP does for tree ensembles), rather than enumerating them all.
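To make the 2^n point concrete, here is a brute-force sketch that computes exact Shapley values by enumerating every coalition, using a background average to stand in for “removed” features (an illustrative choice; libraries avoid this enumeration):

```python
import math
from itertools import combinations

from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

x0, background = X[0], X.mean(axis=0)  # instance to explain, reference point
n = X.shape[1]

def value(subset):
    """Model output when the features in `subset` take their values from x0
    and every other feature is filled in from the background reference."""
    z = background.copy()
    z[list(subset)] = x0[list(subset)]
    return model.predict(z.reshape(1, -1))[0]

def shapley(i):
    """Exact Shapley value of feature i: a weighted average of its marginal
    contribution over all 2^(n-1) coalitions of the remaining features."""
    others = [j for j in range(n) if j != i]
    total = 0.0
    for size in range(len(others) + 1):
        for S in combinations(others, size):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            total += weight * (value(S + (i,)) - value(S))
    return total

print([round(shapley(i), 3) for i in range(n)])  # fine for n = 5, hopeless for n = 50
```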

GS and SHAP both produce informative explanations, but they can take more effort to read at first because they express each prediction as a combination of feature contributions (the GS surrogate at least is a single, simple model that can be inspected directly). Individual conditional expectation curves are easier to read, but they are less informative about the model’s behavior across its different predictions.

The Shapley value estimates how much each feature contributed to a particular prediction made by the black-box model, while taking interactions between features into account. Like the global surrogate approach, it gives each prediction a simple additive explanation, which arguably makes it more interpretable than a raw ICE curve. LIME (Local Interpretable Model-Agnostic Explanations), for its part, uses small interpretable models to explain individual predictions of a black-box model by approximating how the output was generated in the neighborhood of the input.

LIME can also be used to interpret each member of an ensemble separately. A practical difference from GS is scope: a global surrogate is fitted once over the whole training set and then explains every prediction, whereas LIME (like SHAP) constructs a fresh local explanation for each new input at explanation time, typically in an interpretable representation of that input (for example, the presence or absence of words for text, or superpixels for images).

Interpretability Evaluation Methods

There are two broad ways of evaluating interpretability:

  • human evaluation: asking people to judge whether the explanations produced actually make sense
  • functional evaluation: measuring how well the explanations predict the model’s behavior on a test set independent of the data used to build the model

The first option is better for judging interpretability itself; the second is better at demonstrating overall power, that is, how faithfully the explanations track the model. One way this shows up is that explanation fidelity tends to decrease as more data is used to train the black-box model and it grows more complex.

However, interpretability methods have limits of their own: they are often not powerful enough, because humans (and many of these methods) tend to reason about the marginal effect of one feature at a time, which can miss what is happening across combinations of features.

Wrapping Up

A critical part of utilizing machine learning for enterprise or consumer-facing applications is interpretability. The most interpretable models are those that build directly on well-founded, scalable forms of human inference.

Needless to say, interpretability has become a core consideration in the design and application of machine learning systems. And yet, even as interest in interpretability has grown, so too have concerns about what exactly interpretability means and how it can be measured. This article aimed to provide guidance on some key considerations around interpretability, based on experimentation with different approaches.

Originally published at https://www.cliently.com.
