Interpretable Machine Learning: Part II

Akshita Sukhlecha
Walmart Global Tech Blog
7 min read · Jan 15, 2019

Interpretable Machine Learning: Part I discussed the need for explanations in the machine learning domain and the white-box explanation approach towards it. This post covers the black-box and out-of-box explanation approaches to ML-interpretability.

Black-Box Explanations

Understanding complex machine learning models has become cardinal to their adoption in diverse and critical fields. The black-box approach to ML-interpretability aims to provide model-agnostic explanations, treating the model internals as a hidden black box.

1. Such an explanation system is flexible and can be applied to any model, existing as well as future ones. This saves the effort of building model-specific (white-box) explanation systems every time a new model is adopted.

2. It also allows easy comparison of the behaviors of different models.

The explanations could be either instance-specific, explaining the prediction for a particular instance, or global, highlighting the broad model behavior.

Partial dependence plots

One of the basic ways to understand a model globally is to plot partial dependence plots (PDPs) for individual features. A PDP shows the marginal effect of a feature on the model's predictions: it is obtained by plotting the average prediction across the data as the feature of interest is varied over a range of values. However, these plots do not capture the effect of feature interactions.

Explaining a survival-prediction model on Titanic data: PDP for features age & fare
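As a minimal sketch of the idea (assuming a fitted classifier `model` with a `predict_proba` method and a pandas DataFrame `X` of features, both hypothetical placeholders), a partial dependence curve can be computed by fixing the feature at each grid value for every row and averaging the predictions:

```python
import numpy as np

def partial_dependence(model, X, feature, grid_points=20):
    """Average predicted probability as `feature` is swept over a grid of values."""
    grid = np.linspace(X[feature].min(), X[feature].max(), grid_points)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[feature] = value  # hold the feature fixed at this value for every row
        averages.append(model.predict_proba(X_mod)[:, 1].mean())
    return grid, np.array(averages)

# Hypothetical usage for a Titanic-style survival model:
# grid, pdp = partial_dependence(clf, X_train, "age")
```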

Shapley value feature contribution

The Shapley value concept from coalitional game theory can be used to calculate the contributions of individual features to a particular prediction. In a cooperative game, Shapley values give a fair distribution of the game's payoff/reward among the players of the team.

To compute the contribution of feature ‘i’ to the prediction for an instance ‘x’ comprising N features in total:
Let i = 2; N = 5
S: an ordered permutation of the features = (4, 1, 2, 3, 5)

  • p(S_till_i_exc): the prediction probability for x when only the features before ‘i’ (excluding ‘i’) in the permutation are known, i.e. {4, 1}
  • p(S_till_i_inc): the prediction probability when only the features up to ‘i’ (inclusive) in the permutation are known, i.e. {4, 1, 2}

Marginal contribution of knowing the value of feature ‘i’ for this permutation S = p(S_till_i_inc) − p(S_till_i_exc)

The contribution of feature ‘i’ to the total prediction = the average of the marginal contributions of feature ‘i’ over all N! possible feature permutations.

Feature contribution in prediction
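Averaging over all N! permutations is infeasible beyond a handful of features, so in practice the contribution is estimated by sampling random permutations. The snippet below is a simplified Monte-Carlo sketch of the procedure described above; `model`, the instance `x` (a NumPy array) and `background` (a reference dataset whose rows stand in for ‘unknown’ feature values) are assumed placeholders, not part of any particular library.

```python
import numpy as np

def shapley_contribution(model, x, background, feature_i, n_samples=200, seed=0):
    """Monte-Carlo estimate of feature i's contribution to the prediction for x."""
    rng = np.random.default_rng(seed)
    contributions = []
    for _ in range(n_samples):
        order = rng.permutation(len(x))            # a random feature permutation S
        pos = int(np.where(order == feature_i)[0][0])
        known_exc = order[:pos]                    # features before i in S (excluding i)
        known_inc = order[:pos + 1]                # features up to i (including i)

        z = background[rng.integers(len(background))]  # random row fills in unknown features
        x_exc, x_inc = z.copy(), z.copy()
        x_exc[known_exc] = x[known_exc]
        x_inc[known_inc] = x[known_inc]

        p_exc = model.predict_proba(x_exc.reshape(1, -1))[0, 1]
        p_inc = model.predict_proba(x_inc.reshape(1, -1))[0, 1]
        contributions.append(p_inc - p_exc)        # marginal contribution for this permutation
    return float(np.mean(contributions))
```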

Another interesting way to understand a black-box model is to fit an interpretable proxy model (like a logistic regression or a decision tree) on the predictions of the model, either locally or globally. While the main model provides the exact predictions, the proxy model provides corresponding approximate explanations & helps in understanding the main model.
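A minimal sketch of a global surrogate, assuming a fitted black-box classifier `black_box` and a feature DataFrame `X` (both hypothetical placeholders): a shallow decision tree is trained to mimic the black-box predictions, and its fidelity to the black box is checked before its structure is read as an explanation.

```python
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

# The surrogate is trained on the black-box predictions, not on the true labels.
y_bb = black_box.predict(X)

surrogate = DecisionTreeClassifier(max_depth=3)   # kept shallow to stay interpretable
surrogate.fit(X, y_bb)

# Fidelity: how closely the proxy mimics the black box on this data.
fidelity = accuracy_score(y_bb, surrogate.predict(X))
print(f"Surrogate fidelity: {fidelity:.2f}")
print(export_text(surrogate, feature_names=list(X.columns)))
```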

LIME

Building on the interpretable-proxy-model approach, Local Interpretable Model-agnostic Explanations (LIME):

  • Explains a model in terms of interpretable data representations, as opposed to the raw instance data. For example, explaining an image classification in terms of the presence & absence of different image patches.
Explaining image classification for the top 3 classes predicted by the Inception network. Source
  • Learns an interpretable model (like a linear model) in the neighborhood of the instance. Such a proxy model is locally faithful to the prediction model.

For example, in the following figure, the complex classification function gives a positive output over the pink-colored region of the feature space and a negative output elsewhere. To explain the instance marked by the bold red cross, LIME learns a linear model, represented by the dashed line, over the predictions for the neighboring instances.

LIME: Local linear model. Source
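The core idea can be sketched without the LIME library itself: perturb the instance, weight the perturbed samples by their proximity to it, and fit a weighted linear model on the black-box predictions. The simplified tabular version below assumes a black-box `model`, an instance `x` and per-feature standard deviations `feature_std` (all placeholders); the actual LIME package additionally maps the data to interpretable binary representations and performs feature selection.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_sketch(model, x, feature_std, n_samples=1000, kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate around the instance x."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise scaled per feature.
    Z = x + rng.normal(scale=feature_std, size=(n_samples, len(x)))
    # 2. Query the black box on the perturbed neighborhood.
    preds = model.predict_proba(Z)[:, 1]
    # 3. Weight each sample by an exponential kernel on its distance to x.
    dist = np.linalg.norm((Z - x) / feature_std, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. The coefficients of the weighted linear fit are the local explanation.
    local = Ridge(alpha=1.0)
    local.fit(Z, preds, sample_weight=weights)
    return local.coef_
```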

Another route towards explaining black-box models is to highlight the influential training points, or the training points most similar to the instance being explained.

Out-of-box Explanations

The out-of-box interpretability approach aims at developing inherently interpretable models. Most of these approaches are motivated by basic interpretable models: decision lists, decision trees, linear models, naïve Bayes, etc.
Linear models provide explanations in the form of monotonic relationships between features and the prediction. Feature interactions can also be used as predictors to improve model performance. Models like decision trees and decision lists provide explanations in the form of feature subspaces mapped to prediction values.

Interpretable Decision Sets

A decision set is a rule-based classifier comprising a collection of independent if-then rules. Being free of any imposed structure, it has more natural and interpretable decision boundaries than a decision list or a decision tree.

An example decision set. Source

An interpretable decision set can be learned by selecting an optimal set of rules from a pool of candidate rules, with the aim of maximizing both accuracy and interpretability. The candidate rules can be extracted from the training data using association rule mining. The objective function for this combinatorial optimization problem combines an interpretability term and an accuracy term:

f-interpretability comprises several interpretability objectives: penalizing the number of rules in the decision set, penalizing the length of each rule, penalizing overlaps in rule applicability over the training data, and rewarding the number of classes covered by the rules.
f-accuracy penalizes incorrect predictions and rewards correct class predictions over the training data.
Since the objective function is non-negative, non-monotone & submodular, smooth local search can be used to find an approximately optimal set.
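To make the competing objectives concrete, here is a toy scoring function along the lines described above. It is a simplified stand-in for the full objective of the Interpretable Decision Sets paper, assuming each rule is represented as a (predicate, class, length) triple where the predicate is a boolean function of a data row; the weights are arbitrary placeholders.

```python
def score_decision_set(rules, rows, labels, weights=(1.0, 0.1, 0.1, 0.5, 1.0)):
    """Toy interpretability + accuracy score for a set of (predicate, class, length) rules."""
    w_rules, w_length, w_overlap, w_cover, w_acc = weights

    # Interpretability: fewer rules, shorter rules, less overlap, more classes covered.
    total_length = sum(length for _, _, length in rules)
    fires = [[pred(row) for pred, _, _ in rules] for row in rows]
    overlap = sum(max(0, sum(flags) - 1) for flags in fires)  # points matched by >1 rule
    classes_covered = len({cls for _, cls, _ in rules})

    # Accuracy: reward a correct prediction by the first rule that applies to each point.
    correct = 0
    for label, flags in zip(labels, fires):
        fired = [cls for (_, cls, _), f in zip(rules, flags) if f]
        if fired and fired[0] == label:
            correct += 1

    return (-w_rules * len(rules) - w_length * total_length - w_overlap * overlap
            + w_cover * classes_covered + w_acc * correct)
```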

RuleFit

In order to combine the advantages of rules and linear models, a sparse linear combination of rules can be learned and used for prediction.

For instance, r1 = (age < 50 & gender = M) ; r2 = (BMI > 0.2)

The candidate rules can be generated by first learning an ensemble of trees on the data, and then extracting rules from the nodes of these trees. The rule complexity can be limited by restricting the size of the trees.
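As an illustration of this rule-generation step (assuming numeric features in a DataFrame `X` and labels `y`, both placeholders), the sketch below fits a small gradient-boosted ensemble and walks each tree, emitting the feature conditions along every root-to-node path as a candidate rule:

```python
from sklearn.ensemble import GradientBoostingClassifier

def extract_rules(tree, feature_names):
    """Enumerate the root-to-node condition lists ('rules') of a fitted sklearn tree."""
    t = tree.tree_
    rules = []

    def recurse(node, conditions):
        if conditions:                                # every non-root node yields a rule
            rules.append(" & ".join(conditions))
        if t.children_left[node] == -1:               # leaf: no further splits
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        recurse(t.children_left[node], conditions + [f"{name} <= {thr:.3g}"])
        recurse(t.children_right[node], conditions + [f"{name} > {thr:.3g}"])

    recurse(0, [])
    return rules

# Small, shallow trees keep the extracted rules short.
ensemble = GradientBoostingClassifier(n_estimators=50, max_depth=3).fit(X, y)
candidate_rules = {rule for est in ensemble.estimators_.ravel()
                   for rule in extract_rules(est, list(X.columns))}
```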

A linear combination of these rules is then learned by regularized linear regression on the training data, optimizing the following cost function, in which model complexity is penalised by a lasso-penalty term.
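In its standard RuleFit form this cost function is roughly the following, where r_k(x) indicates whether rule k fires for instance x, L is the loss function and λ controls the strength of the lasso penalty (a sketch of the usual formulation, not a verbatim copy of the figure):

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{n} L\!\left(y_i,\; \beta_0 + \sum_{k=1}^{K} \beta_k\, r_k(x_i)\right)
  + \lambda \sum_{k=1}^{K} \lvert \beta_k \rvert
```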

Both the global and the local behavior of the model can then be interpreted using the rule coefficients.

Bayesian Program Learning

This framework aims to learn simple visual concepts (like handwritten characters) from limited training data & generalize them successfully (e.g. produce new drawings of a character that resemble human handwriting). It can learn a concept from just a single training example, as opposed to most existing machine learning algorithms, which require huge amounts of training data to reach acceptable accuracy.

This makes use of:
  • the compositionality of handwritten characters from simpler parts & sub-parts
  • the causal (procedural) structure of the handwriting process
  • priors developed from training data, which ease the learning of new characters

It represents character concepts as simple probabilistic generative models that can sample new characters by combining parts & sub-parts in new ways.
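As a toy illustration of this compositional, causal structure (not the actual BPL model), the sketch below samples a ‘character type’ by choosing how many parts it has and which primitive sub-part strokes make up each part, and then renders noisy drawings of it; sampling the same character type twice yields distinct but related drawings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Primitive sub-parts: short stroke templates, each a sequence of 2-D pen offsets.
SUB_PARTS = [
    np.array([[0.0, 1.0], [0.0, 1.0]]),    # vertical stroke
    np.array([[1.0, 0.0], [1.0, 0.0]]),    # horizontal stroke
    np.array([[1.0, 1.0], [1.0, -1.0]]),   # zig-zag stroke
]

def sample_character_type():
    """Sample a character 'type': the sub-parts of each part and where each part starts."""
    n_parts = int(rng.integers(1, 4))
    parts = [rng.choice(len(SUB_PARTS), size=rng.integers(1, 3)) for _ in range(n_parts)]
    starts = rng.uniform(0, 5, size=(n_parts, 2))
    return parts, starts

def render_token(character_type, noise=0.1):
    """Render one drawing (token) of a character type via a noisy, causal drawing process."""
    parts, starts = character_type
    strokes = []
    for sub_ids, start in zip(parts, starts):
        pos = start.copy()
        points = [pos.copy()]
        for sid in sub_ids:
            for step in SUB_PARTS[sid]:
                pos = pos + step + rng.normal(scale=noise, size=2)  # motor noise
                points.append(pos.copy())
        strokes.append(np.array(points))
    return strokes

character = sample_character_type()
drawing_1, drawing_2 = render_token(character), render_token(character)  # two exemplars of one concept
```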

A generative model of handwritten characters. Source

The classification and generative tasks can be interpreted easily by visualizing the underlying generative programs.

Handwritten character classification. Source

With the growing use of machine learning in newer and higher-stakes disciplines, interpretability has become a desirable property of models. This has driven attempts to unveil existing complex models, and has also catalyzed the development of new models optimising a joint goal of accuracy and interpretability. Some fields have already started using these novel interpretable models, usually within human-in-the-loop systems where humans validate the predictions before they are used. Given the extensive ongoing research in ML-interpretability, most fields are likely to adopt these approaches gradually.
