Demystifying AI: Unraveling the mysteries of black box models — Part 2

Josephlyr · Published in d*classified · 13 min read · Jul 20, 2023

Joseph Low, Data Scientist, Enterprise Digital Services (EDS) Programme Centre, dives into the application of two widely used model-agnostic interpretability methods, LIME and SHAP. He also discusses the challenges associated with using these techniques and explores how Tool-Ally can be leveraged to (1) address these challenges and (2) enhance the interpretability of black box models. This article is part 2 of a series on Explainable Artificial Intelligence (XAI). Part 1 introduces the concepts of XAI and provides a brief overview of the various model-agnostic XAI methods. This work was developed as part of Tool-Ally (a Data Analytics Toolkit). Tool-Ally offers a set of customizable and reusable components to streamline and automate various aspects of the data analytics lifecycle, supporting data scientists in performing common data science tasks more efficiently and overcoming challenges in the data science lifecycle. More details about Tool-Ally can be found here.

Source: MIT CSAIL

Explainability techniques walk-through

We begin by constructing a Random Forest classifier, which serves as the black box model to be explained. Next, we will explain the predictions of the Random Forest classifier using the model-agnostic interpretability methods LIME and SHAP. We then demonstrate how Tool-Ally further enhances the usage of LIME and SHAP to provide more reliable and comprehensive explanations.

Note: In the walkthrough, we chose the Random Forest classifier as the black box model because it requires less computational time to make predictions than deep neural networks. In practice, any model would be applicable since the interpretability methods are model-agnostic.

Process of Training a Random Forest Model

First, we load in the IBM attrition dataset. The objective is to predict employee attrition (binary classification) given a set of numerical, categorical (nominal and ordinal) features.
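For readers following along, below is a minimal sketch of the data loading and splitting step. The file name, the list of nominal columns, and the split settings are assumptions rather than the article's exact setup; adapt them to your own copy of the dataset.

import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed file name for the IBM HR Analytics Employee Attrition dataset
df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")

# Target is the binary 'Attrition' column ('Yes'/'No'); the remaining columns are features
X = df.drop(columns=["Attrition"])
y = df["Attrition"]

# Cast the nominal columns to the 'category' dtype so that the ColumnTransformer
# below can select them with make_column_selector (ordinal features, handled
# separately in the article, are omitted from this sketch)
nominal_features = ["BusinessTravel", "Department", "EducationField",
                    "Gender", "JobRole", "MaritalStatus", "OverTime"]
X[nominal_features] = X[nominal_features].astype("category")

# Stratified train-test split (settings are an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)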

Here, we provide a brief description of the nominal features within the dataset:

Nominal Features in the IBM Attrition Dataset

Next, we apply one-hot encoding on the nominal features and train a Random Forest classifier as the black box model whose predictions are to be explained.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler, LabelEncoder
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.ensemble import RandomForestClassifier

# One-hot encode the nominal (category dtype) features
categorical_transformer = Pipeline(steps=[
    ('encoder', OneHotEncoder(drop=None))
])

# Standardize the numerical features
numerical_transformer = Pipeline(steps=[
    ('scaler', StandardScaler())
])

preprocessor = ColumnTransformer(transformers=[
    ('categorical', categorical_transformer, make_column_selector(dtype_include="category")),
    ('numerical', numerical_transformer, make_column_selector(dtype_exclude="category"))
])

X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)

# Encode the target labels ('No'/'Yes') as integers (0/1)
label_encoder = LabelEncoder()
y_train_processed = label_encoder.fit_transform(y_train)
y_test_processed = label_encoder.transform(y_test)

RF_clf = RandomForestClassifier(class_weight='balanced', random_state=42)
RF_clf.fit(X_train_processed, y_train_processed)

For an employee with the following features,

the Random Forest classifier predicts the employee’s attrition with a probability of 0.61. How can we explain why the model made such a prediction?
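For reference, this predicted probability can be obtained with a call like the one below, where idx is assumed to be the position of this employee in the test set.

# idx is the (assumed) position of the employee of interest in the test set
RF_clf.predict_proba(X_test_processed[[idx]])
# illustrative output: array([[0.39, 0.61]]), where the second column is P(Attrition='Yes')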

LIME (Local Interpretable Model-Agnostic Explanations)

Let’s now walk through the steps involved in using the open-source library LIME to explain the predictions of the Random Forest classifier. It is worth noting that while the library is generally easy to use, using it directly can present some challenges, as we shall see below.

Using LIME to generate explanation

To instantiate a LIME explainer, we pass in the following arguments to LimeTabularExplainer:

  • X_train_processed (training data in which nominal features are one-hot encoded)
  • lime_feature_names corresponding to the columns of X_train_processed
  • lime_categorical_features corresponding to the indexes of the categorical features in X_train_processed (see the sketch below for one way to derive these two helpers)
  • class_names corresponding to the encoded labels
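The helpers lime_feature_names and lime_categorical_features are not defined in the snippet below; a minimal sketch of how they could be derived from the fitted preprocessor is shown here (the use of get_feature_names_out and the prefix stripping are assumptions, not the article's exact code).

# ColumnTransformer.get_feature_names_out() prefixes each column with its transformer name
# (e.g. 'categorical__OverTime_Yes'); strip the prefix to keep names like 'OverTime_Yes'
raw_names = preprocessor.get_feature_names_out()
lime_feature_names = [name.split("__", 1)[1] for name in raw_names]

# Indexes of the one-hot encoded (categorical) columns within X_train_processed
lime_categorical_features = [
    i for i, name in enumerate(raw_names) if name.startswith("categorical__")
]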
from lime.lime_tabular import LimeTabularExplainer

LIME_explainer = LimeTabularExplainer(
    training_data=X_train_processed,
    mode="classification",
    feature_names=lime_feature_names,
    categorical_features=lime_categorical_features,
    class_names=['No', 'Yes'],
    random_state=42
)

We then use the instantiated LIME explainer to generate an explanation for the Random Forest classifier’s prediction for the instance of interest.

LIME_explanation = LIME_explainer.explain_instance(
    data_row=X_test_processed[idx],
    predict_fn=RF_clf.predict_proba,
)

Finally, we visualize the (local) explanation.

LIME_explanation.show_in_notebook()
Figure 1.1: LIME’s results for the black box model’s prediction.

Voila! Here we see LIME’s explanation for the Random Forest classifier’s prediction for the instance. The left panel displays the predicted probability for each class, the center panel displays each feature’s contribution, and the right panel displays the features used in the explanation.

All good, right? Well, except that it is not.

Caveat on the direct usage of LIME when nominal features are one-hot encoded

Notice that in the center panel, we have the features OverTime_Yes and OverTime_No. In general, each category of the one-hot encoded nominal features will be present in the explanation.

This is because, when instantiating LimeTabularExplainer, X_train_processed (in which nominal features are one-hot encoded) is passed as the training_data argument.

Why is this an issue?

To see why this is an issue, recall (from here) that LIME trains an interpretable model on the perturbed data around the instance.

From LIME’s documentation:

For categorical features, perturb by sampling according to the training distribution, and making a binary feature that is 1 when the value is the same as the instance being explained.

Thus, when X_train_processed (in which nominal features are one-hot encoded) is passed as training data to LimeTabularExplainer, the perturbation may produce erroneous data points (e.g. OverTime_Yes=1, OverTime_No=1). Consequently, incorrect explanations may be generated.

We illustrate below how the perturbation of one-hot encoded nominal features results in erroneous data points. Readers interested in understanding how LIME perturbs data can refer to the source code of LimeTabularExplainer’s private method __data_inverse.

import pandas as pd

# size of neighborhood, default value = 5000
num_samples = 5000
neighborhood_whitebox, neighborhood_blackbox = LIME_explainer._LimeTabularExplainer__data_inverse(
    X_test_processed[idx], num_samples
)

# Erroneous perturbed points: both one-hot columns of 'OverTime' set to 1 simultaneously
neighborhood_blackbox = pd.DataFrame(neighborhood_blackbox, columns=lime_feature_names)
neighborhood_blackbox[(neighborhood_blackbox['OverTime_Yes'] == 1) & (neighborhood_blackbox['OverTime_No'] == 1)][['OverTime_Yes', 'OverTime_No']]

It turns out that using LIME is not so straightforward when it comes to explaining black box models that require nominal features to be one-hot encoded, due to the following challenges:

Challenge #1: Need to pass training data that are not one-hot encoded

Because LIME samples categorical features independently, it is not aware of the relationship between categories of a nominal feature (OverTime_Yes=1 and OverTime_No=1 cannot simultaneously exist). Thus, instead of passing X_train_processed when instantiating LimeTabularExplainer, one needs to pass the training data in which nominal features are not one-hot encoded.

Challenge #2: Need to pass training data in which categorical features are represented by integer values

However, the categorical_features parameter of LimeTabularExplainer requires values of categorical features to be integers. Refer to the documentation here.

Challenge #3: Incompatibility between LimeTabularExplainer’s training data format and data format of black box model trained with one-hot encoded nominal features

A quick way to meet LimeTabularExplainer’s requirement that categorical features be integer-valued is to apply sklearn’s OrdinalEncoder to the nominal features.

from sklearn.preprocessing import OrdinalEncoder

# create X_train_lime as a copy of X_train with the nominal features encoded ordinally
X_train_lime = X_train.copy(deep=True)

ordinal_encoder = OrdinalEncoder(dtype=int)
X_train_lime[nominal_features] = ordinal_encoder.fit_transform(X_train_lime[nominal_features])
X_train_lime.head()

However, this raises an issue — the black box model that was originally trained on the one-hot encoded nominal features (X_train_processed) would not be able to make predictions on the ordinal encoded nominal features (X_train_lime) due to a mismatch in feature dimension between training and prediction data.

RF_clf.predict_proba(X_train_lime)
ValueError: X has 30 features, but RandomForestClassifier is expecting 49 features as input.

Note: while one could train another Random Forest classifier on the ordinally encoded nominal features to resolve the incompatibility in training data format, doing so leads to another concern: the model would be trained on nominal features as though there were a natural ordering among their categories.

The figure below summarizes the main challenges associated with the direct usage of LIME when explaining models that require nominal features to be one-hot encoded.

Figure 1.2: Challenges of using LIME’s LimeTabularExplainer.

As LimeTabularExplainer requires categorical features to be encoded, one can either perform one-hot encoding (approach 1) or ordinal encoding (approach 2) on the nominal features. One-hot encoding (approach 1) is not desirable because the perturbation can produce erroneous data points, while ordinal encoding (approach 2) is not viable for models trained on one-hot encoded nominal features. Collectively, these challenges prevent the direct usage of LIME when explaining models trained on one-hot encoded nominal features.

How Tool-Ally addresses the challenges

This is where Tool-Ally comes into play. In addition to solving the above challenges, Tool-Ally also simplifies and enhances the usage of LIME by providing seamless integration capabilities. With Tool-Ally, you can leverage the strengths of LIME while streamlining the implementation process and seamlessly integrating it into your data analytics pipeline.

Figure 1.3: Illustration of Tool-Ally’s Solution.

Here’s how Tool-Ally solves the challenges above. Tool-Ally requires the practitioner to pass in the following arguments to Tool-Ally’s LIME_Tabular_Explainer:

  • training_data: X_train (training data before undergoing preprocessing)
  • preprocess_function: preprocessor.transform (function that preprocesses the input training data)
  • predict_function: RF_clf.predict_proba (function that takes as inputs a numpy array and outputs prediction probabilities)

Internally, given the above 3 inputs (training_data, preprocess_function, predict_function), Tool-Ally does the following (a conceptual sketch follows this list):

  1. Tool-Ally creates LIME-format training data (X_train_lime) by applying sklearn’s OrdinalEncoder to the nominal features of X_train.
  2. Tool-Ally also creates a LIME predict function that takes input data in the same format as X_train_lime. The LIME predict function performs the following:
    (a) Recover the original training data format (X_train) by inverse transforming the nominal features of the input data (X_train_lime).
    (b) Recover the model input data format (X_train_processed) by applying the preprocess function (one-hot encode nominal features) to the output from (a).
    (c) Use the black box model to make predictions on the output from (b).
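To make the mechanism concrete, here is a conceptual sketch of such a wrapped predict function. This is not Tool-Ally's actual implementation; the variable names (ordinal_encoder, nominal_features, preprocessor, RF_clf) follow the earlier snippets, and the DataFrame round-trip is an assumption.

import numpy as np
import pandas as pd

def lime_predict_fn(data_lime: np.ndarray) -> np.ndarray:
    """Conceptual sketch of the predict function Tool-Ally constructs for LIME."""
    # (a) Recover the original data format by inverse transforming the
    #     ordinally encoded nominal features back to their string categories
    data_original = pd.DataFrame(data_lime, columns=X_train.columns)
    data_original[nominal_features] = ordinal_encoder.inverse_transform(
        data_original[nominal_features].astype(int)
    )
    # (b) Recover the model input format by applying the preprocessing function
    #     (one-hot encodes nominal features, scales numerical features)
    data_processed = preprocessor.transform(data_original)
    # (c) Use the black box model to output prediction probabilities
    return RF_clf.predict_proba(data_processed)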

Let’s take a look at how Tool-Ally is used to generate a LIME explanation for the Random Forest classifier’s prediction for the instance. Using Tool-Ally requires only a few lines of code:

from toolally.explainable_ai import LIME_Tabular_Explainer

toolally_LIME_explainer = LIME_Tabular_Explainer(
    training_data=X_train,
    preprocess_function=preprocessor.transform,
    predict_function=RF_clf.predict_proba,
    mode="classification",
    class_names=["No", "Yes"],
    categorical_features=nominal_features + ordinal_features,
    ordinal_features_mapping=ordinal_features_mapping,
    random_state=42
)

toolally_LIME_explanation = toolally_LIME_explainer.explain(
    instance=X_test.iloc[[idx]]
)

toolally_LIME_explainer.local_explanation(
    explanation=toolally_LIME_explanation,
    verbose=True
)
Figure 1.4: Comparison of LIME’s result for the prediction of an instance with one-hot encoded nominal features. (top) direct usage of LIME vs (bottom) Tool-Ally.

Notice the difference in the center plot of the LIME explanation generated by Tool-Ally?

The nominal feature OverTime is represented by its original feature value (OverTime=Yes) instead of the one-hot encoded representation (OverTime_Yes=1, OverTime_No=0). Such a difference, although subtle, is crucial in ensuring the explanation generated is correct.

How should we interpret the values in LIME’s explanation?

This is a question frequently asked by users of LIME, and one that Tool-Ally aims to shed light on. Tool-Ally further enhances LIME’s explanation by providing customized interpretations of the results to offer a more comprehensive understanding of the explanation. This ensures the insights gained are communicated effectively to stakeholders.

Figure 1.5: Tool-Ally provides interpretations of LIME’s explanation results.

SHAP (SHapley Additive exPlanations)

Next, let’s use another well-known model-agnostic technique, SHAP, to explain the Random Forest classifier’s prediction. Similar to the previous section, we’ll walk through the steps involved in using the open-source library SHAP to interpret black box models. We’ll also demonstrate how Tool-Ally enhances the usage of SHAP and offers a seamless integration of explainability into the data analytics pipeline.

Using SHAP to generate explanation

To instantiate a SHAP explainer, we pass in the following:

  • RF_clf.predict_proba (function that takes as input a numpy array and outputs prediction probabilities)
  • X_train_processed (training data in which nominal features are one-hot encoded)
  • preprocessor_feature_names corresponding to the columns of X_train_processed (see the sketch below)
  • output_names corresponding to the class labels
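As before, preprocessor_feature_names is assumed to be derived from the fitted preprocessor, for example:

# Assumed derivation: strip the ColumnTransformer prefix (e.g. 'categorical__') from each name
preprocessor_feature_names = [
    name.split("__", 1)[1] for name in preprocessor.get_feature_names_out()
]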
import shap

SHAP_explainer = shap.Explainer(
    model=RF_clf.predict_proba,
    masker=X_train_processed,
    feature_names=preprocessor_feature_names,
    algorithm='auto',
    output_names=['No', 'Yes'],
    seed=42
)

We then use the instantiated SHAP explainer to generate explanations for the Random Forest classifier’s predictions for a set of instances.

SHAP_explanation = SHAP_explainer(X_test_processed)

Finally, we can visualize the global and local explanations using a variety of plots.
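For example, a global beeswarm plot and a local waterfall plot for the 'Yes' (attrition) class can be produced as sketched below; indexing the output dimension with 1 and reusing idx are assumptions consistent with the earlier snippets.

# Global explanation: beeswarm plot of SHAP values for the 'Yes' class (output index 1)
shap.plots.beeswarm(SHAP_explanation[:, :, 1])

# Local explanation: waterfall plot for the instance of interest
shap.plots.waterfall(SHAP_explanation[idx, :, 1])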

Caveat on the direct usage of SHAP when nominal features are one-hot encoded

Unlike LIME, where passing in one-hot encoded nominal features results in incorrect explanations, passing one-hot encoded nominal features to SHAP does not yield erroneous explanations (in fact, the explanations remain theoretically sound).

Instead, the concerns are twofold:

Concern #1: Difficulty in interpreting the relationship between SHAP values of a nominal feature and each of its categories.

Figure 2.1: A separate scatter plot for each category of the nominal feature ‘Department’ makes it difficult to compare SHAP values collectively amongst the different categories (Human Resources, Research & Development, Sales).

The scatter plot shows the effect of a single feature on the model’s predictions. For each category of the nominal feature ‘Department’ (Human Resources, Research & Development, Sales), the scatter plot allows us to compare the SHAP values of employees who belong to that category, and employees who do not.

For example, in the above scatter plot of Department_Sales, employees who belong to the Sales department (Department_Sales=1) generally have positive SHAP values (more likely to attrite), while those who do not belong to the Sales department (Department_Sales=0) generally have negative SHAP values (less likely to attrite). The concern is that we are unable to compare the SHAP values collectively amongst the different categories (Human Resources vs. Research & Development vs. Sales). Such a comparison can be important for stakeholders to derive insights.

Concern #2: Seemingly lowered relative importance of nominal features with high cardinality

Figure 2.2: Waterfall plot showing the contribution of each category of nominal features instead of the aggregated contribution of nominal features may incorrectly imply a lowered relative importance of the nominal features.

The waterfall plot does not show the features ‘Department’ and ‘MaritalStatus’ among the top 10 features contributing towards the model’s prediction for the instance.

There is, however, a possibility that these nominal features (‘Department’ and ‘MaritalStatus’) are within the top 10 features but are not shown, because each category of the nominal feature (e.g. Department_Human_Resources, Department_Research&Development, Department_Sales) is viewed as an individual feature. If the categories are viewed collectively as the original nominal feature (e.g. ‘Department’), the overall magnitude of the SHAP values may increase, potentially displacing some of the other features in terms of relative importance.

Unfortunately, because each category of a nominal feature is viewed as an individual feature, stakeholders could misinterpret a nominal feature as being less important in contributing towards the model’s prediction for the instance.

The root cause of these concerns (#1 and #2) is that the training data (X_train_processed) passed into shap.Explainer comprises one-hot encoded nominal features. Thus, each resulting binary one-hot encoded feature has its own corresponding SHAP value.

In that case, how about passing in training data in which nominal features are not one-hot encoded (X_train) as a quick fix?

Unfortunately, this would not work since the model was trained on one-hot encoded nominal features.

How Tool-Ally addresses the challenges

In addition to offering enhanced integration capabilities, Tool-Ally also allows for a holistic interpretation of nominal features alongside SHAP explanations. With Tool-Ally, you can ensure that the interpretation of nominal features aligns with their actual importance in contributing to the model’s predictions, mitigating potential misinterpretation risks for stakeholders.

Here’s how Tool-Ally resolves the concerns above. Tool-Ally requires the practitioner to pass in the following arguments to Tool-Ally’s SHAP_Tabular_Explainer:

  • training_data: X_train (training data before undergoing preprocessing)
  • preprocess_function: preprocessor.transform (function that preprocesses the input training data)
  • predict_function: RF_clf.predict_proba (function that takes as inputs a numpy array and outputs prediction probabilities)

Internally, Tool-Ally utilizes the linearity (aka additivity) property of Shapley values as a guiding principle for recovering the SHAP value of the original nominal feature. Refer to the blog posts by Statistics Canada and C3.ai for more details on the linearity property of Shapley values.
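In practice, this amounts to summing the SHAP values of the one-hot encoded columns that belong to the same nominal feature. The snippet below is a minimal sketch of that aggregation for the ‘Department’ feature, not Tool-Ally's implementation; it assumes the multi-output Explanation and feature names from the earlier SHAP snippet.

# SHAP values for the 'Yes' class: shape (n_instances, n_one_hot_features)
shap_values_yes = SHAP_explanation.values[:, :, 1]

# Columns that belong to the one-hot encoding of 'Department'
department_cols = [
    i for i, name in enumerate(preprocessor_feature_names) if name.startswith("Department_")
]

# Linearity/additivity: the SHAP value of the original 'Department' feature is the
# sum of the SHAP values of its one-hot encoded columns
department_shap = shap_values_yes[:, department_cols].sum(axis=1)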

Let’s take a look at how to use Tool-Ally to generate SHAP explanations for the Random Forest classifier’s predictions for a set of instances. Using Tool-Ally to generate explanations requires only a few lines of code:

from toolally.explainable_ai import SHAP_Tabular_Explainer

toolally_SHAP_explainer = SHAP_Tabular_Explainer(
    model=RF_clf.predict_proba,
    training_data=X_train,
    preprocess_function=preprocessor.transform,
    features_ordered=features_ordered,
    one_hot_encoded_features=nominal_features,
    class_names=["No", "Yes"]
)

toolally_SHAP_explanation = toolally_SHAP_explainer.explain(data=X_test)

Tool-Ally offers several key benefits in terms of the explanation results generated:

Benefit #1: Holistic interpretation of nominal features

By recovering the SHAP value of the original nominal feature, Tool-Ally enables the scatter plot (amongst others) to facilitate easy comparison of SHAP values between different categories of the nominal features.

Figure 2.3: Comparison of SHAP values amongst different categories of a one-hot encoded nominal feature. (top) direct usage of SHAP vs (bottom) Tool-Ally.

The waterfall plot also correctly displays the relative importance of nominal features. In particular, ‘Department’ and ‘MaritalStatus’ now appear in the top 10 features contributing towards the model’s prediction for the instance.

Figure 2.4: Relative importance of one-hot encoded nominal features. (top) direct usage of SHAP vs (bottom) Tool-Ally.

Benefit #2: Interpretation of explanation plots

How should we interpret SHAP’s explanation plots? Apart from providing a holistic interpretation of the nominal features, Tool-Ally also enhances SHAP’s explanations by providing customized interpretations of the plots such as:

  1. Overview of the plot,
  2. Characteristics of the plot to examine,
  3. Interpretation of the plot

These guide stakeholders to derive and extract meaningful insights from the explanation plots. Below, we provide an example of Tool-Ally’s interpretation of the beeswarm plot.

Figure 2.5: Beeswarm plot.
Figure 2.6: Tool-Ally provides interpretations of SHAP’s explanation results.

Benefit #3: Consolidation of global and local explanations

Tool-Ally also offers the ease of consolidating all plots for global and local explanations with just one line of code. This enables users of Tool-Ally to quickly generate a suite of comprehensive explanations for their model’s predictions.

toolally_SHAP_explainer.global_explanation(
    explanation=toolally_SHAP_explanation,
    class_names=['Yes'],
    plot_summary=True,
    plot_bar=True,
    plot_beeswarm=True,
    plot_scatter=True,
    plot_heatmap=True,
    verbose=True
)
Figure 2.7: Tool-Ally’s consolidation of SHAP’s global explanation plots.
toolally_SHAP_explainer.local_explanation(
    explanation=toolally_SHAP_explanation,
    idx=idx,
    class_names=['Yes'],
    plot_waterfall=True,
    plot_decision=True,
    verbose=True
)
Figure 2.8: Tool-Ally’s consolidation of SHAP’s local explanation plots.

Conclusion

In this article, we demonstrated how Tool-Ally enhances the usage of the open-source libraries LIME and SHAP to explain a machine learning model’s prediction on tabular data comprising nominal features.

In the next post, we will share how these model-agnostic explainability techniques fit into the realm of Natural Language Processing (NLP). We will also dive into methods to evaluate the explanations generated by the different explainability techniques.

Stay tuned!
