Explainable AI in Practice

Arun Maiya
Aug 27, 2019

Deep neural networks are sometimes called “black boxes” in that it is not always clear how such models are using data to make a decision or prediction. Explainable AI (XAI) involves methods and techniques to help understand how an AI model reaches a particular conclusion.

SOURCE: DARPA XAI Program

Although Explainable AI is an open and actively researched problem, there are some existing methods that can be practically applied now. To demonstrate this, we will be using ktrain, a fastai-like interface to Keras that helps build and train Keras models with less time and coding. ktrain is open-source and available on GitHub here. To install ktrain, simply type the following:

pip3 install ktrain

In our previous article on ktrain, we saw that, for both image classification and text classification, we can invoke the ktrain.get_predictor function and obtain a Predictor object to easily make predictions on new raw data. For instance, with text data, one can make predictions from the raw, unprocessed text of a document as follows:

predictor = ktrain.get_predictor(learner.model, preproc=preproc)
predictor.predict(document_text)

In this article, we show how one can invoke the explain method of Predictor objects to help understand how those predictions were made. This is particularly useful in understanding misclassifications. We start with image classification.

Explaining Image Classification

Let us begin by using ktrain to train an image classifier for a single epoch on the publicly available Kaggle Dogs vs. Cats dataset, as we did in our previous article. We train by invoking the fit_onecycle method, which employs a 1cycle learning rate policy. The objective here is to predict whether each picture depicts a dog or a cat.
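Below is a minimal sketch of this training step, assuming the Dogs vs. Cats images have been extracted into a data/dogscats folder with train and valid subdirectories. The path, target size, batch size, learning rate, and model choice (a pretrained ResNet50, following the ktrain vision tutorial) are assumptions you may need to adjust:

import ktrain
from ktrain import vision as vis

# load the images from folders (path and parameters are assumptions)
(train_data, val_data, preproc) = vis.images_from_folder(
    datadir='data/dogscats',
    data_aug=vis.get_data_aug(horizontal_flip=True),
    train_test_names=['train', 'valid'],
    target_size=(224, 224), color_mode='rgb')

# build a classifier from a pretrained ResNet50 and wrap it in a Learner
model = vis.image_classifier('pretrained_resnet50', train_data, val_data)
learner = ktrain.get_learner(model=model, train_data=train_data,
                             val_data=val_data, batch_size=64)

# train for a single epoch using the 1cycle learning rate policy
learner.fit_onecycle(1e-4, 1)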

Our validation accuracy is 98.74% at the end of a single epoch:

(Note: Higher accuracy can be obtained with more epochs.)

The view_top_losses method in ktrain identifies the examples in the validation set that were misclassified and sorts them by validation loss, such that examples at the beginning of the list are the most severely misclassified. The size of the list returned is controlled by the parameter n. Let us invoke view_top_losses to see the image most severely misclassified by our model.
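For example (assuming the learner and preproc objects created during training above):

learner.view_top_losses(n=1, preproc=preproc)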

The most severely misclassified image depicts both a dog and a cat but is labeled as belonging only to the cats category. Our classifier’s prediction for this image is “dogs”, presumably because it is focusing mostly on the dog in the image. This can be verified by invoking the explain method.
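A sketch of this step is shown below; img_path is a placeholder that should point to the misclassified image identified above:

# create a Predictor from the trained model and its preprocessor
predictor = ktrain.get_predictor(learner.model, preproc)

# highlight the regions the model focused on for this image
# (img_path is a placeholder for the image identified by view_top_losses)
predictor.explain(img_path)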

The explain method displays the image and highlights the area on which the classifier focuses to make the prediction. As expected, our model is focusing on the dog in this picture when predicting the label dogs. In this case, Explainable AI methods were not critical to understanding why our classifier generated its prediction. However, such techniques can be helpful in other cases where the decision-making process is not as obvious.

Consider this image in the validation set, for example:

The image above (i.e., cat.92.jpg) is interesting in that there are no real dogs or cats in the picture. Nevertheless, the image has a ground truth label of cats. This is evidently due to the fact that there is a stuffed cat in the photo. However, there is also what appears to be a cartoon dog in the background.

Our classifier is understandably confused and predicts the category of dogs for this image:

But, why has our classifier predicted the label of “dogs” for this photo? Is it mistaking the stuffed cat for a dog or is it focusing on the cartoon dog in the background? The former explanation seems more likely, but we can find a definitive answer by creating a Predictor object and invoking the explain method.
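A sketch of this step, where the folder layout is an assumption based on a standard extraction of the dataset:

# the path below is an assumption; point it at cat.92.jpg in your validation folder
predictor = ktrain.get_predictor(learner.model, preproc)
predictor.explain('data/dogscats/valid/cats/cat.92.jpg')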

It is clear from the visualization that our classifier is focusing on the stuffed cat and mistaking it for a dog. Moreover, it appears to be particularly focused on the ears of the stuffed animal, which some might consider to resemble the ears of certain dog breeds.

These visualizations are based on the Grad-CAM technique and are supported in ktrain via the eli5 library. See the eli5 documentation for more information.

Explaining Text Classification

We can also visualize which words text classifiers tend to focus on when making predictions. As before, let us use ktrain to quickly build a text classifier to classify IMDb movie reviews as positive or negative. We employ a simple FastText-like model copied from the Keras examples and train using autofit, which employs a triangular learning rate policy.
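A sketch of this step is shown below, assuming the IMDb reviews have been extracted into a data/aclImdb folder with the standard train and test subdirectories. The path, vocabulary size, sequence length, batch size, and learning rate are assumptions:

import ktrain
from ktrain import text

# load the IMDb reviews from the standard aclImdb folder layout
(x_train, y_train), (x_test, y_test), preproc = text.texts_from_folder(
    'data/aclImdb', max_features=20000, maxlen=400,
    classes=['pos', 'neg'], train_test_names=['train', 'test'])

# build a fasttext-like model and train with a triangular learning rate policy
model = text.text_classifier('fasttext', (x_train, y_train), preproc=preproc)
learner = ktrain.get_learner(model, train_data=(x_train, y_train),
                             val_data=(x_test, y_test), batch_size=32)
learner.autofit(1e-2, 2)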

The validation accuracy after two epochs is 88.90% using this simple model. (Note: Higher accuracy can be obtained by other models in ktrain.)

As we did above, we invoke view_top_losses to view the single most misclassified review in the validation set.

The review is evidently about a film starring Mickey Rourke, but it is difficult to follow. This is partly because the text displayed is preprocessed such that punctuation and rare words are omitted (including “Killshot”, the actual title of the movie). Let’s use the ID of this document (i.e., id=8244 as shown above) to retrieve the original text and supply that as input to predictor.explain.
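A sketch of this final step, where doc_text is assumed to hold the raw, unprocessed text of review id=8244 (the retrieval itself is shown in the accompanying notebook):

# create a Predictor for the text model and explain the raw review
# (doc_text is assumed to hold the original, unpreprocessed review text)
predictor = ktrain.get_predictor(learner.model, preproc)
predictor.explain(doc_text)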

This visualization is generated using a technique called LIME and may be confusing at first. To generate the visualization, the input is randomly perturbed to examine how the prediction changes. This is used to infer the relative importance of different words to the final prediction using an interpretable linear model. The GREEN words are inferred as contributing to the incorrect classification. The RED (or PINK) words detract from our final prediction. (Shade of color denotes the strength or size of the coefficients in the inferred linear model.)

Looking at the GREEN words, we can see that the review is overall positive despite being assigned a ground truth label of negative. In fact, the last line recommends the movie as a rental. The review is also odd in that it spends much of its time praising the actors’ performances in previous films, with positive GREEN-highlighted terms like “wonderful”. The PINK words show that there are few negative words or phrases that would lead us to conclude this is a negative review. (You can learn more about the output of LIME here.) In this case, we can forgive our simple model for misclassifying this particular review as positive, as many humans might classify it as positive as well.

While there is clearly more work needed to advance the state of Explainable AI, existing methods can still help shed light on your neural network models.

Article Source Code: The source code for this article is available in the form of the following Jupyter notebook: tutorial-A2-explaining-predictions.ipynb. Feel free to try it out on your own datasets.

More Information: For more information on ktrain, see the tutorial notebooks on ktrain and our previous TDS Medium publication:

ktrain: A Lightweight Wrapper for Keras to Help Train Neural Networks

