Explainable Artificial Intelligence

Interpreting Deep Learning Models for Computer Vision

Interpreting Convolutional Neural Network Models built with TensorFlow

Dipanjan (DJ) Sarkar
Aug 15
What does a deep learning model really see?


Artificial Intelligence (AI) is no longer a field restricted to research papers and academia. Businesses and organizations across diverse industry domains are building large-scale applications powered by AI. The questions to ask here are, “Do we trust decisions made by AI models?” and “How does a machine learning or deep learning model make its decisions?”. Interpreting machine learning and deep learning models has always been an overlooked task in the data science lifecycle, since data scientists and machine learning engineers tend to be more involved with actually pushing models out to production and getting them up and running.

Convolutional Neural Networks

The most popular deep learning models leveraged for computer vision problems are convolutional neural networks (CNNs)!

Source: becominghuman.ai

Interpreting CNN Models — What does a deep learning model really see?

Here’s the interesting part: can we really unbox the opacity of a seemingly black-box CNN model, understand what’s going on under the hood, and see what the model really sees when it looks at an image? There is a wide variety of techniques and tools for interpreting the decisions made by vision-based deep learning models. Some of the major techniques covered in this article are depicted as follows.

SHAP Gradient Explainer

This technique combines ideas from Integrated Gradients, SHapley Additive exPlanations (SHAP) and SmoothGrad, and explains model decisions using expected gradients (an extension of integrated gradients). It is a feature attribution method designed for differentiable models, based on an extension of Shapley values to infinite player games. We will use the shap framework here for this technique.

[['n02999410', 'chain'], 
['n01622779', 'great_grey_owl'],
['n03180011', 'desktop_computer'],
['n02124075', 'Egyptian_cat']]

Interpreting CNN Models built with TensorFlow 2.0

For the remaining four techniques, we will leverage a pre-trained model using TensorFlow 2.0 and the popular open-source framework tf-explain. The idea here is to look at different model interpretation techniques for CNNs.

Load Pre-trained CNN Model

Let’s load one of the most complex pre-trained CNN models out there, the Xception model, which claims to be slightly better than the Inception V3 model. Let’s start by loading the necessary dependencies and our pre-trained model.
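The loading step can be sketched as follows, using the standard `tf.keras.applications` API:

```python
import tensorflow as tf
from tensorflow.keras.applications.xception import (Xception,
                                                    preprocess_input,
                                                    decode_predictions)

# Load Xception with its ImageNet weights; note that unlike VGG/ResNet,
# Xception expects 299x299 RGB inputs.
model = Xception(weights="imagenet", include_top=True)
print(model.input_shape)  # (None, 299, 299, 3)
```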

Model Predictions on Sample Image

We will reuse the sample image of my cat and make the top-5 predictions with our Xception model. Let’s load our image first before making predictions.
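A hedged sketch of the prediction step: the author’s cat photo is not bundled here, so a publicly hosted sample image stands in for it; run against the original cat photo, this produces the top-5 output shown below.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.xception import (Xception,
                                                    preprocess_input,
                                                    decode_predictions)

model = Xception(weights="imagenet")

# Stand-in image (the article uses the author's own cat photo)
img_path = tf.keras.utils.get_file(
    "sample.jpg",
    "https://storage.googleapis.com/download.tensorflow.org/example_images/grace_hopper.jpg")
img = tf.keras.preprocessing.image.load_img(img_path, target_size=(299, 299))
x = tf.keras.preprocessing.image.img_to_array(img)
x = preprocess_input(np.expand_dims(x, axis=0))  # scales pixels to [-1, 1]

preds = model.predict(x)
print(decode_predictions(preds, top=5))
```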

[[('n02124075', 'Egyptian_cat', 0.80723596),
('n02123159', 'tiger_cat', 0.09508163),
('n02123045', 'tabby', 0.042587988),
('n02127052', 'lynx', 0.00547999),
('n02971356', 'carton', 0.0014547487)]]

Activation Layer Visualizations

This technique is typically used to visualize how a given input comes out of specific activation layers. The key idea is to explore which feature maps are getting activated in the model and visualize them. Usually this is done by looking at each specific layer. The following code showcases activation layer visualizations for one of the layers in Block 2 of the CNN model.

Occlusion Sensitivity

The idea of interpretation using occlusion sensitivity is quite intuitive. We basically try to visualize how parts of the image affect our neural network model’s confidence by occluding (hiding) parts of it iteratively. This is done by systematically occluding different portions of the input image with a grey square, and monitoring the output of the classifier.


GradCAM

This is perhaps one of the most popular and effective methods for interpreting CNN models. Using GradCAM, we try to visualize how parts of the image affect the neural network’s output by looking at the class activation maps (CAM). Class activation maps are a simple technique to get the discriminative image regions used by a CNN to identify a specific class in the image. In other words, a class activation map (CAM) lets us see which regions in the image were relevant to this class.

  • Compute the gradients of the target function with respect to the convolutional layer outputs. This can be done efficiently with backpropagation.


SmoothGrad

This technique helps us visualize stabilized gradients on the inputs towards the decision. The key objective is to identify pixels that strongly influence the final decision. A starting point for this strategy is the gradient of the class score function with respect to the input image. This gradient can be interpreted as a sensitivity map, and there are several techniques that elaborate on this basic idea.


This should give you a good idea of how to leverage complex pre-trained CNN models not only to make predictions on new images, but also to attempt to visualize what the neural network models are really seeing! The list of techniques here is not exhaustive, but it definitely covers some of the most popular and widely used methods for interpreting CNN models. I recommend trying these out with your own data and models!

Google Developers Experts

Experts on various Google products talking tech.

Dipanjan (DJ) Sarkar

Written by

Data Science Lead @Applied4Tech, @Google Developer Expert — Machine Learning, Author, Consultant, Mentor @Springboard, Connect: http://bit.ly/djs_lin
