Integrated Gradients for Deep Neural Networks

Kartikeya Bhardwaj
Jun 1, 2019


The Black Box Problem

Interpretability in Deep Learning is a major challenge that researchers have tackled since the field's inception. The question asked today is "Why did the CNN make this prediction?" rather than "How did the CNN make this prediction?"

Given that Deep Neural Networks are prevalent in sensitive domains like healthcare (e.g., chest disease detection), the interpretability of such networks matters a great deal: the "why" of a diagnosis is of prime importance. The slow acceptance of Deep Neural Networks in finance is also due to their limited interpretability. Since financial companies, banks, etc. operate under strict structures and regulations, preference is given to classical machine learning techniques such as tree-based and Bayesian models, which are easier to interpret.

Solving the problem of interpretability also opens new doors for improving such networks. To challenge the notion that "Deep learning models are black boxes," many methods have surfaced. One of these techniques is Integrated Gradients.

Integrated Gradients

Let’s start off with an example. A pre-trained Inception model correctly classifies the following image as a ‘fireboat’.

“What pixels in this image are responsible for this classification?”

Here is the Integrated Gradients technique:

  1. Consider a black image (each pixel 0) as the baseline image.
  2. Interpolate a series of images of increasing intensity between the baseline image and the original image.
  3. The scores of these images (softmax outputs), when plotted, look like the following.
  4. Our region of interest lies where the slope of the score-vs-intensity graph does not stay flat. We call these the interesting gradients.
  5. Gradients of the output with respect to this series of interpolated images, when calculated, give us the following.
  6. Integrating this series of gradients gives us the Integrated Gradients of the image.

Following is the implementation:
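The embedded code is not reproduced here, but the steps above can be sketched as a short TensorFlow function. This is a minimal sketch, not the post's exact implementation: the function name, the Riemann-sum approximation of the integral, and the use of `GradientTape` are my own choices.

```python
import tensorflow as tf

def integrated_gradients(model, image, target_class, baseline=None, steps=50):
    """Approximate Integrated Gradients with a Riemann sum over the
    straight-line path from the baseline image to the input image."""
    if baseline is None:
        baseline = tf.zeros_like(image)  # black image, as in the paper

    # Steps 1-2: interpolate images between the baseline and the input.
    alphas = tf.linspace(0.0, 1.0, steps + 1)
    interpolated = baseline + alphas[:, None, None, None] * (image - baseline)

    # Step 5: gradients of the target score w.r.t. each interpolated image.
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        scores = model(interpolated)[:, target_class]
    grads = tape.gradient(scores, interpolated)

    # Step 6: average the gradients along the path (trapezoidal rule)
    # and scale by the input-minus-baseline difference.
    avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
    return (image - baseline) * avg_grads
```

For the fireboat example, `model` would be the pre-trained Inception network and `image` a single preprocessed input (without the batch dimension); the returned tensor has one attribution value per pixel.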

Improvements

While the paper supports using a black image as the baseline, I tried a series of experiments with different baseline images. These were mostly low-intensity noisy images, and in some cases the noisy baseline proved to be better.

Black Baseline
Noisy baseline

The Integrated Gradients computed with the noisy baseline are more spread out than in the black-baseline case. This can provide better insight in some cases.

Therefore, averaging out the gradients computed using different baselines can provide a more ‘robust’ output.
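This averaging can be sketched as a small wrapper. Here `ig_fn` stands in for any integrated-gradients routine that takes an image and a baseline; the function name, signature, and noise parameters are hypothetical, not from the post.

```python
import numpy as np

def averaged_attributions(ig_fn, image, n_baselines=5, noise_scale=0.1, seed=0):
    """Average integrated gradients over several low-intensity noisy
    baselines (the experiment described above, not the original paper)."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(image)
    for _ in range(n_baselines):
        # Low-intensity noise: small positive values near zero.
        baseline = np.abs(
            rng.normal(0.0, noise_scale, size=image.shape)
        ).astype(image.dtype)
        total += ig_fn(image, baseline)
    return total / n_baselines
```

Averaging over randomized baselines reduces the dependence of the attributions on any single baseline choice, which is what makes the combined output more robust.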

Thanks for reading! Follow me on GitHub — https://github.com/kartikeyab for repos and paper implementations.

Cheers!
