Grad-CAM is a technique for making Convolutional Neural Network (CNN)-based models more transparent by producing visual explanations: it visualizes the regions of the input that are “important” for the models' predictions.
This visualization is both high-resolution (when the class of interest is ‘tiger cat’, it identifies important ‘tiger cat’ features like stripes, pointy ears and eyes) and class-discriminative (it shows the ‘tiger cat’ but not the ‘boxer (dog)’).
The gradient of the loss (for the category cat) with respect to the input pixels gives:
It’s pretty noisy!
Deconv and Guided Backprop
This gives much cleaner results.
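Deconvolution and Guided Backpropagation differ from plain backpropagation only in how the gradient is passed back through ReLU layers. The following is a minimal NumPy sketch of the three backward rules; the function names are illustrative and do not come from any library.

```python
import numpy as np

def relu_backward_vanilla(x, grad_out):
    # Plain backprop: keep the gradient where the forward input was positive.
    return grad_out * (x > 0)

def relu_backward_deconvnet(x, grad_out):
    # Deconvnet: ignore the forward input, keep only positive gradients.
    return grad_out * (grad_out > 0)

def relu_backward_guided(x, grad_out):
    # Guided Backprop: keep the gradient only where BOTH the forward input
    # and the incoming gradient are positive.
    return grad_out * (x > 0) * (grad_out > 0)

# Toy forward inputs to a ReLU and the gradients arriving from the layer above.
x = np.array([1.0, -1.0, 2.0, -2.0])
g = np.array([3.0, 4.0, -5.0, -6.0])

vanilla = relu_backward_vanilla(x, g)
deconv = relu_backward_deconvnet(x, g)
guided = relu_backward_guided(x, g)
```

Suppressing negative gradients is what removes much of the noise seen in the raw gradient image, since only evidence that increases the activation is propagated.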
Now let's take a different picture, which contains two categories.
Let's visualize the important regions for each of these two categories using Guided Backpropagation, which gives:
This is bad. The visualization is unable to distinguish between pixels of cat and dog. In other words, the visualization is not class-discriminative.
Class Activation Mapping (CAM) makes the visualization class-discriminative by modifying the base network: all fully-connected layers at the end are removed and replaced with a single linear layer (followed by softmax), which takes the global-average-pooled convolutional feature maps as input and outputs the probability for each class.
Note that this modification of architecture forces us to retrain the network.
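As a rough illustration of the CAM head described above, here is a NumPy sketch under simplifying assumptions: the feature maps are random stand-ins for a real network's last convolutional layer, the linear layer has no bias, and all names (`cam_forward`, `class_activation_map`, `W_fc`) are hypothetical.

```python
import numpy as np

def cam_forward(feature_maps, weights):
    """feature_maps: (K, H, W) conv output; weights: (C, K) linear layer."""
    pooled = feature_maps.mean(axis=(1, 2))   # global average pooling -> (K,)
    scores = weights @ pooled                 # class scores -> (C,)
    exp = np.exp(scores - scores.max())       # numerically stable softmax
    return scores, exp / exp.sum()

def class_activation_map(feature_maps, weights, c):
    """Weighted sum of feature maps using the linear-layer weights of class c."""
    return np.tensordot(weights[c], feature_maps, axes=1)   # (H, W)

rng = np.random.default_rng(0)
num_maps, height, width, num_classes = 8, 7, 7, 5
A = rng.normal(size=(num_maps, height, width))   # stand-in feature maps
W_fc = rng.normal(size=(num_classes, num_maps))  # stand-in learned weights

scores, probs = cam_forward(A, W_fc)
c = int(np.argmax(probs))
cam_map = class_activation_map(A, W_fc, c)
```

Because pooling and the linear layer are both linear, the spatial mean of `cam_map` equals the score of class `c`, which is why the map can be read as a per-location breakdown of that score.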
Can we get these visualizations without changing the base model, and without any re-training?
Let’s see how Grad-CAM discovers these importance weights without any training.
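To obtain the class-discriminative localization map, Grad-CAM computes the gradient of $y^c$ (the score for class $c$) with respect to the feature maps $A^k$ of a convolutional layer. These gradients flowing back are global-average-pooled to obtain the importance weights $\alpha_k^c$:
$$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}$$
Here $Z$ is the number of spatial locations in a feature map. Similar to CAM, the Grad-CAM heat-map is a weighted combination of feature maps, but followed by a ReLU:
$$L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right)$$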
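The two steps above (pool the gradients into weights, then take a ReLU of the weighted sum of feature maps) can be sketched in NumPy as follows. This assumes the feature maps and the gradients of the class score have already been extracted from the network by some framework; the function name `grad_cam` and the toy arrays are illustrative only.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """feature_maps: (K, H, W) activations A of the chosen conv layer.
    gradients: (K, H, W) gradient of the class score with respect to A."""
    # Importance weight per feature map: global average pool of the gradients.
    alpha = gradients.mean(axis=(1, 2))                  # (K,)
    # Weighted combination of the feature maps.
    cam = np.tensordot(alpha, feature_maps, axes=1)      # (H, W)
    # ReLU keeps only locations with a positive influence on the class.
    return np.maximum(cam, 0.0)

# Toy example with two 2x2 feature maps.
A = np.stack([np.array([[1.0, 2.0], [3.0, 4.0]]), np.ones((2, 2))])
dY_dA = np.stack([np.full((2, 2), 1.0), np.full((2, 2), -2.0)])
heatmap = grad_cam(A, dY_dA)
```

In the toy example the weights are 1 and -2, so the weighted sum is the first map minus 2, and the ReLU zeroes out the locations where that is negative.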
If the architecture is already CAM-compatible, the weights learned in CAM are, up to a constant factor that is normalized out during visualization, the weights computed in Grad-CAM. Other than the ReLU, this makes Grad-CAM a generalization of CAM. This generalization is what allows Grad-CAM to be applied to any CNN-based architecture.
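The relationship between the CAM weights and the Grad-CAM weights can be checked numerically. The sketch below assumes a bias-free CAM head and estimates the gradient with central finite differences rather than a deep learning framework; all names are hypothetical. Under the Grad-CAM definition with global average pooling over $Z$ locations, the recovered weights equal the CAM weights scaled by the constant $1/Z$, which does not affect a normalized heat-map.

```python
import numpy as np

def cam_score(feature_maps, w_c):
    # Class score of a CAM-compatible head: linear layer on pooled feature maps.
    return float(w_c @ feature_maps.mean(axis=(1, 2)))

def numerical_gradient(f, a, eps=1e-5):
    # Central finite differences, one entry of `a` at a time.
    grad = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        plus = a.copy()
        minus = a.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grad[idx] = (f(plus) - f(minus)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
num_maps, height, width = 3, 4, 4
A = rng.normal(size=(num_maps, height, width))   # stand-in feature maps
w_c = rng.normal(size=num_maps)                  # stand-in CAM weights for class c

grads = numerical_gradient(lambda a: cam_score(a, w_c), A)
alpha = grads.mean(axis=(1, 2))                  # Grad-CAM importance weights
Z = height * width
```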
While Grad-CAM visualizations are class-discriminative and localize relevant image regions well, they lack the ability to show fine-grained importance the way pixel-space gradient visualization methods (Guided Backpropagation and Deconvolution) do. For example, in the left image of the figure above, Grad-CAM can easily localize the cat region; however, it is unclear from the low-resolution heat-map why the network predicts this particular instance as ‘tiger cat’. To combine the best aspects of both, we can fuse the Guided Backpropagation and Grad-CAM visualizations via pointwise multiplication. The Grad-CAM overview figure above illustrates this fusion.
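The fusion step, which the paper calls Guided Grad-CAM, can be sketched as follows. This is a simplified illustration: the paper upsamples the heat-map with bilinear interpolation, whereas the sketch uses nearest-neighbour upsampling to stay dependency-free, and it assumes the image size is an integer multiple of the heat-map size. The function names are hypothetical.

```python
import numpy as np

def upsample_nearest(heatmap, factor):
    # Repeat every heat-map cell into a factor x factor block.
    return np.kron(heatmap, np.ones((factor, factor)))

def guided_grad_cam(guided_backprop, heatmap):
    """guided_backprop: (H, W, C) pixel-space map; heatmap: (h, w) Grad-CAM map."""
    factor = guided_backprop.shape[0] // heatmap.shape[0]
    upsampled = upsample_nearest(heatmap, factor)
    assert upsampled.shape == guided_backprop.shape[:2], "sizes must be compatible"
    # Pointwise multiplication, broadcast over the colour channels.
    return guided_backprop * upsampled[..., None]

# Toy inputs: a constant guided-backprop map and a 2x2 heat-map.
gb = np.ones((4, 4, 3))
coarse = np.array([[0.0, 1.0], [2.0, 0.0]])
fused = guided_grad_cam(gb, coarse)
```

The coarse heat-map acts as a class-specific mask: fine-grained detail from Guided Backpropagation survives only where the heat-map is non-zero.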
Original source code: https://github.com/ramprs/grad-cam (Lua Torch)
Keras version: https://github.com/eclique/keras-gradcam (Jupyter Notebook)
TensorFlow: https://github.com/insikk/Grad-CAM-tensorflow (VGG, ResNet-50, ResNet-101)
A live demo on Grad-CAM applied to image classification can be found at http://gradcam.cloudcv.org/classification.
A quick video shows some of its functionality.
arXiv paper link: https://arxiv.org/abs/1610.02391