Gagana B
Oct 2, 2019

Improved Gradient-weighted Class Activation Map (Grad-CAM++)

This work is a part of the AI Without Borders Initiative.

Co-contributors: Ninad Shukla, Chinmay Pathak, Kevin Garda, Tony Holdroyd, Daniel J Broz.

Read about CAM here.
Read about Grad-CAM here.

The visualizations generated by Grad-CAM explain CNN-based localization of different parts of an image, but Grad-CAM's limitations include its inability to detect multiple occurrences of the same class and to accurately cover the full region of a class in an image.

Grad-CAM++ is a generic enhancement of the Grad-CAM architecture that provides improved visual explanations of model predictions while overcoming the above-mentioned limitations of Grad-CAM, with better object localization that extends to multiple object instances in a single image frame. It uses a weighted combination of the positive partial derivatives of the last convolutional layer's feature maps with respect to a specific class score as weights to generate a visual explanation for the corresponding class label.

A generic illustration of the intuition behind Grad-CAM++ is as follows:

The ability of Grad-CAM++ to accurately localize scattered occurrences of the same class in an image is especially helpful in multi-label classification problems. In addition, different weights are given to different pixels, which captures the importance of each pixel in the gradient feature map.

Figure 1. Comparison of heat maps from Grad-CAM and Grad-CAM++

From Figure 1, we can see that the Grad-CAM heat map localizes the patches within the image inaccurately, while Grad-CAM++ is able to localize the patch on the skin precisely.

The mathematical structures of CAM and Grad-CAM are reformulated in Grad-CAM++ to consider a weighted average of the gradients.

Grad-CAM defines the weight w for a class c with respect to a particular feature map as:
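In the notation of the Grad-CAM++ paper, where Aᵏ is the k-th feature map of the last convolutional layer and Z is the number of pixels in that map, this weight is:

$$w_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial Y^c}{\partial A_{ij}^k}$$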

Without loss of generality, we can assume that:
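In the Grad-CAM++ paper, the corresponding assumption is that the class score can be written as a linear combination of the globally average-pooled feature maps of the last convolutional layer:

$$Y^c = \sum_k w_k^c \sum_i \sum_j A_{ij}^k$$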

The value of this partial derivative depends on the values of the particular feature map.

Now consider the structure of the weights in Grad-CAM++:
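Following the paper, Grad-CAM++ takes a weighted combination of the positive partial derivatives, with pixel-wise gradient weights αᵢⱼᵏᶜ defined further below:

$$w_k^c = \sum_i \sum_j \alpha_{ij}^{kc}\, \mathrm{ReLU}\!\left(\frac{\partial Y^c}{\partial A_{ij}^k}\right)$$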

Where Yᶜ is the final classification score and ReLU refers to the non-linear activation function.

Accounting for the fact that each set of pixels has a different weighting scheme:
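As derived in the Grad-CAM++ paper, these gradient weights are expressed in terms of the second- and third-order partial derivatives of the class score:

$$\alpha_{ij}^{kc} = \frac{\dfrac{\partial^2 Y^c}{(\partial A_{ij}^k)^2}}{2\,\dfrac{\partial^2 Y^c}{(\partial A_{ij}^k)^2} + \displaystyle\sum_a \sum_b A_{ab}^k\, \dfrac{\partial^3 Y^c}{(\partial A_{ij}^k)^3}}$$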

This also shows mathematically that Grad-CAM++ is a generalization of the Grad-CAM algorithm: setting all the gradient weights α to the constant 1/Z recovers the Grad-CAM weights.

Grad-CAM++ is constrained by the assumption that the score of the particular class must be a smooth function of the feature maps; hence the penultimate-layer score is passed through an exponential function, which is infinitely differentiable. Higher-order derivatives are then neglected, and the class-discriminative saliency map is computed as a linear combination of the forward activation maps followed by a ReLU. Point-wise multiplication of the up-sampled saliency map with the pixel-space visualization generated by guided backpropagation yields the final high-resolution, class-discriminative visualization.
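Concretely, with the smoothing Yᶜ = exp(Sᶜ) (Sᶜ being the penultimate-layer score) and the higher-order derivatives of Sᶜ neglected as in the paper, the saliency map and the gradient weights reduce to expressions in the first-order gradients alone:

$$L_{ij}^c = \mathrm{ReLU}\!\left(\sum_k w_k^c\, A_{ij}^k\right), \qquad \alpha_{ij}^{kc} = \frac{\left(\frac{\partial S^c}{\partial A_{ij}^k}\right)^{2}}{2\left(\frac{\partial S^c}{\partial A_{ij}^k}\right)^{2} + \sum_a \sum_b A_{ab}^k \left(\frac{\partial S^c}{\partial A_{ij}^k}\right)^{3}}$$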

In order to achieve the best saliency map construction, Grad-CAM++ uses a weighted combination of the positive gradients.
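As a rough illustration of how these pieces fit together, here is a minimal PyTorch sketch (not from the original post) that computes a Grad-CAM++ heat map using the first-order-gradient simplification above; the VGG-16 backbone, the chosen convolutional layer, and the target class index are illustrative assumptions:

```python
import torch
import torch.nn.functional as F
from torchvision import models


def grad_cam_plus_plus(model, image, target_class, conv_layer):
    """Compute a Grad-CAM++ heat map for a single preprocessed image (1, C, H, W)."""
    activations = []

    # Capture the forward activations A^k of the chosen convolutional layer.
    handle = conv_layer.register_forward_hook(
        lambda module, inputs, output: activations.append(output)
    )

    model.eval()
    scores = model(image)          # raw (pre-softmax) class scores S
    handle.remove()

    score = scores[0, target_class]
    A = activations[0]             # shape (1, K, u, v)

    # First-order gradients dS^c / dA^k_ij.
    grads = torch.autograd.grad(score, A)[0]

    # Under the exp(S^c) smoothing, the second- and third-order derivatives
    # reduce to element-wise powers of the first-order gradients.
    grads_2 = grads ** 2
    grads_3 = grads ** 3

    # alpha_ij^kc = g^2 / (2 g^2 + (sum_ab A^k_ab) g^3), with a small epsilon.
    sum_A = A.sum(dim=(2, 3), keepdim=True)
    alpha = grads_2 / (2 * grads_2 + sum_A * grads_3 + 1e-8)

    # w_k^c = sum_ij alpha_ij^kc * ReLU(gradient).
    weights = (alpha * F.relu(grads)).sum(dim=(2, 3), keepdim=True)

    # L^c = ReLU(sum_k w_k^c A^k), up-sampled to the input resolution.
    cam = F.relu((weights * A).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0]               # (H, W) heat map in [0, 1]


# Example usage with a pretrained VGG-16 and its last convolutional layer.
model = models.vgg16(pretrained=True)
image = torch.randn(1, 3, 224, 224)   # stand-in for a normalized input image
heatmap = grad_cam_plus_plus(model, image, target_class=243,
                             conv_layer=model.features[28])
```

Because of the exponential smoothing, only a single backward pass is needed: the second- and third-order terms are just element-wise powers of the first-order gradients.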

The improved ability of Grad-CAM++ makes troubleshooting these black-box models easier. Grad-CAM++ also produces heat maps over all relevant regions, which works well with scattered, fused, or occluded objects as well as multiple instances of a single object. That is to say, Grad-CAM++ generates heat maps that are more specific to the class than those of CAM or Grad-CAM.

References:

  1. Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V. N. Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks. WACV 2018. https://www.groundai.com/project/grad-cam-generalized-gradient-based-visual-explanations-for-deep-convolutional-networks/