Implementation of Class Activation Map (CAM) with PyTorch
Before going into the implementation of CAM, let me give a general overview of what CAM is.
What is a Class Activation Map (CAM)?
The paper Learning Deep Features for Discriminative Localization introduces the concept of a Class Activation Map. A class activation map for a particular category indicates the image regions the CNN used to identify that category.
The CNN model is composed of numerous convolutional layers, and we perform global average pooling just before the final output layer. The resulting features are fed to a fully connected layer with softmax activation to produce the desired output. By projecting the output layer weights back onto the feature maps of the last convolutional layer, we can identify the importance of each image region. This technique is referred to as Class Activation Mapping [1].
So let us get started. I am going to use the VGG16 model to implement CAM. There are a few things we need to import:
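Something along these lines covers everything used in this walkthrough (a sketch; trim what you do not need):

```python
import numpy as np
import cv2
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms
from PIL import Image
```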
Next, we load the dataset folder and transform the images inside it:
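A minimal sketch, assuming a standard ImageFolder layout; the path data/train is a placeholder, and the normalization uses the usual ImageNet statistics:

```python
# 'data/train' is a placeholder path; point it at your own folder,
# laid out as data/train/<class_name>/<image files>.
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # VGG16 expects 224x224 inputs
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])
train_dataset = datasets.ImageFolder('data/train', transform=transform)
```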
After transforming the data, we load the dataset into a DataLoader:
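For example (the batch size of 32 is an arbitrary choice):

```python
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
```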
Let us remove the fully connected layer of VGG16, which we can do with the single line of code given below.
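One way to do it: vgg.features is torchvision's name for the convolutional part of VGG16, so grabbing it alone leaves the avgpool and fully connected classifier behind.

```python
vgg = models.vgg16(pretrained=True)
# The single line: keep only the convolutional backbone of VGG16,
# discarding its avgpool and fully connected classifier.
features = vgg.features
```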
The last convolutional layer of VGG16 gives an output of shape [512, 7, 7], where 512 is the number of channels and 7x7 is the height and width. We therefore have to reshape this output to get our class scores. First we flatten [512, 7, 7] to [512, 49], then average over the 49 spatial positions with mean(1) to obtain a 512-dimensional vector; this is the global average pooling step. Finally, a Linear layer takes those 512 input features and outputs c features, where c is the number of classes you have. The code for this is given below.
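A sketch of that head as a small module (GAPHead is a name I made up; note that once a batch dimension is present, the pooling described above as mean(1) becomes mean(2)):

```python
class GAPHead(nn.Module):
    """Global average pooling followed by a linear classifier."""

    def __init__(self, in_channels=512, num_classes=2):
        super().__init__()
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        # [batch, 512, 7, 7] -> [batch, 512, 49]
        x = x.view(x.size(0), x.size(1), -1)
        # Global average pooling over the 49 positions: -> [batch, 512]
        # (mean(1) in the text refers to a single [512, 49] matrix;
        # with a batch dimension it becomes mean(2))
        x = x.mean(2)
        # Linear layer: [batch, 512] -> [batch, num_classes]
        return self.fc(x)
```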
To attach this head to the end of VGG16, we can add this simple line of code:
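For instance, where num_classes is the c from above, read off the dataset:

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
num_classes = len(train_dataset.classes)  # the "c" from above
model = nn.Sequential(features, GAPHead(512, num_classes)).to(device)
```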
We only need to optimize the parameters of the last layer, the one we added (fc = nn.Linear()). To get the last layer's parameters we can simply do:
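A sketch, freezing the pretrained backbone so only the new fc layer is updated (the learning rate and momentum are arbitrary choices):

```python
# Freeze the pretrained backbone so only the new fc layer is trained.
for p in features.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(model[1].fc.parameters(), lr=0.01, momentum=0.9)
```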
We use cross-entropy as our loss function. After that, we start our training process:
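A minimal training loop along those lines (the epoch count is an assumption; tune it for your dataset):

```python
criterion = nn.CrossEntropyLoss()

num_epochs = 10  # an assumption; tune for your dataset
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f'epoch {epoch + 1}: loss = {running_loss / len(train_loader):.4f}')
```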
After the training is done, we need to extract the weights from the model's parameters.
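Since the fc weight and bias are the last two parameters of this model, something like the following works; weight ends up with shape [num_classes, 512], one weight vector per class:

```python
# The fc weight is the second-to-last parameter (the last is its bias).
params = list(model.parameters())
weight = np.squeeze(params[-2].data.cpu().numpy())
```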
Let us take a look at the CAM function. It takes three parameters: feature_conv, weight, and class_idx. feature_conv contains the feature map of the last convolutional layer, weight holds the weights extracted from the trained parameters, and class_idx contains the label of the class with the highest probability.
The return_CAM function up-samples the feature map and multiplies it by the weights of that class to get the heatmap.
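A sketch of return_CAM, closely following the reference implementation that accompanies the CAM paper (the 256x256 up-sampling size is a convention from that code):

```python
def return_CAM(feature_conv, weight, class_idx):
    # feature_conv: [1, 512, 7, 7] feature map from the last conv layer
    size_upsample = (256, 256)
    bz, nc, h, w = feature_conv.shape
    output_cam = []
    for idx in class_idx:
        # Weighted sum of the 512 feature maps for this class
        cam = weight[idx].dot(feature_conv.reshape((nc, h * w)))
        cam = cam.reshape(h, w)
        # Normalize to [0, 255] and up-sample for visualization
        cam = cam - np.min(cam)
        cam_img = np.uint8(255 * cam / np.max(cam))
        output_cam.append(cv2.resize(cam_img, size_upsample))
    return output_cam
```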
The code below is used to normalize and preprocess the test images:
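A sketch, mirroring the training transforms (same ImageNet normalization statistics):

```python
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```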
Now let's take a look at the testing process:
I am reading images from a folder, where IMG_URL contains the paths of all of the images in the folder. After preprocessing, each image is sent to the model and the output is stored in logit. Here logit has shape torch.Size([1, c]) (c is the number of classes you have), so we apply F.softmax(logit, dim=1).data.squeeze() to convert it to torch.Size([c]). That lets us sort the probabilities and pick the class label with the highest probability. The variable features_blobs contains the feature map of the last convolutional layer, which is converted to NumPy because NumPy values are easier to work with.
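Putting it together, a sketch of the test loop: IMG_URL is assumed to be a list of image file paths, a forward hook fills features_blobs with the backbone's output, and the heatmap overlay at the end follows the usual CAM visualization recipe.

```python
# A forward hook on the conv backbone captures the last convolutional
# feature map on every forward pass.
features_blobs = []

def hook_feature(module, input, output):
    features_blobs.append(output.data.cpu().numpy())

model[0].register_forward_hook(hook_feature)

model.eval()
for img_path in IMG_URL:  # IMG_URL: list of image paths (assumption)
    img = Image.open(img_path).convert('RGB')
    img_tensor = preprocess(img).unsqueeze(0).to(device)  # [1, 3, 224, 224]

    with torch.no_grad():
        logit = model(img_tensor)                   # torch.Size([1, c])
    probs = F.softmax(logit, dim=1).data.squeeze()  # torch.Size([c])
    probs, idx = probs.sort(0, descending=True)
    class_idx = [idx[0].item()]                     # top-1 class label

    CAMs = return_CAM(features_blobs[-1], weight, class_idx)

    # Overlay the heatmap on the original image and save the result.
    img_cv = cv2.imread(img_path)
    height, width, _ = img_cv.shape
    heatmap = cv2.applyColorMap(cv2.resize(CAMs[0], (width, height)),
                                cv2.COLORMAP_JET)
    result = np.uint8(heatmap * 0.3 + img_cv * 0.5)
    cv2.imwrite('CAM_' + img_path.split('/')[-1], result)
```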
Here are some results:
Reference
[1] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning Deep Features for Discriminative Localization. In CVPR, 2016.