Ch 7. Decoding the Black Box of CNNs Using Feature Map Visualizations

How to ask CNN architectures useful questions to get insights about their behaviours

Lucrece (Jahyun) Shin
CodeX
16 min read · Oct 19, 2021


**Edit on February 10, 2020:** I have released an open-source PyTorch library for the visualization tools covered in this post, which you can install with pip install FeatureMapVisualizer.

How can we find the key 🔑 to get inside the black box of CNN architectures?

Motivation: CNNs are NOT inherent black boxes

Have you ever wanted to ask an animal a question? 🐈 Since we haven’t figured out a precise way to communicate with other animals, they often feel like a black box. Human-to-human communication feels more transparent since we can ask questions and exchange thoughts. Perhaps, then, nothing is inherently a black box; we just haven’t found a way to ask questions and obtain answers in a comprehensible way.

So why are deep learning models so often labelled black boxes? Because, due to the human-computer language barrier, we don’t know how to ask them appropriate questions. In this post, I will discuss how feature map visualizations can be used to ask CNN architectures useful questions in order to get insights about their behaviours. Here are the five questions I asked CNNs for this Q&A session:

  1. Which muscles are you using to recognize each target class?
  2. Can you draw me a knife?
  3. Which parts of the image are you looking at?
  4. Do you really have no idea what a gun looks like?
  5. Are top feature maps for a class really unique to that class?

Q1 — Which muscles are you using to recognize each target class?

Translation: Which specific feature maps from the last convolutional layer are activated the most when the model sees images of a particular class?

(1.1) Specialized muscles for an action 💪

From a young age, we have trained our muscles to control different body parts to produce a wide range of movements. When we learn a new physical task, we selectively use and train muscles required for that particular task. For example, if an adult is learning to play the piano, muscles in the fingers, wrists, and arms will be mostly activated. If the person experiences pain or stiffness while playing the piano, it would make more sense to get a checkup on fingers or wrists rather than doing a general health screening.

(1.2) Specialized feature maps for a class

The case is similar for a CNN architecture that is:

  • pre-trained on ImageNet dataset that contains 1000 classes (wide range of movements)
  • fine-tuned on a new dataset containing much fewer classes than 1000 (new physical task)

When an input image propagates through the CNN architecture, each convolutional layer produces hundreds or even thousands of feature maps (i.e. output activations). Instead of looking at all of them, we can investigate a subset of specialized feature maps whose activations are maximized for each class of images. This is more targeted than popular visualization techniques such as Grad-CAM, which take a (gradient-weighted) average over all feature maps instead of examining them individually.

(1.3) Feature maps from the last convolutional layer

In particular, I found the most activated feature maps from the model’s last convolutional layer, because I was more interested in the high-level features (e.g. object parts or shapes) captured by the last few layers. Earlier layers capture low-level features like edges and textures.
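As a quick reference, here is one way to locate that layer in PyTorch. This is a minimal sketch assuming torchvision’s ResNet50; for other architectures the attribute path differs:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)

# Collect every Conv2d module in forward order; the last one is the layer
# whose feature maps we will inspect (2,048 output channels for ResNet50).
conv_layers = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
last_conv = conv_layers[-1]
print(last_conv)  # Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
```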

(1.4) Procedure

Here is the procedure for finding feature maps from the last convolutional layer whose activations maximize uniquely for each target class. I will refer to them as “top feature maps” for each class.

  1. Set N1: # of top feature maps to select for each image and N2: # of top feature maps to select for each class of images.
  2. Organize images by class.
  3. For each class, pass each of its images through the model one by one and aggregate a list of the N1 most activated feature maps for each image. For example, if class cat has 3 images, N1 is 3, and the lists of top 3 activated feature maps for each image are [1,2,4], [1,3,4] and [1,5,6], the aggregated list becomes [1,1,1,2,3,4,4,5,6].
  4. From each class’s aggregated list, select N2 most frequent feature maps. With the same example, if N2 is 2, the list of top feature maps for class cat becomes [1,4]. The output of this step is n_classes lists of top feature maps, each of length N2.
  5. Remove any feature maps from each class’s top feature maps list that are also present in another class’s top list. If the top feature maps list for class dog is [1,10], the lists for cat and dog will be reduced to [4] and [10] since 1 is present in both lists.

(1.5) Pseudocode

Here is the pseudocode for this algorithm:

Pseudocode for Finding Top N Feature Maps for Each Class
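As an illustration, here is a minimal PyTorch sketch of the same steps. The hook, the per-class data loaders, and the function name are my own simplifications, not the released library’s API; `loaders_by_class` is assumed to map each class name to a DataLoader yielding (images, labels) batches:

```python
from collections import Counter
import torch

def top_feature_maps_per_class(model, last_conv, loaders_by_class,
                               N1=25, N2=25, device="cuda"):
    """Steps 1-4: for each class, find the N2 feature maps that appear most
    often among the N1 most activated maps of its individual images."""
    feats = {}
    handle = last_conv.register_forward_hook(
        lambda module, inp, out: feats.update(maps=out.detach()))

    model.eval().to(device)
    top_maps = {}
    with torch.no_grad():
        for cls, loader in loaders_by_class.items():          # step 2: images organized by class
            counts = Counter()
            for images, _ in loader:
                model(images.to(device))
                per_map = feats["maps"].mean(dim=(2, 3))       # (batch, n_maps) mean activation
                for row in per_map:                            # step 3: top N1 maps per image
                    counts.update(row.topk(N1).indices.tolist())
            top_maps[cls] = [m for m, _ in counts.most_common(N2)]   # step 4
    handle.remove()

    # Step 5: drop feature maps that appear in more than one class's top list.
    shared = Counter(m for maps in top_maps.values() for m in maps)
    return {cls: [m for m in maps if shared[m] == 1] for cls, maps in top_maps.items()}
```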

I used N1 = 25 and N2 = 25, meaning that I looked at 25 unique top feature maps for each image and also each target class. In the case of ResNet50 fine-tuned for binary classification, the number of feature maps to investigate was reduced from 2,048 (= # of filters in ResNet50’s last convolutional layer) to at most 50 (= 25 × 2).

Q2 — Can you draw me a knife?

Translation: What kind of patterns or shapes are captured by the top feature maps for the knife class?

(2.1) Basic idea

After narrowing down the specialized feature maps for each class, I investigated which prominent shapes or object parts are captured by each INDIVIDUAL top feature map. Inspired by this article, the idea is to first generate a small random image (I used 33 by 33 pixels) using np.random.uniform as shown below, then iteratively adjust its pixels in the direction that maximizes the selected feature map’s activation. This is done by minimizing a loss equal to the negative of the sum of the feature map’s activations. At each iteration, after optimizing, the image is also upscaled by a small factor using cv2.resize. These optimizing and upscaling steps are repeated until clear patterns become visible in the image.

A small random image

(2.2) Pseudocode

Here is the pseudocode for this algorithm:

Pseudocode for Visualizing Patterns Captured by a Single Feature Map (inspired by: https://towardsdatascience.com/how-to-visualize-convolutional-features-in-40-lines-of-code-70b7d87b0030)
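And here is a rough PyTorch sketch of the idea. It is a simplification, not the exact code from the article or my library: the input normalization and regularization details are omitted, last_conv is the layer located earlier, and the function name is mine:

```python
import cv2
import numpy as np
import torch

def visualize_feature_map(model, last_conv, map_idx, size=33, opt_steps=20,
                          upscale_steps=20, upscale_factor=1.2, lr=0.1, device="cuda"):
    """Grow a random image whose pixels maximize one feature map's activation."""
    feats = {}
    handle = last_conv.register_forward_hook(
        lambda module, inp, out: feats.update(maps=out))   # keep the graph for backprop

    model.eval().to(device)
    for p in model.parameters():                           # only the image pixels are optimized
        p.requires_grad_(False)

    img = np.random.uniform(150, 180, (size, size, 3)) / 255.0   # small random image

    for _ in range(upscale_steps):
        x = torch.tensor(img.transpose(2, 0, 1)[None], dtype=torch.float32,
                         device=device, requires_grad=True)
        optimizer = torch.optim.Adam([x], lr=lr)
        for _ in range(opt_steps):
            optimizer.zero_grad()
            model(x)
            loss = -feats["maps"][0, map_idx].sum()   # negative sum of the map's activation
            loss.backward()
            optimizer.step()
        img = x.detach().cpu().numpy()[0].transpose(1, 2, 0)
        size = int(size * upscale_factor)              # upscale before the next round
        img = cv2.resize(img, (size, size), interpolation=cv2.INTER_CUBIC)

    handle.remove()
    return np.clip(img, 0, 1)
```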

I used 20 optimizing steps, 20 upscaling steps, and an upscale factor of 1.2. I will share some of the generated images using a model pre-trained on ImageNet and fine-tuned for gun vs. knife binary classification. They were very interesting, resembling Inceptionism AI art.

(2.3) Draw me a knife.

This image shows patterns captured by a top feature map for the knife class:

Patterns captured by a top feature map for class “knife”

Cool, huh? It’s interesting that this feature map repeatedly captures the pointy triangular shape of the knife blade. Also, if you look closely, there are tiny repeated square-ish shapes containing something that looks like an eye 👁. My guess is that this could be a leftover from an animal class in the ImageNet dataset: for the original 1000-class ImageNet classification, this feature map may have specialized in recognizing an animal class by looking at its eyes.

This image shows patterns captured by another top feature map for the knife class:

Patterns captured by another top feature map for class “knife”

This one also captures the patterns of a pointy knife tip, but this time a little longer and curvier than the previous one. Maybe this feature map is more specialized in recognizing knives with curvy blades.

(2.4) Draw me a gun.

Now here’s an image showing patterns captured by a top feature map for the gun class:

Patterns captured by a top feature map for class “gun”

This feature map seems to capture the cylindrical shape of the gun barrel. I’m also guessing that the repeated round shapes resembling wing feathers might be the shape of bullets, which are often present in images of guns.

This image was generated with another top feature map for the gun class:

Patterns captured by another top feature map for class “gun”

This feature map captures more of the right-angled shape of the entire gun, and possibly the shape of a double barrel. It’s worth noting that the repeated gun shapes in different orientations and sizes may reflect the model’s robustness in detecting the object regardless of its orientation or size in the image.

The following images show the patterns captured by other top feature maps for the gun and knife classes:

Patterns captured by other top feature maps for class “gun”
Patterns captured by other top feature maps for class “knife”

(2.5) Visual Perception of Human vs. AI

With this visualization method, the model seems to be answering the question “Can you draw me a knife (gun)?”. The generated images are quite different from what we would draw: most probably a single object’s entire shape rather than a repetition of specific parts. But when we picture an object, we do remember its silhouette or prominent shapes (blades, barrels), which are exactly what some of the top feature maps above capture. So it’s worth thinking about how close the visual perception of CNNs comes to our own, given the widely repeated notion that neural networks were invented to imitate our brain.

Q3 — Which parts of the image are you looking at?

Translation: Which parts of the image is a feature map paying attention to when classifying the image as a particular class?

(3.1) Single image, Multiple feature maps

Another useful technique is finding feature maps whose activations are maximized for a single image and highlighting each map’s most attended regions by overlaying its activation map on top of the image. The previously mentioned Grad-CAM algorithm answers a similar question, namely which parts of the image the entire last convolutional layer is paying attention to, but it does not go as deep as looking at each individual feature map.

(3.2) Procedure

  1. Forward an image through a model.
  2. Save the forward propagation activations of all feature maps from the last convolutional layer using torch.nn.Module.register_forward_hook(hook_function) (PyTorch).
  3. Pick N feature maps with largest activations (aka top N feature maps).
  4. Upscale the top N feature maps (usually something like 7 by 7) to match the input image size (e.g. 224 by 224) using cv2.resize.
  5. Overlay each of the N upscaled feature maps on top of the image and plot them in order from the largest to the smallest activation.

*Important Note: Top N feature maps (for a single image) from step 3 are not the same as the top feature maps (for an entire class) found in Section 1. Here, we are finding top feature maps whose activations maximize for a single image, whereas in Section 1 we found top feature maps for a single class from the aggregated list of top feature maps for ALL images belonging to that class.
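Here is a minimal PyTorch sketch of this procedure. The function name, the plotting layout, and the default N are my own choices; image_tensor is assumed to be a 3×H×W tensor with values in [0, 1]:

```python
import cv2
import matplotlib.pyplot as plt
import torch

def show_top_maps_for_image(model, last_conv, image_tensor, N=9, device="cuda"):
    """Overlay the N most activated feature maps of the last conv layer on one image."""
    feats = {}
    handle = last_conv.register_forward_hook(
        lambda module, inp, out: feats.update(maps=out.detach()))
    model.eval().to(device)
    with torch.no_grad():
        model(image_tensor[None].to(device))                      # steps 1-2: forward pass + hook
    handle.remove()

    maps = feats["maps"][0]                                        # (n_maps, 7, 7) for ResNet50
    order = maps.mean(dim=(1, 2)).argsort(descending=True)[:N]     # step 3: top N maps

    img = image_tensor.permute(1, 2, 0).cpu().numpy()              # (H, W, 3), values in [0, 1]
    H, W = img.shape[:2]
    fig, axes = plt.subplots(1, N, figsize=(3 * N, 3))
    for ax, idx in zip(axes, order.tolist()):
        heat = cv2.resize(maps[idx].cpu().numpy(), (W, H))         # step 4: upscale to image size
        ax.imshow(img)
        ax.imshow(heat, cmap="jet", alpha=0.5)                     # step 5: overlay on the image
        ax.set_title(f"map {idx}")
        ax.axis("off")
    plt.show()
```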

(3.3) Top feature maps for a knife image

Here I visualized the top 60 feature maps’ activations for an image containing a knife and other objects, using the same gun vs. knife binary classification model:

Top 60 feature maps for a knife image. The two feature maps enclosed in red boxes indicate the top two feature maps for knife class found using the algorithm in Section 1.

Some feature maps highlight the knife in the image, while others highlight the carrots, potatoes, or the cutting board. The two red boxes (the 4th and 9th most activated feature maps) indicate the top two feature maps for the knife class found in Section 1. The fact that these two feature maps highlight the knife, especially its pointy tip, confirms that they are indeed sensitive to the shape of the knife tip.

(3.4) Top feature maps for images with large white background

Here I visualized top feature maps using the same gun vs. knife binary classification model, this time for images with a large white background:

Top 90 activated feature maps for a knife image. The two feature maps enclosed in red boxes indicate the top two feature maps for knife class found using the algorithm in Section 1.
Top 60 activated feature maps for a gun image. The top two feature maps enclosed in red boxes are also the top two activated feature maps for gun class found using the algorithm in Section 1.

(3.5) Strangeness of the top feature maps for the knife image

While looking at the top feature maps’ attended regions for the knife image, I spotted something strange. Many of them highlight parts of the white background instead of the knife, while most top feature maps for the gun image do highlight the gun rather than the white background. In addition, the knife class’s top two feature maps from Section 1 (enclosed in red boxes) appear much further down in the plot (53rd and 90th most activated), whereas the gun class’s top two feature maps from Section 1 (also enclosed in red boxes) coincide with the top two activated feature maps for the gun image. What’s going on?

(3.6) Why do you classify an image of a basketball as knife with high confidence?

A gun vs. knife binary classification model confidently classifies most images containing neither class as knife.

In my previous Ch.4 post, I mentioned the gun vs. knife binary classification model’s unintuitive behaviour of confidently classifying most images containing neither class as knife, as shown above. It was difficult for me to understand why the model would lean so heavily towards knife when classifying a completely unrelated image.

I believe the observation from Section 3.5 suggests a possible explanation for this strange behaviour. The model may have learned to perceive a solid background as one of the knife class’s characteristic features (not necessarily a feature of the “knife object” as humans know it, but of “class 1”; I discussed this black-and-white binary decision boundary in my Ch.5 post). So when it sees an image that does not contain a gun and has a large solid background, it classifies it as knife.

So why would the solid background be recognized as one of the knife class’s characteristic features? Let’s think about the data, i.e. what would differ between the training images of knives and guns. It may be that knife images generally contain a larger background area: since all input images are square, a knife, with its long and skinny shape, fills less of the image than a gun, with its thicker, curved shape. This is an example of how feature-map-specific visualizations give us hints about how CNNs perceive images: quite literally, pixel by pixel, paying equal attention to background and foreground.

Q4 — Do you really have no idea what a gun looks like?

Translation: When given a bunch of images of the same class, can a top feature map for that class spot the object in all of them?

(4.1) Multiple images, Single feature map

This method is similar to the previous one (Section 3), but this time we overlay a SINGLE feature map’s activation maps on a group of images of the same class, as in the sketch below. I used the top feature maps for each class found in Section 1.
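This sketch simply inverts the loop from Section 3: fix one feature map index and iterate over a batch of images of the same class (the function name and plotting layout are illustrative; images is assumed to be a batch tensor with values in [0, 1]):

```python
import cv2
import matplotlib.pyplot as plt
import torch

def show_one_map_over_images(model, last_conv, images, map_idx, device="cuda"):
    """Overlay a SINGLE feature map's activation on every image in a batch."""
    feats = {}
    handle = last_conv.register_forward_hook(
        lambda module, inp, out: feats.update(maps=out.detach()))
    model.eval().to(device)
    with torch.no_grad():
        model(images.to(device))          # images: (batch, 3, H, W) tensor, values in [0, 1]
    handle.remove()

    fig, axes = plt.subplots(1, len(images), figsize=(3 * len(images), 3))
    for i, ax in enumerate(axes):
        img = images[i].permute(1, 2, 0).cpu().numpy()
        heat = cv2.resize(feats["maps"][i, map_idx].cpu().numpy(),
                          (img.shape[1], img.shape[0]))
        ax.imshow(img)
        ax.imshow(heat, cmap="jet", alpha=0.5)   # the single map's attended regions
        ax.axis("off")
    plt.show()
```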

(4.2) Target domain confusion matrix

Before getting into the visualization, here’s the target domain confusion matrix of a VGG16 model trained for gun vs. knife binary classification:

Target (Xray) Domain Confusion Matrix

This “target domain” confusion matrix shows the performance of the model trained on the source domain and tested on the target domain. In this case, the source domain refers to normal camera images of guns and knives (“web” images, since they were scraped from the web), while the target domain refers to Xray images of baggage containing guns and knives.

Web images (source domain) and Xray images (target domain) containing knife and gun

The confusion matrix shows poor performance due to the apparent texture shift from web images to Xray images:

  • Gun recall is only 24%.
  • Knife recall is 100%, but the model is classifying 100% of benign Xray images (that contain neither gun nor knife) and 87% of Xray gun images as knife as well.

Thus, if we only looked at the confusion matrix, we would most probably conclude that the model has failed to learn the shape of a gun or knife well.

(4.3) Activation maps of a gun class top feature map for gun images

I found the top feature maps for the gun class using the procedure in Section 1 on source domain (web) images. The figure below shows what a top feature map for the gun class highlights in a group of Xray gun images. It’s interesting to see that this particular feature map actually does a good job of highlighting the guns in the images!

Activations of a gun-class’s top feature map for Xray gun images

If we judged the model by looking only at the confusion matrix showing 24% recall for gun, we would have doubted whether the model could detect guns at all in Xray images. However, this visualization illustrates that there exist some very specific feature maps in the model’s last convolutional layer that are particularly sensitive to the shape of a gun. This feature map is almost acting as an object recognizer for guns.

Here is the same visualization for another model with improved gun recall. You can see that this model highlights the gun in the Xray images more clearly than the previous one.

Activations of a gun-class’s top feature map for Xray gun images for another model with improved gun recall

(4.4) Alternate prediction algorithm

With the above observation, a light bulb 💡 went off in my head about a possible new prediction algorithm. To classify an Xray image as gun, instead of going through the fully connected and softmax layers after the last convolutional layer, we could check whether this gun-detecting feature map is activated above a certain threshold. This idea led me to the fifth visualization technique.

Q5 — Are top feature maps for a class really unique to that class?

(5.1) Plotting the SUM of activations of class top feature maps for EACH image

Here, for each class, I added up the activations of all 25 class-wise top feature maps (found in Section 1) for EACH test image, and plotted them on the same graph with different colours representing different classes. I used the top feature maps found using web images in Section 1, and added up their activations on Xray images for plotting. This was useful for checking whether the top feature maps for each class are maximally activated only for that class, even under the web → Xray texture shift.
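A minimal sketch of how these sums could be computed and plotted, assuming the per-class test loaders and the class_top_maps list found in Section 1 (this is not the library’s exact interface):

```python
import matplotlib.pyplot as plt
import torch

def plot_summed_activations(model, last_conv, loaders_by_class, class_top_maps, device="cuda"):
    """Plot, for every test image, the sum of activations of one class's top feature maps,
    with a separate line (colour) for each true class of the test images."""
    feats = {}
    handle = last_conv.register_forward_hook(
        lambda module, inp, out: feats.update(maps=out.detach()))
    model.eval().to(device)

    for cls, loader in loaders_by_class.items():
        sums = []
        with torch.no_grad():
            for images, _ in loader:
                model(images.to(device))
                # Sum the selected top feature maps' activations for each image in the batch.
                sums += feats["maps"][:, class_top_maps].sum(dim=(1, 2, 3)).tolist()
        plt.plot(sums, label=f"{cls} Xray images")
    handle.remove()

    plt.xlabel("image index")
    plt.ylabel("sum of top feature map activations")
    plt.legend()
    plt.show()
```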

I will present the plots for three different models for comparison. All three models were tested with images of three different classes: benign, gun, and knife. I will reveal the identities of the models after comparing the plots. The plots’ axes represent the following:

  • x-axis: image index within each class (300 benign, 100 gun, and 30 knife images in these examples)
  • y-axis: sum of activations of the class’s top feature maps

(5.2) Model #1

Plot of the sum of activations of top feature maps for gun class (left) and knife class (right) using Model #1

The left plot, for the gun class’s top feature maps, shows the blue lines (gun Xray images) standing higher than the red and pink lines (knife and benign Xray images). In contrast, the right plot, for the knife class’s top feature maps, shows a large overlap among all three colours. Perhaps this model has better recall for gun than for knife.

(5.3) Model #2

Plot of the sum of activations of top feature maps for gun class (left) and knife class (right) using Model #2

Model #2 shows an improvement for the knife class, with the right plot showing the red lines (knife Xray images) standing higher than the blue and pink lines. The left plot, however, shows some overlap between the blue lines (gun Xray images) and the other lines.

(5.4) Model #3

Plot of the sum of activations of top feature maps for gun class (left) and knife class (right) using Model #3. The dotted line indicates a possible threshold: if the sum exceeds it, the model can raise an alarm for the possible presence of the threat object.

Model #3 shows a great improvement in both plots, with distinctly higher sums of activations for both the gun and knife classes. This model likely has the highest recall for both classes.

(5.5) Identities of the three models

Here are the identities of the three models:

Model #1

  • VGG16 pre-trained on ImageNet
  • fine-tuned for gun vs. knife binary classification

Model #2

Model #3

You can see how data or model tweaks such as multi-label training and domain adaptation make the final-layer feature maps more sensitive to the unique characteristics of each class’s object, enhancing the model’s class-discriminating ability.

(5.6) Alternate Threat Detection Algorithm

These plots also suggest an alternative threat detection algorithm for gun and knife. Given a test image, we first compute the sum of the top feature map activations for each threat class. Then, if the sum is above a certain threshold (such as the ones indicated by the dotted lines in the Model #3 plots), the model can raise an alarm for the possible presence of the threat object.
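A hedged sketch of what that alternative algorithm could look like; the thresholds dictionary holds hypothetical values read off plots like the Model #3 ones, and top_maps_per_class comes from the Section 1 procedure:

```python
import torch

def detect_threats(model, last_conv, image_tensor, top_maps_per_class, thresholds, device="cuda"):
    """Flag a threat class if the summed activation of its top feature maps
    exceeds a class-specific threshold."""
    feats = {}
    handle = last_conv.register_forward_hook(
        lambda module, inp, out: feats.update(maps=out.detach()))
    model.eval().to(device)
    with torch.no_grad():
        model(image_tensor[None].to(device))
    handle.remove()

    alarms = {}
    for cls, maps in top_maps_per_class.items():
        total = feats["maps"][0, maps].sum().item()   # summed activation of this class's top maps
        alarms[cls] = total > thresholds[cls]
    return alarms   # e.g. {"gun": False, "knife": True}
```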

Closing: ML Research is an ITERATIVE Process

How was the Q&A session? Did the feature map visualizations help you better understand the inner workings of a CNN architecture? They certainly did for me. They became part of the iterative process of ML problem solving by making me think about:

  • How I recognize objects in images: through visual cues (shape, outline, texture, colour) and non-visual cues (thinking about the object’s physical use, unconsciously applying the laws of physics: a knife floating above a table is weird, but a knife floating on a solid white background is OK)
  • How to design the data, model architecture, and loss function to train the model to effectively detect the visual cues (i.e. narrowing the gap between the visual perception of humans and CNNs: discussed in my Ch.6 post)
  • How to improve the design after looking through the visualizations that help understand the mechanisms used by the model to classify images

Before doing my masters, I was more result-oriented, finishing one deep learning project after another every few weeks. But as I worked on a single research project for a whole year during my masters, I became curious about model behaviours and spent more time studying why the model was giving good or bad results. So what was my key 🔑 to getting inside the black box of CNN architectures? It was the curiosity and persistence to keep investigating what I didn’t understand about the model.

Again, the step-by-step instructions and code (in PyTorch) for the methods discussed in this post are in my Colab notebook and Github repository. Feel free to contact me with any questions or feedback 😊. Thanks for reading! 🌸

- L ☾₊˚.
