Demystifying Hidden Units in Neural Networks through Network Dissection

Researchers at MIT’s CSAIL elucidate the thought process behind neural network predictions through their fascinating paper “Network Dissection: Quantifying Interpretability of Deep Visual Representations”.

Reshma Abraham
Analytics Vidhya
6 min read · May 16, 2021


Photo by Alina Grubnyak on Unsplash

Have you ever wondered how a Neural Network (NN) arrives at its predictions once it's trained? Wouldn't it be interesting to dissect an NN and find out what the hidden units have learned? How do you think hidden units contribute to an NN's predictions after training? Well, one has plenty of time to ponder such intricacies of deep networks while one's model is training. Alas, how can a novice in deep learning put a probe on the hidden units and interpret them? So I naturally discarded these thoughts, until I stumbled upon the paper “Network Dissection: Quantifying Interpretability of Deep Visual Representations”.

About the Paper:

Researchers from MIT’s CSAIL propose a technique called “Network Dissection”, in which they evaluate every individual convolutional unit in a CNN on a binary segmentation task to characterize the unit’s behavior. In other words, this method interprets networks by assigning meaningful labels to their hidden units. The authors show that interpreted units can be used to explain the individual image predictions made by a classifier.

In the past, observations of hidden units have shown that human-interpretable concepts sometimes emerge in individual units within networks. For example, object detector units have been observed within scene classification networks and part detectors have emerged in visual recognition tasks.

Using ‘Network Dissection’, the authors evaluate the emergence of such concept detectors in deep networks, quantify the interpretability of individual units in CNNs, and attempt to answer the question: ‘Do CNNs learn disentangled features?’.
Note: Disentangled features are narrowly defined hidden units that encode specific real-world concepts.

Network Dissection Method:

The interpretability of individual units is quantified by measuring the alignment between a hidden unit’s response and a set of visual concepts. Human-interpretable concepts include low-level concepts like colors and high-level concepts such as objects. By measuring the concept that best matches each unit, Net Dissection can break down the types of concepts represented in a layer.

Quantifying interpretability for individual units using Network Dissection proceeds in three steps:

1: Gather images with human-labeled visual concepts.

To identify ground truth exemplars for a broad set of visual concepts, the authors assembled a new heterogeneous dataset called Broden.

The Broadly and Densely Labeled Dataset (Broden) unifies several densely labeled image datasets: ADE, OpenSurfaces, Pascal-Context, Pascal-Part, and the Describable Textures Dataset. Together they contain examples of a broad range of objects, scenes, object parts, textures, and materials in a variety of contexts.

Figure: A sample of the types of labels in the Broden dataset. (Figure from Bau & Zhou et al., 2017)

There are around 60,000 images in the Broden dataset, with annotations spanning 1197 visual concepts. Images are pixel-wise labelled for most visual concepts, except texture and scene, where a single label is given for the full image. Additionally, every image pixel is labelled with one of 11 common color names. This way, every image gets an annotation mask L_c for every visual concept c.
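Concretely, the pixel-wise annotations reduce to one binary mask per concept. Below is a minimal sketch of that idea, assuming the annotations are stored as integer label maps; `load_annotation` and the concept id are hypothetical names for illustration, not part of the Broden tooling.

```python
import numpy as np

def concept_mask(label_map: np.ndarray, concept_id: int) -> np.ndarray:
    """Binary mask L_c: True wherever a pixel is annotated with concept c.

    `label_map` is assumed to be an (H, W) integer array of per-pixel
    concept ids, one plausible storage format for densely labelled data.
    Image-level concepts (scene, texture) would instead yield an
    all-True or all-False mask for the whole image.
    """
    return label_map == concept_id

# Hypothetical usage (file name and concept id are made up):
# label_map = load_annotation("broden/images/example_0001.png")
# L_grass = concept_mask(label_map, concept_id=42)
```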

2: Retrieve individual units’ activations.

To gather the response of individual units to concepts, images from the Broden dataset are fed into the CNN and a forward pass is performed (a short code sketch of these steps follows the list below).

  1. For each convolutional unit (k), feed each input image (x) from the Broden dataset to the CNN and compute the activation map, A_k(x).
    Activation maps are the outputs of the unit after a convolution operation.
    Note: In a unit, a kernel or filter is convolved with the input volume.
  2. Calculate the distribution of activations, a_k, over all images. a_k is a real-valued distribution.
  3. To convert the map into a binary map, compute a top quantile threshold T_k such that P(a_k > T_k) = 0.005. This means that 0.5% of all activations of unit k, taken over every Broden image and spatial location, exceed T_k.
  4. Generally, the deeper a layer is in the NN, the smaller its activation maps. In order to obtain a binary segmentation map at image resolution, use bilinear interpolation to upscale the lower-resolution activation map A_k(x) to the input image resolution, giving S_k(x).
  5. Binarize the activation map: a new mask, M_k(x) = S_k(x) ≥ T_k, is obtained such that a pixel is on or off depending on whether its upsampled activation exceeds the threshold T_k.
    Note: These activation masks mark the most highly activated regions of each image.
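Here is a minimal PyTorch sketch of these steps for a single layer. The tensor of stacked activation maps is assumed to have already been collected (for example with a forward hook); the function name and shapes are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def binary_activation_masks(activations: torch.Tensor,
                            image_size: tuple,
                            quantile: float = 0.995) -> torch.Tensor:
    """Sketch of step 2 for one convolutional layer.

    `activations` is assumed to be a tensor of shape (N, K, H, W) holding
    the activation maps A_k(x) of all K units for N Broden images (e.g.
    collected with a forward hook). Returns a boolean tensor of shape
    (N, K, H_img, W_img) containing the binary masks M_k(x).
    """
    N, K, H, W = activations.shape

    # T_k: per-unit threshold such that P(a_k > T_k) = 0.005, taken over
    # every image and spatial position. (For the full Broden set this
    # distribution would be accumulated in a streaming or sampled way
    # rather than held in memory at once.)
    flat = activations.permute(1, 0, 2, 3).reshape(K, -1)   # (K, N*H*W)
    T = torch.quantile(flat, quantile, dim=1)                # (K,)

    # S_k(x): bilinearly upsample the low-resolution maps to image resolution.
    S = F.interpolate(activations, size=image_size,
                      mode="bilinear", align_corners=False)

    # M_k(x) = S_k(x) >= T_k: binarize each unit against its own threshold.
    return S >= T.view(1, K, 1, 1)
```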

3: Quantify activation−concept alignment.

Now we have the human-labelled concept masks, L_c (from step 1), and the activation masks, M_k (from step 2). Next, we need to identify the visual concept that activates a particular unit. In other words, we try to identify which concept each unit is “looking for”.

This is done by comparing the activation masks with all labeled concepts. We quantify the alignment between activation mask, M_k and concept mask, L_c with the Intersection over Union (IoU) score.

Figure: Intersection over Union (IoU) score formula.

IoU score = (Number of pixels marked by both the activation mask M_k and the concept mask L_c) /
(Number of pixels marked by either mask), accumulated over all Broden images annotated with concept c.

Figure: Example of how IoU Score is computed. (Source: Interpretable Machine Learning)

The value of IoU_(k,c) is the accuracy of unit k in detecting concept c. We consider k as a detector of concept c if the IoU score exceeds a threshold.

The authors chose 0.04 as the IoU threshold for classifying a unit as a detector for a particular concept. A single unit can exceed this threshold for multiple concepts; for the analysis, the top-ranked (highest-IoU) label was chosen.
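A minimal sketch of this scoring and labelling step is below; the function names and the dictionary of per-unit IoU scores are illustrative, not the paper's code.

```python
import numpy as np

def iou_score(M_k: np.ndarray, L_c: np.ndarray) -> float:
    """IoU_(k,c): overlap between a unit's activation masks and a concept's
    annotation masks, both boolean arrays over the same set of pixels."""
    intersection = np.logical_and(M_k, L_c).sum()
    union = np.logical_or(M_k, L_c).sum()
    return float(intersection) / float(union) if union > 0 else 0.0

def assign_concept(ious_for_unit: dict, threshold: float = 0.04):
    """Label unit k with its top-ranked concept if the IoU exceeds 0.04.

    `ious_for_unit` maps concept name -> IoU_(k,c) for one unit k;
    returns the best concept label, or None if no concept passes.
    """
    concept, best = max(ious_for_unit.items(), key=lambda kv: kv[1])
    return concept if best > threshold else None
```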

To quantify the interpretability of a layer, the number of unique concepts identified among its units is reported as the number of unique detectors.
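For illustration, given the unit-to-concept assignments from the sketch above, counting unique detectors reduces to taking the size of a set; the toy labels below are made up.

```python
# Toy illustration: `unit_labels` maps each unit index in a layer to the
# concept assigned by assign_concept above (None = not interpretable).
unit_labels = {0: "grass", 1: None, 2: "car", 3: "grass", 4: "sky"}
unique_detectors = len({c for c in unit_labels.values() if c is not None})
print(unique_detectors)  # -> 3 unique concept detectors in this toy layer
```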

Experiments:

With the framework set out, the authors tested Network Dissection on different network architectures (AlexNet, GoogLeNet, VGG, ResNet) trained from scratch on different datasets (ImageNet, Places205, Places365). They also evaluated self-supervised training, e.g. AlexNet trained on auxiliary tasks such as solving puzzles and tracking.

ImageNet is an object-centric dataset with 1.2 million images from 1000 classes. Places205 and Places365 are scene-centric datasets with 205 and 365 categories respectively; Places205 contains 2.4 million images and Places365 contains 1.6 million images, from categories like kitchen and living room.

Following are some of their findings:

  1. The authors found detectors of high-level concepts at higher layers and low-level concepts at lower layers (i.e. low-level concepts like color and texture dominated at conv1 and conv2, while more object and part detectors emerged in conv5).
  2. Networks trained on supervised tasks have more unique detectors than those trained on self-supervised tasks.
  3. The number of unique concept detectors increases with the number of training iterations.
  4. Batch normalization reduces the number of unique concept detectors. In contrast, increasing the number of units in a layer increases the number of interpretable units.
  5. Interpretability of ResNet > VGG > GoogLeNet > AlexNet. Interpretability of models trained on Places > ImageNet.
Figure from Bau & Zhou et al. (2017).

Conclusion:

Network Dissection helps us understand what emergent concepts appear in a NN, allowing us to quantify its interpretability. Although concept detectors emerged within the networks, not all units were interpretable, which indicates that the learned representation is only partially disentangled.

Figure: Total number of interpretable units in a layer. (Figure from Bau & Zhou et al., 2017)

The authors have also extended Network Dissection to Generative Adversarial Networks (GANs); you can find more details on their project page.

Hope you found this article helpful. Thank you for reading!
