Experiment Result — Explainable AI and Visualization (Part 13)

Parvez Kose · Published in DeepViz · Jul 5, 2023 · 6 min read

This article continues the research and design for the study ‘Explainable Deep Learning and Visual Interpretability.’

In this section, I present qualitative results and key findings from prototype user testing, conducted over multiple iterations during the thesis research and prototype development.

Visual Analysis

The critical finding from this research is that the representations learned by image recognition models lend themselves well to visualization, largely because they are representations of visual concepts. I used image localization and gradient computation to produce a relevance heat map that substantiates image attribution, and a feature activation graph that highlights the learned representations. Both methods use the VGG16 model, a convolutional neural network trained on the ImageNet database.
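The DeepViz tool itself runs in the browser, but the underlying pipeline can be illustrated offline. Below is a minimal Python/Keras sketch of the setup both methods share: loading the ImageNet-pretrained VGG16 and classifying an input image. The image file name is a hypothetical placeholder.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import (
    VGG16, preprocess_input, decode_predictions)

# VGG16 pretrained on ImageNet, the model used by both visualization methods.
model = VGG16(weights="imagenet")

def load_image(path):
    """Load an image file and preprocess it to VGG16's 224x224 input format."""
    img = tf.keras.utils.load_img(path, target_size=(224, 224))
    x = tf.keras.utils.img_to_array(img)
    return preprocess_input(x[np.newaxis, ...])   # shape (1, 224, 224, 3)

x = load_image("bee_eater.jpg")                   # hypothetical input image
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])        # top-3 (id, class name, probability)
```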

Sensitivity Analysis

The DeepViz tool produces a relevance heat map as visual evidence for a class prediction: it justifies the target class by clearly highlighting the discriminative region corresponding to that class. This method is useful for understanding precisely which part of an image is identified as belonging to a specific class or category (among the class names known to VGG16), and thus allows objects to be localized in images.

Figure 3.4 shows a sample explanation: it first declares the predicted class label, correctly classified as “bee-eater,” followed by a relevance heat map that elucidates why the model arrived at this decision. The relevance heat map visualizes the importance of each pixel in the given image with respect to the predicted class.

In this example, the bee-eater’s beak and neck are the basis for the model’s decision. With the heat map, the user can verify that the model works as intended. This step validated the first premise of my hypothesis: applying image localization and object detection techniques can distinguish which part of an input image contributed most to the classification decision.
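The thesis does not prescribe one exact algorithm here, so the sketch below follows a common Grad-CAM-style computation: the gradient of the class score with respect to the last convolutional feature maps, pooled into channel weights and combined into a coarse localization map. The layer name "block5_conv3" and the helper name are assumptions; `model` and `x` come from the earlier sketch.

```python
import tensorflow as tf

def class_heatmap(model, x, conv_layer_name="block5_conv3", class_idx=None):
    # Sub-model exposing both the conv feature maps and the final prediction.
    grad_model = tf.keras.Model(
        inputs=model.inputs,
        outputs=[model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(x)
        if class_idx is None:
            class_idx = int(tf.argmax(preds[0]))   # explain the predicted class
        class_score = preds[:, class_idx]
    # Gradient of the class score w.r.t. the conv feature maps.
    grads = tape.gradient(class_score, conv_out)
    # Channel weights: global-average-pooled gradients.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of feature maps -> coarse class-discriminative map.
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)
    cam = tf.nn.relu(cam)                          # keep only positive evidence
    cam = cam / (tf.reduce_max(cam) + 1e-8)        # normalize to [0, 1]
    return cam.numpy(), class_idx

heatmap, cls = class_heatmap(model, x)  # upsample and overlay on the image to display
```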

Misclassification Analysis

To examine instances where an incorrect label is predicted and to identify what mistake the network is making, I collected a set of images that the VGG16 model fails to classify correctly. When one of these example images is fed to the tool (see the figure above), the model predicts the incorrect class: the input image is misclassified as “hare” when the correct label is a cat. By inspecting the heat map visualization, the user can see why the model predicted the wrong label. Here, the model detected only the tail region and omitted the other parts of the cat; hence, the feature extraction was incorrect.
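This check can be reproduced offline with the sketches above: classify the image, confirm the wrong top-1 label, then compare the heat map for the predicted class against the one for the expected class. The file name and the ImageNet class index for a cat category are illustrative assumptions; `model`, `load_image` and `class_heatmap` come from the previous sketches.

```python
from tensorflow.keras.applications.vgg16 import decode_predictions

x_cat = model_input = load_image("cat.jpg")        # hypothetical misclassified image
preds = model.predict(x_cat)
print(decode_predictions(preds, top=3)[0])         # top-1 comes back as "hare"

# Evidence for the (wrong) predicted class.
heat_pred, wrong_idx = class_heatmap(model, x_cat)
# Evidence for a cat class, e.g. ImageNet index 281 ("tabby"), for comparison.
heat_true, _ = class_heatmap(model, x_cat, class_idx=281)
```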

To take this a step further, I break down the attention map at the layer level to visualize how the relevant image region is localized at each hidden layer (figure above). This helps the user carefully study how the attention map builds up toward the final layer. Users can click through the layer buttons to generate the heat map corresponding to each hidden layer.
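The per-layer breakdown can be approximated offline by generating the same heat map from the last convolution of each block. The layer names below come from the stock Keras VGG16 and may not match the tool’s exact layer grouping; `class_heatmap`, `model` and `x` are reused from the earlier sketches.

```python
# One heat map per convolutional block, from shallow to deep.
conv_layers = ["block1_conv2", "block2_conv2", "block3_conv3",
               "block4_conv3", "block5_conv3"]

layer_heatmaps = {
    name: class_heatmap(model, x, conv_layer_name=name)[0]
    for name in conv_layers
}
# Early blocks give diffuse, edge-like relevance; block5 localizes the object.
```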

Misclassification Example

Feature Activation Graph

DeepViz visualizes the activation output of the intermediate layers as a directed acyclic graph, displaying how the network builds up its internal representation. The graph decomposes the feature activation map at each segment into a distribution of channels. The tool visualizes only a selection of channels, given the browser rendering and computation required.

This visualization helps the user understand how successive layers of the network transform the input image. Visualizing intermediate activations displays the feature maps produced by the various convolution and pooling layers in the network. The user can see that nodes in the first layer mainly detect edges, and that these activations retain most of the information from the input image.
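The raw material behind this graph can be sketched offline by collecting every convolution and pooling output for one input. Keeping only the first few channels per layer mirrors the tool’s decision to render a subset of channels; the cutoff of eight channels is an arbitrary choice for illustration. `model` and `x` are reused from the first sketch.

```python
import tensorflow as tf

# All convolution and pooling layers of VGG16, in forward order.
layer_names = [l.name for l in model.layers
               if isinstance(l, (tf.keras.layers.Conv2D,
                                 tf.keras.layers.MaxPooling2D))]

# A model that returns the activation of every listed layer for one input.
activation_model = tf.keras.Model(
    inputs=model.inputs,
    outputs=[model.get_layer(name).output for name in layer_names],
)

activations = activation_model.predict(x)          # one array per listed layer
for name, act in zip(layer_names, activations):
    # act has shape (1, height, width, channels); keep the first 8 channels.
    sample_channels = act[0, :, :, :8]
    print(f"{name}: spatial size {act.shape[1:3]}, "
          f"showing {sample_channels.shape[-1]} of {act.shape[-1]} channels")
```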

Layer-wise Relevance Heat map and Final Layer Visualization

As we move deeper, the activations become highly abstract and far less visually interpretable, because the network starts focusing on higher-level concepts such as a dog’s ear or a bird’s beak. These high-level activations carry less spatial information about the image and more class-specific information. This shows how the input image is progressively transformed at the intermediate layers to distill out irrelevant information and retain the information useful for the target class.

In summary, DeepViz visualization helped answer two important questions:

  • Why did the network think this image contained a bee-eater bird?
  • Where is the bee-eater located in the picture?

It is particularly interesting that the head region of the fourth bird, the largest of the six birds, is strongly activated: this is probably how the network tells a bee-eater apart from other birds.

User Testing

Having established this, I next evaluated whether the visualization can lead an end user to trust the model appropriately: does the relevance heat map, as visual evidence for the model’s prediction, help the user decide when to trust it? For this experiment, I conducted a human study in which users tested several images with the DeepViz tool and compared the heat map visualization and predicted class for correct and incorrect classifications.

User Testing Example of Correct and Incorrect Classification

CONCLUSION

Research in deep learning has traditionally focused on new algorithms, mathematical models, and improving the quality, performance or speed of neural network models. I have investigated a lateral research direction that touches on the social implications of automated decision-making systems; namely, I have contributed to furthering the understanding and transparency of the decisions made by a trained deep neural network.

I proposed a visual exploration tool for interpreting a visual classifier, one that provides a visual explanation of its inference decisions. The tool is targeted at non-experts and helps broaden people’s access to interactive tools for deep learning. The visualization technique makes an image recognition model more transparent by providing a visual explanation for its predictions. My prototype helped answer two critical questions raised in the research hypothesis: (i) Why did the network think this image contained a specific object? (ii) Where is the object located in the picture? The heat map concept allowed better intuition about what the network has learned.

Further, running deep learning applications entirely client-side in the browser unlocked new opportunities for rich interaction and a better user experience. From a user’s perspective, there is no need to install any libraries or drivers; the application can be accessed directly in the browser. Finally, all user data stays on the client side, local to the user’s device, which helps maintain a privacy-preserving application.

In summary, I have presented a novel image explanation technique that justifies the class predictions of a visual classifier. This method is a stepping stone and serves as a foundation for building more robust model interpretability and explainability systems. This work is a small contribution toward building fair, transparent and explainable AI systems.

FUTURE PERSPECTIVES

I believe there are several opportunities to extend and enhance DeepViz as a visual exploration tool for deep learning systems. Future work will focus on improving DeepViz with new interaction capabilities, more sophisticated visualizations and better performance.

More generally, I see an immense opportunity to contribute to this new and rapidly growing body of research on deep learning visualization, with a focus on explainability and interpretability, whose impact spans a broad range of domains. Examples include a visual analytics tool for interactively comparing multiple models to assess transparency and fairness; visualizing other model families such as auto-encoders, recurrent neural networks (RNNs) and generative adversarial networks (GANs); and visualizing deep networks in domains such as reinforcement learning, meta-learning and AutoML.

Further, there are a number of directions for future work, such as creating new interpretable methods and visual representations for the components of deep learning models; developing a rich visual interface with innovative interactions to discover and communicate deeper insights about one’s model; and building a visual exploration tool that combines visual representation, new interaction techniques, and modern attribution and feature visualization.

There are several promising directions for making progress on deep learning interpretability. I believe it is time we addressed the issues of fairness, transparency and accountability in AI technologies and ensured that bias from the data does not become embedded in the systems we create.
