Ch 5. t-SNE Plots as a Human-AI Translator

t-SNE Plots as a means of communicating with a deep learning model

Lucrece (Jahyun) Shin
8 min read · Sep 22, 2021

History : For my master's research project at the University of Toronto, I was given airport Xray baggage scan images containing guns and knives, with the goal of developing a model that automatically detects guns and knives in baggage. Given only a small number of Xray images, I use Domain Adaptation: training a model on a large number of ordinary non-Xray web images of guns and knives, then adapting it to perform well on the Xray images.

Web images (source domain) and Xray images (target domain) containing knife and gun

In this post, I will address a strange behaviour of ResNet50 and show how t-SNE visualizations helped clear up the fog. Here is the list of topics I will discuss :

  1. Why is ResNet50 classifying a basketball as knife?
  2. Feature Encodings
  3. t-SNE
  4. Plotting t-SNE for ResNet50
  5. Decision Boundary (+ 3 important lessons)
  6. Semantic Alignment
  7. Image t-SNE

*All code in PyTorch for the topics discussed in Chapters 3, 4, and 5 (current) is included in my colab notebooks.

1. Why is ResNet50 classifying a basketball as knife?

In my previous post, I addressed an unintuitive behaviour of a ResNet50 fine-tuned for the gun vs. knife binary classification task using only web images (no Xray images). It classified images unrelated to gun or knife as knife with high confidence (>90% softmax probability). When the same model was tested on benign Xray images containing neither gun nor knife, it classified most of them as knife, also with high confidence.

Unintuitive behaviours of the model when tested with web images (left) and Xray images (right)

I wished I could just ask the model in English why it classified an image of a basketball as knife. But the model did not understand English, and there was no way for me to write down all of its 23 million parameters (let alone their history during training) to understand its workings.

2. Feature Encodings

I found an alternative way to communicate with the model: looking at its feature encoding for each input image. For ResNet50, the feature encoding refers to the 2048-dimensional output vector of the final convolutional stage (after global average pooling), just before it enters the fully connected classification layer, as illustrated here :

image source : https://developersbreach.com/convolution-neural-network-deep-learning/

The feature encoding is the image information, originally composed of 224*224 = 50,176 highly spatially correlated values, compressed into 2,048 values. CNN architectures such as ResNet50 are often called feature extractors because they extract only the important features of an image and store them in the feature encoding vector.
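As a rough illustration, here is a minimal PyTorch sketch of how such feature encodings can be pulled out of a ResNet50 by dropping its final fully connected layer. The encode() helper is an illustrative name of mine, and I load torchvision's ImageNet-pretrained weights for simplicity; in the project, the fine-tuned gun vs. knife checkpoint would be loaded instead.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Load a ResNet50 (ImageNet weights here; swap in the fine-tuned checkpoint).
resnet = models.resnet50(pretrained=True)
resnet.eval()

# Drop the final fully connected layer so the forward pass stops at the
# 2048-dim feature encoding (the output of the global average pool).
feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1])

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def encode(image_path):
    """Return the 2048-dim feature encoding of one image as a NumPy vector."""
    img = Image.open(image_path).convert("RGB")
    x = preprocess(img).unsqueeze(0)           # shape: (1, 3, 224, 224)
    with torch.no_grad():
        feat = feature_extractor(x)            # shape: (1, 2048, 1, 1)
    return feat.flatten().numpy()              # shape: (2048,)
```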

3. t-SNE

2,048 numbers are better than 23 million, but it's still hard for a human to interpret them all at once. Is there a way to reduce the 2048-dim vectors to a smaller dimension without losing much information? Yes, an effective and popular method for this is called t-SNE. The original paper states that :

t-SNE visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map.

With t-SNE, we can visualize the 2048-dimensional feature encoding vectors on a 2D plot. I feel that this is a bridge between a human and a machine learning model, translating the model's language (uninterpretable 2048-dim vectors) into something humans can understand (a clean 2D visualization).

4. Plotting t-SNE for ResNet50

Using sklearn.manifold.TSNE(n_components=2), I plotted three 2D t-SNE plots for: 2-class web images, 2-class Xray images, and 3-class Xray images.
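Here is a minimal sketch of how one of these plots can be produced, assuming a stack of 2048-dim encodings (for example built with the encode() helper above) and a matching list of class labels; the image_paths and labels names, the colours, and the t-SNE hyperparameters are illustrative choices of mine.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# features: (N, 2048) array of ResNet50 encodings; labels: N class names.
features = np.stack([encode(p) for p in image_paths])

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
points = tsne.fit_transform(features)          # shape: (N, 2)

colors = {"gun": "red", "knife": "blue", "benign": "lightblue"}
for cls, color in colors.items():
    idx = [i for i, lbl in enumerate(labels) if lbl == cls]
    plt.scatter(points[idx, 0], points[idx, 1], c=color, s=10, label=cls)
plt.legend()
plt.title("t-SNE of ResNet50 feature encodings")
plt.show()
```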

t-SNE plots for 2-class web images (left), 2-class Xray images (middle), and 3-class Xray images (right). The benign class for Xray images refers to safe baggage that does not contain gun or knife.

I will refer to the points in the plots as “images”, since they represent the feature encodings of the input images. The first plot shows well-separated, tightly clustered features for gun (red) and knife (blue) web images. In contrast, the second plot shows loosely clustered features for Xray images, without a clear separation between the two classes. This degradation in separation quality from the first plot (web images) to the second plot (Xray images) is understandable because :

  • The model was trained only using web images.
  • There exists a considerable texture shift from web images to Xray images.

Finally, the last plot for Xray images including benign images also shows the three classes of images close together without a clear separation.

5. Decision Boundary (+ 3 important lessons)

Next, using the confusion matrices and softmax probability histograms presented in Chapter 3.2, I made my best guess at drawing the decision boundary the model supposedly uses to classify each image :

Supposed decision boundaries for binary classification drawn on t-SNE plots using confusion matrices

The dotted lines represent my supposed decision boundary for binary classification. Images on the left side of the line would be classified as knife, and those on the right as gun. The further an image is from the line, the more confident the model would be in classifying it into the corresponding class. Looking at the supposed decision boundaries, I gained three important intuitions about the model :

Lesson #1 : Binary Decision Boundary makes the model Black-and-White.

I began to understand why the model had such high confidence in classifying benign Xray images as knife. In the third plot (3-class Xray images), most benign images (light blue dots) are positioned far from the decision boundary, making the model confidently predict them as the “left side” (not necessarily as the “knife object”). The basketball image would also land somewhere far to the left of the boundary, and would thus be classified into the “left side” with high confidence. The binary decision boundary therefore makes the model black and white, sorting every image into either the “left side” or the “right side”. It is not able to consider an additional “neither” class (as humans do) for images that contain none of the class objects.

Lesson #2 : Decision Boundary is a Shortcut

The decision boundary is only a shortcut for distinguishing between gun and knife, not a boundary defining what a gun or a knife is. It's as if a model asked to distinguish between a green apple 🍏 and a red apple 🍎 learned the shortcut of only looking at colour. If it were given a red sweater, it would confidently classify it as a red apple, not because it thinks the sweater is an apple, but because the sweater is red.

Lesson #3: AI Learning ≠ Human Learning

Assuming that a model with 100% classification accuracy for a particular class has “accurately learned the shape of the object” is dangerously misleading. It is true that the model learned a decision boundary such that all images of that class fall on one side of it. But we never know, and should not assume, that the model has learned the exact shape of that object the way humans remember an object.

6. Semantic Alignment

So far, I have plotted web and Xray images on separate plots. But I also wanted to see how the feature encodings of web and Xray images are distributed relative to each other. This paper defines Semantic Alignment as a condition where samples from different domains but with the same class label map nearby in the feature space. To see whether web and Xray images were semantically aligned, I plotted both on the same plot :

t-SNE plot of web and Xray images together

(6.1) Grouping by Domain : Web / Xray

It appears that the Xray images are positioned between the two web image clusters. Since the model's decision boundary should lie between the two web image clusters, the Xray images are closer to the boundary than the web images, which would make the model less sure about classifying them into one of the classes.

(6.2) Grouping by Class : Gun / Knife / Benign

Gun and knife classes appear to be moderately semantically aligned since:

  • most Xray gun images (yellow dots) are located close to the cluster of web gun images (red dots).
  • more than half of the Xray knife images (pink dots) fall towards the cluster of web knife images (blue dots).

However, both Xray gun and Xray knife images are still much closer to Xray benign images. The model may be spotting some similarities between web and Xray images of the same class, but these are not strong enough to overcome the texture similarity among all three classes of Xray images. How can I help the model overcome this texture shift from web to Xray images? This is the domain adaptation question I tried to answer for the rest of the project.

(6.3) Supposed Decision Boundary

I also drew a supposed decision boundary :

Supposed decision boundary for binary classification: points below the line would be classified as knife and those above the line would be classified as gun.

Images below this line would be classified as knife and those above the line as gun. This explains why most Xray benign images were classified as knife: they were classified as “below the line” rather than as “the knife object” (as discussed in Section 5). I would have to work on adjusting this boundary so that all gun images lie above the line. I would also have to think about how to handle Xray benign images, since a binary classification model must inevitably classify them into one of the two classes.

7. Image t-SNE

This article shares a useful method of plotting t-SNE using images instead of coloured dots. Here is the same plot from Section 6.2 plotted with images :

Image t-SNE

The cluster of Xray images containing guns is located near the web images of guns, and some Xray images containing knives lean towards the web images of knives, yet remain much closer to the benign Xray images.
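Here is a minimal sketch of how the coloured dots can be replaced by image thumbnails, using matplotlib's OffsetImage and AnnotationBbox, and assuming the same points array and image_paths list as in the earlier plotting sketch; plot_image_tsne and the zoom value are illustrative choices of mine.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
from PIL import Image

def plot_image_tsne(points, image_paths, zoom=0.5):
    """Place a small thumbnail of each image at its 2D t-SNE coordinate."""
    fig, ax = plt.subplots(figsize=(12, 12))
    for (x, y), path in zip(points, image_paths):
        thumb = Image.open(path).convert("RGB")
        thumb.thumbnail((64, 64))              # shrink in place for plotting
        box = AnnotationBbox(OffsetImage(np.asarray(thumb), zoom=zoom),
                             (x, y), frameon=False)
        ax.add_artist(box)
    # add_artist does not autoscale, so set the axis limits manually.
    ax.set_xlim(points[:, 0].min() - 1, points[:, 0].max() + 1)
    ax.set_ylim(points[:, 1].min() - 1, points[:, 1].max() + 1)
    plt.show()

plot_image_tsne(points, image_paths)
```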

Sigh of Relief

After plotting and analyzing t-SNE plots of ResNet50 feature encodings, I felt much more relieved than when I first looked at the model's nonsensical behaviours in Section 1. The t-SNE plots served as powerful visual translators that helped me understand the model's inner workings. To summarize what I learned in this post :

  • A binary classification model learns a single black-and-white decision boundary. It is not able to think of an additional “neither” class (as humans do) for images not containing any of the class objects.
  • The model’s decision boundary is a shortcut for distinguishing between classes, not a boundary defining the exact shape of each class object.
  • The source and target domains, i.e. web and Xray images, are semantically aligned to a moderate degree.

The last remaining question was: how to deal with the benign Xray images, which a binary classification model inevitably classifies into one of the two classes? I will discuss how I approached this problem in the next post.

All code in PyTorch for the topics discussed in Chapters 3, 4, and 5 (current) is included in my colab notebooks. Thanks for reading! 😊

- L ☾₊˚.
