Exploratory image analysis — Part 2 : Embeddings on TensorBoard

Juan Abascal
10 min read · Feb 19, 2024


Exploratory data analysis and visualization techniques are essential to get insight from the data. Unlock the full power of AI approaches by understanding and focusing on data quality! 🚀

UMAP embedding on TensorBoard (closest NN selected)

For both TensorFlow and PyTorch users!

In this notebook, we look at projection embeddings, a useful visualization technique for uncovering the underlying structure of high-dimensional data, such as images. We will use the TensorBoard library, as it allows for interactive visualization and provides out-of-the-box projections.

For an introduction to exploratory image analysis and for advanced density plots, read Exploratory image analysis — Part 1.

Introduction

What are projection embeddings and why are they useful? In machine learning, an embedding is a mapping of a discrete variable to a vector of continuous numbers. This vector space is a convenient data representation in which we can perform operations such as distance calculations, clustering, and visualization. The projection implies that we are mapping high-dimensional data, such as images, into a low-dimensional space. What do we expect from the projection embedding? We expect it to preserve some of both the local structure of the data (similar images are close to each other) and its global structure (clusters of similar images). By projecting the data into a 2D or 3D space, we can visualize the data and uncover its underlying structure. Thus, we say that these 2D or 3D representations of the data are embedded in the original high-dimensional space.

How would you visualize images in 2D or 3D? One possibility would be to take two or three random pixels from each image and make a scatter plot, but which pixels should we choose? A better choice would be to take two or three combinations of pixels. There you have your embedding!

In practice, the projection embedding is computed in two steps: 1) we extract features from the images using a pre-trained model, such as VGG16, ResNet, or MobileNet; 2) we use a dimensionality reduction technique to further reduce the number of dimensions to 2 or 3. We can distinguish between linear and non-linear dimensionality reduction techniques:

  • Linear dimensionality reduction techniques, such as principal component analysis (PCA) or non-negative matrix factorization (NMF), are widely used for this purpose. For instance, by choosing the first two principal components of the data, we can visualize the data in 2D. These two components explain the maximum variability of the data while being orthogonal to each other. PCA works well for data that is linearly separable.
  • Non-linear dimensionality reduction techniques are used when the data is not linearly separable. Linear methods preserve local and global distances equally; non-linear methods, on the other hand, favour preserving local over large distances between points, which keeps the local structure of the data while separating different groups into clusters. Standard methods are locally-linear embedding (LLE), UMAP, and t-SNE, which we explore below (see the sketch after this list).
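
To make the distinction concrete, below is a minimal sketch using scikit-learn (an assumption; it is not used elsewhere in this tutorial) that computes a linear (PCA) and a non-linear (t-SNE) 2D projection of a small image dataset:

# Minimal sketch: linear (PCA) vs non-linear (t-SNE) 2D projections,
# assuming scikit-learn is installed. We use its small digits dataset
# purely for illustration.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # (1797, 64) flattened 8x8 images
X_pca = PCA(n_components=2).fit_transform(X)  # linear: maximizes explained variance
X_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)  # non-linear: preserves local neighbourhoods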

Installation

First, we’ll install the necessary libraries within an environment.

!pip install matplotlib \
numpy \
tensorflow \
tensorflow_hub

TensorBoard can be used with both TensorFlow and PyTorch. For PyTorch, you don’t need to install tensorflow, only tensorboard, as explained in Run TensorBoard.
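
For example, PyTorch users would only run:

!pip install tensorboard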

import os
import shutil  # used later to copy the sprite image into the log directory
import numpy as np
import PIL.Image as Image
import matplotlib.pyplot as plt

import tensorflow as tf
import tensorflow_hub as hub
from tensorboard.plugins import projector

np.random.seed(42)

Data

We look at the CIFAR-10 dataset, which is a collection of 60,000 images of 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images; we will use the test set. Images have 32x32 pixels and 3 channels (RGB).

Data download and loading

Data can be downloaded from the CIFAR-10 website or Kaggle, but the simplest method is to download it using Keras or PyTorch. We show how to do it using Keras, which requires installing tensorflow.

from keras.datasets import cifar10

# Download the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
# We retain only the test set
images = x_test
labels = y_test

We define CIFAR-10 labels, as given on the website.

# Channels and CIFAR-10 classes
channels = ['r', 'g', 'b']
cifar10_labels = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
cifar10_labels_idx = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Label names
labels_names = [cifar10_labels[label[0]] for label in labels]

# Name for the dataset (for saving results)
result_name = 'cifar10'

# Path for results
result_dir = '../../../Results/cifar10/data_anal/'
if not os.path.exists(result_dir):
    os.makedirs(result_dir)

We create a sprite plot of the dataset, which is a single image that contains all or a subset of the images in the dataset. We will use the sprite to visualize the embedding in an interactive fashion.

def images_to_sprite(data, invert_colors=False):
    """Tile a batch of images (N, H, W[, C]) into a single square sprite image."""
    if len(data.shape) == 3:
        data = np.tile(data[..., np.newaxis], (1, 1, 1, 3))
    data = data.astype(np.float32)
    # Normalize each image independently to [0, 1]
    data_min = np.min(data.reshape((data.shape[0], -1)), axis=1)
    data = (data.transpose(1, 2, 3, 0) - data_min).transpose(3, 0, 1, 2)
    data_max = np.max(data.reshape((data.shape[0], -1)), axis=1)
    data = (data.transpose(1, 2, 3, 0) / data_max).transpose(3, 0, 1, 2)
    # Inverting the colors seems to look better for MNIST
    if invert_colors:
        data = 1 - data

    # Pad with blank images so the batch fills an n x n grid
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = ((0, n ** 2 - data.shape[0]), (0, 0),
               (0, 0)) + ((0, 0),) * (data.ndim - 3)
    data = np.pad(data, padding, mode='constant', constant_values=0)
    # Tile the individual thumbnails into a single image
    data = data.reshape((n, n) + data.shape[1:]).transpose(
        (0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
    data = (data * 255).astype(np.uint8)
    return data, n

# Randomly subsample the dataset (the same subset will be used for the
# embeddings, so that sprite thumbnails and embedded points match)
num_selected = 25*25
idx = np.random.choice(len(images), num_selected, replace=False)
images_2d = np.array(images[idx]).reshape(-1, 32, 32, 3)

# Create the sprite image
sprite, n = images_to_sprite(images_2d)
sprite_path = os.path.join(result_dir, f'{result_name}_sprite.png')
Image.fromarray(sprite).save(sprite_path)
print(f'Sprite image saved at: {sprite_path}')
fig = plt.figure(figsize=(10,10))
plt.imshow(sprite)
plt.axis('off')
Sprite image of the subsampled CIFAR-10 dataset

Projection embeddings

We load a pretrained network that will serve as a feature extractor. We choose MobileNetV2, a small and efficient network trained for classification on ImageNet. To use the network as a feature extractor, we remove the last (classification) layer and use the output of the previous layer. Other networks, such as ResNet or VGG, could be used instead; a sketch of an alternative is given after the model-loading code below.

Load embedding model

# Load the pre-trained image feature embedding model
embed = hub.load("https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/5")

# required size for MobileNet v2
target_size = (224, 224)
normalization = 255.0
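
As an aside, a similar feature extractor can be built from Keras applications instead of TF Hub. This is a minimal sketch only; the rest of the tutorial uses the TF Hub model loaded above.

# Sketch: Keras-applications alternative to the TF Hub feature extractor.
# include_top=False drops the classification layer and pooling='avg'
# returns one feature vector per image.
from tensorflow.keras.applications import MobileNetV2

feature_extractor = MobileNetV2(include_top=False, pooling='avg',
                                input_shape=(224, 224, 3), weights='imagenet')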

Create embeddings

To create the embeddings, we first need to prepare the inputs as required by the model. For MobileNetV2, input images must be 224x224 pixels and normalized to the range [0, 1]. Then, embeddings are created by passing the images through the model.

def normalize_images(images, normalization=255.0):
    images = images / normalization
    return images

def resize_images_to_tensors(images, image_size):
    # Convert to tf.float32
    images_tensor = tf.cast(images, tf.float32)
    # Resize images to the model input size
    images_tensor = tf.image.resize(images_tensor, size=image_size)
    return images_tensor

def create_embeddings_from_tensors(images_tensor, model):
    # Generate the image embeddings
    embeddings = model(images_tensor)
    return embeddings

def create_embeddings_from_images(images, model, model_type='mobilenet'):
    if model_type == 'mobilenet':
        images = normalize_images(images, normalization=255.0)
        image_size = (224, 224)
        images_tensor = resize_images_to_tensors(images, image_size)
    embeddings = create_embeddings_from_tensors(images_tensor, model)
    return embeddings

# Create embeddings from the randomly selected images (the same subset used for the sprite)
images_selected = images_2d
labels_ids_selected = labels[idx]
labels_selected = np.array(labels_names)[idx]

embeddings = create_embeddings_from_images(images_selected, model=embed, model_type='mobilenet')
print(f'embeddings.shape: {embeddings.shape}, images.shape: {images_selected.shape}')

Embedded images have size (N, 1280), where N is the number of images and 1280 is the number of features in the last retained layer of the model. Thus, the embedding space is 1280-dimensional.
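
As a quick sanity check (a sketch; TensorBoard will do this for us interactively later), we can already rank images by cosine similarity, the distance used by the projector's nearest-neighbour list:

# Sketch: nearest neighbours by cosine similarity in the embedding space
emb = embeddings.numpy()
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize rows
similarity = emb @ emb.T  # (N, N) cosine similarities
nearest = np.argsort(-similarity[0])[1:6]  # 5 closest images to image 0
print(labels_selected[0], '->', labels_selected[nearest])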

Log to TensorBoard

TensorBoard provides experiment tracking, visualization, and profiling. One can track metrics, the model graph, projection embeddings, and other images and results. For an introduction to TensorBoard, see the PyTorch TensorBoard Tutorial.

Here, we focus on the TensorBoard Embedding Projector. Once the embeddings are created, we need to save a training checkpoint to a log directory log_dir, together with the metadata (labels) associated with each embedded data point. This is done in several steps:

  1. Create a tf.Variable that holds the embedding (with name 'embedding').
  2. Create a checkpoint, `tf.train.Checkpoint`, with this variable and save it under the same name ('embedding.ckpt').
  3. Set up the projector config and add the embedding.
  4. Set tensor_name and metadata_path, and write the labels to the metadata file.
  5. Add sprite.image_path to the embedding.
  6. Call visualize_embeddings.

# Assume embeddings has shape (num_data_points, embedding_dim) and
# labels_selected holds the label name of each embedded data point.

# Log directory for TensorBoard (the projector resolves paths relative to it)
log_dir = os.path.join(result_dir, 'logs')
os.makedirs(log_dir, exist_ok=True)

# Create a variable to hold the embeddings
embedding_var = tf.Variable(embeddings, name='embedding')

# Create and save a checkpoint for the embedding
checkpoint = tf.train.Checkpoint(embedding=embedding_var)
checkpoint.save(os.path.join(log_dir, 'embedding.ckpt'))

# Set up projector config
config = projector.ProjectorConfig()
embedding = config.embeddings.add()
# The name of the tensor will be suffixed by `/.ATTRIBUTES/VARIABLE_VALUE`.
embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
embedding.metadata_path = 'metadata.tsv'

# Write labels to the metadata file
metadata_path = os.path.join(log_dir, 'metadata.tsv')
with open(metadata_path, 'w') as metadata_file:
    for label in labels_selected:
        metadata_file.write(f'{label}\n')

# Add the sprite image; paths in the config are resolved relative to log_dir,
# so we copy the sprite into the log directory
shutil.copy(sprite_path, os.path.join(log_dir, 'sprite.png'))
embedding.sprite.image_path = 'sprite.png'
# Size of each thumbnail in the sprite (CIFAR-10 images are 32x32)
embedding.sprite.single_image_dim.extend([32, 32])
projector.visualize_embeddings(log_dir, config)
print(f'Projector config saved at: {log_dir}')

To visualize the projection embedding, run TensorBoard with log_dir and access TensorBoard in your browser via the provided URL. You may need to refresh the visualization on the top right. Run in the shell:

tensorboard --logdir $log_dir
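
If you are working in a Colab or Jupyter notebook, you can instead launch TensorBoard inline using the notebook magics (here with the log directory used above):

%load_ext tensorboard
%tensorboard --logdir ../../../Results/cifar10/data_anal/logs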

Analysis of embeddings

TensorBoard provides an interactive visualization of the embedding. We can zoom in and out, select points, and see the corresponding images. We can also search for specific images and see their nearest neighbors. You can select a 2D or 3D visualization, and display images or labels. It provides three dimensionality reduction techniques: PCA, t-SNE, and UMAP.

PCA embedding

The image below shows the PCA embedding of the CIFAR-10 dataset. It is a scatter plot of the first three principal components, where each data point is an image taken from the sprite. In this case, we have selected a red car (label 1). On the top right, it displays the 55 closest images using the cosine distance. Selecting only the 5 to 10 closest images, only red cars or red trucks are found; increasing this number, other red objects or cars appear.

PCA embedding (with images)

On the top left, we can choose to display labels or images. The closest label to 1: 'automobile' is 9: 'truck', followed by 0: 'airplane' and 8: 'ship'. The furthest labels are 7: 'horse' and 4: 'deer'. The labels are 0: 'airplane', 1: 'automobile', 2: 'bird', 3: 'cat', 4: 'deer', 5: 'dog', 6: 'frog', 7: 'horse', 8: 'ship', 9: 'truck'.

PCA embedding (with labels)

The image below shows the UMAP embedding of the CIFAR-10 dataset. The classes are better separated than in the PCA embedding, as UMAP is a non-linear embedding. Check the appendix for a description of UMAP.

UMAP embedding

Below, we show the t-SNE embedding, for which you can use the sliders to tune the perplexity and the learning rate. The perplexity is a measure of the effective number of neighbors; a value between 5 and 50 is recommended. The learning rate is the step size at each iteration; a value between 10 and 1000 is recommended. You can stop the algorithm at any iteration. As iterations increase, the classes become better separated, so it is recommended to let the algorithm converge. However, after a certain number of iterations, images start to overlap within the same class. See below the t-SNE embedding at 3,620 iterations, where the classes are well separated before they start to overlap.

t-SNE embedding

In TensorBoard, at the bottom, there is a link to a recommendation on how to use t-SNE. Check also the appendix for a description of t-SNE.

Conclusions

In this article, we have explored a feature extractor and several dimensionality reduction techniques for data visualization in TensorBoard; but which one should you use? One can generally start with PCA and then move to t-SNE or UMAP if clusters are not well separated. However, we should stick to PCA if interpretability is important (singular vectors and singular values have a meaning, while the dimensions of non-linear methods don't). In [McInnes 2020], the authors argue that UMAP is a better choice than t-SNE, as it is faster and more robust.

We have chosen visualization in TensorBoard as it provides out-of-the-box techniques, allows interacting with data points, and is actively maintained and improved. Another possibility is to compute the projections using scikit-learn and then visualize them in TensorBoard or a Python library of your choice, as sketched below.
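
For instance, a minimal sketch of that route (an assumption: scikit-learn is not used elsewhere in this tutorial), running t-SNE on the features extracted above and plotting with matplotlib:

# Sketch: compute the 2D projection ourselves instead of relying on the
# TensorBoard projector (assumes scikit-learn is installed)
from sklearn.manifold import TSNE

proj = TSNE(n_components=2, perplexity=30).fit_transform(embeddings.numpy())
fig, ax = plt.subplots(figsize=(8, 8))
for label in cifar10_labels:
    mask = labels_selected == label
    ax.scatter(proj[mask, 0], proj[mask, 1], s=5, label=label)
ax.legend()
plt.show()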

We got to the end of this tutorial 😊. The full code is available in the link below!

If you like the tutorial, give it a 👍, share it and subscribe for more!

Appendix: Theory 🤓

In this section, we summarize the theory behind the different embedding projection techniques for further insight; see the Colab notebook.

References

  • TensorFlow embedding projector tutorial: Embedding Projector.
  • I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.
  • L. McInnes, J. Healy, J. Melville. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, 2020.
  • L. van der Maaten, G. Hinton. Visualizing Data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605, 2008.

Code

  • A colab notebook for this tutorial: Notebook 📓
