Neural Networks: Unleashing the Power of Latent Space Compression

Julien Pascal
9 min read · May 11, 2023


Source: Photo by Rakicevic Nenad from Pexels: https://www.pexels.com/photo/creative-photo-of-person-holding-glass-mason-jar-under-a-starry-sky-1274260/

I. Introduction

Neural networks have taken the world by storm, revolutionizing the field of machine learning and opening up new frontiers in artificial intelligence. These powerful models have led to breakthroughs that were once considered impossible, such as ChatGPT, which can generate human-like responses, DALL-E, which creates stunning images from text prompts (see the conclusion of this post for an example), and AlphaGo Zero, which mastered the ancient game of Go without any human expert knowledge.

What makes neural networks so captivating is their ability to learn complex patterns and relationships in data, transcending the limitations of traditional machine learning techniques. In this blog post, we will delve into the heart of neural networks and uncover their secret sauce: learning through latent space compression. By compressing data into a compact, information-rich latent space, neural networks can efficiently learn representations that are valuable for various tasks like dimensionality reduction and classification.

To demonstrate the power and versatility of neural networks, I will pit them against Principal Component Analysis (PCA), a widely used linear technique for dimensionality reduction.

II. PCA, Neural Networks, and Autoencoders

When thinking about dimensionality reduction, the first technique that generally comes to mind is PCA. PCA is a linear technique that reduces the dimensionality of data by projecting it onto the principal components that capture the most variance in the data. While PCA is simple and efficient, it is limited by its assumption of linearity and can only capture linear relationships in the data.
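To make this concrete, here is a minimal sketch of PCA with scikit-learn; the toy data and the choice of two components are arbitrary assumptions for the example:

import numpy as np
from sklearn.decomposition import PCA

# Toy data (an assumption for this sketch): 200 points in 5 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Project onto the 2 principal components that capture the most variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                     # (200, 2)
print(pca.explained_variance_ratio_)  # share of variance captured per component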

Neural networks, on the other hand, can learn non-linear relationships and complex patterns, often making them more powerful and flexible than PCA. A common architecture for dimensionality reduction with neural networks is the autoencoder. An autoencoder is composed of two key elements: an encoder that maps input data to a lower-dimensional latent space representation, and a decoder that reconstructs the input data from the latent space.

The autoencoder learns to compress the data in the latent space by minimizing the “reconstruction error”, which measures the difference between the input and the output. One essential element of the autoencoder is the “bottleneck”, denoted by “h” on the schematic representation below.

Schema of an autoencoder. Source: https://commons.wikimedia.org/wiki/File:Autoencoder_schema.png
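Before the full convolutional implementation below, here is a minimal sketch of the same idea with fully-connected layers; the layer sizes are illustrative assumptions, with a 10-dimensional bottleneck playing the role of “h”:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal fully-connected autoencoder: 784 -> 10 (bottleneck "h") -> 784
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
decoder = nn.Sequential(nn.Linear(10, 128), nn.ReLU(), nn.Linear(128, 784))

x = torch.rand(64, 784)   # a batch of 64 flattened 28x28 images
h = encoder(x)            # compressed latent representation, shape (64, 10)
x_hat = decoder(h)        # reconstruction of the input

# Reconstruction error: mean squared difference between input and output
reconstruction_error = F.mse_loss(x_hat, x)
print(h.shape, reconstruction_error.item())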

III. Autoencoder with Python

Below is an implementation of an autoencoder using Python and PyTorch. To keep things simple and tractable, let’s use the MNIST dataset, which contains images of hand-written digits.

In the next block of code, I load and normalize the data. Then I define classes to specify the architecture of the encoder and decoder. Because the inputs are images, I use convolutional layers to compress the information efficiently.

import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Check if GPU is available and set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the MNIST dataset
transform = transforms.Compose([transforms.ToTensor()])
mnist_train = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
mnist_test = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Create DataLoader for training and testing sets
batch_size = 128
train_loader = DataLoader(mnist_train, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(mnist_test, batch_size=batch_size, shuffle=False)

import torch.nn as nn
import torch.nn.functional as F

#Architecture choice based on the following post: https://medium.com/dataseries/convolutional-autoencoder-in-pytorch-on-mnist-dataset-d65145c132ac
class Encoder(nn.Module):
    def __init__(self, encoded_space_dim, fc2_input_dim):
        super().__init__()

        ### Convolutional section
        self.encoder_cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1),
            nn.ReLU(True),
            nn.Conv2d(8, 16, 3, stride=2, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(True),
            nn.Conv2d(16, 32, 3, stride=2, padding=0),
            nn.ReLU(True)
        )

        ### Flatten layer
        self.flatten = nn.Flatten(start_dim=1)
        ### Linear section
        self.encoder_lin = nn.Sequential(
            nn.Linear(3 * 3 * 32, 128),
            nn.ReLU(True),
            nn.Linear(128, encoded_space_dim)
        )

    def forward(self, x):
        x = self.encoder_cnn(x)
        x = self.flatten(x)
        x = self.encoder_lin(x)
        return x

class Decoder(nn.Module):
    def __init__(self, encoded_space_dim, fc2_input_dim):
        super().__init__()
        self.decoder_lin = nn.Sequential(
            nn.Linear(encoded_space_dim, 128),
            nn.ReLU(True),
            nn.Linear(128, 3 * 3 * 32),
            nn.ReLU(True)
        )

        self.unflatten = nn.Unflatten(dim=1,
                                      unflattened_size=(32, 3, 3))

        self.decoder_conv = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3,
                               stride=2, output_padding=0),
            nn.BatchNorm2d(16),
            nn.ReLU(True),
            nn.ConvTranspose2d(16, 8, 3, stride=2,
                               padding=1, output_padding=1),
            nn.BatchNorm2d(8),
            nn.ReLU(True),
            nn.ConvTranspose2d(8, 1, 3, stride=2,
                               padding=1, output_padding=1)
        )

    def forward(self, x):
        x = self.decoder_lin(x)
        x = self.unflatten(x)
        x = self.decoder_conv(x)
        x = torch.sigmoid(x)
        return x

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super(ConvAutoencoder, self).__init__()
        self.encoder = Encoder(encoded_space_dim=10, fc2_input_dim=128)
        self.decoder = Decoder(encoded_space_dim=10, fc2_input_dim=128)

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

The next block of code runs the training loop and then plots the results. If you try this at home, you should see an image similar to the next graph, indicating that the autoencoder does quite a good job of reconstructing the input.

Original images (top) and reconstructed images (bottom). Source: author’s calculations based on the code below.
# Instantiate the autoencoder and optimizer
model = ConvAutoencoder().to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# Train the autoencoder
num_epochs = 20

for epoch in range(num_epochs):
    for batch_features, _ in train_loader:
        batch_features = batch_features.to(device)
        optimizer.zero_grad()
        outputs = model(batch_features)
        loss = criterion(outputs, batch_features)
        loss.backward()
        optimizer.step()

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")

# Visualize the results
model.eval()

with torch.no_grad():
    for batch_features, _ in test_loader:
        batch_features = batch_features.to(device)
        outputs = model(batch_features)
        outputs = outputs.cpu()
        break

fig, axes = plt.subplots(2, 10, figsize=(20, 4))
for i in range(10):
    axes[0][i].imshow(batch_features[i].cpu().squeeze().numpy(), cmap='gray')
    axes[1][i].imshow(outputs[i].squeeze().numpy(), cmap='gray')
    axes[0][i].axis('off')
    axes[1][i].axis('off')

plt.savefig('encoding_decoding.png')
plt.show()

Now, let’s visualize how information is compressed into the latent space. I chose a 10-dimensional latent space because there are 10 digits. Unfortunately, it is hard to visualize 10 dimensions in a single plot.

The next block of code creates a 2D representation of the latent space using t-SNE (t-distributed Stochastic Neighbor Embedding), a dimensionality reduction technique particularly well suited to visualizing high-dimensional data in 2D or 3D. It works by preserving the pairwise similarities of data points from the high-dimensional space in the low-dimensional space.

from sklearn.manifold import TSNE
import seaborn as sns
import pandas as pd
from sklearn.decomposition import PCA

# Function to extract latent vectors
# (projection of input images into the low-dimensional latent space)
def extract_latent_space(model, loader):
    model.eval()
    with torch.no_grad():
        latent_vectors = []
        labels = []

        for batch_features, batch_labels in loader:
            latent_vectors.append(model.encoder(batch_features.to(device)).view(batch_features.size(0), -1).cpu().numpy())
            labels.extend(batch_labels)

        latent_vectors = np.vstack(latent_vectors)
        labels = np.array(labels)

    return latent_vectors, labels

latent_vectors, labels = extract_latent_space(model, test_loader)
tsne = TSNE(n_components=2, random_state=42)
latent_2D = tsne.fit_transform(latent_vectors)

# Function to plot the latent space
def plot_latent_space(latent_2D, labels, title):
    df = pd.DataFrame(latent_2D, columns=["x", "y"])
    df["label"] = labels

    plt.figure(figsize=(10, 8))
    sns.scatterplot(data=df, x="x", y="y", hue="label", palette="tab10", legend="full", alpha=0.8)
    plt.title(title)
    plt.show()

plot_latent_space(latent_2D, labels, "2D Visualization of Latent Space (MNIST)")

You should get an image similar to the plot below, showing that the digits are well separated in the latent space. Images of hand-written digits, which are very high-dimensional objects, have been successfully compressed into a low-dimensional space.

2D visualization of the latent space using t-SNE. Source: author’s calculations based on the code above.
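To put a rough number on this separation, one possibility is to compute a silhouette score on the latent vectors extracted above (values closer to 1 indicate tighter, better-separated clusters); a subsample keeps the computation fast:

from sklearn.metrics import silhouette_score

# Silhouette score on a subsample of the latent vectors extracted above
score = silhouette_score(latent_vectors[:2000], labels[:2000])
print(f"Silhouette score of the latent space: {score:.3f}")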

IV. Autoencoder with Classifier

To further illustrate the power of dimensionality reduction, we can now replace the decoder with a classifier. The goal is no longer to reproduce the input, but to predict the class of the hand-written digit (0 to 9).

The next block of code does exactly that by building a classifier on top of the encoder. Very quickly, we obtain a classifier that recognizes digits with a very high level of accuracy, as shown in the accuracy plot below.

class Classifier(nn.Module):
    def __init__(self, encoder, num_classes=10):
        super(Classifier, self).__init__()
        self.encoder = encoder
        # Note: nn.CrossEntropyLoss applies log-softmax internally, so this
        # explicit Softmax is redundant; it is kept here so the model outputs
        # probabilities directly, at a small cost in training speed.
        self.classifier = nn.Sequential(
            nn.Linear(10, num_classes),
            nn.Softmax(dim=1)
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.classifier(x)
        return x

# Instantiate the classifier using the pre-trained encoder
classifier = Classifier(model.encoder).to(device)

criterion_classifier = nn.CrossEntropyLoss()
optimizer_classifier = torch.optim.Adam(classifier.parameters(), lr=1e-3)

num_epochs_classifier = 20
train_losses = []
train_accuracies = []

# Train classifier
for epoch in range(num_epochs_classifier):
    running_loss = 0.0
    running_correct = 0
    total_samples = 0

    for batch_features, batch_labels in train_loader:
        batch_features = batch_features.to(device)
        batch_labels = batch_labels.to(device)

        optimizer_classifier.zero_grad()
        outputs = classifier(batch_features)
        loss = criterion_classifier(outputs, batch_labels)
        loss.backward()
        optimizer_classifier.step()

        _, predicted = torch.max(outputs.data, 1)
        total_samples += batch_labels.size(0)
        running_correct += (predicted == batch_labels).sum().item()
        running_loss += loss.item()

    train_accuracy = 100 * running_correct / total_samples
    train_loss = running_loss / len(train_loader)

    train_losses.append(train_loss)
    train_accuracies.append(train_accuracy)

    print(f"Epoch [{epoch+1}/{num_epochs_classifier}], Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.2f}%")

# Plot the loss and accuracy
plt.style.use('ggplot')
epochs = list(range(1, num_epochs_classifier+1))

plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(epochs, train_losses, label='Train Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(epochs, train_accuracies, label='Train Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.savefig("encoder_classifier.png")
plt.show()
Loss and accuracy during training for the encoder-classifier model. Source: author’s calculations based on the code above.

With the parameters listed in this blog post, I obtain 98.80% accuracy on the test set. You should get a similar result if you run the next block of code.

correct = 0
total = 0

# Measure accuracy on the test set
with torch.no_grad():
    for batch_features, batch_labels in test_loader:
        batch_features = batch_features.to(device)
        batch_labels = batch_labels.to(device)

        outputs = classifier(batch_features)
        _, predicted = torch.max(outputs.data, 1)

        total += batch_labels.size(0)
        correct += (predicted == batch_labels).sum().item()

print(f"Accuracy of the classifier on the test set: {100 * correct / total:.2f}%")
Accuracy of the classifier on the test set: 98.80%
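If you are curious which digits get confused, a quick optional diagnostic is a confusion matrix on the test set, reusing the trained classifier and test_loader from above:

from sklearn.metrics import confusion_matrix

all_preds, all_labels = [], []
with torch.no_grad():
    for batch_features, batch_labels in test_loader:
        outputs = classifier(batch_features.to(device))
        all_preds.extend(outputs.argmax(dim=1).cpu().numpy())
        all_labels.extend(batch_labels.numpy())

# Rows are true digits, columns are predicted digits
print(confusion_matrix(all_labels, all_preds))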

As a final exercise, I replace the neural network encoder with a PCA encoder. With this new architecture, images are projected into a different latent space using simple PCA, keeping the dimension of the latent space unchanged (d=10). As illustrated in the plot below, the results are less impressive. Keeping the number of epochs constant, I obtain an accuracy of 80.29% on the test set.


# Flatten and normalize the training images
train_images_flat = mnist_train.data.view(-1, 28 * 28).float() / 255

# Perform PCA
encoded_space_dim = 10
pca = PCA(n_components=encoded_space_dim)
pca.fit(train_images_flat.numpy())

class PC_Encoder(nn.Module):
    def __init__(self, pca):
        super().__init__()
        self.pca = pca

    def forward(self, x):
        x_flat = x.view(x.size(0), -1)  # Flatten the input
        x_pca = torch.tensor(self.pca.transform(x_flat.cpu().numpy()), dtype=torch.float).to(device)  # Apply PCA
        return x_pca

# Instantiate the PCA encoder
encoder = PC_Encoder(pca).to(device)

class PCAClassifier(nn.Module):
    def __init__(self, encoder):
        super(PCAClassifier, self).__init__()
        self.encoder = encoder
        self.classifier = nn.Sequential(
            nn.Linear(10, 10),
            nn.Softmax(dim=1)
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.classifier(x)
        return x

# Instantiate the PCA-classifier
classifier = PCAClassifier(encoder).to(device)

criterion_classifier = nn.CrossEntropyLoss()
optimizer_classifier = torch.optim.Adam(classifier.parameters(), lr=1e-3)  # the PCA encoder has no learnable parameters, so only the linear classifier is trained

num_epochs_classifier = 20
train_losses = []
train_accuracies = []

# Train the PCA-classifier
for epoch in range(num_epochs_classifier):
    running_loss = 0.0
    running_correct = 0
    total_samples = 0

    for batch_features, batch_labels in train_loader:
        batch_features = batch_features.to(device)
        batch_labels = batch_labels.to(device)

        optimizer_classifier.zero_grad()
        outputs = classifier(batch_features)
        loss = criterion_classifier(outputs, batch_labels)
        loss.backward()
        optimizer_classifier.step()

        _, predicted = torch.max(outputs.data, 1)
        total_samples += batch_labels.size(0)
        running_correct += (predicted == batch_labels).sum().item()
        running_loss += loss.item()

    train_accuracy = 100 * running_correct / total_samples
    train_loss = running_loss / len(train_loader)

    train_losses.append(train_loss)
    train_accuracies.append(train_accuracy)

    print(f"Epoch [{epoch+1}/{num_epochs_classifier}], Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.2f}%")

# Plot the loss and accuracy
plt.style.use('ggplot')
epochs = list(range(1, num_epochs_classifier+1))

plt.figure(figsize=(16, 6))
plt.subplot(1, 2, 1)
plt.plot(epochs, train_losses, label='Train Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(epochs, train_accuracies, label='Train Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.savefig("PCA_classifier.png")
plt.show()
Loss and accuracy during training for the PCA-classifier model. Source: author’s calculations based on the code above.
# Test the PCA classifier
correct = 0
total = 0

# Measure accuracy on the test set
with torch.no_grad():
    for batch_features, batch_labels in test_loader:
        batch_features = batch_features.to(device)
        batch_labels = batch_labels.to(device)

        outputs = classifier(batch_features)
        _, predicted = torch.max(outputs.data, 1)

        total += batch_labels.size(0)
        correct += (predicted == batch_labels).sum().item()

print(f"Accuracy of the PCA-classifier on the test set: {100 * correct / total:.2f}%")
Accuracy of the PCA-classifier on the test set: 80.29%
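One plausible explanation for the gap is that 10 linear components discard a substantial share of the variance in the images; the fitted PCA object makes this easy to check:

# Share of the total variance captured by the 10 principal components
print(f"Cumulative explained variance: {pca.explained_variance_ratio_.sum():.2%}")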

V. Conclusion

This blog post illustrates how neural networks can learn by compressing data into a latent space, allowing for richer representations and improved performance on tasks like dimensionality reduction and classification.

The ability of neural networks to model complex patterns and non-linear relationships makes them a powerful tool in machine learning, and their inherent capacity for compression in latent spaces allows them to learn efficient representations for a variety of tasks.

To conclude, let me ask a text-to-image model to produce an image illustrating the idea of “compression into a latent space”. The resulting image, just below, does seem to capture the idea of encoding into a lower-dimensional space.

Author’s creation, using DeepAI text-to-image model: https://deepai.org/ with the prompt “Compression into a latent space of a neural network. Illustrate the idea of dimension reduction.”

Extra resources

If you liked this post, you may want to check out some of my other writings on this platform on related topics.
