Demystifying Neural Networks: Image Compression with AutoEncoder

Dagang Wei
Jan 26, 2024


Figure: a typical autoencoder architecture (source: https://en.wikipedia.org/wiki/Autoencoder)

This article is part of the series Demystifying Neural Networks.

Introduction

In the realm of machine learning, AutoEncoders hold a special place. They are like the magicians of neural networks, capable of incredible feats like compressing data, removing noise, and even generating realistic images. If you’re ready to unlock the secrets of autoencoders, then this blog post is for you!

What is an AutoEncoder?

At their core, AutoEncoders are specialized neural networks designed to learn how to reconstruct their own input data. They achieve this through a two-stage process:

  • Encoding: An AutoEncoder has an encoder that compresses (or encodes) the input data into a smaller, hidden representation called the code or bottleneck. This code aims to capture the most crucial features or patterns within the data.
  • Decoding: The decoder part of the AutoEncoder takes the compressed code and attempts to reconstruct the original input data as closely as possible.

The magic of autoencoders lies in this self-supervision. The network learns by trying to reproduce its own input, minimizing the difference (loss) between the original input and its reconstruction.
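
To make this concrete, here is a minimal NumPy sketch of that reconstruction loss, using mean squared error; the numbers are made up purely for illustration and are not from the example later in this post.

import numpy as np

def reconstruction_loss(x, x_hat):
    # Mean squared error between the original input and its reconstruction.
    return np.mean((x - x_hat) ** 2)

# Tiny illustration with made-up values: a close reconstruction gives a small loss.
x = np.array([0.0, 1.0, 0.5])
x_hat = np.array([0.1, 0.9, 0.5])
print(reconstruction_loss(x, x_hat))  # ~0.0067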

Why AutoEncoders?

Autoencoders might seem simple, but they pack quite a punch! Let’s explore some of their key uses:

  • Dimensionality Reduction: If you have massive datasets with vast numbers of features, autoencoders can help simplify them. They learn to identify the most important features and discard the less significant ones, effectively reducing the dimensions of your data.
  • Denoising: The world is full of noisy data, such as distorted images or imperfect sensor readings. Autoencoders can be trained on inputs with noise added while still being asked to reconstruct the clean originals, which makes them experts at cleaning up and recovering the underlying signal (see the sketch after this list).
  • Feature Extraction: The ‘code’ created by the encoder is a super-condensed representation of your data. This code can be used as highly informative features for other machine learning tasks like classification.
  • Generative Models: Certain types of autoencoders (such as Variational Autoencoders) can learn to generate new data that resembles your original dataset. Imagine creating entirely new images that look like they belong to your original collection!
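
As a taste of the denoising use case mentioned above, here is a minimal sketch. It assumes x_train holds the normalized MNIST images and autoencoder is the model defined in the example later in this post; noise_factor is an illustrative value, not a tuned one.

import numpy as np

# Corrupt the inputs, but keep the clean images as the reconstruction targets.
noise_factor = 0.3  # illustrative value
x_train_noisy = np.clip(
    x_train + noise_factor * np.random.normal(size=x_train.shape), 0.0, 1.0)

# Inputs are the noisy images, targets are the clean originals:
# autoencoder.fit(x_train_noisy, x_train, epochs=10, batch_size=256)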

How do AutoEncoders Work?

Fundamentally, AutoEncoders are neural networks. Think of them as consisting of interconnected nodes arranged in layers. Here’s a basic breakdown of how they work:

  1. Input: The input data (say, an image) is fed into the input layer of the autoencoder.
  2. Encoder: The data travels through the encoder layers. Each layer applies transformations to the data, compressing the information into the hidden code.
  3. Code: This ‘code’ lives in the middle layer of the autoencoder, representing a distilled essence of your input data.
  4. Decoder: The decoding layers take the code and attempt to expand it back into the original representation.
  5. Output: The output layer produces the autoencoder’s reconstruction of the input data.
  6. Learning: The loss (difference) between the input and the reconstruction is calculated. During training, the autoencoder’s weights are continuously updated to minimize this loss.
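
For intuition on the learning step, here is a rough sketch of what a single training step boils down to, written with TensorFlow's GradientTape. In the Keras example below, model.fit handles all of this for you; model, optimizer, and x are assumed to be the autoencoder, an optimizer such as Adam, and one batch of inputs.

import tensorflow as tf

def train_step(model, optimizer, x):
    with tf.GradientTape() as tape:
        x_hat = model(x, training=True)              # encode, then decode
        loss = tf.reduce_mean(tf.square(x - x_hat))  # reconstruction loss
    # Nudge the weights in the direction that reduces the reconstruction loss.
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss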

Hyperparameters

Let’s dive into the most important hyperparameters for training AutoEncoders:

  1. Bottleneck Size: The number of neurons in the bottleneck layer (the smallest, middle layer) is crucial. It determines how much the input data is compressed. Fewer neurons mean more compression.
  2. Number of Layers and Layer Sizes: This refers to how many layers are in the AutoEncoder and how many neurons are in each layer. The structure typically narrows down to a smaller ‘bottleneck’ layer in the middle, which is where the data gets compressed.
  3. Activation Functions: These are functions like ReLU or sigmoid used in the neurons. They help the autoencoder learn non-linear patterns in the data.
  4. Loss Function: This is a way to measure how well the autoencoder is rebuilding the input data. Common choices are Mean Squared Error for continuous data or binary cross-entropy for binary data.
  5. Regularization Techniques: These are methods like L1/L2 regularization or dropout. They prevent the autoencoder from memorizing the input data too closely (overfitting), helping it generalize better; the sketch after this list shows how a couple of these choices look in Keras.
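
To show how a couple of these knobs look in code, here is a hypothetical variant of the encoder from the example below, with dropout between layers and an L1 activity regularizer on the 64-unit bottleneck. The specific values (0.2, 1e-5) are illustrative, not tuned.

from keras import regularizers
from keras.layers import Dense, Dropout
from keras.models import Sequential

regularized_encoder = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dropout(0.2),  # randomly drops 20% of activations during training
    Dense(64, activation='relu',
          activity_regularizer=regularizers.l1(1e-5))  # encourages a sparse code
])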

Example

The following example demonstrates the core function of AutoEncoders: learning a compressed yet informative representation of data. In this specific case, we work with the MNIST dataset, which consists of images of handwritten digits.

Here’s what the AutoEncoder is trying to achieve:

  • Dimensionality Reduction: The original images have 784 pixels (28 × 28). The autoencoder’s goal is to identify the most essential features of these handwritten digits and encode them in a much smaller code (64 dimensions in our example).
  • Reconstruction: With just this compressed code, the autoencoder aims to reconstruct the original images as accurately as possible. The success of the reconstruction highlights how well it has managed to capture the essence of the digits.

The code is available in this colab notebook.

import os
import keras
from keras.datasets import mnist
from keras.layers import Dense
from keras.models import Sequential
import matplotlib.pyplot as plt

# Load MNIST dataset
(x_train, _), (x_test, _) = mnist.load_data()

# Preprocess images
x_train = x_train.reshape(-1, 784) / 255.0
x_test = x_test.reshape(-1, 784) / 255.0

# Encoder definition
encoder = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dense(64, activation='relu')
])
# Compiling the encoder here is optional; it only silences a warning emitted
# when the model is later used on its own.

# Decoder definition
decoder = Sequential([
    Dense(128, activation='relu', input_shape=(64,)),
    Dense(784, activation='sigmoid')
])
# Compiling the decoder here is optional; it only silences a warning emitted
# when the model is later used on its own.

# Autoencoder definition
autoencoder = Sequential([
    encoder,
    decoder
])

# Compile and train the autoencoder
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)

# Evaluation
loss = autoencoder.evaluate(x_test, x_test, verbose=0)
print("Test Loss:", loss)

# Save models
encoder_model_path = 'encoder_model.h5'
decoder_model_path = 'decoder_model.h5'

encoder.save(encoder_model_path)
decoder.save(decoder_model_path)

def get_file_size(file_path):
    size = os.path.getsize(file_path)
    for unit in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024

encoder_model_size = get_file_size(encoder_model_path)
decoder_model_size = get_file_size(decoder_model_path)
print("Encoder model size:", encoder_model_size)
print("Decoder model size:", decoder_model_size)

# Load the models
loaded_encoder = keras.models.load_model('encoder_model.h5')
loaded_decoder = keras.models.load_model('decoder_model.h5')

# Use the loaded models for prediction.
#
# In a real deployment, the encoder would run on the server side, the decoder
# on the client side, and only the encoded images would be transferred from
# the server to the client.
encoded_imgs = loaded_encoder.predict(x_test)
decoded_imgs = loaded_decoder.predict(encoded_imgs)

# Visualization
n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # Original image
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Reconstructed image
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

Output:

Test Loss: 0.08285020291805267
Encoder model size: 439.05 KB
Decoder model size: 442.12 KB
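
As a quick sanity check on how much compression this buys, a short follow-up snippet (reusing x_test and encoded_imgs from the example above) does the arithmetic: 784 values per image down to 64, roughly a 12x reduction. The codes are still float32, so the byte-level saving depends on how they are stored.

# Back-of-the-envelope check of the compression achieved per image.
print("Values per original image:", x_test[0].size)       # 784
print("Values per encoded image:", encoded_imgs[0].size)  # 64
print("Compression ratio: %.1fx" % (x_test[0].size / encoded_imgs[0].size))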

Conclusion

AutoEncoders, with their unique architecture, have opened up new pathways in machine learning, offering a powerful tool for understanding and leveraging complex datasets. Whether you’re a seasoned ML practitioner or just starting, exploring AutoEncoders can be a rewarding journey into the depths of neural network capabilities.

References

https://en.wikipedia.org/wiki/Autoencoder

Basics of Autoencoders
