Autoencoders Explained

om pramod
5 min read · Jun 16, 2024


Part 4: Variational Autoencoders

VAEs are part of a family of generative models that learn a probability distribution over the input data. By modelling this underlying distribution, a VAE can sample from it to generate new data points that resemble the training data.

VAEs use a loss function that combines two terms: a reconstruction loss, which measures the ability of the decoder to reconstruct the input from the latent space, and a regularization term, which measures the divergence between the learned distribution and a prior distribution over the latent space. The regularization term is usually implemented using the Kullback-Leibler divergence, which measures the difference between two probability distributions.
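To make these two terms concrete, here is a minimal sketch of what a VAE loss can look like in TensorFlow. The names z_mean and z_log_var, the standard normal prior, and the squared-error reconstruction term are illustrative assumptions rather than a fixed recipe:

import tensorflow as tf

def vae_loss(x, x_hat, z_mean, z_log_var):
    # Reconstruction term: how closely the decoder output matches the input,
    # summed over the pixels of each image.
    recon = tf.reduce_sum(tf.square(x - x_hat), axis=[1, 2, 3])

    # Regularization term: closed-form KL divergence between the learned
    # Gaussian q(z|x) = N(z_mean, exp(z_log_var)) and the standard normal prior.
    kl = -0.5 * tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)

    # Total loss: average of both terms over the batch.
    return tf.reduce_mean(recon + kl)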

Regularization of this kind is not unique to VAEs; other autoencoder variants attach different penalty terms to the loss. One of these is the contractive penalty, which penalizes the hidden representation for being too sensitive to small changes in the input data and encourages it to be more invariant and stable.

Contractive autoencoder: In a contractive autoencoder, the regularization term is based on the contractive property, the idea that similar inputs should have similar encodings and a similar latent-space representation. In other words, the latent space should not vary by a large amount for minor variations in the input.

A contractive autoencoder (CAE) is a type of autoencoder neural network designed to learn a compressed representation of the input data that is insensitive to small perturbations. This helps prevent overfitting and improves the generalization performance of the autoencoder. When an autoencoder overfits, it essentially memorizes the training data instead of learning a useful representation, which results in poor reconstruction quality and reduced performance on tasks such as data compression or anomaly detection.

In the context of machine learning, perturbations refer to small changes or variations made to the input data in order to study the behaviour of a model or to improve its robustness by testing its ability to generalize to slightly different inputs. These perturbations can be intentional or random, and can be applied to different types of data, such as images, text, and numerical data. For example, in the context of image classification, perturbations can be applied to an image by adding small amounts of noise or changing the brightness, contrast, or orientation of the image. By doing so, we can study how the classification model responds to different types of input variations and assess its robustness to different types of noise or image transformations.
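As a small illustration, here is one way such perturbations might be applied to a batch of images in TensorFlow. The noise level and brightness shift below are arbitrary, made-up values:

import tensorflow as tf

# A stand-in batch of images in place of real data
images = tf.random.uniform((8, 28, 28, 1))

# Random perturbation: small Gaussian noise plus a slight brightness shift,
# clipped so pixel values stay in the valid [0, 1] range
noise = tf.random.normal(tf.shape(images), stddev=0.05)
perturbed = tf.clip_by_value(images + noise + 0.1, 0.0, 1.0)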

This is achieved by adding a penalty term to the loss function that measures the sensitivity of the encoding layer to small changes in the input data. The total loss function of a contractive autoencoder can be expressed as follows:

L(x, x_hat) = L_recon(x, x_hat) + λ * ||J_f(x)||²

The total loss of a contractive autoencoder combines two terms: the reconstruction loss (L_recon) and a regularization term (λ * ||J_f(x)||²). The reconstruction loss is the same as in a standard autoencoder and measures the difference between the input and the reconstructed output. The regularization term penalizes the encoding layer for being too sensitive to small changes in the input data.

Where: x is the input data

x_hat is the reconstructed data from the encoder-decoder network

λ is a hyperparameter that controls the strength of the penalty term

J_f(x) is the Jacobian matrix of the encoded representation f(x) with respect to the input data x

||.|| is the Frobenius norm

The squared Frobenius norm of the Jacobian matrix measures the sensitivity of the encoding layer to small changes in the input data. A higher norm value implies that the encoding layer is more sensitive to input perturbations, and a lower norm value implies that the encoding layer is more robust to input variations. Therefore, minimizing the squared Frobenius norm encourages the encoding layer to learn a more robust representation of the input data.
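For reference, this quantity can be computed exactly with TensorFlow's batch_jacobian. The sketch below assumes an encoder like the one in the example further down and a batch of 28×28×1 inputs; it is noticeably more expensive than the gradient-based approximation used in the training code:

import tensorflow as tf

def jacobian_frobenius_penalty(encoder, x):
    # x: a batch of inputs with shape (batch, 28, 28, 1)
    with tf.GradientTape() as tape:
        tape.watch(x)
        z = encoder(x)  # shape (batch, latent_dim)
    # Full Jacobian dz/dx per sample: shape (batch, latent_dim, 28, 28, 1)
    jac = tape.batch_jacobian(z, x)
    # Squared Frobenius norm per sample, averaged over the batch
    return tf.reduce_mean(tf.reduce_sum(tf.square(jac), axis=[1, 2, 3, 4]))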

A contractive autoencoder is so called because of the contractive nature of its encoding layer. It learns a compressed representation of the input data by forcing the encoder to be robust to small variations in the input; the encoding layer “contracts” the input data into a representation that captures its essential features.

Thus, regularization terms such as this one help the autoencoder learn more robust and generalizable features from the data.

Here’s a Python implementation of a contractive autoencoder:

import tensorflow as tf
import matplotlib.pyplot as plt

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reshape to (samples, 28, 28, 1) and normalize pixel values to [0, 1]
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# Define the encoder: two convolution/pooling stages followed by a dense bottleneck
encoder = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', padding='same', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
])

# Define the decoder: mirror the encoder with transposed convolutions
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(7 * 7 * 64, activation='relu'),
    tf.keras.layers.Reshape((7, 7, 64)),
    tf.keras.layers.Conv2DTranspose(64, 3, strides=2, activation='relu', padding='same'),
    tf.keras.layers.Conv2DTranspose(32, 3, strides=2, activation='relu', padding='same'),
    tf.keras.layers.Conv2DTranspose(1, 3, activation='sigmoid', padding='same'),
])

# Define the contractive autoencoder loss function
class ContractiveAutoencoderLoss(tf.keras.losses.Loss):
    def __init__(self, penalty_lambda=1e-4):
        super().__init__()
        self.penalty_lambda = penalty_lambda

    def call(self, y_true, y_pred):
        # Reconstruction loss: mean squared error between input and reconstruction
        reconstruction_loss = tf.reduce_mean(tf.keras.losses.mean_squared_error(y_true, y_pred))

        # Contractive penalty: gradient of the (summed) encoding with respect to the
        # input, used as a cheap approximation of the Jacobian's squared Frobenius norm
        with tf.GradientTape() as tape:
            tape.watch(y_true)
            encoded = encoder(y_true)
        gradients = tape.gradient(encoded, y_true)
        contractive_penalty = tf.reduce_mean(tf.reduce_sum(tf.square(gradients), axis=[1, 2, 3]))

        # Total loss = reconstruction loss + weighted contractive penalty
        return reconstruction_loss + self.penalty_lambda * contractive_penalty

# Define the autoencoder by chaining encoder and decoder
autoencoder = tf.keras.Sequential([encoder, decoder])

# Compile the autoencoder with the custom contractive loss
autoencoder.compile(optimizer='adam', loss=ContractiveAutoencoderLoss(penalty_lambda=1e-4))

# Train the autoencoder to reconstruct its own inputs
autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)

# Select an image from the test set
image_index = 0
test_image = x_test[image_index:image_index+1]

# Encode the test image and decode the latent vector
latent_vector = encoder(test_image)
generated_image = decoder(latent_vector)

# Plot the original image and the reconstructed image side by side
plt.subplot(1, 2, 1)
plt.imshow(test_image.squeeze(), cmap='gray')
plt.title('Original Image')

plt.subplot(1, 2, 2)
plt.imshow(generated_image.numpy().squeeze(), cmap='gray')
plt.title('Generated Image')

plt.show()

Closing note — In the final part, we’ll look at Denoising Autoencoders and how they can enhance data quality. You’re almost there; let’s finish strong!
