Understanding Autoencoders (Part II)

What is the magic behind autoencoders? Now with code examples!

Jelal Sultanov
AI³ | Theory, Practice, Business
Jul 19, 2020

In the previous article, I explained the potential use cases of autoencoders and briefly described what autoencoders are. In this article, I would like to explain two types of autoencoders and give some code implementations in Python and Keras.

Let’s start by explaining how they work.

Undercomplete Autoencoders

The main task of an undercomplete autoencoder is not simply to copy the input to the output, but to learn useful properties of the input data.

One of the simplest ways to construct an autoencoder is to constrain the number of nodes in the hidden layers so that they have fewer nodes than the input layer, which limits the flow of information through the network. This gives us a latent representation h of the original input data. An autoencoder whose code (the latent representation of the input data) has a smaller dimension than the input is called undercomplete. This type of autoencoder lets us capture the most salient features of the training data.
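
To make the bottleneck idea concrete, here is a minimal Keras sketch of an undercomplete autoencoder. The sizes (784 inputs, a 32-dimensional code), the activations, and the random stand-in data are assumptions chosen for illustration, not values prescribed above.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim = 784  # assumed: e.g. a flattened 28x28 image
code_dim = 32    # assumed bottleneck size; must be smaller than input_dim

# Encoder f: compress the input into the code h.
inputs = keras.Input(shape=(input_dim,))
code = layers.Dense(code_dim, activation="relu", name="code")(inputs)
# Decoder g: reconstruct the input from the code.
outputs = layers.Dense(input_dim, activation="sigmoid")(code)

autoencoder = keras.Model(inputs, outputs, name="undercomplete_autoencoder")
encoder_only = keras.Model(inputs, code, name="encoder_only")
autoencoder.compile(optimizer="adam", loss="mse")

# Random data stands in for a real dataset; the input is also the target.
x = np.random.rand(16, input_dim).astype("float32")
autoencoder.fit(x, x, epochs=1, batch_size=8, verbose=0)

h = encoder_only.predict(x, verbose=0)
print(h.shape)  # (16, 32): each sample is described by 32 numbers instead of 784
```

Note that the model is fitted with `x` as both input and target, which is what makes it an autoencoder rather than an ordinary supervised network.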

Training an undercomplete autoencoder simply means minimizing a reconstruction loss:

L(x, g(f(x))),

where f is the encoder, g is the decoder, and L is a loss function that penalizes g(f(x)) for being dissimilar from x, such as the mean squared error.
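
The loss above can be evaluated by hand in a few lines of NumPy. This is only a sketch: the random weights, the tanh encoder, the linear decoder, and the sizes are all assumptions, and nothing is trained here.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, code_dim = 8, 3  # assumed sizes; code_dim < input_dim

# Hypothetical, untrained weights for the encoder and the decoder.
W_enc = rng.normal(scale=0.5, size=(input_dim, code_dim))
W_dec = rng.normal(scale=0.5, size=(code_dim, input_dim))


def f(x):
    """Encoder: map the input to the lower-dimensional code h."""
    return np.tanh(x @ W_enc)


def g(h):
    """Decoder: map the code back to the input space."""
    return h @ W_dec


def mse(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((x - x_hat) ** 2))


x = rng.normal(size=(5, input_dim))  # a batch of 5 made-up samples
h = f(x)
loss = mse(x, g(h))                  # this is L(x, g(f(x)))
print(h.shape)                       # (5, 3)
print(loss)
```

Training would adjust `W_enc` and `W_dec` to push this number down; a perfect reconstruction would give a loss of exactly zero.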

Because neural networks can learn nonlinear relationships in data, undercomplete autoencoders can be seen as a more powerful, nonlinear generalization of PCA. For higher-dimensional data, they are capable of learning complex representations, which can then be used for visualization in a low-dimensional space.
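
The link to PCA can be made concrete. With a linear decoder and mean squared error, the best an undercomplete autoencoder can do is project onto the same subspace that PCA finds. The sketch below uses NumPy's SVD to build that optimal linear encoder and decoder directly; the synthetic data and the choice of k = 2 are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, centered data: 200 samples with 5 correlated features.
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)

k = 2  # assumed code dimension (number of principal components kept)

# Rows of Vt are the principal directions, ordered by explained variance.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
components = Vt[:k]


def encode(x):
    """Linear 'encoder': project onto the top-k principal directions."""
    return x @ components.T


def decode(h):
    """Linear 'decoder': map the k-dimensional code back to 5 dimensions."""
    return h @ components


X_hat = decode(encode(X))
mse_k = np.mean((X - X_hat) ** 2)

# The reconstruction error is exactly the energy in the discarded directions.
expected = np.sum(S[k:] ** 2) / X.size
print(np.isclose(mse_k, expected))  # True
```

A nonlinear autoencoder replaces these two matrix products with neural networks, which is what lets it capture curved structure that PCA would miss.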

For deep autoencoders, we should also be aware of the capacity of the encoder and decoder. Even with a bottleneck of a single node in the hidden layer, a sufficiently powerful encoder and decoder can still learn to copy the input to the output rather than learn important features and representations of the input data. Thus, we should also pay attention to how big our encoders and decoders are and control their capacity.

Variational Autoencoders

Variational autoencoders (VAEs) are a type of generative model. To better understand the concept, imagine an autoencoder trained on a large image dataset of planes with an encoding dimension of 5. A typical autoencoder will learn descriptive attributes of planes, such as wings, landing gear, and fuselage, so that it can describe an observation in a more compressed representation.

In the example above, each attribute of an object (in our case, a plane) is described by a single value. A single number, however, can be too rigid a description. In a variational autoencoder, each attribute is instead described by a range of possible values.

With this approach, we can represent each latent attribute as a probability distribution. We can then sample randomly from the distribution of each attribute and give the sampled values as input to the decoder.
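
The sampling step can be sketched in plain NumPy. Assuming each latent attribute follows a Gaussian described by a mean and a log-variance (the usual VAE parameterization), a sample is the mean plus scaled standard-normal noise. The specific means and variances below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical encoder outputs for one input with 2 latent attributes.
z_mean = np.array([0.5, -1.0])
z_log_var = np.log(np.array([0.04, 0.25]))  # variances 0.04 and 0.25


def sample_latent(z_mean, z_log_var, n_samples, rng):
    """Draw latent vectors as z = mean + std * epsilon, epsilon ~ N(0, 1)."""
    epsilon = rng.standard_normal(size=(n_samples, z_mean.shape[0]))
    std = np.exp(0.5 * z_log_var)  # log-variance -> standard deviation
    return z_mean + std * epsilon


samples = sample_latent(z_mean, z_log_var, n_samples=10_000, rng=rng)
print(samples.shape)         # (10000, 2)
print(samples.mean(axis=0))  # close to z_mean
print(samples.std(axis=0))   # close to [0.2, 0.5]
```

Each row of `samples` is one possible latent vector that could be handed to the decoder.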

Because the encoder outputs a probability distribution, we obtain a continuous, smooth latent space representation. Values that are close to each other in the latent space lead to similar reconstructions by the decoder. In other words, the closer two latent values are to each other, the more similar the decoder outputs will be.

Python & Keras

Don’t forget to try this code out in your IDE!

Setup:

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

Create Sampling Layer:

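class Sampling(layers.Layer):
    """Uses (z_mean, z_log_var) to sample z, the vector encoding a digit."""

    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        # Standard-normal noise; tf.random.normal is used because it is
        # available across TensorFlow 2.x releases.
        epsilon = tf.random.normal(shape=(batch, dim))
        # exp(0.5 * log_var) converts the log-variance to a standard deviation.
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon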

Build Encoder:

latent_dim = 2

encoder_inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, activation="relu", strides=2, padding="same")(encoder_inputs)
x = layers.Conv2D(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
z = Sampling()([z_mean, z_log_var])
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name="encoder")
encoder.summary()

Build Decoder:

latent_inputs = keras.Input(shape=(latent_dim,))
x = layers.Dense(7 * 7 * 64, activation="relu")(latent_inputs)
x = layers.Reshape((7, 7, 64))(x)
x = layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x)
decoder_outputs = layers.Conv2DTranspose(1, 3, activation="sigmoid", padding="same")(x)
decoder = keras.Model(latent_inputs, decoder_outputs, name="decoder")
decoder.summary()
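
The training code in the next step instantiates a `VAE` class that is not defined anywhere in this article. A minimal sketch of what it could look like is given here, following the common Keras pattern of subclassing `keras.Model` and overriding `train_step`. The binary cross-entropy reconstruction term and the unweighted sum with the KL divergence are assumptions, not necessarily what the author used.

```python
import tensorflow as tf
from tensorflow import keras


class VAE(keras.Model):
    """Sketch: wraps an encoder and a decoder and trains them jointly."""

    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.total_loss_tracker = keras.metrics.Mean(name="loss")
        self.reconstruction_loss_tracker = keras.metrics.Mean(name="reconstruction_loss")
        self.kl_loss_tracker = keras.metrics.Mean(name="kl_loss")

    @property
    def metrics(self):
        # Listing the trackers here lets Keras reset them every epoch.
        return [
            self.total_loss_tracker,
            self.reconstruction_loss_tracker,
            self.kl_loss_tracker,
        ]

    def train_step(self, data):
        if isinstance(data, tuple):  # tolerate (x,) or (x, y) batches
            data = data[0]
        with tf.GradientTape() as tape:
            z_mean, z_log_var, z = self.encoder(data)
            reconstruction = self.decoder(z)
            # Per-pixel cross-entropy, summed over height and width,
            # then averaged over the batch.
            reconstruction_loss = tf.reduce_mean(
                tf.reduce_sum(
                    keras.losses.binary_crossentropy(data, reconstruction),
                    axis=(1, 2),
                )
            )
            # KL divergence between N(z_mean, exp(z_log_var)) and N(0, 1).
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {m.name: m.result() for m in self.metrics}
```

The reconstruction term rewards faithful outputs, while the KL term keeps each latent distribution close to a standard normal, which is what makes the latent space smooth enough to sample from.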

Training Variational Autoencoder:

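# Load MNIST; the labels are discarded because training is unsupervised.
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
mnist_digits = np.concatenate([x_train, x_test], axis=0)
# Add a channel axis and scale pixel values to [0, 1].
mnist_digits = np.expand_dims(mnist_digits, -1).astype("float32") / 255

# VAE must be a keras.Model subclass that wraps the encoder and decoder and
# implements the training step (reconstruction loss plus KL divergence).
vae = VAE(encoder, decoder)
vae.compile(optimizer=keras.optimizers.Adam())
vae.fit(mnist_digits, epochs=30, batch_size=128)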

Displaying a grid of sampled digits:

import matplotlib.pyplot as plt


def plot_latent(encoder, decoder):
    # display a n*n 2D manifold of digits
    # (encoder is unused here; only the decoder is needed to draw the grid)
    n = 30
    digit_size = 28
    scale = 2.0
    figsize = 15
    figure = np.zeros((digit_size * n, digit_size * n))
    # linearly spaced coordinates corresponding to the 2D plot
    # of digit classes in the latent space
    grid_x = np.linspace(-scale, scale, n)
    grid_y = np.linspace(-scale, scale, n)[::-1]

    for i, yi in enumerate(grid_y):
        for j, xi in enumerate(grid_x):
            z_sample = np.array([[xi, yi]])
            # verbose=0 silences the progress bar for each of the n*n calls
            x_decoded = decoder.predict(z_sample, verbose=0)
            digit = x_decoded[0].reshape(digit_size, digit_size)
            figure[
                i * digit_size : (i + 1) * digit_size,
                j * digit_size : (j + 1) * digit_size,
            ] = digit

    plt.figure(figsize=(figsize, figsize))
    start_range = digit_size // 2
    # no "+ 1" here: it would yield n + 1 tick positions for n tick labels
    end_range = n * digit_size + start_range
    pixel_range = np.arange(start_range, end_range, digit_size)
    sample_range_x = np.round(grid_x, 1)
    sample_range_y = np.round(grid_y, 1)
    plt.xticks(pixel_range, sample_range_x)
    plt.yticks(pixel_range, sample_range_y)
    plt.xlabel("z[0]")
    plt.ylabel("z[1]")
    plt.imshow(figure, cmap="Greys_r")
    plt.show()


plot_latent(encoder, decoder)

Cool Articles:

“MusicVAE: A Hierarchical Latent Vector Model”

Conclusion:

Thanks for reading! If you enjoyed this article, please hit the clap button 👏 as many times as you can. It would mean a lot and encourage me to keep writing stories like this. Let’s connect on Twitter!🐦