Autoencoders — Guide and Code in TensorFlow 2.0

Imran us Salam · Published in Red Buffer · Aug 17, 2019

When we talk about neural networks, or machine learning in general, we talk about mapping some input to some output through a learnable function.

Autoencoders are a class of neural networks where you map the input to an output that is exactly the input itself.

So what exactly is the point of Autoencoders?

Let’s try to find that out in this blog, along with how they’re used and some of their applications.

And code it all in TensorFlow 2.0

Autoencoders

Autoencoders are a class of neural networks that try to reconstruct their own input. They are unsupervised in nature.

The general structure of an autoencoder includes an encoder and a decoder.

An autoencoder is just like a normal neural network; the only structural requirement is that the output layer produces the same shape as the input.

Encoder

The first part of the autoencoder is the encoder. The job of the encoder is to encode the information into a smaller, denser representation. The idea is that this dense representation can be decoded back into the original input.

Decoder

The decoder takes the dense representation made by the encoder and tries to reconstruct the original input from it.

Purpose

So what’s the purpose of an autoencoder if it’s just going to reproduce its input?

There are many reasons to train an autoencoder. One main reason is dimensionality reduction: if an input can be represented in a much lower-dimensional space without losing its meaning, that alone is reason enough.

The amount of compression depends on the size of the last encoder layer. A smaller layer gives more compression, but you might end up losing some information; a larger layer preserves more information, but compresses less. For MNIST, for example, squeezing a 784-pixel image into a 16-unit bottleneck is roughly a 49x reduction.

The encoded representation is dense and can serve as a compressed (and obfuscated) form of the input.

This encoded layer is also called the bottleneck layer.

Autoencoders in Practice

Autoencoders can be built with any kind of neural network: they can be made entirely of fully connected (dense) layers, or they can be convolutional.

Fully connected Autoencoder

A fully connected Autoencoder would look something like this

Let’s try to code some of it in TensorFlow 2.0

First we import TensorFlow (eager execution is enabled by default in TensorFlow 2.0), load the MNIST data into memory, and scale it to the 0–1 range. Note how we only keep x_train and x_test and discard their target variables.

import tensorflow as tf

# Eager execution is enabled by default in TensorFlow 2.0,
# so no tf.enable_eager_execution() call is needed.

# Load the MNIST images only (labels are discarded) and scale to [0, 1]
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

Let’s first try a fully connected autoencoder.

class FullyConnectedAutoEncoder(tf.keras.Model):
    def __init__(self):
        super(FullyConnectedAutoEncoder, self).__init__()
        self.flatten_layer = tf.keras.layers.Flatten()

        # Encoder: 784 -> 64 -> 32
        self.dense1 = tf.keras.layers.Dense(64, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(32, activation=tf.nn.relu)

        # Bottleneck: 16-dimensional code
        self.bottleneck = tf.keras.layers.Dense(16, activation=tf.nn.relu)

        # Decoder: 16 -> 32 -> 64 -> 784
        self.dense4 = tf.keras.layers.Dense(32, activation=tf.nn.relu)
        self.dense5 = tf.keras.layers.Dense(64, activation=tf.nn.relu)
        self.dense_final = tf.keras.layers.Dense(784)

    def call(self, inp):
        x_reshaped = self.flatten_layer(inp)
        x = self.dense1(x_reshaped)
        x = self.dense2(x)
        x = self.bottleneck(x)
        x = self.dense4(x)
        x = self.dense5(x)
        x = self.dense_final(x)
        # Return the reconstruction and the flattened input
        # so the loss can compare them directly
        return x, x_reshaped

In the snippet above we’ve created a fully connected autoencoder model, using the Keras API to define the layers. The bottleneck layer has 16 units, which means we are going to compress each input from 784 values down to 16.

Now we need to define a loss function and the training flow

def loss(x, x_bar):
    # Mean squared error between the (flattened) input and its reconstruction
    return tf.reduce_mean(tf.keras.losses.mean_squared_error(x, x_bar))

def grad(model, inputs):
    # Compute the loss and its gradients with respect to the model weights
    with tf.GradientTape() as tape:
        reconstruction, inputs_reshaped = model(inputs)
        loss_value = loss(inputs_reshaped, reconstruction)
    return loss_value, tape.gradient(loss_value, model.trainable_variables), inputs_reshaped, reconstruction

Here we’re using mean squared error (MSE) as our loss.
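For a flattened MNIST image with N = 784 pixels, the loss for one example is just the average squared difference between the original pixels x and the reconstruction x̂:

MSE(x, x̂) = (1/N) Σᵢ (xᵢ - x̂ᵢ)²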

model = FullyConnectedAutoEncoder()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

global_step = tf.Variable(0)
num_epochs = 50
batch_size = 4

for epoch in range(num_epochs):
    print("Epoch: ", epoch)
    for x in range(0, len(x_train), batch_size):
        x_inp = x_train[x : x + batch_size]
        loss_value, grads, inputs_reshaped, reconstruction = grad(model, x_inp)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        global_step.assign_add(1)

        if global_step.numpy() % 200 == 0:
            print("Step: {}, Loss: {}".format(global_step.numpy(),
                                              loss(inputs_reshaped, reconstruction).numpy()))

This is the main training loop.
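As a quick sanity check once training has finished, you can reconstruct a few test digits and also pull out their 16-dimensional codes by running the encoder layers directly. A minimal sketch, assuming the model and data defined above:

# Reconstruct a few test images with the trained model
reconstructions, originals = model(x_test[:8])

# The 16-dimensional bottleneck codes for the same images, obtained by
# running only the encoder layers
codes = model.bottleneck(model.dense2(model.dense1(model.flatten_layer(x_test[:8]))))
print(codes.shape)  # (8, 16)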

Convolutional Autoencoder

The same approach can be used with a convolutional neural network. We can use simple convolutional and pooling layers to downsample (encode), and upsampling or deconvolutional (transposed convolution) layers to decode.

A convolutional autoencoder looks like this

Let’s code it in TensorFlow 2.0.

x_train = tf.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = tf.reshape(x_test, (len(x_test), 28, 28, 1))

Here we are reshaping the data to add a channel dimension so that it fits our convolutional model.

class ConvNetAutoEncoder(tf.keras.Model):
    def __init__(self):
        super(ConvNetAutoEncoder, self).__init__()

        # Encoder: convolutions + max pooling downsample 28x28 -> 4x4
        self.conv1 = tf.keras.layers.Conv2D(16, (3, 3), activation='relu', padding='same')
        self.maxp1 = tf.keras.layers.MaxPooling2D((2, 2), padding='same')
        self.conv2 = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')
        self.maxp2 = tf.keras.layers.MaxPooling2D((2, 2), padding='same')
        self.conv3 = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')

        # Bottleneck: 4x4x8 encoded representation
        self.encoded = tf.keras.layers.MaxPooling2D((2, 2), padding='same')

        # Decoder: convolutions + upsampling back to 28x28x1
        self.conv4 = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')
        self.upsample1 = tf.keras.layers.UpSampling2D((2, 2))
        self.conv5 = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')
        self.upsample2 = tf.keras.layers.UpSampling2D((2, 2))
        self.conv6 = tf.keras.layers.Conv2D(16, (3, 3), activation='relu')  # 'valid' padding: 16x16 -> 14x14
        self.upsample3 = tf.keras.layers.UpSampling2D((2, 2))
        self.conv7 = tf.keras.layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')

    def call(self, x):
        x = self.conv1(x)
        x = self.maxp1(x)
        x = self.conv2(x)
        x = self.maxp2(x)
        x = self.conv3(x)
        x = self.encoded(x)
        x = self.conv4(x)
        x = self.upsample1(x)
        x = self.conv5(x)
        x = self.upsample2(x)
        x = self.conv6(x)
        x = self.upsample3(x)
        x = self.conv7(x)
        return x

def loss(x, x_bar):
    return tf.reduce_mean(tf.keras.losses.mean_squared_error(x, x_bar))

def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        reconstruction = model(inputs)
        loss_value = loss(targets, reconstruction)
    return loss_value, tape.gradient(loss_value, model.trainable_variables), reconstruction

The training loop is defined as

model = ConvNetAutoEncoder()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

global_step = tf.Variable(0)
num_epochs = 50
batch_size = 4

for epoch in range(num_epochs):
    print("Epoch: ", epoch)
    for x in range(0, len(x_train), batch_size):
        x_inp = x_train[x : x + batch_size]
        loss_value, grads, reconstruction = grad(model, x_inp, x_inp)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        global_step.assign_add(1)

        if global_step.numpy() % 200 == 0:
            print("Step: {}, Loss: {}".format(global_step.numpy(),
                                              loss(x_inp, reconstruction).numpy()))

Autoencoder types

In recent times, people have used autoencoders for various purposes. We can use them as one-class classifiers: we train on examples of a single target class, and once trained, we pass an input through the model to see whether its reconstruction stays close to that input.

The same can be done for anomaly detection: we train on normal data without anomalies, and then check whether a new example is normal by passing it through the autoencoder and looking at its reconstruction error.
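As a rough sketch of the idea (assuming a trained convolutional autoencoder like the one above, and a hypothetical threshold picked from the reconstruction errors of held-out normal examples):

def is_anomaly(model, example, threshold):
    # example has shape (28, 28, 1); add a batch dimension before calling the model
    reconstruction = model(example[None, ...])
    # Mean squared reconstruction error for this single example
    error = tf.reduce_mean(tf.square(example - reconstruction[0]))
    # Anything the model reconstructs poorly is flagged as an anomaly
    return error.numpy() > threshold

You would then call something like is_anomaly(model, x_test[0], threshold) on new examples, with the threshold chosen on validation data.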

But here we are going to discuss two types of autoencoders in more detail: the denoising autoencoder and the variational autoencoder.

Denoising Autoencoder

The problem with simple autoencoders is that they sometimes tend to learn something close to an identity function, i.e. a highly specialized, overfitted mapping. To avoid that, we add some noise to the input: this can mean turning off some pixels in the input, or adding random noise to the training examples.

The only thing we’re going to add is

noise_factor = 0.5

# Add Gaussian noise, then clip back to the valid [0, 1] pixel range
x_train_noisy = x_train + noise_factor * tf.random.normal(shape=x_train.shape)
x_test_noisy = x_test + noise_factor * tf.random.normal(shape=x_test.shape)
x_train_noisy = tf.clip_by_value(x_train_noisy, clip_value_min=0., clip_value_max=1.)
x_test_noisy = tf.clip_by_value(x_test_noisy, clip_value_min=0., clip_value_max=1.)

Let’s see how we can add this into our Convolutional Autoencoder training loop,

model = ConvNetAutoEncoder()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

global_step = tf.Variable(0)
num_epochs = 50
batch_size = 4

for epoch in range(num_epochs):
    print("Epoch: ", epoch)
    for x in range(0, len(x_train), batch_size):
        # The model sees the noisy images but is trained to reproduce the clean ones
        x_inp_noisy = x_train_noisy[x : x + batch_size]
        x_inp = x_train[x : x + batch_size]
        loss_value, grads, reconstruction = grad(model, x_inp_noisy, x_inp)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        global_step.assign_add(1)

        if global_step.numpy() % 200 == 0:
            print("Step: {}, Loss: {}".format(global_step.numpy(),
                                              loss(x_inp, reconstruction).numpy()))

The loss is still computed between the reconstruction and the original, clean input. This keeps the learned function general instead of letting it simply copy the input.
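Once trained, the model can be used to denoise images it has never seen. A small sketch, assuming the noisy test set created above:

# Pass noisy test digits through the trained denoising autoencoder
denoised = model(x_test_noisy[:8])          # shape (8, 28, 28, 1)
# Compare the denoised output against the clean originals
print(loss(x_test[:8], denoised).numpy())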

Variational Autoencoders

Suppose that, at test time, we want to use only the decoder part to generate new examples.

The problem is that there is no way of knowing what to feed it: no distribution over the bottleneck is defined. If the autoencoder was trained on MNIST, we don’t know which bottleneck vectors correspond to digits, and if we feed in some random vector we could end up with garbage output.

In variational autoencoders, we don’t map an input directly to a bottleneck vector; instead, we map it to a distribution.

So the bottleneck vector is replaced by two vectors: the mean and the standard deviation of the distribution.

We also want this learned distribution to stay close to a standard normal, with mean 0 and standard deviation 1, so that we know what to sample from later.

Now at test time, we can simply sample from that distribution and feed the sample to the decoder network (a short sampling sketch appears after the training loop below).

We are now left with sampling from a Gaussian with mean mu and standard deviation sigma, but sampling is not differentiable, and we need to be able to backpropagate through it.

So what we do is draw the randomness from a fixed standard Gaussian with mean 0 and standard deviation 1, which we call epsilon, and build the sample from it deterministically; this is the reparameterization trick.

So the equation will now look like

μ + σ * ε

Where μ and σ are produced by learnable layers of the encoder.
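In practice, and in the code below, the encoder outputs σ as a log-variance rather than the standard deviation itself, which keeps it unconstrained; the standard deviation is then recovered with an exponential. A minimal sketch with hypothetical shapes:

import tensorflow as tf

mu = tf.zeros((1, 16))                    # hypothetical latent mean
log_var = tf.zeros((1, 16))               # "sigma" in the model below is a log-variance
eps = tf.random.normal(shape=mu.shape)    # fixed standard Gaussian noise
z = mu + tf.exp(0.5 * log_var) * eps      # reparameterization trick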

The loss of a variational autoencoder is different from the simple one we’ve been using.

The loss function will be a sum of the MSE reconstruction term and a KL divergence term, which measures how far the learned distribution is from the standard normal.
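For a Gaussian with mean μ and log-variance σ, the KL divergence from a standard normal has a closed form, which is exactly the second term in the loss function used below:

KL = -0.5 * Σ (1 + σ - μ² - exp(σ))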

This makes the variational autoencoder a generative model, much like a GAN. To learn more about GANs, read my other blog.

Let’s code a convolutional Variational Autoencoder in TensorFlow 2.0

class VariationalConvNetAutoEncoder(tf.keras.Model):
    def __init__(self):
        super(VariationalConvNetAutoEncoder, self).__init__()

        # Encoder: downsample 28x28 -> 4x4
        self.conv1 = tf.keras.layers.Conv2D(16, (3, 3), activation='relu', padding='same')
        self.maxp1 = tf.keras.layers.MaxPooling2D((2, 2), padding='same')
        self.conv2 = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')
        self.maxp2 = tf.keras.layers.MaxPooling2D((2, 2), padding='same')
        self.conv3 = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')
        self.maxp3 = tf.keras.layers.MaxPooling2D((2, 2), padding='same')

        # Two learnable heads: mu and sigma (sigma is treated as a log-variance)
        self.mu = tf.keras.layers.Conv2D(8, (3, 3), padding='same')
        self.sigma = tf.keras.layers.Conv2D(8, (3, 3), padding='same')

        # Decoder: upsample 4x4 -> 28x28
        self.conv4 = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')
        self.upsample1 = tf.keras.layers.UpSampling2D((2, 2))
        self.conv5 = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')
        self.upsample2 = tf.keras.layers.UpSampling2D((2, 2))
        self.conv6 = tf.keras.layers.Conv2D(16, (3, 3), activation='relu')  # 'valid' padding: 16x16 -> 14x14
        self.upsample3 = tf.keras.layers.UpSampling2D((2, 2))
        self.conv7 = tf.keras.layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')

    def encoder(self, x):
        x = self.conv1(x)
        x = self.maxp1(x)
        x = self.conv2(x)
        x = self.maxp2(x)
        x = self.conv3(x)
        x = self.maxp3(x)
        mu = self.mu(x)
        sigma = self.sigma(x)
        return mu, sigma

    def decoder(self, x):
        x = self.conv4(x)
        x = self.upsample1(x)
        x = self.conv5(x)
        x = self.upsample2(x)
        x = self.conv6(x)
        x = self.upsample3(x)
        x = self.conv7(x)
        return x

    def sample_from_mu_sigma(self, mu, sigma):
        # Reparameterization trick: sigma holds the log-variance
        std = tf.exp(0.5 * sigma)
        eps = tf.random.normal(shape=std.shape)
        return mu + eps * std

    def call(self, x):
        mu, sigma = self.encoder(x)
        z = self.sample_from_mu_sigma(mu, sigma)
        y = self.decoder(z)
        return y, mu, sigma

def loss(x, x_bar, mu, sigma):
    # Reconstruction term (MSE) plus the KL divergence from the standard normal
    mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(x, x_bar))
    kl = -0.5 * tf.reduce_sum(1 + sigma - tf.square(mu) - tf.exp(sigma))
    return mse + kl

def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        reconstruction, mu, sigma = model(inputs)
        loss_value = loss(targets, reconstruction, mu, sigma)
    return loss_value, tape.gradient(loss_value, model.trainable_variables), reconstruction, mu, sigma

The training loop is the same as for the convolutional autoencoder and looks like this

model = VariationalConvNetAutoEncoder()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

global_step = tf.Variable(0)
num_epochs = 50
batch_size = 1

for epoch in range(num_epochs):
    print("Epoch: ", epoch)
    for x in range(0, len(x_train), batch_size):
        x_inp = x_train[x : x + batch_size]
        loss_value, grads, reconstruction, mu, sigma = grad(model, x_inp, x_inp)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        global_step.assign_add(1)

        if global_step.numpy() % 200 == 0:
            print("Step: {}, Loss: {}".format(global_step.numpy(),
                                              loss(x_inp, reconstruction, mu, sigma).numpy()))
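Once trained, new digits can be generated by sampling directly from the standard normal prior and running only the decoder. A minimal sketch, assuming the 4x4x8 latent shape produced by the encoder above:

# Sample a latent code from the prior and decode it into an image
z = tf.random.normal(shape=(1, 4, 4, 8))
generated = model.decoder(z)              # shape (1, 28, 28, 1)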

These are the basics of autoencoders. I hope you liked the read. Thank you.
This is the link to the github repository.

