Autoencoders: Unsupervised Artificial Neural Networks (ANN)

Samyak Kala
Published in Analytics Vidhya
5 min read · Jun 7, 2020


Welcome to this blog about autoencoders. Here you will find an explanation of what an autoencoder is, how it works, and an implementation of one in TensorFlow.

Table of Contents

  1. Introduction
  2. Feature Extraction and Dimensionality Reduction
  3. Autoencoder Structure
  4. Performance
  5. Code

1. Introduction

An autoencoder, also known as an autoassociator or Diabolo network, is an artificial neural network trained to recreate its input. It takes a set of unlabeled inputs, encodes them into a compact representation, and then tries to reconstruct the original data from that representation, extracting the most valuable information in the process. Autoencoders are used for feature extraction, learning generative models of data, dimensionality reduction, and compression.

A 2006 paper, Reducing the Dimensionality of Data with Neural Networks by G. E. Hinton and R. R. Salakhutdinov, showed better results than years of refining other types of networks and was a breakthrough in the field of neural networks.

“Autoencoders, based on Restricted Boltzmann Machines, are employed in some of the largest deep learning applications. They are the building blocks of Deep Belief Networks (DBN).”

Autoencoders

2. Feature Extraction and Dimensionality Reduction

An example given by Nikhil Buduma on KDnuggets offers an excellent explanation of the utility of this type of neural network.

Say that you want to extract the emotion the person in a photograph is feeling, using the following 256x256-pixel grayscale picture as an example:

But when using this picture we run into a bottleneck: an image of 256x256 pixels corresponds to an input vector of 65,536 dimensions! And if we used an image from a conventional cellphone camera, which produces images of around 4000x3000 pixels, we would have 12 million dimensions to analyze.

As you can see, the input dimensionality grows very quickly with image size (quadratically in the side length). Returning to our example, we don't need all 65,536 dimensions to classify an emotion. A human identifies emotions from a few key facial features, like the shape of the mouth and the eyebrows.
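To make these numbers concrete, here is a tiny illustrative calculation of the input dimensionality of a flattened grayscale image (the helper function exists only for this example):

# Each pixel becomes one input dimension once the image is flattened
def input_dimensions(width, height):
    return width * height

print(input_dimensions(256, 256))    # 65,536 dimensions
print(input_dimensions(4000, 3000))  # 12,000,000 dimensions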

3. Autoencoder Structure

Autoencoders Structure

An autoencoder can be divided into two parts:

1. Encoder: the encoder compresses the representation of the input. In this case, we are going to reduce the dimensionality of our actor's face from 2,000 dimensions to only 30, by running the data through the layers of our encoder.

2. Decoder: the decoder works like the encoder network in reverse. It tries to recreate the input as closely as possible, which plays an important role during training because it forces the autoencoder to keep only the most important features in the compressed representation (see the conceptual sketch below).
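As a purely conceptual sketch of that 2,000 -> 30 -> 2,000 bottleneck, here is a NumPy version with random, untrained weights (the intermediate layer size of 500 is an assumption made only for illustration):

import numpy as np

rng = np.random.default_rng(0)

def dense_sigmoid(x, n_in, n_out):
    # One fully connected layer with a sigmoid activation.
    # Weights are random and untrained; this only illustrates the shapes.
    W = rng.normal(size=(n_in, n_out)) / np.sqrt(n_in)
    b = np.zeros(n_out)
    return 1.0 / (1.0 + np.exp(-(x @ W + b)))

x = rng.normal(size=(1, 2000))                                   # one 2,000-dimensional input
code = dense_sigmoid(dense_sigmoid(x, 2000, 500), 500, 30)       # encoder: 2000 -> 500 -> 30
recon = dense_sigmoid(dense_sigmoid(code, 30, 500), 500, 2000)   # decoder: 30 -> 500 -> 2000
print(code.shape, recon.shape)                                   # (1, 30) (1, 2000)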

4. Performance

Left: Principal Component Analysis (PCA). Right: Autoencoder.

This image is taken from the paper by G. E. Hinton and R. R. Salakhutdinov, comparing a two-dimensional reduction of 500 MNIST digits produced by PCA (left) with one produced by an autoencoder (right). We can see that the autoencoder gives a much cleaner separation of the digit classes.
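To get a feel for the PCA side of that comparison, here is a rough sketch using scikit-learn's small built-in digits dataset as a stand-in for MNIST (this is an illustration, not the paper's exact setup):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

digits = load_digits()    # 8x8 digit images, a lightweight stand-in for MNIST
two_d = PCA(n_components=2).fit_transform(digits.data[:500])
plt.scatter(two_d[:, 0], two_d[:, 1], c=digits.target[:500], cmap="tab10", s=10)
plt.colorbar(label="digit class")
plt.show()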

5. Code

First, we import the required libraries and load the MNIST dataset:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Now, let's define the parameters that will be used by our network.

learning_rate = 0.01
training_epochs = 20
batch_size = 256
display_step = 1
examples_to_show = 10

# Network Parameters
n_hidden_1 = 256  # 1st layer num features
n_hidden_2 = 128  # 2nd layer num features
n_input = 784     # MNIST data input (img shape: 28*28)

# tf Graph input (only pictures)
X = tf.placeholder("float", [None, n_input])

weights = {
    'encoder_h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'encoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'decoder_h1': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_1])),
    'decoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_input])),
}
biases = {
    'encoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'encoder_b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'decoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'decoder_b2': tf.Variable(tf.random_normal([n_input])),
}

Now we need to create our encoder. For this, we are going to use sigmoid activation functions, which work well in this type of network because their derivative is simple and well-suited to backpropagation: for sigmoid(x) = 1 / (1 + e^(-x)), the derivative is sigmoid(x) * (1 - sigmoid(x)), which can be computed directly from the layer's output. We can build our encoder with sigmoid activations like this:

# Building the encoder
def encoder(x):
    # Encoder first layer with sigmoid activation
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['encoder_h1']), biases['encoder_b1']))
    # Encoder second layer with sigmoid activation
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['encoder_h2']), biases['encoder_b2']))
    return layer_2

And now the decoder. You can see that the decoder mirrors the encoder: the encoder's first layer corresponds to the decoder's second layer, and vice versa, with the weight shapes reversed.

# Building the decoder
def decoder(x):
    # Decoder first layer with sigmoid activation
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['decoder_h1']), biases['decoder_b1']))
    # Decoder second layer with sigmoid activation
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['decoder_h2']), biases['decoder_b2']))
    return layer_2

Let's construct our model. The variable cost holds the loss function, and the variable optimizer holds the gradient-based update operation used for backpropagation.
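A minimal sketch of that construction, assuming a mean-squared-error reconstruction loss and the RMSProp optimizer (both standard choices for this TensorFlow 1.x setup, but assumptions here since this step is not shown above):

# Construct the model
encoder_op = encoder(X)
decoder_op = decoder(encoder_op)

# Prediction is the reconstruction; the target is the original input
y_pred = decoder_op
y_true = X

# Mean squared error between the input and its reconstruction
cost = tf.reduce_mean(tf.pow(y_true - y_pred, 2))
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)

# Initializer for all variables, run when launching the graph below
init = tf.global_variables_initializer()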

# Launch the graph
# Using InteractiveSession (more convenient while using Notebooks)
sess = tf.InteractiveSession()
sess.run(init)

total_batch = int(mnist.train.num_examples / batch_size)
# Training cycle
for epoch in range(training_epochs):
    # Loop over all batches
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Run optimization op (backprop) and cost op (to get loss value)
        _, c = sess.run([optimizer, cost], feed_dict={X: batch_xs})
    # Display logs per epoch step
    if epoch % display_step == 0:
        print("Epoch:", '%04d' % (epoch + 1),
              "cost=", "{:.9f}".format(c))

print("Optimization Finished!")

Output:

Epoch: 0001 cost= 0.182728916
Epoch: 0002 cost= 0.150434598
Epoch: 0003 cost= 0.130958572
Epoch: 0004 cost= 0.125098571
Epoch: 0005 cost= 0.119374141
Epoch: 0006 cost= 0.116029739
Epoch: 0007 cost= 0.114480294
Epoch: 0008 cost= 0.110542893
Epoch: 0009 cost= 0.107315414
Epoch: 0010 cost= 0.103023507
Epoch: 0011 cost= 0.101529025
Epoch: 0012 cost= 0.097410828
Epoch: 0013 cost= 0.093311585
Epoch: 0014 cost= 0.093811013
Epoch: 0015 cost= 0.090760238
Epoch: 0016 cost= 0.089178301
Epoch: 0017 cost= 0.087290406
Epoch: 0018 cost= 0.085913278
Epoch: 0019 cost= 0.086014777
Epoch: 0020 cost= 0.084903874
Optimization Finished!

Above, we trained for 20 epochs.

Now, let's apply the encoder and decoder to our test set.

# Applying encode and decode over the test set
encode_decode = sess.run(
    y_pred, feed_dict={X: mnist.test.images[:examples_to_show]})

# Let's simply visualize our results!
# Compare original images with their reconstructions
f, a = plt.subplots(2, 10, figsize=(10, 2))
for i in range(examples_to_show):
    a[0][i].imshow(np.reshape(mnist.test.images[i], (28, 28)))
    a[1][i].imshow(np.reshape(encode_decode[i], (28, 28)))

Final Output:

As you can see, the reconstructions were successful, although they contain a little added noise compared to the originals.
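For a quick numeric check to go with the visual comparison, we can reuse the variables defined above and compute the mean squared reconstruction error over the displayed examples (this small calculation is an illustrative addition):

# Mean squared reconstruction error over the displayed test examples
mse = np.mean(np.square(mnist.test.images[:examples_to_show] - encode_decode))
print("Mean squared reconstruction error:", mse)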

Thank You for Reading

For more content and knowledge related to Deep Learning and Neural Networks, click here.
