A Gentle Introduction to Variational Autoencoders

Aditya Mittal
Nov 25, 2020 · 6 min read
A representation of a variational autoencoder. (Image Source)

Imagine this: you've spent forever perusing the internet for images, and you've finally found the perfect one to put in your presentation. You save the image and move it into your presentation when you realize: the image has a watermark! You angrily pick up your water bottle to throw at the computer, and then you remember a program you built a while back in your AI class: the perfect way to remove watermarks, using autoencoders.

Well, that's a bit of an understatement of what autoencoders can do, but an important use nonetheless! Autoencoders are used for a wide variety of tasks, from dimensionality reduction to image generation to feature extraction. They let you replicate the works of Picasso, scale down terabytes of data, and denoise grainy images from security cameras. Let's first look at how to build general autoencoders, and then we'll talk about variational autoencoders.

The Basics of Autoencoders

Simplistic representation of autoencoders. (Image Source)

Above is the simplest representation of an autoencoder. It consists of three major parts: the encoder, the bottleneck, and the decoder.

The encoder is how the model learns to reduce the input data, compressing it into an encoded representation that can later be used to reconstruct the image. The encoder generally takes the form of a simple CNN with convolution and dropout layers. When coding an encoder, I find that a Leaky ReLU activation function works better than a plain ReLU. A sample encoder that takes in a 28x28 image, returns a bottleneck layer of size 8, and uses Leaky ReLU activations can be seen below:
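Here is a minimal sketch of such an encoder, assuming tf.keras; the filter counts and dropout rate are illustrative choices rather than prescribed values:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_encoder(latent_dim=8):
    """Compress a 28x28 grayscale image into a bottleneck vector of size `latent_dim`."""
    inputs = tf.keras.Input(shape=(28, 28, 1))
    x = layers.Conv2D(32, 3, strides=2, padding="same")(inputs)  # 28x28 -> 14x14
    x = layers.LeakyReLU()(x)
    x = layers.Dropout(0.25)(x)
    x = layers.Conv2D(64, 3, strides=2, padding="same")(x)       # 14x14 -> 7x7
    x = layers.LeakyReLU()(x)
    x = layers.Dropout(0.25)(x)
    x = layers.Flatten()(x)
    bottleneck = layers.Dense(latent_dim)(x)                     # bottleneck of size 8
    return tf.keras.Model(inputs, bottleneck, name="encoder")
```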

The bottleneck, the area between the encoder and the decoder, is the compressed form of the input data. The data is encoded in an n-dimensional latent space, where n is the number of outputs in the bottleneck. It is important to remember that n is a hyperparameter that you set: the larger n is, the more closely the bottleneck can represent the original image, but the more storage its representation requires. Bottlenecks can be used for feature extraction and image compression, since the original image is compressed into far fewer dimensions and therefore needs less storage.

The decoder takes this compressed input and tries to rebuild the original image from the encoded representation. The decoder once again takes the form of a simple CNN with convolution and dropout layers. The model is trained by comparing the original image to the reconstructed image; the difference between them is the reconstruction loss, which is minimized as the network is updated. A sample decoder that takes in a bottleneck layer of 8 inputs, returns a 28x28 image, and uses Leaky ReLU activations can be seen below:
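Again a minimal sketch, assuming tf.keras and mirroring the encoder above; the exact layer sizes are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_decoder(latent_dim=8):
    """Reconstruct a 28x28 grayscale image from a bottleneck vector of size `latent_dim`."""
    inputs = tf.keras.Input(shape=(latent_dim,))
    x = layers.Dense(7 * 7 * 64)(inputs)
    x = layers.LeakyReLU()(x)
    x = layers.Reshape((7, 7, 64))(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same")(x)  # 7x7 -> 14x14
    x = layers.LeakyReLU()(x)
    x = layers.Dropout(0.25)(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same")(x)  # 14x14 -> 28x28
    x = layers.LeakyReLU()(x)
    outputs = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs, name="decoder")
```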

The figure below shows how watermarks and noise can be removed using autoencoders. Instead of computing the reconstruction loss between the input image and the decoded image, we compute it between the clean (noise-free) image and the decoded image.

Representation of a denoising autoencoder. (Image Source)
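As a rough sketch of this setup, reusing the build_encoder and build_decoder sketches above and assuming MNIST images scaled to [0, 1] with Gaussian noise added, the only change is that the noisy image is the input while the clean image is the training target:

```python
import numpy as np
import tensorflow as tf

# Load MNIST and scale to [0, 1]; add Gaussian noise to create corrupted inputs.
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_clean = x_train.astype("float32")[..., None] / 255.0
x_noisy = np.clip(x_clean + 0.3 * np.random.normal(size=x_clean.shape), 0.0, 1.0)

# Chain the encoder and decoder sketches into a single autoencoder.
denoiser = tf.keras.Sequential([build_encoder(8), build_decoder(8)])
denoiser.compile(optimizer="adam", loss="mse")

# The loss compares the reconstruction against the *clean* image, not the noisy input.
denoiser.fit(x_noisy, x_clean, epochs=5, batch_size=128)
```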

Variational Autoencoders

Note: Variational autoencoders are slightly more complex than general autoencoders, and require knowledge of concepts such as normal distributions, sampling, and some linear algebra.

Simplistic representation of a variational autoencoder. (Image Source)

Variational autoencoders build on the concept of general autoencoders, but instead of the decoder taking in the bottleneck vector directly, it now takes in a sample drawn from the bottleneck distribution. This helps prevent overfitting, since the decoder is not trained on a single fixed point of the bottleneck vector for each input. To further reduce overfitting, the distribution being sampled from is pushed toward a standard normal distribution, N(0, 1).

Simply put, a variational autoencoder is an autoencoder whose training is regularized to avoid overfitting and to ensure that the latent space has the properties needed for the generative process. It samples points from the latent distribution produced by the encoder and passes those samples as inputs to the decoder.

On a deeper level, the encoded vector is split into two vectors: a mean vector and a standard deviation vector. These are the vectors that backpropagation is run on to update the weights of both the encoder and the decoder.
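As a minimal sketch (assuming tf.keras), this sampling step is usually implemented with the reparameterization trick: in practice the encoder outputs a log-variance vector rather than the standard deviation directly (the convention assumed here), and the sample is formed as mean + std * epsilon with epsilon drawn from N(0, 1), which keeps the operation differentiable so backpropagation can flow through it.

```python
import tensorflow as tf
from tensorflow.keras import layers

class Sampling(layers.Layer):
    """Draw z = mean + std * epsilon, with epsilon sampled from N(0, 1)."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(z_mean))   # noise from N(0, 1)
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon    # differentiable in z_mean, z_log_var

# Usage inside a model: z = Sampling()([z_mean, z_log_var])
```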

You may be wondering: does the loss function used to train the network remain the same as for a general autoencoder?

Not quite. Although the reconstruction loss is still part of the loss, another term is added to it: a regularization loss (the Kullback-Leibler divergence, or KL divergence), which pushes the distribution returned by the encoder (defined by the mean vector and the standard deviation vector) to be close to a standard normal distribution. Assuming a decoder d and a sampled point z, the loss function is as follows:

Formula for loss function of variational autoencoder. (Image Source)
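Written out in the standard form (with input x and the mean and standard deviation vectors μ_x and σ_x produced by the encoder), the loss combines a reconstruction term and a KL regularization term:

```latex
\text{loss} \;=\; \underbrace{\lVert x - d(z) \rVert^{2}}_{\text{reconstruction}}
\;+\; \underbrace{\mathrm{KL}\big(\, \mathcal{N}(\mu_x, \sigma_x) \,\Vert\, \mathcal{N}(0, I) \,\big)}_{\text{regularization}}
```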

In less mathematical terms, the regularity we want from the latent space can be described with two properties: continuity and completeness, which training encourages the latent space to satisfy. Continuity means that two nearby points in the latent space should not decode to completely different content. Completeness means that any point sampled from the latent space should decode to meaningful content.

Project Idea

Now that you've learned the theory behind variational autoencoders, it's time to put it to the test by coding one up yourself. Your first project will be generating digits that resemble those from the MNIST dataset using TensorFlow. The final code can be seen here:
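As a rough outline (not the full project code), a TensorFlow/Keras VAE for MNIST might look like the sketch below; the architecture, the unweighted KL term, and the training settings are illustrative assumptions, and TensorFlow 2.x is assumed:

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 8

# Encoder: 28x28 image -> mean and log-variance vectors of size `latent_dim`.
enc_in = tf.keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, strides=2, padding="same")(enc_in)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(64, 3, strides=2, padding="same")(x)
x = layers.LeakyReLU()(x)
x = layers.Flatten()(x)
z_mean = layers.Dense(latent_dim)(x)
z_log_var = layers.Dense(latent_dim)(x)
encoder = tf.keras.Model(enc_in, [z_mean, z_log_var], name="encoder")

# Decoder: latent sample of size `latent_dim` -> 28x28 image.
dec_in = tf.keras.Input(shape=(latent_dim,))
x = layers.Dense(7 * 7 * 64)(dec_in)
x = layers.LeakyReLU()(x)
x = layers.Reshape((7, 7, 64))(x)
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same")(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2DTranspose(32, 3, strides=2, padding="same")(x)
x = layers.LeakyReLU()(x)
dec_out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
decoder = tf.keras.Model(dec_in, dec_out, name="decoder")

class VAE(tf.keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder, self.decoder = encoder, decoder

    def train_step(self, data):
        with tf.GradientTape() as tape:
            z_mean, z_log_var = self.encoder(data)
            eps = tf.random.normal(shape=tf.shape(z_mean))
            z = z_mean + tf.exp(0.5 * z_log_var) * eps  # reparameterization trick
            reconstruction = self.decoder(z)
            # Reconstruction loss + KL regularization toward N(0, I).
            recon = tf.reduce_mean(tf.reduce_sum(
                tf.square(data - reconstruction), axis=[1, 2, 3]))
            kl = -0.5 * tf.reduce_mean(tf.reduce_sum(
                1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
            loss = recon + kl
        grads = tape.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        return {"loss": loss, "reconstruction_loss": recon, "kl_loss": kl}

# Train on MNIST digits scaled to [0, 1].
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32")[..., None] / 255.0

vae = VAE(encoder, decoder)
vae.compile(optimizer="adam")
vae.fit(x_train, epochs=10, batch_size=128)

# Generate new digit-like images by decoding random points drawn from N(0, 1).
generated = decoder(tf.random.normal((16, latent_dim)))
```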

Good luck, and I hope this project shows you the incredible power of variational autoencoders! From their applications in film to security, variational autoencoders will undoubtedly be a driving force in AI in the future.

TL;DR

  • Autoencoders serve a variety of functions, from removing noise to generating images to compressing images.
  • General autoencoders consist of three parts: an encoder, a bottleneck, and a decoder. The bottleneck is the compressed, n-dimensional form of your image, where n is the number of outputs in the bottleneck layer.
  • General autoencoders are trained using a reconstruction loss, which measures the difference between the reconstructed and original image.
  • Variational autoencoders are mostly the same, but they sample the bottleneck vector from a distribution regularized toward a normal distribution, which reduces overfitting.

Further Reading

Note: These are listed in order from easiest to hardest to understand. I recommend starting with the resource at the top!

If you want to talk more about hyperloops or anything else, schedule a meeting: Calendly! For information about projects that I am currently working on, consider subscribing to my newsletter! Here’s the link to subscribe. If you’re interested in connecting, follow me on Linkedin, Github, and Medium.
