Variational Autoencoders: A generative model

Dhaval Taunk
Published in Analytics Vidhya · Apr 12, 2020
Image source: https://www.jeremyjordan.me/variational-autoencoders/

In my last article, I talked about the different types of autoencoders and their loss functions. In this article, I will be talking about variational autoencoders, which work as powerful generative models.

Again, what is the benefit of learning only to reproduce the input features instead of learning useful features? The benefit of variational autoencoders is that they act as a generative model, which means they can be used to generate new samples similar to the input with slight changes. For example, from an image of a human face, new faces can be generated with small changes in facial features. This is very useful in the data augmentation step that is used to overcome the problem of small datasets in machine learning.

Variational Autoencoders

Variational autoencoders differ from vanilla autoencoders in that their latent space is, by design, continuous, allowing easy random sampling and interpolation, which makes them helpful in data generation tasks.

They do this by not representing the encoding output as a single vector, but as two vectors: a vector of means, μ, and a vector of standard deviations, σ.

The input vector to the decoder is then sampled from the distribution defined by these two vectors, the means μ and the standard deviations σ. For any sample drawn from the encoder's output distribution, the decoder is expected to learn these features and generate outputs similar to the input images, with the required variations.
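To make this concrete, here is a minimal sketch of such an encoder and decoder in PyTorch. The layer sizes, the use of a log-variance output, and the reparameterization step are illustrative choices of mine, not details from a specific implementation:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=20):
        super().__init__()
        # Encoder maps the input to the parameters of a Gaussian:
        # a mean vector mu and a log-variance vector (log sigma^2)
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder maps a latent sample z back to the input space
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def reparameterize(self, mu, logvar):
        # Sample z = mu + sigma * eps with eps ~ N(0, I), so the sampling
        # step stays differentiable with respect to mu and sigma
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar
```

Once such a model is trained, new samples can be generated simply by feeding a z drawn from the prior into the decoder.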

The variational autoencoder thus works with probability distributions: the encoder output defines a distribution over the latent space, and training minimizes the difference between this approximate distribution and the true one using the KL divergence loss, which is a measure of the difference between two probability distributions.

Image source: https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html
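As a quick illustration of what the KL divergence measures, the snippet below computes it between two one-dimensional Gaussians with torch.distributions; the particular means and standard deviations are arbitrary examples:

```python
import torch
from torch.distributions import Normal, kl_divergence

p = Normal(loc=torch.tensor(0.0), scale=torch.tensor(1.0))  # standard normal
q = Normal(loc=torch.tensor(0.5), scale=torch.tensor(1.5))  # shifted, wider Gaussian

# KL(q || p) is non-negative and zero only when the two distributions match
print(kl_divergence(q, p))  # roughly 0.34 for these parameters
```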

The loss function

Let the input features be x and the latent encoding be z. Then, by Bayes' rule, the posterior distribution can be written as -

p(z|x) = (p(x|z) * p(z)) / p(x)

Since p(x) in the denominator is intractable, we introduce another probability distribution, say q(z|x), defined in such a way that it is similar to p(z|x). The difference between these two probability distributions can then be measured and minimized with the KL divergence loss -

min KL(q(z|x) || p(z|x))

Minimizing this KL divergence turns out to be equivalent to maximizing -

E[log p(x|z)] − KL(q(z|x) || p(z))
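In practice one minimizes the negative of this quantity. Assuming the Gaussian encoder sketched earlier and a standard normal prior p(z), the loss can be written as a reconstruction term plus a closed-form KL term, for example in PyTorch (the binary cross-entropy reconstruction assumes inputs scaled to [0, 1]):

```python
import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar):
    # Reconstruction term: approximates -E[log p(x|z)] with binary cross-entropy
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL(q(z|x) || p(z)) in closed form for a diagonal Gaussian against N(0, I)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # Minimizing recon + kl maximizes E[log p(x|z)] - KL(q(z|x) || p(z))
    return recon + kl
```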

That’s all from my side this time. You can read my previous article on autoencoders at the link below -

More about KL divergence can be read here —
