Which One Should You Choose? GAN or VAE? Part-I

Shuo Li
5 min read · Jun 21, 2020


Both the Generative Adversarial Network (GAN) and the Variational Autoencoder (VAE) are popular models when it comes to generating images and sequences. Since GAN and VAE tackle similar tasks, we may face the challenge of choosing between them in specific application scenarios. In this article, I am going to provide some insight into how to make that decision.

In Part-I, I am going to discuss their similarities and dissimilarities. My discussion is based on the original GAN and VAE papers:

VAE: https://arxiv.org/abs/1312.6114

GAN: https://arxiv.org/abs/1406.2661

1. They are both generative models

There are two ways to define the term “generative model.” Heuristically, we can define generative models as models used to obtain synthetic data; a model can also be considered “generative” when its latent input variable has a probability distribution associated with it. Rigorously, generative models are algorithms that learn the posterior distribution P(Y|X) via Bayes’ rule (Figure 1), by modeling the likelihood P(X|Y) and the prior P(Y). To learn more about the comparison between generative and discriminative models, I refer the reader to an excellent article by Siwei Xu (https://towardsdatascience.com/generative-vs-2528de43a836).

Figure 1: Bayes’ Rule
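Written out, Bayes’ rule expresses the posterior in terms of the likelihood and the prior:

$$P(Y|X) = \frac{P(X|Y)\,P(Y)}{P(X)}$$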

People call both GAN and VAE “generative models.” By the heuristic definition, this makes sense: researchers use GAN and VAE to generate lifelike images, such as human faces, animals, or digits, and the inputs to the GAN generator and the VAE decoder both have distributions associated with them. By the rigorous definition, VAE explicitly learns the likelihood distribution p(x|z) through its loss function. GAN does not explicitly learn a likelihood distribution; instead, its generator learns to produce images that fool the discriminator. In other words, the GAN generator aims to minimize the difference between the real data distribution p_data(x) and the distribution of its generated samples, so GAN implicitly learns the data distribution. As a result, by both the heuristic and the rigorous definition, GAN and VAE are generative models.

Another similarity worth mentioning is that both GAN and VAE are unsupervised learning methods. In other words, they do not require labels to learn. This lovely property enables GAN and VAE to make use of the vast amounts of unlabeled data on the web.

2. How do GAN and VAE differ from each other?

GAN and VAE have different architectures and value/loss functions.

GAN has two components in its architecture, namely a generator and a discriminator. GAN learns through the so-called “min-max two-player game”: heuristically, the generator produces fake images that try to fool the discriminator, while the discriminator tries to distinguish fake images from real ones. The GAN value function is as follows:

Figure 2: GAN Value Function
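For reference, the value function from the original GAN paper is:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$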

where x is a real image whose distribution is p_data(x), and z is a random vector (or hidden state) drawn from a predefined distribution p_z(z). D(x) denotes the discriminator, which outputs its confidence that the input is a real image; G(z) denotes the generator, which outputs generated images.

The first term in the GAN value function grows when D(x) assigns higher confidence to real images; in other words, we are happy if the discriminator confidently classifies real images as “real.” The second term grows when the discriminator is less confident that images generated by G(z) are “real,” and it shrinks when G(z) outputs more realistic images that make the discriminator more confident in classifying them as “real.” The “min-max two-player game” refers to this competition: the discriminator maximizes the value function while the generator minimizes it.

Figure 3: GAN Architecture
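To make the min-max game concrete, here is a minimal training-step sketch in PyTorch. This is not code from the original paper; the MLP architectures, layer sizes, and hyperparameters are illustrative assumptions:

```python
import torch
import torch.nn as nn

# A minimal sketch of one GAN training step, assuming flattened
# 28x28 images (e.g. MNIST); all sizes here are illustrative.
latent_dim, img_dim = 64, 28 * 28

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real):                    # real: (batch, img_dim)
    batch = real.size(0)
    z = torch.randn(batch, latent_dim)   # z ~ p_z(z)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z)))
    fake = G(z).detach()                 # detach: don't update G here
    loss_d = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(fake), torch.zeros(batch, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: maximize log D(G(z))
    loss_g = bce(D(G(z)), torch.ones(batch, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

Note that the generator step uses the non-saturating loss (maximizing log D(G(z)) rather than minimizing log(1 − D(G(z)))), which the original paper recommends for stronger gradients early in training.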

VAE also has two components in its architecture, namely an encoder and a decoder. The encoder encodes input data into hidden states; in other words, it projects the input data onto a lower-dimensional vector, each element of which has its own distribution. A vector is then sampled from these distributions to obtain a specific hidden state as input to the decoder. Finally, the decoder tries to reconstruct the encoder’s input from this hidden state. Note that, unlike GAN, whose hidden-state distribution is predefined, VAE learns the hidden-state distribution q(z|x) during training, while the prior p(z) is assumed to be standard normal.

The VAE value function is as follows:

Figure 4: VAE Value Function
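For reference, the per-sample value function from the original VAE paper, the evidence lower bound (ELBO), is:

$$\mathcal{L}(\theta, \phi; x) = -D_{KL}\big(q_\phi(z|x)\,\|\,p(z)\big) + \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big]$$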

The first term is the negative KL divergence between the approximate posterior q_{phi}(z|x) and an assumed standard normal prior N(0, 1); for Gaussian distributions it can be computed in closed form. The second term is the expected log-likelihood, which can be formulated as the (negative) reconstruction error between the encoder input and the decoder output. Optimizers maximize this value function to find the optimal encoder and decoder parameters.

Figure 5: VAE Architecture
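And here is a matching VAE sketch in PyTorch, showing the reparameterization trick and the two ELBO terms. Again, the layer sizes and the Bernoulli (binary cross-entropy) decoder are my assumptions for illustration, not details from the original paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A minimal VAE sketch, assuming flattened 28x28 inputs with pixel
# values in [0, 1] (e.g. MNIST); all sizes are illustrative.
class VAE(nn.Module):
    def __init__(self, img_dim=28 * 28, latent_dim=20):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)      # mean of q(z|x)
        self.logvar = nn.Linear(256, latent_dim)  # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, img_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term: negative log-likelihood of a Bernoulli decoder
    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')
    # Closed-form KL divergence between q(z|x) and N(0, I)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl   # minimizing this maximizes the ELBO
```

Because the KL term has a closed form for two Gaussians, only the reconstruction term requires sampling, and the reparameterization trick keeps that sampling differentiable.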

In conclusion, VAE and GAN learn through different value/loss functions. VAE builds on a probabilistic graphical model and learns by finding a good approximate posterior q(z|x) and likelihood p(x|z). To generate images, VAE samples a hidden state from the prior p(z) and feeds it into the decoder. GAN directly searches for a good image generator through the “min-max two-player game.” To generate images, GAN samples a hidden state from a predefined distribution and feeds it into the generator.
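To illustrate how similar the generation step looks in both cases, here is a short usage sketch reusing the hypothetical G and VAE from the earlier sketches:

```python
import torch

# Assumes G and VAE from the sketches above are defined and trained.
vae = VAE()
z_vae = torch.randn(16, 20)    # z ~ N(0, I), the assumed VAE prior
vae_images = vae.dec(z_vae)    # VAE: feed the sampled state to the decoder

z_gan = torch.randn(16, 64)    # z ~ p_z(z), GAN's predefined prior
gan_images = G(z_gan)          # GAN: feed the sampled state to the generator
```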

3. Summary

So far, Part-I has briefly discussed the similarities and dissimilarities between GAN and VAE. In the second part, I will discuss how GAN and VAE differ in terms of application, through examples.
