VAE v/s GAN — A case study

7 min read · Jun 15, 2023

Deep learning is a field that focuses on creating neural networks with multiple layers to uncover hidden patterns and representations in data. It encompasses various learning approaches, such as supervised, unsupervised, and reinforcement learning.
Deep learning builds complex models and optimizes them, exploring the interactions within a network's hidden layers. This raises topics such as hyperparameter tuning, regularization, and generative modeling. Generative modeling, the core idea behind deep generative models (DGMs), involves representing probability distributions over variables so that new data can be sampled from the learned distribution.
DGMs find applications in tasks like pattern completion, image generation, classification, and unsupervised learning. Two notable DGMs are Variational Autoencoders and Generative Adversarial Networks, each with its own strengths and purposes. While this summary provides an overview of the topic, a deeper understanding of neural networks, optimization algorithms, error functions, and regularization is necessary to grasp the technical aspects.
The dimensionality of the input data, whether images or real-valued vectors, matters for training, because the network must encode the input into a suitable representation and decode it back. Both families of networks are generative and are trained by optimizing loss functions that measure their errors. This summary sets the stage for a detailed exploration of each subtopic.

Autoencoders and Variational Autoencoders

An autoencoder is a neural network that learns to copy its input to its output. It consists of an encoder that compresses the input into a latent space representation and a decoder that reconstructs the input from the latent space.
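To make the encoder, latent space, and decoder concrete, here is a minimal sketch of a linear autoencoder in plain Python. The weights are hand-picked rather than learned (an assumption for brevity), and the names `encode`, `decode`, and `DIRECTION` are hypothetical: the encoder projects a 2-D point onto the unit direction (1, 2)/√5 to get a 1-D latent code, and the decoder maps that code back along the same direction.

```python
import math

# Hypothetical, hand-picked weights for illustration; a real autoencoder learns
# them from data. The unit vector (1, 2) / sqrt(5) spans the line y = 2x.
NORM = math.sqrt(5.0)
DIRECTION = (1.0 / NORM, 2.0 / NORM)


def encode(x):
    """Compress a 2-D input into a 1-D latent code (projection onto DIRECTION)."""
    return x[0] * DIRECTION[0] + x[1] * DIRECTION[1]


def decode(z):
    """Map a 1-D latent code back to a 2-D point on the line y = 2x."""
    return (z * DIRECTION[0], z * DIRECTION[1])


def reconstruction_error(x):
    """Squared Euclidean distance between an input and its reconstruction."""
    x_hat = decode(encode(x))
    return (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2


print(reconstruction_error((1.0, 2.0)))   # close to 0: the point lies on y = 2x
print(reconstruction_error((2.0, -1.0)))  # 5.0: its latent code is 0
```

The 1-D code is the bottleneck: whatever the encoder discards (here, the component orthogonal to the line) cannot be recovered by the decoder.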
Variational autoencoders (VAEs) are a type of autoencoder that, like ordinary autoencoders, learn a compressed latent representation, but are built to support generation as well. Unlike traditional autoencoders, VAEs encode each input as a distribution over the latent space rather than as a single point. Their training loss combines a reconstruction error with a regularization term. VAEs address the disorganized, irregular latent space of traditional autoencoders by pairing a generative model with a recognition model: the recognition model (the encoder) approximates the posterior distribution over latent variables, while the generative model (the decoder) reconstructs the input from samples drawn from the latent space.

VAEs use regularization to keep the latent space well organized, even with noisy input. For each input, the encoder outputs the mean and variance of a normal distribution over the latent space. A Kullback-Leibler (KL) divergence term penalizes how far this distribution is from a prior, usually the standard normal distribution, which pulls the encoded distributions toward one another so that they overlap. This discourages the network from learning very narrow, point-like distributions and helps it infer the latent attributes needed to reconstruct the input.
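For the common choice of a diagonal Gaussian encoder N(μ, diag(σ²)) and a standard normal prior N(0, I), the KL term has a closed form, KL = ½ Σ_j (μ_j² + σ_j² - 1 - log σ_j²). The sketch below assumes the encoder outputs μ and log σ² (a widespread convention, not something stated in this article) and pairs the KL term with a squared-error reconstruction term; other reconstruction losses, such as binary cross-entropy, are equally common. The function names are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    Closed form: 0.5 * sum(mu**2 + exp(log_var) - 1 - log_var).
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def vae_loss(x, x_hat, mu, log_var):
    """Reconstruction error (sum of squared errors) plus the KL regularizer."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction + kl_to_standard_normal(mu, log_var)


# An encoding that matches the prior exactly costs nothing...
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
# ...while shifting the mean or shrinking the variance is penalized.
print(kl_to_standard_normal([1.0], [0.0]))  # 0.5
print(round(kl_to_standard_normal([0.0], [-4.0]), 3))  # 1.509
print(vae_loss([1.0, 0.0], [0.5, 0.0], [0.0], [0.0]))  # 0.25
```

The third call shows why narrow distributions are discouraged: shrinking the variance toward zero (very negative log σ²) makes the penalty grow.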
The reparameterization trick is used to sample from the latent distribution: a sample is written as the mean plus the standard deviation times noise drawn from a standard normal, so the distribution's parameters can be optimized by gradient descent while sampling stays random. To generate new data, the decoder takes samples from the prior distribution in the latent space. This makes VAEs a powerful tool for both generative modeling and data reconstruction.
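The reparameterization trick writes a sample as z = μ + σ ⊙ ε with ε ~ N(0, I). All the randomness lives in ε, so z is a deterministic, differentiable function of μ and σ, and gradients can flow back into the encoder. A minimal sketch in plain Python, assuming the encoder outputs log σ²; the `decoder` argument of `generate` is a hypothetical stand-in for a trained decoder network, not a specific library's API.

```python
import math
import random


def reparameterize(mu, log_var, eps):
    """z = mu + sigma * eps, with sigma = exp(0.5 * log_var).

    eps is passed in explicitly so the function itself is deterministic:
    dz/dmu = 1 and dz/dlog_var = 0.5 * sigma * eps, which is what lets
    backpropagation reach the encoder's outputs.
    """
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]


def sample_latent(mu, log_var, rng=random):
    """Draw eps ~ N(0, I), then apply the trick."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return reparameterize(mu, log_var, eps)


def generate(decoder, latent_dim, rng=random):
    """Generate new data: sample z from the prior N(0, I) and decode it.

    `decoder` is a hypothetical stand-in for a trained decoder network.
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return decoder(z)


print(reparameterize([1.0, 2.0], [0.0, 0.0], [0.5, -1.0]))  # [1.5, 1.0]
```

During training, `sample_latent` feeds the decoder; at generation time the encoder is dropped and `generate` samples directly from the prior.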

Applications of Variational Autoencoders

The variational autoencoder (VAE) approach is valued for its elegance, theoretical grounding, and ease of implementation, and it is one of the leading approaches to generative modeling. A known limitation, however, is that samples from VAEs trained on images tend to be blurry, and the causes of this blurriness are not yet fully understood.

Nevertheless, the VAE framework can be extended to many model architectures, which makes it applicable to a wide range of probabilistic models. A notable advantage is that training the encoder jointly with the generator network forces the model to learn a predictable coordinate system that the encoder can capture, which makes VAEs effective for manifold learning.

Traditional autoencoders are used for image denoising, dimensionality reduction, and data compression, whereas VAEs are also used to generate images and time series data. They are particularly useful for anomaly detection: a VAE trained on normal data reconstructs inputs unlike its training data poorly, so a high reconstruction error can flag an anomaly.
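In practice, reconstruction-based anomaly detection comes down to a threshold: fit it on reconstruction errors from normal data, then flag inputs that exceed it. In the sketch below, a fixed projection onto the line y = 2x stands in for a trained VAE's encode-decode round trip (an assumption so the example runs without a model). The rule "largest error on normal data times a margin" and the helper names `fit_threshold` and `is_anomaly` are hypothetical; a high percentile of the normal errors is an equally common choice.

```python
import math

# Hypothetical stand-in for a trained VAE's encode -> decode round trip:
# it reconstructs points near the line y = 2x well and everything else poorly.
NORM = math.sqrt(5.0)
U = (1.0 / NORM, 2.0 / NORM)


def reconstruct(x):
    z = x[0] * U[0] + x[1] * U[1]
    return (z * U[0], z * U[1])


def reconstruction_error(x):
    x_hat = reconstruct(x)
    return (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2


def fit_threshold(normal_data, margin=1.5):
    """Largest error observed on normal data, times a safety margin (assumed rule)."""
    return margin * max(reconstruction_error(x) for x in normal_data)


def is_anomaly(x, threshold):
    return reconstruction_error(x) > threshold


# "Normal" data: points slightly off the line y = 2x.
normal = [(t, 2.0 * t + 0.05) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
threshold = fit_threshold(normal)
print(is_anomaly((0.3, 0.62), threshold))  # False: close to the line
print(is_anomaly((2.0, -1.0), threshold))  # True: far from the line
```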

Generative Adversarial Networks

A generative adversarial network (GAN) is a generative modeling approach based on differentiable generator networks. It trains two models simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G. G is trained to maximize the probability of D making a mistake, which pushes it to produce more realistic samples.
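In the original paper [4], this setup is formalized as a two-player minimax game over a value function V(D, G), where p_data is the data distribution and p_z is a prior on the generator's input noise z:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

D maximizes V by assigning high probability to real samples and low probability to generated ones, while G minimizes it by making D(G(z)) large, which is what "maximizing the probability of D making a mistake" means formally.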
GANs were introduced in 2014 and have attracted significant attention in machine learning. They can learn to mimic many kinds of data distributions, which makes them capable of generating realistic images, music, speech, and text. This versatility also poses risks, since GANs can be used to create deceptive media such as deepfakes.

The GAN framework involves the generative network competing against an adversary, represented by the discriminative model. The generative model strives to produce fake samples indistinguishable from the real data, while the discriminative model aims to detect the fakes. This competitive game drives both models to improve until the generated samples closely resemble the real data.
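The competition can be read off the losses each network minimizes. A minimal sketch, assuming the discriminator outputs probabilities in (0, 1): the discriminator minimizes binary cross-entropy on real (label 1) and fake (label 0) samples, which is the negative of the GAN value function in [4], E[log D(x)] + E[log(1 - D(G(z)))], and the generator minimizes E[log(1 - D(G(z)))]. The function names are hypothetical. When the generator matches the data distribution, the best the discriminator can do is output 1/2 everywhere, and its loss settles at 2 log 2.

```python
import math


def discriminator_loss(d_real, d_fake):
    """-(mean log D(x) + mean log(1 - D(G(z)))), i.e. -V(D, G) on a batch.

    d_real: discriminator outputs on real samples; d_fake: on generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake):
    """Minimax generator objective: mean log(1 - D(G(z))), to be minimized."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


# A discriminator that confidently separates real from fake has a low loss...
print(round(discriminator_loss([0.9, 0.95], [0.05, 0.1]), 3))  # 0.157
# ...while at equilibrium (D = 1/2 everywhere) its loss is 2 log 2.
print(round(discriminator_loss([0.5], [0.5]), 3))  # 1.386
```

Fooling the discriminator (pushing D(G(z)) toward 1) lowers `generator_loss`, while catching fakes lowers `discriminator_loss`; each network's gain is the other's loss.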
GANs are flexible: they work with many learning algorithms and optimization techniques across different types of models.
Unlike many other generative models, GANs are not trained by directly maximizing likelihood; the training algorithm relies only on the generator's ability to produce samples, without defining an explicit density function. GANs can be analyzed using game theory and are called “adversarial” because of the competition between the generator and the discriminator. They can also be viewed as cooperative, however, since the discriminator's feedback tells the generator how to improve.
Balancing the strengths of the generator and discriminator is crucial in GAN training. If the discriminator becomes too effective, the gradients it passes to the generator can vanish, and the generator struggles to learn. Conversely, if the generator becomes too strong, it can exploit weaknesses in the discriminator, which then accepts fakes as real (false negatives). Tuning the relative learning rates of the two networks can help mitigate these issues.
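Why an overly effective discriminator starves the generator can be seen from the gradients. Write D(G(z)) = sigmoid(a), where a is the discriminator's logit on a generated sample. The minimax generator loss log(1 - sigmoid(a)) has derivative -sigmoid(a) with respect to a, which vanishes when the discriminator confidently rejects the fake (a very negative). Goodfellow's tutorial [3] discusses the common "non-saturating" alternative, minimizing -log sigmoid(a), whose derivative -(1 - sigmoid(a)) stays near -1 in exactly that regime. The sketch below compares the two; the helper names are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def saturating_grad(a):
    """d/da of log(1 - sigmoid(a)): the minimax generator loss."""
    return -sigmoid(a)


def non_saturating_grad(a):
    """d/da of -log(sigmoid(a)): the non-saturating generator loss."""
    return -(1.0 - sigmoid(a))


# a is the discriminator's logit on a generated sample; a very negative value
# means the discriminator is confident the sample is fake.
for a in (-10.0, -2.0, 0.0):
    print(f"logit {a:6.1f}: saturating {saturating_grad(a):.5f}, "
          f"non-saturating {non_saturating_grad(a):.5f}")
```

At a = -10 the minimax gradient is on the order of 1e-5, so the generator barely moves, while the non-saturating gradient is close to -1.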
Although initially proposed for unsupervised learning, GANs have found applications in semi-supervised learning, fully supervised learning, and reinforcement learning. Various GAN variants have been developed, including DCGAN, SRGAN, VAE-GAN, WGAN, CycleGAN, and StyleGAN, each with its own modifications and applications.

Applications of Generative Adversarial Networks

The use of generative adversarial networks (GANs) is expanding rapidly across many fields. GANs can enhance image quality by transforming low-resolution images into high-resolution ones. In cybersecurity, GANs can be trained to help detect fraudulent activity and to make deep learning models more robust by recognizing malicious perturbations that attackers add to images.

In healthcare, GANs can aid drug research by generating candidate molecular structures for targeting and treating diseases. GANs are also used in art and fashion, where they can turn photographs into paintings, generate contemporary art, and assist independent artists with limited resources.

Additionally, GANs contribute to scientific advancements, such as improving astronomical imaging through gravitational lensing and enhancing video game models by upscaling low-resolution textures to higher resolutions.

Similarities between VAE and GAN

Variational autoencoders (VAEs) and generative adversarial networks (GANs) are two popular deep generative modeling paradigms. Both use neural networks to map samples from a simple latent distribution to data: the VAE's decoder plays a role similar to the GAN's generator, although a standard GAN has no encoder.
In a study by Wu and Goodman, the ability of multimodal deep generative models to produce visual content was examined. VAE and GAN models were trained on labeled image datasets: MNIST, FashionMNIST, CIFAR10, and CelebA.
The visual assessment of the generated images showed that for simpler images like MNIST, both VAE and GAN produced similar results in terms of quality. However, for more complex images, the models yielded different outcomes.
Objective evaluation with the Fréchet Inception Distance (FID) showed the GAN performing better than the VAE on MNIST, with the opposite result on the FashionMNIST, CIFAR10, and CelebA datasets. According to Wu and Goodman, the higher FID scores of the VAE images reflected their blurriness, which supports FID as a faithful measure of image quality.
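FID compares the statistics of feature vectors extracted from real and generated images. In the standard metric the features come from an Inception-v3 network and are modeled as multivariate Gaussians, giving FID = ||μ_r - μ_g||² + Tr(Σ_r + Σ_g - 2(Σ_r Σ_g)^{1/2}); lower is better, and identical statistics give 0. The sketch below is a deliberately simplified one-dimensional version (an assumption for illustration, not a drop-in replacement for the real metric): with scalar features the formula reduces to (μ_r - μ_g)² + (σ_r - σ_g)². The function name `fid_1d` is hypothetical.

```python
import statistics


def fid_1d(real_features, generated_features):
    """Fréchet distance between 1-D Gaussians fitted to two feature samples.

    For scalar features the FID formula reduces to
    (mu_r - mu_g)**2 + (sigma_r - sigma_g)**2.
    """
    mu_r = statistics.fmean(real_features)
    mu_g = statistics.fmean(generated_features)
    sigma_r = statistics.pstdev(real_features)
    sigma_g = statistics.pstdev(generated_features)
    return (mu_r - mu_g) ** 2 + (sigma_r - sigma_g) ** 2


real = [0.0, 1.0, 2.0, 3.0]
print(fid_1d(real, real))                             # 0.0
print(fid_1d(real, [1.0, 2.0, 3.0, 4.0]))             # 1.0: mean shifted by 1
print(round(fid_1d(real, [1.5, 1.5, 1.5, 1.5]), 3))   # 1.25: same mean, no spread
```

The last case mirrors the blurriness argument: samples that collapse toward an average lose the spread of the real data, and the variance term of FID penalizes that even when the means agree.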

Differences between VAE and GAN

References

[1] Hanna Peters, Norma Cueto Calis, “An empirical comparison of generative capabilities of GAN vs VAE”, KTH Royal Institute of Technology, School of Electrical Engineering and Computer Science.
[2] Diederik P. Kingma, Max Welling, “An Introduction to Variational Autoencoders”, Foundations and Trends in Machine Learning (2019).
[3] Ian Goodfellow, “NIPS 2016 Tutorial: Generative Adversarial Networks”.
[4] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, “Generative Adversarial Nets”, Département d’informatique et de recherche opérationnelle, Université de Montréal.
[5] Ian Goodfellow, Yoshua Bengio, Aaron Courville, “Deep Learning”, MIT Press.

Written by Aleema Parakatta