Ultimate Journey of GANs

Shashwat Tiwari · Published in Analytics Vidhya · 9 min read · Aug 11, 2019

“What I cannot create, I do not understand.”

— Richard Feynman

Hey Tech Geeks!

GANs were introduced by Ian Goodfellow and other researchers at the University of Montreal, including Yoshua Bengio, in 2014. Among the many researchers who refer to GANs, Facebook’s AI research director Yann LeCun called adversarial training “the most interesting idea in the last 10 years in ML.” GANs are considered creative neural networks, doing something much like drawing a painting or composing a new song. A Generative Adversarial Network consists of two neural network architectures that compete with each other in order to produce better output. GANs are gaining popularity due to their huge potential for learning the distribution of the data and later mimicking it.

In this blog I will give you a heads-up about GANs and their types. Along the way we will see that adversarial training is an enlightening idea, beautiful in its simplicity, that represents real conceptual progress for machine learning, and especially for generative models.

Let’s Fire it Up!

Generative Models

According to Wikipedia

“A generative model is a model of the conditional probability of the observable X, given a target y — symbolically, P(X | Y = y).”

In plain terms, a generative model can generate random instances — either of an observation and target jointly, or of observations x conditioned on a given target value y. On the other side of the coin, a discriminative model or discriminative classifier (without a model) can be used to “discriminate” the value of the target variable Y, given an observation x.

In order to train a generative model, we first collect a large amount of data in some domain (think millions of images, sentences, or sounds) and then train a model to generate data like it. The trick is that the neural networks we use as generative models have far fewer parameters than the amount of data we train them on, so the models are forced to internalize the essence of the data in order to generate it. The main objective of every generative model is to learn the distribution of the training data so it can generate new data points with some variations; in other words, the network’s model distribution learns to mimic the true data distribution.

A classic example of a generative model is the Naive Bayes algorithm, which works by summarizing the probability distribution of each input variable for each output class. To make a prediction, the probability of each possible outcome is calculated for each variable, the independent probabilities are combined, and the most likely outcome is predicted. When we reverse-engineer this process, the probability distribution for each variable can be sampled to generate new feature values, as the sketch below illustrates.
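
As an illustration, here is a tiny Gaussian Naive Bayes model “run in reverse”; the class names and distribution parameters are made up for the example:

```python
import numpy as np

# Per-class feature distributions a Gaussian Naive Bayes model would fit
# (two features per class; numbers are illustrative).
means = {"spam": np.array([4.0, 1.5]), "ham": np.array([1.0, 3.0])}
stds  = {"spam": np.array([0.5, 0.3]), "ham": np.array([0.4, 0.6])}

def generate(label, n=3):
    # Sample new feature vectors from the chosen class's distributions.
    return np.random.normal(means[label], stds[label], size=(n, 2))

print(generate("spam"))  # three synthetic "spam" feature vectors
```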

Generative Adversarial Networks

Talking about GANs: these are deep learning-based generative models. A standardized approach for building them, the deep convolutional GAN, was formalized by Alec Radford et al. in the paper “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”.

Let’s start this ultimate journey of GANs with an example: we have a set of images as input and generate new samples based on them as output. From this example you can see that GANs fall under the umbrella of unsupervised learning, owing to the fact that we feed the model data without any label or target variable. The idea behind GANs is simple and straightforward:

A GAN contains two neural networks that compete against each other in a zero-sum game framework: a generator and a discriminator.

Talking about the architecture, it has two submodels: the Generator, which generates images, and the Discriminator, which classifies images as real or fake. Let’s dive in for a more detailed explanation.

Generator

  • The generator takes a fixed-length random input vector and generates a new sample in the domain of interest.
  • Putting it differently, the generator aims to fool the discriminator into thinking it is seeing real images when it is actually seeing fakes. We can think of the generator as a counterfeiter.
  • More technically, the input vector is drawn randomly from a Gaussian distribution, and that sample is used to seed the generative process (see the sketch just after this list).
  • After training, points in this vector space correspond to points in the problem domain, forming a compressed representation of the data distribution.
(source: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf)
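
As a quick illustration of this seeding step, here is a minimal sketch; the latent size of 100 is an assumption (though it is also the value used in the DCGAN paper):

```python
import numpy as np

def sample_latent(batch_size, latent_dim=100):
    # Draw seed vectors from a standard Gaussian distribution.
    return np.random.normal(0.0, 1.0, size=(batch_size, latent_dim))

z = sample_latent(16)
print(z.shape)  # (16, 100)
```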

Discriminator

  • The goal of the discriminator is to classify the examples it receives as real or fake. The real examples come from the training set, while the generated samples are the output of the generator model.
  • If you are thinking that the discriminator behaves like a normal classification model distinguishing between real and fake images, you are right!
  • The discriminator takes both real images from the input dataset and fake images from the generator, and outputs a verdict on whether a given image is legit or not (a batch-assembly sketch follows this list).
  • After the training process, the discriminator model is usually discarded, as we are interested in the generator. In some cases, though, the discriminator is reused to extract features from examples in the problem domain.
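
To make this concrete, here is a minimal sketch of how one discriminator training batch might be assembled; the helper name and the label convention (1 = real, 0 = fake) are illustrative:

```python
import numpy as np

def make_discriminator_batch(real_images, fake_images):
    # Stack real and generated images into one batch and attach labels:
    # real images get label 1, generator output gets label 0.
    x = np.concatenate([real_images, fake_images], axis=0)
    y = np.concatenate([np.ones(len(real_images)),
                        np.zeros(len(fake_images))])
    return x, y
```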

Mechanism Behind GANs

The basic mechanism of GANs revolves around a minimax game between the two networks, which compete with each other in order to improve themselves at transforming random noise into a realistic-looking image.
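
Concretely, this is the minimax value function V(D, G) from the original GAN paper, which the discriminator D tries to maximize and the generator G tries to minimize:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]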

In the case of the discriminator, its job is to tell whether a particular image is real or fake: it must be confident about whether a single image came from the original dataset or from the generator. To achieve this, we label the images accordingly and perform classic backpropagation, allowing the discriminator to learn over time and get better at distinguishing images. If the discriminator classifies correctly, it receives positive feedback in the form of loss gradients; in case of failure, negative feedback is its reward!

The generator, in turn, takes random noise as input and tries to fool the discriminator into believing the resulting image is real. This information can be backpropagated again: if the discriminator identifies the generator’s output as genuine, the generator has done a good job and is rewarded; on the other hand, if the discriminator recognizes it was given a fake image, the generator is punished with negative feedback. A sketch of both loss functions follows.
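
As a concrete illustration of these reward signals, here is a minimal NumPy sketch of the two standard loss functions (the generator version shown is the non-saturating variant recommended in the original paper); the array names are illustrative:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # D wants to maximize log D(x) + log(1 - D(G(z))),
    # i.e. minimize the negative of that quantity.
    return -(np.log(d_real) + np.log(1.0 - d_fake)).mean()

def generator_loss(d_fake):
    # Non-saturating generator loss: maximize log D(G(z))
    # instead of minimizing log(1 - D(G(z))).
    return -np.log(d_fake).mean()

# d_real / d_fake are the discriminator's scores in (0, 1).
d_real = np.array([0.9, 0.8])   # scores on real images
d_fake = np.array([0.2, 0.3])   # scores on generated images
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```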

(source: https://medium.com/@jonathan_hui/gan-whats-generative-adversarial-networks-and-its-application-f39ed278ef09)

To keep this post focused and comprehensible, I will not derive the mathematics behind GANs here; please refer to this link for that.

Here is a point-to-point explanation of some popular GANs.

Deep Convolutional Generative Adversarial Networks (DCGAN)

  • A deep learning architecture capable of generating outputs that mimic the patterns of the data in the training set.
  • The main difference between a vanilla Generative Adversarial Network and a DCGAN is that the DCGAN replaces the fully connected layers of the model with convolutional layers.
  • Let’s start with a real-world analogy: a forger (a.k.a. “the generator”) tries to produce fake paintings and pass them off as real.
(source: https://www.freecodecamp.org/news/how-ai-can-learn-to-generate-pictures-of-cats-ba692cb6eae4/)
  • On the other side of the coin, the discriminator tries to catch the forger by using its knowledge of real paintings. As training goes on, the forger gets better and better at faking a real picture.
(source: https://www.freecodecamp.org/news/how-ai-can-learn-to-generate-pictures-of-cats-ba692cb6eae4/)
  • The generator takes a random noise vector and generates a picture. The generated image goes into the discriminator, which compares it against the real images of the training set.
  • The discriminator returns a number between 0 (fake image) and 1 (real image), which is nothing but a typical classification score.

Discriminator’s Story

Points to be Noted

  • Keep in mind that the discriminator in a DCGAN takes as input a real or fake image and outputs a score.
(source: https://www.freecodecamp.org/news/how-ai-can-learn-to-generate-pictures-of-cats-ba692cb6eae4/)
  • The number of filters should double with each convolutional layer.
  • Strided convolutional layers are recommended instead of pooling layers for downsampling.
  • Add batch normalization at every layer except the input layer, because it reduces internal covariate shift. Use Leaky ReLU as the activation function, because it helps avoid the vanishing gradient effect. (A minimal Keras sketch follows this list.)
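
Putting those guidelines together, here is a minimal Keras sketch of a DCGAN-style discriminator for 64x64 RGB images; the filter counts and kernel sizes are illustrative choices, not values from the paper:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_discriminator():
    # Strided convolutions downsample; filters double at each layer;
    # batch norm everywhere except after the input; LeakyReLU throughout.
    return keras.Sequential([
        keras.Input(shape=(64, 64, 3)),
        layers.Conv2D(64, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Conv2D(256, 4, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),  # 0 = fake, 1 = real
    ])
```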

Generator’s Story

Points to be Noted

(source: https://www.freecodecamp.org/news/how-ai-can-learn-to-generate-pictures-of-cats-ba692cb6eae4/)
  • The generator takes as input a random noise vector (z) and outputs a fake image, thanks to transposed convolutional layers.
  • Each transposed convolutional layer halves the number of filters while the image size doubles. The generator works best with the tanh activation function in its output layer. (A matching Keras sketch follows this list.)
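
A matching generator sketch, under the same assumptions (64x64 RGB output, latent size 100); note how the filter count halves while the spatial size doubles at each transposed convolution, with tanh on the output layer:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    return keras.Sequential([
        keras.Input(shape=(latent_dim,)),
        layers.Dense(8 * 8 * 256),
        layers.Reshape((8, 8, 256)),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(128, 4, strides=2, padding="same"),  # 16x16
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same"),   # 32x32
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                               activation="tanh"),                  # 64x64 RGB
    ])
```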

Conditional GANs (cGANs)

(source: https://arxiv.org/pdf/1411.1784.pdf)
  • GANs can be extended to a conditional setting if both the generator and the discriminator are conditioned on some extra information y, where y can be a class label or another kind of auxiliary flag.
  • The main idea behind cGANs is to train a Generative Adversarial Network with a condition. Conditioning is performed by feeding y into both the discriminator and the generator as an auxiliary input layer.
  • To help distinguish between real and fake data, the labels are also fed into the discriminator as input during the training process (see the sketch after this list).
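
Here is a minimal sketch of how the condition y can enter a generator in Keras: the label is embedded and concatenated with the noise vector. The layer sizes and the 28x28 output are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

latent_dim, num_classes = 100, 10

z = keras.Input(shape=(latent_dim,))            # random noise
y = keras.Input(shape=(1,), dtype="int32")      # class label / condition
y_emb = layers.Flatten()(layers.Embedding(num_classes, 50)(y))
h = layers.Concatenate()([z, y_emb])            # condition joins the noise
h = layers.Dense(128, activation="relu")(h)
out = layers.Dense(28 * 28, activation="tanh")(h)
cond_generator = keras.Model([z, y], out)
```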

Laplacian Pyramid GAN (LAPGAN)

  • LAPGAN builds on the Laplacian pyramid, a linear invertible image representation consisting of a set of band-pass images, spaced an octave apart, plus a low-frequency residual.
  • This GAN uses multiple generators and discriminators, one at each level of the Laplacian pyramid.
  • LAPGAN is mainly used because it produces very high-quality images.
  • The architecture downsamples the image at each level of the pyramid; the image is then up-scaled level by level, acquiring noise from a conditional GAN at each stage, until it reaches its original size. (A pyramid-construction sketch follows this list.)
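
For intuition, here is a short OpenCV sketch of the underlying Laplacian-pyramid construction (the pyramid itself, not the GAN training at each level):

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels=3):
    # Each level is the band-pass difference between an image and the
    # upsampled version of its downsampled copy; the last entry is the
    # low-frequency residual.
    current = img.astype(np.float32)
    pyramid = []
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)  # band-pass image for this octave
        current = down
    pyramid.append(current)           # low-frequency residual
    return pyramid
```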

StackGAN

  • StackGAN aims to solve a hard computer vision problem: generating high-quality images from text descriptions.
  • These GANs are stacked to generate images conditioned on the text description.
  • They decompose the hard problem into more manageable sub-problems through a sketch-refinement process.
(source: https://arxiv.org/pdf/1612.03242.pdf)
  • StackGAN consists of two stages: the Stage-I GAN sketches the primitive shape and colors of an object based on the text description and outputs low-resolution images.
  • The Stage-II GAN takes the Stage-I results and the text descriptions as inputs and generates high-resolution images with realistic details.

InfoGANs

  • InfoGAN is an information-theoretic extension to the GAN that is able to learn disentangled representations in an unsupervised manner.
  • It is used when your dataset is very complex, or when you would like to train a cGAN but the dataset is not labeled.
  • InfoGAN can also be used as a feature-extraction technique on image inputs.

Super-Resolution GANs (SRGAN)

  • SRGAN is a way of architecting a GAN in which a deep neural network is used together with an adversarial network to produce higher-resolution images.
  • It is used to optimally up-scale native low-resolution images, enhancing their details while minimizing errors in the process.

Discover Cross-Domain Relations with Generative Adversarial Networks (DiscoGAN)

  • In the DiscoGAN paper, the authors propose a method based on generative adversarial networks that learns to discover relations between different domains.
(source: https://arxiv.org/pdf/1703.05192.pdf)
  • DiscoGANs are used to transfer style from one domain to another, preserving key attributes such as orientation and face identity in the process.

Show me some Code!

Example of DCGAN powered by Keras
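
Below is a minimal, illustrative training-loop sketch wiring together the build_generator and build_discriminator functions sketched earlier; the Adam settings (learning rate 2e-4, beta_1 = 0.5) follow common DCGAN practice, and the rest is a sketch under those assumptions rather than a tuned implementation:

```python
import numpy as np
from tensorflow import keras

latent_dim = 100

# Compile the discriminator on its own first...
discriminator = build_discriminator()
discriminator.compile(optimizer=keras.optimizers.Adam(2e-4, beta_1=0.5),
                      loss="binary_crossentropy")

# ...then freeze it inside the combined model, so that training the
# stacked GAN only updates the generator's weights.
generator = build_generator(latent_dim)
discriminator.trainable = False
z = keras.Input(shape=(latent_dim,))
gan = keras.Model(z, discriminator(generator(z)))
gan.compile(optimizer=keras.optimizers.Adam(2e-4, beta_1=0.5),
            loss="binary_crossentropy")

def train_step(real_images):
    batch = len(real_images)
    noise = np.random.normal(size=(batch, latent_dim))
    fakes = generator.predict(noise, verbose=0)
    # 1) Train the discriminator: real images -> 1, fakes -> 0.
    d_loss = discriminator.train_on_batch(real_images, np.ones((batch, 1)))
    d_loss += discriminator.train_on_batch(fakes, np.zeros((batch, 1)))
    # 2) Train the generator to make the discriminator say "real".
    noise = np.random.normal(size=(batch, latent_dim))
    g_loss = gan.train_on_batch(noise, np.ones((batch, 1)))
    return d_loss, g_loss
```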

And so we come to the end of this awesome journey through GANs. There are many more types of GANs out there, but we can’t cover all of them in one article!


If you like this post, please follow me and press that clap button as many times as you think I deserve it. If you notice any mistakes in the reasoning, formulas, animations, or code, please let me know.

Cheers!
