Introduction to Generative Adversarial Networks (GANs)

Mukund Khandelwal
Published in Analytics Vidhya
Nov 15, 2020

What are Generative Models?

Generative models are models that use an unsupervised learning approach. The training data contains samples of the input variables (X) but no output variable (Y). We train the model on the input variables alone, and it learns the patterns in them so that it can produce output that has not been seen before but is based only on the training data. In other words, these unsupervised models are used to generate new examples that plausibly come from the input distribution. In layman's terms, generative models are able to generate new examples.
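To make the idea concrete, here is a minimal sketch of about the simplest generative model there is: fit a Gaussian to unlabeled data and sample new points from it. The function names and the numbers (mean 5, standard deviation 2, 1000 samples) are assumptions chosen for illustration, and a GAN learns far richer distributions than this.

```python
import random
import statistics

def fit_gaussian(samples):
    """Estimate mean and standard deviation from unlabeled samples X."""
    return statistics.mean(samples), statistics.stdev(samples)

def generate(mean, std, n, rng):
    """Draw n new examples from the fitted distribution."""
    return [rng.gauss(mean, std) for _ in range(n)]

rng = random.Random(0)
# Unlabeled training data X: there is no output variable Y anywhere.
X = [rng.gauss(5.0, 2.0) for _ in range(1000)]

mean, std = fit_gaussian(X)
new_examples = generate(mean, std, 5, rng)
print(f"fitted mean={mean:.2f}, std={std:.2f}")
print("generated:", [round(v, 2) for v in new_examples])
```

The model only ever sees X, yet it can produce new values that look like they came from the same distribution.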

Supervised Vs Unsupervised Learning

In supervised learning, a model learns to make predictions from a labeled dataset, whereas in unsupervised learning the algorithm is trained on unlabeled data.

What are Generative Adversarial Networks?

Generative Adversarial Networks (GANs) are a class of machine learning frameworks and an emerging part of deep learning that can generate incredibly realistic images. GANs can generate pictures of people who never existed, or make someone, maybe yourself, look younger or older. In today's digital era, photos and videos play a crucial role in capturing the valuable moments of our lives. However, with current cameras, especially those in mobile phones, users sometimes get disappointed because the photos they get do not match their expectations. In that case, GANs can help turn low-resolution images and videos into high-resolution ones. A GAN consists of a pair of neural networks that compete against each other: one is called the generator and the other is called the discriminator.

Types Of GANs

There are various types of GANs. The first is the vanilla GAN, in which the generator and the discriminator are simple multi-layer perceptrons and the algorithm optimizes its objective function using stochastic gradient descent. The second type is the Deep Convolutional GAN (DCGAN), which uses convolutional neural networks instead of multi-layer perceptrons for both the discriminator and the generator. DCGANs are more stable and generate higher-quality images. The generator is a stack of fractionally strided (transposed) convolutional layers, so it upsamples its input at every layer, whereas the discriminator downsamples the input image at every convolutional layer. The third type is the conditional GAN, which uses an extra label as input to generate better results.
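The upsampling and downsampling in a DCGAN can be checked with simple arithmetic. The sketch below uses the standard output-size formulas for strided and transposed convolutions (no dilation, no output padding). The layer settings (kernel 4, stride 2, padding 1) and the 4-to-64 pixel sizes are assumptions that are common in DCGAN-style code, not values taken from this article.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a strided convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

# Assumed layer settings: kernel 4, stride 2, padding 1
K, S, P = 4, 2, 1

# Generator: each transposed convolution doubles the spatial size.
gen_sizes = [4]
for _ in range(4):
    gen_sizes.append(conv_transpose_out(gen_sizes[-1], K, S, P))

# Discriminator: each strided convolution halves the spatial size.
disc_sizes = [64]
for _ in range(4):
    disc_sizes.append(conv_out(disc_sizes[-1], K, S, P))

print("generator:", gen_sizes)       # [4, 8, 16, 32, 64]
print("discriminator:", disc_sizes)  # [64, 32, 16, 8, 4]
```

With these settings the two networks are mirror images of each other: the generator grows a 4x4 feature map into a 64x64 image, and the discriminator shrinks a 64x64 image back down to 4x4.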

Why Generative Models?

So why do we care about generative models, and why is this such an interesting core problem in unsupervised learning? There are a lot of things we can do with generative models. Once we can create realistic samples from the data distribution we want, we can do things like super-resolution, colorization, or filling in an outline of edges with generated colors. We can also use generative models of time-series data for simulation and planning, which is useful in reinforcement learning applications.

The Discriminator Model

The discriminator is a classifier that inspects examples, both real and fake, and determines whether each one belongs to the real class or the fake class. For example, given an image produced by the generator, the discriminator estimates how fake that image looks. In probabilistic terms, the discriminator models the probability of an example being fake given its input features X. In short, the discriminator is a classifier that learns the probability of class Y (real or fake) given some features, and these probabilities are the feedback passed to the generator.
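As a rough illustration, the discriminator can be sketched as a tiny logistic classifier scored with binary cross-entropy. Everything here is an assumption made for the example: the one-dimensional input, the parameter values `w` and `b`, and the convention that the model outputs the probability of "real" (so the probability of "fake" is one minus that). A real discriminator would be a deep network.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def discriminator(x, w, b):
    """Hypothetical 1-D discriminator: P(real | x) from a logistic model."""
    return sigmoid(w * x + b)

def bce(p_real, label):
    """Binary cross-entropy for one example; label is 1 for real, 0 for fake."""
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(p_real + eps)
             + (1 - label) * math.log(1 - p_real + eps))

w, b = 2.0, -1.0  # illustrative parameters, not learned values
for x, label in [(3.0, 1), (-2.0, 0), (-2.0, 1)]:
    p = discriminator(x, w, b)
    print(f"x={x:+.1f} label={label} P(real)={p:.3f} "
          f"P(fake)={1 - p:.3f} loss={bce(p, label):.3f}")
```

The loss is small when the predicted probability agrees with the label and large when it does not, and that signal is what eventually flows back to the generator.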

The Generator Model

The generator is the heart of a GAN. It is the model that is used to generate examples, and it is the one we should be invested in, helping it reach really high performance by the end of the training process. Below we discuss the role of the generator and how it is able to improve its performance. The generator's final goal is to produce examples of a certain class.

How do GANs work?

In neural network terms, we have a pair of networks, the generator and the discriminator, and we train them with a set of real images and a set of fake images produced by the generator. The discriminator is trained to identify which images are real and which come from the generator, while the generator is trained to fool the discriminator into classifying its images as real.
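This tug of war is usually written as a two-player minimax game. The formula below is the objective from the original GAN paper (Goodfellow et al., 2014). Its symbols are not defined elsewhere in this article: D(x) is the discriminator's estimated probability that x is real, G(z) is the generator's output for a noise vector z, p_data is the real data distribution and p_z is the noise distribution.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to make both terms large by scoring real images near 1 and generated images near 0, while the generator tries to make the second term small by pushing D(G(z)) toward 1.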

There are several steps involved in training a GAN:

Step 1: Define the problem.

Step 2: Define the architecture of the GAN.

Step 3: Train the discriminator on real data for 'n' epochs.

Step 4: Generate fake inputs with the generator and train the discriminator on the fake data.

Step 5: Train the generator with the output of the discriminator.

Step 6: Repeat steps 3 to 5 for a few iterations.

Step 7: Check whether the fake data looks legitimate. If it does, stop training; otherwise go back to step 3.

Algorithm Behind the Working of GANs

We run multiple iterations of a first-order optimization algorithm, i.e. gradient descent, on the discriminator (D) using real and generated pictures while keeping the generator (G) constant. Then we fix the discriminator and train the generator for a single iteration to fool the fixed discriminator. We keep alternating between these two phases until the generator produces good-quality images and the discriminator can no longer tell the difference between real and fake images.
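The alternating scheme can be sketched on a one-dimensional toy problem with hand-written gradients. Everything here is an assumption made for illustration: the real data is drawn from a Gaussian with mean 4, the generator is the linear map `a * z + c`, the discriminator is a logistic model with parameters `(w, b)`, the generator uses the common non-saturating loss `-log D(G(z))`, and the hyperparameters (`k`, `lr`, batch size) are arbitrary. It shows the update order only, and GAN training like this is not guaranteed to converge.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def d_step(d, g, real, noise, lr):
    """One gradient-descent step on D = (w, b); the generator G = (a, c) is fixed."""
    w, b = d
    a, c = g
    fake = [a * z + c for z in noise]
    # Derivative of -log D(x_real) - log(1 - D(x_fake)) w.r.t. each logit
    g_real = [sigmoid(w * x + b) - 1.0 for x in real]
    g_fake = [sigmoid(w * x + b) for x in fake]
    dw = (sum(gr * x for gr, x in zip(g_real, real)) / len(real)
          + sum(gf * x for gf, x in zip(g_fake, fake)) / len(fake))
    db = sum(g_real) / len(real) + sum(g_fake) / len(fake)
    return (w - lr * dw, b - lr * db)

def g_step(d, g, noise, lr):
    """One gradient-descent step on G = (a, c); the discriminator is fixed."""
    w, b = d
    a, c = g
    # Derivative of -log D(G(z)) w.r.t. the logit w * (a*z + c) + b
    g_logit = [sigmoid(w * (a * z + c) + b) - 1.0 for z in noise]
    da = sum(gl * w * z for gl, z in zip(g_logit, noise)) / len(noise)
    dc = sum(gl * w for gl in g_logit) / len(noise)
    return (a - lr * da, c - lr * dc)

def train(iterations=200, k=3, batch=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    d, g = (0.0, 0.0), (1.0, 0.0)
    for _ in range(iterations):
        for _ in range(k):  # several discriminator steps with G held constant
            real = [rng.gauss(4.0, 1.0) for _ in range(batch)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
            d = d_step(d, g, real, noise, lr)
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        g = g_step(d, g, noise, lr)  # a single generator step with D fixed
    return d, g

d, g = train()
print("discriminator (w, b):", d)
print("generator (a, c):", g)
```

Each function returns new parameters for only one of the two networks, which makes it explicit that the other network is frozen during that step.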

Challenges Faced By GANs

The concept of GANs is fascinating, but there are a lot of setbacks that can hinder them. Some of the major challenges faced by GANs are:

  1. Stability: the discriminator and the generator networks have to stay in balance, otherwise training of the whole network collapses.
  2. Counting: GANs often fail to determine how many times an object should occur at a given location.
  3. Perspective: GANs have trouble understanding perspective and very often produce a flat image of a 3D object.
  4. Global structure: GANs have trouble understanding global objects and cannot differentiate or understand a holistic structure.

GAN Applications

Prediction Of Next Frame In a Video

GANs make it possible to predict future events in a video. Dual Video Discriminator GAN (DVD-GAN) can generate 256 by 256 videos of notable fidelity up to 48 frames in length. This can be used for various purposes, including surveillance, where we can determine the activities in a frame that gets distorted by factors like rain, dust, or smoke.

3D Object Generation

A variational image encoder maps an image to a latent vector, which is then used for 3D object construction. A suitable model for this purpose is 3D-VAE-GAN.

Text to Image Generation

Object-driven Attentive GAN, also known as Obj-GAN, performs text-to-image synthesis in two steps. The first step generates the semantic layout, and the final step synthesizes the image using a convolutional image generator.

Image/Video Enhancement

Image and video enhancement methods use GANs to address problems with resolution and sharpness caused by small sensors and compact lenses. The most suitable type of GAN for this is the Super-Resolution Generative Adversarial Network (SRGAN), which is built specifically for upscaling low-resolution images while enhancing their details.

Interactive Image Generation

GANs can be used to generate interactive images as well. MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed a GAN that can generate 3D models with realistic lighting and reflections, with support for shape and texture editing. More recently, researchers have come up with a model that can synthesize a reenacted face animated by a person's movements while preserving the appearance of the face. There are many more applications as well.
