A Tour of Generative Adversarial Network Models

Aamir Jarda

Published in

DataSeries

11 min read · Aug 19, 2021

GANs were introduced in a paper by Ian Goodfellow and other researchers at the University of Montreal in 2014.

What is GAN?

A generative adversarial network (GAN) is a type of neural network model that offers a lot of potential in the world of machine learning. A GAN consists of two neural networks: the first is a generative network and the second is a discriminative network. GANs are about creating things, which makes them hard to compare with other deep learning fields: the main focus of a GAN is to generate data from scratch, with the generator and the discriminator working against each other.

Generative adversarial networks (GANs) are deep neural net architectures composed of two nets, pitting one against the other.
Facebook’s AI research director Yann LeCun called adversarial training “the most interesting idea in the last 10 years in ML.”

GANs have huge potential because they can learn to mimic any distribution of data. Using GANs, we can create worlds similar to our own in any domain: images, anime, news anchors, speech.

Generative vs. Discriminative Algorithms

To understand GANs, we need to know how generative algorithms work, and for that it helps to contrast them with discriminative algorithms. Discriminative algorithms try to classify input data: given the features of an instance, they predict the label it belongs to.

A standard example is email: given all the words in an email, a discriminative algorithm predicts whether the message is spam or not spam. In this example, spam is one of the labels, and the words of the email are the features that make up the input data. Expressed mathematically, the label is called y and the features are called x. The formulation p(y|x) means “the probability of y given x”.

The question a generative algorithm tries to answer is: assuming this email is spam, how likely are these features? While a discriminative model cares about the relation between the features (x) and the label (y), a generative model cares about how the features x arise.
So generative algorithms do the opposite of discriminative ones. Instead of predicting a label given certain features, they predict the features given a certain label, which corresponds to p(x|y).

One way to distinguish generative from discriminative models:

· Discriminative models learn the boundary between classes

· Generative models model the distribution of individual classes
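To make the difference between p(y|x) and p(x|y) concrete, here is a minimal sketch on a made-up dataset. The ten emails and the single feature (does the email contain the word “free”?) are assumptions chosen purely for illustration, not data from any real corpus.

```python
from collections import Counter

# Hypothetical toy data: (email contains the word "free"?, label)
emails = [
    (True, "spam"), (True, "spam"), (True, "spam"), (False, "spam"),
    (True, "ham"), (True, "ham"), (False, "ham"),
    (False, "ham"), (False, "ham"), (False, "ham"),
]

label_counts = Counter(label for _, label in emails)
joint_counts = Counter(emails)

def p_x_given_y(x, y):
    """Generative view: how likely is feature x for an email with label y?"""
    return joint_counts[(x, y)] / label_counts[y]

def p_y_given_x(y, x):
    """Discriminative view: how likely is label y for an email with feature x?"""
    total_x = sum(count for (xi, _), count in joint_counts.items() if xi == x)
    return joint_counts[(x, y)] / total_x

print(p_x_given_y(True, "spam"))   # 3 of the 4 spam emails contain "free"
print(p_y_given_x("spam", True))   # 3 of the 5 emails containing "free" are spam
```

The two numbers differ because they answer different questions: the first describes what spam looks like, the second decides whether a given email is spam.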

How do these models interact?

In the original paper that proposed this framework, the Generator can be thought of as having an adversary, the Discriminator. The generator needs to learn to create data in such a way that the discriminator can no longer tell it apart from real data. The competition between these two models is what improves both of them, until the generator is creating realistic data.
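The competition can be sketched as two opposing loss functions. The snippet below is only an illustration, assuming NumPy is available: the discriminator outputs are made-up numbers rather than the result of a trained network, and the generator loss uses the common non-saturating form (the generator is rewarded when the discriminator says “real”) instead of the raw minimax objective.

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy between predicted probabilities and 0/1 targets."""
    pred = np.clip(pred, 1e-7, 1 - 1e-7)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def discriminator_loss(d_real, d_fake):
    """D wants to output 1 on real data and 0 on generated data."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """G wants D to output 1 on generated data (non-saturating form)."""
    return bce(d_fake, np.ones_like(d_fake))

# Hypothetical discriminator outputs (probability that the input is real).
d_real = np.array([0.9, 0.8, 0.95])
fooled = np.array([0.7, 0.8, 0.9])    # D believes the fakes are real
caught = np.array([0.1, 0.2, 0.05])   # D spots the fakes

# The generator is better off when D is fooled ...
print(generator_loss(fooled) < generator_loss(caught))
# ... and the discriminator is better off when it catches the fakes.
print(discriminator_loss(d_real, caught) < discriminator_loss(d_real, fooled))
```

Whatever lowers one loss raises the other, which is exactly the adversarial relationship described above.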

1. Cycle Consistent Adversarial Networks

Unpaired image-to-image translation is an interesting and challenging topic, both as a graphics problem and in the design of its loss function.

Cycle-GAN is a very popular GAN architecture, used primarily to learn transformations between images of different styles.

Apps such as FaceApp, which transform human faces into different age groups, are often cited as popular examples of this kind of translation.

As an example, let’s say X is a set of images of horses and Y is a set of images of zebras.

The goal is to learn a mapping function G: X -> Y such that images generated by G(X) are indistinguishable from the images in Y. This objective is achieved using an adversarial loss. The formulation not only learns G, but also learns an inverse mapping function F: Y -> X and uses a cycle-consistency loss to enforce F(G(X)) ≈ X and, vice versa, G(F(Y)) ≈ Y.

• Role of G: G tries to translate X into outputs, which are fed through Dy to check whether they are real or fake according to domain Y.
• Role of F: F tries to translate Y into outputs, which are fed through Dx to check whether they are indistinguishable from domain X.

The central part of Cycle-GAN is the cycle-consistency loss. Suppose an input image A from domain X is transformed into a target image B in domain Y via generator G, and image B is then translated back to domain X via generator F. The difference between the original image A and this reconstructed image is called the cycle-consistency loss.
This approach requires two generator-discriminator pairs: one for A2B (source to output conversion) and another for B2A (output to source conversion).
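As a rough sketch of the cycle-consistency idea, the snippet below measures how far a round trip X -> Y -> X (and Y -> X -> Y) lands from where it started, using an L1 distance. The “generators” here are simple stand-in functions on random arrays, assumed only for illustration; in a real Cycle-GAN, G and F are trained neural networks.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 distance between each input and its round-trip reconstruction."""
    forward = np.mean(np.abs(F(G(x)) - x))    # X -> Y -> X
    backward = np.mean(np.abs(G(F(y)) - y))   # Y -> X -> Y
    return float(forward + backward)

# Hypothetical stand-in generators: plain functions instead of networks.
G = lambda x: 2.0 * x + 1.0          # maps domain X to domain Y
F_good = lambda y: (y - 1.0) / 2.0   # exact inverse of G
F_bad = lambda y: y / 2.0            # not an inverse of G

rng = np.random.default_rng(0)
x = rng.random((4, 8, 8))   # stand-in batch of "horse" images
y = rng.random((4, 8, 8))   # stand-in batch of "zebra" images

print(cycle_consistency_loss(x, y, G, F_good))  # essentially zero
print(cycle_consistency_loss(x, y, G, F_bad))   # clearly positive
```

When F undoes G, the loss vanishes; when it does not, the loss grows, which is the signal that pushes the two generators to stay consistent with each other.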

For example, to convert an image from the source to the target domain (a horse into a zebra), Cycle-GAN requires two generators.

The generator A2B converts a horse into a zebra and B2A converts a zebra into a horse. Both are trained together so that the input horse image stays consistent with the image reconstructed from the generated zebra. The two discriminators decide whether horse and zebra images, respectively, are real or fake.

Summary

This was a gentle introduction to Cycle-Consistent Adversarial Networks. Cycle-GAN is still an open field for research, and a lot of work remains to be done.
Cycle-GAN is a technique for training unsupervised image translation models via the GAN architecture, using unpaired collections of images from two different domains.
Image-to-image translation involves the controlled modification of an image and traditionally requires large datasets of paired images, which are complex to prepare or sometimes don’t exist.
In my next article, I will briefly describe the mathematical way to define Cycle-GAN and some steps to improve the efficiency of the Cycle-GAN model.

2. CGAN (Conditional GAN)

Description: Training a GAN conditioned on class labels to generate handwritten digits.

In a plain GAN, there is no control over the modes of the data being generated. The conditional GAN changes that by adding the label y as an additional input to the generator, with the hope that images of the corresponding class are generated. We also add the labels to the discriminator input so that it can distinguish real images better.
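One simple way to feed the label y into both networks is to one-hot encode it and concatenate it with the usual input. The sketch below assumes NumPy, a noise size of 100, and flattened 28x28 images; these are illustrative choices, and real implementations may inject the label differently (for example as extra image channels or through an embedding).

```python
import numpy as np

NUM_CLASSES = 10   # MNIST digits 0-9
LATENT_DIM = 100   # assumed noise size, chosen for illustration

def one_hot(labels, num_classes=NUM_CLASSES):
    """Turn integer labels into one-hot row vectors."""
    return np.eye(num_classes)[labels]

def generator_input(noise, labels):
    """Condition the generator: append the one-hot label y to the noise z."""
    return np.concatenate([noise, one_hot(labels)], axis=1)

def discriminator_input(images, labels):
    """Condition the discriminator: append y to the flattened image."""
    flat = images.reshape(len(images), -1)
    return np.concatenate([flat, one_hot(labels)], axis=1)

rng = np.random.default_rng(0)
labels = np.array([3, 7])                 # the digits we ask for
noise = rng.normal(size=(2, LATENT_DIM))
images = rng.random((2, 28, 28))          # stand-in for real MNIST images

print(generator_input(noise, labels).shape)       # (2, 110)
print(discriminator_input(images, labels).shape)  # (2, 794)
```

At test time, choosing the label passed to `generator_input` is what lets you ask for a specific digit.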

In this example, we’ll build a Conditional GAN that can generate MNIST handwritten digits conditioned on a given class. Such a model can have various useful applications:

· Let’s say you are dealing with an imbalanced image dataset, and you’d like to gather more examples for the skewed class to balance the dataset. Data collection can be a costly process on its own. You could instead train a Conditional GAN and use it to generate novel images for the class that needs balancing.
· Since the generator learns to associate the generated samples with the class labels, its representations can also be used for other downstream tasks.

Advantages

By providing additional information, we get two benefits:

· Convergence will be faster. Even the random distribution that the fake images follow will have some pattern.
· You can control the output of the generator at test time by giving the label for the image you want to generate.

Summary

Conditional GAN is an extension of GAN such that we condition both the generator and the discriminator by feeding extra information, y, in their learning phase. In this way, we can generate/discriminate certain types of samples.

3. Deep Convolutional Generative Adversarial Network

DCGAN is an extension of the GAN architecture that uses deep convolutional neural networks for both the generator and discriminator models, together with model and training configurations that result in stable training of the generator. It uses strided convolutions for downsampling and transposed convolutions for upsampling.

The DCGAN is important because it suggested the constraints on the model required to effectively develop high-quality generator models in practice. This architecture, in turn, provided the basis for the rapid development of a large number of GAN extensions and applications.

How GANs Work

As we know, these algorithms belong to the field of unsupervised learning.
Generative Adversarial Networks are composed of two models:

The first model is called the Generator, and its goal is to generate new data similar to the real data. The generator creates data, and the discriminator checks whether that data is real or fake.

The second model is called the Discriminator. Its goal is to recognize whether input data is real, meaning it belongs to the original dataset, or fake, meaning it was produced by the generator. The discriminator acts like the police, trying to detect whether a piece of work is real or forged.

DCGAN, or Deep Convolutional GAN, is a generative adversarial network architecture. It follows a handful of guidelines, in particular:

· Replacing any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
· Using batch norm in both the generator and the discriminator.
· Removing fully connected hidden layers for deeper architectures.
· Using ReLU activation in the generator for all layers except for the output, which uses tanh.
· Using LeakyReLU activation in the discriminator for all layers.
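To see why strided and fractional-strided (transposed) convolutions can replace pooling, it helps to trace the spatial size of a feature map through the layers. The sketch below uses the standard output-size formulas with no dilation and no output padding; the kernel size 4, stride 2, and padding 1 are assumed values picked for illustration, not settings taken from this article.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a strided convolution (downsampling)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution (upsampling)."""
    return (size - 1) * stride - 2 * padding + kernel

def trace(size, step, layers):
    """Apply the same layer repeatedly and record each feature-map size."""
    sizes = [size]
    for _ in range(layers):
        sizes.append(step(sizes[-1]))
    return sizes

# Discriminator side: each strided convolution halves the feature map.
print(trace(64, conv_out, 4))            # [64, 32, 16, 8, 4]
# Generator side: each transposed convolution doubles it.
print(trace(4, conv_transpose_out, 4))   # [4, 8, 16, 32, 64]
```

With these settings the generator and discriminator mirror each other, which is a common layout in DCGAN-style implementations.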

Summary

DCGAN demonstrates the versatility of the GAN framework. In theory, the Discriminator and Generator can be represented by any differentiable function, even one as complex as a multilayer convolutional network. However, DCGAN also demonstrates that there are significant hurdles to making more complex implementations work in practice. Without breakthroughs such as batch normalization, DCGAN would fail to train properly.

4. Wasserstein Generative Adversarial Network

The Wasserstein GAN, or WGAN for short, was introduced by Martin Arjovsky, et al. in their 2017 paper titled “Wasserstein GAN.”

The Wasserstein GAN (WGAN) is a GAN variant that uses the 1-Wasserstein distance, rather than the JS-divergence, to measure the difference between the model and target distributions. This seemingly simple change has big consequences! Not only does WGAN train more easily (training is a common struggle with GANs), it also achieves very impressive results, generating some stunning images. By studying the WGAN and its variant, the WGAN-GP, we can learn a lot about GANs and generative models in general. Working through both should give you an intuitive grasp of why the WGAN and WGAN-GP work so well, as well as an understanding of the mathematical reasons for their success, and that knowledge carries over to cutting-edge research into GANs and other generative models.

“The discriminator learned very quickly to distinguish between fake and real, and as expected provides no reliable gradient information. The critic, however, can’t saturate and converges to a linear function that gives remarkably clean gradients everywhere. The fact that we constrain the weights limits the possible growth of the function to be at most linear in different parts of the space, forcing the optimal critic to have this behaviour.” (Wasserstein GAN, 2017)

The network design is almost the same, except that the critic does not have an output sigmoid function. The major difference is in the cost function.

[Figure: GAN and WGAN cost functions]

However, one major thing is still missing: the critic function f has to be a 1-Lipschitz function. To enforce this constraint, WGAN applies very simple clipping to restrict the maximum weight value in f, i.e. the weights of the critic must stay within a certain range controlled by the hyperparameter c.
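The critic objective and the clipping step can be sketched in a few lines. This is an illustration under assumptions, not a full training loop: the critic scores and the weight matrix are made-up numbers, NumPy is assumed to be available, and c = 0.01 is used only as an example value for the clipping range.

```python
import numpy as np

def critic_loss(f_real, f_fake):
    """The critic maximises E[f(real)] - E[f(fake)]; we minimise the negation."""
    return float(np.mean(f_fake) - np.mean(f_real))

def generator_loss(f_fake):
    """The generator tries to raise the critic's score on generated samples."""
    return float(-np.mean(f_fake))

def clip_weights(weights, c=0.01):
    """Clamp every critic weight into [-c, c] after each update."""
    return [np.clip(w, -c, c) for w in weights]

# Hypothetical raw critic scores (no sigmoid, so they are not probabilities).
f_real = np.array([2.0, 1.5, 3.0])
f_fake = np.array([-1.0, 0.5, -0.5])
print(critic_loss(f_real, f_fake))
print(generator_loss(f_fake))

# Hypothetical critic weights before and after clipping.
weights = [np.array([[0.5, -0.002], [-0.3, 0.008]])]
print(clip_weights(weights)[0])
```

Weights that already lie inside the range are left untouched, while larger ones are cut back to the boundary, which is what keeps the growth of f limited.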

Summary

The Wasserstein GAN (WGAN) is a GAN variant which uses the 1-Wasserstein distance, rather than the JS-Divergence, to measure the difference between the model and target distributions.
With this model, we can improve the stability of learning, mitigate problems like mode collapse, and obtain meaningful learning curves that are useful for debugging and hyperparameter searches.

Applications of GAN

· Generating photographs of human faces (Tero Karras, et al. in their 2017 paper titled “Progressive Growing of GANs for Improved Quality, Stability, and Variation” demonstrate the generation of plausible, realistic photographs of human faces.)
· Generating realistic photographs (Andrew Brock, et al. in their 2018 paper titled “Large Scale GAN Training for High Fidelity Natural Image Synthesis” demonstrate the generation of synthetic photographs with their technique BigGAN that are practically indistinguishable from real photographs.)
· Generating cartoon characters (Yanghua Jin, et al. in their 2017 paper titled “Towards the Automatic Anime Characters Creation with Generative Adversarial Networks” demonstrate the training and use of a GAN for generating faces of anime characters, i.e. Japanese comic book characters.)
· Image-to-image translation (Phillip Isola, et al. in their 2016 paper titled “Image-to-Image Translation with Conditional Adversarial Networks” demonstrate GANs, specifically their pix2pix approach, for many image-to-image translation tasks.) Most interesting paper: Jun-Yan Zhu, et al. in their 2017 paper titled “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks” introduce their famous CycleGAN and a suite of very impressive image-to-image translation examples.
· Photos to emoji (Yaniv Taigman, et al. in their 2016 paper titled “Unsupervised Cross-Domain Image Generation” used a GAN to translate images from one domain to another, including from street numbers to MNIST handwritten digits, and from photographs of celebrities to what they call emojis or small cartoon faces.)
· Super resolution (Christian Ledig, et al. in their 2016 paper titled “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network” demonstrate the use of GANs, specifically their SRGAN model, to generate output images with higher, sometimes much higher, pixel resolution.)
· Video prediction (Carl Vondrick, et al. in their 2016 paper titled “Generating Videos with Scene Dynamics” describe the use of GANs for video prediction, specifically predicting up to a second of video frames with success, mainly for static elements of the scene.)

Summary

In this article, you discovered a gentle introduction to generative adversarial networks and several GAN models: what a GAN is, generative and discriminative algorithms, how GANs work, how the models interact, Cycle-GAN, Conditional GAN, the Deep Convolutional Generative Adversarial Network, the Wasserstein Generative Adversarial Network, and applications of GANs.

In my next article, I will briefly describe the mathematical way to define GANs and the different GAN models, some steps to improve the efficiency of the model, and cycle consistency of adversarial networks for video-to-video translation.

Thanks for reading!

Feel free to message me.

Twitter: aamirjarda
LinkedIn: aamirjarda
Instagram: aamir_jarda

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


Written by Aamir Jarda