Generative art developed by Mark Horsell.

GANs — What and Where?

Mustaffa Hussain · Published in TheCyPhy · 12 min read · Apr 19, 2020


Yann LeCun described GANs as “the most interesting idea in the last 10 years in Machine Learning”.

GANs stands for Generative Adversarial Networks. The name has it all, LITERALLY.

GANs consist of two types of networks: Generative and Adversarial.

1. The Generative network is the network that generates.

2. The Adversarial network is the network that works against the Generative network.

GANs are like Jordan and Kobe: both try to defeat each other, and in the process both improve their game and put on a wonderful show for the fans.

To understand it better, let's begin with an analogy. GANs take me back to my school days. We had this system of getting our diaries signed by our parents after class tests. I sure did not do well in many subjects, so I forged my dad's signature to escape the scolding. To my misery, the teacher was too good to be fooled by my forgery. She informed my parents and things ended badly. But that didn't stop me from forging his signature again. Eventually, I got so good at it that my dad himself was confused about whether he had signed or not. *** please act at your own risk 😜😜😜

Following the analogy, the generator network is me and the adversarial network (also called the discriminator) is the teacher. The signature is the distribution: the generator is trying to mimic the distribution, and the discriminator is trying to catch the forgery.

GANs were first presented by Ian J. Goodfellow and team in 2014, via the paper titled Generative Adversarial Networks. He presented and explained the idea in a tutorial at NIPS 2016. Here is the video of the lecture.

The magician himself. 2 hours of pure magic.

Let's now come back to our story: the generator trying to fool the discriminator. Essentially, there are two important factors for training a good generator:

1. We feed robust and meaningful data to the discriminator.

2. We have a good discriminator in hand. The learning of the generator depends directly on how well the discriminator performs.

Flow diagram of a GAN architecture.
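Formally, the original paper frames this competition as a two-player minimax game over a value function V(D, G): the discriminator D maximizes it, the generator G minimizes it.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] +
  \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D(x) is the probability the discriminator assigns to x being real, and G(z) is a sample generated from noise z.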

Generative models existed before GANs. Models prior to GANs tried to approximate the distribution of the data explicitly. For example, let's again go back to the school days 😜. My section had 21 students. If the heights of all students in the class were recorded, it would be easy to identify the distribution of heights in the section. Let's say the heights follow a Normal distribution with mean 165 cm and a standard deviation of 3 cm. If a new student were to join, we would now have an idea of what his height may be, with some confidence. That is, we can randomly generate a new student's height from the distribution of the class. Moreover, different classes will have different parameters, so one can also identify which class a randomly picked student belongs to. Here, height acts as the attribute on which the distribution is modeled.

But things are not so straightforward in the real world. We have hundreds of attributes for the data. As the number of attributes increases, the distributions become really complex and really hard to model. And to add to this misery, data in the form of images, videos, and text is just too complex to model statistically.
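To make the explicit-density idea concrete, here is a minimal numpy sketch of the class-height example above (the class size and distribution parameters are the ones assumed in the story; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Heights of the 21 students, assumed to follow N(165, 3) as in the story.
heights = rng.normal(loc=165.0, scale=3.0, size=21)

# "Learning" the explicit density: estimate the distribution's parameters.
mu, sigma = heights.mean(), heights.std()

# "Generating": sample a plausible height for a new student.
new_height = rng.normal(loc=mu, scale=sigma)
print(f"estimated N({mu:.1f}, {sigma:.1f}); new student ~ {new_height:.1f} cm")
```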

GANs, on the other hand, handle this real-world problem magnificently. GANs fall in the category of implicit density models that sample directly: no explicit formula for the distribution is ever written down. One can use any model of their choice in this setup, as long as its parameters can be updated iteratively and the training loss can be propagated through it; for example, one could use a simple SVM as the discriminator, with gradient updates. It is the elegant idea of two models in competition that results in the generator modeling the actual distribution of the data.
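To see the two-model competition in code, here is a minimal PyTorch sketch of one training step. The tiny MLP architectures, sizes, and learning rates are illustrative assumptions for the sketch, not taken from any paper:

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # e.g. flattened 28x28 images

# Any differentiable models work here; tiny MLPs keep the sketch short.
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                  nn.Linear(128, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch):
    b = real_batch.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # 1. Discriminator step: push real samples toward 1, fakes toward 0.
    fake = G(torch.randn(b, latent_dim)).detach()  # detach: don't update G here
    loss_d = bce(D(real_batch), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2. Generator step: try to make D label fresh fakes as real.
    fake = G(torch.randn(b, latent_dim))
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```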

Now that we know the basic underlying idea, let us see what GANs are used for and where, or rather how they have evolved beyond just generating faces. The full list is really vast; let's look at 10 interesting use cases of GANs and some interesting papers.

1. Data Augmentation

a. GAN 2014 - Generating new plausible samples was the application described in the original 2014 paper "Generative Adversarial Networks" by Ian Goodfellow, et al., where GANs were used to generate new plausible examples for the MNIST handwritten digit dataset, the CIFAR-10 small object photograph dataset, and the Toronto Face Database.

Examples of GANs used to Generate New Plausible Examples for Image Datasets. Taken from Generative Adversarial Nets, 2014.

b. DCGAN 2015 - This was also the demonstration used in the important 2015 paper titled "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" by Alec Radford, et al., known as DCGAN, which showed how to train stable GANs at scale. They demonstrated models for generating new examples of bedrooms.

Example of GAN-Generated Photographs of Bedrooms. Taken from Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015.

Importantly, in this paper, they also demonstrated the ability to perform vector arithmetic on the inputs to the GAN (in the latent space), both with generated bedrooms and with generated faces.
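In code, that vector arithmetic is just addition and subtraction of latent vectors before decoding. A sketch under stated assumptions: the paper averages the z vectors of several samples per visual concept, and a real experiment would load a trained DCGAN generator instead of the stand-in below:

```python
import torch
import torch.nn as nn

# Stand-in generator so the sketch runs; substitute a trained DCGAN G.
G = nn.Sequential(nn.Linear(100, 64 * 64 * 3), nn.Tanh())

# Placeholder concept vectors; in practice each is the average z of
# several generated samples exhibiting that concept.
z_smiling_woman = torch.randn(3, 100).mean(0, keepdim=True)
z_neutral_woman = torch.randn(3, 100).mean(0, keepdim=True)
z_neutral_man = torch.randn(3, 100).mean(0, keepdim=True)

# "smiling woman" - "neutral woman" + "neutral man" = "smiling man"
z_smiling_man = z_smiling_woman - z_neutral_woman + z_neutral_man
image = G(z_smiling_man).view(3, 64, 64)  # decode the result into an image
```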

Example of Vector Arithmetic for GAN-Generated Faces. Taken from Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015.

c. Progressive GAN 2017 - Tero Karras, et al. in their 2017 paper titled "Progressive Growing of GANs for Improved Quality, Stability, and Variation" demonstrate the generation of plausible, realistic photographs of human faces. They are so real looking, in fact, that it is fair to call the results remarkable, and they received a lot of media attention. The face generators were trained on celebrity examples, meaning that there are elements of existing celebrities in the generated faces, making them seem familiar, but not quite.

Examples of Photorealistic GAN-Generated Faces. Taken from Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.
Example of Photorealistic GAN-Generated Objects and Scenes. Taken from Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

The paper presented a new training methodology for generative adversarial networks. The key idea is to grow both the generator and the discriminator progressively: starting from a low resolution, new layers that model increasingly fine details are added as training progresses. This both speeds up training and greatly stabilizes it, allowing images of unprecedented quality to be produced.

Progressive GAN also suggests a new metric for evaluating GAN results, in terms of both image quality and variation. As an additional contribution, the paper constructs a higher-quality version of the CelebA dataset.
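The fade-in trick at the heart of progressive growing can be sketched in a few lines: when a new, higher-resolution block is added, its output is blended with a naively upsampled output of the old, lower-resolution path, and the blend weight alpha ramps from 0 to 1 as the new block trains. A simplified sketch (the block and to-RGB modules are assumed to be defined elsewhere):

```python
import torch.nn.functional as F

def grow_step(x_low, new_block, to_rgb_old, to_rgb_new, alpha):
    """Blend old and new resolution paths while a new block fades in.

    alpha ramps linearly from 0 to 1 during training, so the network
    never experiences a sudden jump in architecture.
    """
    # Old path: render the low-res features, then naively upsample 2x.
    old = F.interpolate(to_rgb_old(x_low), scale_factor=2, mode="nearest")
    # New path: the freshly added higher-resolution block.
    new = to_rgb_new(new_block(x_low))
    return (1 - alpha) * old + alpha * new
```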

d. BigGAN 2018 - Andrew Brock, et al. in their 2018 paper titled "Large Scale GAN Training for High Fidelity Natural Image Synthesis" demonstrate the generation of synthetic photographs with their technique BigGAN that are practically indistinguishable from real photographs. BigGAN produced state-of-the-art results in image generation.

Example of Realistic Synthetic Photographs Generated with BigGAN. Taken from Large Scale GAN Training for High Fidelity Natural Image Synthesis, 2018.

2. Image to Image Translation

a. Pix2Pix 2016 - Phillip Isola, et al. in their 2016 paper titled "Image-to-Image Translation with Conditional Adversarial Networks" demonstrate GANs, specifically their pix2pix approach, on many image-to-image translation tasks.

Experimental results included translation tasks such as: semantic images to photographs of cityscapes and buildings; satellite photographs to Google Maps; day photos to night; black-and-white photographs to color; and sketches to color photographs.

Example of Photographs of Daytime Cityscapes to Nighttime With pix2pix. Taken from Image-to-Image Translation with Conditional Adversarial Networks, 2016.
Example of Sketches to Color Photographs With pix2pix. Taken from Image-to-Image Translation with Conditional Adversarial Networks, 2016.
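The pix2pix objective pairs a conditional adversarial loss, where the discriminator judges (input, output) pairs, with an L1 term that keeps the output close to the ground-truth translation. A minimal sketch of the generator's loss (the lambda = 100 weighting is the one used in the paper; the discriminator D is assumed to output logits):

```python
import torch
import torch.nn as nn

bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
lam = 100.0  # L1 weight from the pix2pix paper

def generator_loss(D, x, y_real, y_fake):
    """x: input image, y_real: ground-truth target, y_fake: G(x)."""
    # Conditional adversarial term: D sees the (input, output) pair.
    logits = D(torch.cat([x, y_fake], dim=1))
    adv = bce(logits, torch.ones_like(logits))
    # L1 term: stay close to the ground-truth translation.
    return adv + lam * l1(y_fake, y_real)
```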

b. CycleGAN 2017 - Jun-Yan Zhu, et al. in their 2017 paper titled "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks" introduce their famous CycleGAN and a suite of very impressive image-to-image translation examples.

Experimental results demonstrated image translation cases such as: photograph to artistic painting style; horse to zebra; summer to winter photographs; satellite photograph to Google Maps view; painting to photograph; sketch to photograph; and apples to oranges.

Example of Four Image-to-Image Translations Performed With CycleGAN. Taken from Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.
Example of Translation from Paintings to Photographs With CycleGAN. Taken from Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.
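What lets CycleGAN learn without paired examples is the cycle-consistency loss: translating an image to the other domain and back should return the original image. A minimal sketch with two generators, G: X to Y and F: Y to X (lambda = 10 is the weighting from the paper):

```python
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G, F, x, y, lam=10.0):
    """G maps domain X to Y (e.g. horse -> zebra); F maps Y back to X."""
    forward_cycle = l1(F(G(x)), x)   # x -> G(x) -> F(G(x)) should recover x
    backward_cycle = l1(G(F(y)), y)  # y -> F(y) -> G(F(y)) should recover y
    return lam * (forward_cycle + backward_cycle)
```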

3. Text to Image Translation

a. StackGAN 2016 - Han Zhang, et al. in their 2016 paper titled "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks" demonstrate the use of GANs, specifically their StackGAN, to generate realistic-looking photographs from textual descriptions of simple objects like birds and flowers.

Example of Textual Descriptions and GAN-Generated Photographs of Birds. Taken from StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks, 2016.
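The common recipe behind text-to-image GANs is to condition the generator on an embedding of the caption, concatenated with the noise vector before decoding. A simplified sketch with illustrative sizes and a stand-in text encoder (StackGAN additionally compresses the embedding with its conditioning-augmentation step, omitted here):

```python
import torch
import torch.nn as nn

latent_dim, text_dim = 100, 128  # illustrative sizes

# Stand-in text encoder; real systems use a pre-trained sentence encoder.
text_encoder = nn.Sequential(nn.Linear(300, text_dim), nn.ReLU())
G = nn.Sequential(nn.Linear(latent_dim + text_dim, 256), nn.ReLU(),
                  nn.Linear(256, 64 * 64 * 3), nn.Tanh())

def generate(caption_vec):
    z = torch.randn(1, latent_dim)       # random noise, as in a plain GAN
    t = text_encoder(caption_vec)        # embed the caption
    return G(torch.cat([z, t], dim=1))   # condition G on text + noise
```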

b. Generative Adversarial Text to Image Synthesis 2016 - Scott Reed, et al. in their 2016 paper titled "Generative Adversarial Text to Image Synthesis" also provide an early example of text-to-image generation of small objects and scenes, including birds, flowers, and more.

Example of Textual Descriptions and GAN-Generated Photographs of Birds and Flowers. Taken from Generative Adversarial Text to Image Synthesis, 2016.

c. Learning What and Where to Draw 2016 - Scott Reed, et al. in their 2016 paper titled "Learning What and Where to Draw" expand upon this capability, using GANs to generate images from text while taking bounding boxes and key points as hints about where to draw a described object, like a bird.

Example of Photos of Objects Generated From Text and Position Hints With a GAN. Taken from Learning What and Where to Draw, 2016.

d. TAC-GAN 2017 - Ayushman Dash, et al. provide more examples on seemingly the same dataset in their 2017 paper titled "TAC-GAN — Text Conditioned Auxiliary Classifier Generative Adversarial Network".

Comparison between the results of TAC-GAN and StackGAN. The images were synthesized based on the caption at the top.

4. Semantic Image to Photo Translation

Ting-Chun Wang, et al. in their 2017 paper titled "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs" demonstrate the use of conditional GANs to generate photorealistic images given a semantic image or sketch as input.

Example of Semantic Image and GAN-Generated Cityscape Photograph. Taken from High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs, 2017.

Experimental results demonstrated semantic image translation on: a cityscape photograph, given a semantic image; a bedroom photograph, given a semantic image; a human face photograph, given a semantic image; and a human face photograph, given a sketch.

5. Face Frontal View Generation

Rui Huang, et al. in their 2017 paper titled “Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis” demonstrate the use of GANs for generating frontal-view photographs of human faces given photographs taken at an angle. The idea is that the generated front-on photos can then be used as input to a face verification or face identification system.

Example of GAN-based Face Frontal View Photo Generation. Taken from Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis, 2017.

6. Face Aging

a. Grigory Antipov, et al. in their 2017 paper titled “Face Aging With Conditional Generative Adversarial Networks” use GANs to generate photographs of faces with different apparent ages, from younger to older.

Example of Photographs of Faces Generated With a GAN With Different Apparent Ages. Taken from Face Aging With Conditional Generative Adversarial Networks, 2017.

b. Zhifei Zhang, et al. in their 2017 paper titled "Age Progression/Regression by Conditional Adversarial Autoencoder" use a GAN-based method for aging and de-aging photographs of faces.

Example of Using a GAN to Age Photographs of Faces. Taken from Age Progression/Regression by Conditional Adversarial Autoencoder, 2017.

7. Photo Blending

Huikai Wu, et al. in their 2017 paper titled “GP-GAN: Towards Realistic High-Resolution Image Blending” demonstrate the use of GANs in blending photographs, specifically elements from different photographs such as fields, mountains, and other large structures.

Example of GAN-based Photograph Blending. Taken from GP-GAN: Towards Realistic High-Resolution Image Blending, 2017.

8. Super Resolution

a. SRGAN 2016 - SRGAN drew a lot of attention and showed how powerful GANs could be. Christian Ledig, et al. in their 2016 paper titled "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network" demonstrate the use of GANs, specifically their SRGAN model, to generate output images with higher, sometimes much higher, pixel resolution.

Example of GAN-Generated Images With Super Resolution. Taken from Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, 2016.
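SRGAN's key ingredient is its perceptual loss: rather than relying on pixel-wise MSE alone, it compares VGG feature maps of the super-resolved image and the original high-resolution image, and adds an adversarial term. A sketch using torchvision's pre-trained VGG19; the layer slice and the 1e-3 adversarial weight follow the paper only approximately:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

# Frozen slice of pre-trained VGG19 used as a feature extractor.
vgg = vgg19(weights="IMAGENET1K_V1").features[:36].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

mse, bce = nn.MSELoss(), nn.BCEWithLogitsLoss()

def srgan_generator_loss(D, sr, hr, adv_weight=1e-3):
    """sr: super-resolved output of G, hr: ground-truth high-res image."""
    content = mse(vgg(sr), vgg(hr))   # content loss in VGG feature space
    logits = D(sr)                    # D is assumed to output logits
    adversarial = bce(logits, torch.ones_like(logits))
    return content + adv_weight * adversarial
```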

b. Huang Bin, et al. in their 2017 paper titled "High-Quality Face Image SR Using Conditional Generative Adversarial Networks" use GANs to create high-resolution versions of photographs of human faces.

Example of High-Resolution Generated Human Faces. Taken from High-Quality Face Image SR Using Conditional Generative Adversarial Networks, 2017.

c. Subeesh Vasu, et al. in their 2018 paper titled "Analyzing Perception-Distortion Tradeoff using Enhanced Perceptual Super-resolution Network" provide an example of GANs used to create high-resolution photographs, focusing on street scenes.

Example of High-Resolution GAN-Generated Photographs of Buildings. Taken from Analyzing Perception-Distortion Tradeoff using Enhanced Perceptual Super-resolution Network, 2018.

d. ProGanSR 2018 - Wang, et al. in their paper "A Fully Progressive Approach to Single-Image Super-Resolution" proposed ProSR, a model that is progressive both in architecture and in training: the network upsamples an image in intermediate steps, while the learning process is organized from easy to hard, as is done in curriculum learning. Its GAN variant, ProGanSR, follows the same progressive multi-scale design principle.

ProSR and ProGanSR submission video at CVPR.

9. Photo Inpainting

a. Deepak Pathak, et al. in their 2016 paper titled "Context Encoders: Feature Learning by Inpainting" describe the use of GANs, specifically Context Encoders, to perform photograph inpainting or hole filling, that is, filling in an area of a photograph that was removed for some reason.

Example of GAN-Generated Photograph Inpainting Using Context Encoders. Taken from Context Encoders: Feature Learning by Inpainting, 2016.
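The Context Encoder trains on exactly this hole-filling setup: a region of the image is masked out, the network predicts the missing content, and the loss combines a masked reconstruction term with an adversarial term. A sketch (the 0.999/0.001 weighting between the two terms is the one reported in the paper):

```python
import torch
import torch.nn as nn

mse, bce = nn.MSELoss(), nn.BCEWithLogitsLoss()

def context_encoder_loss(D, pred, target, mask, rec_weight=0.999):
    """mask is 1 inside the missing (dropped) region, 0 elsewhere."""
    # Reconstruction: compare only the hole region against the original.
    rec = mse(pred * mask, target * mask)
    # Adversarial: D judges whether the filled-in result looks real.
    logits = D(pred)
    adv = bce(logits, torch.ones_like(logits))
    return rec_weight * rec + (1 - rec_weight) * adv
```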

b. Raymond A. Yeh, et al. in their 2016 paper titled “Semantic Image Inpainting with Deep Generative Models” use GANs to fill in and repair intentionally damaged photographs of human faces.

Example of GAN-based Inpainting of Photographs of Human Faces. Taken from Semantic Image Inpainting with Deep Generative Models, 2016.

c. Yijun Li, et al. in their 2017 paper titled “Generative Face Completion” also use GANs for inpainting and reconstructing damaged photographs of human faces.

Example of GAN-Reconstructed Photographs of Faces. Taken from Generative Face Completion, 2017.

10. Video Prediction

a. Carl Vondrick, et al. in their 2016 paper titled "Generating Videos with Scene Dynamics" describe the use of GANs for video prediction, specifically predicting up to a second of future video frames with success, mainly for static elements of the scene.

Example of Video Frames Generated With a GAN. Taken from Generating Videos with Scene Dynamics, 2016.

b. DVD-GAN 2019 - Clark, et al. in their 2019 paper titled "Adversarial Video Generation on Complex Datasets" propose the Dual Video Discriminator GAN (DVD-GAN) for video synthesis and video prediction, generating up to 2 seconds of video frames with success. The output is 256 × 256 at 48 frames.

Example of Video Frames Generated With DVD-GAN. Taken from Adversarial Video Generation on Complex Datasets, 2019.

KEY POINTERS before we go deep into GANs!!!

GANs are definitely a millennial idea. The simplicity of the idea makes GANs very easy to construct. But though they are easy to design, they are very hard to train: it is entirely possible to end up generating outputs that are as good as noise. There is no convergence guarantee, and it is very hard to reproduce prior results.

Moreover, there are no standardized evaluation methods to analyze the output of GANs; the empirically observed similarity between the real data and the generated data needs better quantification. In the case of images, we have Inception Scores. GANs have come a long way from generating barely recognizable facial structures to very high-resolution images: BigGAN's Inception Score is 155–165, whereas real images score 230–235.
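For reference, the Inception Score feeds generated images through a pre-trained Inception classifier and exponentiates the average KL divergence between each image's class prediction p(y|x) and the marginal p(y); sharper and more diverse predictions give a higher score. A minimal numpy sketch, assuming the softmax outputs are already computed (the example inputs are random placeholders):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: (n_images, n_classes) softmax outputs of the Inception net."""
    marginal = probs.mean(axis=0)  # p(y) over the whole set of images
    # KL(p(y|x) || p(y)) for every image, then IS = exp(mean KL).
    kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Placeholder "predictions" for 1000 generated images over 10 classes.
rng = np.random.default_rng(0)
fake_probs = rng.dirichlet(np.ones(10) * 0.1, size=1000)
print(inception_score(fake_probs))
```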

There are hundreds of specialized GANs, but none that are robust and generalizable. GANs depend heavily on their input dataset for learning, which means they fail at any other task, even when the tasks are very similar, for example, generating Sanskrit versus Hindi text. This is an active area of research; we are far from making good, general, multi-objective GANs.

This image sums it all up. Credits: Deep Learning for Computer Vision, Chapter 8, Manning Publications.

We have essentially covered what GANs are and where they are used. That leaves us to ponder how they work. I intend to write about the underlying mathematics, Nash equilibrium, minimax optimization, and the changes GANs have seen in the past 5 years in another blog.

Related content: this is really interesting. Just watch…

References :

The content of this post is heavily inspired by and derived from:

https://developers.google.com/machine-learning/gan/generative

https://towardsdatascience.com/understanding-generative-adversarial-networks-gans-cd6e4651a29

https://towardsdatascience.com/gans-n-roses-c6652d513260

https://towardsdatascience.com/do-gans-really-model-the-true-data-distribution-or-are-they-just-cleverly-fooling-us-d08df69f25eb

https://www.groundai.com/project/mc-gan-multi-conditional-generative-adversarial-network-for-image-synthesis/1

https://towardsdatascience.com/an-end-to-end-introduction-to-gans-bf253f1fa52f

https://www.tensorflow.org/tutorials/generative/dcgan


Mustaffa Hussain, TheCyPhy
M.Sc. Computer Science from South Asian University. I write to understand. Portfolio: mustaffa-hussain.github.io/Portfolio