GANs (generative adversarial networks)

Noha Nekamiche · Published in AIGuys · Sep 28, 2021

Generative Adversarial Networks (GANs for short) are an amazing recent innovation in Machine Learning. They have had huge success since they were introduced in 2014 by Ian J. Goodfellow and co-authors in the paper Generative Adversarial Nets.

Yann LeCun, Facebook AI director, describes GANs as:

Generative Adversarial Networks is the most interesting idea in machine learning in the last ten years — Yann LeCun (Facebook AI Director)

GANs are very powerful generative models: they create their own new training data instead of collecting it. With GANs you can create images of humans who don't exist, create your own Mona Lisa, transform your own image into different poses, and much more.

Pose Guided Person Image Generation

In this article I'm going to talk about:

  • What are Generative Models?
  • What are Variational Autoencoders (VAE)?
  • What are GANs?
  • GANs Model
  • Generative Adversarial Networks variants: DCGAN, Pix2pix, CycleGAN
  • Further Readings

Generative Models

In this section, we are going to talk about the most popular generative model architectures in Machine Learning, but first, we will introduce the Discriminative Models.

Discriminative models learn to distinguish between classes such as dogs vs. cats; this is what we call classifiers. They take a set of features X, such as having a nose, a long tongue, etc., and determine a category, for example whether the image is of a cat or a dog.

In other words, they try to model the probability of a class Y given a set of features X, P(Y|X).

Discriminative Models

On the other hand, generative models try to learn how to make a realistic representation of some class, for example a realistic image of a cat like the one below. They take a random input represented by the noise ξ (it could be 3.5, -4, or another random number).

So the point is that the noise represents a random set of values that goes into the generative model. The generative model sometimes also takes a class Y, such as cat or dog, as input, and from these inputs (ξ, Y) it generates a set of features X that looks like a realistic image of a cat.

Generative Models

You might ask why we use the noise ξ in this model, and why we can't just generate a cat or a dog directly without it.

If we don't use the noise, we will generate the same realistic image of a cat every time, which is not very interesting. Generative models try to capture the probability distribution of the features X, and with the addition of the noise ξ the model will generate realistic and diverse representations of the class Y.

There are many types of generative models and in this article, I will present the most popular ones. So I will talk about Variational autoencoders (VAE) and GANs.

Variational Autoencoders (VAE)

Let's start with Variational Autoencoders (VAE). They work with two models, the encoder and the decoder, and these are typically neural networks.

Firstly, they learn by feeding realistic images into the encoder; the encoder's job is to find a good way of representing each image in the latent space.

For example, the vector (1.2, -5.2, 0) can be represented in the latent space by a point, so the VAE takes this latent representation, or a point close to it, and sends it into the decoder. The decoder then tries to reconstruct the realistic image that was given to the encoder. In the beginning the decoder won't be able to build a good image of a cat, for example, but after training we actually lop off the encoder, take random points from the latent space, and the decoder will have learned to produce a realistic image of a cat from them. The image below explains this process.

What I have described so far is just the autoencoder part; the variational part actually adds some noise to the whole model and the training process. Instead of having the encoder encode the image into a single point in the latent space, it encodes the image into a whole distribution, samples a point from that distribution, and feeds it into the decoder, which then produces a realistic image.
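To make this concrete, here is a minimal VAE sketch in PyTorch. This is my own illustrative code, not from the article; the image size, latent dimension, and layer widths are assumptions. The encoder maps an image to the mean and log-variance of a Gaussian in the latent space, a point is sampled from that distribution, and the decoder reconstructs the image from it.

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, image_dim=784, latent_dim=3):
        super().__init__()
        # Encoder: image -> mean and log-variance of a Gaussian in latent space
        self.encoder = nn.Sequential(nn.Linear(image_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        # Decoder: latent point -> reconstructed image
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, image_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Sample a point from the encoded distribution (reparameterization trick)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

# After training, we "lop off" the encoder and feed random latent points to the decoder:
vae = VAE()
z = torch.randn(1, 3)         # a random point in the latent space
fake_image = vae.decoder(z)   # the decoder turns it into an image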

https://theaiacademy.blogspot.com/2020/05/understanding-conditional-variational.html

GANs

Just like Variational Autoencoders, GANs are composed of two different deep neural networks (a generator and a discriminator), but they work differently.

The generator generates images, just like the decoder in a VAE, and the discriminator is actually a discriminative model hidden inside the GAN.

In GANs the generator tries to learn how to produce fakes that look real, and the discriminator tries to figure out which ones are fake and which ones are real, so you can think of the generator as a painting forger and the discriminator as an art inspector.

The generator tries to forge fake images that look as much as possible like realistic images, and this makes the discriminator's job more difficult.

Now, to start this game, all you need is a collection of real images of what you want to generate, for example to make a generator that paints famous paintings.

Novel Generation of Flower Paintings

Here are some cool links that generate fake images :

Fake people: https://www.thispersondoesnotexist.com/

Fake horses: https://thishorsedoesnotexist.com/

Fake chemical molecules: https://thischemicaldoesnotexist.com/

GANs Model

Now after you understand the intuition behind GANs, let’s see how GANs architectures really work.

In a basic GAN, the generator takes random noise as input and produces fake examples X̂, while the discriminator takes these fake examples together with some real examples X as input and outputs the probability that each example is real; this output is called Ŷ.
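As a rough sketch of these two networks in PyTorch (my own illustrative code with made-up layer sizes, not the article's implementation), they could look like this:

import torch
import torch.nn as nn

# Generator: noise vector -> fake example X_hat
generator = nn.Sequential(
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),   # e.g. a 28x28 image, flattened
)

# Discriminator: example -> probability Y_hat that it is real
discriminator = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

noise = torch.randn(16, 64)     # a batch of random noise vectors
x_hat = generator(noise)        # fake examples
y_hat = discriminator(x_hat)    # probabilities of being real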

Just like any model in AI, GANs also need to be trained, and here you need to train both the generator and the discriminator.

The discriminator takes fake examples and real ones without knowing which are which, and tries to predict which are real and which are fake. Its predictions are then compared to the desired labels Y (0 for fake, 1 for real) using the BCE cost function, and that is used to update its parameters θd, where the subscript d indicates the discriminator. These updates are applied to only one neural network, not to the generator.

Using real data in the training process helps the discriminator predict the fake ones much better, and that is what makes the generator and the discriminator compete against each other.

The generator first produces a few fake examples X̂ using noise as input, and these are passed to the discriminator, so the generator never sees the real examples. The discriminator scores them as described before; the cost is computed from its predictions, the gradient is propagated backwards, and the parameters of the generator θg are updated. This time only the generator, this one neural network, gets updated.
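A minimal sketch of this alternating training loop in PyTorch, reusing the generator and discriminator from the sketch above (illustrative only; dataloader is a placeholder for a batch iterator over real, flattened images, and the learning rates are assumptions):

import torch
import torch.nn as nn

bce = nn.BCELoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

for real in dataloader:                     # real: batch of real examples X
    batch = real.size(0)
    real_labels = torch.ones(batch, 1)      # 1 for real
    fake_labels = torch.zeros(batch, 1)     # 0 for fake

    # Discriminator step: update theta_d only
    fake = generator(torch.randn(batch, 64)).detach()  # detach -> no gradient to the generator
    loss_d = bce(discriminator(real), real_labels) + bce(discriminator(fake), fake_labels)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator step: update theta_g only
    fake = generator(torch.randn(batch, 64))
    loss_g = bce(discriminator(fake), real_labels)     # the generator wants fakes labeled as real
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()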

Here’s an illustration that explains how GANs work

GANs Model

Now you may ask why we use the BCE cost function and not another cost function. That is because BCE is designed for classification tasks, and in our case the discriminator works like a classifier where the two categories are fake and real.

Binary Cross Entropy (BCE) Cost Function
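For reference, the standard BCE cost over a batch of m examples, with labels y (1 for real, 0 for fake) and discriminator predictions ŷ, is:

J(\theta_d) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log \hat{y}^{(i)} + \big(1 - y^{(i)}\big) \log\big(1 - \hat{y}^{(i)}\big) \Big]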

Generative Adversarial Networks variants: DCGAN, Pix2pix, CycleGAN

DCGAN

DCGAN is one of the most popular network designs for GANs. It uses convolutional layers, without max-pooling or fully connected layers, in the discriminator, and convolutional-transpose layers in the generator.

Training a DCGAN on MNIST

The discriminator is made up of strided convolution layers, batch normalization layers, and LeakyReLU activations, without max-pooling layers.

DCGAN Discriminator

And the generator is made of transpose-convolution layers, batch norm layers, and ReLU activations.

DCGAN Generator
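Here is a minimal sketch of these two networks in PyTorch for 64x64 RGB images (my own illustrative code following the common DCGAN layout; the specific channel counts and kernel sizes are assumptions, not the article's). The generator expects noise shaped (batch, 100, 1, 1).

import torch.nn as nn

# Generator: 100-d noise -> 64x64 RGB image, using transposed convolutions + BatchNorm + ReLU
dcgan_generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, 4, 1, 0, bias=False), nn.BatchNorm2d(512), nn.ReLU(True),
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),  nn.BatchNorm2d(64),  nn.ReLU(True),
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),    nn.Tanh(),
)

# Discriminator: 64x64 RGB image -> probability of being real,
# using strided convolutions + BatchNorm + LeakyReLU (no max-pooling)
dcgan_discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1, bias=False),    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, 4, 2, 1, bias=False),  nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(128, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(256, 512, 4, 2, 1, bias=False), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(512, 1, 4, 1, 0, bias=False),   nn.Sigmoid(),
)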

Pix2pix

As its name suggests, Pix2pix translates one pix (image) into another pix; in other words, it is a convolutional neural network for image-to-image translation tasks. Here's an example of the pix2pix model:

https://www.tensorflow.org/tutorials/generative/pix2pix

We can also see it as a type of conditional GAN (cGAN), where the generation of the output image is conditioned on an input (a source image, for example).

First, the generator produces images, then the discriminator looks at the input/target pair and the input/output pair and says how realistic they look.

We can represent it like this:

  • G: {x, z} → y (z → noise vector, x → input image, y → output image).
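To illustrate the conditioning, here is a rough sketch (my own code, not the paper's architecture) of a pix2pix-style conditional discriminator: it scores an (input, output) pair by concatenating the two images along the channel axis before the convolutions.

import torch
import torch.nn as nn

# Hypothetical conditional discriminator for 256x256 RGB image pairs
cond_discriminator = nn.Sequential(
    nn.Conv2d(3 + 3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, 2, 1),     nn.Sigmoid(),
)

x = torch.randn(1, 3, 256, 256)   # input (source) image
y = torch.randn(1, 3, 256, 256)   # target or generated image
score_map = cond_discriminator(torch.cat([x, y], dim=1))  # how realistic the pair looks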

In Pix2Pix, the generator not only tries to reduce the loss from the discriminator but also tries to make the fake distribution as close as possible to the real distribution, by using an L1 or L2 loss.

CGAN Loss Function

The loss function of the generator network is:

Generator’s Loss Function
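Written out as in the original Pix2Pix paper, the conditional GAN loss, the L1 loss, and the combined generator objective are:

\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_1\big]

G^* = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)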

Here's an illustration of how the Pix2pix model works:

Pix2Pix Model

CycleGAN

CycleGAN has the same goal as Pix2pix: it is used for image-to-image translation. Its objective is to learn how to transform images from a domain X into another domain Y, and vice versa. Here's an example of transforming a horse into a zebra.

CycleGAN Model

A CycleGAN is composed of two discriminator and two generator networks. The discriminators, Dy and Dx, are convolutional neural networks that classify an input image as real or fake, while the two generators learn the mappings:

  • G : X → Y and F: Y → X respectively

Dy helps G translate images from X into outputs indistinguishable from domain Y, and the same goes for Dx and F. For the adversarial losses, the discriminators can use a least-squares GAN (LSGAN) loss or the cross-entropy (BCE) loss. Besides the adversarial losses, we also use two cycle-consistency losses, a forward cycle-consistency loss and a backward cycle-consistency loss, to make sure that translating an image from one domain into the other and back again recovers the original image.

In other words, we can say that it measures how good a reconstructed image is when compared to the original image. Thus, the total generator loss will be the sum of the generator losses and the forward and backward cycle consistency losses.
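In the notation of the original CycleGAN paper, the cycle-consistency loss and the full objective are:

\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big]

\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \, \mathcal{L}_{cyc}(G, F)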

So the key thing with CycleGAN is that it doesn't need paired before-and-after images.

CycleGAN Model

Further Readings
