A friendly introduction to DCGAN

Harsh Vadalia
5 min read · Nov 14, 2018


Creativity is intelligence having FUN

-- Albert Einstein

What is a GAN?

GAN stands for Generative Adversarial Network, a class of neural networks that belongs to unsupervised learning. Ian Goodfellow is often called the father of GANs; he introduced them in 2014, and you can check out his paper in the references below. GANs are all about creating something new (given some training data), be it images, music, video clips, etc. This is a big step toward machine intelligence, because the machine starts creating things of its own. Isn't that cool? We would be able to see the unseen. Nowadays GANs are used to paint pictures, render realistic images, generate video clips, build game environments, and much more.

So how does this GAN work?

Consider an airport scenario where a criminal carrying drugs is trying to get past a security guard. In this analogy, the criminal acts as the generator and the security guard as the discriminator (if you are not familiar with these two terms, we will discuss them in detail shortly). The criminal tries to slip past the guard as cheekily as possible, hiding the drugs so that they look like normal goods, while the security guard does his best to catch him.

This is the basic structure of a simple GAN: it is composed of a generator and a discriminator. The role of the generator is to spawn fake images that resemble the training images, while the discriminator's job is to decide whether a given image really came from the training data or is a fake. As the model trains, the generator tries to outsmart the discriminator by getting better and better at producing fakes that closely resemble the training images, and the discriminator becomes a more skilled and experienced security guard that tries its best to classify each image as real or fake. The two keep playing this game until a balance is reached, at which point the generator produces fakes that look as if they came straight from the training set and the discriminator is left with no choice but to assign a 50–50 probability of an image being real or fake.

https://wp-cdn-2.s3.amazonaws.com/wp-content/uploads/2017/09/generator_and_discriminator1.png
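
To make this back-and-forth concrete, here is a minimal sketch of a single training step in PyTorch, following the standard binary cross-entropy formulation used in the PyTorch DCGAN tutorial (reference 2). The modules netG and netD, the optimizers, and the latent size nz are placeholders for illustration, not code from this article:

```python
import torch
import torch.nn as nn

# Hypothetical generator (netG) and discriminator (netD) modules: netG maps a
# latent vector to an image, netD maps an image to a probability of being real.
criterion = nn.BCELoss()
REAL, FAKE = 1.0, 0.0

def train_step(netG, netD, optG, optD, real_images, nz=100):
    b = real_images.size(0)

    # 1) Update the discriminator: push D(x) toward 1 and D(G(z)) toward 0.
    optD.zero_grad()
    d_real = netD(real_images).view(-1)
    loss_real = criterion(d_real, torch.full((b,), REAL))

    z = torch.randn(b, nz, 1, 1)                   # random latent vectors
    fake_images = netG(z)
    d_fake = netD(fake_images.detach()).view(-1)   # detach: don't update G here
    loss_fake = criterion(d_fake, torch.full((b,), FAKE))
    (loss_real + loss_fake).backward()
    optD.step()

    # 2) Update the generator: push D(G(z)) toward 1, i.e. fool the discriminator.
    optG.zero_grad()
    d_fooled = netD(fake_images).view(-1)
    loss_gen = criterion(d_fooled, torch.full((b,), REAL))
    loss_gen.backward()
    optG.step()

    return loss_real.item() + loss_fake.item(), loss_gen.item()
```

Note that the fake images are detached while updating the discriminator, so that step only adjusts D; the generator is then updated separately by asking D to label its fakes as real.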

Here comes the math…

Let x be the data representing an image. This data is fed into the discriminator, D(x), which outputs the probability that the image is real (i.e., came from the training set). D(x) therefore works as a classifier: when its output is high, the image likely came from the training data; when it is low, the image is likely a generated one.

Further let us understand some notations:

  1. Pdata is the distribution of the original data set
  2. Pg is the generator's estimated distribution, i.e. the distribution of the images it produces

Now a latent vector z is sampled from a simple noise distribution, Pz (for example, a standard normal), and fed into the generator, giving G(z). The generator maps z to a fake image, and the distribution of these generated images is Pg. Passing the fake image to the discriminator gives D(G(z)).

D(G(z)) is the probability the discriminator assigns to the generated image being from the training data set. Naturally, in the beginning this output will be very low, because Pg is far away from Pdata and the discriminator rejects every generated image as fake. As the model trains, Pg gets closer and closer to Pdata, until an equilibrium point is reached where the two become equal:

Pg = Pdata

This state is very difficult to reach in practice, but when it is, the discriminator has no choice but to output a 50–50 probability of an image being real or fake.
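
One short way to see why it ends up at exactly 50–50: Goodfellow's paper shows that, for a fixed generator, the best possible discriminator is

D*(x) = Pdata(x) / (Pdata(x) + Pg(x))

so when Pg = Pdata, this value is exactly 1/2 for every image x.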

The discriminator D is trained to maximize the probability that it correctly classifies real and fake images, while G is trained to minimize the probability that D correctly identifies its images as fake. In this way the generator and discriminator play what the paper calls a 'minimax' game. The corresponding GAN loss (value) function, which we won't go into in detail here since this is just an introduction, is given in the paper as:
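
min_G max_D V(D, G) = E_{x~Pdata}[ log D(x) ] + E_{z~Pz}[ log(1 - D(G(z))) ]

where Pz is the noise distribution that the latent vector z is sampled from. The first term rewards D for assigning high probability to real images; the second rewards D for assigning low probability to generated ones, while G tries to push that same term in the opposite direction.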

Say hello to DCGAN!!

DCGAN stands for Deep Convolutional GAN, which is simply an extension of the basic GAN. This type of GAN specifically uses convolutional layers in the discriminator and transposed convolutional (often called 'de-convolutional') layers in the generator. If you are not familiar with convolutional neural networks, I suggest you go through that topic first; a link to an easy explanation is provided here (CNN).

In the generator, the input is a latent vector z. It passes through a stack of transposed convolutions with specified strides, each followed by an activation function (generally ReLU), and is gradually upsampled into a 3×64×64 image, where 3 is the number of color channels and 64×64 is the spatial resolution of the generated image.
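
Here is a rough sketch of what such a generator can look like in PyTorch, loosely following the DCGAN tutorial (reference 2). The latent size nz = 100 and the base feature-map width ngf = 64 are assumed defaults, not values fixed by this article:

```python
import torch.nn as nn

# A minimal DCGAN-style generator: latent vector (nz x 1 x 1) -> 3 x 64 x 64 image.
class Generator(nn.Module):
    def __init__(self, nz=100, ngf=64):
        super().__init__()
        self.main = nn.Sequential(
            # 1x1 -> 4x4
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # 4x4 -> 8x8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # 8x8 -> 16x16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # 16x16 -> 32x32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # 32x32 -> 64x64, 3 color channels, pixel values in [-1, 1]
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.main(z)
```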

The discriminator works like a classic CNN classifier: it applies a stack of strided convolutions to the input image (the DCGAN paper recommends strided convolutions instead of pooling layers) and finally outputs the probability that the image is real, i.e., from the training data. Since it is shown both real images from the training set and fakes from the generator during training, it learns to tell the two apart.
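
And here is a matching sketch of the discriminator, again loosely following reference 2, with ndf = 64 as an assumed base feature-map width:

```python
import torch.nn as nn

# A minimal DCGAN-style discriminator: 3 x 64 x 64 image -> probability of being real.
# Strided convolutions take the place of pooling, as the DCGAN paper recommends.
class Discriminator(nn.Module):
    def __init__(self, ndf=64):
        super().__init__()
        self.main = nn.Sequential(
            # 64x64 -> 32x32
            nn.Conv2d(3, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # 32x32 -> 16x16
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # 16x16 -> 8x8
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # 8x8 -> 4x4
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # 4x4 -> 1x1 score, squashed to a probability
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.main(x)
```

Putting the two sketches together on a batch of latent vectors:

```python
import torch

netG, netD = Generator(), Discriminator()
z = torch.randn(16, 100, 1, 1)   # a batch of 16 latent vectors (nz = 100)
fake = netG(z)                   # shape: (16, 3, 64, 64)
prob = netD(fake).view(-1)       # 16 probabilities in [0, 1]
```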

DCGAN is fun and exciting to experiment with. Personally, I am a Pokémon fan, so I loved the idea of creating Pokémon with a GAN. A glimpse of some Pokémon generated this way is shown in the image below.

https://i.ytimg.com/vi/rs3aI7bACGc/maxresdefault.jpg

References

  1. Goodfellow et al., Generative Adversarial Nets (2014): https://arxiv.org/pdf/1406.2661.pdf
  2. PyTorch DCGAN tutorial: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
  3. Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (2015): https://arxiv.org/pdf/1511.06434.pdf
