Generative Adversarial Network (GAN)

Dhruv Pamneja
8 min read · May 6, 2024


A relatively recent concept in the world of deep learning and artificial intelligence, and one that has proved prominent in approaching generative AI, is the generative adversarial network (GAN), whose seminal paper was published in 2014.

Here we have a deep neural architecture comprising two neural networks that compete against one another to yield a certain output. Hence the name contains the term “adversarial”, which implies conflict or opposition.

To get a rough idea, we can interpret a GAN as a pair of neural networks trained in an adversarial manner: one generates data mimicking a certain class or distribution, while the other verifies the generated samples.

So before we move into the architecture of GANs, let us look at the two types of models in machine learning, which are:

Discriminative Model

  • This type of model primarily discriminates, or segregates, data into categories, or performs a boolean check on whether the data adheres to certain constraints.
  • For example, a binary classification model that is supposed to state whether an image is of a dog or a cat.

Generative Model

  • A generative model is trained on data samples in such a way that it can generate new data points which approximately follow the distribution of the training data, as measured by some mathematical metric — i.e. points as close to the training data as possible, almost as if they were part of the original data set.
  • For example, a name generation model that is supposed to generate a name in a certain language based on training data consisting of language phonetics and name suffixes/prefixes.

With the above being said, let us now look at the architecture of a generative adversarial network:

(Image: architecture of a generative adversarial network — source)

So as we can see in the image above, we have two components working in sync to create the GAN network.

Firstly, we draw randomly generated samples from the latent space; these will be turned into decoy samples to contrast with the real ones. The aim is for the samples built from these points to end up as close to real samples as possible, but not exactly the same.

To put it simply, imagine the latent space as a big box filled with random numbers. The generator selects a handful of these numbers and uses them as a starting point to generate a new sample. These starting points are like the “ingredients” from which the generator builds something new — equivalent to real data, but not an exact copy, so to say. Often, the latent space is high dimensional, i.e. it gives the network many numbers to select from and narrow down to create a sample.

To these points we add noise, or variance, and send them to the generator. The generator in turn produces what we call fake samples, due to the fact that it has not yet seen the real data; it can only generate samples from points in the latent space. Basically, the generator learns to map these random points in the latent space to realistic data points.
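As an illustration, here is a minimal sketch of such a generator in PyTorch — the layer sizes and the flattened 28×28 image shape are hypothetical choices for this example, not specifics from the architecture above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

latent_dim = 100  # size of the latent (noise) vector; a common but arbitrary choice

# A minimal generator: maps a random latent vector to a flattened 28x28 image.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256),
    nn.ReLU(),
    nn.Linear(256, 28 * 28),
    nn.Tanh(),  # squashes outputs to [-1, 1], matching normalised image pixels
)

z = torch.randn(16, latent_dim)   # 16 random points sampled from the latent space
fake_images = generator(z)        # 16 generated "fake" samples
print(fake_images.shape)          # torch.Size([16, 784])
```

The generator never touches real data directly — it only ever transforms latent points, which is exactly the mapping described above.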

Following this, we send the generated samples together with the real data points to the discriminator model, which tells whether a sample is from the real sample space or is a fake one, i.e. it gives a decision.

This decision is compared and evaluated with a certain metric, which leads to the calculation of the loss function. Since these are primarily neural networks, we then back-propagate and adjust the weights and bias parameters of both networks to minimise the loss function. This entire process defines the flow of GANs.
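This adversarial loop is what the original 2014 paper formalises as a two-player minimax game:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] +
  \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator D maximises this value (assigning high probability to real samples and low probability to fakes), while the generator G minimises it by making D(G(z)) as large as it can.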

Back Propagation

Now, let us take as an example a GAN which is to decide whether an image is of a digit or not, and see how we train the discriminator, as shown in the following example:

Before sending the data points of the generated and real samples, we label them as 0 and 1 respectively. After this, the discriminator works as a classic ANN to segregate the two classes and produce ŷ (the predicted label).

Following this, the weights and biases are updated just as they would be in a simple binary classification neural network, which means the back propagation of the discriminator is fairly simple and straightforward.

A point to remember here is that whenever we train and update the discriminator, the generator is assumed to be fixed, i.e. we hold its weights and biases constant. Here, we are only trying to strengthen the discriminator’s ability to classify real and fake data points.
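A sketch of this step in PyTorch (the network shapes and hyperparameters are illustrative assumptions, not details from the article): the key line is `.detach()`, which is one common way to stop gradients from reaching the generator while the discriminator learns.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A minimal discriminator: flattened 28x28 image -> probability of being real.
discriminator = nn.Sequential(
    nn.Linear(28 * 28, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),
)

criterion = nn.BCELoss()
d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

real_images = torch.rand(16, 28 * 28)    # stand-in for a batch of real data
fake_images = torch.randn(16, 28 * 28)   # stand-in for the generator's output

# Label real samples 1 and generated samples 0, as described above.
real_labels = torch.ones(16, 1)
fake_labels = torch.zeros(16, 1)

d_optimizer.zero_grad()
# .detach() blocks gradients from flowing into the generator,
# i.e. the generator is held fixed during this step.
loss_real = criterion(discriminator(real_images), real_labels)
loss_fake = criterion(discriminator(fake_images.detach()), fake_labels)
d_loss = loss_real + loss_fake
d_loss.backward()
d_optimizer.step()
```

Only the discriminator’s parameters are registered with the optimiser, so even the gradients that are computed can only ever update the discriminator.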

Next, let us look at how we can train the generator, as shown in the example below:

Here we have a very interesting way to train the generator. Just like above, we keep the discriminator fixed, i.e. its weights and biases are constant. The main reason is that in this part, we do not want our discriminator to become too strong and accurately flag every image coming from the generator.

We generate samples from the noise vector and label them as 1, implying that they are real. The whole idea is to fool the discriminator into believing these are real samples. We then pass them through the discriminator, obtain the final loss, and propagate it back to the generator, updating its weights and biases in a way that strengthens the generator’s ability to bypass the discriminator’s check.

Towards the end of this process, the discriminator outputs a value of 0.5, reflecting its inability to distinguish a real image from a fake one.

Limitations of GAN

There are some limitations to this architecture, which are important for us to know to get a holistic picture of this framework.

The first problem which arises is that of vanishing gradients while training the generator model, a problem inherent to training deep neural networks. When we first train the discriminator, it becomes strong enough to detect a real image from a fake one, which works as intended. But precisely because it can now distinguish real from fake images so confidently, the discriminator sends only small loss signals back to the generator.

However, after this, when we train the generator model, the derivatives of its weights and biases in back propagation start to converge to 0. This simply means the generator becomes a lousy generator over time, as it cannot generate samples close to the real ones. With the discriminator strong enough to detect and classify the samples accurately, we are unable to strengthen the generator’s abilities.

In essence, the vanishing gradient problem prevents the generator from effectively learning from the discriminator’s feedback, as the feedback from the discriminator is indirect. This leads to a situation where it cannot adapt and generate realistic data that can fool the discriminator. This creates a training bottleneck, limiting the overall effectiveness of the GAN.
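One way to see where the small signals come from, following the analysis in the original paper: differentiating the generator’s loss term with respect to the generator’s parameters gives

```latex
\frac{\partial}{\partial \theta_G} \log\big(1 - D(G(z))\big)
  = -\,\frac{\nabla_{\theta_G}\, D(G(z))}{1 - D(G(z))}
```

When the discriminator confidently rejects fakes, D(G(z)) sits on the flat, saturated tail of its sigmoid output, so the gradient of D with respect to the generator’s parameters is nearly zero and almost no signal reaches the generator. This is why the original paper suggests having the generator maximise log D(G(z)) instead, which keeps gradients alive early in training.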

Think of the generator as a student trying to learn from a teacher i.e. the discriminator. If the teacher only gives vague feedback (small loss signal), the student struggles to understand what they need to improve (weaker updates).

Another problem is that of mode collapse. During training, the generator goes through epochs, or iterations, generating fake samples. During this, we may hit a situation where the generator collapses to a setting where it always produces the same output. This means that although the epochs progress, the samples generated in each would be the same, and eventually we may hit a stage where all or most images in an epoch are exactly the same.

In real-world data, the distribution the real samples come from will rarely be a simple Gaussian distribution, so to say. It can very well be data with multiple modes across its regions, i.e. a multi-modal dataset. Our generator can very well get stuck in a single mode in a localised region instead of learning the overall data distribution, causing this problem.

Now to understand why this occurs, let us think of our dataset as having multiple modes. As we know, the main incentive in training the generator is to fool the discriminator by producing realistic fake samples.

As the generator learns to adapt its weights to produce good fake samples from one particular mode which successfully fool the discriminator, it achieves its target. The reward it gets for fooling the discriminator in this mode is the same as it would get for any other mode, which gives the generator no incentive to move to other modes.

Over time, it only produces samples from that one mode, as it tries to achieve its target of fooling the discriminator with minimal effort. Basically, it overfits rather than covering the full data distribution.

One final problem we encounter is that of achieving a Nash equilibrium. As we have seen above, we train one of the models while keeping the other fixed. While this may be good for improving their individual capabilities, since the two models never train in contact with each other, their gradients may simply never converge together.

It becomes a never-ending, recursive process as far as their gradients are concerned. This can also be thought of as a symptom of a non-cooperative game strategy, which leads to this problem.

Conclusion

So to conclude, GANs have proved to be efficient in the field of generative AI. The two competing neural networks, the generator and the discriminator, continuously improve through a min-max game process.

While limitations like vanishing gradients and mode collapse exist, GANs hold immense potential in image generation, music composition, and various other fields. To read about this in more depth, you can check out the paper here.

Credits

I would like to take the opportunity to thank Ahlad Kumar for his series on generative adversarial networks on his YouTube channel, which has allowed me to learn and present the above article. You can check out his YouTube channel here. Thanks for reading!

