Understand Generative Adversarial Networks and Implement Them Easily Using TensorFlow.

Nitish Kumar Pilla · Published in The Startup · Feb 2, 2021 · 12 min read

This blog is a complete package for understanding Generative Adversarial Networks, including the math intuition behind them and an implementation using TensorFlow.

What are Generative Adversarial Networks?

A GAN is a deep neural network architecture comprised of two neural networks competing against each other; that is why the term adversarial is used.

GANs learn to mimic any distribution of data. That is, GANs can be taught to create new outputs, and those outputs are so convincing that the technology has the potential for both good and bad. Look at this website here. All the faces you see on it were generated by a type of GAN called StyleGAN2 (created by NVIDIA), and it is difficult to tell that they are not real.

all completely fake — created by ThisPersonDoesNotExist.com using StyleGAN2

GAN Architecture

As mentioned at the start, a GAN has two neural networks: the Discriminator and the Generator. Let's see how the whole architecture works and how both networks contribute to it.

The role of the discriminator is to discriminate between two different classes and label its input as fake (0) or not fake (1). Before looking at which classes it discriminates between and how it does so, let's see how the generator works.

Taken from https://worthpreading.tistory.com/64 and modified a bit

In simple terms, the Generator (or generative model) G generates new output that is similar to the training data. Technically speaking, the training data has some distribution D; when data drawn from a random distribution is given to the generator, it produces a distribution that is similar to the distribution of the training data according to some closeness metric. To understand this clearly, have a look at the image above.

As you can see, both the Discriminator and the Generator are present in the image. The real samples are sent to the discriminator, while the Generator receives the latent vector z, a sample drawn from a random distribution. The generator uses z to create output with distribution D¹, and after training D¹ will be nearly equal to the real samples' distribution D.

The discriminator receives both the real sample data and the output produced by the generator. Its target is to distinguish samples from D and D¹: if an input comes from the real distribution D it should output 1, and if it comes from the Generator it should output 0.

When the Discriminator classifies incorrectly, feedback is sent back through both networks: backpropagation adjusts the weights of the Discriminator and the Generator. Don't worry about the terms D(x), D(G(z)) and G(z); we will understand them when we explore the loss function. Now that you have the outline of the GAN architecture, let's get into the math.

The loss function

Taken from GAN published paper
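In case the image above does not render, this is the value function from the paper written out:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$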

The above equation is the loss function for a GAN. Don't panic, we will work through how it operates. We know that a loss function is used to optimize the parameter values of a neural network model; before dissecting this one, let's fix the basic conventions.

P_data(x) is the probability distribution of the training data and x is a sample from that data. When a sample x is passed to the Discriminator, the notation we use is D(x; Θ_d). After passing the sample through it, the Discriminator outputs a transformed value D(x), which represents the probability that x comes from the original dataset. This is the D(x) mentioned in the loss function, and its value lies between 0 and 1.

The randomly distributed data we create for feeding the Generator has distribution P_z(z), where z is a sample of that random noise. When we feed z to the generator, we write G(z; Θ_g). Note that G and D are differentiable functions. When the random noise is passed to the generator, it outputs a transformed variable G(z), whose distribution (call it P_g) becomes close to the distribution of the training data after many iterations. When we feed G(z) into the discriminator, we write D(G(z)). Now we know all the basic terms needed to understand the loss function. The min-max in the equation tells us that the Discriminator tries to maximize the value function while the Generator tries to minimize it. I know this is confusing; you may be wondering why the two networks pull the same function in opposite directions. Let's see why.

As you can see, the graphs above are log plots of the terms in the loss function. We know that the maximum value D(x) can reach is 1. D(x) = 1 means that data coming from P_data(x) is being classified correctly, so maximizing log(D(x)) pushes D(x) toward 1 and helps the discriminator classify the real data correctly.

Graph (b) in the image shows log(1 - D(G(z))). Its maximum occurs at D(G(z)) = 0, meaning that whatever comes from G(z) and is passed through the discriminator is classified as fake. So if the discriminator maximizes log(1 - D(G(z))), D(G(z)) is pushed toward zero, and the discriminator easily classifies the generated samples as fake.

But the generator wants the discriminator to classify its output as real, i.e., it wants D(G(z)) to be 1 rather than 0. To push D(G(z)) toward 1, the generator minimizes log(1 - D(G(z))). In this way, the Discriminator and the Generator fight with each other.

E_{x~P_data(x)} means the expectation of the bracketed term where x is sampled from P_data(x). E_{z~P_z(z)} means the expectation of the bracketed term where z is sampled from P_z(z).
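In practice these expectations are estimated with simple sample averages, which is exactly the Monte Carlo estimate that shows up in the training algorithm below:

$$\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] \approx \frac{1}{m} \sum_{i=1}^{m} \log D\left(x^{(i)}\right)$$

where x⁽¹⁾, …, x⁽ᵐ⁾ is a mini-batch of m samples drawn from the training data.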

Now you understand the loss function and how it works.

Training and Backpropagation in GAN

To train the Discriminator and the Generator, we work in alternating phases.

  1. At first, we will train the Discriminator for 1 or more epochs by keeping the Generator idle.
  2. Then we will keep the Discriminator idle and train the Generator for one or more epochs.
  3. In this way, we will repeat the above steps sequentially to train the GAN network.

The image above is taken from the GAN paper published by Goodfellow et al. in 2014. They mention that they used mini-batch stochastic gradient descent for training the GAN. I assume you know how mini-batch stochastic gradient descent works; if not, I recommend this Medium blog written by Aishwarya V Srinivasan, a data scientist at Amazon.

  1. First, they pick a number of training iterations (which can be changed according to need) and, in each iteration, sample m noise samples from the noise distribution P_z(z); these are the inputs that will become the fake data.
  2. Simultaneously, they sample m examples from the training data distribution P_data(x), which is the real data.
  3. Next, we update the discriminator by ascending its stochastic gradient (shown right below), because we want to maximize the objective with respect to the discriminator. Updating here means updating the weights and biases of the Discriminator network.
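For reference, the discriminator update from Algorithm 1 of the paper (the gradient we ascend) is:

$$\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[\log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)\right]$$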

In the equation above, ∇ denotes the gradient and Θ_d represents the parameters (weights and biases) of the Discriminator network. The (1/m)∑ from i=1 to m is a Monte Carlo estimate of the expectation operator: the expectations we saw in the loss function are replaced by sample averages here. We already know that log(D(x)) and log(1 - D(G(z))) are the first and second parts of the loss function. With that, the Discriminator is trained; now let's train the Generator.

  1. For the Generator, we take m noise samples from the noise distribution P_z(z).
  2. We update the generator by descending its stochastic gradient (shown right below) because, as we discussed before, minimizing log(1 - D(G(z))) pushes D(G(z)) toward 1.
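The corresponding generator update from the paper (the gradient we descend) is:

$$\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)$$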

In the equation above, Θ_g represents the parameters (weights and biases) of the Generator network, and we already know what (1/m)∑ from i=1 to m means. This is how a GAN works. There are, however, some drawbacks to GANs; let's see what they are.

Drawbacks of GAN

Some of the drawbacks of GAN are:

1. Vanishing gradient problem: the derivatives with respect to the generator's weights and biases tend toward zero at the start of training, because at that point the discriminator is already doing well while the generator produces poor output that the Discriminator can classify easily, so very little gradient flows back to the generator. To avoid this, we change the generator's objective to maximizing log(D(G(z))) (written out right after this list).

2. Mode collapse: during training, the Generator may start producing the same output over and over; this is called mode collapse.

3. Difficulty in achieving Nash equilibrium: as the minimization and maximization of the G and D objectives go on, the oscillations can grow larger and larger and the whole model becomes more unstable. Instead of settling into an equilibrium, the two networks keep destabilizing each other.

4. Difficulty in understanding the perspective of an image: GANs have trouble learning the perspective of an image and struggle to adapt to 3D structure.
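To make drawback 1 concrete: instead of training the generator to minimize log(1 - D(G(z))), which saturates when the discriminator is confident, we train it to maximize log(D(G(z))). This non-saturating objective is what the implementation below uses as its generator loss:

$$\max_G \; \mathbb{E}_{z \sim p_z(z)}[\log D(G(z))]$$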

Implementation of GAN model using TensorFlow

We are going to implement the whole GAN model using TensorFlow. Let's start with the code.

  • Importing necessary libraries
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import wget
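One note before we continue: all the code in this post uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session, tf.train.AdamOptimizer). If you only have TensorFlow 2.x installed, one possible workaround (my own suggestion, not part of the original setup) is to go through the v1 compatibility module:

# Workaround (assumption): run the TF 1.x style code from this post on a TF 2.x install
import tensorflow.compat.v1 as tf  # use the v1 compatibility API as "tf"
tf.disable_eager_execution()       # graph mode is needed for tf.placeholder / tf.Session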
  • Downloading the required files from the links below:

http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz

wget.download('http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz')
wget.download('http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz')
wget.download('http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz')
wget.download('http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz')
  • Copy all the downloaded files to the MNIST_fashion folder.
  • Now we import input_data for reading the data.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_fashion/')

We use the input_data module to read the data. If you fail to import input_data, use this link to download the file from my GitHub and copy it into the folder where your notebook lives, then use the code below.

import input_data
mnist = input_data.read_data_sets('MNIST_fashion/')
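If input_data still gives you trouble, an alternative (my own sketch, not part of the original code) is to load Fashion-MNIST through Keras and write a small batching helper yourself:

import numpy as np
from tensorflow.keras.datasets import fashion_mnist

# Load Fashion-MNIST directly from Keras (no manual download needed)
(train_images, _), _ = fashion_mnist.load_data()
# Flatten each 28x28 image to 784 values and scale pixels to [0, 1]
train_images = train_images.reshape(-1, 784).astype(np.float32) / 255.0

def next_batch(images, batch_size):
    # Hypothetical helper: return a random mini-batch of flattened images
    idx = np.random.randint(0, images.shape[0], size=batch_size)
    return images[idx]

With this in place, you would replace mnist.train.next_batch(batch_size) in the training loop below with next_batch(train_images, batch_size).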
  • Now we are going to set the training parameters (learning rate, batch size, epochs) and the network parameters (image dimensions, number of neurons in the Generator's hidden layer, number of neurons in the Discriminator's hidden layer, noise dimensions).
learning_rate = 0.0002
batch_size = 128
epochs = 100000
image_dim = 784
# we are using image dimensions as 784 because we are converting
# 28x28 input image into a 784x1 dimension
gen_hidd_dim = 256
disc_hidd_dim = 256
z_noise_dim = 100 # 100x1 dimension
  • Next, we create a Xavier initialization function. Xavier initialization helps the neural networks converge faster. To learn how it works, check out this link.
def xavier_init(shape):
    return tf.random.normal(shape=shape, stddev=1. / tf.sqrt(shape[0] / 2.))
  • Next, we create weights and bias dictionaries for connecting the input data (784 x 1 dimensions) to the discriminator, and likewise for the generator.
# Initialising weights and biases for the Discriminator and Generator networks (Xavier initialisation)
weights = {
    "disc_H": tf.Variable(xavier_init([image_dim, disc_hidd_dim])),
    "disc_final": tf.Variable(xavier_init([disc_hidd_dim, 1])),
    "gen_H": tf.Variable(xavier_init([z_noise_dim, gen_hidd_dim])),
    "gen_final": tf.Variable(xavier_init([gen_hidd_dim, image_dim]))
}
bias = {
    "disc_H": tf.Variable(xavier_init([disc_hidd_dim])),
    "disc_final": tf.Variable(xavier_init([1])),
    "gen_H": tf.Variable(xavier_init([gen_hidd_dim])),
    "gen_final": tf.Variable(xavier_init([image_dim]))
}

Here disc_H connects the input data to the discriminator's hidden layer and disc_final connects that hidden layer to the output node. Likewise, gen_H connects the input noise to the generator's hidden layer and gen_final connects the generator's hidden layer to its output layer.
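A small side note: the dictionaries above apply Xavier initialization to the biases too. A common alternative (my own tweak, not from the original code) is to simply start the biases at zero:

# Optional alternative (assumption): initialize all biases to zero instead of Xavier
bias = {
    "disc_H": tf.Variable(tf.zeros([disc_hidd_dim])),
    "disc_final": tf.Variable(tf.zeros([1])),
    "gen_H": tf.Variable(tf.zeros([gen_hidd_dim])),
    "gen_final": tf.Variable(tf.zeros([image_dim]))
}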

  • Now we create the Discriminator and Generator networks. As you can see below, each is a neural network with a single hidden layer.
def Discriminator(x):
    hidden_layer = tf.nn.relu(tf.add(tf.matmul(x, weights["disc_H"]), bias["disc_H"]))
    final_layer = tf.add(tf.matmul(hidden_layer, weights["disc_final"]), bias["disc_final"])
    disc_output = tf.nn.sigmoid(final_layer)
    return disc_output

def Generator(x):
    hidden_layer = tf.nn.relu(tf.add(tf.matmul(x, weights["gen_H"]), bias["gen_H"]))
    final_layer = tf.add(tf.matmul(hidden_layer, weights["gen_final"]), bias["gen_final"])
    gen_output = tf.nn.sigmoid(final_layer)
    return gen_output
  • Here we create placeholders for accepting external input: x_input holds the Fashion-MNIST images and z_input holds the noise that is fed to the Generator.
z_input = tf.placeholder(tf.float32, shape = [None, z_noise_dim], name = "input_noise")
x_input = tf.placeholder(tf.float32, shape = [None, image_dim], name = "real_noise")
  • Now we call the Generator function, which builds G(z) (we know what G(z) is). Next, we call the Discriminator function, which builds D(x), and we also pass output_Gen to the Discriminator, which builds D(G(z)).
with tf.name_scope("Generator") as scope:
output_Gen = Generator(z_input) #G(z)

# Building the Disc NW
with tf.name_scope("Discriminator") as scope:
real_output_disc = Discriminator(x_input) #implements D(x)
fake_output_disc = Discriminator(output_Gen) # implements D(G(x))
  • The next step is to create the loss functions. We add a minus sign because the optimizer minimizes, and minimizing the negative of an objective is the same as maximizing the objective.
with tf.name_scope("Discriminator_Loss") as scope:
Discriminator_Loss = -tf.reduce_mean(tf.log(real_output_disc+ 0.0001)+tf.log(1.- fake_output_disc+0.0001))
# LF= log(D(x))+log(1-D(G(z)));

with tf.name_scope("Genetator_Loss") as scope:
Generator_Loss = -tf.reduce_mean(tf.log(fake_output_disc+ 0.0001)) # due to max log(D(G(x)))
#LF= log(1-D(G(z))) -> -log(D(G(z)));
# T-board summary

Disc_loss_total = tf.summary.scalar("Disc_Total_loss", Discriminator_Loss)
Gen_loss_total = tf.summary.scalar("Gen_loss", Generator_Loss)
  • All the generator variables are kept in one list and all the discriminator variables in another. We optimize each network with the Adam optimizer, defined so that it minimizes the corresponding loss. While optimizing the Discriminator network we keep the Generator parameters fixed: only the variables passed in var_list are updated and everything else stays frozen. The Generator network is optimized in the same way, with the Discriminator variables kept fixed.
Generator_var = [weights["gen_H"], weights["gen_final"], bias["gen_H"], bias["gen_final"]]
Discriminator_var = [weights["disc_H"], weights["disc_final"], bias["disc_H"], bias["disc_final"]]

# Define the optimizers
with tf.name_scope("Optimizer_Discriminator") as scope:
    Discriminator_optimize = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(Discriminator_Loss, var_list = Discriminator_var)
with tf.name_scope("Optimizer_Generator") as scope:
    Generator_optimize = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(Generator_Loss, var_list = Generator_var)
  • Now we initialize the variables and start training the model.
# Initialize the variables
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
writer = tf.summary.FileWriter("./log", sess.graph)

for epoch in range(epochs):
    # Take the next mini-batch of real images
    x_batch, _ = mnist.train.next_batch(batch_size)
    # Generate noise
    z_noise = np.random.uniform(-1., 1., size = [batch_size, z_noise_dim])
    # Train the Discriminator
    _, Disc_loss_epoch = sess.run([Discriminator_optimize, Discriminator_Loss], feed_dict = {x_input: x_batch, z_input: z_noise})
    # Train the Generator
    _, Gen_loss_epoch = sess.run([Generator_optimize, Generator_Loss], feed_dict = {z_input: z_noise})
    # Running the Discriminator summary
    summary_Disc_loss = sess.run(Disc_loss_total, feed_dict = {x_input: x_batch, z_input: z_noise})
    # Adding the Discriminator summary
    writer.add_summary(summary_Disc_loss, epoch)
    # Running the Generator summary
    summary_Gen_loss = sess.run(Gen_loss_total, feed_dict = {z_input: z_noise})
    # Adding the Generator summary
    writer.add_summary(summary_Gen_loss, epoch)
    if epoch % 2000 == 0:
        print("Steps: {0}: Generator Loss: {1}, Discriminator Loss: {2}".format(epoch, Gen_loss_epoch, Disc_loss_epoch))
  • Now we generate images from noise using the generator network. I have added comments wherever needed.
# Testing
# Generate images from noise, using the generator network
import matplotlib.pyplot as plt

n = 6
canvas = np.empty((28 * n, 28 * n))
for i in range(n):
    # Noise input
    z_noise = np.random.uniform(-1., 1., size = [batch_size, z_noise_dim])
    # Generate images from the noise
    g = sess.run(output_Gen, feed_dict = {z_input: z_noise})
    # Reverse colors for better display
    g = -1 * (g - 1)
    for j in range(n):
        # Draw the generated images on the canvas
        canvas[i * 28:(i + 1) * 28, j * 28:(j + 1) * 28] = g[j].reshape([28, 28])

plt.figure(figsize = (n, n))
plt.imshow(canvas, origin = "upper", cmap = "gray")
plt.show()
Generated images by our code

As you can see above, the Generator has produced output that looks similar to the input data.

There are many types of GANs, like StyleGAN, StyleGAN2, Conditional GAN, CycleGAN and many more. Explore them, implement them and enjoy what GANs can do.

Below is the link containing all the code and a step-by-step implementation of the GAN; you can check it if anything in the code above is unclear.

If you have any suggestions or queries, do contact me at my mail: nitishkumar2902@gmail.com.
