Generative Adversarial Networks (GANs): A gentle introduction for beginners

Imran us Salam
Published in Red Buffer · Apr 22, 2019

Generative Adversarial Networks:

Originally proposed by Ian Goodfellow in 2014.
Nowadays, if you follow Computer Vision or Machine Learning even a little, you will see some engineered GAN in almost every fourth post. The number of publications on this topic is extraordinary, and each new publication has something unique to offer. Very recently, NVIDIA published an engineered version of the GAN, which they refer to as GAN2.0. It can produce images of faces that are 1. high quality and 2. unique, so you end up with high-quality faces that have never been seen before.
Another such paper (submitted to ICLR 2019) is called GANSynth. It does not generate faces; instead, it generates music that has never been heard before.

These people don’t exist; the faces were created by GAN2.0.

To understand the intuition and the working behind GANs, we need to ask four questions in particular.
1. What are they?
2. Why do they work so well?
3. When are they used?
4. How do they work?

NOTE: I am going to be very brief in explaining the terms; the motive is to give an intuitive guide for beginners.

Generative Modeling vs Discriminative Modeling

If we are to define the learning in machine learning, at a very basic level we can divide the formulation into two distinct problems.
1. Generative Modeling
2. Discriminative Modeling

“Discriminative modeling”, as the name suggests, is the task of discriminating between a certain set of examples, or between two or more distributions. A simple example is classifying images as cats or dogs.

“Generative modeling”, as the name suggests, is the task of generating a certain set of examples (from some target label or distribution). Examples are:
1. Generating the input variables given a target variable (remember Bayes’ rule)
2. Generating a distribution from another distribution through some complex function f (remember this f); for example, GANs generate a distribution similar to that of dog images starting from a normal distribution

Generative Modeling

When the use case is generating one distribution from another, generative modeling can be further divided into two main approaches.
1. Generative Matching Networks
2. Generative Adversarial Networks

Generative Matching Networks

“Generative Matching Networks” (more fully, Generative Moment Matching Networks): remember the complex function f we used above in the example of generative modeling? This function f is usually a neural network that takes as input a random vector (drawn from some simple distribution) and produces samples whose distribution should be equal (or as close as possible) to the real distribution, for example the dog distribution.
The architecture is a neural network that takes a random vector as input and whose output layer gives another vector following some distribution.
The output distribution is then compared with the real distribution (say, that of dogs) and an error is computed through some loss function.
This error is then backpropagated through the network to tune its parameters.
The usual loss functions for comparing two distributions are the KL-divergence or the JS-divergence, but here we use the Maximum Mean Discrepancy (MMD). The only thing you need to know is that these loss functions compare two distributions (in our case, the real dog distribution and the generated output of the neural network). If you want to learn more about the Maximum Mean Discrepancy, I would suggest looking up the lectures of Ali Ghodsi on GANs.
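To make “comparing two distributions” a little more concrete, here is a minimal sketch (my own illustration, not from any particular paper) of an MMD loss with a Gaussian kernel in TensorFlow. It assumes `real` and `generated` are batches of samples flattened to shape (batch, features):

import tensorflow as tf

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise squared distances between rows of x and rows of y
    x_sq = tf.reduce_sum(tf.square(x), axis=1, keepdims=True)  # (n, 1)
    y_sq = tf.reduce_sum(tf.square(y), axis=1, keepdims=True)  # (m, 1)
    sq_dists = x_sq - 2.0 * tf.matmul(x, y, transpose_b=True) + tf.transpose(y_sq)
    return tf.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd_loss(real, generated, sigma=1.0):
    # Estimate of MMD^2 = E[k(x, x')] - 2 E[k(x, y)] + E[k(y, y')]
    k_rr = tf.reduce_mean(gaussian_kernel(real, real, sigma))
    k_rg = tf.reduce_mean(gaussian_kernel(real, generated, sigma))
    k_gg = tf.reduce_mean(gaussian_kernel(generated, generated, sigma))
    return k_rr - 2.0 * k_rg + k_gg

In a matching network, this value is minimized directly with respect to the generator’s parameters.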

The general architecture of a Generative Matching Network

Generative Adversarial Networks

“Generative Adversarial Networks”, in contrast to Generative Matching Networks, consist of two models.
1. Generator Network
2. Discriminator Network

“Generator Network”: again a complex function G (a neural network) that takes a random vector drawn from some distribution and tries to generate another vector that follows some particular distribution (for example, the dog distribution). So the input is a random vector that goes through the generator network to produce an output, the result of applying G.
In the previous setting we had a loss function attached at the end; here we don’t.
Instead, the output is passed to another network, called the Discriminator Network.

“Discriminator Network”: takes either the output of the generator network or a sample from the real distribution. It is modeled by a function D, which is again a neural network.
The output of this network is one of two classes: 1. Real, 2. Generated.
The goal of the discriminator network is to classify whether the sample it received came from the generated distribution or the real distribution.

A general Architecture for the GAN
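As a rough sketch (the layer sizes here are illustrative only, not the ones used in the full example later), the two players can be as simple as:

import tensorflow as tf
from tensorflow.keras import layers

noise_dim = 100

# G: random vector -> sample that should look like it came from the real distribution
G = tf.keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(noise_dim,)),
    layers.Dense(784, activation='tanh'),  # e.g. a flattened 28x28 image
])

# D: sample (real or generated) -> a single logit scoring "real" vs "generated"
D = tf.keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(784,)),
    layers.Dense(1),
])

z = tf.random.normal([16, noise_dim])
fake = G(z)        # generated samples
logits = D(fake)   # discriminator's real/generated scores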

The word “adversarial” in GAN is explained by this architecture.
The discriminator tries to minimize its error, that is, it wants to correctly classify real vs. generated samples. The generator tries to maximize this error, since it wants to generate distributions that are equal or as close as possible to the real one.
Wait, what???

This is the reason it’s called adversarial training: the opposite goals of the two networks work against each other. In a perfect setting, the generator would produce samples that match the real ones so well that the discriminator could no longer tell them apart (it would output “real” with probability 1/2 for both), and training would reach an equilibrium.

The loss term for the discriminator
The discriminator tries to minimize this loss term, while the generator tries to maximize it.
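This is the same game as the minimax objective from the original GAN paper, written in terms of a value V(D, G) that the discriminator maximizes and the generator minimizes:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]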

So far, we’ve answered what GANs are and how they work.
But why do they work so well?

The problem with other generative or discriminative networks is that they tend not to generalize well: they produce good outputs only for data very similar to what they were trained on. GANs mitigate this by pitting a generator against a discriminator, which pushes the generator toward covering the real data distribution rather than memorizing examples.

Since GANs are unsupervised (they need no labels), they can also be used to generate additional training data.

Even now it is very difficult to train GANs, for reasons such as mode collapse and diminishing gradients. I won’t go into the details of these problems; there is a very good read on them in this link.

A horse turned into a zebra with CycleGAN, an engineered version of GANs

Let’s code a vanilla GAN in TF 2.0

# https://www.tensorflow.org/alpha/tutorials/generative/dcgan
# The piece of code is open source by tensorflow in the link above.
import tensorflow as tf
import glob
import imageio
import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
from tensorflow.keras import layers
import time

from IPython import display
(train_images, train_labels), (_, _) = tf.keras.datasets.mnist.load_data()
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
train_images = (train_images - 127.5) / 127.5 # Normalize the images to [-1, 1]
BUFFER_SIZE = 60000
BATCH_SIZE = 256
# Batch and shuffle the data
train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
def make_generator_model():
    model = tf.keras.Sequential()
    model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Reshape((7, 7, 256)))
    assert model.output_shape == (None, 7, 7, 256)  # Note: None is the batch size

    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 7, 7, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 14, 14, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 28, 28, 1)

    return model
generator = make_generator_model()
def make_discriminator_model():
    model = tf.keras.Sequential()
    model.add(layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same',
                            input_shape=[28, 28, 1]))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    model.add(layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    model.add(layers.Flatten())
    model.add(layers.Dense(1))

    return model
discriminator = make_discriminator_model()

# This method returns a helper function to compute cross entropy loss
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss

def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)
generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(generator_optimizer=generator_optimizer,
                                 discriminator_optimizer=discriminator_optimizer,
                                 generator=generator,
                                 discriminator=discriminator)
EPOCHS = 50
noise_dim = 100
num_examples_to_generate = 16

# We will reuse this seed over time (so it's easier
# to visualize progress in the animated GIF)
seed = tf.random.normal([num_examples_to_generate, noise_dim])
# Notice the use of `tf.function`
# This annotation causes the function to be "compiled".
@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
def train(dataset, epochs):
    for epoch in range(epochs):
        start = time.time()

        for image_batch in dataset:
            train_step(image_batch)

        # Produce images for the GIF as we go
        display.clear_output(wait=True)
        generate_and_save_images(generator,
                                 epoch + 1,
                                 seed)

        # Save the model every 15 epochs
        if (epoch + 1) % 15 == 0:
            checkpoint.save(file_prefix=checkpoint_prefix)

        print('Time for epoch {} is {} sec'.format(epoch + 1, time.time() - start))

    # Generate after the final epoch
    display.clear_output(wait=True)
    generate_and_save_images(generator,
                             epochs,
                             seed)
def generate_and_save_images(model, epoch, test_input):
    # Notice `training` is set to False.
    # This is so all layers run in inference mode (batchnorm).
    predictions = model(test_input, training=False)

    fig = plt.figure(figsize=(4, 4))

    for i in range(predictions.shape[0]):
        plt.subplot(4, 4, i + 1)
        plt.imshow(predictions[i, :, :, 0] * 127.5 + 127.5, cmap='gray')
        plt.axis('off')

    plt.savefig('image_at_epoch_{:04d}.png'.format(epoch))
    plt.show()
train(train_dataset, EPOCHS)

GANs are an interesting topic and they have so much to offer. Even Yann LeCun has called adversarial training the most interesting idea in machine learning in the last ten years.

Thanks for reading.
