Generative Adversarial Networks
Generative Adversarial Networks (GANs) were first proposed by Ian Goodfellow et al. in a 2014 paper (https://arxiv.org/abs/1406.2661).
They revolve around a simple idea: two neural networks that compete with each other, in the hope that this competition pushes them to excel.
The idea initially generated a lot of excitement, but it took scientists several years to overcome the training difficulties, which I will discuss later in this blog.
For this example, I will be using the Fashion MNIST dataset, which is composed of images of clothing.
Generator
One half of the adversarial network, the generator takes in a random distribution (typically Gaussian) and outputs some data (usually an image). We can think of the inputs as latent representations of the image to be generated. In this way it offers the same functionality as the decoder in a variational autoencoder, and can be used in the same way to generate new images: just feed it noise and it will output an image.
Let's import our libraries and start building one.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

codings_size = 30  # size of the random noise vector fed to the generator

generator = tf.keras.models.Sequential([
    tf.keras.layers.Dense(100, activation='selu', input_shape=[codings_size]),
    tf.keras.layers.Dense(150, activation='selu'),
    tf.keras.layers.Dense(28 * 28, activation='sigmoid'),
    tf.keras.layers.Reshape([28, 28])  # reshape the flat output into a 28 x 28 image
])
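Even before training, we can sanity-check the model by feeding it a batch of noise; until training starts, the output will just be meaningless static:

noise = tf.random.normal(shape=[1, codings_size])
fake_image = generator(noise)  # shape (1, 28, 28), values in [0, 1] from the sigmoid
plt.imshow(fake_image[0], cmap='binary')
plt.axis('off')
plt.show()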
Discriminator
The discriminator takes either a fake image from the generator or a real image from the training set, and tries to guess whether the input is fake or real. It is a simple binary classifier.
discriminator = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),  # flatten each image into a 784-dimensional vector
    tf.keras.layers.Dense(150, activation='selu'),
    tf.keras.layers.Dense(100, activation='selu'),
    tf.keras.layers.Dense(1, activation='sigmoid')  # probability that the input image is real
])
Combining these neural networks into a GAN:
gan = tf.keras.models.Sequential([generator, discriminator])
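Before we can call train_on_batch later, the discriminator and the combined gan model need compiling (the generator is only ever trained through gan, so it needs no compile step of its own), and we need the real images batched up. A minimal way to wire this together, assuming Fashion MNIST pixel values scaled to [0, 1] to match the generator's sigmoid output:

discriminator.compile(loss='binary_crossentropy', optimizer='rmsprop')
discriminator.trainable = False  # frozen when training the combined gan model
gan.compile(loss='binary_crossentropy', optimizer='rmsprop')

# Load Fashion MNIST and scale pixel values to [0, 1]
(X_train, _), _ = tf.keras.datasets.fashion_mnist.load_data()
X_train = X_train.astype(np.float32) / 255

batch_size = 32
dataset = tf.data.Dataset.from_tensor_slices(X_train).shuffle(1000)
dataset = dataset.batch(batch_size, drop_remainder=True).prefetch(1)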
Training
During the training process, the generator and discriminator have opposite goals: the discriminator tries to tell fake images from real ones, while the generator attempts to produce images that look real enough to trick the discriminator.
Because there are two networks, each training iteration is divided into two phases:
In the first phase, the discriminator is trained. A batch of real images is sampled from the training set and combined with an equal number of fake images produced by the generator. The labels are 0 for fake images and 1 for real ones.
In the second phase, the generator is trained. It first produces a batch of fake images, and the discriminator again judges whether they are fake or real. This time, however, there are no real images in the batch, and all labels are set to 1 (real). This pushes the generator to produce images the discriminator will think are real.
The generator never actually sees any real images — it simply learns what the discriminator believes are real images. The better the discriminator gets at discerning real from fake images, the more information about the real images gets fed back to the generator — so significant progress can be made.
Since this training loop is unusual, we must write a custom function to train it.
def train_gan(gan, dataset, batch_size, codings_size, n_epochs=50):
    generator, discriminator = gan.layers
    for epoch in range(n_epochs):
        print("Epoch {}/{}".format(epoch + 1, n_epochs))
        for X_batch in dataset:
            # phase 1 - training the discriminator
            noise = tf.random.normal(shape=[batch_size, codings_size])
            generated_images = generator(noise)
            X_fake_and_real = tf.concat([generated_images, X_batch], axis=0)
            y1 = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)  # 0 = fake, 1 = real
            discriminator.trainable = True
            discriminator.train_on_batch(X_fake_and_real, y1)
            # phase 2 - training the generator
            noise = tf.random.normal(shape=[batch_size, codings_size])
            y2 = tf.constant([[1.]] * batch_size)  # all labelled "real" to fool the discriminator
            discriminator.trainable = False
            gan.train_on_batch(noise, y2)
        plot_multiple_images(generated_images, 8)  # visualise the last batch of fakes each epoch
        plt.show()
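Note that plot_multiple_images isn't defined in this post; a minimal sketch of such a helper, plus the call that kicks off training, might look like this:

def plot_multiple_images(images, n_cols=None):
    # minimal sketch of the plotting helper assumed above: draws the images on a grid
    n_cols = n_cols or len(images)
    n_rows = (len(images) - 1) // n_cols + 1
    plt.figure(figsize=(n_cols, n_rows))
    for i, image in enumerate(images):
        plt.subplot(n_rows, n_cols, i + 1)
        plt.imshow(image, cmap='binary')
        plt.axis('off')

train_gan(gan, dataset, batch_size, codings_size, n_epochs=50)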
Here we can see the two phases of training in code; at the end of each epoch, we plot a batch of generated images. When we visualise them, we can see they begin to look roughly like clothes!
Other fun examples include composing music: https://www.youtube.com/watch?v=UWxfnNXlVy8
Problems
During training, as the networks attempt to outsmart each other, a Nash equilibrium could be reached, where the generator makes perfectly realistic images and the discriminator's optimal strategy is to guess 50/50 whether an image is real. Unfortunately, reaching this equilibrium is not guaranteed. Mode collapse happens when the generator's output becomes less diverse: for example, it becomes very adept at generating convincing images of trousers. It can fool the discriminator for some time with just trousers, so it produces only those images until the discriminator improves. Once the discriminator is good enough at spotting fake trousers, the generator moves on to producing images of shoes, and forgets about trousers. It cycles through the classes like this, without ever really becoming good at any of them.
As well as this, because the networks are constantly pushing against each other, their parameters may oscillate and become unstable: training can begin well and then suddenly diverge for no apparent reason. Many attempts at rectifying this have been published, including new cost functions and stabilising methods that limit mode collapse, such as experience replay and mini-batch discrimination. Both of these reduce the chance of the discriminator overfitting to the generator's latest outputs.
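To illustrate the first of these, experience replay keeps a buffer of fakes generated in earlier iterations and trains the discriminator on a mix of fresh and replayed images, so it can't overfit to whatever the generator happens to be producing right now. A rough sketch (the replay_buffer name and the half-and-half mixing ratio are my own illustrative choices, not from any particular paper):

from collections import deque
import random

replay_buffer = deque(maxlen=10000)  # fakes from earlier training iterations

def fakes_with_replay(generator, batch_size, codings_size):
    # generate a fresh batch of fakes and stash them in the buffer
    noise = tf.random.normal(shape=[batch_size, codings_size])
    generated_images = generator(noise)
    replay_buffer.extend(generated_images.numpy())
    # mix fresh fakes with older fakes sampled from the buffer
    n_old = min(batch_size // 2, len(replay_buffer))
    old_fakes = tf.stack(random.sample(list(replay_buffer), n_old))
    return tf.concat([generated_images[:batch_size - n_old], old_fakes], axis=0)

The discriminator's phase 1 batch would then use fakes_with_replay(...) in place of generator(noise).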
Developments
The field is a fast-moving one, and one notable development is the Deep Convolutional Generative Adversarial Network (DCGAN). These were state of the art just a few years ago, and their training dynamics are still not completely understood.
Adding convolutional layers initially seemed to make an already unstable network even less stable, but Alec Radford et al. succeeded in 2015 (www.homl.info/dcgan). They came up with the following guidelines (a model sketch following them appears after the list):
- Replace any pooling layers with strided convolutions in the discriminator, and transposed convolutions in the generator.
- Use batch normalisation in both the generator and discriminator, except in the generator's output layer and the discriminator's input layer.
- Remove fully connected hidden layers for deeper architectures.
- Use ReLU activation in the generator for all layers, apart from the output layer, which should use tanh.
- Use leaky ReLU activation in the discriminator for all layers.
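Here is a sketch of a small DCGAN for Fashion MNIST following these guidelines; the exact layer sizes are illustrative choices adapted from Géron's book rather than anything the guidelines prescribe. The training images must be reshaped to 28 x 28 x 1 and rescaled to [-1, 1] to match the tanh output:

codings_size = 100

dcgan_generator = tf.keras.models.Sequential([
    tf.keras.layers.Dense(7 * 7 * 128, input_shape=[codings_size]),
    tf.keras.layers.Reshape([7, 7, 128]),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2DTranspose(64, kernel_size=5, strides=2, padding='same',
                                    activation='relu'),  # strided transposed conv: 7x7 -> 14x14
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2DTranspose(1, kernel_size=5, strides=2, padding='same',
                                    activation='tanh')   # 14x14 -> 28x28; no batch norm on the output layer
])

dcgan_discriminator = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, kernel_size=5, strides=2, padding='same',
                           input_shape=[28, 28, 1]),      # no batch norm on the input layer
    tf.keras.layers.LeakyReLU(0.2),
    tf.keras.layers.Conv2D(128, kernel_size=5, strides=2, padding='same'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.LeakyReLU(0.2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Rescale the training images to [-1, 1] and add a channel dimension for the conv layers
X_train_dcgan = X_train.reshape(-1, 28, 28, 1) * 2. - 1.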
Just two epochs of training, while computationally much more intensive, produced noticeably better results.
It's certainly an improvement on our initial model. Other cool examples of GANs in action include removing rain from photographs, filling in missing parts of damaged photographs, 3D object generation, and much more.
References
Aurélien Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd edition. O'Reilly Media.