Creating New Humans With Generative Adversarial Networks and Deep Learning

William Chen
Published in Geek Culture
8 min read · Feb 4, 2021

Do you know anyone here?

Photo from NVlabs

Have you ever played this video game?

Pacman from Nv-tlabs

Maybe you said yes to both, or maybe only to the second one. After all, it is Pacman, the game we’ve all played at least once in our lives.

Right?

Well, in reality, neither the people nor the game is real. They were both generated using artificial intelligence, and more specifically a generative adversarial network.

What actually is a GAN, and how does it work?

A GAN, or generative adversarial network, is a deep learning framework that excels at generating fake data points that are hard to distinguish from real ones. This is what lets GANs generate lifelike faces and even recreate video games.

How do they work?

As the name implies, generative adversarial networks consist of two competing networks that work against each other. The conflict is harnessed to create the desired output.

The first neural network of a GAN is named the generator. Given a random input, the generator tries its best to generate a plausible output.

The second network is the discriminator. The discriminator’s job is to look at the generator’s output and determine whether it is real or fake.

The two networks are trained together and play a zero-sum game. The generator generates samples. These samples along with examples from the dataset are given to the discriminator. The discriminator classifies each image as real or fake. This classification is used to improve both networks. The discriminator is tweaked to get better at classifying images and identifying the ones produced by the generator. The generator is updated to create better images to fool the discriminator.

As they train and improve, the generator gets better at generating data until eventually, the generated results are indistinguishable from real ones.

The loss of a GAN during training. The networks compete, as the losses show: when one network’s loss decreases, the other’s tends to increase, and vice versa | image from StackExchange

The generator can be thought of as a counterfeiter trying to produce fake currency, while the discriminator is the police trying to detect the fake money. The discriminator compares the fake money with real money and gets better at detecting the counterfeits. Consequently, the generator has to work harder and make more convincing counterfeits. The discriminator and the generator compete at beating each other and improve their methods until the generator is able to produce counterfeits that are almost indistinguishable from genuine currency.

The structure of a GAN

The Nitty-Gritty

To improve, each network tries to maximize its own objective. We train the discriminator to assign the correct label to both the real samples and the generated images. D(x) represents the probability that x came from the real dataset and not the generator, so D(x) = 1 would mean the discriminator is 100% confident that the sample is a real image. To maximize the probability of assigning the correct label to both generated and real samples, the discriminator tries to maximize

log(D(x)) + log(1 - D(G(z)))

In other words, the discriminator maximizes its accuracy at classifying images. To do this, D(x) for a real image should be close to 1, and D(G(z)) (the discriminator’s prediction on an image produced by the generator) should be close to 0.

Simultaneously, the generator is trained to maximize log(D(G(z))), where z is the input noise. D(G(z)) is the discriminator’s prediction on the generator’s output. The higher the value, the more the discriminator thinks that the generator’s output is real, and thus the better the generator is performing.
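To make the two objectives concrete, here is a small sketch that evaluates them for a single real sample and a single generated sample. It uses the standard per-sample form, log(D(x)) + log(1 - D(G(z))) for the discriminator and log(D(G(z))) for the generator. The function names and the example probabilities are made up for illustration.

```python
import math


def discriminator_objective(d_real: float, d_fake: float) -> float:
    """log(D(x)) + log(1 - D(G(z))): the value the discriminator maximizes."""
    return math.log(d_real) + math.log(1.0 - d_fake)


def generator_objective(d_fake: float) -> float:
    """log(D(G(z))): the value the generator maximizes."""
    return math.log(d_fake)


# Hypothetical discriminator outputs, chosen only for illustration.
confident = discriminator_objective(d_real=0.9, d_fake=0.1)  # mostly correct
guessing = discriminator_objective(d_real=0.5, d_fake=0.5)   # coin flip

print(f"discriminator, confident: {confident:.3f}")
print(f"discriminator, guessing:  {guessing:.3f}")
print(f"generator, D fooled (0.9): {generator_objective(0.9):.3f}")
print(f"generator, D not fooled (0.1): {generator_objective(0.1):.3f}")
```

Both objectives are logs of probabilities, so they are never positive. The closer a value is to 0, the better that network is doing.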

How are GANs so good at generating images?

Convolution layers

Structure of a CNN | photo from Wikipedia

Convolution layers, normally found in convolutional neural networks (CNNs), excel at dealing with images. They slide small learned filters across an image to recognize shapes and patterns.

This power is harnessed by the discriminator to better classify the images, and the generator to create more realistic images with the same shapes and patterns as real images.
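As a rough illustration of how a single filter picks out a pattern, the sketch below slides a hand-picked 3x3 filter over a toy image. In a real network the filter values are learned, not chosen by hand. The helper name `conv2d_valid` is hypothetical. Like deep learning frameworks, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the kernel over the image (no padding, stride 1), summing products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Toy 6x6 grayscale image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d_valid(image, kernel)
print(response)  # every row is [0. 3. 3. 0.]
```

The response is nonzero only where the filter window straddles the boundary between the dark and bright halves, which is the sense in which a filter "recognizes" an edge.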

Combining convolutional layers with the GAN structure gives a type of GAN called a deep convolutional generative adversarial network, or DCGAN.

Let’s Create our own GAN to Generate Realistic Human Eyes

Photo by Victor Freitas on Unsplash

Data Collection

Like all machine learning models, a GAN needs a large, high-quality dataset to achieve good results.

We load images into 3 channels for RGB, then resize them to 128x128. I found this resolution a good compromise between low computing times and high resolution. When images are loaded in, each pixel’s value is a float from 0 to 255. Since our generator’s output activation function will be tanh, the pixel values must be mapped to values between -1 and 1 to match.
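The rescaling step can be sketched with NumPy as follows. The function names are hypothetical, and the random array only stands in for an image that would normally be loaded from disk and resized.

```python
import numpy as np


def to_tanh_range(pixels: np.ndarray) -> np.ndarray:
    """Map pixel values from [0, 255] to [-1, 1]."""
    return (pixels.astype(np.float32) - 127.5) / 127.5


def to_pixel_range(scaled: np.ndarray) -> np.ndarray:
    """Inverse mapping, useful for viewing generator output as an image."""
    return np.clip(np.rint(scaled * 127.5 + 127.5), 0, 255).astype(np.uint8)


# Stand-in for one loaded 128x128 RGB image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)

scaled = to_tanh_range(image)
print(scaled.dtype, scaled.min(), scaled.max())
```

The inverse function is worth keeping around, since the generator’s tanh output has to be mapped back to 0 to 255 before it can be saved or displayed.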

The dataset used was a combination of images found using this image downloading script and another eye dataset. This resulted in around 1000 images of close-up eyes. This may have been enough, but I decided to augment the data by horizontally flipping each image to create a total of 2000 images.
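A minimal sketch of the flip augmentation, assuming the images are already stacked in one NumPy array of shape (count, height, width, channels). The function name and the small random batch are placeholders.

```python
import numpy as np


def augment_with_flips(images: np.ndarray) -> np.ndarray:
    """Return the originals followed by their mirror images.

    Reversing the width axis (axis 2) flips each image left to right.
    """
    flipped = images[:, :, ::-1, :]
    return np.concatenate([images, flipped], axis=0)


# Small stand-in batch; the real dataset had around 1000 images.
rng = np.random.default_rng(0)
images = rng.random((10, 128, 128, 3), dtype=np.float32)

augmented = augment_with_flips(images)
print(images.shape, "->", augmented.shape)
```

A horizontal flip is a safe choice here because a mirrored eye still looks like a plausible eye, so the doubled dataset stays realistic.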

After loading and augmenting the images, we create a TensorFlow dataset, shuffle the images and separate them into batches. I trained the model with a batch size of 64, as that was the maximum my GPU could handle. The final images in the dataset look like this.
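Shuffling and batching with `tf.data` might look like the sketch below. The array of zeros is only a stand-in for the real, normalized images, and the choice of a shuffle buffer as large as the dataset is an assumption, not necessarily what the original script used.

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 64  # batch size reported above

# Stand-in for the augmented, normalized images (values already in [-1, 1]).
images = np.zeros((200, 128, 128, 3), dtype=np.float32)

dataset = (
    tf.data.Dataset.from_tensor_slices(images)
    .shuffle(buffer_size=len(images))  # buffer covers the whole set: full shuffle
    .batch(BATCH_SIZE)
)

for batch in dataset:
    print(batch.shape)
```

With 200 stand-in images this yields three batches of 64 and a final batch of 8. The dataset is reshuffled each time it is iterated, so every epoch sees the images in a different order.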

Building the Model

The discriminator

The discriminator is just a simple CNN. A sigmoid activation function is added to keep the output values between 0 and 1. The CNN outputting 1 would mean that it is 100% confident that the input image is a real eye from the dataset.
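A discriminator along these lines could be sketched in Keras as shown below. Only the 128x128x3 input and the sigmoid output come from the description above. The number of layers, filter counts, kernel size, LeakyReLU and dropout are assumptions borrowed from common DCGAN setups.

```python
import tensorflow as tf

layers = tf.keras.layers


def build_discriminator() -> tf.keras.Model:
    return tf.keras.Sequential([
        tf.keras.Input(shape=(128, 128, 3)),
        layers.Conv2D(64, kernel_size=5, strides=2, padding="same"),   # 64x64x64
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Conv2D(128, kernel_size=5, strides=2, padding="same"),  # 32x32x128
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),  # probability the input is real
    ])


discriminator = build_discriminator()
scores = discriminator(tf.random.normal([4, 128, 128, 3]), training=False)
print(scores.shape)
```

Each image in the batch gets a single score between 0 and 1.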

The generator

The generator is a little more complicated.

It acts almost like a reverse CNN. It takes in a tensor of random noise and applies filters to upscale it to a 128x128 image. As it trains, the weights of the filters improve to create better images.

The generator’s input is a random tensor of size 100.

This tensor is connected to a dense layer of size 16*16*256, so that the layer’s output can be reshaped to (16, 16, 256). In other words, the random noise is converted to a 16x16 image with 256 channels. The Conv2DTranspose layers then reduce the number of channels while increasing the width and height of the output until, finally, the generator outputs a 128x128 image with 3 channels.
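The shapes described above can be reproduced with the Keras sketch below. The noise size of 100, the 16*16*256 dense layer, the Conv2DTranspose upsampling and the tanh output follow the text. The intermediate filter counts, the kernel size, and the use of batch normalization and LeakyReLU are assumptions.

```python
import tensorflow as tf

layers = tf.keras.layers
NOISE_DIM = 100


def build_generator() -> tf.keras.Model:
    return tf.keras.Sequential([
        tf.keras.Input(shape=(NOISE_DIM,)),
        layers.Dense(16 * 16 * 256, use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Reshape((16, 16, 256)),  # 16x16 "image" with 256 channels
        # Each stride-2 transposed convolution doubles the width and height.
        layers.Conv2DTranspose(128, 5, strides=2, padding="same", use_bias=False),  # 32x32
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(64, 5, strides=2, padding="same", use_bias=False),   # 64x64
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="tanh"),  # 128x128x3
    ])


generator = build_generator()
fake_images = generator(tf.random.normal([2, NOISE_DIM]), training=False)
print(fake_images.shape)
```

Three doublings take the 16x16 starting grid to 32, 64 and then 128 pixels per side, and tanh keeps every output value between -1 and 1, matching the normalized training images.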

Loss

The loss function is a crucial part of the GAN and any neural network. For GANs, we use two separate loss functions for the discriminator and the generator, to optimize both.

Generator Loss

The generator is being optimized to create outputs that the discriminator will classify as 1, the label for an image from the training data. To optimize for that, we use binary cross-entropy loss, with y_true as one, and y_pred as the output of the discriminator when given images from the generator at the current training step. This captures how close the generated images are to being classified as real.
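Here is a small NumPy sketch of that loss. The helper names and the example discriminator outputs are made up for illustration; a TensorFlow implementation would more likely use the built-in `tf.keras.losses.BinaryCrossentropy`.

```python
import numpy as np


def binary_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))


def generator_loss(fake_output: np.ndarray) -> float:
    """Compare D(G(z)) against a target of 1 ("real") for every generated image."""
    return binary_cross_entropy(np.ones_like(fake_output), fake_output)


# Hypothetical discriminator outputs for two batches of generated images.
fooled = np.array([0.9, 0.8, 0.95])  # discriminator mostly believes them
caught = np.array([0.1, 0.2, 0.05])  # discriminator mostly rejects them

print(generator_loss(fooled), generator_loss(caught))
```

With y_true fixed at 1, the loss reduces to the mean of -log(D(G(z))), so minimizing it is the same as maximizing log(D(G(z))).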

Discriminator Loss

Unlike the generator, the discriminator is being optimized on two things each step: its accuracy at matching the real training data to a label of 1, and its accuracy at matching the generated outputs to a label of 0. Therefore, the discriminator loss is the sum of two separate losses. Like the generator loss, each one uses binary cross-entropy.
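The two-part loss can be sketched in NumPy like this. As before, the names and sample values are illustrative only.

```python
import numpy as np


def binary_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))


def discriminator_loss(real_output: np.ndarray, fake_output: np.ndarray) -> float:
    """Real images should score 1, generated images should score 0."""
    real_loss = binary_cross_entropy(np.ones_like(real_output), real_output)
    fake_loss = binary_cross_entropy(np.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss


# Hypothetical discriminator outputs on real and generated batches.
real_output = np.array([0.9, 0.8])
fake_output = np.array([0.1, 0.3])

print(discriminator_loss(real_output, fake_output))
```

A discriminator that outputs 0.5 for everything scores 2 * ln(2), roughly 1.39, which is a handy reference point when reading training logs.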

Training

The main training loop runs the model’s training. In each training step, noise is generated for the generator. Then, the generator’s output images are fed into the discriminator. Likewise, a batch of real eyes is fed into the discriminator. The discriminator’s two predictions are put into the loss functions, then the gradients are calculated. Finally, the Adam optimizer applies the gradients to the two models. After each epoch, an image is generated with examples of what the generator produces.
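One training step along these lines might be written as follows. This is a sketch, not the original training code: the function signature is hypothetical, both models are assumed to be Keras models, and the discriminator is assumed to output probabilities.

```python
import tensorflow as tf

NOISE_DIM = 100  # size of the generator's input
bce = tf.keras.losses.BinaryCrossentropy()  # expects probabilities, not logits


def train_step(generator, discriminator, gen_optimizer, disc_optimizer, real_images):
    """Run one update of both networks on a single batch of real images."""
    noise = tf.random.normal([tf.shape(real_images)[0], NOISE_DIM])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        fake_images = generator(noise, training=True)
        real_output = discriminator(real_images, training=True)
        fake_output = discriminator(fake_images, training=True)

        # Generator wants fakes labelled 1; discriminator wants real=1, fake=0.
        gen_loss = bce(tf.ones_like(fake_output), fake_output)
        disc_loss = (bce(tf.ones_like(real_output), real_output)
                     + bce(tf.zeros_like(fake_output), fake_output))

    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    gen_optimizer.apply_gradients(zip(gen_grads, generator.trainable_variables))
    disc_optimizer.apply_gradients(zip(disc_grads, discriminator.trainable_variables))
    return gen_loss, disc_loss
```

A full run would call `train_step` once per batch inside a loop over epochs, with a separate `tf.keras.optimizers.Adam` instance for each network. Two gradient tapes are used so that each network’s gradients are computed from its own loss.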

Results

400 epochs

Running the GAN for 400 epochs gives the results below. The generator has learned the skin colour from the dataset and has a general idea of what eye sockets and pupils look like. However, the generated images are still blurry and have many artifacts.

Eyes after 400 epochs

Another 400 epochs (800 epochs total)

After running the GAN for another 400 epochs, the generator has a better idea of the texture and colour of skin. It is beginning to generate proper eyes with good pupil shape and colours. The resolution of the images has also improved.

GAN after 800 epochs

Yet another 400 epochs (1200 epochs total)

After a total of 1200 epochs, the GAN is able to produce images with much finer detail. The generator is now able to generate eyelashes, and there is more definition around the eyes.

GAN after 1200 epochs

Final 400 epochs (1600 epochs total)

Finally, after 1600 epochs, the generator is mostly able to generate high-quality eyes with minor artifacts. The generator is not perfect, as shown by the image with blue skin, the image without an actual eye, and the image with red in place of an eye. At the same time, it is able to generate almost lifelike eyes, notably the fourth eye in the third column.

GAN after 1600 epochs

The full code, as well as the training data, can be found on GitHub.

Conclusion

Even with a simple architecture, GANs are extremely powerful. Ours was able to generate eyes that are nearly indistinguishable from real ones. With more complex architectures and deeper networks, GANs are able to create lifelike faces, recreate video games, colourize images and films, turn people into cartoons and much more.

StyleGAN architecture | image from the StyleGAN paper
