Deep Convolutional Generative Adversarial Network

Generate New Anime Faces

Tanishq Gautam
Analytics Vidhya
3 min read · Jun 10, 2020


Introduction

Generative Adversarial Networks (GANs) are used to generate images that never existed before. They learn the underlying distribution of a set of training images and create new, previously unseen variations of them.

They are divided into two basic components:

  • A Generator — that creates the images.
  • A Discriminator — that assesses the images and tells the generator whether they resemble the training images. These judgments are based on real-world examples.

When training the network, both the generator and discriminator start from scratch and learn together.

The objective of a GAN is to train a data generator in order to imitate a given dataset. The dataset to be imitated here is the anime faces dataset.
A GAN is similar to a zero-sum game between two neural networks: the generator, which creates data, and the discriminator, which is trained to distinguish original data from the fakes created by the generator.
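This zero-sum game is usually written as the minimax objective from the original GAN paper, with generator G, discriminator D, data distribution p_data, and noise distribution p_z:

```latex
\min_G \max_D \; V(D, G) =
\mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] +
\mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator maximizes this value (classifying real and fake correctly), while the generator minimizes it (fooling the discriminator).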

Here is the structure of a GAN:

Structure of GAN

DCGAN

DCGAN is one of the most popular and successful network designs for GANs. It is composed mainly of convolutional layers, without max pooling or fully connected layers, and it uses strided convolutions for downsampling and transposed convolutions for upsampling.

In a Deep Convolutional GAN, the data generator has the following structure:

DCGAN Generator

It takes a noise vector as input in order to diversify the potential outputs. This vector is reshaped into a small spatial volume with a large number of channels, followed by a succession of transposed convolutional layers that progressively reduce the depth while growing the spatial dimensions, until the output is a full-size color image.

Ideally, after training, each dimension of the noise vector will correspond to a feature of the generated image, for example the hair color of the character.

Following this is the discriminator, which for the current complexity of the problem is a simple, flexible CNN.

Here is an example of the training images:

Anime Faces Dataset

Things to pay attention to during DCGAN training:

  1. Cost functions may not converge under gradient descent, because the discriminator is constantly evolving.
  2. The discriminator must be powerful enough for the generator to progress.
  3. Finally, the learning rate and decay rate must be chosen carefully. A learning rate that is too high can destabilize training, but it must still be high enough for the generator to adapt swiftly. The suggested exponential decay rate (the β₁ parameter of the Adam optimizer) is 0.5.
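These points come together in the alternating update loop, where each step trains the discriminator on a real and a fake batch and then trains the generator to fool it. Below is a minimal, self-contained sketch of one such step; the tiny generator and discriminator here are placeholder stand-ins (not the article's networks), and the optimizer settings follow the DCGAN paper (Adam, learning rate 2e-4, β₁ = 0.5):

```python
import tensorflow as tf
from tensorflow.keras import layers, losses, models, optimizers

latent_dim = 100

# Tiny placeholder networks to keep the sketch self-contained; in practice
# these would be the full DCGAN generator and discriminator.
generator = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(8 * 8 * 3, activation="tanh"),
    layers.Reshape((8, 8, 3)),
])
discriminator = models.Sequential([
    layers.Input(shape=(8, 8, 3)),
    layers.Flatten(),
    layers.Dense(1),  # real-vs-fake logit
])

# DCGAN paper settings: Adam with a low learning rate and beta_1 = 0.5
# (the "exponential decay rate" mentioned above).
g_opt = optimizers.Adam(2e-4, beta_1=0.5)
d_opt = optimizers.Adam(2e-4, beta_1=0.5)
bce = losses.BinaryCrossentropy(from_logits=True)

def train_step(real_images):
    batch_size = tf.shape(real_images)[0]
    noise = tf.random.normal((batch_size, latent_dim))
    # 1) Update the discriminator: real images get label 1, fakes get label 0.
    with tf.GradientTape() as tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        d_loss = (bce(tf.ones_like(real_logits), real_logits)
                  + bce(tf.zeros_like(fake_logits), fake_logits))
    d_opt.apply_gradients(
        zip(tape.gradient(d_loss, discriminator.trainable_variables),
            discriminator.trainable_variables))
    # 2) Update the generator: it wants its fakes to be labelled as real.
    with tf.GradientTape() as tape:
        fake_logits = discriminator(generator(noise, training=True),
                                    training=True)
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    g_opt.apply_gradients(
        zip(tape.gradient(g_loss, generator.trainable_variables),
            generator.trainable_variables))
    return float(d_loss), float(g_loss)
```

Freezing the discriminator during the generator update (here, by only applying gradients to the generator's variables) is what makes the game adversarial rather than cooperative.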

Keras Implementation of the Generator:
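A minimal sketch of such a generator, assuming a 64×64 RGB output and a 100-dimensional noise vector (the layer widths are illustrative choices, not necessarily the article's exact ones):

```python
from tensorflow.keras import layers, models

def build_generator(latent_dim=100):
    # Project the noise vector into a small spatial volume with many channels,
    # then upsample with strided transposed convolutions while reducing depth,
    # ending in a 3-channel (RGB) image squashed to [-1, 1] by tanh.
    return models.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(4 * 4 * 512),
        layers.Reshape((4, 4, 512)),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(256, 4, strides=2, padding="same"),  # 8x8
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(128, 4, strides=2, padding="same"),  # 16x16
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same"),   # 32x32
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                               activation="tanh"),                  # 64x64 RGB
    ])
```

The tanh output matches training images rescaled to [-1, 1], a common DCGAN convention.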

Keras Implementation of the Discriminator:
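A matching discriminator sketch, assuming 64×64 RGB inputs; it uses strided convolutions instead of pooling and LeakyReLU activations, as in the DCGAN paper (the filter counts here are illustrative):

```python
from tensorflow.keras import layers, models

def build_discriminator(img_shape=(64, 64, 3)):
    # Strided convolutions downsample in place of max pooling; the final
    # sigmoid unit outputs the probability that the input image is real.
    return models.Sequential([
        layers.Input(shape=img_shape),
        layers.Conv2D(64, 4, strides=2, padding="same"),   # 32x32
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"),  # 16x16
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Conv2D(256, 4, strides=2, padding="same"),  # 8x8
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),             # P(real)
    ])
```

Keeping the discriminator a plain strided CNN mirrors the generator's structure in reverse, which is part of what makes DCGAN training comparatively stable.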

Final Results

A few of the resulting images are convincing, and they show a variety of features, including different hair styles, hair colors, and face orientations.
We were able to avoid mode collapse, with the discriminator maintaining a stable accuracy of 80% during training.
However, the generator does not yet create globally convincing faces; for example, some of them show a different shape or color for each eye.
