A Review of Generative Adversarial Networks — part 1

Roy Ganz
Published in Analytics Vidhya · Dec 30, 2020
General GAN architecture

The hype around Generative Adversarial Networks (GANs) has grown massively in recent years. Since the first GAN, introduced by Ian Goodfellow et al. in 2014, more than 500 (!) GAN architectures have been proposed and implemented. Progress in this field is extremely fast, and new architectures are designed on a regular basis.

In this post, I will cover the most important basic GAN architectures and their main contributions: DCGAN, cGAN, and ACGAN.

DCGAN —

paper: UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

Deep Convolutional Generative Adversarial Network (DCGAN) is an architecture in which both the generator and the discriminator are deep convolutional neural networks: the generator is built from transposed convolutions and the discriminator from regular (strided) convolutions.

DCGAN’s Generator

Main contributions:
in this paper, the authors suggest several guidelines for designing stable DCGANs:

  • Avoid pooling layers and replace them with strided convolutions. Strided convolutions in the discriminator provide better gradients to the generator.
  • Use transposed convolutions (fractional-strided convolutions) in the generator.
  • Use batch normalization in both the generator and the discriminator.
  • Remove fully connected hidden layers for deeper architectures.
  • In the generator, use ReLU activation in hidden layers and Tanh for the output.
  • In the discriminator, use LeakyReLU for all layers.
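The guidelines above can be sketched in PyTorch. This is a minimal, illustrative sketch (not the paper's exact 64×64 architecture; layer widths and the 32×32 output size here are assumptions):

```python
import torch
import torch.nn as nn

# Minimal DCGAN-style generator for 32x32 RGB images (illustrative sizes).
# Transposed convolutions upsample; BatchNorm + ReLU in hidden layers, Tanh output.
class Generator(nn.Module):
    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ch * 4, 4, 1, 0),   # 1x1 -> 4x4
            nn.BatchNorm2d(ch * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1),  # 4x4 -> 8x8
            nn.BatchNorm2d(ch * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1),      # 8x8 -> 16x16
            nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1),           # 16x16 -> 32x32
            nn.Tanh(),                                    # outputs in [-1, 1]
        )

    def forward(self, z):
        # Reshape the flat latent vector into a 1x1 spatial "image".
        return self.net(z.view(z.size(0), -1, 1, 1))

# Discriminator: strided convolutions instead of pooling, LeakyReLU in all layers.
class Discriminator(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2, True),       # 32 -> 16
            nn.Conv2d(ch, ch * 2, 4, 2, 1),
            nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2, True),          # 16 -> 8
            nn.Conv2d(ch * 2, ch * 4, 4, 2, 1),
            nn.BatchNorm2d(ch * 4), nn.LeakyReLU(0.2, True),          # 8 -> 4
            nn.Conv2d(ch * 4, 1, 4, 1, 0),                            # 4 -> 1
        )

    def forward(self, x):
        return self.net(x).view(-1)  # one real/fake logit per image
```

Note how both networks contain no pooling and no fully connected hidden layers, matching the guidelines.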

Besides the guidelines, the paper also demonstrated the concept of latent space interpolation:

latent space interpolation

Namely, if two different latent vectors are mapped by the generator to two different images, how does the generator map a linear combination of these vectors? The figure above, taken from the original DCGAN paper, demonstrates this concept very well.

We can see from the image above that a “middle point” between two latent vectors is mapped to a perceptual “middle point” between the corresponding generated images.
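The interpolation itself is just a convex combination of the two latent vectors. A tiny sketch (the generator call in the comment is a stand-in for any trained G):

```python
import numpy as np

def interpolate_latents(z1, z2, steps=8):
    """Return `steps` latent vectors on the straight line from z1 to z2."""
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - a) * z1 + a * z2 for a in alphas])

z1 = np.random.randn(100)
z2 = np.random.randn(100)
path = interpolate_latents(z1, z2, steps=8)
# Feed each row to a trained generator to get the image sequence:
# images = [G(z) for z in path]
```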

cGAN —

paper: Conditional Generative Adversarial Nets

Conditional Generative Adversarial Network (cGAN) is an architecture in which both the generator and the discriminator use additional information, such as class labels.

The architecture’s overview is depicted below:

cGAN

Implementation details:

  • The generator receives a latent vector and an additional information vector. It embeds the latter to the shape of the latent vector, performs an element-wise multiplication, and feeds the product to the generator network.
  • The discriminator receives an image and an additional information vector. It embeds the latter to the image’s shape, performs an element-wise multiplication, and feeds the product to the discriminator network.
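The generator-side conditioning can be sketched in a few lines. This is an illustrative NumPy sketch; in a real cGAN the embedding matrix is a learned layer trained jointly with the generator:

```python
import numpy as np

rng = np.random.default_rng(0)
z_dim, n_classes = 100, 10

# Hypothetical embedding: maps a class index to a z_dim-sized vector.
# Stands in for a learned embedding layer.
label_embedding = rng.normal(size=(n_classes, z_dim))

def condition_latent(z, label):
    """Element-wise product of the latent vector and the label embedding."""
    return z * label_embedding[label]

z = rng.normal(size=z_dim)
conditioned = condition_latent(z, label=3)  # this product is fed to the generator
```

The discriminator side works the same way, except the embedding is reshaped to the image's shape before the element-wise multiplication.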

Main contributions: this paper introduces a framework that enables the GAN architecture to benefit from additional information.

ACGAN —

paper: Conditional Image Synthesis with Auxiliary Classifier GANs

Auxiliary Classifier Generative Adversarial Network (ACGAN) is an architecture for conditional image synthesis in which the discriminator has two objectives:

  1. Discriminate between Real and Fake images
  2. Classify the input images

Discriminator’s architecture:

ACGAN’s Discriminator

The discriminator performs multitask learning: the network has a shared convolutional base and two separate heads that produce the outputs of the two tasks, classification and discrimination.
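The shared-base, two-head structure can be sketched as follows (a minimal sketch for 32×32 RGB inputs; layer sizes and the class count are illustrative, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

# Shared convolutional trunk with two heads: a real/fake logit
# (discrimination) and per-class logits (classification).
class ACDiscriminator(nn.Module):
    def __init__(self, n_classes=10, ch=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2, True),        # 32 -> 16
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.LeakyReLU(0.2, True),   # 16 -> 8
            nn.Flatten(),
        )
        feat = ch * 2 * 8 * 8
        self.adv_head = nn.Linear(feat, 1)          # discrimination head
        self.cls_head = nn.Linear(feat, n_classes)  # classification head

    def forward(self, x):
        h = self.trunk(x)  # features are shared between both tasks
        return self.adv_head(h).view(-1), self.cls_head(h)
```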

Implementation details:

  • The discriminator receives an image (generated or real) as input and is trained as follows: if the image is real, it is trained to predict that it is real and also to classify it correctly; if the image is fake, it is trained to predict that it is fake.
  • The generator receives a latent vector and a target label vector as input. Its task is to generate a genuine-looking sample from the target label’s distribution. For example, if the target label is “dog” and the input latent vector is z, G(z) should look like a dog to the discriminator. The generator is trained to “fool” the discriminator — to cause it to “think” that its samples are real and belong to the target class.
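These two training objectives can be written as loss terms. This is a minimal sketch of the description above (tensor names and equal weighting of the terms are assumptions; the classification term here applies only to real images, as described):

```python
import torch
import torch.nn.functional as F

def d_loss(real_logit, fake_logit, real_cls_logits, real_labels):
    """Discriminator loss: real/fake discrimination + classifying real images."""
    adv = (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
           + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))
    cls = F.cross_entropy(real_cls_logits, real_labels)  # classify real images
    return adv + cls

def g_loss(fake_logit, fake_cls_logits, target_labels):
    """Generator loss: fakes should be judged real AND classified as the target class."""
    adv = F.binary_cross_entropy_with_logits(fake_logit, torch.ones_like(fake_logit))
    cls = F.cross_entropy(fake_cls_logits, target_labels)
    return adv + cls
```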

Main contribution: this paper shows how to harness multitask learning, in the form of auxiliary classification, to improve conditional image synthesis.

In the next post of the series, I will cover some of the more advanced GAN architectures and techniques.

Roy Ganz
MSc student at Technion for Deep Learning and Computer Vision