Review — BiGAN: Adversarial Feature Learning (GAN)

Bidirectional Generative Adversarial Networks (BiGANs): Learning the Inverse Mapping, from Image Space to Latent Space

Sik-Ho Tsang
May 1 · 5 min read

In this story, Adversarial Feature Learning (BiGAN), by the University of California, Berkeley, and the University of Texas at Austin, is briefly reviewed. In this paper:

  • Bidirectional Generative Adversarial Network (BiGAN) is designed as a means of learning the inverse mapping, i.e. projecting data back into the latent space.
  • The resulting learned feature representation is useful for auxiliary supervised discrimination tasks.

This is a paper in 2017 ICLR with over 1100 citations. (Sik-Ho Tsang @ Medium)

The idea is the same as in ALI (Adversarially Learned Inference); the two were proposed independently and published at the same conference (2017 ICLR). Papers often cite BiGAN and ALI together when discussing this idea.

Outline

  1. BiGAN: Overall Structure
  2. Experimental Results

1. BiGAN: Overall Structure

BiGAN: Overall Structure
  • In addition to the generator G from the standard GAN, BiGAN includes an encoder E which maps data x to latent representations z.
  • The BiGAN discriminator D discriminates not only in data space (x versus G(z)), but jointly in data and latent space (tuples (x, E(x)) versus (G(z), z)), where the latent component is either an encoder output E(x) or a generator input z.
  • The data is flattened into a vector and concatenated with the latent vector, and the result is fed into the discriminator D.
  • In this context, a latent representation z may be thought of as a “label” for x, but one which came for “free,” without the need for supervision.
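  • The BiGAN training objective is defined as a minimax objective: $\min_{G,E} \max_{D} V(D, E, G)$
  • where, for a deterministic encoder E and generator G, $V(D, E, G) = \mathbb{E}_{x \sim p_X}[\log D(x, E(x))] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z), z))]$.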
  • The same alternating gradient based optimization as GAN is used.
  • In one iteration, the discriminator parameters θD are updated by taking one or more steps in the positive gradient direction.
  • Then, the encoder parameters θE and generator parameters θG are together updated by taking a step in the negative gradient direction.
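
The objective and the alternating updates above can be sketched on a toy problem. This is only an illustration under stated assumptions, not the paper's implementation: the data and latent are 1D, G and E are affine maps, D is a logistic regression on the concatenated pair [x, z], and gradients are estimated by finite differences. All names (value, grad, step, params) are hypothetical.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def value(params, xs, zs):
    """Monte Carlo estimate of V(D, E, G) with deterministic E and G."""
    d0, d1, d2 = params["D"]   # D(x, z) = sigmoid(d0*x + d1*z + d2) on the pair [x, z]
    e_w, e_b = params["E"]     # E(x) = e_w*x + e_b (toy assumption)
    g_w, g_b = params["G"]     # G(z) = g_w*z + g_b (toy assumption)
    real = [math.log(sigmoid(d0 * x + d1 * (e_w * x + e_b) + d2)) for x in xs]
    fake = [math.log(1.0 - sigmoid(d0 * (g_w * z + g_b) + d1 * z + d2)) for z in zs]
    return sum(real) / len(real) + sum(fake) / len(fake)

def grad(params, name, xs, zs, eps=1e-5):
    """Central-difference gradient of V with respect to params[name]."""
    g = []
    for i in range(len(params[name])):
        hi = {k: list(v) for k, v in params.items()}
        lo = {k: list(v) for k, v in params.items()}
        hi[name][i] += eps
        lo[name][i] -= eps
        g.append((value(hi, xs, zs) - value(lo, xs, zs)) / (2 * eps))
    return g

def step(params, names, xs, zs, lr):
    """Move each params[n] by lr * dV/dparams[n]; lr > 0 ascends, lr < 0 descends."""
    grads = {n: grad(params, n, xs, zs) for n in names}
    new = {k: list(v) for k, v in params.items()}
    for n in names:
        new[n] = [p + lr * g for p, g in zip(params[n], grads[n])]
    return new

rng = random.Random(0)
xs = [rng.gauss(2.0, 0.5) for _ in range(256)]   # toy "real data" samples
zs = [rng.gauss(0.0, 1.0) for _ in range(256)]   # latent prior samples
params = {"D": [0.1, -0.1, 0.0], "E": [0.5, 0.0], "G": [1.0, 0.0]}

v0 = value(params, xs, zs)
params_d = step(params, ["D"], xs, zs, lr=+0.1)          # discriminator: positive gradient direction
v1 = value(params_d, xs, zs)
params_ge = step(params_d, ["E", "G"], xs, zs, lr=-0.1)  # encoder and generator together: negative direction
v2 = value(params_ge, xs, zs)
```

On the same fixed batch, the discriminator step raises the estimated value and the joint encoder/generator step lowers it again, which is the minimax behaviour described in the bullets.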

A model trained to predict features z given data x should learn useful semantic representations. The BiGAN objective forces the encoder E to do exactly this.

In order to fool the discriminator at a particular z, the encoder must invert the generator at that z, such that E(G(z)) = z.
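
The inversion property E(G(z)) = z can be checked numerically on a toy case. The sketch below assumes a 1D affine generator whose exact inverse is known in closed form; the helper names are hypothetical, and a real BiGAN encoder only approximates this inverse.

```python
import random

def make_generator(a, b):
    """Toy generator G(z) = a*z + b (an assumption for illustration)."""
    return lambda z: a * z + b

def make_inverse_encoder(a, b):
    """Exact inverse of the toy generator: E(x) = (x - b) / a."""
    return lambda x: (x - b) / a

def latent_reconstruction_error(E, G, zs):
    """Mean squared error between z and E(G(z))."""
    return sum((E(G(z)) - z) ** 2 for z in zs) / len(zs)

rng = random.Random(0)
zs = [rng.gauss(0.0, 1.0) for _ in range(1000)]

G = make_generator(2.0, 1.0)
E_good = make_inverse_encoder(2.0, 1.0)   # inverts G
E_bad = lambda x: x                       # does not invert G

err_good = latent_reconstruction_error(E_good, G, zs)
err_bad = latent_reconstruction_error(E_bad, G, zs)
```

With the inverting encoder, the pairs (G(z), E(G(z))) coincide with (G(z), z) up to floating-point error, so the discriminator cannot separate them on the latent component; the mismatched encoder leaves a clear gap.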

  • At the time, generating high-resolution images remained difficult for generative models. Thus, the encoder may take higher-resolution input while the generator output and discriminator input remain low resolution.

2. Experimental Results

  • BiGANs are evaluated by first training them unsupervised, then transferring the encoder’s learned feature representations for use in auxiliary supervised learning tasks.

2.1. Permutation-Invariant MNIST

  • Each 28×28 digit image must be treated as an unstructured 784D vector.
  • The latent vector z is a 50D vector.
One Nearest Neighbors (1NN) classification accuracy (%) on the permutation-invariant MNIST test set in the feature space
  • AE is the autoencoder trained with an l2 or l1 reconstruction loss, proposed by Prof. Hinton in 2006.

All methods, including BiGAN, perform at roughly the same level. This result is not overly surprising given the relative simplicity of MNIST digits.
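
The 1NN evaluation in feature space can be sketched as follows. This is a minimal illustration, assuming squared Euclidean distance and tiny made-up 2D feature vectors in place of real 50D encoder outputs E(x); the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest_neighbor_predict(train_feats, train_labels, query):
    """Return the label of the training feature closest to the query."""
    best = min(range(len(train_feats)), key=lambda i: sq_dist(train_feats[i], query))
    return train_labels[best]

def one_nn_accuracy(train_feats, train_labels, test_feats, test_labels):
    """1NN classification accuracy in percent."""
    correct = sum(
        nearest_neighbor_predict(train_feats, train_labels, f) == y
        for f, y in zip(test_feats, test_labels)
    )
    return 100.0 * correct / len(test_feats)

# Toy features standing in for encoder outputs (made-up values).
train_feats = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.2)]
train_labels = [0, 1, 0]
test_feats = [(0.2, 0.1), (0.9, 0.8), (0.0, 0.1), (1.0, 0.9)]
test_labels = [0, 1, 0, 0]   # the last label disagrees with its nearest neighbor

accuracy = one_nn_accuracy(train_feats, train_labels, test_feats, test_labels)
```

No classifier is trained here, so the accuracy reflects only how well the feature space itself groups examples of the same class.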

Qualitative results for permutation-invariant MNIST BiGAN training, including generator samples G(z), real data x, and corresponding reconstructions G(E(x)).

Digits generated by the generator G nearly perfectly match the data distribution (qualitatively), as shown above.

2.2. ImageNet

  • The encoder E architecture follows AlexNet through the fifth and last convolution layer (conv5), with local response normalization (LRN) layers removed and batch normalization with leaky ReLU non-linearity applied to the output of each convolution at unsupervised training time.
  • The encoder input images have the size of 112×112 or 64×64.
  • The latent vector is a 200D vector.
Qualitative results for ImageNet BiGAN training, including generator samples G(z), real data x, and corresponding reconstructions G(E(x)).

As shown above, the reconstructions, while certainly imperfect, demonstrate empirically that the BiGAN encoder E and generator G learn approximate inverse mappings.

Classification accuracy (%) for the ImageNet LSVRC validation set
  • The above evaluation is performed with various portions of the network frozen, or reinitialized and trained from scratch.
  • For example, in the conv3 column, the first three layers (conv1 through conv3) are transferred and frozen, and the remaining layers (conv4, conv5, and the fully connected layers) are reinitialized and trained fully supervised for ImageNet classification.
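
The column convention above can be written down as a small helper. This is a sketch only: the layer list follows the usual AlexNet naming (conv1 to conv5, fc6 to fc8), and the function name transfer_plan is hypothetical.

```python
ALEXNET_LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8"]
CONV_COLUMNS = ALEXNET_LAYERS[:5]   # the encoder only covers conv1 through conv5

def transfer_plan(column):
    """Split layers into (transferred and frozen, reinitialized and trained)."""
    if column not in CONV_COLUMNS:
        raise ValueError(f"unknown column: {column!r}")
    cut = ALEXNET_LAYERS.index(column) + 1
    return ALEXNET_LAYERS[:cut], ALEXNET_LAYERS[cut:]

frozen, retrained = transfer_plan("conv3")
```

Moving the column to the right freezes more of the unsupervised encoder, so each column measures how useful the features are up to that depth.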

BiGAN is competitive with these contemporary visual feature learning methods.

2.3. PASCAL VOC

Classification and Fast R-CNN detection results for the PASCAL VOC 2007 test set and FCN segmentation results on the PASCAL VOC 2012 validation set
  • The transferability of BiGAN representations to the PASCAL VOC is evaluated.
  • Classification models are trained with various portions of the AlexNet model frozen.
  • In the fc8 column, only the linear classifier (a multinomial logistic regression) is learned. In the case of BiGAN, it sits on top of randomly initialized fully connected (FC) layers fc6 and fc7.
  • In the fc6–8 column, all three FC layers are trained fully supervised with all convolution layers frozen.
  • Finally, in the all column, the entire network is “fine-tuned”.
  • BiGAN outperforms other unsupervised (unsup.) feature learning approaches, including the GAN-based baselines, and despite its generality, is competitive with contemporary self-supervised (self-sup.) feature learning approaches specific to the visual domain.
  • (If interested, please read the paper for more details.)
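
The three PASCAL VOC classification columns differ in which layers are trained and how each layer is initialized. The sketch below encodes the description above for the BiGAN case; the function name, the status labels, and the ASCII column key "fc6-8" are assumptions made for illustration.

```python
CONV_LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5"]
FC_LAYERS = ["fc6", "fc7", "fc8"]

def voc_layer_status(column):
    """Map each layer to (initialization, is_trained) for a BiGAN-initialized AlexNet."""
    if column not in ("fc8", "fc6-8", "all"):
        raise ValueError(f"unknown column: {column!r}")
    status = {}
    for name in CONV_LAYERS:
        # Convolution layers come from the BiGAN encoder; only "all" updates them.
        status[name] = ("bigan_encoder", column == "all")
    for name in FC_LAYERS:
        # FC layers start from random weights; in the fc8 column only fc8 is trained.
        trained = column in ("fc6-8", "all") or name == "fc8"
        status[name] = ("random", trained)
    return status

fc8_status = voc_layer_status("fc8")
all_status = voc_layer_status("all")
```

In the fc8 column, fc6 and fc7 stay at their random initialization, which makes that column a particularly strict test of the frozen convolutional features.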

Reference

[2017 ICLR] [BiGAN]
Adversarial Feature Learning


