# Review — BiGAN: Adversarial Feature Learning (GAN)

## Bidirectional Generative Adversarial Networks (BiGANs): Learning the Inverse Mapping from Image Space to Latent Space

In this story, **Adversarial Feature Learning** (**BiGAN**), by the University of California and the University of Texas, is briefly reviewed. In this paper:

- The **Bidirectional Generative Adversarial Network (BiGAN)** is designed as a means of **learning the inverse mapping, i.e. projecting data back into the latent space.**
- The resulting learned feature representation is useful for auxiliary supervised discrimination tasks.

This is a paper in **2017 ICLR** with over **1100 citations**.

The idea is the same as that of ALI, though the two were proposed independently and published at the same conference (2017 ICLR). Some papers cite BiGAN and ALI together when discussing this idea.

# Outline

1. **BiGAN: Overall Structure**
2. **Experimental Results**

# 1. BiGAN: Overall Structure

- In addition to the generator *G* from the standard GAN, **BiGAN includes an encoder *E*** which maps data *x* to latent representations *z*.
- **The BiGAN discriminator *D* discriminates not only in data space (*x* versus *G*(*z*)), but jointly in data and latent space (tuples (*x*, *E*(*x*)) versus (*G*(*z*), *z*))**, where the latent component is either an encoder output *E*(*x*) or a generator input *z*.
- The data is flattened into a vector and concatenated with the latent vector, then input into the discriminator *D*.
- In this context, **a latent representation *z* may be thought of as a “label” for *x***, but one which comes for “free,” without the need for supervision.
- The BiGAN training objective is defined as a minimax objective:

$$\min_{G, E} \max_{D} V(D, E, G)$$

where

$$V(D, E, G) = \mathbb{E}_{x \sim p_X}\big[\log D(x, E(x))\big] + \mathbb{E}_{z \sim p_Z}\big[\log\big(1 - D(G(z), z)\big)\big]$$
- The same alternating gradient-based optimization as in the standard GAN is used.
- In one iteration, the discriminator parameters *θD* are updated by taking one or more steps in the positive gradient direction.
- Then, the encoder parameters *θE* and generator parameters *θG* are updated together by taking a step in the negative gradient direction.
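As a toy illustration of the joint discriminator input and the minimax value, here is a minimal NumPy sketch. All dimensions and the stand-in networks are assumptions for illustration only, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (assumptions, not the paper's settings)
data_dim, latent_dim = 784, 50

def discriminator(xz, W, b):
    # D scores the flattened data vector concatenated with the latent vector
    return 1.0 / (1.0 + np.exp(-(xz @ W + b)))  # sigmoid "real" probability

W = rng.normal(scale=0.01, size=data_dim + latent_dim)
b = 0.0

x = rng.normal(size=(8, data_dim))       # a batch of "real" data
z = rng.normal(size=(8, latent_dim))     # latent samples z ~ p_Z
E_x = rng.normal(size=(8, latent_dim))   # stand-in for encoder outputs E(x)
G_z = rng.normal(size=(8, data_dim))     # stand-in for generator outputs G(z)

# Joint tuples: (x, E(x)) should be scored "real", (G(z), z) "fake"
real_pairs = np.concatenate([x, E_x], axis=1)
fake_pairs = np.concatenate([G_z, z], axis=1)

d_real = discriminator(real_pairs, W, b)
d_fake = discriminator(fake_pairs, W, b)

# Minimax value V(D, E, G): D ascends this, while E and G descend it
V = np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
```

In the alternating scheme above, a real training step would ascend the gradient of `V` with respect to the discriminator weights, then descend it with respect to the encoder and generator weights.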

A model trained to predict features *z* given data *x* should learn useful semantic representations. The BiGAN objective forces the encoder *E* to do exactly this.

In order to fool the discriminator at a particular *z*, the encoder must invert the generator at that *z*, such that *E*(*G*(*z*)) = *z*.
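The inversion property can be checked numerically with a toy invertible linear generator. This is purely an illustrative assumption; the real *G* is a deep network whose inverse *E* is learned, not computed analytically:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "generator": x = G(z) = A z, with A invertible (illustrative assumption)
A = rng.normal(size=(4, 4))

def G(z):
    return A @ z

def E(x):
    # The optimal encoder inverts the generator: E = G^{-1}
    return np.linalg.solve(A, x)

z = rng.normal(size=4)
x = G(z)

# E(G(z)) = z, so the joint pairs (G(z), z) and (x, E(x)) coincide
# and the discriminator cannot tell them apart
recovered = E(x)
```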

- At the time the paper was written, generating high-resolution images remained difficult for generative models. Thus, the encoder may take higher-resolution input while the generator output and discriminator input remain low resolution.

# 2. Experimental Results

- BiGAN is evaluated by first training it unsupervised, then transferring the encoder’s learned feature representations for use in auxiliary supervised learning tasks.
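The transfer protocol used throughout this section can be sketched as: freeze the unsupervised encoder, then fit a simple supervised head on its features. Everything below is a stand-in (a fixed random projection plays the role of the frozen *E*, and the labels are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, unsupervised-trained encoder E (assumption: a fixed
# random projection; in BiGAN this would be the learned encoder weights)
W_enc = rng.normal(size=(784, 50))

def encode(x):
    return np.tanh(x @ W_enc)  # frozen: never updated below

# Toy labeled data for the auxiliary supervised task (synthetic)
X = rng.normal(size=(64, 784))
y = (X[:, 0] > 0).astype(float)

# Train only a logistic-regression head on top of the frozen features
feats = encode(X)
w, b = np.zeros(50), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    w -= 0.5 * feats.T @ (p - y) / len(y)
    b -= 0.5 * np.mean(p - y)

acc = np.mean(((feats @ w + b) > 0) == (y > 0.5))  # training accuracy
```

Only `w` and `b` receive gradient updates; the encoder parameters stay fixed, mirroring the frozen-feature evaluations below.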

## 2.1. Permutation-Invariant MNIST

- Each 28×28 digit image must be treated as an unstructured 784D vector.
- The latent vector *z* is a 50D vector.

- AE is an autoencoder trained with an *l*2 or *l*1 reconstruction loss, as proposed by Prof. Hinton in 2006.

All methods, including BiGAN, perform at roughly the same level. This result is not overly surprising given the relative simplicity of MNIST digits.

Digits generated by the generator *G* nearly perfectly match the data distribution (qualitatively), as shown above.

## 2.2. ImageNet

- The encoder *E* architecture follows AlexNet through the fifth and last convolution layer (conv5), with local response normalization (LRN) layers removed and batch normalization with leaky ReLU non-linearity applied to the output of each convolution at unsupervised training time.
- The encoder input images have a size of 112×112 or 64×64.
- The latent vector is a 200D vector.

As shown above, the reconstructions, while certainly imperfect, demonstrate empirically that the BiGAN encoder *E* and generator *G* learn approximate inverse mappings.

- The above evaluation is performed with various portions of the network frozen, or reinitialized and trained from scratch.
- e.g., in the conv3 column, the first three layers (conv1 through conv3) are transferred and frozen, and the remaining layers (conv4, conv5, and the fully connected layers) are reinitialized and trained fully supervised for ImageNet classification.
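The partitioning behind each column of the transfer table can be sketched as follows. The AlexNet layer names come from the paper; the helper itself is a hypothetical illustration:

```python
# Hypothetical sketch of the layer-freezing scheme in the transfer evaluation.
# Layer names follow AlexNet; the split logic is the only point being shown.
LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8"]

def split_for_transfer(transfer_up_to):
    """Freeze layers up to and including `transfer_up_to`;
    reinitialize and retrain the rest from scratch."""
    cut = LAYERS.index(transfer_up_to) + 1
    return LAYERS[:cut], LAYERS[cut:]  # (frozen, retrained)

frozen, retrained = split_for_transfer("conv3")
```

For the conv3 column this yields conv1 through conv3 frozen and conv4 through fc8 retrained, matching the description above.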

BiGAN is competitive with these contemporary visual feature learning methods.

## 2.3. PASCAL VOC

- The transferability of BiGAN representations to the PASCAL VOC classification task is evaluated.
- Classification models are trained with various portions of the AlexNet model frozen.
- In the fc8 column, only the linear classifier (a multinomial logistic regression) is learned; in the case of BiGAN, it is learned on top of randomly initialized fully connected (FC) layers fc6 and fc7.
- In the fc6–8 column, all three FC layers are trained fully supervised with all convolution layers frozen.
- Finally, in the all column, the entire network is “fine-tuned”.
- **BiGAN outperforms other unsupervised (unsup.) feature learning approaches**, including the GAN-based baselines, and, despite its generality, is **competitive with contemporary self-supervised (self-sup.) feature learning approaches** specific to the visual domain.
- (If interested, please read the paper for more details.)

## Reference

[2017 ICLR] [BiGAN] Adversarial Feature Learning

## Generative Adversarial Network (GAN)

**Image Synthesis**: [GAN] [CGAN] [LAPGAN] [AAE] [DCGAN] [CoGAN] [SimGAN] [BiGAN]
**Image-to-image Translation**: [Pix2Pix] [UNIT]
**Super Resolution**: [SRGAN & SRResNet] [EnhanceNet] [ESRGAN]
**Blur Detection**: [DMENet]
**Camera Tampering Detection**: [Mantini’s VISAPP’19]
**Video Coding**: [VC-LAPGAN] [Zhu TMM’20] [Zhong ELECGJ’21]