Review — BiGAN: Adversarial Feature Learning (GAN)
Bidirectional Generative Adversarial Networks (BiGANs): Learning the Inverse Mapping, from Image Space to Latent Space
In this story, Adversarial Feature Learning (BiGAN), by University of California and University of Texas, is briefly reviewed. In this paper:
- Bidirectional Generative Adversarial Network (BiGAN) is designed as a means of learning the inverse mapping, i.e. projecting data back into the latent space.
- The resulting learned feature representation is useful for auxiliary supervised discrimination tasks.
This is a paper in 2017 ICLR with over 1100 citations. (Sik-Ho Tsang @ Medium)
- BiGAN: Overall Structure
- Experimental Results
1. BiGAN: Overall Structure
- In addition to the generator G from the standard GAN, BiGAN includes an encoder E which maps data x to latent representations z.
- The BiGAN discriminator D discriminates not only in data space (x versus G(z)), but jointly in data and latent space (tuples (x, E(x)) versus (G(z), z)), where the latent component is either an encoder output E(x) or a generator input z.
- The data sample is flattened into a vector and concatenated with the latent vector, and the result is fed into the discriminator D.
- In this context, a latent representation z may be thought of as a “label” for x, but one which came for “free,” without the need for supervision.
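- The BiGAN training objective is defined as a minimax objective: min over (G, E), max over D of V(D, E, G), where, for a deterministic encoder and generator, V(D, E, G) = E_{x∼p_X}[log D(x, E(x))] + E_{z∼p_Z}[log(1 − D(G(z), z))].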
- The same alternating gradient based optimization as GAN is used.
- In one iteration, the discriminator parameters θD are updated by taking one or more steps in the positive gradient direction.
- Then, the encoder parameters θE and generator parameters θG are together updated by taking a step in the negative gradient direction.
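The alternating updates described above can be sketched on a toy problem. This is only an illustration, not the paper's implementation: the scalar linear G and E, the logistic D, the finite-difference gradients, the learning rate, and all parameter names are assumptions made here so the example runs with the standard library alone. The quantity being optimized is a batch estimate of V = mean log D(x, E(x)) + mean log(1 − D(G(z), z)).

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def G(z, p):
    # Toy generator (assumption): x = g_a * z + g_b
    return p["g_a"] * z + p["g_b"]

def E(x, p):
    # Toy encoder (assumption): z = e_c * x + e_d
    return p["e_c"] * x + p["e_d"]

def D(x, z, p):
    # Toy joint discriminator on the pair (x, z); returns P(pair came from the encoder side).
    return sigmoid(p["d_x"] * x + p["d_z"] * z + p["d_b"])

def value(p, xs, zs, eps=1e-12):
    # Batch estimate of V(D, E, G): real pairs (x, E(x)) versus generated pairs (G(z), z).
    real = sum(math.log(D(x, E(x, p), p) + eps) for x in xs) / len(xs)
    fake = sum(math.log(1.0 - D(G(z, p), z, p) + eps) for z in zs) / len(zs)
    return real + fake

def grad(p, keys, xs, zs, h=1e-5):
    # Central finite-difference gradient of V with respect to the named parameters.
    g = {}
    for k in keys:
        hi = {**p, k: p[k] + h}
        lo = {**p, k: p[k] - h}
        g[k] = (value(hi, xs, zs) - value(lo, xs, zs)) / (2 * h)
    return g

D_KEYS = ["d_x", "d_z", "d_b"]
GE_KEYS = ["g_a", "g_b", "e_c", "e_d"]

def d_step(p, xs, zs, lr=0.05):
    # Discriminator update: step in the positive gradient direction (maximize V).
    g = grad(p, D_KEYS, xs, zs)
    for k in D_KEYS:
        p[k] += lr * g[k]
    return p

def ge_step(p, xs, zs, lr=0.05):
    # Joint encoder and generator update: step in the negative gradient direction (minimize V).
    g = grad(p, GE_KEYS, xs, zs)
    for k in GE_KEYS:
        p[k] -= lr * g[k]
    return p

if __name__ == "__main__":
    rng = random.Random(0)
    p = {"g_a": 1.0, "g_b": 0.0, "e_c": 0.1, "e_d": 0.0,
         "d_x": 0.5, "d_z": -0.5, "d_b": 0.0}
    for _ in range(50):
        xs = [rng.gauss(2.0, 0.5) for _ in range(32)]  # stand-in "data" samples
        zs = [rng.gauss(0.0, 1.0) for _ in range(32)]  # latent prior samples
        d_step(p, xs, zs)
        ge_step(p, xs, zs)
    print("V after training:", value(p, xs, zs))
```

A real implementation would use neural networks and backpropagation instead of finite differences; the point here is only the order and sign of the two updates.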
A model trained to predict features z given data x should learn useful semantic representations. The BiGAN objective forces the encoder E to do exactly this.
In order to fool the discriminator at a particular z, the encoder must invert the generator at that z, such that E(G(z)) = z.
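One simple way to probe the inversion property E(G(z)) = z is to measure the average gap between z and E(G(z)) over sampled latents. The sketch below uses hypothetical scalar maps (`G`, `E_good`, `E_bad`) chosen only for illustration; they are not from the paper.

```python
import random

def reconstruction_gap(G, E, zs):
    # Mean absolute gap |E(G(z)) - z|; a value near 0 means E inverts G on these samples.
    return sum(abs(E(G(z)) - z) for z in zs) / len(zs)

# Hypothetical toy maps, assumed for illustration only.
def G(z):
    return 2.0 * z + 1.0

def E_good(x):
    # Exact inverse of the toy G.
    return (x - 1.0) / 2.0

def E_bad(x):
    # An encoder that does not invert G.
    return 0.3 * x

rng = random.Random(0)
zs = [rng.gauss(0.0, 1.0) for _ in range(100)]
gap_good = reconstruction_gap(G, E_good, zs)
gap_bad = reconstruction_gap(G, E_bad, zs)
print(gap_good, gap_bad)
```

With trained networks, the same check would be run on latents drawn from the prior, and the gap would be small but generally not zero.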
- At the time, generating high-resolution images remained difficult for generative models. Thus, the encoder may take higher-resolution input while the generator output and discriminator input remain low-resolution.
2. Experimental Results
- BiGANs are evaluated by first training them unsupervised, then transferring the encoder’s learned feature representations for use in auxiliary supervised learning tasks.
2.1. Permutation-Invariant MNIST
- Each 28×28 digit image must be treated as an unstructured 784D vector.
- The latent vector z is a 50D vector.
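As a rough illustration of these sizes, the joint discriminator input for this setting could be formed by flattening the 28×28 image and appending the 50D latent. The helper names (`flatten`, `joint_input`), the random stand-in image, and the uniform latent sampling are assumptions made for this sketch.

```python
import random

def flatten(image):
    # Turn a 28x28 nested list into an unstructured 784-element vector.
    return [pixel for row in image for pixel in row]

def joint_input(image, z):
    # Concatenate the flattened data vector with the latent vector for the discriminator.
    return flatten(image) + list(z)

rng = random.Random(0)
image = [[rng.random() for _ in range(28)] for _ in range(28)]  # stand-in for a digit image
z = [rng.uniform(-1.0, 1.0) for _ in range(50)]                 # 50D latent (sampling is assumed)

pair = joint_input(image, z)
print(len(flatten(image)), len(pair))
```

Because the image is treated as an unstructured vector, a model in this setting cannot rely on the 2D layout of the pixels.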
All methods, including BiGAN, perform at roughly the same level. This result is not overly surprising given the relative simplicity of MNIST digits.
Digits generated by the generator G nearly perfectly match the data distribution (qualitatively), as shown above.
2.2. ImageNet
- The encoder E architecture follows AlexNet through the fifth and last convolution layer (conv5), with local response normalization (LRN) layers removed and batch normalization with leaky ReLU non-linearity applied to the output of each convolution at unsupervised training time.
- The encoder input images are 112×112 or 64×64.
- The latent vector z is a 200D vector.
As shown above, the reconstructions, while certainly imperfect, demonstrate empirically that the BiGAN encoder E and generator G learn approximate inverse mappings.
- The above evaluation is performed with various portions of the network frozen, or reinitialized and trained from scratch.
- e.g., in the conv3 column, the first three layers (conv1 through conv3) are transferred and frozen, and the last layers (conv4, conv5, and the fully connected layers) are reinitialized and trained fully supervised for ImageNet classification.
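The frozen/reinitialized split for each column can be written down as a small helper. This is a sketch only: the function name `transfer_plan` and the flat layer list are assumptions, with fc6 to fc8 standing for AlexNet's fully connected layers.

```python
ALEXNET_LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8"]

def transfer_plan(last_frozen, layers=ALEXNET_LAYERS):
    # Layers up to and including `last_frozen` are transferred from the encoder and frozen;
    # the remaining layers are reinitialized and trained with supervision.
    cut = layers.index(last_frozen) + 1
    return {"frozen": layers[:cut], "reinitialized": layers[cut:]}

plan = transfer_plan("conv3")
print(plan)
```

For the conv3 column this yields conv1 through conv3 as frozen and conv4, conv5, fc6, fc7, fc8 as reinitialized, matching the description above.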
BiGAN is competitive with these contemporary visual feature learning methods.
2.3. PASCAL VOC
- The transferability of BiGAN representations to PASCAL VOC is evaluated.
- Classification models are trained with various portions of the AlexNet model frozen.
- In the fc8 column, only the linear classifier (a multinomial logistic regression) is learned — in the case of BiGAN, on top of randomly initialized fully connected (FC) layers fc6 and fc7.
- In the fc6–8 column, all three FC layers are trained fully supervised with all convolution layers frozen.
- Finally, in the all column, the entire network is “fine-tuned”.
- BiGAN outperforms other unsupervised (unsup.) feature learning approaches, including the GAN-based baselines, and despite its generality, is competitive with contemporary self-supervised (self-sup.) feature learning approaches specific to the visual domain.
- (If interested, please read the paper for more details.)
[2017 ICLR] [BiGAN]
Adversarial Feature Learning