# Review — AAE: Adversarial Autoencoders (GAN)

**GAN Combined With Autoencoder**

---

In this story, **Adversarial Autoencoders** (AAE), by the University of Toronto, Google Brain, and OpenAI, is briefly reviewed. Only the AAE variants are described. Ian Goodfellow, the first author of the original GAN paper, is among the authors. In this paper:

- AAE is **a probabilistic autoencoder that uses GAN. The decoder** of the adversarial autoencoder **learns a deep generative model that maps the imposed prior to the data distribution.**

This is a paper in **2016 ICLR** with over **1600 citations**. (Sik-Ho Tsang @ Medium)

# Outline

1. AAE: Network Architecture
2. AAE vs VAE
3. Supervised AAE
4. Semi-supervised AAE
5. Unsupervised AAE
6. Dimension Reduction for Data Visualization Using AAE

# 1. AAE: Network Architecture

- **The top row** is a standard **autoencoder** that **reconstructs an image** *x* from a latent code *z*. **The bottom row** diagrams a second network trained to discriminatively **predict whether a sample arises from the hidden code of the autoencoder or from a sampled distribution specified by the user.**
- Let *p*(*z*) be the prior distribution we want to impose on the codes, *q*(*z*|*x*) be an encoding distribution, and *p*(*x*|*z*) be the decoding distribution.
- Also let *p_d*(*x*) be the data distribution, and *p*(*x*) be the model distribution. The encoding function of the autoencoder *q*(*z*|*x*) defines an aggregated posterior distribution *q*(*z*) on the hidden code vector of the autoencoder as: *q*(*z*) = ∫ *q*(*z*|*x*) *p_d*(*x*) d*x*.
- It is the adversarial network that guides *q*(*z*) to match *p*(*z*).
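The aggregated posterior can be sketched with a quick Monte Carlo experiment: draw *x* from the data distribution, then *z* from the encoder. The Gaussian encoder *q*(*z*|*x*) = N(0.8·*x*, σ²) and the data distribution below are toy assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_aggregated_posterior(n=10000, sigma=0.5):
    """Draw z ~ q(z) by first drawing x ~ p_d(x), then z ~ q(z|x).

    Toy assumptions (not from the paper):
      p_d(x) = N(0, 1)              # data distribution
      q(z|x) = N(0.8 * x, sigma^2)  # hypothetical Gaussian encoder
    """
    x = rng.normal(0.0, 1.0, size=n)   # x ~ p_d(x)
    z = rng.normal(0.8 * x, sigma)     # z ~ q(z|x)
    return z

z = sample_aggregated_posterior()
# Here the mixture over the data collapses to z ~ N(0, 0.8^2 + 0.5^2).
print(z.mean(), z.std())
```

It is this q(z) — not any single q(z|x) — that the adversarial network pushes toward the prior p(z).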

- The autoencoder attempts to minimize the reconstruction error.
- The generator of the adversarial network is also the encoder of the autoencoder *q*(*z*|*x*). The encoder ensures the aggregated posterior distribution can fool the discriminative adversarial network into thinking that the hidden code *q*(*z*) comes from the true prior distribution *p*(*z*).

- Both the adversarial network and the autoencoder are **trained jointly with SGD** in two phases: the reconstruction phase and the regularization phase.
- The reconstruction phase trains the autoencoder.
- The regularization phase trains the GAN.
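The alternating two-phase schedule can be sketched with a deliberately tiny 1-D model. Everything below is an illustrative assumption (linear encoder/decoder, logistic discriminator, hand-derived gradients); the paper uses deep networks for all three parts.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: data p_d(x) = N(4, 1), imposed prior p(z) = N(0, 1).
# Encoder z = a*x + b, decoder xhat = c*z + d, discriminator D(z) = sigmoid(u*z + v).
a, b, c, d = 0.1, 0.0, 0.1, 0.0
u, v = rng.normal(size=2) * 0.1
lr = 0.01

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def recon_loss(x):
    return np.mean((c * (a * x + b) + d - x) ** 2)

x0 = rng.normal(4.0, 1.0, size=1000)
loss_before = recon_loss(x0)

for step in range(2000):
    x = rng.normal(4.0, 1.0, size=64)       # mini-batch from p_d(x)

    # --- Reconstruction phase: update encoder + decoder on MSE ---
    z = a * x + b
    r = c * z + d - x                       # reconstruction residual
    ga = np.mean(2 * r * c * x); gb = np.mean(2 * r * c)
    gc = np.mean(2 * r * z);     gd = np.mean(2 * r)
    a -= lr * ga; b -= lr * gb; c -= lr * gc; d -= lr * gd

    # --- Regularization phase, step 1: update the discriminator ---
    z_fake = a * x + b                      # codes from the encoder
    z_real = rng.normal(0.0, 1.0, size=64)  # samples from the prior p(z)
    for zb, lab in ((z_real, 1.0), (z_fake, 0.0)):
        s = sigmoid(u * zb + v)
        u -= lr * np.mean((s - lab) * zb)   # logistic-loss gradient
        v -= lr * np.mean(s - lab)

    # --- Regularization phase, step 2: update the encoder to fool D ---
    z_fake = a * x + b
    g = (sigmoid(u * z_fake + v) - 1.0) * u  # d(-log D(z))/dz
    a -= lr * np.mean(g * x)
    b -= lr * np.mean(g)

print("recon loss:", loss_before, "->", recon_loss(x0))
```

Note the encoder is updated in both phases: once to reconstruct, once as the GAN's generator.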

# 2. AAE vs VAE

- The hidden code *z* of the hold-out images is fit to (A/C) a 2-D Gaussian and (B/D) a mixture of 10 2-D Gaussians.
- **A**: The learned manifold of the **AAE** exhibits **sharp transitions**, indicating that the coding space is filled and exhibits no "holes".
- **C**: The **VAE** roughly matches the shape of a 2-D Gaussian distribution. However, **no data points map to several local regions** of the coding space, indicating that the VAE may not have captured the data manifold as well as the AAE.
- **B**: The AAE successfully matches the aggregated posterior with the prior distribution.
- **D**: In contrast, the VAE exhibits systematic differences from the mixture of 10 Gaussians.

# 3. Supervised AAE

- Before going into the semi-supervised AAE, a supervised AAE is tried, where **the architecture separates the class label information from the image style information.**
- The decoder utilizes both the one-hot vector identifying the label and the hidden code *z* to reconstruct the image.
- This architecture forces the network to retain all information independent of the label in the hidden code *z*.
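The label/style split at the decoder input can be sketched as a simple concatenation. The dimensions and the linear layout are illustrative assumptions; the paper's decoder is a deep network.

```python
import numpy as np

def decoder_input(label, z, num_classes=10):
    """Concatenate a one-hot class label with the style code z,
    so the decoder receives label and style as separate factors."""
    one_hot = np.zeros(num_classes)
    one_hot[label] = 1.0
    return np.concatenate([one_hot, z])

z_style = np.array([0.3, -1.2])      # hypothetical 2-D style code
inp = decoder_input(7, z_style)
print(inp.shape)                     # 10 label dims + 2 style dims
```

Since the label is handed to the decoder for free, the only way to reconstruct the image is to pack all label-independent (style) information into *z*.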

# 4. Semi-Supervised AAE

- The supervised AAE is further modified into the semi-supervised AAE above.
- The inference network of the AAE predicts both the discrete class variable *y* and the continuous latent variable *z* using the encoder *q*(*z*, *y*|*x*).

- The first adversarial network imposes a Categorical distribution on the label representation. This adversarial network ensures that the latent class variable *y* does not carry any style information.
- The second adversarial network imposes a Gaussian distribution on the style representation, which ensures the latent variable *z* is a continuous Gaussian variable.
- Both of the adversarial networks as well as the autoencoder are trained jointly with SGD in **three phases**: the reconstruction phase, the regularization phase, and the semi-supervised classification phase.
- In the **reconstruction phase**, the autoencoder **updates the encoder** *q*(*z*, *y*|*x*) and the decoder to minimize the reconstruction error.
- In the **regularization phase**, each of the adversarial networks first **updates its discriminative network** to tell apart the true samples from the generated samples. The adversarial networks then **update their generators** to confuse their discriminative networks.
- In the **semi-supervised classification phase**, the autoencoder updates *q*(*y*|*x*) to **minimize the cross-entropy cost** on a labeled mini-batch.

- It is worth mentioning that **all the AAE models are trained end-to-end**, whereas the semi-supervised VAE models have to be trained one layer at a time.
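The semi-supervised classification phase amounts to a standard cross-entropy step on q(y|x). Below is a minimal sketch with a hypothetical linear-softmax head standing in for the encoder's label branch (the real q(y|x) is a deep network).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(t):
    e = np.exp(t - t.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical linear head producing q(y|x): 5-D features, 3 classes.
W = rng.normal(size=(5, 3)) * 0.1
x = rng.normal(size=(8, 5))            # labeled mini-batch features
y = rng.integers(0, 3, size=8)         # their labels

ce_init = -np.log(softmax(x @ W)[np.arange(8), y]).mean()

for _ in range(200):
    q = softmax(x @ W)                 # q(y|x)
    grad = q.copy()
    grad[np.arange(8), y] -= 1.0       # d(cross-entropy)/d(logits)
    W -= 0.1 * (x.T @ grad) / 8        # SGD step on the labeled batch

ce = -np.log(softmax(x @ W)[np.arange(8), y]).mean()
print("cross-entropy:", ce_init, "->", ce)
```

In the full model this step runs interleaved with the reconstruction and regularization phases, so the label branch is shaped by both the adversarial Categorical prior and the labeled data.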

On the MNIST dataset with 100 and 1000 labels, the performance of AAEs is significantly better than that of VAEs.

# 5. Unsupervised AAE

The architecture is the semi-supervised AAE, with the difference that the semi-supervised classification stage is removed, so the network is no longer trained on any labeled mini-batch.

- As seen above, the tilted digit 1s and 6s (clusters 16 and 11) are put in separate clusters from the straight 1s and 6s (clusters 15 and 10).

- Once training is done, for each cluster *i*, the most frequent true label is assigned to all the points in cluster *i*. Then the test error can be estimated based on the class labels assigned to each cluster.

As shown in the above table, the AAE achieves classification error rates of 9.55% and 4.10% with 16 and 30 clusters, respectively.
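The majority-label evaluation protocol is easy to state in code. The arrays below are made-up stand-ins for the real cluster assignments:

```python
import numpy as np

def cluster_error_rate(clusters, true_labels):
    """Assign each cluster its most frequent true label,
    then report the fraction of points that end up mislabeled."""
    clusters = np.asarray(clusters)
    true_labels = np.asarray(true_labels)
    errors = 0
    for c in np.unique(clusters):
        members = true_labels[clusters == c]
        majority = np.bincount(members).argmax()  # most frequent true label
        errors += int((members != majority).sum())
    return errors / len(true_labels)

# Made-up example: cluster 0 is mostly 1s, cluster 1 is mostly 6s.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
labels   = [1, 1, 1, 6, 6, 6, 6, 1]
print(cluster_error_rate(clusters, labels))   # 2 of 8 mislabeled -> 0.25
```

Note that with more clusters than classes (e.g. 16 or 30 clusters for 10 digits), several clusters may legitimately map to the same label, as with the tilted and straight 1s above.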

# 6. Dimension Reduction for Data Visualization Using AAE

- The final *n*-dimensional representation is constructed by first mapping the one-hot label representation to an *n*-dimensional cluster head representation, and then adding the result to an *n*-dimensional style representation.
- *n* = 2 or 3 for data visualization.
- The cluster heads are learned by SGD with an additional cost function that penalizes the Euclidean distance between every two of them.
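The construction above can be sketched directly. The cluster-head matrix and dimensions below are illustrative assumptions; in the paper the heads are learned jointly with the rest of the network under the extra distance cost.

```python
import numpy as np

def final_representation(y_onehot, z_style, cluster_heads):
    """Map the one-hot label to its n-dim cluster head, then
    add the n-dim style code (n = 2 or 3 for visualization)."""
    head = cluster_heads.T @ y_onehot      # select this point's cluster head
    return head + z_style

n, num_clusters = 2, 10
cluster_heads = np.arange(num_clusters * n, dtype=float).reshape(num_clusters, n)
y = np.zeros(num_clusters); y[3] = 1.0    # point assigned to cluster 3
z = np.array([0.5, -0.5])                 # hypothetical 2-D style code
print(final_representation(y, z, cluster_heads))   # head [6., 7.] + style
```

Each point is thus plotted at its cluster head plus a small style offset, which is why distinct clusters appear as separated blobs in the 2-D visualization.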

- (There are details for this part, please feel free to read the paper directly.)
- Overall, we can see that AAE can achieve a **clean separation of the digit clusters.**

This paper is an early paper combining GANs with autoencoders. The main goal of using AAE in this paper is semi-supervised or unsupervised learning, rather than purely image-to-image translation or synthesizing images from latent vectors.

## Reference

[2016 ICLR] [AAE] Adversarial Autoencoders

## Generative Adversarial Network (GAN)

**Image Synthesis**: [GAN] [CGAN] [LAPGAN] [AAE] [DCGAN] [CoGAN] [SimGAN]
**Image-to-image Translation**: [Pix2Pix] [UNIT]
**Super Resolution**: [SRGAN & SRResNet] [EnhanceNet] [ESRGAN]
**Blur Detection**: [DMENet]
**Camera Tampering Detection**: [Mantini’s VISAPP’19]
**Video Coding**: [VC-LAPGAN] [Zhu TMM’20] [Zhong ELECGJ’21]