Adversarial Autoencoders on MNIST dataset Python Keras Implementation

Jan 23

You can find the source code of this post at https://github.com/alimirzaei/adverserial-autoencoder-keras

In this post, I implement three parts of the Adversarial Autoencoder paper [1]. The AAE can be seen as a combination of the ideas behind Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). A Variational Autoencoder is a generative autoencoder that, in addition to the reconstruction error, minimizes the KL-divergence between the distribution of the latent codes and a desired distribution (in most cases Gaussian). After the training phase, a new sample can be generated by drawing from the desired distribution and feeding the draw to the decoder.

Generative Adversarial Networks (GANs) are deep neural net architectures composed of two networks pitted against each other (hence the “adversarial”). The generator network tries to generate fake images that fool the discriminator, while the discriminator tries to tell fake and real images apart. GANs were introduced in a 2014 paper by Ian Goodfellow and other researchers at the University of Montreal, including Yoshua Bengio. Under this scheme, the generator learns to produce samples that follow the training data distribution.

The Adversarial Autoencoder (AAE) works like a Variational Autoencoder, but instead of minimizing the KL-divergence between the latent code distribution and the desired distribution, it uses a discriminator to distinguish latent codes from samples of the desired distribution. Under this scheme, the encoder learns to generate codes that resemble the desired distribution. To generate a new sample, you only need to draw from the desired distribution and feed the draw to the decoder. The scheme of the AAE is shown in the following figure:

AAE Scheme [1]

Adversarial Autoencoder

In this section, I implement the architecture in the above figure. The desired distribution for the latent space is assumed to be Gaussian. In all implementations in this post, I use Python as the programming language and Keras as the deep learning framework.

I implement the AAE scheme to generate MNIST images. The MNIST dataset contains 60,000 handwritten digit images, each of dimension 28x28, so the number of input features is 28x28 = 784.

The Encoder

As the paper suggests, I used two fully-connected hidden layers (each with 1000 neurons) in the encoder and an 8-neuron fully-connected output layer. The hidden layers use the ReLU activation function, and the output layer has no activation function (linear). The table below shows the details of the encoder.

________________________________________________________
Layer (type) Output Shape Param #
========================================================
flatten_1 (Flatten) (None, 784) 0
________________________________________________________
dense_1 (Dense) (None, 1000) 785000
________________________________________________________
dense_2 (Dense) (None, 1000) 1001000
________________________________________________________
dense_3 (Dense) (None, 8) 8008
========================================================
Total params: 1,794,008
Trainable params: 1,794,008
Non-trainable params: 0
________________________________________________________
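The encoder in the table above can be sketched in Keras as follows (a minimal sketch using the `tensorflow.keras` API; variable names are illustrative and not taken from the original repository):

```python
# Encoder sketch: 784 -> 1000 -> 1000 -> 8, ReLU hidden layers, linear output.
from tensorflow.keras.layers import Input, Flatten, Dense
from tensorflow.keras.models import Model

latent_dim = 8  # dimension of the latent code

inp = Input(shape=(28, 28))             # a raw MNIST image
x = Flatten()(inp)                      # 28x28 -> 784 input features
x = Dense(1000, activation='relu')(x)   # first hidden layer
x = Dense(1000, activation='relu')(x)   # second hidden layer
z = Dense(latent_dim)(x)                # linear output: the latent code
encoder = Model(inp, z, name='encoder')
```

Calling `encoder.summary()` should reproduce the parameter counts in the table (1,794,008 in total).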

The Decoder

For the decoder, I used the same architecture as the encoder, except that the output layer uses the sigmoid activation function. The following table shows the details of the decoder.

_______________________________________________________
Layer (type) Output Shape Param #
=======================================================
dense_4 (Dense) (None, 1000) 9000
_______________________________________________________
dense_5 (Dense) (None, 1000) 1001000
_______________________________________________________
dense_6 (Dense) (None, 784) 784784
_______________________________________________________
reshape_1 (Reshape) (None, 28, 28) 0
=======================================================
Total params: 1,794,784
Trainable params: 1,794,784
Non-trainable params: 0
_______________________________________________________
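The decoder mirrors the encoder; a minimal sketch of the table above (again, names are illustrative):

```python
# Decoder sketch: 8 -> 1000 -> 1000 -> 784 (sigmoid), reshaped to 28x28.
from tensorflow.keras.layers import Input, Dense, Reshape
from tensorflow.keras.models import Model

latent_dim = 8

z_in = Input(shape=(latent_dim,))            # a latent code
x = Dense(1000, activation='relu')(z_in)     # first hidden layer
x = Dense(1000, activation='relu')(x)        # second hidden layer
x = Dense(784, activation='sigmoid')(x)      # pixel intensities in [0, 1]
img = Reshape((28, 28))(x)                   # back to image shape
decoder = Model(z_in, img, name='decoder')
```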

The Discriminator

The discriminator's role is to classify latent codes as real or fake, so the output is a single neuron. The detailed architecture of the discriminator is shown in the following table. The activation function for the two hidden layers is ReLU, and for the output layer it is sigmoid.

_____________________________________________________
Layer (type) Output Shape Param #
=====================================================
dense_7 (Dense) (None, 1000) 9000
_____________________________________________________
dense_8 (Dense) (None, 1000) 1001000
_____________________________________________________
dense_9 (Dense) (None, 1) 1001
=====================================================
Total params: 1,011,001
Trainable params: 1,011,001
Non-trainable params: 0
_____________________________________________________
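The discriminator in the table above can be sketched as (names illustrative):

```python
# Discriminator sketch: 8-D latent code -> probability that the code is "real",
# i.e. drawn from the desired Gaussian rather than produced by the encoder.
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

latent_dim = 8

code_in = Input(shape=(latent_dim,))
x = Dense(1000, activation='relu')(code_in)
x = Dense(1000, activation='relu')(x)
p_real = Dense(1, activation='sigmoid')(x)   # single-neuron output
discriminator = Model(code_in, p_real, name='discriminator')
```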

Training

I trained the network with a batch size of 100. For each batch, the following steps are performed:

1- Train the discriminator: We feed 50 training images to the encoder and treat the obtained latent codes as fake samples (label=0). We also draw 50 samples from the desired distribution, an 8-D Gaussian, and treat them as real (label=1). The discriminator is then trained on these 100 latent codes and their corresponding labels, based on the classification error.
2- Train the autoencoder on reconstruction error: All 100 training images of the batch are fed to the autoencoder (encoder plus decoder), which is trained based on the reconstruction error (MSE).
3- Train the generator (encoder): In this phase, the encoder is trained to generate latent codes that look like the sampled ones; in other words, it should fool the discriminator. To this end, we freeze the discriminator weights and train the encoder and discriminator together so that the discriminator classifies the latent codes of the fed images as real (label=1).
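The three steps above can be sketched as one per-batch training routine. This is a minimal self-contained sketch (model and function names are illustrative; it uses the standard Keras pattern of compiling the discriminator before freezing it inside the adversarial model):

```python
import numpy as np
from tensorflow.keras.layers import Input, Flatten, Dense, Reshape
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import SGD, Adam

latent_dim, batch = 8, 100

# The three networks, with the same shapes as the tables above.
encoder = Sequential([Flatten(input_shape=(28, 28)),
                      Dense(1000, activation='relu'),
                      Dense(1000, activation='relu'),
                      Dense(latent_dim)])
decoder = Sequential([Dense(1000, activation='relu', input_shape=(latent_dim,)),
                      Dense(1000, activation='relu'),
                      Dense(784, activation='sigmoid'),
                      Reshape((28, 28))])
discriminator = Sequential([Dense(1000, activation='relu', input_shape=(latent_dim,)),
                            Dense(1000, activation='relu'),
                            Dense(1, activation='sigmoid')])
discriminator.compile(SGD(0.01), 'binary_crossentropy')

# Autoencoder: trained on reconstruction error (step 2).
img = Input(shape=(28, 28))
autoencoder = Model(img, decoder(encoder(img)))
autoencoder.compile(Adam(0.001), 'mse')

# Generator phase: encoder + frozen discriminator, target label = 1 (step 3).
discriminator.trainable = False
adversarial = Model(img, discriminator(encoder(img)))
adversarial.compile(SGD(0.01), 'binary_crossentropy')

def train_on_one_batch(images):              # images: (100, 28, 28), values in [0, 1]
    half = images[:batch // 2]
    # 1) Discriminator: encoder codes are fake (0), Gaussian draws are real (1).
    fake = encoder.predict(half, verbose=0)
    real = np.random.normal(size=(batch // 2, latent_dim))
    labels = np.array([0.] * (batch // 2) + [1.] * (batch // 2))
    discriminator.train_on_batch(np.vstack([fake, real]), labels)
    # 2) Autoencoder: minimize reconstruction MSE on the full batch.
    autoencoder.train_on_batch(images, images)
    # 3) Generator: train the encoder to make the frozen discriminator say "real".
    adversarial.train_on_batch(images, np.ones((batch, 1)))
```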

Results

The following figures show the generated images after 1000 and 4000 epochs. As shown, the images are sharp, not blurry like those of a Variational Autoencoder. SGD with a learning rate of 0.01 is used for the discriminator and generator phases, and Adam with a learning rate of 0.001 for the reconstruction phase.

The generated images after 1000 epochs
The generated images after 4000 epochs

Incorporating Label Information in the Adversarial Regularization

The previous section is completely unsupervised. In scenarios where the data is labeled, we can incorporate the label information in the adversarial training stage to better shape the distribution of the hidden code. The proposed scheme is shown in the following figure. This scheme tries to map the latent codes of each digit to a specific Gaussian distribution; in addition, the one-hot code of the label is fed to the discriminator. In our implementation, we used a mixture of 10 Gaussian distributions and trained the scheme in a semi-supervised manner. For this purpose, an extra dimension is added to the one-hot encoding (11 dimensions): if the label of a sample is not provided, the 11th element of the code is set to one and the corresponding prior sample is drawn from the whole mixture of Gaussians.
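Sampling from a label-conditioned mixture of 10 Gaussians can be sketched as follows. The placement of the component means (spread on a circle in the first two latent dimensions) is an illustrative choice, not necessarily the placement used in the paper or the repository:

```python
import numpy as np

def sample_prior(labels, latent_dim=8, n_classes=10, sigma=1.0, radius=10.0):
    """Draw one prior sample per label from a 10-component Gaussian mixture.

    Each digit class gets its own component; here the component means are
    placed on a circle of the given radius in the first two latent dims
    (an assumption for illustration).
    """
    labels = np.asarray(labels)
    z = np.random.normal(scale=sigma, size=(len(labels), latent_dim))
    angles = 2 * np.pi * labels / n_classes      # one angle per class
    z[:, 0] += radius * np.cos(angles)           # shift to the class mean
    z[:, 1] += radius * np.sin(angles)
    return z
```

For an unlabeled sample (the 11th one-hot element set), one would first draw a random class index and then sample from that component, which amounts to sampling from the whole mixture.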

AAE scheme for using the label information [1]

Implementation & Results

I trained the semi-supervised AAE using 40,000 labeled samples and 20,000 unlabeled samples. The architecture of the network is the same as the previous one. The conditionally generated samples are shown in the following image:

Sample conditional generated images after 3500 epochs

and the latent codes of some test images are plotted in the following figure. The details of the implementation are available in the source code.

The latent codes for test images after 3500 epochs

Supervised Adversarial Autoencoder

This section focuses on the fully supervised scenario and discusses an adversarial autoencoder architecture that can separate the class-label information from the image-style information.
To incorporate the label information, the paper alters the network architecture of the previous section by providing a one-hot encoding of the label to the decoder (see the following figure). The decoder uses both the one-hot label vector and the hidden code z to reconstruct the image. This architecture forces the network to retain all label-independent information in the hidden code z.
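The label-conditioned decoder can be sketched by concatenating the one-hot label with the hidden code before the first dense layer (a minimal sketch; layer sizes follow the earlier decoder, and the names are illustrative):

```python
# Conditional decoder sketch: [z (8) ++ one-hot label (10)] -> 28x28 image.
from tensorflow.keras.layers import Input, Dense, Reshape, Concatenate
from tensorflow.keras.models import Model

latent_dim, n_classes = 8, 10

z_in = Input(shape=(latent_dim,))        # style code from the encoder
y_in = Input(shape=(n_classes,))         # one-hot class label
x = Concatenate()([z_in, y_in])          # decoder sees code + label
x = Dense(1000, activation='relu')(x)
x = Dense(1000, activation='relu')(x)
x = Dense(784, activation='sigmoid')(x)
img = Reshape((28, 28))(x)
cond_decoder = Model([z_in, y_in], img, name='conditional_decoder')
```

Because the label is supplied separately, the encoder has no incentive to encode the digit identity in z, so z tends to capture style only.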

SAAE scheme [1]

Implementation & Results

I implemented the same architecture as the figure above. The output after 1000 epochs is shown in the following figure, where each row corresponds to the same digit and each column to the same style.

Conclusion

In this project, I implemented three schemes from the AAE paper: the original AAE, the semi-supervised variant, and the supervised variant. The details of the implementation are given in the source code. The optimization algorithms and their learning rates were chosen so that the networks converge correctly.

Source Code

https://github.com/alimirzaei/adverserial-autoencoder-keras

Bibliography

[1] Makhzani, Alireza, et al. “Adversarial autoencoders.” arXiv preprint arXiv:1511.05644 (2015).
