Building an Autoencoder with Tied Weights in Keras

Laurence Mayrand-Provencher
Dec 18, 2019 · 5 min read


What this tutorial covers

(1) Brief theory of autoencoders
(2) Benefits of tying weights
(3) Keras implementation of an autoencoder with parameter sharing

Definition of autoencoders

Autoencoders are artificial neural networks which can learn the important features in a dataset in an unsupervised manner. The goal of an autoencoder is essentially to learn an efficient way to reconstruct its input dataset, i.e., to learn to copy its inputs to its outputs. The main applications of autoencoders include denoising, dimensionality reduction, pretraining, and data generation.

An autoencoder is composed of an encoder, which converts the input data into a latent representation (bottleneck layer), and a decoder, which converts the latent representation into outputs (reconstructions).

A latent representation of a dataset is a simpler, more compact way to express it. Consider the following series: 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, … Are these numbers difficult to memorize? No; we notice a pattern, namely that the nth term is n², and this makes the series easy to memorize. Each neuron in the latent representation layer of an autoencoder can be used to store such a pattern.

Both the encoder and the decoder can comprise many layers of neurons, while the latent representation is typically a single layer. The overall architecture is similar to that of a multilayer perceptron, with the particularity that the output layer size must match that of the input layer.

Importance of bottleneck size

The latent representation layer size determines how much information an autoencoder can keep. It is usually significantly smaller than the input data. Restricting its size forces the autoencoder to find patterns in the inputs and to eliminate unimportant features.

The optimal number of neurons for the latent representation can be determined empirically. If it is too big (for example, the same size as the inputs), the autoencoder has no incentive to find patterns in the dataset, as it can essentially learn everything by heart. Such an autoencoder would not be useful. Conversely, if the latent representation layer is too small, the autoencoder will not be able to memorize enough patterns, resulting in lower-quality reconstructions (bad copies of the inputs).

Tying weights 101

An autoencoder with tied weights has decoder weights that are the transpose of the encoder weights; this is a form of parameter sharing, which reduces the number of parameters of the model.
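To make this concrete, here is a toy NumPy sketch (the layer sizes and the tanh activation are arbitrary choices for illustration): the decoder gets its own bias vector, but its kernel is simply the transpose of the encoder's, so the weight matrix is stored only once.

```python
import numpy as np

# Hypothetical encoder kernel mapping 784 inputs to a 196-unit latent layer
W = np.random.randn(784, 196)
b_enc = np.zeros(196)   # encoder biases
b_dec = np.zeros(784)   # decoder biases (not shared)

x = np.random.rand(1, 784)          # a flattened 28x28 input
h = np.tanh(x @ W + b_enc)          # encoder output: shape (1, 196)
x_hat = np.tanh(h @ W.T + b_dec)    # tied decoder reuses W transposed: shape (1, 784)
```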

Advantages of tying weights include faster training and a reduced risk of overfitting, while in many cases yielding performance comparable to an autoencoder without weight tying (Li et al. (2019)). It is therefore common practice to tie weights when building a symmetrical autoencoder.

Keras implementation of a tied-weights autoencoder

Implementing autoencoders in Keras is a very straightforward task. It gets more complicated for tied-weights autoencoders, as the built-in layers do not support this kind of parameter sharing. A simple solution to this problem is to create a custom layer class. We will use this approach to build a stacked autoencoder with tied weights, trained on the MNIST dataset.

The environment used for this tutorial is a Google Colaboratory Python 3 notebook with a GPU hardware accelerator. The TensorFlow version is 2.0.0.

Let’s start! We load our dataset and visualize a few examples. The MNIST dataset consists of grayscale images of handwritten digits, each 28 by 28 pixels.
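A minimal sketch of this step might look as follows (the variable names and the number of displayed digits are my own choices):

```python
import matplotlib.pyplot as plt
from tensorflow import keras

# Load MNIST: 60,000 training images and 10,000 test images, 28x28 pixels each
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()
print(X_train_full.shape)  # (60000, 28, 28)

# Display the first few digits
fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for ax, image in zip(axes, X_train_full[:5]):
    ax.imshow(image)  # no colormap specified, so Matplotlib uses its default
    ax.axis("off")
plt.show()
```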

Note: The grayscale digits appear colored because we are using Matplotlib's default colormap.

We normalize our data to make the training faster. We create functions that we will use to plot the reconstructions.
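A sketch of this step; the show_reconstructions helper below follows the shape of the version in the source cited below, and the 5,000-image validation split is my own choice.

```python
# Scale pixel values to [0, 1] and carve out a validation set
X_train_full = X_train_full / 255.0
X_test = X_test / 255.0
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]

def plot_image(image):
    plt.imshow(image, cmap="binary")
    plt.axis("off")

def show_reconstructions(model, images, n_images=5):
    """Plot n_images inputs (top row) and their reconstructions (bottom row)."""
    reconstructions = model.predict(images[:n_images])
    plt.figure(figsize=(n_images * 1.5, 3))
    for image_index in range(n_images):
        plt.subplot(2, n_images, 1 + image_index)
        plot_image(images[image_index])
        plt.subplot(2, n_images, 1 + n_images + image_index)
        plot_image(reconstructions[image_index])
    plt.show()
```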

Source for the show_reconstructions function: Géron (2019)

Now comes the fun part! We create a custom dense layer class which will have the transpose of the weights of the dense layer we pass it as an argument.
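A sketch of what this layer can look like, close to the version given in the source cited below: it creates its own bias vector in build(), but its kernel is the (transposed) kernel of the Dense layer passed to the constructor.

```python
import tensorflow as tf
from tensorflow import keras

class DenseTranspose(keras.layers.Layer):
    def __init__(self, dense, activation=None, **kwargs):
        self.dense = dense  # the Dense layer whose weights we reuse
        self.activation = keras.activations.get(activation)
        super().__init__(**kwargs)

    def build(self, batch_input_shape):
        # Only the biases are new parameters; the kernel is shared with self.dense
        self.biases = self.add_weight(name="bias",
                                      shape=[self.dense.input_shape[-1]],
                                      initializer="zeros")
        super().build(batch_input_shape)

    def call(self, inputs):
        # Multiply by the transpose of the tied layer's kernel
        z = tf.matmul(inputs, self.dense.weights[0], transpose_b=True)
        return self.activation(z + self.biases)
```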

Source: Géron (2019)

We then define the tied-weights autoencoder model using the Keras functional API. We name our layers so that we can pass them as arguments to the DenseTranspose class that we just created. Since we are using dense layers, we flatten the 2D images that we feed in. We define the layers for the encoder, the latent representation, and the decoder. We reshape the data at the end so that the output size matches the input size, and we finally instantiate the model.
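A sketch of the model definition; only the 196-neuron latent layer (discussed below) comes from the text, while the 392-neuron hidden layer and the SELU/sigmoid activations are my own assumptions:

```python
# Encoder layers; we keep references to them so they can be tied below
dense_1 = keras.layers.Dense(392, activation="selu", name="dense_1")
dense_2 = keras.layers.Dense(196, activation="selu", name="dense_2")  # latent layer

inputs = keras.layers.Input(shape=[28, 28])
x = keras.layers.Flatten()(inputs)                            # 28*28 = 784 features
x = dense_1(x)                                                # encoder
latent = dense_2(x)                                           # latent representation
x = DenseTranspose(dense_2, activation="selu")(latent)        # decoder, tied to dense_2
x = DenseTranspose(dense_1, activation="sigmoid")(x)          # decoder, tied to dense_1
outputs = keras.layers.Reshape([28, 28])(x)                   # back to the input shape

tied_ae = keras.Model(inputs=inputs, outputs=outputs)
```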

We use a latent representation (196 neurons) four times smaller than the input size (28 × 28 = 784), but for this simple task we could most likely have decreased the layer size even further. However, optimizing the autoencoder is not the point of this tutorial.

We fit our model. Due to its simplicity, the training process is fast.
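For example (the loss, optimizer, and number of epochs are my own assumptions; note that the targets are the inputs themselves):

```python
tied_ae.compile(loss="binary_crossentropy", optimizer="adam")
history = tied_ae.fit(X_train, X_train, epochs=10,
                      validation_data=(X_valid, X_valid))
```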

We then call our show_reconstructions function to compare the reconstructions with their respective inputs.
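For instance, on the validation images:

```python
show_reconstructions(tied_ae, X_valid)
```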

If a latent representation allows a good reconstruction of its inputs, it generally means that it has retained most of the information present in those inputs (Vincent et al. (2010)). Here, the input images (top) match the reconstructions (bottom) nicely.

This concludes the tutorial. Thanks for reading!

References

  1. P. Li, P.-M. Nguyen, On Random Deep Weight-Tied Autoencoders: Exact Asymptotic Analysis, Phase Transitions, and Implications to Training. ICLR, 2019.
  2. A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. 2nd edition, O’Reilly, September 2019, Chapter 17.
  3. P. Vincent et al., Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. Journal of Machine Learning Research 11 (2010) 3371–3408.
