Understanding autoencoders

Patrick Stewart
Published in Patrick’s notes
Jan 2, 2022
Figure 1: There is always amazement to be found in the complexity

Autoencoders are an unsupervised learning technique used across a range of real-life applications such as dimensionality reduction, feature extraction and outlier detection. In terms of prerequisites, while this article is not particularly technical, it does require a sound understanding of neural networks.

So, what are autoencoders?

Figure 2: autoencoder model architecture (image based on https://www.jeremyjordan.me/autoencoders/)

Autoencoders are a variant of neural network in which the input and output layers have the same number of neurons, and the network is trained so that its output reproduces its input. However, the architecture imposes a bottleneck, which forces the autoencoder to learn a compressed representation and means it can only faithfully copy input that resembles the training data.

If the input features are independent of each other, then an autoencoder approach is often challenging, but if there is structure within the data then it can be learned and represented through the network's bottleneck. Historically, autoencoders have been used for dimensionality reduction or feature extraction, but more recently they have also been applied to generative modelling.

We can think of the network as having two key components used to generate the output layer: an encoder, which compresses the input into the bottleneck representation, and a decoder, which reconstructs the output from that representation.

How are autoencoders trained?

Autoencoders are typically feedforward networks and can be trained with all the usual techniques, such as minibatch gradient descent with backpropagation. The learning process can simply be described as minimizing a loss function L(x, g(f(x))), where f is the encoder, g is the decoder, and L penalizes the reconstruction g(f(x)) for differing from the input x (mean squared error is a common choice).
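As a concrete illustration, here is a minimal sketch of such a training loop in PyTorch (the article does not specify a framework, so this is my own choice); the single-bottleneck architecture mirrors figure 2, and the layer sizes and random stand-in batch are illustrative only.

import torch
import torch.nn as nn

input_dim, code_dim = 784, 32                      # illustrative sizes only

model = nn.Sequential(
    nn.Linear(input_dim, code_dim), nn.ReLU(),     # encoder f
    nn.Linear(code_dim, input_dim),                # decoder g
)
loss_fn = nn.MSELoss()                             # L(x, g(f(x)))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(64, input_dim)                      # stand-in batch of training data
for _ in range(100):
    x_hat = model(x)                               # reconstruction g(f(x))
    loss = loss_fn(x_hat, x)                       # penalise x_hat for differing from x
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()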

Undercomplete autoencoder

As shown in figure 2, an undercomplete autoencoder simply has an architecture that forces a compressed representation of the input data to be learned. While the example in figure 2 involves just one hidden layer, this is not required, and deep autoencoders offer many advantages: depth can exponentially reduce the computational cost of representing some functions, and it can also reduce the amount of training data required to learn them. Figure 3 below gives an example with multiple hidden layers.

Figure 3: autoencoder model architecture with multiple hidden layers (image based on https://www.jeremyjordan.me/autoencoders/)
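To make the deeper architecture in figure 3 concrete, here is a sketch of a multi-layer undercomplete autoencoder, again in PyTorch and with illustrative layer sizes of my own choosing; it can be dropped in as the model in the training loop shown earlier.

import torch.nn as nn

encoder = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 16),                             # 16-dimensional bottleneck code
)
decoder = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 784),
)
model = nn.Sequential(encoder, decoder)            # deep undercomplete autoencoder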

One further advantage of autoencoders is that the neural network architecture means they can learn non-linear manifolds (a continuous, non-intersecting surface), whereas principal component analysis can only learn a hyperplane to represent the input data in a lower dimensionality.
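The difference can be seen on a toy dataset lying on a curved one-dimensional manifold. The sketch below uses scikit-learn, with an MLP trained to reproduce its own input standing in for a small autoencoder; all names, sizes, and the dataset are my own choices, and with enough training the non-linear model can follow the curve that PCA's single linear direction cannot.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = rng.uniform(-3, 3, size=(500, 1))
X = np.hstack([t, np.sin(t)]) + 0.05 * rng.normal(size=(500, 2))   # curved 1-D manifold in 2-D

pca = PCA(n_components=1).fit(X)                   # linear: best-fitting line
X_pca = pca.inverse_transform(pca.transform(X))

ae = MLPRegressor(hidden_layer_sizes=(32, 1, 32), activation='tanh',
                  max_iter=5000, random_state=0)
ae.fit(X, X)                                       # reproduce the input through a 1-unit bottleneck
X_ae = ae.predict(X)

print("PCA reconstruction MSE:", np.mean((X - X_pca) ** 2))
print("AE reconstruction MSE: ", np.mean((X - X_ae) ** 2))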

The ideal autoencoder?

Unfortunately, the ideal autoencoder needs to balance two competing concerns:

1. If the bottleneck is allowed too much capacity, then the autoencoder simply copies the input to the output without extracting any useful information into the bottleneck.

2. If the bottleneck is too restrictive, then the output cannot closely match the input.

Regularized autoencoders

Regularized autoencoders look to solve this problem so that any autoencoder architecture can be trained successfully. Rather than limiting model capacity, the code dimension and the capacity of the encoder and decoder are chosen according to the complexity of the distribution to be modelled. As a result, we instead minimize an objective with two terms: the first is the usual reconstruction loss, and the second is a penalty that punishes neuron activations, giving L(x, g(f(x))) + Ω(h), where h = f(x) is the code and Ω is the penalty term.

The most standard method is L1 regularization, which penalizes the absolute value of the vector of activations a in layer h for observation i. This can be fully summarised as minimizing L(x, g(f(x))) + λ Σ_i |a_i^(h)|, where λ controls the strength of the sparsity penalty.
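Here is a sketch of how the L1 penalty changes the training loop, again in PyTorch with illustrative sizes; lam is a hand-chosen weight on the sparsity term, not a value from the article.

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
decoder = nn.Linear(128, 784)
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
lam = 1e-4                                         # sparsity weight (illustrative)

x = torch.rand(64, 784)                            # stand-in batch
for _ in range(100):
    h = encoder(x)                                 # activations a of the hidden layer h
    x_hat = decoder(h)
    loss = nn.functional.mse_loss(x_hat, x) + lam * h.abs().sum()   # reconstruction + L1 penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()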

The other common sparsity penalty is based on KL-divergence, but it is not covered in this article.

Denoising autoencoders

A denoising autoencoder is a modification of the basic autoencoder in which the input is partially corrupted by adding noise to it. So why do we do this? To repair the partially destroyed input, the denoising autoencoder has to discover and capture the relationships between the dimensions of the input in order to infer the missing pieces. The corruption is controlled by a stochastic mapping.

A denoising autoencoder can therefore be summarised as minimizing L(x, g(f(x̃))), where x̃ is a copy of the input x that has been corrupted by a stochastic mapping x̃ ~ C(x̃ | x).

As the input data has been corrupted, the autoencoder cannot just learn a direct mapping from input to output. Instead, it has to learn a lower-dimensional manifold of the input data from which an output can be built. If this manifold accurately describes the input data, then the noise has effectively been cancelled out.
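A sketch of this idea in PyTorch (illustrative sizes, with Gaussian noise as one possible choice of stochastic corruption): the network sees a corrupted input, but the loss compares its output against the clean original.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 64), nn.ReLU(),                 # encoder
    nn.Linear(64, 784),                            # decoder
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(64, 784)                            # clean stand-in batch
for _ in range(100):
    x_noisy = x + 0.2 * torch.randn_like(x)        # stochastic corruption of the input
    x_hat = model(x_noisy)                         # reconstruct from the corrupted version
    loss = nn.functional.mse_loss(x_hat, x)        # but compare against the clean x
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()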

N.B. There are other autoencoder methods such as contractive autoencoders and variational autoencoders but they have been left out of this article for now.

Conclusion

This article provides an understanding of autoencoders and how they can be designed. While there are many different types of autoencoders, many of which have not been touched on in this article, the key when building these solutions is to force the model to learn a meaningful representation of the original input. This is why exploring multiple autoencoder designs is so important, and also why an autoencoder is typically only capable of reconstructing data similar to the class of observations it was trained on.

References

https://www.deeplearningbook.org/contents/autoencoders.html

https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html

https://medium.com/@venkatakrishna.jonnalagadda/sparse-stacked-and-variational-autoencoder-efe5bfe73b64

https://www.jeremyjordan.me/autoencoders/

https://arxiv.org/abs/1211.4246
