The Theoretical Basis of Autoencoders (Part I)

James Moody
4 min read · May 4, 2020

This article is written for a mathematical audience. We aim to explain why autoencoders work, and to prove rigorously that, for certain classes of input data, autoencoders converge to an optimal representation. We'll supply the necessary background where feasible, but a solid grounding in linear regression and statistics will be helpful.

The basic structure of an autoencoder is an “encoder” network chained before a “decoder” network, joined in the middle like two funnels glued together at their narrow ends. The encoder network takes high-dimensional input data and produces a low-dimensional “code” or “latent representation”. The decoder network then takes this code and tries to reconstruct, as best it can, the original input data.

A schematic for an autoencoder network.
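
To make the picture concrete, here is a minimal sketch of such a network in PyTorch. The use of PyTorch, the layer widths, the input dimension of 784, and the code dimension of 32 are all illustrative assumptions, not choices specified in this article.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """A small fully connected autoencoder: the encoder compresses the input
    to a low-dimensional code, and the decoder reconstructs the input from it."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoder: high-dimensional input -> low-dimensional code
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        # Decoder: low-dimensional code -> reconstruction of the input
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        code = self.encoder(x)              # latent representation
        reconstruction = self.decoder(code)  # attempt to rebuild the input
        return reconstruction

# Example usage: reconstruct a batch of 16 random inputs.
model = Autoencoder()
x = torch.randn(16, 784)
x_hat = model(x)
loss = nn.functional.mse_loss(x_hat, x)  # reconstruction error to be minimized
```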

The heuristic idea is that by training this network to minimize the error between its input and its output, we simultaneously train the encoder half of the network to isolate the most important information in the input and condense it into an efficient representation, and the decoder half to reconstruct the input from that representation.
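
Concretely, if we write E for the encoder, D for the decoder, and x_1, …, x_n for the training inputs, one common way to phrase this objective (a sketch using the usual squared-error loss; the article has not yet fixed a particular loss) is

```latex
\min_{E,\,D} \; \frac{1}{n} \sum_{i=1}^{n} \bigl\| x_i - D\bigl(E(x_i)\bigr) \bigr\|^2
```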

One way to view what is happening is that the autoencoder is learning how to compress the input data into a compact code and then decompress it with as little loss as possible.
