Autoencoders with PyTorch

Shivang Ganjoo
May 20, 2018 · 5 min read


Autoencoders are self-supervised: a specific instance of supervised learning where the targets are generated from the input data itself.

“Autoencoding” is a data compression algorithm where the compression and decompression functions are 1) data-specific, 2) lossy, and 3) learned automatically from examples rather than engineered by a human. Additionally, in almost all contexts where the term “autoencoder” is used, the compression and decompression functions are implemented with neural networks.

Today two interesting practical applications of autoencoders are data denoising, and dimensionality reduction for data visualization. With appropriate dimensionality and sparsity constraints, autoencoders can learn data projections that are more interesting than PCA or other basic techniques.

For 2D visualization specifically, t-SNE (pronounced “tee-snee”) is probably the best algorithm around, but it typically requires relatively low-dimensional data. So a good strategy for visualizing similarity relationships in high-dimensional data is to start by using an autoencoder to compress your data into a low-dimensional space (e.g. 32 dimensional), then use t-SNE for mapping the compressed data to a 2D plane.

Note: A nice parametric implementation of t-SNE in Keras was developed by Kyle McDonald and is available on Github. Otherwise scikit-learn also has a simple and practical implementation.
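For example, here is a minimal sketch of that workflow using the scikit-learn implementation. The stand-in encoder and random data are purely for illustration; in practice you would plug in your own trained encoder and real high-dimensional dataset.

```python
import torch
import torch.nn as nn
from sklearn.manifold import TSNE

# Stand-in 32-dimensional encoder and dummy data, for illustration only.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
X = torch.rand(1000, 784)

with torch.no_grad():
    codes = encoder(X)                        # (1000, 32) compressed codes

# t-SNE maps the 32-D codes down to 2-D for visualisation.
embedding_2d = TSNE(n_components=2).fit_transform(codes.numpy())
print(embedding_2d.shape)                     # (1000, 2), ready to scatter-plot
```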

Overcomplete Hidden Layers

When the number of hidden units is greater than or equal to the number of input/output units, information can simply be copied straight through the network. The network cheats us (it overfits) because it never has to do any real feature extraction.

Variations of AutoEncoders

The above problem is solved by the following variants, which also gain new functionality in the process.

Sparse Autoencoders

A sparse autoencoder adds a regularisation penalty to the loss function so that only a limited number of hidden units can be active at any one time.

Because the network cannot simply copy the input through all hidden units, it is forced to extract genuine features, and so it cannot cheat (no overfitting).
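One common way to impose this constraint is an L1 penalty on the hidden activations, added to the reconstruction loss. Below is a minimal sketch of that idea; the layer sizes, penalty weight and dummy batch are illustrative choices, not values from this article.

```python
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, n_inputs=784, n_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_inputs)

    def forward(self, x):
        h = self.encoder(x)              # hidden activations we want to keep sparse
        return self.decoder(h), h

model = SparseAE()
criterion = nn.MSELoss()
sparsity_weight = 1e-3                   # illustrative penalty weight

x = torch.rand(32, 784)                  # dummy batch for illustration
recon, h = model(x)
# Reconstruction loss plus an L1 penalty that discourages many active units.
loss = criterion(recon, x) + sparsity_weight * h.abs().mean()
loss.backward()
```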

Denoising Autoencoders

During training, some of the input values are randomly set to zero, and the network is trained to reconstruct the original, uncorrupted input. Because the corruption is random, this is a stochastic autoencoder.
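A minimal sketch of the corruption step; the stand-in autoencoder and the 30% corruption rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(                      # stand-in autoencoder for illustration
    nn.Linear(784, 64), nn.Sigmoid(), nn.Linear(64, 784)
)
criterion = nn.MSELoss()

x = torch.rand(32, 784)                     # clean inputs (dummy batch)
mask = (torch.rand_like(x) > 0.3).float()   # randomly zero out ~30% of the inputs
noisy_x = x * mask

recon = model(noisy_x)
# The target is the *clean* input: the network must learn to undo the corruption.
loss = criterion(recon, x)
loss.backward()
```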

Stacked AutoEncoders

They are made up of multiple encoding and decoding layers and can surpass the results of Deep Belief Networks.

Deep AutoEncoders

These are built from stacked Restricted Boltzmann Machines (RBMs).

Convolutional autoencoder

If our inputs are images, it makes sense to use convolutional neural networks (convnets) as encoders and decoders. In practical settings, autoencoders applied to images are always convolutional autoencoders — they simply perform much better.
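As a rough illustration (not code from this project), a small convolutional autoencoder for 28x28 single-channel images could look like the sketch below; the filter counts and kernel sizes are arbitrary choices.

```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: strided convolutions halve the spatial size at each step.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.ReLU(),
        )
        # Decoder: transposed convolutions upsample back to 28x28.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2,
                               padding=1, output_padding=1),        # 7x7 -> 14x14
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2,
                               padding=1, output_padding=1),        # 14x14 -> 28x28
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAE()
x = torch.rand(8, 1, 28, 28)        # dummy batch of images
print(model(x).shape)               # torch.Size([8, 1, 28, 28])
```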

Sequence-to-sequence autoencoder

If your inputs are sequences, rather than vectors or 2D images, then you may want to use as encoder and decoder a type of model that can capture temporal structure, such as an LSTM. To build an LSTM-based autoencoder, first use an LSTM encoder to turn your input sequences into a single vector that contains information about the entire sequence, then repeat this vector n times (where n is the number of timesteps in the output sequence), and run an LSTM decoder to turn this constant sequence into the target sequence.
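A minimal PyTorch sketch of that repeat-vector recipe; the feature and hidden sizes are illustrative.

```python
import torch
import torch.nn as nn

class Seq2SeqAE(nn.Module):
    """LSTM autoencoder: encode the sequence into one vector, repeat it, decode."""
    def __init__(self, n_features=10, hidden_size=32):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.decoder = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.output = nn.Linear(hidden_size, n_features)

    def forward(self, x):
        n_steps = x.size(1)
        _, (h, _) = self.encoder(x)              # h: (1, batch, hidden) sequence summary
        # Repeat the summary vector once per output timestep ("RepeatVector").
        repeated = h.squeeze(0).unsqueeze(1).repeat(1, n_steps, 1)
        decoded, _ = self.decoder(repeated)
        return self.output(decoded)              # reconstruct the original sequence

model = Seq2SeqAE()
x = torch.rand(4, 20, 10)                        # 4 sequences, 20 steps, 10 features
print(model(x).shape)                            # torch.Size([4, 20, 10])
```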

Variational autoencoder (VAE)

Variational autoencoders are a slightly more modern and interesting take on autoencoding. A VAE is a type of autoencoder with added constraints on the encoded representations being learned. More precisely, it is an autoencoder that learns a latent variable model for its input data. So instead of letting your neural network learn an arbitrary function, you are learning the parameters of a probability distribution modelling your data. If you sample points from this distribution, you can generate new input data samples: a VAE is a “generative model”.
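A bare-bones VAE sketch showing the two extra ingredients: the reparameterisation trick for sampling the latent code, and the KL-divergence term in the loss. Layer sizes and the latent dimension are illustrative.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, n_inputs=784, n_latent=16):
        super().__init__()
        self.enc = nn.Linear(n_inputs, 128)
        self.mu = nn.Linear(128, n_latent)        # mean of q(z|x)
        self.logvar = nn.Linear(128, n_latent)    # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(n_latent, 128), nn.ReLU(),
                                 nn.Linear(128, n_inputs), nn.Sigmoid())

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: sample z while keeping the graph differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

model = VAE()
x = torch.rand(32, 784)
recon, mu, logvar = model(x)
# Loss = reconstruction term + KL divergence to the unit-Gaussian prior.
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
loss = nn.functional.mse_loss(recon, x, reduction='sum') + kl
loss.backward()
```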

Training a Recommendation System Model

Why use PyTorch?

A network written in PyTorch is a Dynamic Computational Graph (DCG): the graph is rebuilt on every forward pass, which lets you change the structure of the network on the fly (see the sketch after the list below).

  1. Dynamic data structures inside the network. You can have any number of inputs at any given point of training in PyTorch. Lists? Stacks? No problem.
  2. Networks are modular. Each part is implemented separately and can be debugged separately, unlike a monolithic TensorFlow graph.
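As a small illustration of point 1, here is a hypothetical module whose forward pass uses ordinary Python control flow, so the graph can differ on every call.

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    """The forward pass uses plain Python control flow; since the graph is
    rebuilt on every call, data-dependent loops and branches are fine."""
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(16, 16)

    def forward(self, x, n_repeats):
        # Apply the same layer a variable number of times, decided at runtime.
        for _ in range(n_repeats):
            x = torch.relu(self.layer(x))
        return x

model = DynamicNet()
print(model(torch.rand(4, 16), n_repeats=2).shape)   # torch.Size([4, 16])
print(model(torch.rand(4, 16), n_repeats=5).shape)   # same module, different graph
```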

Data Wrangling

The data comes from the MovieLens 1M dataset, which contains one million movie ratings.

Pre-processing

For training, the ratings are arranged in a 2-D matrix with users as rows and movies as columns; entries where a user did not rate a movie are filled with zero. The train and test matrices are then converted into torch tensors.
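A sketch of that conversion, assuming the ratings have already been parsed into (user, movie, rating) triples; the helper name and the tiny example data are hypothetical.

```python
import numpy as np
import torch

def to_matrix(ratings, nb_users, nb_movies):
    """Turn (user, movie, rating) triples into a users x movies matrix.
    Unrated movies stay at 0. IDs are assumed to start at 1."""
    data = np.zeros((nb_users, nb_movies), dtype=np.float32)
    for user, movie, rating in ratings:
        data[user - 1, movie - 1] = rating
    return torch.from_numpy(data)

# Illustrative triples; in the real project these come from the MovieLens files.
train_ratings = [(1, 1, 5.0), (1, 3, 3.0), (2, 2, 4.0)]
training_set = to_matrix(train_ratings, nb_users=2, nb_movies=3)
print(training_set)
# tensor([[5., 0., 3.],
#         [0., 4., 0.]])
```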

Activation Functions

The most common activation function used in autoencoders is tanh, but for this project I found sigmoid to be the best among sigmoid, Leaky ReLU, ReLU and tanh.

Architecture

It is a stacked autoencoder with two encoding and two decoding layers. I experimented with different numbers of units per layer; the most accurate configuration for this model was input features, 20, 10, 20 and output (the same size as the input).

MSE Loss is the most suitable loss for this task.

I tried a few optimizers like SGD, Adam, AdaDelta and RMSprop. RMSprop gave the best results for this model.
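Putting the pieces above together, the model could be sketched as follows. This is not the exact source code: the class name, learning rate and movie count are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class SAE(nn.Module):
    """Stacked autoencoder: nb_movies -> 20 -> 10 -> 20 -> nb_movies."""
    def __init__(self, nb_movies):
        super().__init__()
        self.fc1 = nn.Linear(nb_movies, 20)   # encoding layer 1
        self.fc2 = nn.Linear(20, 10)          # encoding layer 2
        self.fc3 = nn.Linear(10, 20)          # decoding layer 1
        self.fc4 = nn.Linear(20, nb_movies)   # decoding layer 2
        self.activation = nn.Sigmoid()

    def forward(self, x):
        x = self.activation(self.fc1(x))
        x = self.activation(self.fc2(x))
        x = self.activation(self.fc3(x))
        return self.fc4(x)                    # predicted ratings for every movie

nb_movies = 3952                              # illustrative movie count
sae = SAE(nb_movies)
criterion = nn.MSELoss()
optimizer = optim.RMSprop(sae.parameters(), lr=0.01)   # illustrative learning rate
```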

The model was trained for 200 epochs. A mean-corrector constant was used when computing the training and testing losses, to account for the movies that each user did not rate (the zero-filled entries should not count towards the average error).
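A hedged sketch of one way such a training loop with a mean corrector can be written, continuing from the pre-processing and model sketches above; the exact details may differ from the project's source code.

```python
# training_set, sae, criterion, optimizer and nb_movies come from the sketches above.
nb_users = training_set.size(0)

for epoch in range(1, 201):                           # 200 epochs
    train_loss, counter = 0.0, 0.0
    for user in range(nb_users):
        user_ratings = training_set[user:user + 1]    # shape (1, nb_movies)
        target = user_ratings.clone()
        nb_rated = torch.sum(target > 0).item()
        if nb_rated > 0:                              # skip users with no ratings
            output = sae(user_ratings)
            output = output * (target > 0).float()    # ignore movies the user never rated
            loss = criterion(output, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # Mean corrector: rescale so the error is averaged over rated movies only.
            mean_corrector = nb_movies / float(nb_rated)
            train_loss += (loss.item() * mean_corrector) ** 0.5
            counter += 1.0
    print(f'epoch {epoch}  loss {train_loss / counter:.4f}')
```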

Results

A validation loss of 0.8852 and a test loss of 0.9602 were obtained, meaning that the model can predict how much a particular person will like a movie with an error of less than one rating point. This is a very good result.

Source Code

My other repositories can be found at https://github.com/lightsalsa251

Additional Readings

  1. Deep Learning Tutorial - Sparse Autoencoder, by Chris McCormick (2014)
  2. Deep Learning - Sparse Autoencoder, by Erik Wilkinson (2014)
  3. k-Sparse Autoencoders, by Alireza Makhzani (2014)
  4. Reducing the Dimensionality of Data with Deep Autoencoders, by Geoffrey Hinton
  5. Contractive Autoencoders, by Salah Rifai
  6. Extracting and Composing Robust Features with Denoising Autoencoders, by Pascal Vincent
  7. Stacked Denoising Autoencoders, by Pascal Vincent
