Creating an Autoencoder with PyTorch

Samrat Sahoo
Published in Analytics Vidhya
5 min read · Nov 1, 2020
Autoencoder Architecture

Autoencoders are a fundamental tool for learning simpler representations of complex data. They use the well-known encoder-decoder architecture, which lets the network capture the key features of its input. If you are new to autoencoders and would like to learn more, I recommend this well-written article on autoencoders: https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798

In this article, we will implement an autoencoder using PyTorch and then apply it to an image from the MNIST dataset.

Imports

For this project, you will need one built-in Python library:

import os

You will also need the following technical libraries:

import numpy as np
import torch
import torchvision
from torch import nn
from torch.autograd import Variable
from torchvision.datasets import MNIST
from torchvision.transforms import transforms
from torchvision.utils import save_image
import matplotlib.pyplot as plt

Autoencoder Class __init__

For the autoencoder class, we will extend the nn.Module class with the following class header:

class Autoencoder(nn.Module):

__init__ method header

For the init, the parameters are the number of epochs we want to train for, the batch size for the data, and the learning rate. The method header should look like this:

def __init__(self, epochs=100, batchSize=128, learningRate=1e-3):

We will then want to call the super method:

super(Autoencoder, self).__init__()

Initializing Network Parameters

For this network, we only need to initialize the epochs, batch size, and learning rate:

self.epochs = epochs
self.batchSize = batchSize
self.learningRate = learningRate

Encoder Network Architecture

The encoder network is defined inside the init method for modularity. The encoder has 4 linear layers with a decreasing number of nodes in each layer, separated by 3 ReLU activation functions. With this in mind, our encoder network will look something like this:

self.encoder = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(True),
    nn.Linear(128, 64),
    nn.ReLU(True),
    nn.Linear(64, 12),
    nn.ReLU(True),
    nn.Linear(12, 3)
)
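To make the compression concrete, here is a small sketch that builds the same encoder on its own and pushes a batch through it. The batch of 16 random vectors is an assumption for illustration, not MNIST data. Each 784-value input comes out as a 3-value code.

```python
import torch
from torch import nn

# Same layer sizes as the encoder above: 784 -> 128 -> 64 -> 12 -> 3.
encoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(True),
    nn.Linear(128, 64), nn.ReLU(True),
    nn.Linear(64, 12), nn.ReLU(True),
    nn.Linear(12, 3),
)

# Hypothetical batch of 16 flattened 28x28 images (random values, not MNIST).
batch = torch.randn(16, 784)
code = encoder(batch)

print(batch.shape)  # one row of 784 values per image
print(code.shape)   # one row of 3 values per image
```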

Decoder Network Architecture

The decoder network is also defined inside the init method. The decoder mirrors the encoder: it has 4 linear layers with an increasing number of nodes in each layer. We will also use 3 ReLU activation functions as well as 1 tanh activation function on the output. With this in mind, our decoder network will look something like this:

self.decoder = nn.Sequential(
    nn.Linear(3, 12),
    nn.ReLU(True),
    nn.Linear(12, 64),
    nn.ReLU(True),
    nn.Linear(64, 128),
    nn.ReLU(True),
    nn.Linear(128, 784),
    nn.Tanh()
)
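The final tanh matters because it bounds every output value to the range [-1, 1]. The sketch below builds the same decoder on its own and feeds it some made-up latent codes (an assumption for illustration) to show the output shape and range.

```python
import torch
from torch import nn

# Same layer sizes as the decoder above: 3 -> 12 -> 64 -> 128 -> 784.
decoder = nn.Sequential(
    nn.Linear(3, 12), nn.ReLU(True),
    nn.Linear(12, 64), nn.ReLU(True),
    nn.Linear(64, 128), nn.ReLU(True),
    nn.Linear(128, 784), nn.Tanh(),
)

# Hypothetical latent codes, scaled up on purpose to stress the output range.
codes = torch.randn(16, 3) * 10
reconstruction = decoder(codes)

print(reconstruction.shape)
# Tanh keeps every value within [-1, 1], however large the codes are.
print(reconstruction.min().item(), reconstruction.max().item())
```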

Data and Data Loaders

The dataset and data loader for our training data are also created within the init method. We convert the images to tensors and then normalize them using transforms from the torchvision library.

self.imageTransforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])
])

# download=True fetches MNIST into ./Data if it is not already there
self.data = MNIST('./Data', transform=self.imageTransforms, download=True)
self.dataLoader = torch.utils.data.DataLoader(dataset=self.data, batch_size=self.batchSize, shuffle=True)

Optimizer and Criterion

For this network, we will use the Adam optimizer along with mean squared error (MSE) as our loss function.

self.optimizer = torch.optim.Adam(self.parameters(), lr=self.learningRate, weight_decay=1e-5)
self.criterion = nn.MSELoss()

Complete Autoencoder __init__

The complete autoencoder init method can be defined as follows

Forward Method

The forward method takes a numerically represented image as a tensor, x, and feeds it through the encoder and then the decoder network. It can very simply be defined as:
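A minimal sketch of such a forward method, shown on a stripped-down class so it runs without any dataset. The class name AutoencoderSketch and the random input batch are assumptions for illustration. The layers match the encoder and decoder defined earlier.

```python
import torch
from torch import nn


class AutoencoderSketch(nn.Module):
    """Hypothetical stand-in holding only the encoder and decoder."""

    def __init__(self):
        super(AutoencoderSketch, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 128), nn.ReLU(True),
            nn.Linear(128, 64), nn.ReLU(True),
            nn.Linear(64, 12), nn.ReLU(True),
            nn.Linear(12, 3),
        )
        self.decoder = nn.Sequential(
            nn.Linear(3, 12), nn.ReLU(True),
            nn.Linear(12, 64), nn.ReLU(True),
            nn.Linear(64, 128), nn.ReLU(True),
            nn.Linear(128, 784), nn.Tanh(),
        )

    def forward(self, x):
        x = self.encoder(x)  # compress to the 3-value code
        x = self.decoder(x)  # expand back to 784 values
        return x


sketch = AutoencoderSketch()
batch = torch.randn(4, 784)  # hypothetical flattened images
out = sketch(batch)          # calling the module runs forward()
print(out.shape)
```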

Train Model Method

For this method, we will have the following method header:

def trainModel(self):

We will then repeat the training process once for each epoch:

for epoch in range(self.epochs):

Then we will need to iterate through the data in the data loader using:

for data in self.dataLoader:

We will need to unpack the images from each batch (the labels are not used) and flatten each image into a 784-element vector using:

image, _ = data
image = image.view(image.size(0), -1)
image = Variable(image)

Finally, we will need to output predictions, calculate the loss based on our criterion, and use back propagation. This can very simply be done through:

# Predict
output = self(image)

# Loss
loss = self.criterion(output, image)
# Back propagation
self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()

We can then print the loss and epoch the training process is on using:

print('epoch [{}/{}], loss:{:.4f}'.format(epoch + 1, self.epochs, loss.item()))

The complete training method would look something like this:
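Here is a sketch of the loop assembled from the steps above. To keep it runnable without downloading MNIST, it is written as a standalone function rather than a method. The function name train_model, the tiny stand-in network, and the random data are all assumptions for illustration. Printing once per epoch, after the inner loop, is also a choice made here.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_model(model, data_loader, optimizer, criterion, epochs):
    """Train an autoencoder and return the last batch loss of each epoch."""
    losses = []
    for epoch in range(epochs):
        for data in data_loader:
            image, _ = data                        # labels are not needed
            image = image.view(image.size(0), -1)  # flatten to (batch, 784)

            output = model(image)                  # predict
            loss = criterion(output, image)        # compare with the input

            optimizer.zero_grad()                  # back propagation
            loss.backward()
            optimizer.step()

        print('epoch [{}/{}], loss:{:.4f}'.format(epoch + 1, epochs, loss.item()))
        losses.append(loss.item())
    return losses


# Hypothetical stand-ins: random "images" in [-1, 1] and a tiny network.
torch.manual_seed(0)
images = torch.rand(256, 1, 28, 28) * 2 - 1
labels = torch.zeros(256, dtype=torch.long)
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(784, 3), nn.ReLU(True), nn.Linear(3, 784), nn.Tanh())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
losses = train_model(model, loader, optimizer, nn.MSELoss(), epochs=2)
```

Inside the class, the same body would use self, self.dataLoader, self.optimizer, self.criterion, and self.epochs in place of the function arguments.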

Test an Image Method

Finally, we can use our newly created network to test whether our autoencoder actually works. We can write this method to use a sample image from our data to view the results:

Main Method

For the main method, we would first need to initialize an autoencoder:

model = Autoencoder()

We would then need to train the network:

model.trainModel()

Then we would need to create a new tensor that is the output of the network for an image chosen from MNIST. We will also need to reshape the output to 28 by 28 so we can view it as an image. For the sake of simplicity, the index I will use is 7777.

tensor = model.testImage(7777)
tensor = torch.reshape(tensor, (28, 28))

We will then need to create a toImage object which we can then pass the tensor through so we can actually view the image. We can also save the image afterward:

toImage = torchvision.transforms.ToPILImage()
image = toImage(tensor)
image.save('After.png')

Our complete main method should look like:

Results of Autoencoder

Our before image looked something like this:

After we applied the autoencoder, our image looked something like this:

As you can see, the key features of the 8 have been extracted, and the result is a simpler representation of the original 8, so it is safe to say the autoencoder worked pretty well! My complete code can be found on GitHub.

If you enjoyed this or found this helpful, I would appreciate it if you could give it a clap and give me a follow! Thank you for reading!
