Reconstruct corrupted data using a Denoising Autoencoder (Python code)

This article will help you demystify denoising with autoencoders in a few minutes!

Garima Nishad
Analytics Vidhya
6 min read · Aug 3, 2020

Plain autoencoders aren’t too useful in practice, but they can denoise images quite successfully when trained with noisy images as input and the clean originals as targets. We can generate noisy images by adding Gaussian noise to the training images, then clipping the values to lie between 0 and 1.

“Denoising auto-encoder forces the hidden layer to extract more robust features and restrict it from merely learning the identity. Autoencoder reconstructs the input from a corrupted version of it.”

A denoising auto-encoder does two things:

  • Encode the input (preserve the information about the data)
  • Undo the effect of a corruption process stochastically applied to the input of the auto-encoder.
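The corruption process mentioned above can be sketched without any framework. This is a minimal illustration, assuming pixel intensities in [0, 1] and Gaussian noise; the function name `corrupt`, the `seed` parameter, and the sample pixel values are hypothetical and chosen only for this example.

```python
import random

def corrupt(pixels, noise_factor=0.5, seed=None):
    """Add zero-mean, unit-variance Gaussian noise scaled by noise_factor,
    then clip each value back into the valid intensity range [0, 1]."""
    rng = random.Random(seed)
    noisy = []
    for p in pixels:
        value = p + noise_factor * rng.gauss(0.0, 1.0)
        noisy.append(min(1.0, max(0.0, value)))
    return noisy

# hypothetical row of pixel intensities, for illustration only
clean_row = [0.0, 0.2, 0.9, 1.0, 0.0]
noisy_row = corrupt(clean_row, noise_factor=0.5, seed=0)
print(noisy_row)
```

The autoencoder receives something like `noisy_row` as input and is asked to reproduce `clean_row`, which is why it cannot get away with simply copying its input.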

For the depiction of the denoising capabilities of Autoencoders, we’ll use noisy images as input and the original, clean images as targets.

Example: Top image is input, and the bottom image is the target.

Problem Statement:

Build the model for the denoising autoencoder, adding deeper and additional layers to the network. Using the MNIST dataset, add noise to the data, then define and train an autoencoder to denoise the images.

Solution:

Import Libraries and Load Dataset: Given below is the standard procedure to import the libraries and load the MNIST dataset.

import torch
import numpy as np
from torchvision import datasets
import torchvision.transforms as transforms
# convert data to torch.FloatTensor
transform = transforms.ToTensor()
# load the training and test datasets
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False, download=True, transform=transform)
# number of subprocesses to use for data loading (0 = load in the main process)
num_workers = 0
# how many samples per batch to load
batch_size = 20
# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)

Visualize the Data: You can use the standard matplotlib library to check whether you’ve loaded your dataset correctly.

import matplotlib.pyplot as plt
%matplotlib inline

# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()
# get one image from the batch
img = np.squeeze(images[0])
fig = plt.figure(figsize = (5,5))
ax = fig.add_subplot(111)
ax.imshow(img, cmap='gray')

The output should be something like this:

Network Architecture: The most crucial part is designing the network. Because denoising is a hard problem for the network, we’ll use deeper convolutional layers here. It is recommended to start with a depth of 32 for the convolutional layers in the encoder, and to use the same depths in reverse order through the decoder.

import torch.nn as nn
import torch.nn.functional as F

# define the NN architecture
class ConvDenoiser(nn.Module):
    def __init__(self):
        super(ConvDenoiser, self).__init__()
        ## encoder layers ##
        # conv layer (depth from 1 --> 32), 3x3 kernels
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        # conv layer (depth from 32 --> 16), 3x3 kernels
        self.conv2 = nn.Conv2d(32, 16, 3, padding=1)
        # conv layer (depth from 16 --> 8), 3x3 kernels
        self.conv3 = nn.Conv2d(16, 8, 3, padding=1)
        # pooling layer to reduce x-y dims by two; kernel and stride of 2
        self.pool = nn.MaxPool2d(2, 2)

        ## decoder layers ##
        # transpose layers with a stride of 2 upsample the spatial dims;
        # kernel_size=3 here takes the 3x3 map to 7x7 (a kernel of 2 would give 6x6)
        self.t_conv1 = nn.ConvTranspose2d(8, 8, 3, stride=2)
        # two more transpose layers with a kernel of 2: 7x7 --> 14x14 --> 28x28
        self.t_conv2 = nn.ConvTranspose2d(8, 16, 2, stride=2)
        self.t_conv3 = nn.ConvTranspose2d(16, 32, 2, stride=2)
        # one, final, normal conv layer to decrease the depth
        self.conv_out = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, x):
        ## encode ##
        # add hidden layers with relu activation function
        # and maxpooling after
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        # add second hidden layer
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        # add third hidden layer
        x = F.relu(self.conv3(x))
        x = self.pool(x)  # compressed representation

        ## decode ##
        # add transpose conv layers, with relu activation function
        x = F.relu(self.t_conv1(x))
        x = F.relu(self.t_conv2(x))
        x = F.relu(self.t_conv3(x))
        # final normal conv layer; a sigmoid keeps the output in [0, 1]
        # (torch.sigmoid replaces the deprecated F.sigmoid)
        x = torch.sigmoid(self.conv_out(x))

        return x

# initialize the NN
model = ConvDenoiser()
print(model)
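The spatial sizes in this architecture can be checked by hand. The sketch below applies the standard output-size formulas for convolution/pooling and for transposed convolution (assuming dilation 1 and no output padding) to a 28x28 MNIST image; the helper names `conv_out` and `conv_transpose_out` are hypothetical and are not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride=1, padding=0):
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

size = 28
trace = [size]

# encoder: three times (3x3 conv with padding 1, then 2x2 max pool with stride 2)
for _ in range(3):
    size = conv_out(size, kernel=3, padding=1)  # size unchanged
    size = conv_out(size, kernel=2, stride=2)   # size halved, rounded down
    trace.append(size)

# decoder: transposed convs with kernels 3, 2, 2 and stride 2
for kernel in (3, 2, 2):
    size = conv_transpose_out(size, kernel=kernel, stride=2)
    trace.append(size)

# final 3x3 conv with padding 1 keeps the size
final_size = conv_out(size, kernel=3, padding=1)

print(trace)       # [28, 14, 7, 3, 7, 14, 28]
print(final_size)  # 28
```

The third pooling step rounds 7 down to 3, which is why the first transposed convolution needs a kernel of 3 rather than 2 to get back to 7x7.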

Training: Training the network takes significantly less time on a GPU, so I would recommend using one. Here we are only concerned with the training images, which we can get from the train_loader; the labels are not needed.

# specify loss function
criterion = nn.MSELoss()
# specify optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# number of epochs to train the model
n_epochs = 20
# for adding noise to images
noise_factor = 0.5

for epoch in range(1, n_epochs+1):
    # monitor training loss
    train_loss = 0.0

    ###################
    # train the model #
    ###################
    for data in train_loader:
        # _ stands in for labels, here
        # no need to flatten images
        images, _ = data

        ## add random noise to the input images
        noisy_imgs = images + noise_factor * torch.randn(*images.shape)
        # Clip the images to be between 0 and 1
        # (torch.clamp keeps the data as a tensor, unlike np.clip)
        noisy_imgs = torch.clamp(noisy_imgs, 0., 1.)

        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        ## forward pass: compute predicted outputs by passing *noisy* images to the model
        outputs = model(noisy_imgs)
        # calculate the loss
        # the "target" is still the original, not-noisy images
        loss = criterion(outputs, images)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss (batch mean loss times batch size)
        train_loss += loss.item()*images.size(0)

    # print avg training statistics
    # divide by the number of samples so the value is a true per-image average
    train_loss = train_loss/len(train_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(
        epoch,
        train_loss
    ))

In this case, we add noise to the images and feed these noisy_imgs to our model. The model produces reconstructed images based on the noisy input. But we want it to produce clean images, so when we calculate the loss, we still compare the reconstructed outputs to the original images!

Because we’re comparing pixel values in the output and target images, it is best to use a loss that is meant for a regression task. Regression is about comparing quantities rather than probabilities. So, in this case, I’ll use MSELoss.
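To make the loss concrete, here is a small framework-free sketch of mean squared error, which is what nn.MSELoss computes with its default mean reduction. The function name `mse` and the four-pixel values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def mse(outputs, targets):
    """Mean of the squared element-wise differences between two equal-length sequences."""
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must have the same length")
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

# hypothetical clean target and reconstruction, four pixels each
target = [0.0, 1.0, 1.0, 0.0]
reconstruction = [0.5, 1.0, 0.5, 0.0]

# two pixels are off by 0.5: (0.25 + 0 + 0.25 + 0) / 4
print(mse(reconstruction, target))  # 0.125
```

The loss reaches zero only when every reconstructed pixel matches the clean target, which is exactly the behaviour we want to reward.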

Results: Here let’s add noise to the test images and pass them through the autoencoder.

# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)
# add noise to the test images
noisy_imgs = images + noise_factor * torch.randn(*images.shape)
noisy_imgs = torch.clamp(noisy_imgs, 0., 1.)
# get sample outputs
output = model(noisy_imgs)
# prep images for display
noisy_imgs = noisy_imgs.numpy()
# output is resized into a batch of images
output = output.view(batch_size, 1, 28, 28)
# use detach when it's an output that requires_grad
output = output.detach().numpy()
# plot the first ten input images and then reconstructed images
fig, axes = plt.subplots(nrows=2, ncols=10, sharex=True, sharey=True, figsize=(25,4))
# input images on top row, reconstructions on bottom
for batch, row in zip([noisy_imgs, output], axes):
    for img, ax in zip(batch, row):
        ax.imshow(np.squeeze(img), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

The model does a surprisingly good job of removing the noise, even when it is difficult to tell from the noisy input what the original digit is.
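The visual impression can be backed by a number. One common choice, which is not part of the original code, is the peak signal-to-noise ratio (PSNR): higher values mean the image is closer to the clean reference. The sketch below assumes pixel values in [0, 1]; the function name `psnr` and the sample values are hypothetical.

```python
import math

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in decibels between two equal-length pixel sequences."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# hypothetical pixel values, for illustration only
clean = [0.0, 1.0, 1.0, 0.0]
noisy = [0.5, 1.0, 0.5, 0.0]
denoised = [0.1, 0.9, 0.9, 0.1]

print(psnr(clean, noisy))     # lower: far from the clean image
print(psnr(clean, denoised))  # higher: closer to the clean image
```

Comparing the PSNR of the noisy input with the PSNR of the model output, both measured against the clean test image, would show how much the autoencoder actually improves each image.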

Code: You can find this code on my GitHub: Denoising Autoencoder

Conclusion: In this article, we learnt how to code a denoising autoencoder in Python. We also learnt that denoising is a hard problem for the network, which is why deeper convolutional layers are used to get good reconstructions.

Reference: I learnt this topic from “Udacity’s Secure and Private AI Scholarship Challenge Nanodegree Program.”

Garima Nishad
Analytics Vidhya

A Machine Learning Research scholar who loves to moonlight as a blogger.