Denoising Autoencoder on Colored Images Using Tensorflow

Xingyu Bian · Published in Analytics Vidhya · Dec 9, 2019
Image before and after using the denoising autoencoder

In this article, I will build an autoencoder to remove noise from colored images. Most articles on this topic use grayscale images instead of RGB, so I want to do something different.

To program this, we need to understand how autoencoders work. An autoencoder is a type of neural network that aims to copy the original input in an unsupervised manner. It consists of two parts: the encoder and the decoder. The encoder takes the input and compresses it into a latent space representation. The decoder, on the other hand, tries to recreate the original input from the latent space.

Structure of an autoencoder. Source: https://i-systems.github.io/HSE545/machine%20learning%20all/KIMM/image_files/AE_arch2.png
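To make the encoder/decoder split concrete, here is a minimal sketch of a tiny autoencoder in Keras. The layer sizes are arbitrary assumptions, purely for illustration; the convolutional model I actually use appears later in the article.

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(784,))                         # e.g. a flattened 28x28 image
latent = Dense(32, activation='relu')(inputs)        # encoder: compress to 32 values
outputs = Dense(784, activation='sigmoid')(latent)   # decoder: reconstruct the input
autoencoder = Model(inputs, outputs)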

Nathan Hubens wrote a great article on autoencoders: https://towardsdatascience.com/deep-inside-autoencoders-7e41f319999f

It helped me to get a better idea of the concept. Thank you, Nathan, for your contributions.

So why do we want to copy the original data?

Autoencoders are known for their uses in dimension-reduction applications. However, in this case, we want to use a type of autoencoder called the denoising autoencoder.

We can add noise to the training images and train the network to map the noisy versions back to the originals. As a result, the autoencoder will learn the steps necessary to denoise the input data.

For this project, I am using Microsoft’s Cats and Dogs dataset. You can find it here. I am only using the pictures of dogs for this example, since this is not a classification algorithm. You can use whatever dataset you want.

First, we need to resize the images to the same size. This can be achieved by cv2.resize() in OpenCV. I am resizing them to 300 x 300. Of course, you can use other image processing libraries to do this as well.

Make sure you also reshape the data to (# of pictures, height, width, 3); otherwise the autoencoder will not work. You can use np.reshape to do this.
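Here is a minimal sketch of how the loading and resizing step might look. The folder name and the use of glob are my assumptions; the article only specifies cv2.resize() and the 300 x 300 target size.

import glob
import cv2
import numpy as np

def load_images(folder, size=(300, 300)):
    images = []
    for path in glob.glob(folder + '/*.jpg'):
        img = cv2.imread(path)            # loads as a BGR uint8 array
        if img is None:                   # skip unreadable files
            continue
        images.append(cv2.resize(img, size))
    return np.array(images)               # shape: (# of pictures, 300, 300, 3)

train_data = load_images('dogs')          # 'dogs' is an assumed folder name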

Now let’s talk about adding noise to the data. There are several ways to achieve that; I am using Gaussian noise in this example. Remember the normal distribution from statistics? Well, we need it to generate the noise.

Source: https://machinelearningmastery.com/a-gentle-introduction-to-calculating-normal-summary-statistics/

Like many natural phenomena, this type of image noise follows the normal distribution, which is characterized by the iconic bell-shaped curve. Essentially, we generate the noise by sampling values from the Gaussian distribution. This can be achieved with the following code:

import numpy as np

# adds Gaussian noise with the given per-channel mean and standard deviation
def add_gaussian_noise(data):
    mean = (10, 10, 10)
    std = (50, 50, 50)
    row, col, channel = data.shape
    noise = np.random.normal(mean, std, (row, col, channel))
    # add the noise in float space, then clip so the uint8 values don't wrap around
    noisy = np.clip(data.astype('float64') + noise, 0, 255)
    return noisy.astype('uint8')

# applies the noise function above to every image in the dataset
def add_gaussian_to_dataset(data):
    output_data = []
    for image in data:
        output_data.append(add_gaussian_noise(image))
    return np.array(output_data)

The first function adds Gaussian noise to an individual image. The second function applies the first one to every image in the given dataset.
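Putting it together, the noisy training set used in the fit() call below could be built like this. Note that the scaling to [0, 1] is my assumption: the article doesn't show a preprocessing step, but the model's final sigmoid layer outputs values in that range, so the targets should match.

# build the noisy copies of the training images
gaussian_train_data = add_gaussian_to_dataset(train_data)

# assumed preprocessing: scale pixels to [0, 1] to match the sigmoid output
gaussian_train_data = gaussian_train_data.astype('float32') / 255.0
train_data = train_data.astype('float32') / 255.0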

Now let’s talk about the elephant in the room, the denoising autoencoder. As we can see, the neural network consists of an encoder and a decoder. I am using 3000 images of dogs with Gaussian noise and 3000 images of dogs without Gaussian noise to train my neural network. Here is the code:

from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping

def create_model():
    x = Input(shape=(300, 300, 3))
    # Encoder: 300x300 -> 150x150 -> 75x75 -> 38x38
    e_conv1 = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    pool1 = MaxPooling2D((2, 2), padding='same')(e_conv1)
    batchnorm_1 = BatchNormalization()(pool1)
    e_conv2 = Conv2D(32, (3, 3), activation='relu', padding='same')(batchnorm_1)
    pool2 = MaxPooling2D((2, 2), padding='same')(e_conv2)
    batchnorm_2 = BatchNormalization()(pool2)
    e_conv3 = Conv2D(16, (3, 3), activation='relu', padding='same')(batchnorm_2)
    h = MaxPooling2D((2, 2), padding='same')(e_conv3)
    # Decoder: 38x38 -> 76x76 -> 152x152 -> 150x150 -> 300x300
    d_conv1 = Conv2D(64, (3, 3), activation='relu', padding='same')(h)
    up1 = UpSampling2D((2, 2))(d_conv1)
    d_conv2 = Conv2D(32, (3, 3), activation='relu', padding='same')(up1)
    up2 = UpSampling2D((2, 2))(d_conv2)
    # this convolution deliberately uses 'valid' padding: it trims 152x152
    # down to 150x150 so the final upsampling restores the 300x300 input size
    d_conv3 = Conv2D(16, (3, 3), activation='relu')(up2)
    up3 = UpSampling2D((2, 2))(d_conv3)
    r = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(up3)
    model = Model(x, r)
    model.compile(optimizer='adam', loss='mse')
    return model

gaussian_auto_encoder = create_model()
gaussian_early_stop = EarlyStopping(monitor='loss', patience=3)
gaussian_history = gaussian_auto_encoder.fit(gaussian_train_data, train_data, epochs=50, batch_size=32, callbacks=[gaussian_early_stop])
The loss graph we get during training
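The article shows this graph as an image; one quick way to plot it yourself is from the history object returned by fit(). The use of matplotlib here is my choice, not something the article specifies.

import matplotlib.pyplot as plt

# plot the per-epoch training loss recorded by Keras
plt.plot(gaussian_history.history['loss'])
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.title('Training loss')
plt.show()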

This autoencoder is built with the Keras functional API, where layers are composed much like functions in algebra (function composition). This allows our model to be more flexible than the traditional Sequential format.

Just as in a regular convolutional neural network, you can use convolution and pooling layers in autoencoders as well. I am using the EarlyStopping callback to make sure I don’t train my model more than necessary.

Here are the results:

I am using the trained model on a picture from the testing data. My testing data consists of 1000 pictures of dogs with Gaussian noise and 1000 without.
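The article doesn't show the prediction step, but applying the trained model to one noisy test image might look like this. gaussian_test_data is my assumed variable name, preprocessed the same way as the training set.

# predict() expects a batch, so slice out a batch of one image
noisy_image = gaussian_test_data[0:1]
denoised = gaussian_auto_encoder.predict(noisy_image)

# convert the [0, 1] output back to a displayable 8-bit image
denoised_image = (denoised[0] * 255).astype('uint8')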

Before and after using the autoencoder

As we can see, the autoencoder did a good job removing the Gaussian noise from the test image. Keep in mind, though, that while the output is much clearer, denoising does not improve the image resolution; that (super-resolution) is an entirely different application of convolutional neural networks.

However, there is still much room for improvement. A larger dataset might be advantageous as it offers more variations. Also, we might want to diversify the training data to include images other than dogs. As a result, the model might generalize better.

The code for this article can be found here

I hope this is helpful. Feel free to provide me with feedback.
