Image Denoising using Deep Learning

Sharath Solomon

Published in

Analytics Vidhya

17 min read, Aug 27, 2021

Content

Business Problem
Why Deep Learning?
Business Constraints
Performance Metric
Dataset Overview
Exploratory Data Analysis
Existing Approaches
Experimenting with Deep Learning models
Results
Model Quantization
Deployment using Streamlit Sharing
Conclusion
Potential Improvements
Link to GitHub and LinkedIn
References

Business Problem

Images captured in the real world contain noise. This noise can arise for many reasons, such as electrical signal instability, malfunctioning camera sensors, poor lighting conditions, or errors in data transmission over long distances. Noise degrades image quality and causes loss of information, because the original pixel values are replaced by random values. Removing it is therefore an important step in low-level vision tasks and image processing. The process of removing noise from images is known as image denoising.

Therefore, the task at hand is to develop a solution that removes this noise from images, improving image quality while retaining the relevant information in the image.

Why Deep Learning?

Image denoising has been an active area of research for decades, and many techniques and ideas have been proposed over the years. Most of them assume that the noise in an image is either Gaussian noise or impulse noise.

Gaussian noise: noise whose probability density function (PDF) is the normal distribution, i.e. the values the noise adds to the pixels are Gaussian distributed.

Impulse noise: noise caused by sharp, sudden disturbances in the image signal. It usually appears as scattered white and black pixels in the image (also called salt-and-pepper noise).
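To make the two classic noise models concrete, here is a minimal NumPy sketch that corrupts an image with each of them. It assumes images scaled to [0, 1]. The function names and the `sigma` and `amount` values are illustrative choices, not taken from the article. Impulse noise is modeled as salt-and-pepper noise, where a random fraction of pixels is forced to pure black or pure white.

```python
import numpy as np

def add_gaussian_noise(img, sigma=0.05, rng=None):
    """Add zero-mean Gaussian noise to an image with values in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img + rng.normal(loc=0.0, scale=sigma, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep values in the valid range

def add_impulse_noise(img, amount=0.05, rng=None):
    """Salt-and-pepper noise: set a random fraction of pixels to 0 (black) or 1 (white)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.copy()
    mask = rng.random(img.shape[:2])        # one draw per pixel location
    noisy[mask < amount / 2] = 0.0          # pepper: black pixels
    noisy[mask > 1 - amount / 2] = 1.0      # salt: white pixels
    return noisy

rng = np.random.default_rng(0)
clean = np.full((64, 64, 3), 0.5)           # flat grey test image
g = add_gaussian_noise(clean, sigma=0.05, rng=rng)
s = add_impulse_noise(clean, amount=0.1, rng=rng)
```

Real camera noise rarely follows either model exactly, which is the limitation the next paragraph describes.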

However, this assumption does not fully hold for real noise in photographs. Real-world noise (also known as blind noise) is more complex and diverse, so most of these denoising techniques perform poorly at removing it.

To denoise real-world noisy images, more advanced techniques are needed. This is where deep learning comes into the picture: experiments have shown that convolutional blind denoising networks outperform conventional image denoising techniques by a large margin. This is why we use deep learning for image denoising.

Business Constraints

The two major constraints for this problem are:

Denoise real-world noisy images so that the output is as close as possible to the ground-truth image.
No latency constraint. The priority is to get as close to the ground truth as possible, even if denoising takes a reasonable amount of time.

With these constraints in mind, I’ll be building deep learning models for image denoising tasks.

Performance Metric

Two metrics are commonly used to measure image quality.

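A) Peak Signal-to-Noise Ratio (PSNR) [1]: the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the quality of its representation. Because signals can take a wide range of values, PSNR is usually expressed on a logarithmic decibel (dB) scale.
Mathematically, PSNR can be written as

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right)$$

where MAX_I is the maximum possible pixel value (255 for 8-bit images), and the MSE between an m x n clean image I and its noisy or denoised version K is given by

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j) - K(i,j)\right]^2$$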
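The PSNR definition translates directly into a few lines of NumPy. This is a minimal sketch, and the function names are my own. For colour images it averages the squared error over all pixels and channels, which is a common convention. TensorFlow also provides `tf.image.psnr(a, b, max_val)` for batches of images.

```python
import numpy as np

def mse(clean, test):
    """Mean squared error over all pixels (and channels)."""
    clean = clean.astype(np.float64)
    test = test.astype(np.float64)
    return np.mean((clean - test) ** 2)

def psnr(clean, test, max_val=255.0):
    """PSNR in dB; max_val is the largest possible pixel value (255 for uint8, 1.0 if normalised)."""
    err = mse(clean, test)
    if err == 0:
        return float("inf")  # identical images: no noise at all
    return 10.0 * np.log10(max_val ** 2 / err)
```

For example, two 8-bit images that differ by exactly 1 at every pixel have MSE = 1, so PSNR = 10 log10(255²) ≈ 48.13 dB.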

B) Structural Similarity Index (SSIM) [2]: measures the similarity between two images by focusing on the structural information in a scene and identifying the differences between the information extracted from a reference image and a sample image. The human visual system is believed to work in a similar way, which is why SSIM is considered a good metric for measuring perceived image quality.

SSIM extracts three features, namely luminance, contrast, and structure, and compares the two images based on these features.

Further mathematical details of this metric can be found in reference [2].
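To show how the luminance, contrast, and structure comparisons combine, here is a deliberately simplified NumPy sketch. It evaluates the standard SSIM formula once over the whole image:

SSIM(x, y) = ((2 μx μy + C1)(2 σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)), with C1 = (0.01 L)² and C2 = (0.03 L)²

Here L is the data range. This is an assumption made for brevity: the standard metric computes the same formula over local Gaussian windows (typically 11 x 11, σ = 1.5) and averages the results. For real evaluations, use a library implementation such as `skimage.metrics.structural_similarity` or `tf.image.ssim`.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM computed once over the whole image (no sliding window)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (k1 * data_range) ** 2      # stabilises the luminance term
    c2 = (k2 * data_range) ** 2      # stabilises the contrast-structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    # Contrast and structure merged into one factor (the usual C3 = C2 / 2 simplification)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure
```

An image compared with itself scores exactly 1. An inverted image gets a negative score because its structure is anti-correlated with the original.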

Dataset Overview

I used two publicly available image denoising datasets:

Smartphone Image Denoising Dataset (SIDD) [3]: 320 clean-noisy image pairs.
Real Low-Light Image Noise Reduction Dataset (RENOIR) [4]: 221 clean-noisy image pairs.

I merged and shuffled these datasets, giving a total of 541 clean-noisy image pairs, and then split them into train and test sets in an 80:20 ratio. This gives 432 train image pairs and 109 test image pairs.
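A minimal sketch of the merge, shuffle, and 80:20 split. The key point is that each clean image must stay paired with its noisy counterpart. The function name, file names, and seed are hypothetical. Rounding the test size up (ceil) mirrors how scikit-learn's `train_test_split` handles a fractional `test_size`, and it reproduces the 432/109 split above.

```python
import math
import numpy as np

def shuffle_and_split(clean_paths, noisy_paths, test_frac=0.2, seed=42):
    """Shuffle aligned (clean, noisy) pairs and split them into train and test lists."""
    assert len(clean_paths) == len(noisy_paths)
    pairs = list(zip(clean_paths, noisy_paths))      # keep each pair together
    order = np.random.default_rng(seed).permutation(len(pairs))
    pairs = [pairs[i] for i in order]
    n_test = math.ceil(test_frac * len(pairs))       # 0.2 * 541 = 108.2 -> 109
    return pairs[n_test:], pairs[:n_test]            # (train, test)

# Hypothetical file names standing in for the merged SIDD + RENOIR lists
clean = [f"sidd_{i}_gt.png" for i in range(320)] + [f"renoir_{i}_gt.png" for i in range(221)]
noisy = [p.replace("_gt", "_noisy") for p in clean]
train, test = shuffle_and_split(clean, noisy)
```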

Exploratory Data Analysis

Let's understand the dataset better by performing a thorough EDA on it. We will look at plots of pixel distributions and of the PSNR and SSIM values of the image pairs to see how the clean and noisy images differ.

Visualizing a few clean-noisy image pairs

As one can see, the noisy images contain a significant amount of noise, while the ground-truth images show the corresponding noise-free versions.

Mean pixel distribution of images

For most images (both clean and noisy), the mean pixel value lies between 20 and 75. This means most images have dark to medium brightness, and only a few have high mean pixel values (high brightness).

Analyzing the pixel distribution of a few clean-noisy image pairs by plotting histograms:

Pixel distribution of clean and its corresponding noisy image

The noisy images have a smoother pixel-intensity distribution than the clean images. Also, many pixels in the noisy images have a value of zero where the corresponding pixels of the clean image do not. This suggests that the noise replaces many of the actual pixel values with black.
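The statistics behind these EDA plots are simple to compute. Below is a minimal NumPy sketch, assuming 8-bit (uint8) images. The helper names are hypothetical. Feeding the normalised histograms of a clean image and its noisy counterpart to a plotting library gives plots like the ones above, and comparing `zero_fraction` for a pair checks the observation about zero-valued pixels.

```python
import numpy as np

def mean_pixel_value(img):
    """Average intensity over all pixels and channels (0-255 scale)."""
    return float(np.mean(img))

def pixel_histogram(img, bins=256):
    """Normalised intensity histogram over all channels of a uint8 image."""
    counts, _ = np.histogram(img.ravel(), bins=bins, range=(0, 256))
    return counts / counts.sum()

def zero_fraction(img):
    """Fraction of pixel values that are exactly 0 (black)."""
    return float(np.mean(img == 0))

img = np.array([[0, 255], [128, 1]], dtype=np.uint8)
```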

Analyzing the PSNR and SSIM value of the images

PDF and Histogram plot of the PSNR values of all clean-noisy image pairs

The majority of the clean-noisy image pairs have PSNR values between 25 and 30 dB. So, a good denoising model should achieve an average PSNR above 30 dB between its denoised outputs and the clean images.

PDF and Histogram plots of the SSIM values of all clean-noisy image pairs

The majority of the clean-noisy image pairs have SSIM values between 0.1 and 0.7. So, a good denoising model should achieve an average SSIM above 0.7 between its denoised outputs and the clean images.

Creating Patches

We will split each image into small patches. Experiments have shown that training on such patches, rather than on full images, improves denoising performance.

Patching splits each image into tiles of the given patch size. Let's plot a few clean-noisy image patches to visualize them.

The noisy patches contain a significant amount of noise, and this is what we are trying to remove. Since the dataset images have different sizes, every image must be resized to a fixed size to get the same number of patches per image. So, we resize all images to 1024 x 1024 and create patches of size 256 x 256, which gives 4 x 4 = 16 patches per image.

After patching, we have 6912 train patches (432 x 16) and 1744 test patches (109 x 16). We will use these train and test patches for modeling.
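Once an image has been resized to 1024 x 1024, cutting it into non-overlapping 256 x 256 patches is a pure array reshape. Here is a minimal NumPy sketch; the function name is hypothetical. Resizing itself is not shown, but could be done with, for example, `cv2.resize` or `tf.image.resize`.

```python
import numpy as np

def extract_patches(img, patch_size=256):
    """Split an (H, W, C) image into non-overlapping patch_size x patch_size tiles.

    H and W must be multiples of patch_size (e.g. after resizing to 1024 x 1024).
    Patches come out in row-major order with shape (n_patches, patch_size, patch_size, C).
    """
    h, w, c = img.shape
    assert h % patch_size == 0 and w % patch_size == 0
    tiles = img.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)   # -> (rows, cols, patch, patch, C)
    return tiles.reshape(-1, patch_size, patch_size, c)

img = np.arange(1024 * 1024 * 3, dtype=np.uint32).reshape(1024, 1024, 3)
patches = extract_patches(img)               # 16 patches of 256 x 256 x 3
```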

Existing Approaches

As discussed, image denoising has been an active area of research for decades, and many techniques have been used to solve it. One common technique is filtering. Many filters are available for image denoising, and most of them are specific to the type of noise present in the image. One of the best known is the non-local means (NLM) algorithm.

The NLM filter replaces each pixel with a weighted mean of other pixels in the image (in practice, those within a search window around it). Each weight reflects how similar the patch surrounding that pixel is to the patch surrounding the target pixel. This gives much greater post-filtering clarity and less loss of detail than many traditional filters, and NLM was found to work well for image denoising.
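The weighting scheme is easier to see in code. Here is a minimal, unoptimised NumPy sketch of non-local means for a single-channel image in [0, 1]. Its parameters are illustrative assumptions: patch radius 1 (3 x 3 patches), search radius 5 (11 x 11 window), and filter strength `h`. It loops over search offsets and compares all patches at once with `sliding_window_view` (NumPy 1.20+). For real use, library versions such as OpenCV's `cv2.fastNlMeansDenoisingColored` or scikit-image's `skimage.restoration.denoise_nl_means` are far faster.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def nlm_denoise(img, patch_radius=1, search_radius=5, h=0.1):
    """Naive non-local means for a 2-D grayscale image with values in [0, 1]."""
    img = np.asarray(img, dtype=np.float64)
    H, W = img.shape
    s, r = search_radius, patch_radius
    pad = r + s
    padded = np.pad(img, pad, mode="reflect")
    p = 2 * r + 1
    # patches[a, b] is the p x p patch whose top-left corner is padded[a, b]
    patches = sliding_window_view(padded, (p, p))
    ref = patches[s:s + H, s:s + W]             # patch centred on each target pixel
    num = np.zeros((H, W))
    den = np.zeros((H, W))
    for di in range(-s, s + 1):
        for dj in range(-s, s + 1):
            cand = patches[s + di:s + di + H, s + dj:s + dj + W]
            d2 = np.mean((ref - cand) ** 2, axis=(2, 3))   # patch dissimilarity
            w = np.exp(-d2 / h ** 2)                       # similar patches get large weights
            num += w * padded[pad + di:pad + di + H, pad + dj:pad + dj + W]
            den += w
    return num / den                            # weighted mean of candidate pixels

rng = np.random.default_rng(0)
clean = np.full((32, 32), 0.5)
noisy = np.clip(clean + rng.normal(0, 0.1, clean.shape), 0, 1)
denoised = nlm_denoise(noisy, h=0.1)
```

On a flat image the weighted averaging removes most of the noise. On real images the same averaging is what smooths away fine details.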

Check the below video for a more detailed explanation of the NLM algorithm.

Image Denoising using NLM filter

Now, before we jump into deep learning models, let's see how this simple filter performs. We will take a few image patches from our dataset, denoise them with the NLM filter, and visualize the results. This will show why more advanced techniques like deep learning are needed for denoising.

As one can see, the NLM filter denoises the images to some extent. However, it smooths away many details that are present in the ground-truth images, losing important information that should have been retained. Also, when the noise is too strong, NLM fails to give good results.

So, more advanced techniques like deep learning are needed for image denoising.

Input Data Pipeline

We now have the train and test image patches taken from the clean-noisy image pairs of the SIDD and RENOIR datasets, so we are ready for modeling: 6912 train patches and 1744 test patches, each of size 256 x 256.

X_train_image_patches.shape = (6912, 256, 256, 3): ground truth images
y_train_image_patches.shape = (6912, 256, 256, 3): noisy images
X_test_image_patches.shape = (1744, 256, 256, 3): ground truth images
y_test_image_patches.shape = (1744, 256, 256, 3): noisy images

I'll create an input data pipeline that feeds these image patches to the models during training, using custom Keras data generators.

Before loading them into the custom generators, the train and test patches were normalized by dividing every pixel value by 255. The pipeline feeds data to the models in batches of 32, so the input shape of each batch is (32, 256, 256, 3).
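The article does not show its generator code. Here is a minimal sketch, assuming the patches are already normalized NumPy arrays in memory. The hypothetical class implements `__len__`, `__getitem__`, and `on_epoch_end`, which is the interface `keras.utils.Sequence` expects. In practice you would subclass `keras.utils.Sequence` (calling `super().__init__()`) so that `model.fit` can consume it; here it is kept framework-free so the batching logic stands alone. The model input is the noisy patch, and the target is the clean patch.

```python
import math
import numpy as np

class PatchBatchGenerator:
    """Yields (noisy_batch, clean_batch) pairs, one batch per index."""

    def __init__(self, noisy, clean, batch_size=32, shuffle=True, seed=0):
        assert len(noisy) == len(clean)
        self.noisy, self.clean = noisy, clean
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.indices = np.arange(len(noisy))
        self.on_epoch_end()

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller
        return math.ceil(len(self.indices) / self.batch_size)

    def __getitem__(self, idx):
        batch_idx = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        # Same indices for both arrays, so each noisy patch stays paired with its clean patch
        return self.noisy[batch_idx], self.clean[batch_idx]

    def on_epoch_end(self):
        # Reshuffle the sample order between epochs
        if self.shuffle:
            self.rng.shuffle(self.indices)
```

With the arrays listed above, where y holds the noisy patches and X the ground truth, this could be used as something like `PatchBatchGenerator(y_train_image_patches / 255.0, X_train_image_patches / 255.0, batch_size=32)`.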

Experimenting with Deep Learning models

With the advancement of Deep Learning techniques, it is now possible to remove real noise from images such that the denoised image will be very similar to the ground truth image with minimal loss of detail.

In recent years, many deep learning architectures have been developed for image denoising. I'll implement four of them: a baseline autoencoder and three state-of-the-art denoising networks:

Autoencoders (Baseline Model)
CBDNet
PRIDNet
RIDNet

Autoencoders

This is a simple encoder-decoder network [5]. The encoder has 3 convolutional layers, each followed by max pooling, and the decoder has 3 deconvolutional (transposed convolution) layers. The decoder output is passed to a final convolutional layer with 3 filters so that the output has the same shape as the input. This simple architecture serves as the baseline model.
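A minimal Keras sketch of the baseline described above: 3 Conv2D + MaxPooling2D blocks, 3 Conv2DTranspose layers, and a final 3-filter convolution. The filter counts (64/128/256), the kernel sizes, and the sigmoid output (which suits inputs normalized to [0, 1]) are my assumptions, since the article does not list them.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_autoencoder(input_shape=(256, 256, 3)):
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    # Encoder: each Conv2D + MaxPooling2D block halves the spatial size
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
        x = layers.MaxPooling2D(2, padding="same")(x)
    # Decoder: each stride-2 transposed convolution doubles the spatial size
    for filters in (256, 128, 64):
        x = layers.Conv2DTranspose(filters, 3, strides=2, activation="relu", padding="same")(x)
    # Final conv with 3 filters maps back to an RGB image with the input's shape
    outputs = layers.Conv2D(3, 3, activation="sigmoid", padding="same")(x)
    model = tf.keras.Model(inputs, outputs, name="denoising_autoencoder")
    model.compile(optimizer="adam", loss="mse")   # MSE loss, as used in the article
    return model
```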

The model is trained for 15 epochs with Mean Squared Error (MSE) as the loss function, reaching a train and test loss of 0.0011.

Although this simple architecture reduces the noise, the predicted images lack clarity. Before denoising, the average PSNR and SSIM on the test data were 18.74 dB and 0.47. The autoencoder raises them to 31.19 dB and 0.74 on the same test data. The model therefore works reasonably well, and we take these scores as the benchmark for comparing the other models.

CBDNet — Convolutional Blind Denoising Network [6]

The CBDNet architecture has two subnetworks: a noise estimation subnetwork (CNNe), which estimates the noise level map of a noisy image, followed by a non-blind denoising subnetwork (CNNd), which denoises the noisy image. The network architecture is shown below.

In the original research paper, the model is trained on both real noisy images and images with synthetic noise generated by a noise model. Since our dataset already has real clean-noisy image pairs, I did not add synthetic noise, so the noise model is not part of my implementation. The loss function used in the paper is as follows:

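$$\mathcal{L} = \mathcal{L}_{rec} + \lambda_{asymm}\,\mathcal{L}_{asymm} + \lambda_{TV}\,\mathcal{L}_{TV}$$

where the reconstruction loss L_rec is the mean squared error between the denoised and clean images, and the total variation (TV) regularizer L_TV keeps the estimated noise level map smooth. The asymmetric loss L_asymm penalizes under-estimation of the noise level more heavily than over-estimation (it needs the true noise level, so it applies to the synthetic-noise images). The λ terms are hyperparameters.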
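To make the three terms concrete, here is a NumPy sketch of this objective on plain arrays. It is a sketch of the formula, not a training loss wired into Keras. `sigma_hat` is the estimated noise level map and `sigma` the true one. The asymmetric weight |α − 1[σ̂ − σ < 0]| uses α = 0.3, so under-estimates get weight 0.7 and over-estimates 0.3. The default values of α and the λ terms are illustrative, so check the paper for the values it uses. Averaging every term (rather than summing) is also my choice.

```python
import numpy as np

def asymmetric_loss(sigma_hat, sigma, alpha=0.3):
    """Weight errors by |alpha - 1[sigma_hat < sigma]|: under-estimates cost more when alpha < 0.5."""
    diff = sigma_hat - sigma
    weight = np.abs(alpha - (diff < 0).astype(np.float64))
    return np.mean(weight * diff ** 2)

def tv_loss(sigma_hat):
    """Mean squared horizontal and vertical gradients of an (H, W[, C]) noise level map."""
    dh = sigma_hat[:, 1:] - sigma_hat[:, :-1]     # horizontal differences
    dv = sigma_hat[1:, :] - sigma_hat[:-1, :]     # vertical differences
    return np.mean(dh ** 2) + np.mean(dv ** 2)

def cbdnet_loss(clean, denoised, sigma_hat, sigma=None, lam_asymm=0.5, lam_tv=0.05):
    loss = np.mean((denoised - clean) ** 2)       # reconstruction (MSE) term
    if sigma is not None:                         # only when the true noise level is known
        loss += lam_asymm * asymmetric_loss(sigma_hat, sigma)
    loss += lam_tv * tv_loss(sigma_hat)           # smoothness of the noise level map
    return loss
```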

Modifications in my CBDNet implementation compared to the research paper:

Did not add synthetic noise to the image dataset, since we have real clean-noisy image pairs.
Used plain mean squared error as the loss function, without the additional regularization terms.

The model was trained for 30 epochs and it gave a train loss of 0.00044 and test loss of 0.000453.

As one can see, this is a large improvement over the autoencoder: the denoised images are much clearer. The model gives an average PSNR of 35.256 dB and an average SSIM of 0.848 on the test data.

PRIDNet — Pyramid Real Image Denoising Network [7]

The network architecture is as shown below :

The number of feature-map channels is shown below each block: in parentheses for the "sRGB" model, and without parentheses for the "raw" model. The symbol || denotes concatenation.

The network is divided into three stages, each addressing an issue that, according to the PRIDNet authors, many CNN-based denoising networks have not really addressed.

A) Channel Attention Module (CAM): Most CNN-based denoising networks give equal importance to all channel-wise features. In reality, some noise is more significant than other noise and should be given more weight. PRIDNet achieves this with a channel attention module that assigns different weights to the channels depending on the estimated noise level.

CAM squeezes the input U with global average pooling, followed by two convolutions: the first with ReLU activation and the second with sigmoid activation. This produces per-channel weights μ, which are multiplied with U to recalibrate channel importance. This stage is called the Noise Estimation Stage.
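A minimal Keras sketch of the CAM as described: global average pooling, a 1x1 convolution with ReLU, a 1x1 convolution with sigmoid, then channel-wise rescaling of U. The class name `ChannelAttention` and the channel reduction ratio of 16 are my assumptions, not values taken from the PRIDNet paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ChannelAttention(layers.Layer):
    """Squeeze (global average pooling) + excitation (two 1x1 convs) channel attention."""

    def __init__(self, channels, reduction=16, **kwargs):
        super().__init__(**kwargs)
        self.squeeze = layers.Conv2D(max(channels // reduction, 1), 1, activation="relu")
        self.excite = layers.Conv2D(channels, 1, activation="sigmoid")

    def call(self, u):
        s = tf.reduce_mean(u, axis=[1, 2], keepdims=True)   # global average pooling -> (B, 1, 1, C)
        mu = self.excite(self.squeeze(s))                    # per-channel weights in (0, 1)
        return u * mu                                        # recalibrate channel importance
```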

B) Five-layer pyramid module: Traditional CNN-based denoising networks use a fixed receptive field, which captures global information about the noise in the image but fails to capture diverse information. PRIDNet addresses this by using receptive fields of different scales, which also capture diverse noise information. Results show that this helps when denoising images that suffer from heavy noise.

Here, the input feature maps are downsampled to different sizes with pooling layers, so that receptive fields of different scales capture both global and diverse information. Each downsampled feature map is processed by a U-Net architecture and upsampled back to the original size, and the outputs are concatenated. The number of filters and their sizes are shown in the figure. This stage is called the Multi-Scale Denoising Stage.
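A simplified Keras sketch of the pyramid idea: pool the input at several scales, process each scale, upsample back, and concatenate. To keep it short, each branch's U-Net is replaced by a single 3x3 convolution. The pool sizes (1, 2, 4, 8, 16), the use of average pooling, the bilinear upsampling, and the class name are all assumptions. The input height and width must be divisible by the largest pool size.

```python
import tensorflow as tf
from tensorflow.keras import layers

class PyramidModule(layers.Layer):
    """Multi-scale branches: pool, process, upsample back, concatenate."""

    def __init__(self, filters=32, pool_sizes=(1, 2, 4, 8, 16), **kwargs):
        super().__init__(**kwargs)
        self.pools = [layers.AveragePooling2D(p) for p in pool_sizes]
        # Stand-in for the per-branch U-Net used in PRIDNet
        self.bodies = [layers.Conv2D(filters, 3, padding="same", activation="relu") for _ in pool_sizes]
        self.ups = [layers.UpSampling2D(p, interpolation="bilinear") for p in pool_sizes]

    def call(self, x):
        # Larger pools give each 3x3 conv a wider effective receptive field
        outputs = [up(body(pool(x))) for pool, body, up in zip(self.pools, self.bodies, self.ups)]
        return tf.concat(outputs, axis=-1)
```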

C) Kernel Selecting Module: Traditional CNN-based denoising networks usually combine multi-scale features by element-wise summation or concatenation. This treats information from different scales the same way, so the multi-scale features cannot be expressed adaptively. To address this, PRIDNet introduces a kernel selecting module that uses different kernel sizes for each channel of the concatenated multi-scale features.

The concatenated multi-scale output U from the multi-scale denoising stage is fed to three parallel convolutions with kernel sizes 3, 5, and 7, whose outputs are summed. The sum is squeezed with global average pooling, followed by two convolutions, to produce three vectors α, β, γ, to which softmax is applied. These vectors are multiplied with the outputs of the three parallel convolutions to get V', V'', V''', which are added to give the final denoised output V = V' + V'' + V'''. This stage is called the Feature Fusion Stage.
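A minimal Keras sketch of this selection step. Three parallel convolutions (3x3, 5x5, 7x7) are summed and squeezed into per-channel logits for α, β, and γ. A softmax across the three branches turns these into weights, and the result is the weighted sum V' + V'' + V'''. The ReLU on the branch convolutions, the reduction ratio, and the class name are my assumptions, not details from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

class KernelSelecting(layers.Layer):
    def __init__(self, channels, reduction=4, **kwargs):
        super().__init__(**kwargs)
        self.channels = channels
        self.branches = [layers.Conv2D(channels, k, padding="same", activation="relu") for k in (3, 5, 7)]
        self.squeeze = layers.Conv2D(max(channels // reduction, 1), 1, activation="relu")
        self.excite = layers.Conv2D(3 * channels, 1)   # logits for alpha, beta, gamma

    def call(self, u):
        v = [branch(u) for branch in self.branches]                   # unweighted branch outputs
        s = tf.reduce_mean(tf.add_n(v), axis=[1, 2], keepdims=True)   # squeeze the summed features
        logits = tf.reshape(self.excite(self.squeeze(s)), [-1, 1, 1, 3, self.channels])
        weights = tf.nn.softmax(logits, axis=3)        # per channel, the 3 weights sum to 1
        stacked = tf.stack(v, axis=3)                  # (B, H, W, 3, C)
        return tf.reduce_sum(stacked * weights, axis=3)   # V = V' + V'' + V'''
```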

Also, to avoid loss of information, the output of each stage is concatenated with the input of the previous stage throughout the network.

The model was trained for 30 epochs with MSE as loss function and it gave a train loss of 0.000449 and test loss of 0.000457.

Visually, the model gives almost the same results as CBDNet, with an average PSNR of 35.126 dB and an average SSIM of 0.848 on the test data. According to the research papers, PRIDNet surpasses CBDNet when trained for more epochs. However, PRIDNet has a huge number of trainable parameters and a large model size, and here it gives no significant improvement over CBDNet.

RIDNet — Residual Image Denoising Network [8]

The network architecture is as shown below:

Different green colors of the convolution layers denote different dilations while the smaller size of the convolution layer means the kernel is 1 × 1. The second row shows the architecture of each EAM.

This network is composed of three main modules as follows :

A) Feature Extraction Module: It is composed of only one convolutional layer to extract initial features from the noisy input. I’ve used 64 filters with kernel size=3 for the convolutional layer.

B) Feature Learning Residual on Residual Module: It is composed of Enhancement Attention Modules (EAM), which use a residual-on-residual structure with local skip and short skip connections. The first part of each EAM uses wide receptive fields, through kernel dilation and branched convolutions, to capture global and diverse information from the input. Additional features are learned by a residual block of two convolutions, followed by an enhanced residual block (ERB) of three convolutions. Finally, the result passes through a feature attention block that gives more weight to the important features.

We can increase the depth of RIDNet by adding more EAM blocks. However, the research paper restricts the network to four EAM blocks.

C) Reconstruction Module: The output of the final EAM block is given to the reconstruction module which is again composed of only one convolutional layer that gives the denoised image as output.
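The three modules can be put together in a compact Keras sketch. It follows the description above only approximately. The specific dilation pairs (1 and 2, 3 and 4) in the two branches, the skip connection from the attention output back to the block input, and the long skip that adds the noisy input to the reconstruction are my reading of the architecture figure and should be treated as assumptions. The helper `conv` and the class and function names are hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv(filters, kernel=3, dilation=1, activation="relu"):
    return layers.Conv2D(filters, kernel, padding="same", dilation_rate=dilation, activation=activation)

class EAM(layers.Layer):
    """Enhancement Attention Module (simplified sketch)."""

    def __init__(self, filters=64, reduction=16, **kwargs):
        super().__init__(**kwargs)
        # Two branches of dilated convolutions widen the receptive field
        self.branch_a = [conv(filters, dilation=1), conv(filters, dilation=2)]
        self.branch_b = [conv(filters, dilation=3), conv(filters, dilation=4)]
        self.merge = conv(filters)
        self.res = [conv(filters), conv(filters, activation=None)]            # residual block
        self.erb = [conv(filters), conv(filters), conv(filters, kernel=1, activation=None)]  # ERB
        self.fa_down = conv(max(filters // reduction, 1), kernel=1)           # feature attention
        self.fa_up = conv(filters, kernel=1, activation="sigmoid")

    def call(self, x):
        a = self.branch_a[1](self.branch_a[0](x))
        b = self.branch_b[1](self.branch_b[0](x))
        m = self.merge(tf.concat([a, b], axis=-1)) + x            # local skip connection
        r = tf.nn.relu(self.res[1](self.res[0](m)) + m)           # residual block
        e = tf.nn.relu(self.erb[2](self.erb[1](self.erb[0](r))) + r)  # enhanced residual block
        w = self.fa_up(self.fa_down(tf.reduce_mean(e, axis=[1, 2], keepdims=True)))
        return e * w + x                                          # attention + short skip

def build_ridnet(input_shape=(256, 256, 3), filters=64, n_eam=4):
    inputs = tf.keras.Input(shape=input_shape)
    x = conv(filters, activation=None)(inputs)   # feature extraction: one 3x3 conv, 64 filters
    for _ in range(n_eam):                       # feature learning: stacked EAMs
        x = EAM(filters)(x)
    x = conv(3, activation=None)(x)              # reconstruction: one conv back to RGB
    outputs = layers.Add()([x, inputs])          # long skip: the network predicts a residual
    return tf.keras.Model(inputs, outputs, name="ridnet_sketch")
```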

The loss function used in the research paper is Mean Absolute Error (L1 loss) but I’ll be using Mean Squared Error (L2 Loss).

The model was trained for 25 epochs with MSE as loss function and it gave a train loss of 0.000321 and test loss of 0.000334.

Visually, this model performs similarly to PRIDNet and CBDNet. It gives an average PSNR of 36.595 dB and an average SSIM of 0.881 on the test data.

Results

We will compare the performance of all the models based on the PSNR and SSIM score and will also take the model size into account to decide the best model.

Compared to CBDNet and PRIDNet models, the RIDNet model has better PSNR and SSIM scores and less model size. Therefore, we will finalize the RIDNet model as the best model for the image denoising task.

RIDNet model performance on a few noisy images:

Model Quantization

Model quantization is a conversion technique that can reduce model size while also improving CPU and hardware accelerator latency, with little degradation in model accuracy. It works by reducing the precision of the numbers used to represent a model’s parameters, which by default are 32-bit floating-point numbers. This results in a smaller model size and faster computation.

An already-trained float TensorFlow model can be quantized when converting it to TensorFlow Lite format with the TensorFlow Lite Converter [9].
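The article does not show its conversion code. The sketch below assumes post-training dynamic-range quantization: setting `converter.optimizations = [tf.lite.Optimize.DEFAULT]` without a representative dataset, which stores weights as 8-bit integers. The roughly 3x size reduction reported in the article is consistent with this, but the exact settings used are an assumption. The helper names are hypothetical.

```python
import numpy as np
import tensorflow as tf

def quantize_to_tflite(model):
    """Post-training dynamic-range quantization of a Keras model."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()           # serialized TFLite flatbuffer (bytes)

def run_tflite(tflite_bytes, batch):
    """Run one batch through a TFLite model with the Python interpreter."""
    interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
    input_details = interpreter.get_input_details()[0]
    output_details = interpreter.get_output_details()[0]
    interpreter.resize_tensor_input(input_details["index"], batch.shape)
    interpreter.allocate_tensors()
    interpreter.set_tensor(input_details["index"], batch.astype(np.float32))
    interpreter.invoke()                 # one model invocation per call
    return interpreter.get_tensor(output_details["index"])
```

Writing the returned bytes to a `.tflite` file and comparing its size with the saved float model is one way to reproduce a size comparison like the one below.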

RIDNet model performance and size before and after quantization.

Prediction on a noisy image using original RIDNet and quantized RIDNet model.

After quantization, the model size dropped from 20.956 MB to 6.877 MB without a significant drop in model performance. Unfortunately, the prediction time increased drastically for the quantized model. The likely reason is that prediction is done on image patches, which are then merged into the final denoised output, so the TensorFlow Lite interpreter must be invoked once per patch, which takes more time.
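To make the per-patch cost concrete, here is a minimal sketch of patch-wise inference. A hypothetical `predict_by_patches` helper tiles the image, calls `predict_fn` once per tile, and stitches the outputs back together. For a 1024 x 1024 image and 256 x 256 patches, that is 16 separate calls, and with a TFLite model each call would be one interpreter invocation. `predict_fn` is a placeholder for whatever model wrapper is used.

```python
import numpy as np

def predict_by_patches(img, predict_fn, patch_size=256):
    """Denoise an (H, W, C) image tile by tile and stitch the outputs back together.

    predict_fn takes a batch of shape (1, patch, patch, C) and returns the same shape.
    It is called once per patch.
    """
    h, w, c = img.shape
    assert h % patch_size == 0 and w % patch_size == 0
    out = np.empty_like(img, dtype=np.float32)
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            patch = img[i:i + patch_size, j:j + patch_size][np.newaxis]   # add batch axis
            out[i:i + patch_size, j:j + patch_size] = predict_fn(patch)[0]
    return out
```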

Deployment using Streamlit Sharing

To give readers a better experience, I have deployed the model using Streamlit, an open-source app framework for machine learning and data science projects.

The deployed model can be accessed here:

https://share.streamlit.io/sharathsolomon/imagedenoising/main/model.py

The created web app has two options.

Predict on sample images: There are a few sample images already uploaded to the app which you can select to see how the model performs. Select any of the sample images listed and get its denoised output.
Upload a noisy image: The user can also upload a noisy image and get its denoised output.

Here is a video that shows how the deployed model makes predictions.

The web app runs on a CPU, so a prediction takes around 10 seconds. Running it on a GPU should reduce the prediction time substantially.

Conclusion

CBDNet and PRIDNet give comparable performance. According to the research papers, PRIDNet surpasses CBDNet when trained for a high number of epochs. However, PRIDNet has a huge number of trainable parameters and a large model size, and here it gives no significant improvement over CBDNet.
RIDNet is a more recent technique than CBDNet and PRIDNet. It gives a significant improvement in PSNR and SSIM, and it also has fewer parameters and a smaller model size than CBDNet and PRIDNet.
The simpler RIDNet significantly outperforms the more complex PRIDNet, with its huge number of trainable parameters. A more complex model is therefore not guaranteed to solve a problem better than a simpler network.

Potential Improvements

Image denoising is an active field of research, and many new architectures are being developed for it. Recently, researchers have started using GANs to denoise images, with promising results.

Image restoration is another active field of research, which aims to restore degraded images, for example by deblurring blurred images or removing rain from images. Many advanced deep learning architectures have been developed for it over the years, and these networks also work well for image denoising. According to www.paperswithcode.com [10], image restoration models such as HINet, Uformer32, and MIRNet perform better on image denoising than models designed only for denoising.

Link to GitHub and LinkedIn

You can find the entire code for this case study in my GitHub repository. Feel free to connect with me on LinkedIn or via email at sharath.solomon@outlook.com.

References

[1] Peak Signal to Noise Ratio (PSNR): https://www.ni.com/en-in/innovations/white-papers/11/peak-signal-to-noise-ratio-as-an-image-quality-metric.html
[2] Structural Similarity Index (SSIM): https://medium.com/srm-mic/all-about-structural-similarity-index-ssim-theory-code-in-pytorch-6551b455541e
[3] Smartphone Image Denoising Dataset (SIDD): https://www.eecs.yorku.ca/~kamel/sidd/dataset.php
[4] Real Low-Light Image Noise Reduction Dataset (RENOIR): http://adrianbarburesearch.blogspot.com/p/renoir-dataset.html
[5] Autoencoders: https://keras.io/examples/vision/autoencoder/
[6] CBDNet Research Paper: https://arxiv.org/pdf/1807.04686v2.pdf
[7] PRIDNet Research Paper: https://arxiv.org/pdf/1908.00273v2.pdf
[8] RIDNet Research Paper: https://arxiv.org/pdf/1904.07396v2.pdf
[9] TensorFlow Lite Converter: https://www.tensorflow.org/lite/convert/
[10] Image Denoising on SIDD: https://paperswithcode.com/sota/image-denoising-on-sidd
[11] Applied AI Course: https://www.appliedaicourse.com/