Published in Analytics Vidhya

Real Image Denoising with Feature Attention (RIDNet)

Photo by Jon Tyson on Unsplash

One of the fundamental challenges in the field of image processing and computer vision is image denoising, where the underlying goal is to estimate the original image by suppressing noise from a noise-contaminated version of the image.

Contents:

  1. Business Problem
  2. Use of Deep Learning
  3. Source of Data
  4. Existing Approaches
  5. RIDNet
  6. First Cut Solution
  7. Implementation
  8. References
  9. GitHub Repo
  10. LinkedIn profile

1. Business Problem

Image noise may be caused by different intrinsic (i.e. sensor) and extrinsic (i.e. environment) conditions which are often not possible to avoid in practical situations.

Therefore, image denoising plays an important role in a wide range of applications such as image restoration, visual tracking, image registration, image segmentation, and image classification, where obtaining the original image content is crucial for strong performance.

2. Use of Deep Learning

While many algorithms have been proposed for image denoising, deep learning techniques have received particular attention in this area.

Deep convolutional neural networks perform well on images containing spatially invariant (synthetic) noise; however, their performance is limited on real noisy photographs, and they often require multi-stage network modeling. To make denoising algorithms more practical, the RIDNet paper proposes a novel single-stage blind real image denoising network (RIDNet) built on a modular architecture.

The authors use a residual-on-the-residual structure to ease the flow of low-frequency information and apply feature attention to exploit the channel dependencies.

Although the authors built the model in PyTorch, I have tried to recreate it using TensorFlow and Keras.

3. Source of Data

The dataset is taken from RENOIR: A Dataset for Real Low-Light Image Noise Reduction.

The dataset consists of pairs of noisy and clean ground-truth images captured with the camera of a Xiaomi Mi3 mobile phone.

4. Existing Approaches

  • Convolutional neural networks (CNNs) have given image denoising algorithms a performance boost. Two notable denoising networks, DnCNN and IrCNN, predict the residue (the noise) present in the image rather than the denoised image: the loss function compares the network output against the ground-truth noise instead of the original clean image. Both networks achieved better results than earlier methods despite a simple architecture made of repeated blocks of convolution, batch normalization, and ReLU activation. However, both depend on blindly predicted noise, i.e. they do not take into account the underlying structures and textures of the noisy image.
  • More recently, CBDNet trains a blind denoising model for real photographs. It is composed of two subnetworks: noise estimation and non-blind denoising. CBDNet also incorporates multiple losses, is engineered to train on both real-synthetic noise and real-image noise, and enforces a higher noise standard deviation for low-noise images. Furthermore, it may require manual intervention to improve results.
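The residual-learning idea behind DnCNN and IrCNN, described in the first bullet above, can be illustrated with a few lines of NumPy. This is only a sketch: the image, the noise level, and the "perfect" prediction standing in for a trained network are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical clean image in [0, 1] and a noisy copy (sigma = 0.1 is an assumed value).
clean = rng.random((8, 8))
noise = rng.normal(0.0, 0.1, size=clean.shape)
noisy = clean + noise

# Residual learning: the training target is the noise, not the clean image.
residual_target = noisy - clean

# Stand-in for a trained network that predicts the residue perfectly.
predicted_residual = residual_target

# The denoised estimate is recovered by subtracting the predicted residue.
denoised = noisy - predicted_residual
```

With a real network the predicted residue is only approximate, so the quality of `denoised` depends entirely on how well the noise itself is predicted.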

5. RIDNet

5.1. Network Architecture

The model is composed of three main modules: feature extraction, the feature learning residual-on-the-residual module, and reconstruction, as shown in Figure 2. Let x be the noisy input image and ŷ the denoised output image. The feature extraction module consists of a single convolutional layer that extracts initial features f_0 from the noisy input:

f_0 = M_e(x),

where M_e(·) performs convolution on the noisy input image. Next, f_0 is passed on to the feature learning residual-on-the-residual module, termed M_fl:

f_r = M_fl(f_0),

where f_r are the learned features and M_fl(·) is the main feature learning residual-on-the-residual component, composed of enhancement attention modules (EAMs) that are cascaded together as shown in Figure 2.

The network has a small depth but provides a wide receptive field through kernel dilation in the two initial branch convolutions of each EAM. The output features of the final layer are fed to the reconstruction module, which again consists of one convolutional layer:

ŷ = M_r(f_r),

where M_r(·) denotes the reconstruction layer.
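To see why kernel dilation widens the receptive field without adding depth, here is a small arithmetic sketch. It uses the standard formula for the effective size of a dilated kernel; the 3 × 3 kernels and the dilation rates below are assumptions chosen for illustration, not values taken from this article.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions given (kernel, dilation) pairs."""
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


# Two plain 3x3 convolutions versus two dilated 3x3 convolutions (assumed rates 3 and 4).
plain = receptive_field([(3, 1), (3, 1)])    # 5
dilated = receptive_field([(3, 3), (3, 4)])  # 15
print(plain, dilated)
```

Both stacks have the same number of layers and weights, yet the dilated one sees a 15 × 15 neighbourhood instead of 5 × 5.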

Some networks employ more than one loss to optimize the model. In contrast, RIDNet employs only one loss, i.e. the l1 loss or mean absolute error (MAE).

Now, given a batch of N training pairs {x_i, y_i}, i = 1, ..., N, where x_i is the noisy input and y_i is the ground truth, the aim is to minimize the l1 loss function

L(W) = \frac{1}{N} \sum_{i=1}^{N} \| \mathrm{RIDNet}(x_i) - y_i \|_1,

where RIDNet(·) is the network and W denotes the set of all learned network parameters.
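As a quick numerical check of the loss above, the sketch below computes it with NumPy on a toy batch. The function names and the toy arrays are hypothetical. It also shows a mean absolute error that averages over every pixel, which is what Keras's built-in `mae` loss reports; the two differ only by a constant factor (the number of elements per sample), so they lead to the same minimizer.

```python
import numpy as np


def l1_loss(predictions, targets):
    """Batch mean of the per-sample L1 norm, matching the formula term by term."""
    diff = np.abs(predictions - targets)
    per_sample = diff.reshape(diff.shape[0], -1).sum(axis=1)
    return per_sample.mean()


def mean_absolute_error(predictions, targets):
    """Mean absolute error averaged over every element of the batch."""
    return np.abs(predictions - targets).mean()


# Toy batch: 2 samples of size 2 x 2 x 1, every prediction off by exactly 1.
predictions = np.zeros((2, 2, 2, 1))
targets = np.ones((2, 2, 2, 1))

print(l1_loss(predictions, targets))              # 4.0
print(mean_absolute_error(predictions, targets))  # 1.0
```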

5.2. Feature learning Residual on the Residual

The enhancement attention module (EAM) uses a residual-on-the-residual structure with local skip and short skip connections. Each EAM is further composed of D blocks followed by feature attention.

The first part of the EAM covers the full receptive field of the input features, followed by learning on the features; then the features are compressed for speed, and finally a feature attention module enhances the weights of important features from the maps.

The first part of the EAM is realized using a novel merge-and-run unit, as shown in the second row of Figure 2. The input features are branched and passed through two dilated convolutions, then concatenated and passed through another convolution. Next, the features are learned using a residual block of two convolutions, while compression is achieved by an enhanced residual block (ERB) of three convolutional layers. The last layer of the ERB flattens the features by applying a 1×1 kernel.

Finally, the output of the feature attention unit is added to the input of the EAM.
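The data flow through one EAM can be sketched in plain NumPy as follows. This is an illustration of the description above, not the authors' PyTorch code or my Keras model: the dilation rates (1 and 3), the single convolution per branch, the placement of the ReLUs and local skips, the weight initialization, and all function names are assumptions. Feature attention (Section 5.3) is passed in as a callable and defaults to the identity.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def conv2d(x, w, dilation=1):
    """Stride-1, zero-padded 'same' convolution. x: (H, W, Cin); w: (k, k, Cin, Cout)."""
    k = w.shape[0]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h, wd = x.shape[:2]
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(k):
        for j in range(k):
            window = xp[i * dilation:i * dilation + h, j * dilation:j * dilation + wd, :]
            out += window @ w[i, j]
    return out


def init_weights(rng, c):
    """Random weights for one EAM with c channels (hypothetical initialization)."""
    def w(k, cin, cout):
        return rng.normal(0.0, 0.05, size=(k, k, cin, cout))
    return {
        "branch1": w(3, c, c), "branch2": w(3, c, c), "merge": w(3, 2 * c, c),
        "res1": w(3, c, c), "res2": w(3, c, c),
        "erb1": w(3, c, c), "erb2": w(3, c, c), "erb3": w(1, c, c),
    }


def eam(x, p, attention=lambda f: f):
    # Merge-and-run: two dilated branches, concatenation, then one convolution.
    b1 = relu(conv2d(x, p["branch1"], dilation=1))
    b2 = relu(conv2d(x, p["branch2"], dilation=3))
    merged = relu(conv2d(np.concatenate([b1, b2], axis=-1), p["merge"]))
    f = merged + x  # local skip (assumed placement)

    # Residual block of two convolutions.
    f = relu(conv2d(relu(conv2d(f, p["res1"])), p["res2"]) + f)

    # Enhanced residual block: three convolutions, the last with a 1x1 kernel.
    e = relu(conv2d(f, p["erb1"]))
    e = relu(conv2d(e, p["erb2"]))
    f = relu(conv2d(e, p["erb3"]) + f)

    # Feature attention output is added to the EAM input.
    return attention(f) + x


rng = np.random.default_rng(0)
x = rng.random((6, 6, 4))
params = init_weights(rng, 4)
y = eam(x, params)
print(y.shape)  # (6, 6, 4)
```

Because every convolution is 'same'-padded, the module keeps the spatial size and channel count, which is what allows several EAMs to be cascaded.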

5.3. Feature Attention

Attention has been around for some time; however, the authors note that it had not previously been employed in image denoising. Image denoising methods typically treat channel features equally, which is not appropriate in many cases. To exploit and learn the critical content of the image, RIDNet focuses attention on the relationship between the channel features; hence the name feature attention.

Because convolutional layers exploit only local information and cannot use global contextual information, global average pooling is first employed to express statistics that describe the whole image. Other ways of aggregating the features could also be explored to form the image descriptor. Let f_c be the output features of the last convolutional layer, with c feature maps of size h × w; global average pooling reduces the size from h × w × c to 1 × 1 × c as:

g_p = \frac{1}{h \times w} \sum_{i=1}^{h} \sum_{j=1}^{w} f_c(i, j),

where f_c(i, j) is the feature value at position (i, j) in the feature maps.
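The pooling step can be checked on a tiny example. The sketch below uses NumPy with a channels-last layout (h, w, c); the function name and the toy feature maps are assumptions for illustration.

```python
import numpy as np


def global_average_pool(features):
    """Reduce (h, w, c) feature maps to a (1, 1, c) descriptor by averaging over h and w."""
    return features.mean(axis=(0, 1), keepdims=True)


# Toy feature maps: h = 2, w = 3, c = 2.
features = np.arange(2 * 3 * 2, dtype=float).reshape(2, 3, 2)
g_p = global_average_pool(features)

print(g_p.shape)    # (1, 1, 2)
print(g_p.ravel())  # [5. 6.]
```

In Keras, the `GlobalAveragePooling2D` layer performs the same reduction on batched inputs.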

Furthermore, a self-gating mechanism is used to capture the channel dependencies from the descriptor retrieved by global average pooling. The gating mechanism is

r_c = α(H_U(δ(H_D(g_p)))),

where H_D and H_U are the channel reduction and channel upsampling operators, respectively, δ is the ReLU activation, and α is the sigmoid activation. The output of the global pooling layer g_p is convolved with a downsampling Conv layer followed by ReLU activation. To differentiate the channel features, the output is then fed into an upsampling Conv layer followed by sigmoid activation.
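Here is a minimal NumPy sketch of the gating mechanism. On a 1 × 1 × c descriptor, a 1 × 1 convolution is just a matrix multiplication, so H_D and H_U are written as weight matrices (biases omitted). The reduction ratio of 4, the random weights, and the final step that rescales each feature map by its gate value are assumptions of this sketch.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def feature_attention(features, w_down, w_up):
    """features: (h, w, c); w_down: (c, c // r); w_up: (c // r, c)."""
    g_p = features.mean(axis=(0, 1))        # global average pooling, shape (c,)
    hidden = np.maximum(g_p @ w_down, 0.0)  # H_D followed by ReLU
    r_c = sigmoid(hidden @ w_up)            # H_U followed by sigmoid, shape (c,)
    return features * r_c, r_c              # rescale each channel by its gate (assumed)


rng = np.random.default_rng(1)
c, r = 8, 4  # assumed channel count and reduction ratio
features = rng.random((5, 5, c))
w_down = rng.normal(0.0, 0.5, size=(c, c // r))
w_up = rng.normal(0.0, 0.5, size=(c // r, c))

attended, gates = feature_attention(features, w_down, w_up)
print(attended.shape, gates.shape)  # (5, 5, 8) (8,)
```

Each gate lies strictly between 0 and 1, so channels the network considers less useful are damped rather than removed outright.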

6. First Cut Solution

To test the model’s performance, I first trained it on the MNIST dataset with default parameters for just 10 epochs.

Original MNIST images
MNIST images with additive white Gaussian noise (AWGN)
Predicted denoised images
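Noisy inputs of the kind used in this first test can be produced along these lines. This is a sketch only: the noise standard deviation, the clipping to [0, 1], and the random array standing in for MNIST are assumptions, since the exact settings are not stated here.

```python
import numpy as np


def add_awgn(images, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma, then clip to [0, 1]."""
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)


rng = np.random.default_rng(42)
images = rng.random((4, 28, 28, 1))  # stand-in for MNIST images scaled to [0, 1]
noisy = add_awgn(images, sigma=0.1, rng=rng)
print(noisy.shape)  # (4, 28, 28, 1)
```

The model is then trained on pairs of `noisy` inputs and `images` targets.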

7. Implementation

Next, I trained the model on the RENOIR dataset. In each training batch, each image is divided into patches of size 80 × 80. Adam is used as the optimizer with default parameters. The learning rate is initially set to 0.0001 and then divided by 50 after each batch is processed. Training runs for 50 epochs with a batch size of 32. The network is implemented in TensorFlow and Keras using Google Colaboratory. Peak signal-to-noise ratio (PSNR) is used as the evaluation metric, and the best model reached a validation MAE loss of 0.02968.
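Two of the steps above, patch extraction and PSNR evaluation, can be sketched in NumPy. The helper names are hypothetical, and the choice of non-overlapping patches with ragged borders dropped is an assumption; the training code may sample patches differently. PSNR here assumes pixel values scaled to [0, 1].

```python
import numpy as np


def extract_patches(image, patch_size=80):
    """Split an (H, W, C) image into non-overlapping square patches, dropping ragged borders."""
    h, w, c = image.shape
    rows, cols = h // patch_size, w // patch_size
    cropped = image[:rows * patch_size, :cols * patch_size]
    grid = cropped.reshape(rows, patch_size, cols, patch_size, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch_size, patch_size, c)


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)


image = np.arange(200 * 170 * 3, dtype=float).reshape(200, 170, 3)
patches = extract_patches(image)
print(patches.shape)  # (4, 80, 80, 3)

reference = np.zeros((80, 80, 3))
estimate = np.full((80, 80, 3), 0.1)
print(psnr(reference, estimate))  # approximately 20 dB
```

TensorFlow also ships `tf.image.psnr`, which computes the same quantity on tensors.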

Noisy RGB images
Predicted denoised RGB images
Original ground truth RGB images
Noisy RGB single image
Predicted denoised RGB single image
Original ground truth RGB single image

9. GitHub Repo

Link to my github repo — .

10. LinkedIn profile

Link to my Linkedin profile — .




Puneet Chandna
