Review: RED-Net — Residual Encoder-Decoder Network (Denoising / Super Resolution)

Image Restoration including Image Denoising, Super Resolution, JPEG Deblocking, Image Deblurring and Image Inpainting.

In this story, RED-Net (Residual Encoder-Decoder Network) for image restoration is reviewed. Suppose we have a corrupted image y:

y = H(x) + n

where x is the clean version of y, H is the degradation function, and n is the additive noise. By using the same network architecture but training with different datasets, i.e. with different sets of x and y, RED-Net can handle the tasks of Image Denoising, Super Resolution, JPEG Deblocking, Image Deblurring and Image Inpainting.
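The degradation model above can be sketched in a few lines of numpy. This is only a toy illustration, not the paper's code: here H is assumed to be a 3×3 box blur standing in for the task-specific degradation, and n is Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def degrade(x, sigma=30.0):
    """Toy degradation y = H(x) + n: H is a 3x3 box blur
    (a stand-in for the task-specific degradation) and n is
    additive Gaussian noise with standard deviation sigma."""
    # 3x3 box blur via edge padding + neighborhood averaging
    xp = np.pad(x, 1, mode="edge")
    blurred = sum(xp[i:i + x.shape[0], j:j + x.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0
    noise = rng.normal(0.0, sigma, size=x.shape)
    return blurred + noise

x = rng.uniform(0, 255, size=(32, 32))   # "clean" image
y = degrade(x, sigma=30.0)               # corrupted observation
```

Training pairs (x, y) for each restoration task are generated by swapping in the appropriate H (noise only for denoising, down-sampling for super resolution, JPEG compression for deblocking, and so on).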

It was published at 2016 NIPS with over 200 citations, and a more detailed technical report is available on arXiv (2016). (Sik-Ho Tsang @ Medium)


What Are Covered

  1. Network Architecture
  2. Ablation Study
  3. Results on Image Denoising, Super Resolution, JPEG Deblocking, Image Deblurring and Image Inpainting

1. Network Architecture

RED-Net Network Architecture

The network contains layers of symmetric convolution (encoder) and deconvolution (decoder).

Convolution

The convolutional layers act as a feature extractor: they capture the abstraction of the image contents while eliminating noise and corruptions.

Deconvolution

The deconvolutional layers then recover the details of the image contents. A deconvolutional layer associates a single input activation with multiple outputs, which is why deconvolution is commonly used as a learnable up-sampling layer.
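The "single input activation, multiple outputs" behavior can be illustrated with a minimal 1-D transposed convolution. This is a hand-rolled sketch with a fixed toy kernel; in RED-Net the kernel weights would be learned.

```python
import numpy as np

def transposed_conv1d(x, k, stride=2):
    """Minimal 1-D transposed convolution: each input activation
    is spread over len(k) output positions, so the signal is
    up-sampled (output length = stride*(len(x)-1) + len(k))."""
    out = np.zeros(stride * (len(x) - 1) + len(k))
    for i, v in enumerate(x):
        out[i * stride : i * stride + len(k)] += v * k
    return out

x = np.array([1.0, 2.0, 3.0])
k = np.array([1.0, 1.0])              # toy (unlearned) kernel
y = transposed_conv1d(x, k, stride=2)
# → [1. 1. 2. 2. 3. 3.]: 3 inputs spread into 6 outputs
```

With stride 2, each of the 3 input activations writes to 2 output positions, doubling the resolution.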

Skip/Shortcut Connections

Skip/shortcut connections are added every few (in this case, two) layers, from the convolutional feature maps to their mirrored deconvolutional feature maps. The response of a convolutional layer is thus directly propagated to the corresponding mirrored deconvolutional layer, both forward and backward. The passed convolutional feature maps are summed element-wise with the deconvolutional feature maps and passed to the next layer after rectification.
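The decoder-side step described above (element-wise sum, then rectification) can be sketched as follows. This is a schematic of the skip-connection arithmetic only, not a full network.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def decoder_step(deconv_feat, conv_feat):
    """Mirrored skip connection: the convolutional feature map is
    summed element-wise with the deconvolutional one, and the
    result is rectified before being passed to the next layer."""
    return relu(deconv_feat + conv_feat)

rng = np.random.default_rng(1)
conv_feat = rng.normal(size=(8, 8))    # from the mirrored encoder layer
deconv_feat = rng.normal(size=(8, 8))  # from the previous decoder layer
out = decoder_step(deconv_feat, conv_feat)
```

The sum (rather than concatenation) keeps the channel count unchanged, and the identity path gives gradients a direct route back to the encoder.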


2. Ablation Study

2.1. Different Combinations of Convolution and Deconvolution

PSNR on Image Denoising (σ=70) Validation Set During Training
  • Using only 5 or 10 deconvolutional layers (i.e. convolution for up-sampling), the PSNR obtained is poor.
  • Using only 5 or 10 convolutional layers, the PSNR obtained is better.
  • Using 5 convolutional and 5 deconvolutional layers, the PSNR obtained is much better.
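For reference, PSNR, the metric used throughout these comparisons, is 10·log10(peak² / MSE). A minimal sketch, assuming 8-bit images with peak value 255:

```python
import numpy as np

def psnr(clean, restored, peak=255.0):
    """PSNR in dB between a clean and a restored image:
    10 * log10(peak^2 / MSE). Higher is better."""
    diff = clean.astype(np.float64) - restored.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

clean = np.full((16, 16), 100.0)
restored = clean + 10.0    # constant error of 10 -> MSE = 100
# psnr = 10*log10(255^2/100) ≈ 28.13 dB
```

A 1 dB gap in the tables below corresponds to roughly a 21% reduction in MSE, so the differences reported here are substantial.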

2.2. Effectiveness of Skip/Shortcut Connections

PSNR on Image Denoising (σ=70) Validation Set During Training
  • With skip connections, the PSNR is much better.
  • A likely reason is that deeper networks can destroy image details, which is undesirable for pixel-wise dense regression. Skip connections carry important image details, which helps to reconstruct the clean image.
  • Very deep networks also easily suffer from training issues such as vanishing gradients; skip connections help to address this problem.
Training Loss
  • Without skip connections, networks with more layers even yield a higher training loss than those with fewer layers.
  • With skip connections, the 30-layer network achieves a smaller training loss than the 20-layer network.
Skip Connections Types
  • RED-Net, with its long symmetric skip connections, performs better than the short skip connections of the ResNet building block.

3. Results on Image Denoising, Super Resolution, JPEG Deblocking, Image Deblurring and Image Inpainting

3.1. Image Denoising

  • Goal: reduce the noise in noisy images.
  • Datasets: 14 common benchmark images, and the BSD dataset.

3.1.1. One Model for One Noise Level

Average PSNR and SSIM results for σ = 10, 30, 50, 70
  • RED2n denotes a network with n convolutional and n deconvolutional layers with symmetric skip connections.
  • RED10 already obtains better results than other state-of-the-art approaches.
  • RED30 obtains even better results.

3.1.2. One Model for All Noise Levels

Average PSNR and SSIM results for image denoising using a single 30-layer network
  • PSNR degrades compared with the separate per-noise-level models, but it still beats the existing methods.
Visual results of image denoising. Images from left to right column are: clean image; the recovered image of RED30, BM3D, EPLL, NCSR, PCLR, PGPD, WNNM

3.2. Super Resolution

  • Goal: enlarge the size of the image.
  • Datasets: Set5, Set14, and BSD100

3.2.1. One Model for One Scaling Factor

Average PSNR and SSIM results of scaling 2, 3 and 4
  • RED30 again obtains the highest PSNR, better than SRCNN.
Visual results of image super-resolution. Images from left to right column are: High resolution image; the recovered image of RED30, ARFL+, CSC, CSCN, NBSRF, SRCNN, TSE
Average PSNR and SSIM results of scaling 2, 3 and 4, compared with concurrent works
  • VDSR and DRCN, concurrent works for super resolution, were developed at the same time as RED-Net.
  • RED30 performs nearly the best across all datasets and scaling factors.

3.2.2. One Model for All Scaling Factors

Average PSNR and SSIM results of scaling 2, 3 and 4 using a single 30-layer network
  • RED30 still performs quite well.

3.3. JPEG Deblocking

  • Lossy compression, such as JPEG, introduces complex compression artifacts, particularly the blocking artifacts, ringing effects and blurring.
  • Reduce the JPEG compression artifacts.
  • Datasets: LIVE1
JPEG compression deblock: average PSNR results of LIVE1
  • With such a deep network, RED30 again obtains the best results, compared with Deeper SRCNN and AR-CNN.

3.4. Image Deblurring

  • Goal: reduce the blur in the image.
The performance on deblurring “disk”, “motion” and “gaussian” kernels
Visual comparisons on non-blind deblurring. Images from left to right are: blurred images, the results of Cho [62], Krishnan [60], Levin [61], Schuler [63], Xu [59] and RED30
  • RED30 performs the best, with the highest PSNR.

3.5. Image Inpainting

  • Goal: fill the holes or corrupted parts of the image.
Images from left to right are: Corrupted images, the inpainting results of FoE and the inpainting results of RED30
  • RED30 obtains better results than FoE.
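How a corrupted inpainting input might be generated can be sketched as below. This is a toy corruption (random pixel drop-out), an assumption for illustration; the actual hole patterns in the paper's experiments may differ.

```python
import numpy as np

rng = np.random.default_rng(2)

def corrupt(x, hole_fraction=0.3):
    """Toy inpainting corruption: drop a random fraction of pixels
    (set them to zero) and return the corrupted image plus the mask.
    An inpainting model is trained to fill the dropped pixels back in."""
    mask = rng.uniform(size=x.shape) > hole_fraction  # True = kept pixel
    return x * mask, mask

x = rng.uniform(0, 255, size=(32, 32))  # "clean" image
y, mask = corrupt(x)
# kept pixels are untouched; dropped pixels are exactly zero
```

Training pairs are then (corrupted y, clean x), just as in the other tasks, which is what lets one architecture cover all five restoration problems.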