Super Resolution with SRResNet, SRGAN

Sieun Park · Published in Analytics Vidhya · Mar 15, 2021 · 4 min read

Original Paper: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network [1]

Paper Summary

  • This paper proposes a network architecture named SRResNet that shows superior performance on the PSNR benchmark over previous methods [2].
  • Suggests a perceptual VGG loss function to recover fine texture details, instead of the MSE loss previously used, which tends to recover only an averaged texture.
  • Integrates a GAN (Generative Adversarial Network) adversarial loss to generate even finer texture details and better perceptual quality, evaluated through Mean Opinion Score (MOS).
Super-resolved images with the method proposed by this paper.

Perceptual Loss for SR

While it might be compelling to use the pixel-wise MSE error as the training objective, since minimizing it directly maximizes the PSNR score, this loss definition has obvious flaws for generating perceptually high-quality images. An MSE-based solution is optimal when it outputs the average of all plausible solutions, which may not lie on the manifold of natural HR images and can look blurry and unreal. This phenomenon is illustrated in the figure below, with the blue patch as the MSE-based optimal solution.
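To make the relationship between MSE and PSNR concrete, here is a minimal sketch (in Python, not from the post) of PSNR computed from the pixel-wise MSE, assuming images scaled to [0, 1]:

```python
import numpy as np

def psnr(hr, sr, max_val=1.0):
    """Peak signal-to-noise ratio between a ground-truth HR image and a
    super-resolved image, both given as float arrays scaled to [0, max_val]."""
    mse = np.mean((hr - sr) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * np.log10(max_val ** 2 / mse)

# Lower MSE means higher PSNR, which is why optimizing MSE directly targets
# PSNR but says nothing about perceptual sharpness.
```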

To solve this problem, the authors first propose a GAN-based solution to capture the natural image manifold, with a hybrid loss that sums a content loss and an adversarial loss. To further improve performance, they also introduce an improved content loss that compares higher-level features of the images by looking at intermediate activations of a pre-trained VGG-19 network. This loss is described below, where φi,j denotes the feature map obtained after the j-th convolution (after activation) and before the i-th max-pooling layer within the VGG-19 network.
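For reference, the VGG content loss as defined in [1] is

$$
l^{SR}_{VGG/i.j} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \Big( \phi_{i,j}\big(I^{HR}\big)_{x,y} - \phi_{i,j}\big(G_{\theta_G}(I^{LR})\big)_{x,y} \Big)^2
$$

where W_{i,j} and H_{i,j} are the dimensions of the respective feature maps.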

VGG loss
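As an illustration rather than the authors' implementation, a VGG feature loss of this kind can be built on a pre-trained torchvision VGG-19. The slice index 36 below is an assumption that maps to φ5,4 in torchvision's layer ordering, and input normalization to ImageNet statistics is omitted for brevity:

```python
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class VGGLoss(nn.Module):
    """MSE between VGG-19 feature maps of the super-resolved and HR images."""
    def __init__(self, layer_index=36):
        super().__init__()
        # Keep layers up to the activation after the 4th conv before the 5th
        # max-pool, i.e. phi_{5,4} in the paper's notation (assumed index).
        self.features = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:layer_index].eval()
        for p in self.features.parameters():
            p.requires_grad = False  # the loss network stays frozen
        self.mse = nn.MSELoss()

    def forward(self, sr, hr):
        return self.mse(self.features(sr), self.features(hr))
```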

As seen below, SRResNet recovers rather blurry patches of the image, while losses incorporating the adversarial loss and the VGG content loss enable photo-realistic super-resolution. The difference between VGG22 and VGG54 is which VGG-19 feature map the loss is computed on: VGG22 uses φ2,2 (lower-level features, after the 2nd convolution before the 2nd max-pooling layer), while VGG54 uses φ5,4 (deeper features, after the 4th convolution before the 5th max-pooling layer).

The PSNR, SSIM, and MOS scores for each loss were evaluated in the experiments. PSNR and SSIM were best for the SRResNet-MSE model, while the more perceptual loss functions resulted in a significantly higher MOS score. Notably, the plain SRResNet-MSE model also outperformed all previous methods in both PSNR/SSIM and MOS, showing the effectiveness of the proposed model architecture.

Comparisons with previous methods

Model Architecture

The paper proposes a generator network and a discriminator network, used respectively to super-resolve images and to discriminate super-resolved images from high-resolution ground-truth images. The generator is composed of B identical residual blocks (B = 16 in the paper) that operate on feature maps at the low-resolution scale, followed by the sub-pixel convolution method proposed in ESPCN [3] to reconstruct the super-resolved image without manually filling in intermediate pixel values. The method is elaborated in the image and link below. Each residual block consists of two 3×3 convolutions with 64 channels, each followed by batch normalization, with a PReLU activation after the first convolution and a skip connection around the block.
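A rough PyTorch sketch of these two building blocks, under the assumptions above (64 channels, 3×3 kernels, PReLU activations, sub-pixel upsampling); this is illustrative, not the authors' code:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """SRResNet-style residual block: two 3x3 convs with 64 channels and batch
    norm, a PReLU after the first conv, and a skip connection around the block."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)

class UpsampleBlock(nn.Module):
    """ESPCN-style sub-pixel convolution: a conv expands channels by scale^2,
    then PixelShuffle rearranges them into a scale-times larger feature map."""
    def __init__(self, channels=64, scale=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),
            nn.PReLU(),
        )

    def forward(self, x):
        return self.block(x)
```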

The discriminator is a conventional CNN that takes an image as input and classifies whether it is a real high-resolution image or a generated one. For more information on GANs, try the TensorFlow DCGAN tutorial.
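Tying the losses together, here is a hedged sketch of the generator objective, assuming the 10^-3 weighting on the adversarial term reported in the paper, a discriminator that outputs probabilities, and the illustrative VGGLoss module sketched earlier:

```python
import torch

def generator_loss(discriminator, vgg_loss, sr, hr, adv_weight=1e-3):
    """Perceptual loss = VGG content loss + 1e-3 * adversarial loss."""
    content = vgg_loss(sr, hr)
    # Non-saturating adversarial term: push the discriminator to rate
    # super-resolved images as real.
    adversarial = -torch.log(discriminator(sr) + 1e-8).mean()
    return content + adv_weight * adversarial
```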

https://torch.vision/2020/01/14/Efficient_Sub_Pixel_Convolutional_Neural_Network.html

Implementation of this paper will be posted soon.

References

[1] Ledig, Christian, et al. “Photo-realistic single image super-resolution using a generative adversarial network.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

[2] Dong, Chao, et al. “Image super-resolution using deep convolutional networks.” IEEE transactions on pattern analysis and machine intelligence 38.2 (2015): 295–307.

[3] Shi, Wenzhe, et al. “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
