An overview of NTIRE 2020 Extreme Super-Resolution Challenge

Sieun Park · Analytics Vidhya · Apr 5, 2021


The NTIRE 2020 challenge on extreme super-resolution [1] is about super-resolving an image by a scaling factor of ×16. The challenge paper reviews the 19 methods proposed to solve this problem and compete on perceptual performance. We will take an overview of how the competition was conducted and of the intuitions behind some of the high-scoring methods proposed by participants of the challenge.

Compared to the active research on SISR for moderate factors such as ×4, relatively little work has been done on extreme super-resolution. Classic MSE-based solutions tend to output over-smoothed images because they average over the many plausible HR outputs, and this phenomenon becomes more severe in this challenge: as the scaling factor grows, far more HR patches are consistent with a given LR image.

Figure from the NTIRE 2020 Challenge on Perceptual Extreme Super-Resolution: Methods and Results paper [1]

Dataset

The DIV8K dataset, with 1,700 8K images, was used as the training and test data for the challenge. The dataset was only available after registering for the CodaLab challenge. I also uploaded the dataset to Google Drive, but since I am unsure about the dataset provider's policies, I won't share the link publicly.

Perceptual Evaluation

This challenge was ranked based on perceptual measures instead of PSNR/SSIM. The two measures were Learned Perceptual Image Patch Similarity (LPIPS) [2] and the Perceptual Index (PI). LPIPS measures the distance between deep CNN activations of the two images, calibrated on human perceptual judgments, while PI is a no-reference measure that combines the Ma score and NIQE.
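As a concrete example, LPIPS can be computed with the authors' `lpips` PyTorch package. A minimal sketch, with random tensors standing in for actual images:

```python
import torch
import lpips  # pip install lpips — official package from Zhang et al. [2]

# LPIPS backed by AlexNet features; 'vgg' and 'squeeze' are also available.
loss_fn = lpips.LPIPS(net='alex')

# Inputs are NCHW tensors scaled to [-1, 1]; lower distance = more similar.
img0 = torch.rand(1, 3, 256, 256) * 2 - 1  # stand-in for the SR output
img1 = torch.rand(1, 3, 256, 256) * 2 - 1  # stand-in for the HR ground truth
distance = loss_fn(img0, img1)
print(distance.item())
```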

Trends and Overview

Mainly, participants modified the network architecture and loss functions to increase model capacity and improve the perceptual quality of the super-resolved images.

Several teams extended existing architectures such as RCAN's residual channel attention blocks (RCAB) and ESRGAN with progressive upscaling for the problem, while others reconstructed the ×16 scaling factor directly in a single step.

Most teams adopted the L1 loss or used the same loss as ESRGAN: a combination of L1, VGG perceptual, and relativistic GAN losses. CIPLAB replaced the VGG loss with the LPIPS loss, and some teams replaced the usual discriminator with a U-Net-like architecture.
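For reference, here is a minimal sketch of that ESRGAN-style combined generator loss in PyTorch. The weights follow the ESRGAN paper, `vgg_features` is a hypothetical pretrained feature extractor, and the exact weightings varied per team:

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()
bce = nn.BCEWithLogitsLoss()

def ragan_generator_loss(d_real, d_fake):
    # Relativistic average GAN loss for the generator, as used in ESRGAN:
    # push real logits below the average fake, and fake logits above the
    # average real.
    loss_rf = bce(d_real - d_fake.mean(), torch.zeros_like(d_real))
    loss_fr = bce(d_fake - d_real.mean(), torch.ones_like(d_fake))
    return (loss_rf + loss_fr) / 2

def generator_loss(sr, hr, d_real, d_fake, vgg_features,
                   w_l1=1e-2, w_adv=5e-3):
    # vgg_features: hypothetical feature extractor (e.g., VGG-19
    # activations); CIPLAB would swap this term for LPIPS.
    loss_percep = l1(vgg_features(sr), vgg_features(hr))
    loss_pix = l1(sr, hr)
    loss_adv = ragan_generator_loss(d_real, d_fake)
    return loss_percep + w_l1 * loss_pix + w_adv * loss_adv
```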

The state of the art for both perceptual and PSNR measures improved compared to the results of the 2019 AIM perceptual extreme SR challenge. However, we can see from the results that the challenge is far from solved. In particular, the proposed methods do not seem to produce visually pleasing results when recovering fine spatial detail (second figure).


Methods

We will review the top-3 methods proposed in the challenge. The table below ranks the entries by the different evaluation measures. A detailed description of all the methods is provided in the official paper [1].

OPPO-Research[3]

Like many other entries in the challenge, the proposed RFB-ESRGAN architecture is based on ESRGAN. The first module, Trunk-A, consists of 16 RRDBs as proposed in ESRGAN. The following Trunk-RFB module is a DenseNet-style architecture built from Receptive Field Blocks (RFB), which capture information at different scales efficiently. The feature maps are then upscaled through alternating layers of sub-pixel convolution and nearest-neighbor interpolation, both of which greatly reduce the time cost. The proposed RFB block and network pipeline are illustrated below.
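To make the idea concrete, here is a minimal sketch of an RFB-style block, modeled after the original RFB of Liu et al. rather than the exact block in this paper: parallel branches with different dilation rates emulate receptive fields of several sizes, and their outputs are fused and added back residually.

```python
import torch
import torch.nn as nn

class ReceptiveFieldBlock(nn.Module):
    """Sketch of an RFB-style multi-scale block (assumes `channels` is
    divisible by 4); not the exact block used in RFB-ESRGAN."""
    def __init__(self, channels):
        super().__init__()
        branch_ch = channels // 4
        def branch(dilation):
            # 1x1 bottleneck, then a 3x3 conv whose dilation sets the
            # effective receptive field of this branch.
            return nn.Sequential(
                nn.Conv2d(channels, branch_ch, 1),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(branch_ch, branch_ch, 3,
                          padding=dilation, dilation=dilation),
            )
        self.branches = nn.ModuleList([branch(d) for d in (1, 2, 3, 5)])
        self.fuse = nn.Conv2d(4 * branch_ch, channels, 1)

    def forward(self, x):
        out = torch.cat([b(x) for b in self.branches], dim=1)
        return x + self.fuse(out)  # residual connection
```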

CIPLAB[4]

The CIPLAB team uses the LPIPS loss as the perceptual loss instead of the VGG loss: because the VGG network is trained for image classification, its features may not be the best choice for SR. The proposed generator consists of two ESRGAN generators, and the discriminator is a U-Net architecture with successive downsampling and upsampling operations, which aims to provide both pixel-wise and global-context discrimination.
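A rough sketch of such a U-Net-style discriminator is below; this is an assumption-laden simplification, not CIPLAB's exact network. The encoder aggregates global context, skip connections carry fine detail back up, and the output is a spatial map of real/fake logits rather than a single scalar.

```python
import torch
import torch.nn as nn

class UNetDiscriminator(nn.Module):
    """Minimal U-Net-style discriminator sketch (hypothetical depths/widths)."""
    def __init__(self, ch=64):
        super().__init__()
        # Encoder: strided convs halve resolution at each stage.
        self.e1 = nn.Sequential(nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2, True))
        self.e2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.LeakyReLU(0.2, True))
        self.e3 = nn.Sequential(nn.Conv2d(ch * 2, ch * 4, 4, 2, 1), nn.LeakyReLU(0.2, True))
        # Decoder: transposed convs upsample, with encoder skips concatenated.
        self.d3 = nn.Sequential(nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1), nn.LeakyReLU(0.2, True))
        self.d2 = nn.Sequential(nn.ConvTranspose2d(ch * 4, ch, 4, 2, 1), nn.LeakyReLU(0.2, True))
        self.out = nn.Conv2d(ch * 2, 1, 3, 1, 1)

    def forward(self, x):
        f1 = self.e1(x)                           # ch  x H/2
        f2 = self.e2(f1)                          # 2ch x H/4
        f3 = self.e3(f2)                          # 4ch x H/8
        u2 = self.d3(f3)                          # 2ch x H/4
        u1 = self.d2(torch.cat([u2, f2], dim=1))  # ch  x H/2
        # Per-location logit map (at half resolution in this sketch).
        return self.out(torch.cat([u1, f1], dim=1))
```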

HiImageTeam[5]

HiImageTeam proposed the Cascaded SR-GAN (CSRGAN) for perceptual extreme SR. As shown in Figure 6, CSRGAN achieves ×16 upscaling via four successive ×2 subnetworks (CSRBs). To improve performance, a novel residual dense channel attention block is proposed (see Figure 7). The final CSRGAN uses the VGG perceptual loss and GAN loss to enhance the perceptual quality of super-resolved images.
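The cascade idea itself is simple to express. Below is a minimal sketch in which plain convolutions stand in for the actual CSRBs (which use the residual dense channel attention blocks): four ×2 stages compose into one ×16 model.

```python
import torch.nn as nn

def make_x2_stage(ch=64):
    """One hypothetical x2 stage; HiImageTeam's CSRB is far deeper."""
    return nn.Sequential(
        nn.Conv2d(3, ch, 3, 1, 1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ch, 3 * 4, 3, 1, 1),
        nn.PixelShuffle(2),  # rearranges channels into a x2 larger image
    )

class CascadedX16(nn.Module):
    """Sketch of the cascade: four successive x2 subnetworks give x16."""
    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([make_x2_stage() for _ in range(4)])

    def forward(self, x):
        for stage in self.stages:
            x = stage(x)  # each stage doubles the spatial resolution
        return x
```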

References

[1] Zhang, K., Gu, S., & Timofte, R. (2020). NTIRE 2020 challenge on perceptual extreme super-resolution: Methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 492–493).

[2] Zhang, R., Isola, P., Efros, A. A., Shechtman, E., & Wang, O. (2018). The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 586–595).

[3] Shang, T., Dai, Q., Zhu, S., Yang, T., & Guo, Y. (2020). Perceptual extreme super-resolution network with receptive field block. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 440–441).

[4] Jo, Y., Yang, S., & Kim, S. J. (2020). Investigating loss functions for extreme super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 424–425).

[5] Wang, Z., Ye, M., Yang, F., Bai, X., & Satoh, S. (2018, July). Cascaded SR-GAN for scale-adaptive low-resolution person re-identification. In IJCAI (Vol. 1, No. 2, p. 4).
