A Review on Super-Resolution

Elona Shatri · Published in Analytics Vidhya · Mar 10, 2020 · 7 min read

Conversion from a high-resolution (HR) to a low-resolution (LR) image happens as a result of transfer, blurring, compression, and other artefacts introduced along the way. Alternatively, in many cases images are taken with low-resolution cameras, so the image looks blurred or otherwise degraded. Super-resolution (SR) is the process of going from a low-resolution image to a higher-resolution one. The relationship between low- and high-resolution images is given by this equation:

Ix = D(Iy; δ)

where Ix is the degraded (LR) image and Iy the corresponding HR image; D is the degradation mapping function and δ denotes the parameters of the degradation process. Generally, we are not given the degradation function D, which makes it difficult to recover Iy.

That is why the retrieved image is an estimate Îy:

Îy = F(Ix; θ)

The ideal case is that Îy and Iy are identical, meaning that we have recovered the original HR image. Here, F is the super-resolution function and θ denotes the parameters of F.

Before moving on, it is worth noting that D is an unknown process that can involve defocusing, sensor noise, compression artefacts and so on. To capture all of these processes in one function, researchers make some generalisations and model degradation as a single downsampling mapping.

Downsampling is denoted ↓s, where s is the scale factor. Other studies, however, model degradation as a combination of operations, such as blurring followed by downsampling and additive noise.
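To make the common assumption concrete, here is a minimal PyTorch sketch of a simulated degradation, assuming bicubic downsampling plus optional sensor noise. The function name and default values are illustrative, not taken from any of the cited papers:

```python
import torch
import torch.nn.functional as F

def degrade(hr: torch.Tensor, scale: int = 4, noise_std: float = 0.0) -> torch.Tensor:
    """Simulate Ix = D(Iy; δ) as bicubic downsampling with optional noise.

    hr: batch of HR images, shape (N, C, H, W), values in [0, 1].
    """
    lr = F.interpolate(hr, scale_factor=1 / scale, mode="bicubic", align_corners=False)
    if noise_std > 0:
        lr = lr + noise_std * torch.randn_like(lr)  # crude sensor-noise term
    return lr.clamp(0, 1)  # bicubic interpolation can overshoot the valid range

hr = torch.rand(1, 3, 128, 128)  # dummy HR image
lr = degrade(hr, scale=4)        # its 32x32 LR counterpart
```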

There are three approaches to SR depending on the problem and the available data: (i) Multi-image super-resolution, (ii) Example-based super-resolution and (iii) Single-image super-resolution.

One of the first deep learning approaches to super-resolution, SRCNN, was proposed by Dong et al. back in 2014¹. They present a novel deep learning approach for single-image super-resolution, showing that conventional sparse-coding approaches can be reformulated as a Convolutional Neural Network (CNN). The approach learns an end-to-end mapping between low- and high-resolution images, with very little pre-/post-processing. Another contribution of this work is that it can operate on all three colour channels.
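For concreteness, here is a minimal PyTorch sketch of the three-layer SRCNN. The 9-1-5 kernel sizes and 64/32 filter counts follow the paper; the "same" padding is a simplification of mine, as the original uses no padding:

```python
import torch.nn as nn

class SRCNN(nn.Module):
    """Patch extraction, non-linear mapping, reconstruction.

    The LR input is assumed to be bicubically upscaled to the target
    resolution beforehand, as in the original pipeline.
    """
    def __init__(self, channels: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),  # patch extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),                   # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),  # reconstruction
        )

    def forward(self, x):
        return self.body(x)
```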

With the novelty brought by the aforementioned paper, many researchers worked on adapting further deep learning methods to the super-resolution problem. The work that followed focused mainly on using deeper neural networks. Kim et al.² use a very deep CNN inspired by the work of the Visual Geometry Group (VGG)³. Their findings show that increasing the network depth yields a significant improvement in accuracy. They exploit contextual information spread over larger regions of an image, since small patches do not contain much information on their own. Because convergence becomes a real problem in very deep networks, they suggest using a residual-learning CNN and high learning rates (sketched below). Following up on this work, Lim et al.⁴ propose an Enhanced Deep Super-Resolution network (EDSR), which removes unnecessary modules from conventional residual networks. Furthermore, they employ residual scaling techniques to stabilise the training of large models. Kim et al.⁵ also propose a Deeply-Recursive Convolutional Network (DRCN) with up to 16 recursions, using recursive supervision and skip connections.
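A sketch of that residual-learning idea, in the spirit of Kim et al.². The 20 layers of 3×3 convolutions with 64 channels follow the paper; the exact layout here is simplified:

```python
import torch.nn as nn

class VDSR(nn.Module):
    """Very deep CNN that predicts only the residual (high-frequency) image."""
    def __init__(self, depth: int = 20, channels: int = 3, features: int = 64):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(features, channels, 3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # Global skip connection: the network learns the detail to add back
        # to the (bicubically upscaled) input, which eases convergence.
        return x + self.body(x)
```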

A similar approach, but using a Laplacian pyramid to reconstruct the sub-band residuals of high-resolution images, is taken by Lai et al.⁶. They feed a coarse-resolution feature map into each pyramid level, and use transposed convolutions to upsample to finer levels. Notably, while the previously mentioned works use bicubic interpolation during pre-processing, this one does not; it relies on the learned transposed convolutional layers instead. They train with the Charbonnier loss, which proves more robust and ultimately improves optimisation.
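The Charbonnier penalty itself is a one-liner; a minimal sketch follows (ε = 10⁻³ is a commonly used value, assumed here):

```python
import torch

def charbonnier_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    """Differentiable, outlier-robust relative of the L1 loss."""
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()
```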

For training these feed-forward convolutional neural networks, a per-pixel loss between the ground-truth image and the SR output is mainly used. In contrast, other approaches use a perceptual loss based purely on high-level features extracted from pre-trained networks. Another approach⁷ proposes combining both sets of features into a perceptual loss function for training a feed-forward network. While qualitative results do not improve over optimisation-based methods, the feed-forward network is far faster. Furthermore, they show that the perceptual loss yields more visually pleasing results than the per-pixel loss.
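A minimal sketch of such a perceptual loss, assuming torchvision ≥ 0.13 for the pre-trained VGG16 and the relu2_2 layer that Johnson et al.⁷ use for super-resolution; input normalisation to ImageNet statistics is omitted for brevity:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    """MSE in the feature space of a frozen, pre-trained VGG16."""
    def __init__(self, layer_index: int = 8):  # index 8 is relu2_2 in vgg16.features
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features[: layer_index + 1].eval()
        for p in self.features.parameters():
            p.requires_grad = False  # the loss network is never updated

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        return nn.functional.mse_loss(self.features(sr), self.features(hr))
```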

Although the use of CNNs in SR was a breakthrough, CNNs do not solve one of the main problems: recovering finer texture details at large upscaling factors. Most of the above-mentioned work focuses on minimising the mean squared reconstruction error, which yields high peak signal-to-noise ratios but still lacks high-frequency details. That is why Generative Adversarial Networks (GANs)⁸ emerged. This method estimates generative models via an adversarial process in which a generative and a discriminative model are trained simultaneously: the former generates data to match the data distribution, while the latter estimates the probability that a sample comes from the real training data rather than from the generator. The better the discriminator becomes, the more the generative model has to improve, which is the central idea of the technique.
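Formally, the two networks play Goodfellow et al.'s two-player minimax game over the value function V(D, G)⁸:

min_G max_D V(D, G) = E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]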

GANs were first applied to the SR problem by Ledig et al.⁹ (SRGAN) to address these missing finer texture details. They build a framework that outputs realistic images at a 4× upscaling factor, using GANs to define a perceptual loss composed of an adversarial loss and a content loss. A discriminator is trained to tell the super-resolved images produced by the generator apart from photo-realistic images, while the content loss measures perceptual rather than per-pixel similarity. This work brought as much novelty as SRCNN, and it has served as a benchmark for many other researchers.
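A sketch of the resulting generator objective, assuming VGG feature maps for the content term and the 10⁻³ adversarial weight reported in the paper; the variable names are mine:

```python
import torch
import torch.nn.functional as F

def srgan_generator_loss(disc_fake: torch.Tensor,
                         sr_features: torch.Tensor,
                         hr_features: torch.Tensor,
                         adv_weight: float = 1e-3) -> torch.Tensor:
    """Perceptual loss = VGG content loss + weighted adversarial loss.

    disc_fake: discriminator probabilities D(G(lr)) for generated images.
    sr_features / hr_features: VGG feature maps of the SR output and HR target.
    """
    content = F.mse_loss(sr_features, hr_features)
    adversarial = -torch.log(disc_fake + 1e-8).mean()  # push D(G(lr)) towards 1
    return content + adv_weight * adversarial
```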

One drawback of this work proved to be the artefacts it introduces along the way. Wang et al.¹⁰ explore this issue further and adjust SRGAN's architecture, perceptual loss, and adversarial loss. They propose a Residual-in-Residual Dense Block (RRDB) without batch normalisation, and they let the discriminator predict a relative realness value instead of an absolute one. To preserve consistency in brightness and texture, the perceptual loss is computed on features before the activation.
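A sketch of that relativistic average discriminator loss; the formulation follows ESRGAN¹⁰, while the helper name is mine:

```python
import torch
import torch.nn.functional as F

def relativistic_d_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    """The discriminator judges whether a real image is *more realistic*
    than the average fake one, rather than scoring each image in isolation."""
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    loss_real = F.binary_cross_entropy_with_logits(real_rel, torch.ones_like(real_rel))
    loss_fake = F.binary_cross_entropy_with_logits(fake_rel, torch.zeros_like(fake_rel))
    return (loss_real + loss_fake) / 2
```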

Even though the approaches mentioned here have brought much improvement to super-resolution, many challenges remain, such as the complexity of these networks and their effectiveness on degraded images with unknown degradation functions. Furthermore, these methods lack explainability, as is the case with most deep learning approaches; why such methods create good representations, or in some cases do not, is one of the most exciting questions in deep learning today. Finally, more accurate and standardised evaluation techniques remain an open issue in SR.

[1] Dong, C., Loy, C. C., He, K., & Tang, X. (2015). Image super-resolution using deep convolutional networks. IEEE transactions on pattern analysis and machine intelligence, 38(2), 295–307.

[2] Kim, J., Kwon Lee, J., & Mu Lee, K. (2016). Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1646–1654).

[3] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

[4] Lim, B., Son, S., Kim, H., Nah, S., & Mu Lee, K. (2017). Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 136–144).

[5] Kim, J., Kwon Lee, J., & Mu Lee, K. (2016). Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1637–1645).

[6] Lai, W. S., Huang, J. B., Ahuja, N., & Yang, M. H. (2017). Deep Laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 624–632).

[7] Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Perceptual losses for real-time style transfer and super-resolution. In European conference on computer vision (pp. 694–711). Springer, Cham.

[8] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672–2680).

[9] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., & Shi, W. (2017). Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4681–4690).

[10] Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., … & Change Loy, C. (2018). ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops.
