Super-resolution using Deep Learning methods: A Survey

Published in GEHC Tech India Blog, Mar 16, 2022

Image super-resolution refers to the process of increasing the resolution of digital images. While super-resolution can be achieved by passing multiple low-resolution images (reference images) to algorithms, this article mainly focuses on single image supervised super-resolution (SR) techniques. While this problem has been tackled using analytical methods in the Computer Vision community [1][2], recent literature shows an upsurge in the usage of deep learning techniques to perform super-resolution [3][4]. In the field of medical imaging, image resolution is often limited by the constraints on acquisition time, radiation level and hardware costs. Hence, super-resolution techniques come to the rescue, to achieve desirable perceptual quality of the images acquired in such constrained environments.

Problem Definition

Super-resolution aims at recovering high-resolution (HR) images from the corresponding low-resolution (LR) images. The degradation process that generates an LR image from its HR counterpart can be modeled as:

$$ y = f(x; s) $$

where x and y denote the HR and LR images respectively, f is the degradation function and s is the scale factor. The mathematical representation of f differs from one domain to another. Substituting a general formulation of f into the above equation gives:

$$ y = (x * k) \downarrow_s + n $$

where * refers to the two-dimensional convolution operator with blur kernel k, followed by a down-sampling operator with a scale factor s, and n refers to the additive white Gaussian noise.
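To make the degradation model concrete, here is a minimal NumPy sketch of y = (x * k)↓s + n. The Gaussian choice for the blur kernel k, the kernel size, the noise level and the function names are all assumptions made for illustration; the formulation itself does not fix them.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2-D Gaussian blur kernel (an assumed choice for k)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def blur(x, k):
    """Same-size 2-D filtering with reflect padding.
    For a symmetric kernel, correlation and convolution coincide."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)), mode="reflect")
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def degrade(x, s=2, sigma_blur=1.0, sigma_noise=0.01, seed=0):
    """LR image y = (x * k) downsampled by s, plus white Gaussian noise n."""
    rng = np.random.default_rng(seed)
    blurred = blur(x, gaussian_kernel(5, sigma_blur))   # x * k
    down = blurred[::s, ::s]                            # keep every s-th pixel
    return down + rng.normal(0.0, sigma_noise, size=down.shape)  # + n

x = np.random.default_rng(1).random((64, 64))  # stand-in HR image
y = degrade(x, s=4)
print(y.shape)  # (16, 16)
```

Plain decimation (`[::s, ::s]`) is used here as the down-sampling operator for simplicity; other operators fit the same template.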

Based on this formulation, the super-resolution (SR) techniques can be divided into two categories — Non-blind SR and Blind SR, depending on whether the target degradation function is known or not. Although the Blind SR techniques try to bridge the generalization gap of the super-resolving methods from simulation to real-world data, they are out of the scope of this article. An interested reader can refer to [5] for such methods.

In the literature of SR techniques, and for fair comparison among different methods, the degradation function f is assumed to be bicubic down-sampling:

$$ y = f_{\text{bicubic}}(x; s) $$

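To this end, the objective of SR is to solve the inverse problem of restoring x from y, using a super-resolution model F:

$$ \hat{\theta} = \arg\min_{\theta} \; L(F(y; \theta), x) + \lambda \phi(\theta) $$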

where 𝜃 refers to the parameters of the super-resolution model and L refers to the loss function used to optimize it, with 𝜙(𝜃) being the regularization term and 𝜆 the tradeoff value. Because the degradation discards information, many different HR images can map to the same LR image, which makes super-resolution an inherently ill-posed problem.
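The objective described here, a data term L plus the tradeoff value times a regularization term, can be sketched in a few lines. The L1/L2 choices for L and the squared-weight regularizer are assumptions picked for illustration, and `sr_objective` is a hypothetical helper name.

```python
import numpy as np

def sr_objective(sr, hr, params, lam=1e-4, loss="l1"):
    """Data term L(sr, hr) plus lam * phi(theta).

    sr, hr : arrays of the same shape (model output and ground truth).
    params : list of weight arrays standing in for theta.
    phi    : sum of squared weights (an assumed regularizer).
    """
    diff = sr - hr
    data_term = np.mean(np.abs(diff)) if loss == "l1" else np.mean(diff ** 2)
    reg_term = sum(np.sum(p ** 2) for p in params)
    return data_term + lam * reg_term

hr = np.zeros((4, 4))
sr = np.full((4, 4), 0.5)
theta = [np.ones(10)]
print(sr_objective(sr, hr, theta, lam=0.01, loss="l1"))
print(sr_objective(sr, hr, theta, lam=0.01, loss="l2"))
```

In practice the minimization over the parameters is carried out by a gradient-based optimizer; this sketch only evaluates the quantity being minimized.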

Deep learning techniques for natural images

In the last section, we formulated the inverse problem of SR and discussed its ill-posed nature. Such a problem was conventionally solved by constraining the solution space using well-defined prior information, typically learnt using example-based methods. As deep learning methods started evolving for low-level computer vision tasks, many prominent works were published that demonstrated the application of deep learning methods for solving the SR problem. This section touches upon some of the landmark papers, aimed at solving the SR problem for natural images. Readers are encouraged to follow the references to get a detailed understanding of each of these methods.

A) Image Super-Resolution Using Deep Convolutional Networks [6]

The conventional example-based methods had three typical steps in their solution pipeline:

i. Patch extraction: Dense cropping of overlapping patches from the input image and preprocessing them.

ii. Encoding the patches: Using a dictionary built on low-resolution images, with the patches being encoded to derive sparse coefficients.

iii. Reconstruction: The coefficients are decoded using a dictionary built on high-resolution images, and corresponding high-resolution patches are reconstructed.

Dong et al. [6] published one of the first works to use convolutional neural networks for the SR problem. The authors claimed and demonstrated that the above solution pipeline can be realized by a deep convolutional neural network, in which each of the three tasks (patch extraction and representation, non-linear mapping, and reconstruction) is learnt implicitly by the convolutional layers. They proposed a simple three-layer CNN, named Super-Resolution Convolutional Neural Network (SRCNN), in which each layer learns one of the three tasks, as shown in Figure 1.

The network takes a pre-interpolated image as an input and refines it further. The CNN based solution provides many advantages over the conventional dictionary-based method. It is comparatively faster, provides better reconstruction quality, and has performance that improves with more training data and deeper architecture. The authors also compared the effect of using different kernel sizes and number of layers, to analyze the effectiveness of the proposed schema.
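The three-layer structure can be sketched as a NumPy forward pass over a pre-interpolated input. The 9-1-5 kernel sizes and 64/32 filter counts follow a commonly cited SRCNN configuration, but they, the zero padding, the random initialization and all function names are assumptions here, not details taken from this article.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(x, w, b):
    """Same-size 2-D convolution layer (cross-correlation, zero padding).
    x: (C_in, H, W), w: (C_out, C_in, kH, kW), b: (C_out,)."""
    kh, kw = w.shape[2], w.shape[3]
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    win = sliding_window_view(xp, (kh, kw), axis=(1, 2))  # (C_in, H, W, kH, kW)
    return np.einsum("chwij,ocij->ohw", win, w) + b[:, None, None]

def init_srcnn(rng, channels=1, n1=64, n2=32, f1=9, f2=1, f3=5):
    """Random weights for the three layers (assumed sizes)."""
    def layer(c_out, c_in, k):
        return rng.normal(0.0, 1e-3, (c_out, c_in, k, k)), np.zeros(c_out)
    return [layer(n1, channels, f1), layer(n2, n1, f2), layer(channels, n2, f3)]

def srcnn_forward(y_interp, params):
    (w1, b1), (w2, b2), (w3, b3) = params
    h = np.maximum(conv2d(y_interp, w1, b1), 0)  # patch extraction and representation
    h = np.maximum(conv2d(h, w2, b2), 0)         # non-linear mapping
    return conv2d(h, w3, b3)                     # reconstruction

rng = np.random.default_rng(0)
params = init_srcnn(rng)
y_interp = rng.random((1, 32, 32))   # stand-in for a pre-interpolated LR image
sr = srcnn_forward(y_interp, params)
print(sr.shape)  # (1, 32, 32)
```

Because the input is already interpolated to the target size, the output has the same spatial size as the input; the network only refines it.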

This paper laid a cornerstone in the usage of deep learning methods to solve the SR problem, and inspired many other works in years to come.

B) Enhanced Deep Residual Networks for Single Image Super-Resolution [7]

After the success of the deep neural network proposed by Dong et al. [6] in solving the SR problem, many researchers attempted to improve the network architecture and learning objectives to raise the reconstruction quality further. Meanwhile, deep convolutional networks saw many improvements in architecture, learning and optimization methods. With the introduction of residual connections [8] and the resulting possibility of training even deeper networks, several works [9][10] used residual networks for solving the SR problem. However, employing deeper and wider networks directly, without any architectural modification, resulted in high memory requirements and training instability.

Fig. 2 a) The schema of the residual block proposed in the paper. b) EDSR architecture

With the objectives of better reconstruction quality, stable convergence and a smaller memory footprint in mind, Lim et al. [7] proposed a deeper and wider convolutional neural network, named Enhanced Deep Residual Network for Image SR (EDSR), that employed modified residual blocks and achieved state-of-the-art performance for SR. These modified residual blocks, shown in Figure 2(a), reflect two important changes made by the authors:

i. Removing the Batch Normalization layers: Batch normalization layers normalize the features and thereby reduce the range flexibility of the network. They also consume about the same amount of memory as their preceding convolutional layers, so removing them lowers the memory footprint.

ii. Adopting residual scaling: Constant scaling layers are placed in each residual block to stabilize the training procedure.
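The two changes listed above can be sketched as a single residual block: convolution, ReLU, convolution, with no batch normalization, and a constant factor applied to the residual branch before the skip addition. The value 0.1 for the scaling factor, the channel count and the helper names are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(x, w, b):
    """Same-size 2-D convolution layer (cross-correlation, zero padding).
    x: (C_in, H, W), w: (C_out, C_in, kH, kW), b: (C_out,)."""
    kh, kw = w.shape[2], w.shape[3]
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    win = sliding_window_view(xp, (kh, kw), axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", win, w) + b[:, None, None]

def edsr_res_block(x, w1, b1, w2, b2, res_scale=0.1):
    """Residual block without batch normalization, with residual scaling."""
    h = np.maximum(conv2d(x, w1, b1), 0)  # conv -> ReLU (no BN in between)
    h = conv2d(h, w2, b2)                 # second conv
    return x + res_scale * h              # scaled residual added to the identity path

rng = np.random.default_rng(0)
C = 8                                     # assumed number of feature channels
x = rng.random((C, 16, 16))
w1 = rng.normal(0.0, 0.05, (C, C, 3, 3)); b1 = np.zeros(C)
w2 = rng.normal(0.0, 0.05, (C, C, 3, 3)); b2 = np.zeros(C)
out = edsr_res_block(x, w1, b1, w2, b2)
print(out.shape)  # (8, 16, 16)
```

Scaling the residual branch by a small constant keeps the update to the identity path small, which is the stabilizing effect the second change aims for.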

The complete network architecture is shown in Figure 2(b). On the optimization front, the authors empirically suggest that using L1 loss provides better convergence and reconstruction quality than L2 loss. The authors also proposed a multi-scale model in the same paper which reconstructs various scales of HR images. With the power of modified residual blocks, deeper network and better loss function, the method won the NTIRE 2017 SR [11] challenge in the second track of the competition.

C) Image Super-Resolution Using Very Deep Residual Channel Attention Networks [12]

Although deeper CNNs have shown better reconstruction for the SR task, they come with the disadvantages of increased memory consumption and training instability. Zhang et al. [12] noted that the existing deep learning-based solutions for SR treat channel-wise features equally, which can limit the representation capability of the network and its flexibility in separating the information that needs to be restored from the information that needs to be retained. SR can be thought of as an image restoration problem, where the high-frequency information lost in the LR image needs to be restored by the model. Hence, it is natural to think of a technique that pays attention to the high-frequency components in the image and retains the original low-frequency components already present in the LR image. Zhang et al. [12] proposed the Residual Channel Attention Network (RCAN) on exactly this basis.

Fig. 4 a) Residual CA Block. b) RCAN network architecture

The important features of this method are:

a) Residual-in-residual (RIR) structure: RCAN mainly consists of three parts — shallow feature extractor (head), RIR module for deep feature extraction (body) and upscale module with reconstructor (tail). The core of the network is its body, which is an RIR module consisting of long and short skip connections, to help increase the network depth and still maintain training stability. The overall structure is as shown in Figure 4b).

b) Channel Attention (CA) for frequency rescaling: The authors were the first to apply channel attention, in the form of Squeeze-and-Excitation blocks, to the SR problem, as shown in Figure 4a). The CA module consists of a global dimensionality reducer (global average pooling), followed by a series of convolutional layers that squeeze and excite features along the channel dimension. This contraction and expansion of the channel axis helps to emphasize the features that are needed for better reconstruction.
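The channel attention step described in b) can be sketched as follows. On a globally pooled vector a 1x1 convolution is just a matrix multiplication, so plain matrices are used here. The reduction ratio of 16, the channel count of 64 and the function name are assumptions for illustration.

```python
import numpy as np

def channel_attention(x, w_down, b_down, w_up, b_up):
    """Rescale each channel of x by a learned gate in (0, 1).

    x      : (C, H, W) feature map
    w_down : (C // r, C) weights that contract the channel axis
    w_up   : (C, C // r) weights that expand it back
    """
    z = x.mean(axis=(1, 2))                          # squeeze: global average pooling -> (C,)
    h = np.maximum(w_down @ z + b_down, 0)           # contract channels by ratio r, ReLU
    s = 1.0 / (1.0 + np.exp(-(w_up @ h + b_up)))     # expand back, sigmoid gate per channel
    return x * s[:, None, None]                      # excite: channel-wise rescaling

rng = np.random.default_rng(0)
C, r = 64, 16                                        # assumed channel count and reduction ratio
x = rng.random((C, 12, 12))
w_down = rng.normal(0.0, 0.1, (C // r, C)); b_down = np.zeros(C // r)
w_up = rng.normal(0.0, 0.1, (C, C // r));   b_up = np.zeros(C)
out = channel_attention(x, w_down, b_down, w_up, b_up)
print(out.shape)  # (64, 12, 12)
```

Each channel is multiplied by a single scalar, so the spatial content of a channel is untouched; only the relative weight of channels changes.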

As a result, the model outperformed all the previously proposed techniques while having fewer parameters than most of the earlier network architectures. RCAN was subsequently adapted for various other image restoration tasks [13], and the concept of channel attention is still widely used in different domains [14][15].

Comparison: Performance v/s Memory consumption

Table 1 Performance comparison on different test datasets [7]

Figure 5 Performance v/s Model size comparison [7]

The success of recent networks depends not only on better performance, but also on memory requirements that are feasible on limited compute. Among the three solutions discussed, RCAN offers the most favourable balance of these two aspects. Further research in this domain continues to target methods that achieve sharper, visually better-looking images while maintaining a smaller memory footprint during training and inference.

Deep learning techniques for medical images

Over-exposure to radiation, longer scan times, increased patient discomfort and scanner hardware constraints are some of the prominent reasons that make developing super-resolution methods extremely important in the world of medical imaging. While researchers continued developing deep learning-based methods that were trained and evaluated on natural images, many works also applied such methods, with appropriate modifications, to medical images.

This section covers some of the papers in this direction, which show promising results for employing deep learning methods for SR in medical imaging.

A) Super-Resolution Musculoskeletal MRI Using Deep Learning [16]

Clinical musculoskeletal scans offer good in-plane resolution, but there is always a risk of missing subtle lesions in slices with a high section thickness. Given a thick-slice image, Chaudhari et al. [16] proposed a deep learning-based method, named DeepResolve, that generates a thin-slice image with the same field of view and matrix size. Inspired by the solution proposed by Kim et al. [17] for natural images, the authors employ a similar network but replace the 2D convolutions with their 3D counterparts to capture additional spatial information.

This method takes pre-interpolated volumes as input, at the risk of an increased memory footprint. The reconstructed high-resolution thin-slice images obtained from DeepResolve outperform simple interpolation-based and other shallow CNN-based methods. The authors also conducted a reader study with two musculoskeletal radiologists to review the diagnostic quality of the images obtained from DeepResolve. These images were consistently rated higher by the radiologists even with a downsampling factor of three, which shows the efficacy of this method.
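As a rough illustration of what a pre-interpolated thick-slice input looks like, the sketch below simulates thicker sections by averaging adjacent slices and then interpolates back to the original slice count, so the volume keeps its matrix size. Both the averaging model and the linear interpolation are assumptions for illustration and not the exact preprocessing of [16]; the network itself is omitted.

```python
import numpy as np

def simulate_thick_slices(vol, factor=3):
    """Average groups of `factor` adjacent slices along axis 0
    (an assumed, simplified model of a higher section thickness)."""
    n = (vol.shape[0] // factor) * factor
    return vol[:n].reshape(-1, factor, *vol.shape[1:]).mean(axis=1)

def interpolate_slices(thick, n_slices):
    """Linearly interpolate along the slice axis back to n_slices."""
    src = np.linspace(0, thick.shape[0] - 1, n_slices)  # fractional source positions
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, thick.shape[0] - 1)
    t = (src - lo)[:, None, None]
    return (1 - t) * thick[lo] + t * thick[hi]

rng = np.random.default_rng(0)
thin = rng.random((30, 16, 16))                 # stand-in thin-slice volume
thick = simulate_thick_slices(thin, factor=3)   # (10, 16, 16)
net_input = interpolate_slices(thick, thin.shape[0])
print(thick.shape, net_input.shape)  # (10, 16, 16) (30, 16, 16)
```

The interpolated volume has as many voxels as the target, which is why pre-interpolated inputs cost more memory than feeding the smaller thick-slice volume directly.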

B) Efficient and Accurate MRI Super-Resolution using a GAN and 3D Multi-Level Densely Connected Network [18]

The advances in deep neural networks for the SR problem mainly involved architectural modifications or new learning objectives. However, plain adaptation of these methods to 3D medical volumes increases memory consumption and causes training issues. Also, these methods are usually trained to optimize L1 or L2 loss in order to achieve better performance metrics, but these metrics do not necessarily represent the visual quality of the resulting images. Chen et al. [18] targeted these two issues and proposed a 3D multi-level densely connected network, abbreviated as m-DCSRN, that is light-weight and produces sharper images with better perceptual quality.

In this paper, the authors proposed a 3D densely connected SR network for structural brain MR images. Inspired by the SR networks based on dense connections in the natural image domain, the authors employed a similar network for 3D patches extracted from brain MR volumes. Figure 5 shows the network architecture, which consists of pairs of Dense blocks and Compressor modules. Within the dense blocks, the features from all the earlier layers are made visible to the later layers. The network was trained using a combination of pixel-based loss and adversarial loss. At test time, inference was done on patches, which were then merged without overlapping. This combination of a 3D dense network and a loss function targeted towards perceptual quality leads to visually sharper images with lower memory consumption.
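The patch-wise inference with non-overlapping merging mentioned above can be sketched as below. The patch size, the assumption that the model returns a patch of the same shape as its input, and the names `infer_by_patches` and `model` are all hypothetical.

```python
import numpy as np

def infer_by_patches(volume, model, patch=(32, 32, 32)):
    """Run `model` on non-overlapping 3-D patches and stitch the results.

    volume : (D, H, W) array
    model  : callable mapping a patch to an output of the same shape (assumption)
    """
    out = np.empty_like(volume)
    D, H, W = volume.shape
    pd, ph, pw = patch
    for d in range(0, D, pd):
        for h in range(0, H, ph):
            for w in range(0, W, pw):
                block = volume[d:d + pd, h:h + ph, w:w + pw]   # edge patches may be smaller
                out[d:d + pd, h:h + ph, w:w + pw] = model(block)
    return out

rng = np.random.default_rng(0)
vol = rng.random((40, 50, 33))
doubled = infer_by_patches(vol, lambda p: 2.0 * p)   # toy stand-in for a trained network
print(doubled.shape)  # (40, 50, 33)
```

Only one patch is processed at a time, so peak memory depends on the patch size rather than on the full volume.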

Discussion

Deep-learning methods have become the de facto choice for solving the SR problem. The field has evolved tremendously from simple non-linear mapping networks to complex attention-based methods. These methods have also shown remarkable results in medical imaging. Not only do the resulting images have better image quality metrics, but they were also rated consistently higher by radiologists in the reader study conducted by the authors of [16]. With the limitations of radiation dosage, acquisition time, patient discomfort and hardware constraints, deep-learning methods offer an intelligent alternative to acquire images with high in-plane and/or through-plane resolution, depending on the anatomy and purpose of the scan.

However, these methods suffer from a generalization gap, as they tend to overfit to the blurring kernel used during the training process. Hence, if the images at test time are blurred differently from the training images, the performance of these models deteriorates [19]. The field is now moving towards solving this real-world SR problem, where the target blur kernel can be unknown [5] and more complicated than that used in the training simulation.

References

[1] Chang, Hong, Dit-Yan Yeung, and Yimin Xiong. “Super-resolution through neighbor embedding.” Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004). Vol. 1. IEEE, 2004.
[2] Bevilacqua, Marco, et al. “Low-complexity single-image super-resolution based on nonnegative neighbor embedding.” (2012): 135–1.
[3] Ha, Viet Khanh, et al. “Deep learning based single image super-resolution: A survey.” International Conference on Brain Inspired Cognitive Systems. Springer, Cham, 2018.
[4] Wang, Zhihao, Jian Chen, and Steven CH Hoi. “Deep learning for image super-resolution: A survey.” IEEE Transactions on Pattern Analysis and Machine Intelligence (2020).
[5] Liu, Anran, et al. “Blind Image Super-Resolution: A Survey and Beyond.” arXiv preprint arXiv:2107.03055 (2021).
[6] Dong, Chao, et al. “Image super-resolution using deep convolutional networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence 38.2 (2015): 295–307.
[7] Lim, Bee, et al. “Enhanced deep residual networks for single image super-resolution.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017.
[8] He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[9] Kim, Jiwon, Jung Kwon Lee, and Kyoung Mu Lee. “Accurate image super-resolution using very deep convolutional networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[10] Kim, Jiwon, Jung Kwon Lee, and Kyoung Mu Lee. “Deeply-recursive convolutional network for image super-resolution.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[11] Timofte, Radu, et al. “NTIRE 2017 challenge on single image super-resolution: Methods and results.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017.
[12] Zhang, Yulun, et al. “Image super-resolution using very deep residual channel attention networks.” Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[13] Zhang, Yulun, et al. “Residual non-local attention networks for image restoration.” arXiv preprint arXiv:1903.10082 (2019).
[14] Lee, Joonhyung, et al. “Deep learning fast MRI using channel attention in magnitude domain.” 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, 2020.
[15] Li, Hao, and Jianan Liu. “Edge, Structure and Texture Refinement for Retrospective High Quality MRI Restoration using Deep Learning.” arXiv preprint arXiv:2102.00325 (2021).
[16] Chaudhari, Akshay S., et al. “Super-resolution musculoskeletal MRI using deep learning.” Magnetic Resonance in Medicine 80.5 (2018): 2139–2154.
[17] Kim, Jiwon, Jung Kwon Lee, and Kyoung Mu Lee. “Accurate image super-resolution using very deep convolutional networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[18] Chen, Yuhua, et al. “Efficient and accurate MRI super-resolution using a generative adversarial network and 3D multi-level densely connected network.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2018.
[19] Cai, Jianrui, et al. “Toward real-world single image super-resolution: A new benchmark and a new model.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.