Journey Through the World of Image Super-Resolution: From Bicubic to Transformers

Abhiruchi
4 min read · Jul 28, 2024


If you’ve ever zoomed into a photo and wished it didn’t turn into a pixelated mess, you’re in the right place. Today, we’re going on a magical journey through the world of image super-resolution (SR), exploring how different models have tackled the challenge of turning low-res images into high-res wonders.

1. The Humble Beginnings: Bicubic Interpolation

Before diving into the deep learning models, let’s start with the OG of image upscaling: Bicubic Interpolation. This method has been around since the dawn of digital images. It’s a simple mathematical technique that estimates each pixel value of the upscaled image from the closest 4×4 neighborhood of 16 pixels in the original image.

Pros:
- Super fast.
- Easy to implement.

Cons:
- Doesn’t add new details.
- Results in blurry images.

Think of it as spreading butter on toast. It gets the job done, but it doesn’t make your toast gourmet.
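If you want to try it yourself, a couple of lines of Pillow are enough. This is just a minimal sketch; the file names and the scale factor are placeholders to swap for your own image.

```python
# Minimal bicubic upscaling sketch with Pillow.
# "input.jpg" and "output_bicubic.jpg" are placeholder paths.
from PIL import Image

lr = Image.open("input.jpg")          # your low-res image
scale = 4                             # upscaling factor (e.g. x4)

# Resize to the target resolution using bicubic interpolation.
hr = lr.resize((lr.width * scale, lr.height * scale), resample=Image.BICUBIC)
hr.save("output_bicubic.jpg")
```

Fast and dependable, but as you can see there is no model here at all: no new detail gets invented, the existing pixels are just smoothly spread out.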

2. SRCNN: The First Deep Learning Model

Enter the Super-Resolution Convolutional Neural Network (SRCNN), which was introduced in 2014 by a group of researchers who thought, “Hey, let’s throw some neural networks at this problem!” SRCNN is like the granddaddy of all SR models.

SRCNN works in three steps (sketched in code right after this list):
1. Patch Extraction: It extracts small feature patches from the low-res image (which is first bicubic-upscaled to the target size).
2. Non-linear Mapping: These patches are passed through convolutional layers that map them to high-res feature patches.
3. Reconstruction: Finally, the patches are stitched together to form the high-res image.
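Here is roughly what that looks like in PyTorch. Treat it as a sketch of the three-layer idea rather than a faithful reimplementation: the filter sizes below follow one common configuration, and the input is assumed to be the low-res image already bicubic-upscaled to the target resolution, as in the original paper.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Sketch of the three-layer SRCNN. Input: a low-res image that has
    already been resized to the target size with bicubic interpolation."""
    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),  # 1. patch extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=5, padding=2),        # 2. non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),  # 3. reconstruction
        )

    def forward(self, x):
        return self.net(x)

x = torch.randn(1, 1, 128, 128)   # dummy bicubic-upscaled luminance channel
print(SRCNN()(x).shape)           # torch.Size([1, 1, 128, 128])
```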

Pros:
- Better than traditional methods.
- Simple architecture.

Cons:
- Limited learning capacity.
- Not super detailed.

SRCNN was a game-changer. It showed that neural networks could indeed improve image quality beyond what bicubic interpolation could offer.

Figure: SRCNN

3. VDSR: Going Deeper

After SRCNN, researchers wanted more power. Enter Very Deep Super-Resolution (VDSR), introduced in 2016. VDSR went deeper (literally) with a network of 20 layers. It also introduced residual learning to SR, which basically means the network only predicts the difference between the bicubic-upscaled input and the true high-res image, and adds that difference back onto the input.
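A minimal PyTorch sketch of that residual idea could look like this, assuming the same bicubic-upscaled input as SRCNN; the depth and channel width below are illustrative defaults, not a claim about the exact training recipe.

```python
import torch
import torch.nn as nn

class VDSR(nn.Module):
    """VDSR-style sketch: a deep stack of 3x3 convolutions predicts the residual,
    which is added back onto the bicubic-upscaled input (global skip connection)."""
    def __init__(self, channels=1, depth=20, features=64):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(features, channels, 3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # Residual learning: predict only the missing high-frequency detail.
        return x + self.body(x)

x = torch.randn(1, 1, 128, 128)   # dummy bicubic-upscaled input
print(VDSR()(x).shape)            # torch.Size([1, 1, 128, 128])
```

Because the bicubic input already carries most of the low-frequency content, learning just the residual makes training a 20-layer network much easier.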

Pros:
- High accuracy.
- Can handle large upscaling factors.

Cons:
- Slower training due to deep architecture.
- More computational resources required.

VDSR took SR to the next level, like upgrading from a bicycle to a motorbike.

Figure: VDSR

4. SRGAN: The Artistic Touch

If VDSR is a motorbike, the Super-Resolution Generative Adversarial Network (SRGAN) is a luxury sports car. Introduced in 2017, SRGAN brought GANs (Generative Adversarial Networks) into the mix. The magic of GANs lies in their two-part structure: a generator that creates images and a discriminator that evaluates them.

SRGAN’s generator tries to create realistic high-res images, while the discriminator tries to tell the difference between generated and real images. Over time, this back-and-forth game leads to super realistic images.
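To make the back-and-forth concrete, here is a toy sketch of one training step, using dummy tensors in place of a real dataset. The real SRGAN uses an SRResNet generator, a much deeper discriminator, and a VGG-based content loss; this sketch swaps in tiny stand-in networks and a plain pixel MSE purely to show the adversarial game.

```python
import torch
import torch.nn as nn

# Deliberately tiny stand-ins for the generator and discriminator.
G = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.PReLU(),
                  nn.Upsample(scale_factor=4, mode="nearest"),
                  nn.Conv2d(64, 3, 3, padding=1))
D = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

lr_img = torch.randn(4, 3, 32, 32)    # dummy low-res batch
hr_img = torch.randn(4, 3, 128, 128)  # dummy matching high-res batch

# 1) Discriminator step: push real images toward label 1, generated toward 0.
fake = G(lr_img).detach()
loss_d = bce(D(hr_img), torch.ones(4, 1)) + bce(D(fake), torch.zeros(4, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# 2) Generator step: content loss (pixel MSE here, VGG features in the paper)
#    plus a small adversarial term that rewards fooling the discriminator.
sr = G(lr_img)
loss_g = mse(sr, hr_img) + 1e-3 * bce(D(sr), torch.ones(4, 1))
opt_g.zero_grad()
loss_g.backward()
opt_g.step()
```

The key design choice is that the generator is never told exactly which pixels to produce; it only has to produce textures the discriminator accepts as real, which is where the extra sharpness (and occasionally the hallucinated artifacts) comes from.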

Pros:
- High perceptual quality (looks good to the human eye).
- Produces sharp and detailed images.

Cons:
- Training is complex and unstable.
- Can produce artifacts.

SRGAN is like the artist who doesn’t just upscale your image but adds their artistic flair to make it look stunning.

Figure: SRGAN

5. Transformers: The New Frontier

Transformers, the newest kids on the block, have taken the deep learning world by storm. Originally designed for natural language processing, they’ve shown incredible promise in image processing tasks, including super-resolution. Vision Transformers (ViTs) break the image into patches and process these patches with self-attention, so every patch can attend to every other patch, capturing long-range dependencies more effectively than the local receptive fields of convolutions.
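The patch-and-attend idea is easy to sketch with PyTorch’s built-in transformer modules. Note this is a generic ViT-style toy, not the actual SwinV2/SwinIR architecture (which adds shifted windows and other SR-specific machinery); the patch size, embedding dimension, and layer counts below are illustrative.

```python
import torch
import torch.nn as nn

# ViT-style sketch: split the image into patches, embed each patch as a token,
# and let self-attention relate every patch to every other patch.
patch, dim = 8, 128
img = torch.randn(1, 3, 64, 64)                      # dummy low-res input

to_tokens = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)   # patch embedding
tokens = to_tokens(img).flatten(2).transpose(1, 2)   # (1, 64 patch tokens, 128 dims)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
out = encoder(tokens)                                 # every patch attends to all others
print(out.shape)                                      # torch.Size([1, 64, 128])
```

A full SR transformer then maps these refined tokens back to pixels and upsamples them, but the global attention step above is the part that sets this family apart from purely convolutional models.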

Pros:
- Handles large-scale images efficiently.
- Captures global context better.

Cons:
- Requires large amounts of data and computational power.
- Still in early stages for SR.

Transformers are like the futuristic tech that’s ready to revolutionize how we think about and process images.

Figure: SwinV2 Transformer for image super-resolution

From the humble beginnings of bicubic interpolation to the cutting-edge transformers, the journey of image super-resolution has been nothing short of fascinating. Each model has contributed to making our digital world a bit clearer, one pixel at a time. Whether you’re a tech enthusiast or just someone who loves clear, sharp images, there’s no denying the magic behind these incredible models.

Stay tuned for more deep dives into the world of AI and machine learning. Until next time, keep those pixels sharp!
