Photo by Adrian Rosebrock on pyimagesearch

Explained like you are 5: Image Super-Resolution Part 2

jagan nathan
Developer Community SASTRA

--

Part 2 of this two-part series demonstrates how to build a simple ISR model. The theoretical concepts are discussed in Part 1, which you can refer to here.

Image super-resolution involves building a model capable of generating high-resolution images from low-resolution input images. By leveraging techniques such as Residual Blocks and Depth-to-Space operations, it is possible to build a powerful image super-resolution model that can significantly enhance the quality and visual appeal of images. With the help of frameworks such as TensorFlow and Keras, we can easily build such models.

Click here to view the dataset used for this task. For those of you unfamiliar with Kaggle: to start coding, use the New Notebook option, and under Accelerator, make sure you select T4 x 2 or P100.

Click here to view the notebook I used to explain my code.

Contents

  1. Goal
  2. Dataset
  3. Importing necessary libraries and data
  4. Creating model architecture
  5. Compiling and training
  6. Results and testing
  7. Scope for further improvement

Goal

Our goal is to build a very lightweight 3x ISR model. The original SRGAN implementation had a total of 16.8 million parameters. We aim to build a model with fewer than half a million parameters, without compromising performance.

Dataset

The dataset contains a total of 4 folders. Click here to view the dataset.

  • High-resolution training: 3500 RGB images of size 510 x 510. They serve as the ground truths for the generated images
  • Low-resolution training: 3500 RGB images of size 170 x 170. They serve as the inputs to our model
  • High-resolution validation: 1000 RGB images of size 510 x 510 for validation
  • Low-resolution validation: 1000 RGB images of size 170 x 170 for validation

Importing necessary libraries and data

Importing the libraries

import numpy as np
from PIL import Image
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.image import ssim_multiscale
from tensorflow.nn import relu, depth_to_space
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Conv2D, Input, Add, Lambda

PIL is an image-processing library, which we use later in our code to load and display images. The rest of the libraries and functions are fairly standard. We use the Add layer to implement skip connections and ssim_multiscale as a metric.

Importing the dataset

lr_dir = "/kaggle/input/img-superres/AnalyticsArena_DataSet/lowres"
hr_dir = "/kaggle/input/img-superres/AnalyticsArena_DataSet/highres"

lr_datagen = tf.keras.preprocessing.image.ImageDataGenerator()
hr_datagen = tf.keras.preprocessing.image.ImageDataGenerator()

# Low-resolution inputs: one image at a time, in a fixed order
lr_iterator = lr_datagen.flow_from_directory(
    directory=lr_dir,
    target_size=(170, 170),
    batch_size=1,
    class_mode=None,
    shuffle=False
)

# High-resolution ground truths: same fixed order, so the pairs line up
hr_iterator = hr_datagen.flow_from_directory(
    directory=hr_dir,
    target_size=(510, 510),
    batch_size=1,
    class_mode=None,
    shuffle=False
)

iterator = zip(lr_iterator, hr_iterator)

Why data generators?

There is a significant difference between using a data generator and loading all the images manually. Each high-resolution image is of size 510 x 510 x 3. Assuming each value takes 64 bits (8 bytes), that is 510 x 510 x 3 x 8 bytes, about 6.2 MB per image, and there are 3500 such images, so the training set alone adds up to around 22 GB! All of this would be loaded into RAM, and the program would eventually run out of memory if we loaded every image. So what else can we do?
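Here is that back-of-the-envelope calculation spelled out:

# Memory needed to hold all 3500 high-res images in RAM as 64-bit values
bytes_per_image = 510 * 510 * 3 * 8            # ~6.2 MB per image
total_gb = bytes_per_image * 3500 / 1e9
print(round(total_gb, 1))                      # ~21.8 GB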

We can use an image data generator instead. Data generators yield batches of data from the directory on the fly during training, rather than loading everything at once, which makes them memory efficient. But this raises two questions: how do we use an image data generator to load images without any class labels, and how do we set the labels to another set of images?

To tackle the first issue, we can set the class_mode parameter of flow_from_directory to None; this makes sure that just the images are iterated, with no labels attached.

To tackle the second issue, we can create a second iterator that loads the high-resolution images, and zip the low-res and high-res iterators together.

If we pass an iterable like zip as an argument to model.fit, it unpacks each element the iterable yields and assigns index [0] to x and index [1] to y: the low-res batch becomes the input and the high-res batch becomes the target, which is exactly what we want.

Note that we cannot explicitly specify x and y when using data generators, which is why we had to use an iterable like zip.
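To make this concrete, here is a small illustrative check (the directory iterators loop indefinitely, so peeking at one batch does no harm):

# Each element yielded by the zipped iterator is an (x, y) pair
lr_batch, hr_batch = next(iterator)
print(lr_batch.shape)  # (1, 170, 170, 3) -- the model input
print(hr_batch.shape)  # (1, 510, 510, 3) -- the ground truth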

Note, I have not used any form of augmentation, feel free to experiment with different augmentations.

Loading validation data

lrv_dir = "/kaggle/input/img-superres/AnalyticsArena_DataSet/lowresvalid"
hrv_dir = "/kaggle/input/img-superres/AnalyticsArena_DataSet/highresvalid"

lrv_datagen = tf.keras.preprocessing.image.ImageDataGenerator()
hrv_datagen = tf.keras.preprocessing.image.ImageDataGenerator()

lrv_iterator = lrv_datagen.flow_from_directory(
    directory=lrv_dir,
    target_size=(170, 170),
    batch_size=1,
    class_mode=None,
    shuffle=False
)

hrv_iterator = hrv_datagen.flow_from_directory(
    directory=hrv_dir,
    target_size=(510, 510),
    batch_size=1,
    class_mode=None,
    shuffle=False
)

iteratorv = zip(lrv_iterator, hrv_iterator)

Creating the model architecture

Define the ResBlock

def ResBlock(input_layer, filters):
    # Two 3x3 convolutions, then add the block's input back in (skip connection)
    x = Conv2D(filters=filters, kernel_size=3, padding='same', activation='relu')(input_layer)
    x = Conv2D(filters=filters, kernel_size=3, padding='same')(x)
    x = Add()([input_layer, x])
    return x

In our ResBlock implementation, there are just two convolution layers and a skip connection from the input to the output. The Add layer sums the block's input with the output of the second convolution, creating the skip connection that we require.
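As a quick sanity check (purely illustrative), the block preserves the input shape, which is what makes the Add connection valid:

# The ResBlock keeps spatial dims and channel count unchanged
inp = Input(shape=(170, 170, 32))
out = ResBlock(inp, 32)
print(out.shape)  # (None, 170, 170, 32)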

Define the Input Layer

def InterConnected(input_shape):
    input_tensor = Input(shape=input_shape)

Add a convolution layer, two ResBlocks, and then one more convolution layer:

def InterConnected(input_shape):
    input_tensor = Input(shape=input_shape)
    x = Conv2D(filters=32, kernel_size=3, padding='same', activation='relu')(input_tensor)
    for i in range(2):
        x = ResBlock(x, 32)
    x = Conv2D(filters=32*9, kernel_size=3, padding='same')(x)

By now the output shape of x will be (170, 170, 288): the last convolution has 32 x 9 = 288 filters, and 'same' padding keeps the spatial dimensions at 170 x 170.

Depth-to-space layer

The depth_to_space operation is a common technique used in image super-resolution models to convert the depth dimension of the feature maps into spatial dimensions, effectively increasing the resolution of the image.

x = Lambda(lambda x: tf.nn.depth_to_space(x, block_size=3))(x)

The block_size parameter sets the upscaling factor: a value of 3 means the output has 3 times the spatial resolution of the input, giving us 3x upscaling. By using this operation, we can increase the resolution of the feature maps output by the Residual Blocks.
The output shape of the depth-to-space layer will be (170*3, 170*3, 288/(3*3)), which is (510, 510, 32).
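If you want to convince yourself of how depth_to_space trades depth for resolution, a tiny standalone check (not part of the model) makes it clear:

# A (1, 2, 2, 9) tensor becomes (1, 6, 6, 1) with block_size=3:
# spatial dims grow 3x each, depth shrinks by 3*3 = 9
t = tf.zeros((1, 2, 2, 9))
print(tf.nn.depth_to_space(t, block_size=3).shape)  # (1, 6, 6, 1)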

Output Layer

A convolution layer can be used to convert the 510 x 510 x 32 feature map into 510 x 510 x 3, which is the shape of the final RGB output that we require. By setting the number of filters to 3, we reduce the depth from 32 to 3 without altering the resolution.

output_tensor = Conv2D(filters=3, kernel_size=1)(x)

Combining all,

def ms_ssim(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32) / 255.0
    y_pred = tf.cast(y_pred, tf.float32) / 255.0
    ms_ssim = tf.reduce_mean(ssim_multiscale(y_true, y_pred, max_val=1.0))
    loss = 1 - ms_ssim
    return loss

def ResBlock(input_layer, filters):
    x = Conv2D(filters=filters, kernel_size=3, padding='same', activation='relu')(input_layer)
    x = Conv2D(filters=filters, kernel_size=3, padding='same')(x)
    x = Add()([input_layer, x])
    return x

def InterConnected(input_shape):
    input_tensor = Input(shape=input_shape)
    x = Conv2D(filters=32, kernel_size=3, padding='same', activation='relu')(input_tensor)
    for i in range(2):
        x = ResBlock(x, 32)
    x = Conv2D(filters=32*9, kernel_size=3, padding='same')(x)
    x = Lambda(lambda x: tf.nn.depth_to_space(x, block_size=3))(x)
    output_tensor = Conv2D(filters=3, kernel_size=1)(x)
    model = Model(inputs=input_tensor, outputs=output_tensor)
    model.summary()  # summary() prints on its own; no need to wrap it in print()
    return model

model = InterConnected((170, 170, 3))

Compiling and Training the model

Creating the ms_ssim metric

def ms_ssim(y_true, y_pred):
    # Scale 8-bit pixel values to [0, 1] before computing MS-SSIM
    y_true = tf.cast(y_true, tf.float32) / 255.0
    y_pred = tf.cast(y_pred, tf.float32) / 255.0
    ms_ssim = tf.reduce_mean(ssim_multiscale(y_true, y_pred, max_val=1.0))
    loss = 1 - ms_ssim
    return loss

Note that I’m returning 1 - ms_ssim, which can also serve as a loss function: an MS-SSIM of 1 means the images are perfectly similar, so a value of 0 for 1 - ms_ssim indicates a perfect match.
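As a quick sanity check (illustrative only), feeding the same batch as both arguments should return a value near zero:

# Identical images have MS-SSIM = 1, so 1 - ms_ssim should be ~0
imgs = tf.random.uniform((1, 510, 510, 3), maxval=255.0)
print(ms_ssim(imgs, imgs).numpy())  # ~0.0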

I’m using MSE as the loss and ms_ssim as a metric to evaluate progress. For simplicity’s sake, I have not included early stopping and just adjusted the learning rate hyperparameter.

model.compile(optimizer=Adam(learning_rate=1e-5), loss='mse', metrics=[ms_ssim, 'mse'])
model.fit(iterator, validation_data=iteratorv, epochs=20,
          steps_per_epoch=3500, verbose=1, validation_steps=750)

I ran the training for 20 epochs, each epoch took me around 2 minutes to complete with a T4 x 2 GPU accelerator.

Results and testing

Model evaluation

The final validation MSE was around 58, indicating that on average pixel values differed by only ~7.5 (the square root of 58) from the original image's pixels. Also, 1 - ms_ssim was 0.038, i.e. an MS-SSIM of 0.962, which means the generated images were perceptually very similar to the originals. Thus our model can be considered good for ISR tasks.
Our primary goal was to create a model with under 0.5M parameters, but we ended up with a model of only about 0.1M parameters, roughly 168 times smaller than SRGAN!
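If you want to verify the size yourself, Keras exposes the count directly:

# Total number of parameters in the model
print(f"{model.count_params():,} parameters")  # roughly 0.1M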

Sample Input-Output

img = Image.open("/kaggle/input/img-superres/AnalyticsArena_DataSet/lowresvalid/LowResolution_3x_Valid/Img_down3503.jpg")
arr = np.array(img)
pred = model.predict(arr.reshape(1, 170, 170, 3))
pred = np.clip(pred, 0, 255).astype(np.uint8)  # clip first so out-of-range floats don't wrap around
plt.imshow(pred[0])

We use PIL to read an image, convert it to a NumPy array, and pass it to the super-resolution model for prediction. Since pixel values are 8-bit integers in the range 0 to 255, we clip the floating-point output to that range and then convert it to an integer array.
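For a fairer visual judgement, it can help to also upscale the same input with plain bicubic resampling, a simple baseline that is not part of the original pipeline:

# Bicubic 3x upscale of the same low-res input, for side-by-side comparison
bicubic = img.resize((510, 510), Image.BICUBIC)
plt.imshow(bicubic)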

Generated 510 x 510 image

Original 170 x 170 image

We can see that the hair, glasses, teeth, etc. have improved by a huge margin! There are just a few random noise pixels in her hand; apart from that, the ISR model gave a very good result!

Scope for further improvements

  • Since our goal was to build as lightweight a model as possible, we did not include architectures like GANs. But we can use this same model as the generator in a GAN-based model, and add a simple binary image-classifier CNN as the discriminator.
  • Different quality measures like PSNR can also be implemented (see the sketch after this list).
  • Training images can be augmented by adding random noise, blacking out random pixels, etc.
  • We can experiment with different hyperparameters, more epochs, more ResBlocks, etc.
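As a starting point, here is a minimal sketch of a PSNR metric built on tf.image.psnr (this helper is my addition, not from the original notebook):

# Peak signal-to-noise ratio in dB; higher is better
def psnr(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    return tf.reduce_mean(tf.image.psnr(y_true, y_pred, max_val=255.0))

# e.g. model.compile(optimizer=Adam(learning_rate=1e-5), loss='mse', metrics=[psnr])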

Final note

I hope that my two blogs on Image Super Resolution were informative and you learnt something new from them! Feel free to reach out to me at my email if you have any doubts/queries. Thank you!
