Image Super-Resolution and its application in security and surveillance

VS_Mourya
5 min read · Nov 1, 2019


Hey guys, this blog post will walk you through our project "Image Super-Resolution and its application in security and surveillance". Sit back, grab a cup of warm coffee and read away! We hope you enjoy the journey.

Example-based single-image Super-Resolution (SR) focuses on reconstructing rich details (high frequencies) in an image, based on a set of prior models trained on Low-Resolution (LR) and corresponding High-Resolution (HR) image pairs. Most super-resolution techniques rely on the same idea: produce one upscaled image using information from several distinct images. Our project aims to improve and extend the upscaling (Super-Resolution) capability of neural networks by implementing a deep neural network / GAN architecture, refining it based on the results, and applying it to real-world problems. Despite the advances in accuracy and speed of single-image Super-Resolution driven by faster and ever deeper convolutional neural networks, one central question remains unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors?

The behaviour of optimization-based Super-Resolution methods is largely dictated by the choice of the objective function. Recent work has mostly focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios (PSNR), but they often lack high-frequency detail and are perceptually unsatisfying, failing to deliver the fidelity expected at the higher resolution. In this project, we use SRGAN, a generative adversarial network (GAN) for image Super-Resolution (SR). To our knowledge, it was one of the first frameworks capable of inferring photo-realistic natural images at 4x upscaling factors. To achieve this, it uses a perceptual loss function that combines an adversarial loss and a content loss. We evaluated a wide range of architectures and strategies, such as pix2pix and CycleGAN, but found the approach above to be the most accurate for Super-Resolution. With our application focused on number-plate recognition, we trained this model on number-plate data to increase the resolution, and then pass the high-resolution image into the Tesseract model for Optical Character Recognition (OCR). The whole idea could be further improved and combined with another model for Video Super-Resolution. That pipeline involves localizing the number plate, super-resolving the localized plate, and performing OCR on it.
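The perceptual loss described above can be sketched as a weighted sum of the two terms. This is a minimal NumPy illustration, not our training code: the 1e-3 weighting follows the SRGAN paper, and `real_feats`/`fake_feats` stand in for VGG feature maps of the original and generated images.

```python
import numpy as np

def content_loss(real_feats, fake_feats):
    """MSE between feature maps (VGG features of HR vs. generated SR images)."""
    return np.mean((real_feats - fake_feats) ** 2)

def adversarial_loss(disc_probs_on_fake):
    """Generator's adversarial term: -log D(G(LR)), averaged over the batch."""
    return np.mean(-np.log(disc_probs_on_fake + 1e-8))

def perceptual_loss(real_feats, fake_feats, disc_probs_on_fake, weight=1e-3):
    """Perceptual loss = content loss + weighted adversarial loss."""
    return content_loss(real_feats, fake_feats) + weight * adversarial_loss(disc_probs_on_fake)
```

The small adversarial weight keeps the content term dominant, so the generator stays close to the reference image while the GAN term pushes it toward photo-realistic texture.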

Our project aims to show a direction for some undiscovered/unimplemented problems, a few of which are listed below:

CCTV cameras: Upscaling traffic CCTV footage helps us identify vehicle number plates more easily. It can also be used to resolve faces with more detail and accuracy, for example in a bank robbery, making identification of the thief easier.

YouTube video upscaling: YouTube gives us the option to downsample our video resolution (e.g., 480p to 240p) but does not offer an option to upscale the current resolution (e.g., 480p to 1080p).

These are only a few of the examples we encounter in our daily lives, but the core idea of our project can address many more problems. Computer vision is a booming field: self-driving vehicles, recognizing people by their facial features, satellite imaging, astronomical detection in space, and so on are all areas where our idea can be useful. We used machine learning to tackle this problem; our goal was to take images and reconstruct them at a higher resolution.

Focusing on the super-resolution of images/video attracts plenty of potential customers, particularly companies working on GPU computing such as Nvidia, since increasing the dimensionality of an image/video is computationally very expensive and demands high GPU power. Integrating this technique into the latest digital devices, such as mobile or DSLR cameras, would amplify image quality, opening the gateway to a whole new dimension of unpixelated, beautiful pictures. We believe camera companies such as Nikon and Canon would be the clients, and individual people the end-users.

How will we measure the success/outcome/quality of our project?

Although this project leaves plenty of room for experimentation, our basic procedure was: first decrease the resolution of an image to produce a low-resolution version, then feed it to the model. The model produces a high-resolution image that can be compared against the original high-resolution image. This gives us a metric to assess how closely the model can recover the original image.
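The "closeness" between the recovered image and the original is commonly measured with PSNR, the metric mentioned earlier. A minimal NumPy sketch:

```python
import numpy as np

def psnr(original, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the original."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # images are identical
    return 10.0 * np.log10(max_val ** 2 / mse)
```

As noted above, a high PSNR alone does not guarantee perceptual quality, which is exactly why SRGAN adds the adversarial term.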

How did we do it?

Below is an outline of the SRGAN algorithm.

SRGAN output is more appealing to a human, with more detail, than the same design without a GAN (SRResNet). During training, a high-resolution (HR) image is downsampled to a low-resolution (LR) image. The GAN generator upsamples LR images to super-resolution (SR) images. A discriminator is used to distinguish HR images from generated SR images, and the GAN loss is backpropagated to train both the discriminator and the generator.
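The first step of that training loop, producing LR inputs from HR images, can be sketched as follows. SRGAN uses bicubic downsampling in practice; this illustration uses simple average pooling as a stand-in.

```python
import numpy as np

def downsample(hr, factor=4):
    """Average-pool an HR image (H, W, C) by `factor` to create the LR input.
    A simple stand-in for the bicubic downsampling used in SRGAN training."""
    h, w = hr.shape[:2]
    h, w = h - h % factor, w - w % factor  # crop to a multiple of the factor
    hr = hr[:h, :w]
    return hr.reshape(h // factor, factor, w // factor, factor, -1).mean(axis=(1, 3))
```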

The network design for the generator and the discriminator consists mostly of convolution layers, batch normalization, and the parameterized ReLU (PReLU).
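PReLU behaves like ReLU for positive inputs but applies a slope `alpha` to negative inputs. In the network `alpha` is a trainable parameter; it is fixed here purely for illustration:

```python
import numpy as np

def prelu(x, alpha=0.25):
    """Parameterized ReLU: identity for positives, slope `alpha` for negatives."""
    return np.where(x > 0, x, alpha * x)
```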

We used the VGG19 network architecture in our project.

Discriminator, built from Conv2D blocks:

# Discriminator sketch (Keras). `conv2d_block` is a small helper:
# Conv2D -> optional BatchNormalization -> LeakyReLU.
# `self.shape_hr` and `filters` come from the enclosing model class.
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     LeakyReLU, Dense, Flatten)

def conv2d_block(x, filters, strides=1, bn=True):
    x = Conv2D(filters, kernel_size=3, strides=strides, padding='same')(x)
    if bn:
        x = BatchNormalization()(x)
    return LeakyReLU(alpha=0.2)(x)

img = Input(shape=self.shape_hr)
x = conv2d_block(img, filters, bn=False)
x = conv2d_block(x, filters, strides=2)
x = conv2d_block(x, filters*2)
x = conv2d_block(x, filters*2, strides=2)
x = conv2d_block(x, filters*4)
x = conv2d_block(x, filters*4, strides=2)
x = conv2d_block(x, filters*8)
x = conv2d_block(x, filters*8, strides=2)
x = Flatten()(x)  # flatten feature maps before the dense head
x = Dense(filters*16)(x)
x = LeakyReLU(alpha=0.2)(x)
x = Dense(1, activation='sigmoid')(x)  # probability the input is a real HR image

What can be done in the future?

Our end goal was an accurate space-time Super-Resolution model that can work with videos too. But a video is just a set of frames played in sequence, so intuitively, for video Super-Resolution to work we first need accurate image Super-Resolution. We therefore dedicated most of our time to improving the accuracy of image Super-Resolution, and we achieved some good results. We are currently working on merging our model with the Tesseract OCR model to run OCR on the high-resolution images. We wish to continue this work with our mentor and apply image Super-Resolution to videos; this is a computationally expensive task, but we are eager to take it on.
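Since a video is just a sequence of frames, the naive extension is to apply single-image SR to each frame independently. A minimal sketch, where `upscale_frame` is a nearest-neighbor stand-in for the trained generator (a true space-time model would also use information across neighboring frames):

```python
import numpy as np

def upscale_frame(frame, factor=4):
    """Stand-in for the trained SR generator: nearest-neighbor upscaling."""
    return frame.repeat(factor, axis=0).repeat(factor, axis=1)

def super_resolve_video(frames, factor=4):
    """Naive video SR: super-resolve each frame independently."""
    return [upscale_frame(f, factor) for f in frames]
```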

Thank you for reading 😊

Have a nice day.
