Adjusting images to any screen size with deep learning

Maxim Vakurin
Deelvin Machine Learning
Sep 22, 2020

Introduction

New electronic gadgets enter the market every year with a seemingly endless variety of screen sizes, aspect ratios, and resolutions. This variety makes creating visual content that suits every device extremely challenging. Challenging, however, is not impossible: some neural network solutions can deliver satisfactory results. Today we will explore one of them, the SinGAN network, which can generate versions of an image at different sizes.

SinGAN is a generative adversarial network (GAN) that is trained on a single input image and can generate new images with different aspect ratios from it. More importantly, it preserves the structure of the image, that is, its salient features.

An example of the network’s operation

Main characteristics

SinGAN builds on the standard GAN, but it differs from it in several key ways. Let's analyze them next.

1. Generator and discriminator architectures

SinGAN developers used a multi-scale generation approach with residual learning. At the first stage, the generator behaves like a regular GAN: Gaussian noise is fed into the network, which produces an image. Importantly, generation starts at a small image size; the output is then upsampled and passed to the input of the next generator block. At the second stage, as well as at all subsequent stages, noise is added to the upsampled image from the previous stage and the sum is fed into the network. The network predicts only a residual (the missing details), which is added back to the upsampled image to produce a new, slightly larger image. The process repeats until the desired image size is reached.
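A single coarse-to-fine step with residual learning can be sketched in NumPy as follows. This is a minimal illustration, not SinGAN's implementation: `psi` is a hypothetical stand-in for the trained convolutional body of one generator block, and nearest-neighbour upsampling is a simplifying assumption (a smoother interpolation would normally be used).

```python
import numpy as np

def upsample_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) image to (out_h, out_w, C)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows][:, cols]

def generator_step(prev_img, noise, psi):
    """One stage: upsample the previous image, add noise, add a learned residual.

    `psi` (hypothetical) maps an (H, W, C) array to a residual of the same shape.
    The output size is taken from the noise map.
    """
    out_h, out_w = noise.shape[:2]
    up = upsample_nearest(prev_img, out_h, out_w)
    # Residual learning: the block only predicts the details missing from `up`.
    return up + psi(noise + up)

# Toy usage with an untrained stand-in for psi.
rng = np.random.default_rng(0)
coarse = rng.normal(size=(25, 25, 3))       # output of the previous stage
noise = 0.1 * rng.normal(size=(33, 33, 3))  # noise sized for the next, larger stage
psi = lambda x: 0.01 * np.tanh(x)           # hypothetical residual network
finer = generator_step(coarse, noise, psi)
print(finer.shape)  # (33, 33, 3)
```

Because the block adds its output to the upsampled image, an untrained block that outputs zeros simply passes the coarser image through, which is the intuition behind learning residuals.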

Generator architecture

All generator blocks share a similar architecture: each block receives Gaussian noise (plus, after the first stage, the upsampled image from the previous stage) and passes it through a stack of convolutional layers, each followed by batch normalization and a LeakyReLU activation. The discriminator architecture is similar to the generator's, the only difference being that it does not take noise as input. At each scale, the generated image is passed to the discriminator, which compares it with the original image downsampled to the same size.
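The Conv, BatchNorm, LeakyReLU pattern described above can be written out as a forward pass in plain NumPy. This is a sketch under stated assumptions: the 3×3 kernel, the LeakyReLU slope of 0.2, and the 32 output channels are illustrative choices, and batch statistics are computed over the spatial dimensions of a single image.

```python
import numpy as np

def conv3x3(x, weight, bias):
    """'Same'-padded 3x3 convolution (cross-correlation, as in DL frameworks).

    x: (H, W, C_in), weight: (3, 3, C_in, C_out), bias: (C_out,) or scalar.
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, weight.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            # Accumulate the contribution of kernel tap (dy, dx) at every pixel.
            out += padded[dy:dy + h, dx:dx + w] @ weight[dy, dx]
    return out + bias

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalise each channel over the spatial dimensions, then scale and shift."""
    mean = x.mean(axis=(0, 1))
    var = x.var(axis=(0, 1))
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def leaky_relu(x, slope=0.2):
    """Keep positive values, scale negative values by `slope` (0.2 is an assumption)."""
    return np.where(x > 0, x, slope * x)

def conv_block(x, params):
    """Conv -> BatchNorm -> LeakyReLU."""
    y = conv3x3(x, params["weight"], params["bias"])
    y = batch_norm(y, params["gamma"], params["beta"])
    return leaky_relu(y)

# Toy usage with random, untrained parameters.
rng = np.random.default_rng(0)
c_in, c_out = 3, 32
params = {
    "weight": 0.1 * rng.normal(size=(3, 3, c_in, c_out)),
    "bias": np.zeros(c_out),
    "gamma": np.ones(c_out),
    "beta": np.zeros(c_out),
}
x = rng.normal(size=(25, 25, c_in))
y = conv_block(x, params)
print(y.shape)  # (25, 25, 32)
```

A real generator would stack several such blocks and end with a convolution that maps back to three image channels.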

2. Loss function

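The loss function at each scale n is determined by the formula:

$$\min_{G_n} \max_{D_n} \; \mathcal{L}_{adv}(G_n, D_n) + \alpha \, \mathcal{L}_{rec}(G_n)$$

where α is a weight that balances the two terms, and: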

  • L_adv is the adversarial loss, which penalizes differences between the generated image and the real one, as judged by the discriminator;
  • L_rec is the reconstruction loss, which ensures that all important elements of the original image are preserved.

It is crucial to make sure that the generated image does not drift too far from the original. This is what L_rec is for: the Gaussian noise input of the generator is replaced with zeros (at the coarsest scale, a single fixed noise map is used instead), and the loss is the squared difference between the resulting image and the original image.
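The reconstruction term and the combined generator objective can be sketched as below. Assumptions: `generator(noise, prev_up)` is a hypothetical callable for one trained scale, the loss is the squared L2 norm (some implementations use the mean instead of the sum), the adversarial term is supplied as a precomputed number, and the default `alpha=10.0` is only an illustrative weight.

```python
import numpy as np

def reconstruction_loss(generator, prev_rec_up, real):
    """L_rec at one scale: run the generator with zero noise, then take the squared error.

    `generator` (hypothetical) takes (noise, upsampled previous image) and returns an image.
    `prev_rec_up` is the previous scale's reconstruction, already upsampled to `real`'s size.
    """
    zero_noise = np.zeros_like(real)
    rec = generator(zero_noise, prev_rec_up)
    return float(np.sum((rec - real) ** 2))

def total_generator_loss(adv_loss, rec_loss, alpha=10.0):
    """Generator objective: adversarial term plus alpha-weighted reconstruction term."""
    return adv_loss + alpha * rec_loss

# Toy usage: an untrained generator that just passes its input through.
real = np.ones((4, 4, 3))
prev_up = np.zeros((4, 4, 3))
toy_generator = lambda z, up: up + z  # hypothetical stand-in
rec = reconstruction_loss(toy_generator, prev_up, real)
print(rec)  # 48.0
print(total_generator_loss(adv_loss=1.5, rec_loss=rec))  # 481.5
```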

3. Training the network

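A distinctive feature of SinGAN is that it is trained on just one image. The model has a pyramid structure with one generator and one discriminator per scale, from the coarsest scale N down to the finest scale 0. The model is trained from the bottom up: training starts at the coarsest, lowest-resolution scale and moves up to the finest one, so the learning process is divided into stages, one per scale. The image size gradually increases from scale to scale, and once a scale has been trained, its generator is kept fixed while the next one is trained.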
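The image sizes of such a pyramid can be computed by repeatedly shrinking the training image by a constant factor. A minimal sketch, assuming a scale factor of 4/3 and a coarsest side of at least 25 pixels (both are illustrative parameters, not values stated in this article):

```python
def pyramid_sizes(height, width, min_dim=25, scale_factor=4 / 3):
    """Image sizes for each training scale, ordered coarsest (G_N) to finest (G_0)."""
    if scale_factor <= 1:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    k = 0
    while True:
        # Size of the image after k downscaling steps, rounded to whole pixels.
        h = round(height / scale_factor ** k)
        w = round(width / scale_factor ** k)
        if min(h, w) < min_dim:
            break
        sizes.append((h, w))
        k += 1
    return sizes[::-1]  # coarsest scale first, matching the training order

sizes = pyramid_sizes(250, 250)
print(len(sizes))            # 9
print(sizes[0], sizes[-1])   # (25, 25) (250, 250)
```

Each size is computed from the original dimensions rather than from the previous level, so rounding errors do not accumulate across scales.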

Training architecture

The main steps of the SinGAN algorithm:

1) noise z_N is fed into the generator G_N to create the first generated sample x̃_N;

2) x̃_N is upsampled and passed, together with new noise z_(N-1), to the next generator G_(N-1), which creates a new, larger sample x̃_(N-1);

3) step 2 is repeated at every finer scale until G_0 produces the final full-resolution sample x̃_0;

4) at each scale n, the discriminator D_n compares the generated sample x̃_n with the real image downsampled to the same size.
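Steps 1 to 3 amount to a coarse-to-fine sampling loop. The sketch below assumes each entry of `generators` is a hypothetical trained callable `(noise, prev_up) -> image`; it uses a fixed noise amplitude and nearest-neighbour upsampling as simplifications. Step 4 belongs to training and is not shown.

```python
import numpy as np

def upsample_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) image."""
    h, w = img.shape[:2]
    return img[np.arange(out_h) * h // out_h][:, np.arange(out_w) * w // out_w]

def sample_pyramid(generators, sizes, channels=3, noise_std=0.1, rng=None):
    """Run G_N, ..., G_0 coarse to fine and return every sample x̃_n.

    `generators` and `sizes` are both ordered coarsest first.
    """
    if rng is None:
        rng = np.random.default_rng()
    samples = []
    prev = None
    for gen, (h, w) in zip(generators, sizes):
        z = noise_std * rng.normal(size=(h, w, channels))
        # Step 1: G_N sees only noise. Steps 2-3: later generators also get the upsampled sample.
        prev_up = np.zeros((h, w, channels)) if prev is None else upsample_nearest(prev, h, w)
        prev = gen(z, prev_up)
        samples.append(prev)
    return samples

# Toy usage with untrained stand-ins for the generators.
sizes = [(25, 25), (33, 33), (44, 44)]
toy_gens = [lambda z, up: up + z] * len(sizes)  # hypothetical generators
samples = sample_pyramid(toy_gens, sizes, rng=np.random.default_rng(0))
print([s.shape for s in samples])  # [(25, 25, 3), (33, 33, 3), (44, 44, 3)]
```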

As a result, a new image is created from noise that retains the main features and attributes of the original image.

Results

At the initial stage of generating an image, the network receives Gaussian noise. The size of the generated image is determined by the spatial size of the noise, and because the noise is random, every run produces a different sample. Any number of images can therefore be generated at the desired size, and the best one selected.

Let's resize the original image to 250 by 250 pixels and then generate versions with a different width, for example anywhere from 175 to 325 pixels, or with the height varied over the same range.
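Since the output size follows the noise size, changing the aspect ratio comes down to rescaling the noise map at every pyramid level by the same factors. A minimal sketch, assuming a hypothetical, shortened four-level pyramid for a 250×250 training image; the width step of 25 pixels in the sweep is also an assumption:

```python
def noise_shapes_for_target(target_h, target_w, train_sizes):
    """Scale every pyramid level by the factors that turn the finest size into the target."""
    fin_h, fin_w = train_sizes[-1]
    sy, sx = target_h / fin_h, target_w / fin_w
    return [(round(h * sy), round(w * sx)) for h, w in train_sizes]

# Hypothetical, shortened pyramid (coarsest first) for a 250x250 training image.
train_sizes = [(25, 25), (59, 59), (141, 141), (250, 250)]
shapes = noise_shapes_for_target(250, 325, train_sizes)
print(shapes[-1])  # (250, 325)

# Sweep the output width while keeping the height at 250 pixels.
for width in range(175, 326, 25):
    print(width, noise_shapes_for_target(250, width, train_sizes))
```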

As a result, the network produced 50 images of different widths and heights, from which we can select the best ones and display them at high quality on the screen of our choice.
