How to deal with image resizing in Deep Learning

Published in Neuronio, Nov 28, 2018 (4 min read)

This post is a follow-up to the one published by Infosimples on Oct 19, 2018: https://medium.com/infosimples/does-cnn-learns-modified-inputs-bc16ae1be498

TL;DR: The best way to deal with images of different sizes is to downscale them to match the dimensions of the smallest image available.

If you read our last post, you know that CNNs are able to learn from images even if their channels are flipped, at a cost in model accuracy.

This post studies a similar problem: suppose each color channel has a different size. What is the best way to train an image classifier under those circumstances?

First, let's create a simple model to serve as a baseline for the comparisons made in this article:

Layer                        Output Shape              Param #   
=================================================================
InputLayer                   (None, 100, 100, 3)       0         
_________________________________________________________________
Conv2D                       (None, 100, 100, 32)      896       
_________________________________________________________________
MaxPooling2D                 (None, 50, 50, 32)        0         
_________________________________________________________________
Dropout                      (None, 50, 50, 32)        0         
_________________________________________________________________
Conv2D                       (None, 50, 50, 64)        18496     
_________________________________________________________________
MaxPooling2D                 (None, 25, 25, 64)        0         
_________________________________________________________________
Dropout                      (None, 25, 25, 64)        0         
_________________________________________________________________
Flatten                      (None, 40000)             0         
_________________________________________________________________
Dense                        (None, 128)               5120128   
_________________________________________________________________
Dropout                      (None, 128)               0         
_________________________________________________________________
Dense                        (None, 2)                 258       
=================================================================
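
The "Param #" column above can be checked by hand. The sketch below assumes 3x3 kernels, "same" padding and one bias per filter or unit; none of this is stated in the post, it is only inferred from the shapes and counts in the summary.

```python
# Sanity check of the "Param #" column in the summary above.
# Assumptions (inferred, not stated in the post): 3x3 kernels, "same" padding,
# and one bias term per filter / unit.

def conv2d_params(kernel_size, in_channels, filters):
    """Weights of a square 2D convolution plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights of a fully connected layer plus one bias per unit."""
    return inputs * units + units


flatten_size = 25 * 25 * 64  # output of the second pooling stage, flattened

layer_params = {
    "conv1": conv2d_params(3, 3, 32),
    "conv2": conv2d_params(3, 32, 64),
    "dense1": dense_params(flatten_size, 128),
    "dense2": dense_params(128, 2),
}

for name, count in layer_params.items():
    print(name, count)
print("total trainable parameters:", sum(layer_params.values()))
```

Under those assumptions the four counts match the table (896, 18496, 5120128 and 258), and almost all of the parameters sit in the first dense layer.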

It's a simple model, able to tell dog pictures apart from non-dog pictures, with only two convolutions. After training it for 10 epochs (using complete 3-channel images, 100x100 pixels), the results are:

The maximum validation accuracy, 77.58%, will be used as the reference for the next experiments in this post.

Scaling techniques

We all know that an image loses quality when you zoom in on it. When a small number of pixels is displayed on a higher-resolution screen, new pixels have to be "created" to fill the gaps that would otherwise appear. There are many techniques that can do this:

Figure panels: original picture (160x160), nearest-neighbor interpolation, bilinear interpolation, bicubic interpolation, Fourier-based interpolation, edge-directed interpolation algorithms.

Apart from the original, each of those images was downscaled to 40x40 and then upscaled back to 160x160 using one of the scaling algorithms above. Although a lot of the visual quality is lost, we can still tell that this is a picture of a shell, even though only 1/16 of the original pixels were kept.
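
To make the round trip concrete, here is a minimal pure-Python sketch of two of the algorithms, nearest-neighbor and bilinear interpolation. It is an illustration only: the synthetic gradient stands in for the shell picture, the downscaling step uses nearest-neighbor as an assumption (the post does not say which method was used), and a real pipeline would normally rely on an image library instead.

```python
def resize_nearest(img, new_h, new_w):
    """Nearest-neighbor: each output pixel copies one source pixel."""
    old_h, old_w = len(img), len(img[0])
    return [
        [img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def _source_coord(i, old_size, new_size):
    """Map an output index to a clamped source coordinate (pixel centres aligned)."""
    pos = (i + 0.5) * old_size / new_size - 0.5
    return min(max(pos, 0.0), old_size - 1.0)


def resize_bilinear(img, new_h, new_w):
    """Bilinear: each output pixel blends the four surrounding source pixels."""
    old_h, old_w = len(img), len(img[0])
    out = []
    for y in range(new_h):
        sy = _source_coord(y, old_h, new_h)
        y0 = int(sy)
        y1 = min(y0 + 1, old_h - 1)
        fy = sy - y0
        row = []
        for x in range(new_w):
            sx = _source_coord(x, old_w, new_w)
            x0 = int(sx)
            x1 = min(x0 + 1, old_w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# Synthetic 160x160 grayscale gradient standing in for the shell picture.
original = [[(x + y) % 256 for x in range(160)] for y in range(160)]
small = resize_nearest(original, 40, 40)        # downscale (assumed method)
restored_nn = resize_nearest(small, 160, 160)   # blocky upscale
restored_bl = resize_bilinear(small, 160, 160)  # smoother upscale

kept = (40 * 40) / (160 * 160)
print("fraction of pixels kept:", kept)  # 0.0625, i.e. 1/16
```

Both upscaled results have 160x160 pixels again, but they are rebuilt from only 1,600 of the original 25,600 values.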

And what about neural networks? Which upscaling algorithm works best for them? Or is it better to downscale the pictures instead? Let's settle the question.

Below are channel slices and their combinations, produced with different upscaling algorithms:

We can also test the following architecture, which uses convolutions to reduce the larger channels during training:

Let's call this architecture “Multiresolution CNN”.

The above architecture was developed with the idea that convolutions are able to reduce the channels' dimensions while extracting only the most important features. You can check it out here:

adrianodennanni/multiresolution-cnn (github.com)
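
As a rough illustration of how convolutions can shrink a larger channel to the size of a smaller one, the sketch below applies the standard convolution output-size formula. The channel sizes (100, 50, 25) and the kernel, stride and padding values are hypothetical and chosen only for this example; the actual layers are the ones in the repository.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Hypothetical channel sizes (not taken from the post or the repository).
channel_sizes = {"large": 100, "medium": 50, "small": 25}
target = min(channel_sizes.values())

reduction_steps = {}
for name, size in channel_sizes.items():
    steps = 0
    # Apply stride-2 convolutions until the channel is no larger than the target.
    while size > target:
        size = conv_output_size(size, kernel_size=3, stride=2, padding=1)
        steps += 1
    reduction_steps[name] = (size, steps)
    print(name, "reaches", size, "after", steps, "strided convolution(s)")
```

With these illustrative values, every channel ends up at 25x25, so the resulting feature maps can be combined in the rest of the network.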

After training the simple neural network presented at the beginning of this post with several upscaling techniques, we got the following accuracy rates:

If we take into consideration only the accuracy on the validation dataset, we can conclude that every upscaling technique is inferior to downscaling the images to the size of the smallest one. The best thing to do in this case is simply to downscale the pictures to match the dimensions of the smallest channel.
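
A minimal sketch of that recommendation follows. It assumes each channel is a list of rows and that the larger channels are integer multiples of the smallest one, and it averages pixel blocks to downscale. The averaging method and the function names are choices made for this example, not necessarily what the experiment used.

```python
def downscale_block_mean(channel, new_h, new_w):
    """Downscale by averaging non-overlapping blocks (integer factors only)."""
    old_h, old_w = len(channel), len(channel[0])
    if old_h % new_h or old_w % new_w:
        raise ValueError("this sketch only handles integer scale factors")
    fh, fw = old_h // new_h, old_w // new_w
    out = []
    for y in range(new_h):
        row = []
        for x in range(new_w):
            block = [
                channel[y * fh + dy][x * fw + dx]
                for dy in range(fh)
                for dx in range(fw)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def match_smallest(channels):
    """Downscale every channel to the dimensions of the smallest one."""
    target_h = min(len(c) for c in channels)
    target_w = min(len(c[0]) for c in channels)
    return [downscale_block_mean(c, target_h, target_w) for c in channels]


# Hypothetical channels of different sizes (constant values keep the example short).
red = [[200] * 100 for _ in range(100)]
green = [[120] * 50 for _ in range(50)]
blue = [[30] * 25 for _ in range(25)]

aligned = match_smallest([red, green, blue])
print([(len(c), len(c[0])) for c in aligned])
```

After this step, all three channels share the same height and width and can be stacked into a regular 3-channel input for the simple model.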

The full source code for this experiment can be found here:

adrianodennanni/multiresolution-cnn (github.com)

The Portuguese version of this post can be accessed here: https://medium.com/infosimples-br/como-lidar-com-redimensionamento-de-imagens-em-deep-learning-f4215b30f57a


Written by Adriano Dennanni