Significance of Kernel size

Anuja Ihare · Published in Analytics Vidhya · 4 min read · Nov 5, 2020

Why should the kernel size be odd? What happens if we use an even kernel size?


Before we get into the significance of kernel size, let's first revisit what a kernel really is.

In a fully-connected network, every element of the input vector is connected to every hidden unit in the first layer; every input unit connects to every hidden unit.

So if you have an 11x11 image, there would be 121 connections to a single unit; in practice there would be 363 connections, since each image has 3 channels.

In a locally-connected network we instead use a kernel (or filter). The kernel defines the patch of the image that a unit can see, so you can think of it as a window. But here the windows may differ from location to location, i.e. the kernel weights need not be the same everywhere.

A convolutional network is similar to a locally-connected network, except that the kernel weights are shared across all locations.

The kernel keeps sliding across the image to generate the output. The generated output is called a feature map.

How does convolution work?

The intent of convolution is to encode the source data matrix, i.e. the entire image, in terms of a filter or kernel. More specifically, we are trying to encode the pixels in the neighborhood of an anchor (source) pixel.

Therefore, if n is the number of neighboring pixels on each side of the anchor, the length of each side of our symmetrically shaped kernel is 2n + 1 (n pixels on each side, plus the anchor pixel itself), and so filters/kernels are always odd-sized.
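The 2n + 1 formula is easy to check directly; a tiny sketch:

```python
def kernel_side(n_neighbors):
    """Side length of a kernel covering `n_neighbors` pixels on
    each side of the anchor pixel, plus the anchor itself."""
    return 2 * n_neighbors + 1

for n in (1, 2, 3):
    print(n, "->", kernel_side(n))   # 1 -> 3, 2 -> 5, 3 -> 7
```

Whatever n is, 2n + 1 is odd, which is why symmetric kernels come out odd-sized.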

What about the boundary/edges ?

There are different ways of dealing with this:

  1. Ignore it
  2. Add zero padding
  3. Mirror reflect the image
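Options 2 and 3 above map directly onto NumPy's padding modes; a minimal sketch on a 1-D row of pixels:

```python
import numpy as np

row = np.array([1, 2, 3])

zero_pad   = np.pad(row, 1, mode="constant")  # add zeros at the border
mirror_pad = np.pad(row, 1, mode="reflect")   # mirror-reflect across the edge

print(zero_pad)    # [0 1 2 3 0]
print(mirror_pad)  # [2 1 2 3 2]
```

Zero padding injects artificial values at the border, while reflection reuses real pixel values, which is why the choice affects boundary artifacts.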

Typically, the pooling operation (average pooling or max pooling) will remove your boundary artifacts anyway.

If you are not going to compute an inverse operation (i.e. deconvolution) and are not interested in a perfect reconstruction of the original image, then you don't care about either the loss of information or the injection of noise caused by the boundary problem.

Why is the kernel size odd?

For an odd-sized filter, the previous-layer pixels are symmetrical around the output pixel. Without this symmetry, we would have to account for distortions across the layers.

Why not an even-sized kernel?

It is possible to use an even-sized kernel, but you may suffer from aliasing errors.

Aliasing is distortion introduced into the original input. In signal processing, aliasing caused by downsampling is usually prevented by low-pass filtering, but such filtering cannot simply be inserted into deep networks, as it degrades performance.
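Aliasing from naive downsampling can be seen with a toy 1-D signal (a minimal sketch, not a network layer):

```python
import numpy as np

# The fastest-alternating signal a discrete grid can hold: 1, -1, 1, -1, ...
signal = np.array([1, -1, 1, -1, 1, -1, 1, -1])

# Naive downsampling by 2 keeps every other sample.
downsampled = signal[::2]
print(downsampled)          # [1 1 1 1] -- the oscillation is misread as constant

# Low-pass filtering first (here a simple 2-tap average) removes the
# high frequency before it can alias.
smoothed = (signal[:-1] + signal[1:]) / 2
print(smoothed[::2])        # [0. 0. 0. 0.] -- attenuated rather than misread
```

The high-frequency oscillation "masquerades" as a constant after naive subsampling, which is exactly the distortion the article refers to.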

Why 3x3 ?

The number of parameters grows quadratically with kernel size, which makes large convolution kernels not cost-efficient.
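The quadratic growth is easy to quantify with the standard parameter count for a 2-D convolution layer (k·k·in_channels·out_channels weights, plus one bias per output channel); a quick sketch:

```python
def conv_params(k, in_ch, out_ch):
    """Parameters of a single 2-D conv layer with a k x k kernel
    (weights plus one bias per output channel)."""
    return k * k * in_ch * out_ch + out_ch

for k in (3, 5, 7):
    print(f"{k}x{k}: {conv_params(k, 64, 64):,} parameters")
# 3x3: 36,928   5x5: 102,464   7x7: 200,768
```

Going from 3x3 to 7x7 multiplies the cost by more than five, while two stacked 3x3 layers already cover the same 5x5 receptive field more cheaply.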

By limiting the number of parameters, we limit the number of unrelated features the network can learn.

This forces the learning algorithm to pick up features common to different situations, and so to generalize better.

Hence the common choice is to keep the kernel size at 3x3 or 5x5.

We would like to use the smallest odd-sized kernel filters. But 1x1 is eliminated from the list of candidate optimal filter sizes: the features extracted would be fine-grained and purely local, with no information from neighboring pixels, so it does not really do any useful spatial feature extraction.

For more details, refer to this link.

However, a 1x1 filter is used in bottleneck layers to reduce the number of channels while maintaining the spatial size of the feature map.
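A 1x1 convolution is just a per-pixel linear map over the channel dimension; a minimal NumPy sketch of such a channel reduction (the shapes here are illustrative, chosen to resemble a typical bottleneck):

```python
import numpy as np

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((256, 14, 14))   # (channels, H, W)
weights = rng.standard_normal((64, 256))           # 1x1 conv: 256 -> 64 channels

# Each output pixel is a linear combination of the 256 input channels
# at the same spatial location -- the spatial size is untouched.
reduced = np.einsum("oc,chw->ohw", weights, feature_map)
print(reduced.shape)   # (64, 14, 14)
```

The channel count drops from 256 to 64 while H and W stay 14x14, which is exactly what a bottleneck layer needs before a more expensive 3x3 convolution.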

I’m an Electronics and Telecommunications Engineer. I find Data Science fascinating, which is why I decided to study Machine Learning and Big Data Analytics, and I am currently working as an AI Engineer. I hope to contribute to this growing Data Science community. You can connect with me on LinkedIn.
