About Convolutional Layer and Convolution Kernel

A story of Convnet in machine learning from the perspective of kernel sizes

Arnault Chazareix
Sicara's blog
3 min read · Oct 31, 2018


Read the full article on Sicara’s blog here.

What kernel size should I use to optimize my Convolutional layers? Let’s have a look at some convolution kernels used to improve Convnets.

Warning: This post assumes you know some basics of Machine Learning mainly with Convnets and Convolutional layers. If you don’t, check a tutorial like this one from Irhum Shafkat.

Convolutional layers and Convnets have been around since the 1990s. They were popularized by the ILSVRC challenge (ImageNet), a huge image recognition challenge. To win this challenge, data scientists created many different types of convolutions.

Today, I would like to tackle convolutional layers from a different perspective, one I noticed while following the ImageNet challenge. I want to focus on kernel size and how data scientists managed to reduce the number of weights in their Convnets while making them deeper.

Why do weights matter? This is what we will try to answer first, by comparing convolutional layers with fully connected ones. The next goal is to tackle the question: what should my kernel size be? Then we will look at other convolution kernel tricks that build on these ideas, and see how they improved Convnets and Machine Learning.

Fully Connected vs Convolutional Layers

Some properties of local features

Convolutional layers are not better at detecting spatial features than fully connected layers.
This means that whatever feature a convolutional layer can learn, a fully connected layer could learn it too.
In his article, Irhum Shafkat takes the example of a fully connected layer mapping a flattened 4x4 input image to a 2x2 output, with 1 channel:

Fully connected kernel for a flattened 4x4 input and 2x2 output

We can mimic a 3x3 convolution kernel with the corresponding fully connected kernel: we add equality and nullity constraints to the parameters. Weights that would fall outside the 3x3 window are forced to zero, and weights that read the same relative input position share the same value.

The fully connected equivalent of a 3x3 convolution kernel for a flattened 4x4 input and 2x2 output

The dense kernel can take the values of the 3x3 convolutional kernel.
This is still the case with larger input and output vectors, and with more than one input and output channel.
Even more interestingly, this holds for a 3x3, a 5x5, or any relevant kernel size.
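
To make this concrete, here is a minimal NumPy sketch (mine, not from the article) that builds the 16x4 dense kernel from an arbitrary 3x3 kernel, assuming row-major flattening of the 4x4 input, and checks that both compute the same 2x2 output:

    import numpy as np

    # A 4x4 single-channel input and an arbitrary 3x3 convolution kernel.
    x = np.arange(16, dtype=np.float32).reshape(4, 4)
    k = np.array([[1., 0., -1.],
                  [2., 0., -2.],
                  [1., 0., -1.]], dtype=np.float32)

    # Valid 3x3 convolution (cross-correlation, as in most deep learning
    # frameworks): a 4x4 input yields a 2x2 output.
    conv_out = np.zeros((2, 2), dtype=np.float32)
    for i in range(2):
        for j in range(2):
            conv_out[i, j] = np.sum(x[i:i+3, j:j+3] * k)

    # The equivalent fully connected kernel: a 16x4 matrix whose columns
    # repeat the 9 kernel values at the right positions (equality
    # constraints) and are zero everywhere else (nullity constraints).
    W = np.zeros((16, 4), dtype=np.float32)
    for i in range(2):                  # output row
        for j in range(2):              # output column
            for di in range(3):
                for dj in range(3):
                    W[(i + di) * 4 + (j + dj), i * 2 + j] = k[di, dj]

    fc_out = x.reshape(-1) @ W          # flattened input times dense kernel
    assert np.allclose(conv_out.reshape(-1), fc_out)  # identical outputs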

Note: This requires all network inputs to have the same size. Most Convnets use fully connected layers at the end anyway, or have a fixed number of outputs.

So basically, a fully connected layer can always do at least as well as a convolutional layer. Well… does that mean a fully connected layer is better than a convolutional layer at feature detection?
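
The catch is the number of weights. A back-of-the-envelope count for the toy example above (my own, not from the article; biases ignored) hints at why weights matter:

    fc_weights = 16 * 4    # dense kernel: one weight per input-output pair -> 64
    conv_weights = 3 * 3   # one shared 3x3 kernel, reused everywhere -> 9

The constrained dense kernel has 64 slots but only 9 free values; the convolution stores just those 9, and the gap grows quickly with larger inputs.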

… Read the full article on Sicara’s blog here.
