Zero-Padding in Convolutional Neural Networks

3 min read · Sep 21, 2021

Introduction

Hello everyone! In this post, we are going to learn what zero-padding is and why we use it in Convolutional Neural Networks (CNNs). Before we dive into padding, let’s discuss the kernel. The kernel is the neural network’s filter: it moves across the image, scanning each patch of pixels and converting the data into a smaller, or sometimes larger, format. To help the kernel process the image, padding is added around the frame of the image, which gives the kernel more room to cover it, including the pixels at the edges. Adding padding to an image processed by a CNN preserves information at the borders, which allows for a more accurate analysis of images.

Convolutions Reduce Channel Dimensions

A convolutional layer has several filters, and each one is applied to the input to create a feature map that summarizes the presence of detected features in the input. When a filter convolves a given input channel, it gives us an output channel. This output channel is a matrix of values computed as the filter slides over the input channel. Without padding, the dimensions of our image are reduced: an n×n input convolved with an f×f filter at stride 1 gives an (n−f+1)×(n−f+1) output. Zero-padding counters this shrinkage. If zero-padding = 1, a one-pixel-thick border of pixels with value 0 is added around the original image.
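To make the shrinkage concrete, here is a minimal sketch in plain Python. The helper names `zero_pad` and `convolve2d`, the 4×4 image, and the 3×3 filter are all made up for illustration; in practice a library routine such as NumPy's `np.pad` would do the padding.

```python
def zero_pad(image, p):
    """Return a copy of the 2-D list `image` with a p-pixel border of zeros."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # top border rows
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left and right borders
    padded.extend([0] * width for _ in range(p))      # bottom border rows
    return padded


def convolve2d(image, kernel):
    """Slide a square `kernel` over `image` with stride 1 and no padding."""
    f = len(kernel)
    out_h = len(image) - f + 1
    out_w = len(image[0]) - f + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(f) for b in range(f))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 image and 3x3 filter, chosen only for illustration.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

unpadded = convolve2d(image, kernel)
padded = convolve2d(zero_pad(image, 1), kernel)

print(len(unpadded), len(unpadded[0]))  # 2 2: the output shrank
print(len(padded), len(padded[0]))      # 4 4: same size as the input
```

With a padding of 1 and a 3×3 filter, the output keeps the 4×4 size of the input, while the unpadded version shrinks to 2×2.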

Zero-padding vs. scaling up using interpolation

There are two approaches to bringing smaller images up to a fixed size: zero-padding them, or scaling them up (zooming in) using interpolation. Zero-padding has two advantages over scaling. The first is that scaling carries the risk of deforming the patterns in the image, while padding does not. The second is that zero-padding can speed up the calculations, resulting in better computational efficiency. The reason is that zero-valued input units (pixels) contribute nothing to the weighted sum of the convolutional units they feed in the next layer. The gradient for a weight is proportional to the input it multiplies, so the synaptic weights on outgoing links from input units that contain a zero value do not need to be updated. This is like a dropout that only concerns border pixels in the input layer. This advantage is lost if smaller images are enlarged by increasing their resolution (scaling) rather than by zero-padding.
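As a rough sketch of the padding approach, the hypothetical helper `pad_to_size` below centers a small image on a fixed-size canvas of zeros. The name, the centering choice, and the sample sizes are assumptions made for illustration; the original pixel values are copied unchanged, so the patterns in the image are not deformed.

```python
def pad_to_size(image, target_h, target_w):
    """Center the 2-D list `image` on a target_h x target_w canvas of zeros."""
    h, w = len(image), len(image[0])
    if h > target_h or w > target_w:
        raise ValueError("image is larger than the target size")
    top = (target_h - h) // 2    # rows of zeros above the image
    left = (target_w - w) // 2   # columns of zeros to the left of the image
    canvas = [[0] * target_w for _ in range(target_h)]
    for i, row in enumerate(image):
        canvas[top + i][left:left + w] = row  # copy pixels unchanged
    return canvas


small = [[1, 2],
         [3, 4]]
fixed = pad_to_size(small, 4, 5)
for row in fixed:
    print(row)
```

When the leftover space is odd, this sketch puts the extra row or column of zeros at the bottom or right; other conventions are equally possible.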

How to calculate optimal zero-padding for convolutional neural networks?

The possible values for the padding size, P, depend on the input size W, the filter size F, and the stride S. We assume width and height are the same. What you need to ensure is that the output size, (W−F+2P)/S+1, is an integer. When S=1, the output size is always an integer, and choosing P=(F−1)/2 (for odd F) keeps the output the same size as the input. But, in general, you need to consider all three parameters, namely W, F, and S, to determine valid values of P.
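The check can be sketched in a few lines of Python. The function names `output_size` and `valid_paddings`, and the default search range of 0 to F−1, are assumptions made for this example rather than part of any library.

```python
def output_size(w, f, p, s):
    """Return (W - F + 2P) / S + 1 if it is an integer, else None."""
    numerator = w - f + 2 * p
    if numerator < 0 or numerator % s != 0:
        return None
    return numerator // s + 1


def valid_paddings(w, f, s, max_p=None):
    """List the paddings P in [0, max_p] that give an integer output size."""
    if max_p is None:
        max_p = f - 1  # assumed search range, chosen for illustration
    return [p for p in range(max_p + 1) if output_size(w, f, p, s) is not None]


# Stride 1 with P = (F - 1) / 2 keeps the output the same size as the input.
print(output_size(32, 3, 1, 1))   # 32

# With a larger stride, only some paddings work.
print(valid_paddings(10, 4, 3))   # [0, 3]
print(output_size(10, 4, 3, 3))   # 5
```

For W=10, F=4, and S=3, only P=0 and P=3 fit, giving output sizes of 3 and 5 respectively.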

Conclusion

Convolutional neural networks do not learn a single filter; in fact, they learn multiple filters in parallel for a given input. It is common for a convolutional layer to learn from 32 to 512 filters in parallel. We have now seen what zero-padding is and what it achieves when we add it to our CNN.

Written by Dharmaraj