CNNs: Padding and Stride


What do you know about padding and stride in CNNs?

I) Padding:

  1. Why do we need padding?

When a convolution is applied to an image, there are two main drawbacks:

— The image shrinks after each layer, meaning it becomes very small if the CNN is deep (for example, 100 layers). For instance, if you pass a 6×6 image through a convolution layer, the output is only 4×4. There is an equation for this size reduction (a verifying sketch follows Figure 1):

                            m = (n - f + 1) × (n - f + 1)
m: size of the image after the layer
n: size of the initial image
f: size of the filter (kernel)
Figure 1: n = 6×6, f = 3×3 → m = (6-3+1)×(6-3+1) = 4×4
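
Here is a minimal NumPy sketch (with a hypothetical random 6×6 image and 3×3 kernel) that performs a plain 'valid' convolution and confirms the 6×6 → 4×4 shrinkage:

import numpy as np

def conv2d_valid(image, kernel):
    n, f = image.shape[0], kernel.shape[0]
    m = n - f + 1  # output size from the formula
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            # each output pixel is the sum of an f x f window times the kernel
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

image = np.random.rand(6, 6)   # n = 6
kernel = np.random.rand(3, 3)  # f = 3
print(conv2d_valid(image, kernel).shape)  # (4, 4): m = 6 - 3 + 1 = 4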

— The second is losing information from the edges of the image. Pixels in the middle of the image are overlapped (that is, used multiple times by different kernel positions), while pixels near the edges are used only once. As a result, the information in the edge pixels fades away after a few layers (the sketch after Figure 3 counts this explicitly).

Figure 2: The pixel in green, which is near the edge of the image, is used once.
Figure 3: The pixel in red is used multiple times by different kernels.
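
To see this numerically, the following sketch (assuming the same hypothetical 6×6 image and 3×3 kernel) counts how many sliding windows touch each pixel:

import numpy as np

n, f = 6, 3
counts = np.zeros((n, n), dtype=int)
for i in range(n - f + 1):          # every vertical kernel position
    for j in range(n - f + 1):      # every horizontal kernel position
        counts[i:i+f, j:j+f] += 1   # each covered pixel is used once more
print(counts)
# Corner pixels are used only once, while central pixels are used 9 times.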

2. Solutions:

Figure 4: Definition in real life
Figure 5: Padding in CNNs

— I think the meaning of 'padding' in real life and in CNNs is almost the same: both refer to something placed around the outside to protect what is inside. In CNNs, we simply surround the image with a border of zeros (see the NumPy sketch after Figure 6).

Figure 6: By convention, the padding is filled with zeros.
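
A quick sketch of this convention with NumPy's pad (using a hypothetical 3×3 image for brevity):

import numpy as np

image = np.arange(1, 10).reshape(3, 3)
padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)
print(padded)
# The 3x3 input becomes 5x5: n + 2p = 3 + 2*1 = 5, with a border of zeros.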

2.1) Solving the first problem: shrinking output

— There is one essential parameter in padding: p (the padding amount); in this case p=1 (the width of the padding border). With the help of padding, the size of the image after convolution is:

                        m' = (n + 2p - f + 1) × (n + 2p - f + 1)
m': size of the image after convolution with padding
p: padding amount (in this case p = 1)
f: size of the kernel
n: size of the initial image

After padding, the size of the initial image is 8×8, and after the convolution layer the size is 6×6 (so the image is not shrunk), as the sketch below verifies:

p=1, f=3×3, n=6×6
m' = (6 + 2×1 - 3 + 1) × (6 + 2×1 - 3 + 1)
m' = 6×6
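
This can be checked directly in TensorFlow (a sketch assuming a single-channel 6×6 input): padding='same' keeps the spatial size at 6×6.

import tensorflow as tf

x = tf.random.normal((1, 6, 6, 1))                 # a batch of one 6x6 image
conv = tf.keras.layers.Conv2D(1, (3, 3), padding='same')
print(conv(x).shape)                               # (1, 6, 6, 1): not shrunk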

So the first problem is solved. Is the second problem solved by this method as well?

2.2) Solving the second problem: losing the information of pixels near the edge of the image

Figure 7: Convolution with padding

The pixel in the top-left corner (which is near the edge of the image) is now retained after the convolution, and the pixels that 'lose information' are now in the padding border (which is meaningless anyway). As a result, the second issue is resolved.

3) Some additional definitions:

— Valid convolutions: convolutions without padding. This is the case where the problems above occur; the size of the output will be smaller than the size of the input.

— Same convolutions: convolutions with padding chosen so that the size of the output remains the same as the size of the input.

— If you want to find p (the padding amount) for a same convolution, you can use this equation (a small helper follows):

                    n+2p-f+1=n =>p=(f-1)/2
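
As a tiny helper (a sketch assuming an odd kernel size f, so that p is an integer):

def same_padding(f):
    # padding amount that keeps the output size equal to the input size
    return (f - 1) // 2

print(same_padding(3))  # 1
print(same_padding(5))  # 2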

— Normally, in computer vision, f (the kernel size) is odd (for example 3×3 or 5×5). When f is even, p = (f-1)/2 is not an integer, so we would have to use asymmetric padding (more on one side than the other). Moreover, an odd f gives the kernel a central point, which is used to identify the position of the filter.

Figure 8: Central point of the filter

4) Implementation with the TensorFlow library:

In TensorFlow, we tune the 'padding' parameter:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Conv2D(filters=32,
                 kernel_size=(3, 3),      # parameter f
                 padding='same',
                 activation='relu',
                 input_shape=(6, 6, 1)))  # example input shape, needed for summary()
# padding='same'  -> same convolutions
# padding='valid' -> valid convolutions
model.summary()
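
To see the two modes side by side, here is a short comparison sketch (assuming a hypothetical 6×6 single-channel input):

import tensorflow as tf

x = tf.random.normal((1, 6, 6, 1))
valid = tf.keras.layers.Conv2D(32, (3, 3), padding='valid')
same = tf.keras.layers.Conv2D(32, (3, 3), padding='same')
print(valid(x).shape)  # (1, 4, 4, 32): shrunk by f - 1
print(same(x).shape)   # (1, 6, 6, 32): size preserved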

II) Strided convolution:

  1. Definition:

— Strided convolution is a technique used in CNNs to reduce the spatial dimensions of the input. It applies the convolution operation with a larger stride value, which means the filter moves across the input by skipping some positions.

— Strided convolution can be used to extract high-level features from the input. By skipping positions, the network focuses on capturing more abstract and global patterns in the data.

— Strided convolution reduces the spatial dimensions of the input, which is useful for reducing the number of parameters in the network and preventing overfitting.

2. How to do it?

Convolution of an n=7×7 image with an f=3×3 kernel

— In essence, the kernel (the blue window) passes over the whole image with a step of 1. With the 'stride' parameter we can modify this step (for example, stride=2). This modification affects both the horizontal and the vertical axis.

Figure: convolution with stride=2
Figure: stride (step) = 2 along the horizontal axis
Figure: stride (step) = 2 along the vertical axis
Figure: the result of the strided convolution

With strided convolution, the equation for the dimensions of the image after convolution gains a parameter s (the stride). The division is floored, since the kernel must stay inside the (padded) image:

                    M = (⌊(n + 2p - f)/s⌋ + 1) × (⌊(n + 2p - f)/s⌋ + 1)
n: size of the initial image
p: padding amount
f: kernel size
s: stride (step)
M: size of the image after convolution

In the following example (implemented as a helper right below):

n=7, p=0, f=3, s=2
====> M = ⌊(7 + 2×0 - 3)/2⌋ + 1 = 3
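
The formula translates into a one-line helper (a sketch; Python's integer division implements the floor):

def conv_output_size(n, f, p=0, s=1):
    # floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

print(conv_output_size(n=7, f=3, p=0, s=2))  # 3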

3) Implementation with TensorFlow:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Conv2D(filters=32,
                 kernel_size=(3, 3),      # parameter f
                 padding='same',          # padding
                 strides=2,               # stride (step); note the argument is 'strides'
                 activation='relu',
                 input_shape=(7, 7, 1)))  # example input shape, needed for summary()
# strides=2 means the kernel moves 2 steps on both the horizontal and vertical axes
model.summary()
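
A quick check of the stride behaviour (a sketch assuming a 7×7 single-channel input and padding='valid' to match the worked example above, where p=0):

import tensorflow as tf

x = tf.random.normal((1, 7, 7, 1))
conv = tf.keras.layers.Conv2D(32, (3, 3), strides=2, padding='valid')
print(conv(x).shape)  # (1, 3, 3, 32): floor((7 - 3)/2) + 1 = 3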

III) Conclusion:

— In practice, tuning the 'padding' parameter is quite simple, but understanding it deeply is still necessary, so I wrote this blog to help everyone understand the 'padding' parameter.

— Strided convolution is an essential technique for reducing the spatial dimensions of the image while extracting its essential features.

Reference:

This blog is highly inspired by the Deep Learning Specialization course by Andrew Ng.
