Part 1: Backpropagation for Convolution with Strides

Loss gradient with respect to the input tensor

Mayank Kaushik
4 min read · May 2, 2019

There are several examples of Backpropagation with Convolution, but almost all of them assume a stride of 1. This article provides a visual example of Backpropagation with a stride > 1. Along the way, this exercise hopefully also provides some intuition for why the filter needs to be rotated for Backpropagation.

(Part 2 can be found here: Loss gradient with respect to the filter)

We’ll use the following example dimensions for our input, filter, and output tensors. Note that we’re using horizontal and vertical strides of 2 here.

Dimensions of the tensors we’re working with

To set things up, let us first look at the forward propagation step and express the output pixels as functions of the input activations and the filter contents (the weights). This, of course, is quite straightforward.

Forward propagation
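The figures showing the tensor dimensions and the forward pass aren't reproduced here, so as a stand-in, here is a minimal NumPy sketch of the strided forward pass. The sizes are assumptions for illustration (a 5x5 input, a 3x3 filter, and strides of 2, which gives a 2x2 output); the exact sizes in the figures may differ, but the indexing pattern is the same.

import numpy as np

# Assumed example sizes (for illustration only):
# 5x5 input, 3x3 filter, strides of 2 -> 2x2 output.
H, W = 5, 5
R, S = 3, 3
stride_R = stride_S = 2

x = np.arange(H * W, dtype=float).reshape(H, W)   # input activations
w = np.arange(R * S, dtype=float).reshape(R, S)   # filter weights

P = (H - R) // stride_R + 1   # output height
Q = (W - S) // stride_S + 1   # output width

y = np.zeros((P, Q))
for i in range(P):
    for j in range(Q):
        # Each output pixel is a weighted sum of the input window it covers.
        y[i, j] = np.sum(w * x[i * stride_R : i * stride_R + R,
                               j * stride_S : j * stride_S + S])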

Here’s what we need to do in the Backpropagation step: given the gradient of the loss with respect to the output pixels, we need to calculate the gradient of the loss with respect to the input activations.

Backpropagation of gradients

Each input x contributes to one or more output pixels (this is visualized in more detail below). Using the chain rule, the total gradient of the loss with respect to each input pixel can be expressed as a sum over the output pixels it contributes to: the gradient of each such output pixel with respect to the input pixel, multiplied by the gradient of the loss with respect to that output pixel:
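The equation itself appears in a figure that isn't reproduced here; written out (with indices m, n for the input pixel and i, j for the output pixels, which is my notation rather than the figure's), it has the form

\frac{\partial L}{\partial x_{m,n}} = \sum_{i,j} \frac{\partial L}{\partial y_{i,j}} \cdot \frac{\partial y_{i,j}}{\partial x_{m,n}}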

Remember that the only output pixels y that appear in the above equation for a given x are the ones that x contributes to during forward propagation. This can be inferred from the following equations we calculated during forward propagation:
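As a stand-in for that figure, with strides of 2 in both directions the forward-propagation equations all have the general form

y_{i,j} = \sum_{r=0}^{R-1} \sum_{s=0}^{S-1} w_{r,s} \, x_{2i+r,\, 2j+s}

so \partial y_{i,j} / \partial x_{m,n} is simply w_{m-2i,\, n-2j} when both filter indices fall in range, and zero otherwise. In other words, only the output pixels whose windows cover x_{m,n} survive in the sum above.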

With this knowledge, let us calculate the gradients for a few example inputs step by step, before we see the full Backpropagation in action.

Example 1: the input pixel contributes to one output pixel
Example 2: different input pixel contributes to the same output pixel
Example 3: this input pixel contributes to two output pixels
Example 4: this input pixel contributes to all four output pixels
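The example figures aren't reproduced here, but the bookkeeping they walk through can be sketched in a few lines, continuing the assumed sizes and variables from the forward-pass sketch above. Each output pixel scatters its upstream gradient, weighted by the filter, back onto the input window it was computed from; with a 3x3 filter and strides of 2, a corner input ends up receiving a contribution from one output pixel, an edge input from two, and the center input from all four.

# Upstream gradient of the loss with respect to the output (arbitrary values)
dL_dy = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

# Direct chain-rule accumulation: every output pixel scatters its gradient,
# weighted by the filter, back onto the input window it came from.
dL_dx_direct = np.zeros_like(x)
for i in range(P):
    for j in range(Q):
        dL_dx_direct[i * stride_R : i * stride_R + R,
                     j * stride_S : j * stride_S + S] += dL_dy[i, j] * w

# e.g. dL_dx_direct[0, 0] only receives a contribution from y[0, 0],
# while dL_dx_direct[2, 2] accumulates contributions from all four outputs.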

At first glance, it might be hard to see a pattern in the values of these input gradients. But what if we modified the output gradient and filter tensors?

We’re going to make one modification to the output gradient tensor:

Pad the output gradient tensor with R-1 rows of zeros at the top and bottom, and S-1 columns of zeros to the left and right. Also dilate the output gradient by inserting stride_R-1 zeros between vertically adjacent pixels and stride_S-1 zeros between horizontally adjacent pixels. Remember that for our convolution, stride_R = stride_S = 2.
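In NumPy, assuming the same 2x2 output gradient and 3x3 filter as in the sketches above, the dilation and padding look like this:

# Dilate: insert stride_R-1 zero rows and stride_S-1 zero columns between
# neighbouring output-gradient pixels.
dilated = np.zeros(((P - 1) * stride_R + 1, (Q - 1) * stride_S + 1))
dilated[::stride_R, ::stride_S] = dL_dy

# Pad: R-1 rows of zeros at the top and bottom, S-1 columns of zeros to the
# left and right.
padded = np.pad(dilated, ((R - 1, R - 1), (S - 1, S - 1)))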

We’re also going to make one modification to the filter:

Flip the filter horizontally and vertically (equivalent to rotating it by 180°). This is the “flipping” of the filter involved in Backpropagation for Convolutions.
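In NumPy terms this is just a reversal of both spatial axes of the filter from the sketch above:

# Flip the filter horizontally and vertically (a 180-degree rotation).
w_flipped = w[::-1, ::-1]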

With these modifications to the output gradient tensor and the filter, we’ll see that the values for the input gradient we calculated in Examples 1 to 4 above fit into a nice pattern:

It turns out that the Backpropagation operation is identical to a stride = 1 Convolution of a padded, dilated version of the output gradient tensor with a flipped version of the filter!
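Continuing the sketch, we can check this numerically: a plain stride = 1 convolution of the padded, dilated output gradient with the flipped filter reproduces the input gradient we accumulated directly with the chain rule. (The sizes are still the assumed 5x5 / 3x3 / stride-2 example, but the equivalence holds in general.)

# Stride-1 convolution of the padded, dilated output gradient with the
# flipped filter.
dL_dx_conv = np.zeros_like(x)
for m in range(H):
    for n in range(W):
        dL_dx_conv[m, n] = np.sum(w_flipped * padded[m : m + R, n : n + S])

# This matches the gradient obtained by direct chain-rule accumulation.
assert np.allclose(dL_dx_conv, dL_dx_direct)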

I hope this example helped you understand Backpropagation for Convolution with Strides.

In Part 2 we will go over calculating the gradient of the loss with respect to the filter (weights) tensor.
