Part 2: Backpropagation for Convolution with Strides

Loss gradient with respect to the filter (weight) tensor

Mayank Kaushik
May 3, 2019

There are several examples of Backpropagation with Convolution, but almost all of them assume a stride of 1. This article provides a visual example of Backpropagation with a stride greater than 1. Along the way, this exercise hopefully also provides some intuition for why the filter needs to be rotated for Backpropagation.

(Part 1 can be found here: Loss gradient with respect to the Input)

We’ll use the following example dimensions for our input, filter, and output tensors. Note that we’re using horizontal and vertical strides of 2 here.

Dimensions of the tensors we’re working with

To set things up, let us first look at the forward propagation step and express the output pixels as functions of the input activations and the filter contents (the weights). This, of course, is quite straightforward.

Forward propagation
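
The original figure with the exact tensor dimensions isn't reproduced here, so as an assumed concrete setup, take a 5×5 input, a 3×3 filter, and strides of 2 in both directions, which yields a 2×2 output. A minimal NumPy sketch of this forward pass might look like the following (using cross-correlation, as deep learning frameworks conventionally do for "convolution"):

```python
import numpy as np

def conv2d_forward(x, f, stride=2):
    """Valid 2-D 'convolution' (cross-correlation, as in DL frameworks)."""
    H, W = x.shape
    R, S = f.shape
    P = (H - R) // stride + 1  # output height
    Q = (W - S) // stride + 1  # output width
    y = np.zeros((P, Q))
    for p in range(P):
        for q in range(Q):
            # each output pixel is the dot product of the filter with an
            # input patch whose top-left corner moves in steps of `stride`
            patch = x[p * stride : p * stride + R, q * stride : q * stride + S]
            y[p, q] = np.sum(patch * f)
    return y

x = np.arange(25, dtype=float).reshape(5, 5)  # assumed 5x5 input
f = np.arange(9, dtype=float).reshape(3, 3)   # assumed 3x3 filter
y = conv2d_forward(x, f, stride=2)            # 2x2 output
```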

Here’s what we need to do in the Backpropagation step: given the gradient of the loss with respect to the output pixels, we need to calculate the gradient of the loss with respect to the filter pixels (aka the weight tensor).

Backpropagation of gradients

Unlike the input pixels x, which may each contribute to only a few output pixels, every filter pixel f in our example contributes to all output pixels (this is visualized in more detail below). Using the chain rule, the total gradient of the loss with respect to each filter pixel can be expressed as a sum over all output pixels: the gradient of each output pixel with respect to that filter pixel, multiplied by the gradient of the loss with respect to that output pixel:
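
Writing this out explicitly (with 0-based indices, y the output tensor, and the strided patch layout from the forward pass), the chain-rule sum is:

$$\frac{\partial L}{\partial f_{m,n}} \;=\; \sum_{p}\sum_{q}\frac{\partial L}{\partial y_{p,q}}\cdot\frac{\partial y_{p,q}}{\partial f_{m,n}} \;=\; \sum_{p}\sum_{q}\frac{\partial L}{\partial y_{p,q}}\cdot x_{\,p\cdot stride_R+m,\;q\cdot stride_S+n}$$

The second equality follows from the forward-propagation equations: filter pixel f(m, n) only ever multiplies the input pixel at offset (m, n) inside each strided patch.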

We’ll examine the equations generated during forward propagation to determine the filter gradients.

With this knowledge, let us calculate the gradients for a few example filter pixels step by step, before we see the full Backpropagation in action.

Example 1
Example 2
Example 3
Example 4
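
The four example figures aren't reproduced here; under the assumed 5×5 input / 3×3 filter / stride-2 dimensions, the gradient for the top-left filter pixel works out to the following, and the other filter pixels follow the same pattern:

$$\frac{\partial L}{\partial f_{0,0}} = \frac{\partial L}{\partial y_{0,0}}x_{0,0} + \frac{\partial L}{\partial y_{0,1}}x_{0,2} + \frac{\partial L}{\partial y_{1,0}}x_{2,0} + \frac{\partial L}{\partial y_{1,1}}x_{2,2}$$

Notice that the input pixels involved are 2 apart in each direction, matching the stride.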

To uncover the underlying pattern in the values of these filter gradients, we’ll need to modify the output gradient tensor.

Dilate the output gradient pixels with stride_R − 1 zeros vertically, and stride_S − 1 zeros horizontally.
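
As a sketch, with the assumed stride of 2 in both directions, the dilation step might look like this:

```python
def dilate(dy, stride_r=2, stride_s=2):
    """Insert (stride - 1) zeros between adjacent output-gradient pixels."""
    P, Q = dy.shape
    out = np.zeros((stride_r * (P - 1) + 1, stride_s * (Q - 1) + 1))
    out[::stride_r, ::stride_s] = dy  # original values land on a strided grid
    return out
```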

With this modification to the output gradient tensor, we’ll see that the values for the filter gradients we calculated in Examples 1 to 4 above fit into a nice pattern:

It turns out that the backpropagation operation is identical to a stride = 1 convolution operation of the input tensor with a dilated version of the output gradient tensor!
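
We can sanity-check this equivalence numerically with the assumed example tensors above: the direct chain-rule accumulation and the stride-1 convolution with the dilated output gradient should agree. (As before, "convolution" is meant in the deep-learning cross-correlation sense, which is why no rotation is needed here, unlike the input-gradient case in Part 1.)

```python
dy = np.array([[1.0, 2.0], [3.0, 4.0]])  # assumed upstream gradient dL/dy

# direct chain-rule accumulation: each output gradient scales its input patch
df_direct = np.zeros_like(f)
for p in range(2):
    for q in range(2):
        df_direct += dy[p, q] * x[p * 2 : p * 2 + 3, q * 2 : q * 2 + 3]

# equivalent: stride-1 convolution of the input with the dilated gradient
df_conv = conv2d_forward(x, dilate(dy), stride=1)

assert np.allclose(df_direct, df_conv)  # identical results
```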

This concludes Part 2 of the Backpropagation for Convolution with Strides series. I hope this helped enhance your knowledge of Backpropagation for Convolution.
