Part 1: Backpropagation for Convolution with Strides
Loss gradient with respect to the input tensor
There are several examples of Backpropagation with Convolution, but almost all of them assume a stride of 1. This article provides a visual example of Backpropagation with a stride > 1. Along the way, this exercise hopefully also provides some intuition for why the filter needs to be rotated during Backpropagation.
(Part 2 can be found here: Loss gradient with respect to the filter)
We’ll use the following example dimensions for our input, filter, and output tensors. Note that we’re using horizontal and vertical strides of 2 here.
To set things up, let us first look at the forward propagation step and express the output pixels as functions of the input activations and the filter contents (the weights). This, of course, is quite straightforward.
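As a concrete sketch of this forward pass (the dimensions below — a 5×5 input, a 3×3 filter, and a stride of 2 — are illustrative assumptions, not necessarily the ones in the article's figures):

```python
import numpy as np

def conv2d_forward(x, w, stride=2):
    """Valid cross-correlation (the usual deep-learning 'convolution')
    with the same stride in both directions."""
    k = w.shape[0]
    out_h = (x.shape[0] - k) // stride + 1
    out_w = (x.shape[1] - k) // stride + 1
    y = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # each output pixel is a weighted sum of one k x k input patch
            y[i, j] = np.sum(x[i*stride:i*stride+k, j*stride:j*stride+k] * w)
    return y

x = np.arange(25, dtype=float).reshape(5, 5)  # hypothetical 5x5 input
w = np.ones((3, 3))                           # hypothetical 3x3 filter
y = conv2d_forward(x, w, stride=2)            # y has shape (2, 2)
```

Note that with a stride of 2 the patches start two pixels apart, so neighbouring output pixels share only part of their input patch.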
Here’s what we need to do in the Backpropagation step: given the gradient of the loss with respect to the output pixels, we need to calculate the gradient of the loss with respect to the input activations.
Each input x contributes to one or more output pixels (this is visualized in more detail below). Using the chain rule and partial differentiation principles, the total gradient of the loss with respect to each input pixel can be expressed as a sum over output pixels: the gradient of each output pixel with respect to that input, multiplied by the gradient of the loss with respect to that output pixel:
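In symbols, with stride $s$, a $k \times k$ filter $w$, and zero-based indices (these conventions are assumptions; the article's figures define them pictorially), the chain rule reads:

```latex
\frac{\partial L}{\partial x_{i,j}}
  = \sum_{m,n} \frac{\partial L}{\partial y_{m,n}}
    \frac{\partial y_{m,n}}{\partial x_{i,j}},
\qquad
y_{m,n} = \sum_{p=0}^{k-1}\sum_{q=0}^{k-1} x_{ms+p,\; ns+q}\, w_{p,q},
```

so $\partial y_{m,n} / \partial x_{i,j} = w_{i-ms,\, j-ns}$ whenever $0 \le i-ms < k$ and $0 \le j-ns < k$, and $0$ otherwise.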
Remember that the only output pixels y that appear in the above equation for a given x are the ones that x contributes to during forward propagation. This can be inferred from the following equations we calculated during forward propagation:
With this knowledge, let us calculate the gradients for a few example inputs step by step, before we see the full Backpropagation in action.
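As one hand-worked instance (again assuming a hypothetical 5×5 input, 3×3 filter, and stride 2), the centre pixel $x_{2,2}$ falls inside all four 3×3 input patches, so every output gradient contributes to it:

```latex
\frac{\partial L}{\partial x_{2,2}}
  = \frac{\partial L}{\partial y_{0,0}}\, w_{2,2}
  + \frac{\partial L}{\partial y_{0,1}}\, w_{2,0}
  + \frac{\partial L}{\partial y_{1,0}}\, w_{0,2}
  + \frac{\partial L}{\partial y_{1,1}}\, w_{0,0}
```

Notice that the filter indices run backwards relative to the output indices — a first hint of why the filter ends up rotated.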
At first glance, it might be hard to see a pattern in the values of these input gradients. But what if we modified the output gradient and filter tensors?
We’re going to make one modification to the output gradient tensor:
We’re also going to make one modification to the filter:
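In code, the standard forms of these two modifications are zero-insertion dilation with edge padding for the output gradient, and a 180° rotation for the filter. A minimal sketch, assuming a 2×2 output gradient, a stride of 2, and a 3×3 filter:

```python
import numpy as np

def dilate_and_pad(dy, stride=2, k=3):
    """Insert (stride - 1) zeros between neighbouring output-gradient
    entries, then zero-pad by (k - 1) pixels on every side."""
    h, w = dy.shape
    dilated = np.zeros((h + (h - 1) * (stride - 1),
                        w + (w - 1) * (stride - 1)))
    dilated[::stride, ::stride] = dy
    return np.pad(dilated, k - 1)

def flip_filter(w):
    """Rotate the filter 180 degrees (flip both axes)."""
    return w[::-1, ::-1]
```

For a 2×2 output gradient, `dilate_and_pad` produces a 7×7 tensor: the four gradient values sit two pixels apart, surrounded by a two-pixel border of zeros.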
With these modifications to the output gradient tensor and the filter, we’ll see that the values for the input gradient we calculated in Examples 1 to 4 above fit into a nice pattern:
It turns out that the Backpropagation operation is identical to a stride = 1 Convolution of a padded, dilated version of the output gradient tensor with a flipped version of the filter!
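We can verify this equivalence numerically. The sketch below (with the same hypothetical 5×5 input / 3×3 filter / stride-2 shapes as before) computes the input gradient two ways — directly, by routing each output gradient back to the input patch it came from, and via the dilate-pad-and-flip trick — and checks that they match:

```python
import numpy as np

def conv2d(x, w, stride=1):
    """Plain valid cross-correlation, as in the forward pass."""
    k = w.shape[0]
    out_h = (x.shape[0] - k) // stride + 1
    out_w = (x.shape[1] - k) // stride + 1
    y = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            y[i, j] = np.sum(x[i*stride:i*stride+k, j*stride:j*stride+k] * w)
    return y

def input_grad_direct(dy, w, x_shape, stride=2):
    """Scatter each output gradient back onto its input patch."""
    dx = np.zeros(x_shape)
    k = w.shape[0]
    for i in range(dy.shape[0]):
        for j in range(dy.shape[1]):
            dx[i*stride:i*stride+k, j*stride:j*stride+k] += dy[i, j] * w
    return dx

rng = np.random.default_rng(0)
w = rng.standard_normal((3, 3))   # hypothetical 3x3 filter
dy = rng.standard_normal((2, 2))  # hypothetical 2x2 output gradient

# Trick: dilate dy by the stride, pad by k - 1, then run a
# stride-1 convolution with the 180-degree-rotated filter.
dilated = np.zeros((3, 3))
dilated[::2, ::2] = dy
dx_trick = conv2d(np.pad(dilated, 2), w[::-1, ::-1], stride=1)

dx_direct = input_grad_direct(dy, w, (5, 5), stride=2)
assert np.allclose(dx_direct, dx_trick)
```

The direct scatter and the stride-1 convolution agree element for element, which is exactly the pattern the modified tensors revealed above.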
I hope this example helped you understand Backpropagation for Convolution with Strides.