Learning Day 55: back propagation in CNN — pooling and conv layers

De Jun Huang

Building on the BP in NN knowledge from Day 51:

  • For a typical conv structure: conv layer → pooling layer → output
  • How BP works from the output to the pooling layer
  • How BP works from the pooling layer to the conv layer

BP for Output → pooling layer

  • Based on Day 51, the δ for NN layers is calculated as follows:
δ for NN layers in matrix form (above) and formula form (below)
  • But the δ for the pooling layer does not involve the Σ portion. Instead, it is replaced by upsampling (both formulas are sketched below)
δ for the pooling layer. σ’ here = f’ in the previous image
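The images with these two formulas are not reproduced here, so here is a sketch in standard BP notation (the layer indexing is my own; layer l is the pooling layer in the second line):

```latex
% Fully connected layer (Day 51): delta is pulled back through W^T and the
% activation derivative
\delta^{l} = \left(W^{l+1}\right)^{T} \delta^{l+1} \odot \sigma'\left(z^{l}\right)

% Pooling layer (layer l): there are no weights, so the (W^{l+1})^T term is
% replaced by upsampling, which restores the conv layer's spatial shape
\delta^{l-1} = \mathrm{upsample}\left(\delta^{l}\right) \odot \sigma'\left(z^{l-1}\right)
```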
  • For average pooling, upsample the δ to match the shape of the conv layer by splitting each value evenly among the cells it was pooled from (a small numpy sketch follows the caption below).
  • This way, the total δ does not increase during BP
Upsample step: The sum of δ is the same before and after upsampling
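A minimal numpy sketch of the average-pooling upsample step (the helper name and the 2x2 pool size are my own choices, not from the original post):

```python
import numpy as np

def upsample_avg(delta, pool_size=2):
    # Split each pooled delta value evenly among the pool_size x pool_size
    # cells it was averaged from, so the total delta is preserved.
    k = pool_size
    return np.kron(delta, np.ones((k, k))) / (k * k)

delta_pool = np.array([[2.0, 8.0],
                       [4.0, 6.0]])       # delta arriving at the 2x2 pooled map
delta_up = upsample_avg(delta_pool)        # 4x4 delta for the conv layer
print(delta_up)
print(delta_pool.sum(), delta_up.sum())    # both sums are 20.0
```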
  • For max pooling, we need to record which cell within each pooling window had the max value during the forward pass.
  • Upsample δ by placing each value at its recorded max position; all other cells are filled with 0s (see the sketch after the caption below).
Result after upsampling from max pooling layer
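A minimal numpy sketch of the max-pooling upsample step (again, the helper name and the recorded positions are made up for illustration):

```python
import numpy as np

def upsample_max(delta, max_positions, input_shape):
    # Route each pooled delta value to the recorded argmax location from the
    # forward pass; every other cell stays 0.
    out = np.zeros(input_shape)
    for i in range(delta.shape[0]):
        for j in range(delta.shape[1]):
            r, c = max_positions[i][j]
            out[r, c] = delta[i, j]
    return out

delta_pool = np.array([[2.0, 8.0],
                       [4.0, 6.0]])
positions = [[(0, 1), (1, 3)],
             [(3, 0), (2, 2)]]             # recorded argmax locations (made up)
print(upsample_max(delta_pool, positions, (4, 4)))
```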

BP for pooling layer → conv layer

  • If we break down the convolution operation between the image input and the filter, the calculation is actually that of a special type of NN layer.
  • Special NN layer: not fully connected, and weights are shared (a small numerical check of this view follows the caption below)
Notice in the example: the purple cell is represented by the purple solid line, which multiplies the last node of the blue NN layer (corresponding to the last cell in the 3x3 image) to get the last node in the orange NN layer (corresponding to the last cell in the 2x2 feature map). Weight sharing is shown here since the purple line is reused by other nodes. The not-fully-connected characteristic is shown here since each node in the blue NN layer connects to only some nodes in the orange NN layer.
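To make the "special NN layer" view concrete, here is a small numpy sketch with a 3x3 input and a 2x2 filter (variable names are mine): a valid convolution gives the same result as a matrix multiply with a sparse, weight-shared matrix.

```python
import numpy as np

x = np.arange(9.0).reshape(3, 3)            # 3x3 input image
w = np.array([[1.0, 2.0],
              [3.0, 4.0]])                   # 2x2 filter

# Direct convolution (cross-correlation, as used in deep learning): 2x2 feature map
conv_out = np.array([[(x[i:i + 2, j:j + 2] * w).sum() for j in range(2)]
                     for i in range(2)])

# Equivalent "NN layer" view: 4 output nodes x 9 input nodes. Every row reuses
# the same 4 filter weights (weight sharing) and most entries are 0
# (not fully connected).
W_fc = np.zeros((4, 9))
for i in range(2):                           # output row
    for j in range(2):                       # output col
        for a in range(2):                   # filter row
            for b in range(2):               # filter col
                W_fc[i * 2 + j, (i + a) * 3 + (j + b)] = w[a, b]

fc_out = (W_fc @ x.reshape(-1)).reshape(2, 2)
print(np.allclose(conv_out, fc_out))         # True
```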
  • Based on Day 51, the δ for NN layers is calculated with the same formula as above.
  • Since the conv layer is not fully connected to the next layer, W cannot be used directly here, because that matrix multiplication would use all of the weights in W.
  • Instead, choose only the weights that were actually used in the convolution operation, resulting in:
Use zero-padding around the δs passed back from the later layer (e.g. the pooling layer) during BP. Notice that the weights have been rotated by 180°.
  • In the case above, the matrix calculation assumes the later layer is a pooling layer, so σ’ in the equation was ignored, since a pooling layer has no activation function. A small numpy check of this step follows.
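A minimal, self-contained numpy sketch (reusing the same hypothetical 2x2 filter as above) checking that the "zero-pad δ and convolve with the 180°-rotated filter" rule gives the same result as the transposed sparse-matrix view; σ’ is omitted, as noted above.

```python
import numpy as np

w = np.array([[1.0, 2.0],
              [3.0, 4.0]])                   # the 2x2 filter
delta = np.array([[1.0, 0.0],
                  [2.0, 3.0]])               # delta arriving at the 2x2 feature map

# View 1: the "special NN layer" view: transpose of the sparse weight-shared matrix
W_fc = np.zeros((4, 9))
for i in range(2):
    for j in range(2):
        for a in range(2):
            for b in range(2):
                W_fc[i * 2 + j, (i + a) * 3 + (j + b)] = w[a, b]
delta_prev_fc = (W_fc.T @ delta.reshape(-1)).reshape(3, 3)

# View 2: zero-pad delta, rotate the filter by 180 degrees, then convolve
padded = np.pad(delta, 1)                    # 4x4 after one ring of zeros
w_rot = np.rot90(w, 2)                       # 180-degree rotation
delta_prev_conv = np.array([[(padded[i:i + 2, j:j + 2] * w_rot).sum()
                             for j in range(3)] for i in range(3)])

print(np.allclose(delta_prev_fc, delta_prev_conv))   # True
```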
