Demystify Transposed Convolutional Layers

Sicong Zhao · Analytics Vidhya · Apr 25, 2020

The transposed convolutional layer is widely used in autoencoders and Generative Adversarial Networks (GANs), where it serves as one way to upsample data.

This is a simple concept, but in my learning process I was confused by a lot of inconsistent materials. Therefore, I created this tutorial, which uses both animations and PyTorch code to clearly explain the parameters and math of transposed convolutional layers. All source files are in my GitHub repo.

Acknowledgment: The style of the animations is inspired by conv_arithmetic. (Although it confused me a lot in my learning process.)

1. Default case

Let’s start with the simplest case.

Fig 1. Default case.

Figure 1 shows the calculation process of a transposed convolutional layer with kernel_size set to 3 and all other parameters left at their defaults. The dimensions of the input (2x2) and output (4x4) are easy to recognize.

Following is the step-by-step calculation process. As the animation shows, there are 4 steps to generate the final output.

Fig 2. Calculation steps for the basic case

Let’s verify the same calculation using PyTorch:
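Here is a minimal sketch of this check, assuming a single input/output channel and illustrative input and kernel values (not necessarily the ones in the figures):

```python
import torch
import torch.nn as nn

# kernel_size=3, everything else at its default (stride=1, padding=0, ...).
trans_conv = nn.ConvTranspose2d(in_channels=1, out_channels=1,
                                kernel_size=3, bias=False)

# Fix the kernel weights so the run is reproducible.
with torch.no_grad():
    trans_conv.weight.copy_(torch.arange(1., 10.).reshape(1, 1, 3, 3))

x = torch.arange(1., 5.).reshape(1, 1, 2, 2)  # 2x2 input
out = trans_conv(x)
print(out.shape)  # torch.Size([1, 1, 4, 4]) -- a 4x4 output, as in Figure 1
print(out)
```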

2. Stride

Next, we will change the stride parameter and leave everything else the same as in the first case.

The PyTorch document indicates:

stride controls the stride for the cross-correlation.

The documentation is only for reference; personally, I did not understand it at first. But the following visualization should make it clear. The default value of stride is 1; here we set stride to 2.

Fig 3. Set parameter stride equal to 2

As you can see, after each multiplication step, the kernel matrix moves 2 steps horizontally until it hits the end, then moves 2 steps vertically and starts again from the beginning.

Let's look at the calculation steps:

Fig 4. Calculation steps when stride = 2

Let's verify with PyTorch:
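Again a sketch with the same illustrative values, now with stride=2:

```python
import torch
import torch.nn as nn

trans_conv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, bias=False)
with torch.no_grad():
    trans_conv.weight.copy_(torch.arange(1., 10.).reshape(1, 1, 3, 3))

x = torch.arange(1., 5.).reshape(1, 1, 2, 2)
print(trans_conv(x).shape)  # torch.Size([1, 1, 5, 5]) -- 5x5, as in Figure 3
```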

3. Padding

We will keep building on the stride case; this time we change the padding parameter to 1. In the previous cases, padding had its default value of 0.

Fig 5. Set parameter padding equal to 1.

The final output in this case is the center 3x3 matrix. You can interpret it as: after the calculation, drop the border cells of the matrix. You should be able to imagine that if we set padding to 2, the result would be the center cell (1x1).

Fig 6. Calculation steps when stride = 2, padding = 1

Figure 6 shows the calculation steps. As you can see, it is almost identical to Figure 4; the only difference is that we 'removed' the outer cells.

Let’s see if PyTorch agrees with us:
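A sketch of this case, with the same illustrative values plus padding=1:

```python
import torch
import torch.nn as nn

trans_conv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1,
                                bias=False)
with torch.no_grad():
    trans_conv.weight.copy_(torch.arange(1., 10.).reshape(1, 1, 3, 3))

x = torch.arange(1., 5.).reshape(1, 1, 2, 2)
print(trans_conv(x).shape)  # torch.Size([1, 1, 3, 3]) -- only the center 3x3 remains
```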

4. Output Padding

Yes, there is another kind of padding. The difference is simple:

Output Padding adds cells to one side of the output, while padding removes cells from both sides of the output.

Fig 7. Set output_padding equal to 1.

In this case, we set the output_padding parameter to 1 (the default is 0) and stride to 2. As shown in Figure 7, zero-valued cells have been added to one side of the output matrix.

If you have any difficulties in understanding this, please feel free to compare figure 7 with figure 3.

Below are the calculation steps:

Fig 8. Calculation steps when output_padding = 1, stride = 2

Let’s confirm with PyTorch again.
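A sketch of this case (stride=2, output_padding=1, same illustrative values):

```python
import torch
import torch.nn as nn

trans_conv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2,
                                output_padding=1, bias=False)
with torch.no_grad():
    trans_conv.weight.copy_(torch.arange(1., 10.).reshape(1, 1, 3, 3))

x = torch.arange(1., 5.).reshape(1, 1, 2, 2)
out = trans_conv(x)
print(out.shape)      # torch.Size([1, 1, 6, 6])
print(out[0, 0, -1])  # the added bottom row is all zeros
```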

5. Dilation

Dilation influences the structure of the kernel matrix.

The PyTorch documentation says:

dilation controls the spacing between the kernel points;

I had no idea what this meant when I first saw it, because it is very abstract. However, after looking at Figure 9, you should be able to understand it. To make things easier, let's use a 2x2 kernel in this example. (In the previous examples, we used a 3x3 kernel.)

Fig 9. Kernel matrices with different dilation values.

Above is what the kernel matrix looks like with different dilation values. Basically, if the dilation value is n, then n-1 zero-filled cells are inserted between adjacent kernel cells. At this point, it should not be hard to imagine the same transformation for bigger kernel matrices. The rest of the calculation remains the same as before, as shown in Figure 10.

Fig 10. Set kernel to 2, stride to 2, dilation to 2.

To clarify, in Figure 10 I ignored the 0-valued kernel cells by making them transparent.

Below are the calculation steps:

Fig 11. Calculation steps when kernel = 2, stride = 2, dilation = 2.

Below is the PyTorch implementation.
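A sketch of this case, assuming an illustrative 2x2 kernel:

```python
import torch
import torch.nn as nn

trans_conv = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, dilation=2,
                                bias=False)
with torch.no_grad():
    trans_conv.weight.copy_(torch.arange(1., 5.).reshape(1, 1, 2, 2))

x = torch.arange(1., 5.).reshape(1, 1, 2, 2)
# The dilated 2x2 kernel effectively spans 3x3 cells, giving a 5x5 output.
print(trans_conv(x).shape)  # torch.Size([1, 1, 5, 5])
```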

6. Math behind the output shape

Finally, let's close this tutorial by deriving the formula for the output size. You only need to read this section if you want to develop a deeper understanding of transposed convolutional layers; otherwise, feel free to skip it.

The formula for the output size is:

n = (m - 1)S - 2P + D(K - 1) + P_out + 1        (1)

Here n is the output size (an n x n matrix) and m is the input size (an m x m matrix). Besides these, there are 5 parameters in the formula: K is the kernel size, S is the stride, P is the padding, D is the dilation, and P_out is the output_padding.

It looks complicated, but it is in fact very simple. Let's go through it step by step.

(1) Only consider S (stride) and K (kernel size)

Because the input size is m, there are m x m calculation steps in total. But we really only need to consider the first m steps, since those already fix the width of the output matrix.

We can imagine the output progressively growing as the calculation proceeds, just as shown in Figures 2, 4, 6, 8, and 11.

  • In the 1st step, the output size is K.
  • In the 2nd step, the intermediate matrix shifts by S, so the output size is K + S.
  • In the 3rd step, the intermediate matrix shifts by S, so the output size is K + 2S.
  • …
  • In the m-th step, the intermediate matrix shifts by S, so the output size is K + (m - 1)S.

Therefore, if we only consider S and K, the formula is:

n = (m - 1)S + K        (2)
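For example, in the stride case (Section 2) we had K = 3, S = 2, and m = 2, so n = (2 - 1) * 2 + 3 = 5, which matches the 5x5 output in Figure 3.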

(2) Consider D (dilation)

As we have discussed, dilation changes the kernel size. Here, let's use K' to denote the transformed kernel size. As shown in Figure 9, the dilation transformation inserts (K - 1)(D - 1) zero-filled cells into the kernel. Therefore, the relationship between K', K, and D is:

K' = K + (K - 1)(D - 1)        (3)

Thus we have:

K' = D(K - 1) + 1

Substituting K in (2) with K', we have:

n = (m - 1)S + D(K - 1) + 1        (4)
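As a quick check, the dilation case (Section 5) had K = 2, D = 2, S = 2, and m = 2, so n = (2 - 1) * 2 + 2 * (2 - 1) + 1 = 5, matching the 5x5 output of that example.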

Now we are almost done; the remaining parameters are easy to understand.

(3) Consider P (padding) and P_out (output_padding)

Since padding removes cells from both sides, its influence on the output size is -2P. Similarly, output_padding adds cells on one side, so its influence on the output size is +P_out. Adding these pieces to (4), we have derived (1).
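If you want to sanity-check formula (1) against PyTorch, here is a small sketch that compares it with the actual output shapes for the five cases above (the helper function and its name are my own, not part of PyTorch):

```python
import torch
import torch.nn as nn

def output_size(m, K, S=1, P=0, D=1, P_out=0):
    # Formula (1): n = (m - 1)S - 2P + D(K - 1) + P_out + 1
    return (m - 1) * S - 2 * P + D * (K - 1) + P_out + 1

cases = [
    dict(kernel_size=3),                              # Section 1: defaults
    dict(kernel_size=3, stride=2),                    # Section 2: stride
    dict(kernel_size=3, stride=2, padding=1),         # Section 3: padding
    dict(kernel_size=3, stride=2, output_padding=1),  # Section 4: output_padding
    dict(kernel_size=2, stride=2, dilation=2),        # Section 5: dilation
]

x = torch.randn(1, 1, 2, 2)  # 2x2 input, as in all the figures
for kwargs in cases:
    n_torch = nn.ConvTranspose2d(1, 1, **kwargs)(x).shape[-1]
    n_formula = output_size(2, kwargs['kernel_size'],
                            S=kwargs.get('stride', 1),
                            P=kwargs.get('padding', 0),
                            D=kwargs.get('dilation', 1),
                            P_out=kwargs.get('output_padding', 0))
    assert n_torch == n_formula
    print(kwargs, '->', n_formula)  # prints 4, 5, 3, 6, 5
```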

That concludes this tutorial. Thanks for reading! I hope it has helped you develop a deeper understanding of transposed convolutional layers.

Please feel free to leave comments. All suggestions and questions are welcome!
