How PyTorch Transposed Convs1D Work

Santi Pdp
9 min read · Jul 27, 2018


WARNING: I'll be assuming you know what neural networks and convolutional neural networks are. Also, this post is written in PyTorch (https://pytorch.org/) notation. Although it applies to the mentioned structures in any framework, it is good to know the very basic syntax of PyTorch to follow the few code snippets that create Tensors and layers.

Hello there! I am a recurrent PyTorch user, as I do loads of deep learning every day, and in this post I want to clarify how transposed convolutions work, especially in PyTorch. I got to know these operations whilst I was developing my deep convolutional auto-encoder systems. In fact, our speech enhancement GAN worked entirely with these mechanisms to transform 1D waveforms into 1D waveforms (SEGAN: https://arxiv.org/abs/1703.09452). I have been reading through the community posts (https://discuss.pytorch.org/) about the doubts on how these transposed convolutions and their padding work, and perhaps this could be a helpful tool to get out of the darkness.

In this brief article I want to describe what a transposed convolution is and how it generates the outputs we get (for the 1D case, but you can just draw extra dimensions to go 2D and beyond).

First things first! Let's go down the convolutional road, and then we'll turn around to meet our transposed friend on the way back.

Conv1d Layers

A Conv1d layer (https://pytorch.org/docs/stable/nn.html#conv1d) contains a set of convolutional neurons, also named kernels, and from now on that is what we will call them. We know that conv layers are composed of K kernels, and each kernel has N input channels. In this case though, just to reflect the way convs and transposed convs work, I am focusing on one input channel (like a mono waveform of audio or speech) and one convolutional kernel of width D. Let's begin by exemplifying such a simple kernel of width D = 3, which has to process an input signal of length 7 (7 time-steps).

We know that, as the input signal X = [1, 1, 1, 1, 1, 1, 1] has no padding, we will get Y = [y1, y2, y3, y4, y5], and we understand why by looking at the blue sliding window going from left to right. We can see that, given that W = [1, 1, 1], X is all ones too, and we are applying a 3-to-1 multiply-and-add per window, we should obtain Y = [3, 3, 3, 3, 3]. Let’s check it with a PyTorch code snippet.
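A minimal snippet along these lines reproduces the picture (my reconstruction: I fill the kernel with ones and disable the bias so the numbers match the drawing):

    import torch
    import torch.nn as nn

    # One input channel, one kernel of width D = 3, no padding, no bias.
    conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, bias=False)
    conv.weight.data.fill_(1.)          # W = [1, 1, 1]

    x = torch.ones(1, 1, 7)             # batch of 1, 1 channel, 7 time-steps
    print(conv(x))
    # tensor([[[3., 3., 3., 3., 3.]]])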

There we go. We have our shiny vector of dimension 5 filled with 3s. Now if we add padding we can enforce the output to have a certain length. For instance if we want the output to have length 7 like the input, we may put zeros on both sides of X such that:
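For instance, a sketch with padding=1 (again assuming ones in the kernel and no bias) keeps the output length at 7; note that the edge windows only see two real samples, so they sum to 2:

    import torch
    import torch.nn as nn

    # padding=1 puts one zero on each side of X, so the output length stays 7.
    conv_pad = nn.Conv1d(1, 1, kernel_size=3, padding=1, bias=False)
    conv_pad.weight.data.fill_(1.)      # W = [1, 1, 1]

    print(conv_pad(torch.ones(1, 1, 7)))
    # tensor([[[2., 3., 3., 3., 3., 3., 2.]]])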

That simple. There are padding methods other than injecting zeros, which avoid certain artifacts in the output, but that is not the most important part of this post.

ConvTranspose1d Layers

The important part is how we convert a convolution into a transposed convolution! Earlier I said we had to start our trip by going down the convolutional road, and then turn around to go back. Well, here comes our friend, reverting the operation flow such that we change a many-to-one mapping into a one-to-many one!! This is what the ConvTranspose1d layer does (https://pytorch.org/docs/stable/nn.html#convtranspose1d).

Picture of reverting the convolution operation to go Y -> X rather than X -> Y

So now we have a different way of mapping things: we go 1-to-3. We can see the transposed convolution as sending portions of our data in Y to our destination X. OK, this picture is pretty simple, but we like PyTorch confirmations, so let's ask the great PyTorch oracle about our doubts. Consider we have Y = [1] and W = [1, 1, 1] (as before):
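A quick check could look like this (assuming, as before, that we fill W with ones and drop the bias):

    import torch
    import torch.nn as nn

    # 1-to-3 mapping: a transposed conv with kernel width 3 and a single input value.
    deconv = nn.ConvTranspose1d(in_channels=1, out_channels=1, kernel_size=3, bias=False)
    deconv.weight.data.fill_(1.)        # W = [1, 1, 1]

    y = torch.ones(1, 1, 1)             # Y = [1]
    print(deconv(y))
    # tensor([[[1., 1., 1.]]])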

It seems like we get the correct response. Things are going well with the resulting vector X = [1x1, 1x1, 1x1] = [1, 1, 1]!

Now we have seen a pretty simple case, but I'm sure you are wondering what happens if we have more than one input value to apply this "reverted" operator to. Well, it is pretty simple actually: we just stride by the number of samples we specified in the ConvTranspose1d instantiation, in this case 1:

We have an additional output dimension to fulfil kernel width needs. Also we see overlap now.

Hey, now we see how things overlap and sum up together! BUT WAIT! There are 4 output elements now!! How come? Well, the transposed convolution will create as many output elements as required to fit its kernel "messages" with portions of yi. We also see how portions of both y1 and y2 are being sent to their allocated destinations and then summed up! Let's ask PyTorch about this, having Y = [1, 1] and W = [1, 1, 1]:
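Something like this sketch confirms it (same assumptions as before: ones in the kernel, no bias):

    import torch
    import torch.nn as nn

    deconv = nn.ConvTranspose1d(1, 1, kernel_size=3, bias=False)
    deconv.weight.data.fill_(1.)        # W = [1, 1, 1]

    y = torch.ones(1, 1, 2)             # Y = [1, 1]
    print(deconv(y))
    # tensor([[[1., 2., 2., 1.]]])  -> overlapping contributions sum up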

The result is X = [1x1, 1x1 + 1x1, 1x1 + 1x1, 1x1] = [1, 2, 2, 1], so correct! OK, now we have two final mysteries to solve to completely understand the operations under ConvTranspose1d. First, what about stride > 1? Second, what about padding > 0?

I want to REALLY upsample my signal, gimme a big stride my friend

Setting stride > 1 will upsample our signal to even higher scales than just the few samples that fulfil our kernel allocations. Let's see how it works with stride = 4:

Mmmmmm…. Interesting, huh? There are mysterious boxes now. Somewhere where space and time collide, and the weights put no portions of data in there. BUT… being fair with strides and our data dimensions… our output size MUST-BE-8. So, what's within these black holes of mystery? Well, I bet there is some zero. Let's ask PyTorch about this:
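A possible check (same ones-filled kernel, no bias):

    import torch
    import torch.nn as nn

    deconv = nn.ConvTranspose1d(1, 1, kernel_size=3, stride=4, bias=False)
    deconv.weight.data.fill_(1.)        # W = [1, 1, 1]

    y = torch.ones(1, 1, 2)             # Y = [1, 1]
    print(deconv(y))
    # tensor([[[1., 1., 1., 0., 1., 1., 1.]]])  -> 7 elements, with one zero gap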

Woah! We do have a zero! But what about the other mystery? It should be zero too if it were to appear, but I guess it is trimmed because it contains nothing and we didn't ask for that dimension specifically (we'll see what this means with padding). In any case, we retain a zero in the middle of the vector because there is a blank space there without any interaction with our convolutional kernel. What should we do to cover all the spaces? At least have width = stride. Let's try it by incrementing the kernel width by 1.
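A sketch with kernel_size = 4 (so width = stride), under the same assumptions:

    import torch
    import torch.nn as nn

    deconv = nn.ConvTranspose1d(1, 1, kernel_size=4, stride=4, bias=False)
    deconv.weight.data.fill_(1.)        # W = [1, 1, 1, 1]

    y = torch.ones(1, 1, 2)             # Y = [1, 1]
    print(deconv(y))
    # tensor([[[1., 1., 1., 1., 1., 1., 1., 1.]]])  -> 8 elements, no gaps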

This looks way better. We covered the whole space with portions of our input data. We can see how the mysteries are gone, and both question marks are filled with ones now, because each has w4 sending a portion of data to it. Moreover, now we have an actual x4 interpolation, as expected with stride = 4.

The last part of this section about stride deals with overlap in the deconvolution. What if we have a width D = 8, which is double the stride?

The second kernel pass is shown in green to reduce confusion, but green and blue are the same kernel weights.

We have 50% overlap when the kernel slides, and we can see that the output is larger than 8, so it is not just x4 the input, but a bit more to fit our kernel size, like in the example with one single input. That simple! No more! PyTorch tells us this is correct too:
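A sketch with kernel_size = 8 and stride = 4, same assumptions as before:

    import torch
    import torch.nn as nn

    deconv = nn.ConvTranspose1d(1, 1, kernel_size=8, stride=4, bias=False)
    deconv.weight.data.fill_(1.)        # W has 8 ones

    y = torch.ones(1, 1, 2)             # Y = [1, 1]
    print(deconv(y))
    # tensor([[[1., 1., 1., 1., 2., 2., 2., 2., 1., 1., 1., 1.]]])  -> 12 elements, 50% overlap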

Can we, however, ensure we have a proper x4 length in the output with the same large kernel enforcing the overlap (for whatever reason we prefer to have this overlap)? Of course we can! We can do LOTS of things in deep learning if we glue the right pieces together. Here comes our padding colleague, let me introduce it.

Let me pad to fit my data

Padding in transposed convolutions can be seen as allocating fake outputs that will be removed.

Look at the yellow blocks which pad our interpolated signal with whatever they store, and then we just drop them and stick to the black boxes, containing our output data with the desired overlap and stride. Let's check the result we obtain with PyTorch, our friend and neighbour.
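A sketch adding padding=2 on top of the previous kernel_size = 8, stride = 4 setup (same assumptions):

    import torch
    import torch.nn as nn

    deconv = nn.ConvTranspose1d(1, 1, kernel_size=8, stride=4, padding=2, bias=False)
    deconv.weight.data.fill_(1.)        # W has 8 ones

    y = torch.ones(1, 1, 2)             # Y = [1, 1]
    print(deconv(y))
    # tensor([[[1., 1., 2., 2., 2., 2., 1., 1.]]])  -> 8 elements, the padded ends are dropped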

So we told PyTorch about our wish to pad the signal to get just 8 output results, and here we are with our splendid 8-dimensional vector: X = [1, 1, 1+1, 1+1, 1+1, 1+1, 1, 1] = [1, 1, 2, 2, 2, 2, 1, 1]. Note the 50% intersection where kernel weights send portions of both y1 and y2 to 4 fields!

Now things are even more interesting… what is the output_padding parameter in ConvTranspose1d? Well, it serves to pad AT THE END of the signal (the right side) with a certain amount of zeros that we specify. We can take a look at the ConvTranspose2d documentation, which clearly states:

So we can use it to set an asymmetric padding to obtain a desired output shape. It is only needed if we require a specific output shape for which asymmetric padding is necessary.

Note how we add a padding value, but it does not get filled with info, for the same reason we have seen before: no kernel weight is sending info there. The PyTorch result reflects this zero though, as we told it we want that padding there to MATCH the dimension!

padding=0 (default) and output_padding=1
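A sketch of this case, with kernel_size = 3, stride = 4 and output_padding = 1 (same assumptions as before):

    import torch
    import torch.nn as nn

    deconv = nn.ConvTranspose1d(1, 1, kernel_size=3, stride=4,
                                padding=0, output_padding=1, bias=False)
    deconv.weight.data.fill_(1.)        # W = [1, 1, 1]

    y = torch.ones(1, 1, 2)             # Y = [1, 1]
    print(deconv(y))
    # tensor([[[1., 1., 1., 0., 1., 1., 1., 0.]]])  -> the extra right-side element stays 0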

The picture above correctly shows the arrow contributions. Now what if we also apply the padding (padding argument) we had before?

The extra paddings will be dropped again, and we will keep the output_padding element, which will be filled with a portion from y2. What is PyTorch's response?
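A sketch combining everything: kernel_size = 8, stride = 4, padding = 2 and output_padding = 1 (same assumptions):

    import torch
    import torch.nn as nn

    deconv = nn.ConvTranspose1d(1, 1, kernel_size=8, stride=4,
                                padding=2, output_padding=1, bias=False)
    deconv.weight.data.fill_(1.)        # W has 8 ones

    y = torch.ones(1, 1, 2)             # Y = [1, 1]
    print(deconv(y))
    # tensor([[[1., 1., 2., 2., 2., 2., 1., 1., 1.]]])  -> 9 elements, the last one comes from y2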

We have 9 elements in our output result, as the arrows indicate! There is a 1 exactly where the output_padding is situated, so the yellow block is not a yellow block at the end of the day (not something to be dropped). It is an actual output box; note that output_padding must be smaller than the stride (with the default dilation), so it adds at most stride - 1 extra elements.

Finally, to understand why this block is called a transposed convolution, Theano's convolutional arithmetic tutorial and Naoki Shibuya's tutorial on transposed convolutions are very useful, and I leave them as proper references to explore this further.

It is also recommended to read the Distill publication about checkerboard artifacts with transposed convolutions by Odena et al. (https://distill.pub/2016/deconv-checkerboard/) to further understand the behaviour of this method.

Conclusions

In this post I showed the operations behind one-dimensional transposed convolutions in PyTorch. We have seen how kernel widths affect the output shape, as well as strides (to obtain a higher interpolation factor). We have also seen the padding / output_padding options to control the desired output shape after operating on our data with this upsampling method.

