Upsampling and Transposed Convolution Layers

Surya Teja Menta
3 min read · Sep 29, 2022


This blog is about what Upsampling and Transposed Convolution layers are and how they work. In the U-Net architecture, an image is first reduced from 256x256 down to 16x16 and then expanded back from 16x16 to 256x256 to obtain the segmented region of the image. U-Net is used in image segmentation, image-to-image translation, and similar tasks.

Generative Adversarial Networks (GANs) like SRGAN, ESRGAN, and Real-ESRGAN use the same idea to generate high-resolution, high-quality images from poor or low-resolution inputs.

In both U-Nets and GANs, the image dimensions increase from layer to layer. In my early stages of learning, it felt a bit strange that an image's dimensions could be increased so naturally. After studying it in depth, I understood how it works.

UpSampling2D Layer:

This layer increases the dimensions of a matrix/image by enlarging each and every element/pixel. How much it grows depends on the size parameter.

tf.keras.layers.UpSampling2D(
    size=(2, 2), data_format=None, interpolation='nearest', **kwargs
)

Here, size is defined as (2, 2), so each element/pixel is expanded into a 2x2 block.

Upscaling in the UpSampling2D layer

Here, we can observe that each pixel is upscaled by repeating itself according to size. The strength of the upsampling effect depends on the size we specify.
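
To make this concrete, here is a minimal sketch (the values and shapes are purely illustrative) of nearest-neighbour upsampling on a 2x2 input:

import numpy as np
import tensorflow as tf

# A batch of one 2x2 single-channel image.
x = np.array([[1, 2],
              [3, 4]], dtype=np.float32).reshape(1, 2, 2, 1)

# With size=(2, 2) and 'nearest' interpolation,
# every pixel is repeated into a 2x2 block.
up = tf.keras.layers.UpSampling2D(size=(2, 2), interpolation='nearest')
y = up(x)

print(y.numpy().reshape(4, 4))
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]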

Likewise, we can use the UpSampling1D, UpSampling2D, and UpSampling3D layers depending on the requirement and the input shape.

Conv2DTranspose Layer:

The Conv2DTranspose layer is basically a deconvolution, which is simply the opposite of the convolution operation. Put more simply, Conv2DTranspose is a convolution run in the backward direction. A convolutional neural network provides various features for different image-processing tasks, but most of its layers, such as convolution and pooling layers, either downsample the height and width of the input or, with padding, keep them unchanged. In processes like semantic segmentation and object detection, which perform classification at the pixel level, it is convenient to keep the dimensions of the input and output the same. Transposed convolution achieves this in a better way, because the upsampling itself is learnable.

tf.keras.layers.Conv2DTranspose(
    filters,
    kernel_size,
    strides=(1, 1),
    padding='valid',
    output_padding=None,
    data_format=None,
    dilation_rate=(1, 1),
    activation=None,
    use_bias=True,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    **kwargs
)
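
For intuition, here is a small sketch (the shapes here are just an example) showing how a stride-2 transposed convolution doubles the spatial dimensions:

import tensorflow as tf

# A batch of one 16x16 feature map with 64 channels.
x = tf.random.normal((1, 16, 16, 64))

# A stride-2 transposed convolution with 'same' padding acts as
# learnable upsampling: 16x16 -> 32x32.
deconv = tf.keras.layers.Conv2DTranspose(
    filters=32, kernel_size=3, strides=2, padding='same'
)
y = deconv(x)

print(y.shape)  # (1, 32, 32, 32)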

Convolutions & Deconvolutions:

This encoder-decoder architecture is what Autoencoders and U-Nets follow.

Comparing the UpSampling2D layer with Conv2DTranspose: transposed convolutions are relatively expensive because they perform a full convolution with learnable weights, while the upsampling layer is cheap and fast since it only repeats (or interpolates) existing pixels.
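
One way to see this cost difference is to count trainable weights. In this small check (the layer sizes are arbitrary), UpSampling2D has nothing to learn, while Conv2DTranspose carries a kernel and a bias:

import tensorflow as tf

x = tf.random.normal((1, 8, 8, 16))

up = tf.keras.layers.UpSampling2D()
deconv = tf.keras.layers.Conv2DTranspose(filters=32, kernel_size=3)

# Call each layer once so its weights are built.
up(x)
deconv(x)

print(len(up.trainable_weights))      # 0: no parameters at all
print(len(deconv.trainable_weights))  # 2: kernel and bias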

In fact, Conv2DTranspose is to Conv2D what Upsampling is to the pooling layers, and both give similar results in terms of upscaling factor. When we need to recover features that were lost during convolutions, unpooling techniques such as Max-Unpooling are the more advanced option. We can also use a bilinear interpolation function, which helps in both scenarios.
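
For example, switching UpSampling2D to bilinear interpolation is a one-argument change; here is a minimal sketch on the same 2x2 input as before:

import numpy as np
import tensorflow as tf

x = np.array([[1, 2],
              [3, 4]], dtype=np.float32).reshape(1, 2, 2, 1)

# 'bilinear' interpolates new pixels from their neighbours
# instead of copying them, giving a smoother upscaled output.
up = tf.keras.layers.UpSampling2D(size=(2, 2), interpolation='bilinear')
print(up(x).numpy().reshape(4, 4))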

This is my view based on my study. Please ping me if you have any questions or if you spot any mistakes.

Portfolio: https://suryatejamenta.co.in/
LinkedIn: https://www.linkedin.com/in/surya-teja-menta/
Github: https://github.com/Surya-Teja-Menta
Insta: https://www.instagram.com/suryatejamenta/

Happy Coding…
