UNet with ResBlock for Semantic Segmentation

Nishank Singla
3 min read · Dec 12, 2019

The UNet architecture was a major step forward in computer vision that revolutionized segmentation, not just in medical imaging but in other fields as well. The long skip connections between each level of the contracting path and the expanding path are the key feature of UNet. It is like an FCN that has been pulled upwards at both ends.

Another revolutionary advancement in computer vision was ResNet. The residual blocks in ResNet, with their skip connections, made it possible to train deeper and deeper convolutional neural networks and achieved record-breaking classification results on the ImageNet dataset.

By replacing the convolutions on each level of UNet with ResBlocks, we can get better performance than the original UNet in most cases. Below is the detailed model architecture diagram.

Here are a few improvements in this architecture:

  1. All convolutions use 3×3 filters with SAME padding, so the size of the feature map stays the same on each level of the contracting path and on the corresponding level of the expanding path. SAME padding also preserves boundary information and allows more convolutions to be added.
  2. Because the feature map size stays the same within a level, the feature map from the contracting path does not need to be cropped before it is concatenated with the corresponding feature map of the expanding path. No cropping means no loss of information.
  3. Along with the long skip connections between every level of the contracting and expanding paths, there are local skip connections between the convolutions on each level. Skip connections help produce a smoother loss curve and help avoid vanishing and exploding gradients.
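
To make points 1 and 2 concrete, here is a minimal sketch in plain Python (not taken from the repo) that tracks only the side length of the feature map through a 5-level UNet. The function name trace_unet_sizes and its parameters are hypothetical. It assumes two 3×3 convolutions per level, 2×2 pooling and 2× up-sampling, which simplifies both the original UNet and the ResBlock version.

```python
def trace_unet_sizes(input_size, padding, levels=5, convs_per_level=2, kernel=3):
    """Track the feature-map side length through a simplified UNet.

    Returns (crops, output_size), where crops[i] is how many pixels must be
    cropped from the skip feature map at the i-th up-sampling step.
    """
    if padding not in ("same", "valid"):
        raise ValueError("padding must be 'same' or 'valid'")
    # A 'valid' k x k convolution shrinks each side by k - 1; 'same' keeps it.
    shrink = 0 if padding == "same" else kernel - 1

    # Contracting path: convolutions on each level, then 2x2 pooling.
    size = input_size
    encoder_sizes = []
    for level in range(levels):
        size -= convs_per_level * shrink
        encoder_sizes.append(size)
        if level < levels - 1:
            size //= 2

    # Expanding path: 2x up-sampling, concatenate with the skip, convolve.
    crops = []
    for skip_size in reversed(encoder_sizes[:-1]):
        size *= 2
        crops.append(skip_size - size)
        size -= convs_per_level * shrink
    return crops, size


print(trace_unet_sizes(572, "valid"))  # ([8, 32, 80, 176], 388)
print(trace_unet_sizes(576, "same"))   # ([0, 0, 0, 0], 576)
```

With valid padding and a 572-pixel input, the result matches the 388-pixel output of the original UNet paper, and every skip connection has to be cropped. With SAME padding nothing is cropped and the output has the same size as the input.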

All these features make it very powerful for semantic segmentation. This particular architecture uses the ResBlock of ResNet34, but the ResBlock of ResNet50 or ResNet101 can be used as well. In the original paper, UNet has 5 levels with 4 down-sampling and 4 up-sampling operations. In this model, if the height and width of the input image are multiples of 2⁴ = 16, no cropping is required.
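
The multiple-of-2⁴ condition can be checked before feeding an image to the model. The sketch below is plain Python with hypothetical helper names (needs_no_cropping, next_valid_size, level_sizes). It assumes each down-sampling halves the side length and rounds odd sizes down, which is how a 2×2 pooling with stride 2 and no padding behaves.

```python
def needs_no_cropping(size, num_downsamplings=4):
    """True if `size` can be halved `num_downsamplings` times without rounding."""
    return size % (2 ** num_downsamplings) == 0


def next_valid_size(size, num_downsamplings=4):
    """Smallest multiple of 2**num_downsamplings that is >= size (a padding target)."""
    factor = 2 ** num_downsamplings
    return ((size + factor - 1) // factor) * factor


def level_sizes(size, num_downsamplings=4):
    """Side length on each level of the contracting path, rounding down."""
    sizes = [size]
    for _ in range(num_downsamplings):
        sizes.append(sizes[-1] // 2)
    return sizes


print(needs_no_cropping(256))  # True
print(needs_no_cropping(100))  # False
print(level_sizes(100))        # [100, 50, 25, 12, 6]
print(next_valid_size(100))    # 112
```

For a 100-pixel input, the third level has size 25 but up-sampling the fourth level gives 2 × 12 = 24, so the two feature maps no longer line up for concatenation. Padding or resizing the input to 112 pixels is one way to avoid that.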

UpSampling operation in the model

There are two approaches to expansion. In the original UNet paper, expansion is a combination of up-sampling and a 2×2 convolution with half the number of filters. The second approach is to use a transposed convolution. The two may look equivalent because both produce feature maps with the same dimensions, but mathematically they are different operations. I prefer the first option, up-sampling followed by convolution, over transposed convolution because the latter tends to produce checkerboard artifacts. Refer to this great article: https://distill.pub/2016/deconv-checkerboard/.
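
The difference between the two approaches can be illustrated with a small 1D overlap count, following the argument in the Distill article. This is a simplified sketch in plain Python, not the Keras layers themselves, and the function names are hypothetical. It ignores the actual weights and only counts how many kernel taps touch each output position.

```python
def transpose_conv_overlap(n_in, kernel, stride):
    """How many input positions contribute to each output of a 1D transposed conv."""
    out_len = (n_in - 1) * stride + kernel
    counts = [0] * out_len
    for i in range(n_in):
        # Each input position writes a full kernel into the output.
        for k in range(kernel):
            counts[i * stride + k] += 1
    return counts


def upsample_conv_overlap(n_in, kernel, scale):
    """How many real (non-padding) taps each output sees after nearest-neighbour
    up-sampling by `scale` followed by a stride-1 convolution with SAME padding."""
    up_len = n_in * scale
    pad = kernel // 2
    counts = []
    for o in range(up_len):
        taps = sum(1 for k in range(kernel) if 0 <= o - pad + k < up_len)
        counts.append(taps)
    return counts


print(transpose_conv_overlap(4, kernel=3, stride=2))  # [1, 1, 2, 1, 2, 1, 2, 1, 1]
print(transpose_conv_overlap(4, kernel=4, stride=2))  # [1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
print(upsample_conv_overlap(4, kernel=3, scale=2))    # [2, 3, 3, 3, 3, 3, 3, 2]
```

With a kernel size of 3 and a stride of 2, the transposed convolution alternates between one and two contributions per output position, and that uneven overlap is what shows up as a checkerboard pattern in 2D. Up-sampling followed by a convolution covers every interior position evenly, as does a transposed convolution whose kernel size is divisible by its stride.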

Here is the link to my Git repo for the Keras code implementation of this architecture. https://github.com/Nishanksingla/UNet-with-ResBlock/blob/master/resnet34_unet_model.py

Reference: https://course.fast.ai/videos/?lesson=7
