Semantic Segmentation in Computer Vision

Rina Mondal
2 min readJan 2, 2024

--

Semantic segmentation is a task of computer vision which focuses on understanding the pixel-level semantics of the entire scene. The goal is to partition an image into coherent segments and assign a semantic label to each segment.

It can be implemented by many approaches:

a. Sliding window- We take an image as an input then we divide it into many tiny crops and then we apply classification algorithms on those crops rather than applying on the image. In other words, sliding window approach involves systematically moving a small window across an input image, applying a semantic segmentation model to each window, and assembling the predictions to form the final segmentation map. As it is computationally too expensive it is considered as a terrible idea.

b. Fully convolutional- We take an image as input and apply stack of fully convolutional networks (FCN), then the final convolutional output generates a tensor providing classification score for every pixel of the image. We could compute this all at once. FCN helps in maintaining the spatial information throughout the network which makes it computationally too expensive and will take a lot of memory.

c. Design a network which does Downsampling and Upsampling: Instead of Fully conventional, a different method can be applied where the network contains a bunch of convolutional layers, with downsampling and upsampling inside the network. So, rather than doing all the convolutions of the full spatial resolution of the image, we will may be go through a small number of convolutional layers at the original resolution then downsample that feature map using something like max pooling or strided convolution we have convolutions in downsampling that look much like a lot of the classification networks, then we increase the spatial resolution of our predictions during the second half of the network by upsampling which has become much more computationally efficient.

Usage: This technique is crucial for various applications, including autonomous driving, medical image analysis, and scene understanding in robotics and augmented reality.

--

--

Rina Mondal

I have an 8 years of experience and I always enjoyed writing articles. If you appreciate my hard work, please follow me, then only I can continue my passion.