DISPARTING

IMPROVING SEMANTIC IMAGE SEGMENTATION USING CLUSTERING

CYBORG NITR
May 4, 2020

In the field of deep learning, it is commonly observed that deep convolutional neural network (DCNN) based semantic segmentation struggles to estimate correct object boundaries. We think the reason lies in assigning labels to pixels on the object boundaries, because the cascaded feature maps generated by DCNNs blur them. To segment foreground objects from the background, the DCNN must classify these boundary pixels precisely.

This problem is a major hindrance to accuracy in semantic segmentation, and it also leads to very high losses in ventures involving satellite imaging. Because satellite images contain very fine edges, missing them leads to unbearable expenses and losses. Hence this becomes a major problem to be dealt with.

We address this problem through the use of clustering methods. Clustering helps in fine edge detection and, together with DeepLabV3, proves to be an indispensable tool for the solution.

After clustering the images, the segmentation produced by the neural network (here DeepLabV3) is enhanced by our merging algorithm, known as Disparting.

SEMANTIC IMAGE SEGMENTATION

Semantic image segmentation, also called pixel-level classification, is the task of clustering together the parts of an image that belong to the same object class. In semantic image segmentation, we assign or link each pixel to a class label. Pixels belonging to the same class take the same color value, resulting in a masked image that gives finer detail of the objects in the image.

Semantic Image Segmentation
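
As a quick toy illustration (the label map and colour palette below are made up, not taken from any dataset), turning per-pixel class labels into a coloured mask can be as simple as indexing a palette:

import numpy as np

# Hypothetical 4x4 label map with three classes (0 = background)
labels = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1],
                   [2, 2, 1, 0],
                   [2, 2, 0, 0]])

# One RGB colour per class; every pixel of a class gets the same colour
palette = np.array([[0, 0, 0],      # class 0: black
                    [255, 0, 0],    # class 1: red
                    [0, 255, 0]],   # class 2: green
                   dtype=np.uint8)

mask = palette[labels]              # shape (4, 4, 3): the coloured mask image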

HOW DEEP LEARNING DOES THE SEGMENTATION

Neural network models vary in implementation but share the same set of structures that facilitate feature extraction and produce a masked output depicting the different classes.

The basic structure of most neural network models includes the following (a minimal sketch appears after the figure below):

  • The Encoders: sequences of deep convolution layers that extract features from the input image with progressively narrower and deeper filters.
  • The Decoders: sequences of layers that transform the encoder output into masks with the same pixel resolution as the input image.
  • Skip Connections: connections across the encoder-decoder layers that improve model accuracy by carrying features that help reconstruct the mask image more faithfully.
Model including feature extraction and upsampling for Image Segmentation
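
For intuition, here is a deliberately tiny, hypothetical PyTorch model (not DeepLabV3, and not the model used in this work) showing the three pieces wired together:

import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Minimal encoder-decoder with one skip connection (illustrative only)."""
    def __init__(self, num_classes=21):
        super().__init__()
        # Encoder: progressively narrower, deeper feature maps
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        # Decoder: upsample back to the input resolution
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Conv2d(32, num_classes, 3, padding=1)  # 32 = 16 (up) + 16 (skip)

    def forward(self, x):
        f1 = self.enc1(x)             # full-resolution features kept for the skip
        f2 = self.enc2(f1)            # deeper, half-resolution features
        up = self.up(f2)              # decoder upsamples back
        cat = torch.cat([up, f1], 1)  # skip connection: concatenate encoder features
        return self.dec(cat)          # per-pixel class scores, same H and W as input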

OUR OVERALL PIPELINE

WHY DEEPLABV3

We used DeepLabV3 with a ResNet-101 backbone, by Google, because it segments images better than other models and has better IoU (Intersection over Union) performance. DeepLabV3 models are also found to have lower computational complexity while maintaining similar or better performance, because of atrous depthwise convolution.

DeepLabV3 architecture
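
For reference, the pretrained model is available directly from torchvision; a minimal loading sketch (the random tensor below is only a stand-in for a normalized image batch, and the pretrained flag shown here is the torchvision API of the time, with newer versions using weights= instead):

import torch
import torchvision

# Pretrained DeepLabV3 with a ResNet-101 backbone
model = torchvision.models.segmentation.deeplabv3_resnet101(pretrained=True).eval()

with torch.no_grad():
    x = torch.rand(1, 3, 512, 512)       # stand-in for a normalized image batch
    scores = model(x)["out"]             # (1, 21, 512, 512) per-class scores
    labels = scores.argmax(dim=1)[0]     # (512, 512) per-pixel class labels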

LIMITATIONS OF DEEPLABV3

There are some limitations of DeepLabV3 that our algorithm tries to improve upon. Semantic image segmentation algorithms in general share a major drawback in their results: the edges of objects are not properly segmented, resulting in a lower IoU.

The main objective of our algorithm is to improve edge detection.

OUR APPROACH OF CLUSTERING FOR IMPROVING RESULTS

We address the segmentation problem by using different clustering methods, which generate better segments of the input images, especially around the edges, which tend to either get left out or overshoot when segmented using neural networks. This, in turn, increases the IoU value between the predicted result and the ground truth, so a more accurate and precise result is obtained.
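
For clarity, the IoU of a single class is the overlap between the predicted and ground-truth pixels of that class divided by their union; a small NumPy sketch (the function name is ours, purely for illustration):

import numpy as np

def class_iou(pred, gt, cls):
    # IoU for one class: |pred ∩ gt| / |pred ∪ gt| over that class's pixels
    p, g = (pred == cls), (gt == cls)
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else float("nan")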

The need for clustering arises from the fact that the output generated by DeepLabV3, or any other neural network, in many cases tends to be inaccurate, and clustering together with our “disparting” method can solve this problem. Clustering is a technique for grouping data points (in this case, pixels) with similar features into different clusters. The curve corresponding to the boundary of these clusters is then used to differentiate them. The boundary thus generated defines the borders accurately, and hence these boundaries prove vital in our segmentation process.

Clustering Images

Clustering methods like KNN use a very hard line to differentiate the clusters, so they do not take into account little dissimilarities during clustering, as there is no smoothing involved. Here a more advanced approach is used to cluster the image: kernel density estimation (KDE), a smoothing method. It works by placing a kernel (a weighting function useful for quantifying density) on each data point in the data set and then summing the kernels to generate a kernel density estimate for the overall region. Areas of greater point density sum out to greater kernel density, while areas of lower point density sum out to less kernel density.
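
As a small, self-contained illustration of the KDE idea (the random pixel values and the bandwidth below are placeholders, not the values used in our pipeline):

import numpy as np
from sklearn.neighbors import KernelDensity

# Stand-in "data points": 1000 random RGB pixel values in [0, 1]
pixels = np.random.rand(1000, 3)

# Place a Gaussian kernel on every point and sum them into a density estimate
kde = KernelDensity(kernel="gaussian", bandwidth=0.1).fit(pixels)
log_density = kde.score_samples(pixels)   # higher where points crowd together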

a) Felzenszwalb, b) SLIC, c) Quick-Shift, d) Compact Watershed

After this smoothing, additional clustering methods, namely Felzenszwalb, SLIC, and quick-shift, are used to improve our segmentation.

Another clustering method, compact watershed, is also used after the image is passed through a Sobel operator; a sketch of all four methods follows.
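
A sketch of all four over-segmentation calls using scikit-image (the file name and parameters are illustrative defaults, not necessarily our tuned values):

from skimage import io
from skimage.color import rgb2gray
from skimage.filters import sobel
from skimage.segmentation import felzenszwalb, slic, quickshift, watershed

img = io.imread("input.jpg")   # hypothetical RGB input image

seg_fz    = felzenszwalb(img, scale=100, sigma=0.5, min_size=50)
seg_slic  = slic(img, n_segments=250, compactness=10, sigma=1)
seg_quick = quickshift(img, kernel_size=3, max_dist=6, ratio=0.5)

# Compact watershed runs on the Sobel gradient of the grayscale image
gradient  = sobel(rgb2gray(img))
seg_ws    = watershed(gradient, markers=250, compactness=0.001)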

All these clustering methods prove useful in making the segmented images from the DeepLabV3 neural network more accurate.

As observed, the outputs of the DeepLabV3 neural network generally do not get the edges accurate, and this leads to inaccurate results in cases where precise detection is needed, especially near the boundary. At this juncture, the technique proves useful in removing this particular problem.

This clustering, along with “disparting”, generates well-segmented images, but for different types of images, different clustering methods with different kernel sizes and smoothing factors give the best results. In cases where the edges of the DeepLabV3 output are only slightly out of the correct shape, a large number of clusters with a smaller kernel size gives better results, whereas in images where there is a large difference between the ground truth and the DeepLabV3 output, larger clusters provide the best results.
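
To make that trade-off concrete, here is a hypothetical quick-shift example with a small versus a large kernel (the exact values are illustrative only):

from skimage import io
from skimage.segmentation import quickshift

img = io.imread("input.jpg")   # hypothetical input, as above

# Edges only slightly off: many small clusters (small kernel)
fine = quickshift(img, kernel_size=3, max_dist=6)

# Prediction far from the ground truth: fewer, larger clusters (large kernel)
coarse = quickshift(img, kernel_size=9, max_dist=20)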

PROBLEMS IN CLUSTERING ALGORITHMS

There are nevertheless some limitations to effective clustering:

1. The whole clustering requires pixel-wise traversal of the entire image, so it has a time complexity of O(N²) for an image of dimension N×N×C.

2. It works very well for some types of images with a single set of parameters, but for other types of images the parameters need to be tuned accordingly to obtain the best results.

3. For images where there is an overlap of different classes in multi-class segmentation, this clustering method doesn't seem to be that helpful.

4. This method is not a substitute for neural network segmentation; if the neural network predicts a totally wrong segmentation for a particular image then, in that case, the clustering method cannot fix it completely.

HOW WE IMPLEMENT OUR DISPARTING ALGORITHM

Here we use a class voting method named disparting, which merges the DeepLabV3 segmentation output with the output of the clustering.

After segmented outputs are generated by both methods, our algorithm, named Disparting, is used to merge them and produce a final segmentation with a better IoU than the DeepLab output alone. Disparting performs another pixel-wise traversal and comparison of the DeepLabV3 and clustered outputs, producing a final merged image with better segmentation.

The algorithm is as follows, where clustered_output(H, W) holds a cluster index for every pixel, deeplab_output(H, W) holds the DeepLabV3 class label for every pixel, num_clusters is the number of cluster centers, and num_classes is the number of DeepLabV3 segmentation classes:

import numpy as np

def disparting(clustered_output, deeplab_output, num_clusters, num_classes):
    H, W = deeplab_output.shape

    # clusterStat(a, b): number of pixels in cluster a that DeepLabV3 labels as class b
    cluster_stat = np.zeros((num_clusters, num_classes), dtype=np.int64)
    for i in range(H):
        for j in range(W):
            cluster_stat[clustered_output[i, j], deeplab_output[i, j]] += 1

    # clusterSelect(a): the majority DeepLabV3 class inside cluster a
    cluster_select = cluster_stat.argmax(axis=1)

    # Relabel every pixel with the majority class of the cluster it belongs to
    final_seg = np.zeros((H, W), dtype=deeplab_output.dtype)
    for i in range(H):
        for j in range(W):
            final_seg[i, j] = cluster_select[clustered_output[i, j]]

    return final_seg
Input Image compared with Ground Truth, DeepLab result and Disparting result
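
Putting the pieces together, a minimal end-to-end sketch under the same assumptions as the snippets above (hypothetical file name, illustrative parameters, and the disparting function defined earlier):

import torch
import torchvision
from torchvision import transforms
from skimage import io
from skimage.segmentation import quickshift

img = io.imread("input.jpg")   # hypothetical input path

# 1. DeepLabV3 per-pixel class labels
model = torchvision.models.segmentation.deeplabv3_resnet101(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
with torch.no_grad():
    deeplab_output = model(preprocess(img).unsqueeze(0))["out"].argmax(1)[0].numpy()

# 2. Cluster map from quick-shift (parameters illustrative)
clusters = quickshift(img, kernel_size=3, max_dist=6, ratio=0.5)

# 3. Disparting merge of the cluster map and the DeepLabV3 labels
final_seg = disparting(clusters, deeplab_output,
                       num_clusters=int(clusters.max()) + 1,
                       num_classes=21)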

CONCLUSION

Our Disparting method is very useful in improving the edges and making the output resemble the ground truth more closely. Different clustering algorithms have different impacts on different images, so we have to tune for every image. In most cases, quick-shift stood out as giving better results. Therefore, applying this clustering method to semantic segmentation will surely generate a segmented mask with better edges.

Post-processing the segmented output of the DeepLabV3 model with the clustering method to make the edges more distinct makes the approach a bit more CPU intensive, but the extra complexity is worth it because of the increased similarity of the final segmented image to the expected output.

IoU values using different clustering algorithms

Even though the proposed method works in most cases, it may fail to distinguish the boundaries of an object from the surrounding background in a dark environment.

GitHub link: Find the source code here


CYBORG NITR

Cyborg, the robotics and automation club of National Institute of Technology, Rourkela, where we design intelligence and redefine technology.