Segmentation in Computer Vision: Unraveling the Enigma of Image Processing

Feb 28, 2024

Why is segmentation important in computer vision? What are the different methods and techniques used in segmentation? This article explains the concepts of segmentation in computer vision.

Introduction

In computer vision, the three main tasks are:

Image Classification
Object Detection
Segmentation

This article focuses on the last task, segmentation. Segmentation is the process of dividing an image into different segments by assigning each pixel to a class. There are different methods and techniques used in segmentation.

Is Segmentation a classification problem, clustering problem or a regression problem?

To answer this adequately, we need to understand the main premise of the question.

Classification

Classification is the process of categorizing data into different classes. In segmentation, each pixel is assigned a class, which is the basic principle of classification.

Clustering

Clustering is the process of grouping data into clusters based on similarity. It is not the same as classification, however: the data is not labeled, and the algorithm decides the classes.

Regression

Regression is the process of predicting a continuous value. One might think that pixel-wise classification involves no continuous values, but the moment we represent each pixel by a continuous value for its color, the problem becomes continuous.

Based on the above, we can conclude that segmentation has elements of all three. It is a classification problem because each pixel is assigned a class, it has a regression aspect because each pixel is represented by a continuous value, and by the end of the process we are clustering the pixels into a class based on their similarity in terms of position.

Evaluation Metric for Segmentation

Disclaimer: There is no single evaluation metric that works for all segmentation problems. The choice of metric depends on the problem and the data. However, some evaluation metrics are commonly used in segmentation.

DICE Coefficient

The most widely used evaluation metric in segmentation is the Dice coefficient. It quantifies the similarity between two sets.

The Dice coefficient measures the similarity between two sets as twice the size of their intersection divided by the sum of their sizes. The formula is given by:

$$\text{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$

Where:

A is the predicted set
B is the true set
|A ∩ B| is the size of the intersection of the predicted and true set
|A| is the size (area in pixels) of the predicted set
|B| is the size (area in pixels) of the true set
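
As a minimal sketch of the formula above, the Dice coefficient can be computed for two binary masks stored as flat lists of 0/1 values. The function name `dice_coefficient` and the convention that two empty masks score 1.0 are assumptions made for this illustration, not something the formula prescribes.

```python
def dice_coefficient(pred, truth):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # |A ∩ B|: pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # |A| + |B|: foreground pixels in each mask, counted separately.
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks match perfectly
    return 2.0 * intersection / size_sum


pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
print(dice_coefficient(pred, truth))
```

Here the masks share 2 foreground pixels and each contains 3, so the score is 2·2 / (3 + 3) = 2/3.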

Jaccard Index

The Jaccard Index is another common evaluation metric for the similarity between two sets. It is also known as Intersection over Union (IoU), a name that describes the formula directly:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$

Where:

A is the predicted set
B is the true set
|A ∩ B| is the size of the intersection of the predicted and true set
|A| is the size (area in pixels) of the predicted set
|B| is the size (area in pixels) of the true set
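
The following sketch computes the Jaccard Index (IoU) for two binary masks stored as flat lists of 0/1 values. The name `jaccard_index` and the choice to return 1.0 when both masks are empty are assumptions for this example.

```python
def jaccard_index(pred, truth):
    """Intersection over Union between two binary masks (flat sequences of 0/1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # |A ∩ B|: foreground in both masks; |A ∪ B|: foreground in at least one.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    if union == 0:
        return 1.0  # assumed convention: two empty masks match perfectly
    return intersection / union


pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
print(jaccard_index(pred, truth))
```

The masks share 2 pixels and cover 4 pixels in total, so the IoU is 0.5. The two metrics are related by Dice = 2J / (1 + J), which gives 2/3 for this pair, so the Jaccard Index is never larger than the Dice coefficient.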

Pixel-wise Segmentation

Pixel-wise segmentation rests on the fact that each pixel is assigned a class. The number of classes depends on the problem and on how granular the segmentation should be: the more classes, the more granular the segmentation and the more detail the model can detect in the image. However, more classes also make the model more complex and require more data to train it properly. Categorical labels range from 0 to N, where N ∈ ℕ.

The simplest example of pixel-wise segmentation is a binary mask, where the number of classes is 2. The classes are usually the object and the background. In medical imaging, the classes can be a tumor and the background, or a specific organ such as the lungs and the background. With pixel-wise segmentation, the model can detect the exact location of the object in the image.
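
To make the idea of categorical labels concrete, here is a small sketch of a label map with classes 0 to 2 and a helper that turns it into a binary mask for one class. The label values and the name `binary_mask` are made up for this illustration.

```python
# A tiny label map: every pixel holds the ID of its class.
# Assumed meaning for this example: 0 = background, 1 = sky, 2 = object.
label_map = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 0],
]


def binary_mask(label_map, class_id):
    """Return a mask with 1 where the pixel belongs to `class_id`, else 0."""
    return [[1 if label == class_id else 0 for label in row] for row in label_map]


object_mask = binary_mask(label_map, 2)
for row in object_mask:
    print(row)
```

The resulting mask marks the 2x2 block of object pixels and nothing else, which is exactly the binary object-versus-background case.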

Random Taxonomy

A taxonomy divides and categorizes a topic into sub-topics and groups. One rough way to group segmentation methods is:

Histogram-based thresholding
Edge-based segmentation: Filters, Contours
Region-based segmentation: KNN, GMM, DBSCAN
Combination of the above methods: Edge-based + Region-based (Canny Edge Detection)

Segmentation via Classification

Segmentation via classification utilizes models like:

KNN: K-nearest neighbors
SVM: Support Vector Machine
NN: Neural Networks

The features we use in the classification model are:

Voxel values
Voxel position
Gradient magnitude
Neighboring voxel values

Labels are a very important part of classification. Based on the number of labels, we can distinguish two categories: binary (2 classes) and multi-class (more than 2 classes). An example of binary classification is separating the object from the background. An example of multi-class classification is segmenting the object, background, sky, tree, car, road, building, etc. The major issue with segmentation via classification is that it requires labeled data for training. The labeling process is time-consuming and requires a lot of resources.
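
The features listed above can be collected into one vector per pixel before it is handed to a classifier. The sketch below does this for a 2D grayscale image stored as a list of rows. The function name `pixel_features`, the use of central differences for the gradient, the 4-neighborhood, and the border handling by clamping are all assumptions chosen to keep the example short.

```python
import math


def pixel_features(image, x, y):
    """Feature vector for pixel (x, y): value, position, gradient magnitude, neighbors."""
    height, width = len(image), len(image[0])

    def at(px, py):
        # Clamp coordinates so border pixels reuse their nearest valid neighbor.
        px = min(max(px, 0), width - 1)
        py = min(max(py, 0), height - 1)
        return image[py][px]

    # Central differences approximate the intensity gradient in x and y.
    gx = (at(x + 1, y) - at(x - 1, y)) / 2.0
    gy = (at(x, y + 1) - at(x, y - 1)) / 2.0
    neighbors = [at(x - 1, y), at(x + 1, y), at(x, y - 1), at(x, y + 1)]
    return [at(x, y), x, y, math.hypot(gx, gy)] + neighbors


image = [
    [10, 10, 10],
    [10, 50, 90],
    [10, 90, 90],
]
features = pixel_features(image, 1, 1)
print(features)
```

Each labeled pixel then contributes one such vector and one class label to the training set of a model like KNN or an SVM.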

Segmentation via Clustering

Segmentation via clustering utilizes models like:

K-means
GMM: Gaussian Mixture Model

The workflow, described here for K-means, is to pick the number of clusters K, initialize K centroids, and assign each data point to the nearest centroid. Then recalculate the centroids and repeat the process until the centroids no longer change.
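
The loop described above can be sketched for the simplest case, clustering grayscale intensities in one dimension. The function name `kmeans_1d`, the explicit initial centroids, and the `max_iter` safety limit are assumptions for this example; a real pipeline would typically cluster richer feature vectors.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """K-means on scalar values; `centroids` holds the K initial guesses."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [v for v, label in zip(values, labels) if label == k]
            new_centroids.append(sum(members) / len(members) if members else centroids[k])
        if new_centroids == centroids:
            break  # centroids stopped changing, so the clustering has converged
        centroids = new_centroids
    return labels, centroids


pixels = [12, 15, 10, 200, 210, 205, 14, 198]
labels, centroids = kmeans_1d(pixels, centroids=[0.0, 255.0])
print(labels, centroids)
```

With K = 2 the dark pixels end up in cluster 0 and the bright pixels in cluster 1, with centroids at 12.75 and 203.25. No labels were needed, which is the main difference from segmentation via classification.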

Thresholding

Thresholding answers the question: Is this a background or a foreground? The thresholding method is the simplest method of segmentation.

The concept of thresholding is to convert the grayscale image to a binary image.

The following image shows a histogram of an image. The X-axis represents the pixel values and the Y-axis represents how often each value occurs. The histogram shows peaks of pixel values, and the valley between the peaks is the threshold value. This separation lets us split the pixels cleanly into two classes and thus obtain a clear segmentation.

How to choose the threshold value?

The best-known method is Otsu’s method, which searches for the threshold `t` that minimizes the within-class variance, that is, the variances of the foreground and background pixel values weighted by their class probabilities. The class probability of the foreground (or background) is the fraction of pixels that fall on that side of the threshold.

Histogram of color distribution with a red line created by Otsu’s method

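The formula for Otsu’s method is given by:

$$\sigma_w^2(t) = \omega_0(t)\,\sigma_0^2(t) + \omega_1(t)\,\sigma_1^2(t)$$

Here ω_0(t) and ω_1(t) are the class probabilities of the background and foreground for threshold t, and σ_0²(t) and σ_1²(t) are the variances of the two classes. Otsu’s method picks the t that minimizes σ_w²(t).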

Otsu’s method is a great method for thresholding and works especially well for bimodal histograms.
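
A direct, unoptimized sketch of Otsu's method tries every threshold and keeps the one with the smallest weighted within-class variance. The function name `otsu_threshold`, the assumption of integer pixel values in the range 0 to 255, and the convention that values less than or equal to t count as background are choices made for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t minimizing the weighted within-class variance.

    Assumes integer pixel values in [0, levels). Values <= t form the
    background class, values > t the foreground class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)

    best_t, best_var = 0, float("inf")
    for t in range(levels - 1):
        low, high = hist[: t + 1], hist[t + 1:]
        n0, n1 = sum(low), sum(high)
        if n0 == 0 or n1 == 0:
            continue  # one class is empty, so this split is not meaningful
        mean0 = sum(i * c for i, c in enumerate(low)) / n0
        mean1 = sum((i + t + 1) * c for i, c in enumerate(high)) / n1
        var0 = sum(c * (i - mean0) ** 2 for i, c in enumerate(low)) / n0
        var1 = sum(c * (i + t + 1 - mean1) ** 2 for i, c in enumerate(high)) / n1
        # Weight each class variance by its class probability.
        within = (n0 / total) * var0 + (n1 / total) * var1
        if within < best_var:
            best_t, best_var = t, within
    return best_t


pixels = [10, 12, 11, 13, 200, 202, 201, 199]
t = otsu_threshold(pixels)
mask = [1 if p > t else 0 for p in pixels]
print(t, mask)
```

For this clearly bimodal input, the chosen threshold falls between the dark group and the bright group, so the mask separates the two cleanly. Production implementations usually compute the same result with running sums instead of recomputing the statistics for every t.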

Connectivity

Connectivity describes how pixels are connected to each other.

2D connectivity is defined as either 4-connectivity or 8-connectivity, meaning the central pixel (x, y) is connected to its 4 or 8 neighboring pixels.

4-connectivity

(x-1, y)
(x+1, y)
(x, y-1)
(x, y+1)

8-connectivity

(x-1, y-1)
(x-1, y)
(x-1, y+1)
(x, y-1)
(x, y+1)
(x+1, y-1)
(x+1, y)
(x+1, y+1)

3D connectivity extends 2D connectivity by adding the z-axis, which allows up to 26 neighboring voxels.

The options are:

6-connectivity
18-connectivity
26-connectivity
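
All of these neighborhoods can be generated from one rule: take every offset with components in {-1, 0, 1}, drop the all-zero offset, and limit how many coordinates may change at once. The sketch below does this; the function name `neighbor_offsets` and its parameters are hypothetical.

```python
from itertools import product


def neighbor_offsets(ndim, connectivity):
    """Offsets of the neighbors of a pixel (ndim=2) or voxel (ndim=3).

    Supported values: 4 or 8 in 2D; 6, 18 or 26 in 3D.
    """
    # Maximum number of coordinates allowed to change at once for each scheme.
    max_changed = {(2, 4): 1, (2, 8): 2, (3, 6): 1, (3, 18): 2, (3, 26): 3}
    limit = max_changed[(ndim, connectivity)]
    return [
        offset
        for offset in product((-1, 0, 1), repeat=ndim)
        if 0 < sum(abs(c) for c in offset) <= limit
    ]


for ndim, connectivity in [(2, 4), (2, 8), (3, 6), (3, 18), (3, 26)]:
    print(ndim, connectivity, len(neighbor_offsets(ndim, connectivity)))
```

In 3D, 6-connectivity keeps the voxels that share a face, 18-connectivity adds those that share an edge, and 26-connectivity adds those that share only a corner.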

Region Growing

Region growing is a segmentation method based on the similarity of neighboring pixels. It tries to find contiguous regions.

The algorithm goes as follows:

Select “seed” pixels according to some criteria
Add them to the region and push them to the back of the queue
While the queue is not empty:
    For each neighbor of the pixel at the front of the queue:
        If the neighbor meets the criteria and isn’t in the region:
            Add it to the region and push it to the back of the queue
    Pop the head of the queue

Notes:
Criteria can be anything (e.g., a global threshold)
If you are familiar with graph theory: this is a breadth-first search
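
The steps above can be sketched in Python as a breadth-first search over a 2D image stored as a list of rows. The function name `region_grow`, the single seed, the 4-connectivity, and the criterion (a pixel joins if its value is within `tolerance` of the seed value) are assumptions for this example; any other criterion could be substituted.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Grow a 4-connected region from `seed` = (x, y) by breadth-first search."""
    height, width = len(image), len(image[0])
    seed_value = image[seed[1]][seed[0]]
    region = {seed}
    queue = deque([seed])
    while queue:
        # Take the pixel at the front of the queue and inspect its neighbors.
        x, y = queue.popleft()
        for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nx, ny = x + dx, y + dy
            if not (0 <= nx < width and 0 <= ny < height):
                continue  # neighbor lies outside the image
            if (nx, ny) in region:
                continue  # already part of the region
            # Assumed criterion: intensity close enough to the seed value.
            if abs(image[ny][nx] - seed_value) <= tolerance:
                region.add((nx, ny))
                queue.append((nx, ny))
    return region


image = [
    [10, 11, 90, 12],
    [12, 10, 95, 11],
    [11, 13, 92, 10],
]
region = region_grow(image, seed=(0, 0), tolerance=5)
print(sorted(region))
```

The region covers the two left columns only. The dark pixels in the right column have similar values but are cut off by the bright column, which shows that region growing depends on connectivity as well as on similarity.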

Local Adaptive Thresholding

The basic concept of local adaptive thresholding is to calculate thresholds separately for different regions of the image. This gives a more granular segmentation and can reveal patterns that would not be visible with a single global threshold.
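
One common variant compares each pixel with the mean of its local neighborhood. The sketch below implements that variant with plain Python lists; the function name `adaptive_threshold` and the parameters `radius` (half the window size) and `margin` (how far above the local mean a pixel must be) are hypothetical names for this illustration.

```python
def adaptive_threshold(image, radius=1, margin=0):
    """Mark a pixel as foreground if it exceeds its local mean by more than `margin`."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the window around (x, y), cropped at the image border.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_threshold = sum(window) / len(window) + margin
            row.append(1 if image[y][x] > local_threshold else 0)
        result.append(row)
    return result


# A brightness ramp (10, 20, 30, ...) with two bright spots at x = 2 and x = 6.
image = [[10, 20, 60, 40, 50, 60, 110, 80, 90]]
print(adaptive_threshold(image, radius=1, margin=5))
```

Only the two bright spots are marked as foreground. A single global threshold cannot achieve this here: any value low enough to catch the spot at x = 2 (value 60) would also mark the ordinary ramp pixels with values 80 and 90.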

Conclusion

As we can see there are many different segmentation methods and techniques. The main premise of a successful workflow is understanding your domain and mastering the knowledge of the data. The choice of the segmentation method depends on the problem and the data. There is no one-size-fits-all solution.

With a clear understanding of the data and the problem, we can choose the most suitable segmentation method and achieve the best results. The future of segmentation is bright, and we can expect many new methods and techniques to appear soon. This is a very exciting time for the field of computer vision and image processing.

Finally, I would like to thank Mr. Liad Magen for his guidance and for providing materials on the topics discussed. His expertise and support have been instrumental in enriching the content and insights presented.