[CV] 7. Segmentation as clustering (K-Means, Mixture of Gaussians, Mean-Shift)

jun94 · Published in jun-devpBlog · Nov 25, 2020 · 9 min read

1. What is Image Segmentation?

Image segmentation identifies groups of pixels that belong together, separating an image into coherent objects. In other words, it finds the boundaries of the different objects within an image. It may be easier to think of image segmentation as a pixel-level classification task, where we must figure out which class each pixel belongs to.

Figure 1. original image (left) and after segmentation (right), from [1, 2, 3]

The goal of segmentation is to simplify the representation of an image into something that is more meaningful and easier to analyze.

2. Segmentation as Clustering

Image segmentation can be done with various approaches, e.g. clustering or energy minimization. In this article, we focus on clustering methods for solving image segmentation tasks. Let's take a look at the toy example below, which shows how image segmentation can be performed by clustering.

Figure 2. sample image with three colors (left) and its corresponding pixel intensity map (right), from [1, 2]

Looking at the pixel intensities, one notices that the intensity map (right panel of Fig 2) can be divided into three groups. Using this information, we can assign a class label (1, 2, or 3) to each pixel according to its intensity. For example, if pixel intensity = ‘0’ -> assign class label ‘1’; if pixel intensity = ‘190’ -> assign class ‘2’ (an injective mapping). By taking the pixel intensity as a feature, we can complete the segmentation of the image in Fig 2.
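For illustration, here is a minimal Python sketch of this injective lookup (the toy image values below are assumed for illustration, not taken from the figure):

```python
import numpy as np

# Toy image with exactly three intensities, one per class
# (values chosen to mirror the example: 0, 190, 255).
image = np.array([[0,   0,   190],
                  [0,   255, 190],
                  [255, 255, 190]], dtype=np.uint8)

# Injective mapping: each intensity corresponds to exactly one class label.
intensity_to_label = {0: 1, 190: 2, 255: 3}

labels = np.vectorize(intensity_to_label.get)(image)
print(labels)
# [[1 1 2]
#  [1 3 2]
#  [3 3 2]]
```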

In practice, however, images are rarely as simple as the one given in Fig 2.

Figure 3. sample image with more than three colors (left) and its corresponding pixel intensity map (right), from [1, 2]

Fig 3 shows a slightly harder example. While the image still contains three classes (circle, rectangle, background), the groups in the intensity map are no longer completely separated. We therefore cannot use the injective-mapping strategy from Fig 2 anymore, since the number of intensity values no longer matches the number of distinct classes.

How, then, should we determine the class-label assignment while still using pixel intensity as a feature, as we did in Fig 2?

The solution is clustering, since pixels belonging to the same object within an image tend to have similar intensities. Recall the grass in Fig 1: almost all of the pixels belonging to the grass have the same color (green). Therefore, by choosing the pixel intensities that best represent each class in the image, we can use these intensities (called ‘centers’ below) to perform the segmentation. Fig 4 illustrates how centers are used to assign class labels to pixels.

Figure 4. Upper panel: pixel intensities for pixels in the image. Lower panel: given image, from [1,2]

Let’s assume a successful clustering has already given us three centers with intensities 0, 190, and 255. In Fig 4, the chosen centers are colored red. We can then label every pixel according to which center is closest to its intensity. More specifically, for each pixel in the image, look up its intensity and find the nearest center; afterward, assign the class label of that center to the pixel. Repeating this for all pixels completes the image segmentation by clustering.
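A minimal sketch of this nearest-center assignment, using the centers and a few example intensities:

```python
import numpy as np

centers = np.array([0, 190, 255])           # cluster centers (intensities)
pixels = np.array([3, 180, 250, 100, 220])  # example pixel intensities

# Distance from every pixel to every center, shape (n_pixels, n_centers).
dist = np.abs(pixels[:, None] - centers[None, :])

# Each pixel receives the label of its nearest center.
labels = np.argmin(dist, axis=1)
print(labels)  # [0 1 2 1 1]
```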

2.1 Feature space

Before going into more detail on clustering, we need to know what a feature space is and how to define one. The feature space is where the clustering is actually performed: each pixel becomes a point whose coordinates are the chosen features representing that pixel. In the example of Fig 4, the chosen feature is the intensity value, so the feature space is one-dimensional and pixels are grouped by intensity similarity.

Figure 5. an example of a feature space: color value (3D), from [1, 2]

Another example is a feature space where the features are the three color values (RGB); the feature space is then three-dimensional, and pixels are grouped according to color similarity.

Figure 6. an example of a feature space: pixel intensity (1D) and position (2D), from [1, 2]

One can also include the pixel position (x, y) as a feature. Fig 6 illustrates a feature space built from pixel position and intensity. Note that the intensity (1D) is used here instead of the color values (RGB, 3D); if the intensity is replaced by the color values, the corresponding feature space becomes 5D (3D from color, 2D from position).
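The following sketch shows how such feature spaces can be assembled from an image (the random image here is just a stand-in for real data):

```python
import numpy as np

H, W = 64, 64
rgb = np.random.randint(0, 256, size=(H, W, 3))  # stand-in for a real image

# 3D feature space: one RGB triple per pixel -> shape (H*W, 3)
features_color = rgb.reshape(-1, 3).astype(float)

# 5D feature space: RGB plus (x, y) position -> shape (H*W, 5)
ys, xs = np.mgrid[0:H, 0:W]
features_color_pos = np.concatenate(
    [features_color, xs.reshape(-1, 1), ys.reshape(-1, 1)], axis=1)
```

In practice the position features are usually rescaled relative to the color features, since the clustering distance mixes both units.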

The rest of this article covers three conventional clustering algorithms for determining good centers; the overall flow of image segmentation by clustering remains the same.

3. K-Means Clustering

3.1 Basic idea of K-Means algorithm

K-Means starts with a random initialization of k cluster centers. Note that k, the number of cluster centers, is a parameter that must be defined in advance.

(1) Given the current cluster centers, assign to each point the ID of the cluster whose center has the minimum L2 distance to that point.

(2) Given the labeled points, update the k cluster centers by computing the average of the points belonging to each cluster.

Repeat these two steps until the cluster centers converge.

By doing so, the K-Means algorithm always outputs some solution. Note, however, that in most cases the retrieved solution is a ‘local minimum’ of the objective function, i.e. the sum of squared L2 distances between every point and its assigned cluster center, rather than its global minimum.
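A minimal NumPy sketch of these two alternating steps (a simplified implementation, not the exact code behind the figures):

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Plain K-Means; `points` has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick k distinct data points as centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Step (1): assign each point to the nearest center (L2 distance).
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        # Step (2): move each center to the mean of its assigned points.
        new_centers = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i)
            else centers[i]                  # keep empty clusters in place
            for i in range(k)])
        if np.allclose(new_centers, centers):  # centers converged
            break
        centers = new_centers
    return centers, labels
```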

3.2. To avoid arbitrarily bad local minima

Since the cluster centers are initialized completely at random, some initial centers may end up very close to each other. In this case, the K-Means algorithm does not work well and outputs an unacceptably bad local minimum. To prevent this, we can add one more step after the random initialization, as follows.

Advanced cluster center initialization method: K-Means++, from [9]

As a result, the further a point is from the existing cluster centers, the higher its chance of becoming the next cluster center. With this added step, the centers are more likely to be far away from each other, i.e. better distributed over the feature space than before. The updated algorithm, called K-Means++, produces bad local minima less often than plain K-Means.
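A sketch of the K-Means++ seeding step described above (plain K-Means then runs on the returned centers):

```python
import numpy as np

def kmeans_pp_init(points, k, seed=0):
    """K-Means++ seeding: points far from the already-chosen centers
    are more likely to become the next center."""
    rng = np.random.default_rng(seed)
    centers = [points[rng.integers(len(points))]]  # first center: uniform
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min(np.linalg.norm(
            points[:, None, :] - np.array(centers)[None, :, :],
            axis=2) ** 2, axis=1)
        # Sample the next center with probability proportional to d2.
        centers.append(points[rng.choice(len(points), p=d2 / d2.sum())])
    return np.array(centers)
```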

3.3 Pros and Cons

pros

  • Simple concept and fast to compute
  • It is guaranteed to converge to at least a local minimum

cons

  • We have to choose k (the number of cluster centers) manually
  • The algorithm is very sensitive to the initial positions of the cluster centers and to outliers
  • It only detects spherical clusters (because K-Means uses the squared L2 error)

4. Mixture of Gaussians (MoG)

To address the limitation that K-Means can only detect spherical clusters, one can apply a Mixture of Gaussians, which is a probabilistic approach. As its name implies, multiple Gaussians are fitted to the points in feature space so that they suitably represent the clusters. Since a D-dimensional Gaussian with D greater than 1 can take an elliptical shape through its covariance matrix, MoG can detect non-spherical clusters.

Figure 7. The mathematical definition of the objective function for MoG, from [1, 3]

The Gaussians are defined by parameters θ, and the goal is to determine the θ that maximizes the likelihood function. For this, the Expectation-Maximization (EM) algorithm is applied.
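Implementing EM by hand is beyond the scope of this article; as a sketch, scikit-learn's GaussianMixture (which runs EM internally in fit) can be used as follows, with placeholder data standing in for real image features:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder for real pixel features, e.g. RGB values of shape (n_pixels, 3).
X = np.random.rand(1000, 3)

# k = 3 Gaussians with full covariance matrices, so the fitted
# clusters may be elliptical rather than spherical.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X)   # hard cluster assignment per point
probs = gmm.predict_proba(X)  # soft (probabilistic) assignments
```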

4.1 Pros and Cons

pros

  • MoG is a probabilistic interpretation of the clustering task
  • MoG is a generative model, meaning it can generate novel data points
  • Unlike K-Means, MoG can also detect elliptical clusters

cons

  • Need to find an appropriate initialization of the Gaussian parameters θ
  • Need to know the number of Gaussians (k) in advance
  • Numerical problems might occur

5. Mean-Shift

Given the drawback of K-Means and MoG that they can only detect clusters of certain shapes (e.g. spherical, elliptical), one can use Mean-Shift clustering, which is (1) completely data-driven (independent of cluster shape) and (2) does not require the number of clusters k to be defined in advance; the algorithm finds a suitable value for k by itself.

5.1 Basic idea: Iterative mode search

The key component of Mean-Shift clustering is the mode-search algorithm. A mode is a local maximum of the density of a given distribution, according to [3]. To search for a mode, we calculate the center of gravity (the ‘mean’) of the points within a search window W. The center of the window W is then shifted to the computed mean.

The mode-search algorithm repeats these steps until the window W no longer shifts, i.e. until the current window center is the mean of all points within the window. In short, the flow of the algorithm is summarized as follows:

  • Iterative Mode Search

(1) Initialize a random seed point and the window W. In other words, randomly distribute windows over the feature space.

(2) Compute the center of gravity (mean) of the points within the window W.

(3) Shift the search window to the computed mean.

(4) Repeat from step 2 until convergence.
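A minimal sketch of this single-window mode search with a flat kernel (it assumes the window always contains at least one point, e.g. by starting at a data point):

```python
import numpy as np

def mode_search(points, start, radius, tol=1e-3, max_iters=100):
    """Shift a spherical window of the given radius until it settles
    on a mode (a local density peak)."""
    center = start.astype(float)
    for _ in range(max_iters):
        # (2) mean of all points inside the current window W
        in_window = np.linalg.norm(points - center, axis=1) < radius
        mean = points[in_window].mean(axis=0)
        # (4) stop once the window no longer moves
        if np.linalg.norm(mean - center) < tol:
            break
        center = mean  # (3) shift the window to the computed mean
    return center
```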

Fig 8 illustrates the described mode-search algorithm when there is only a single window in the feature space.

Fig 8. visual illustration of mode search algorithm, from Y. Ukrainitz & B. Sarel

As we can see in Fig 8, over the iterations the initialized window W moves towards the densest region within its reach, exactly as the definition of ‘mode search’ prescribes. The same behavior is shown in the following figure.

Fig 9. Results of the mode (peak) search algorithm for three different starting points, given two clusters in a 2D feature space

5.2 Clustering

Using the found means (also called modes, centers of gravity, or peaks) for clustering is rather simple. First, spread a number of windows over the feature space. Second, run the mode search until every window has converged. Lastly, points belonging to windows that converged to the same mode (peak) receive the same cluster ID. Let's look at the example below for a better understanding.

Figure 10. Illustration of clustering process of Mean Shift with multiple windows, from Y. Ukrainitz & B. Sarel

Note that the same cluster ID is given not only to the points inside the windows at their initial positions, but also to the points on the paths of the shifted windows during the iterations.
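Reusing the mode_search sketch from above, the full clustering step can be sketched like this (merging modes that lie within one radius of each other is a common heuristic, not the only possible choice):

```python
import numpy as np

def mean_shift(points, radius):
    """Run a mode search from every point, then merge windows that
    converged to (approximately) the same peak."""
    modes = np.array([mode_search(points, p, radius) for p in points])

    peaks, labels = [], -np.ones(len(points), dtype=int)
    for i, m in enumerate(modes):
        for j, p in enumerate(peaks):
            if np.linalg.norm(m - p) < radius:  # same peak -> same cluster
                labels[i] = j
                break
        else:                                   # a previously unseen peak
            peaks.append(m)
            labels[i] = len(peaks) - 1
    return np.array(peaks), labels
```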

Figure 11 below shows another example. Suppose our feature space looks like the left panel, and we perform Mean-Shift clustering after initializing the window positions.

Figure 11. Points in feature space (left) vs Mean-Shift result (right), from [1, 5]

The clustering result is shown in the right panel of Fig 11. Points of the same color were traversed by windows converging to the same peak; each color therefore indicates a unique cluster.

5.3 Window size

By now, one might have noticed that Mean-Shift clustering does not require us to define the number of clusters to be found: it merges windows and finds a suitable number of clusters by itself over the iterations. This is nice. However, there is still one parameter that significantly influences the clustering performance: the window size (also called the radius ‘r’, since our window is spherical). And of course, finding the optimal radius is not trivial.

Fig 12 shows Mean-Shift clustering results for different radii ‘r’. As we can see, the clustering quality varies considerably depending on the radius.

Therefore, to run Mean-Shift clustering successfully, an appropriate window size (r) must be found first.
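In practice one often starts from a data-driven heuristic rather than a manual search; scikit-learn, for example, ships an estimate_bandwidth helper (shown here on placeholder data):

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

X = np.random.rand(500, 2)  # placeholder feature space

# estimate_bandwidth derives a radius from nearest-neighbor distances;
# `quantile` trades off a few large clusters vs. many small ones.
bandwidth = estimate_bandwidth(X, quantile=0.2)

ms = MeanShift(bandwidth=bandwidth).fit(X)
print("number of clusters found:", len(ms.cluster_centers_))
```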

5.4 Pros and Cons

pros

  • Mean-Shift is a general, application-independent tool; it can be applied to a feature space with any type of features
  • Mean-Shift is completely model-free: we do not need to assume any prior shape (spherical, elliptical, etc.) for the clusters
  • We only need to define a single parameter: the window size
  • It finds an appropriate number of clusters by itself

cons

  • The output quality heavily depends on the window size
  • Searching for an appropriate window size is not trivial
  • Mean-Shift is relatively computation-heavy compared to the other approaches

6. Reference

[1] RWTH Aachen, Computer Vision group

[2] CS 376: Computer Vision, Prof. Kristen Grauman

[3] S. Seitz

[4] Forsyth & Ponce

[5] Prof. Svetlana Lazebnik

[6] Prof. M. Irani and R. Basri

[7] Prof. Li Fei-Fei

[8] David Lowe

[9] K-Means++

Any corrections, suggestions, and comments are welcome.
