K-Medoids Clustering: An Approach to Unsupervised Learning

Zubair Ahmed Saood Ahmed Ansari
4 min read · Oct 20, 2021


The k-medoids problem is a clustering problem similar to k-means. The name was coined by Leonard Kaufman and Peter J. Rousseeuw with their PAM algorithm. Both the k-means and k-medoids algorithms are partitional (breaking the dataset up into groups) and attempt to minimize the distance between points labeled to be in a cluster and a point designated as the center of that cluster. In contrast to the k-means algorithm, k-medoids chooses actual data points as centers (medoids or exemplars), and thereby allows for greater interpretability of the cluster centers than in k-means, where the center of a cluster is not necessarily one of the input data points (it is the average of the points in the cluster). Furthermore, k-medoids can be used with arbitrary dissimilarity measures, whereas k-means generally requires Euclidean distance for efficient solutions. Because k-medoids minimizes a sum of pairwise dissimilarities instead of a sum of squared Euclidean distances, it is more robust to noise and outliers than k-means.

k-medoids is a classical partitioning technique of clustering that splits the data set of n objects into k clusters, where the number k of clusters is assumed to be known a priori (which implies that the programmer must specify k before the execution of a k-medoids algorithm). The “goodness” of a given value of k can be assessed with methods such as the silhouette method.
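As an illustration of assessing k with silhouettes, here is a minimal sketch; it assumes the scikit-learn-extra package (which provides a KMedoids estimator) and scikit-learn are installed, and it uses synthetic toy data rather than the example worked through later in this post.

```python
# Minimal sketch (assumes scikit-learn and scikit-learn-extra are installed):
# fit k-medoids for several values of k and compare silhouette scores.
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn_extra.cluster import KMedoids

# Synthetic toy data, generated purely for illustration.
X, _ = make_blobs(n_samples=90, centers=3, random_state=0)

for k in (2, 3, 4, 5):
    labels = KMedoids(n_clusters=k, random_state=0).fit_predict(X)
    print(f"k={k}  silhouette={silhouette_score(X, labels):.3f}")
```

Higher silhouette scores indicate better-separated, more compact clusters, so the k with the largest score is a reasonable choice.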

The medoid of a cluster is defined as the object in the cluster whose average dissimilarity to all the objects in the cluster is minimal, that is, it is a most centrally located point in the cluster.
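To make that definition concrete, here is a small library-free sketch that picks a cluster's medoid as the member with the smallest total (equivalently, average) dissimilarity to the other members; Manhattan distance is assumed here purely for illustration.

```python
# Small sketch: the medoid is the cluster member with the smallest total
# dissimilarity to all other members (Manhattan distance assumed here).
def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def medoid(cluster):
    return min(cluster, key=lambda c: sum(manhattan(c, p) for p in cluster))

# Illustrative points, not taken from the worked example below.
print(medoid([(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]))  # -> (2, 0)
```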

Algorithm:

  1. Initialize: select k random points out of the n data points as the medoids.
  2. Associate each data point with the closest medoid, using any common distance metric.
  3. While the cost decreases, for each medoid m and for each non-medoid data point o:
     1. Swap m and o, associate each data point with the closest medoid, and recompute the cost.
     2. If the total cost is more than in the previous step, undo the swap.
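The following is a minimal, self-contained Python sketch of the procedure above; it is not a reference PAM implementation, and it assumes Manhattan distance as the dissimilarity, though any metric could be plugged in.

```python
import random

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def total_cost(points, medoids, dist=manhattan):
    # Sum of each point's distance to its closest medoid.
    return sum(min(dist(p, m) for m in medoids) for p in points)

def k_medoids(points, k, dist=manhattan, seed=0):
    random.seed(seed)
    medoids = random.sample(points, k)            # step 1: random initial medoids
    cost = total_cost(points, medoids, dist)
    improved = True
    while improved:                               # step 3: loop while the cost decreases
        improved = False
        for i in range(k):
            for o in points:
                if o in medoids:                  # only non-medoid points are candidates
                    continue
                candidate = medoids[:i] + [o] + medoids[i + 1:]   # swap medoid i with o
                new_cost = total_cost(points, candidate, dist)
                if new_cost < cost:               # keep the swap only if the cost drops,
                    medoids, cost = candidate, new_cost           # otherwise "undo" it
                    improved = True
    # step 2: final assignment of each point to its closest medoid
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: dist(p, m))].append(p)
    return medoids, clusters, cost

# Illustrative usage on made-up points (not the dataset used in the example below):
pts = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
print(k_medoids(pts, k=2))
```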

Let’s consider the following example:

If a graph is drawn using the above data points, we obtain the following:

Step 1:
Let k = 2, and let the two randomly selected medoids be C1 = (4, 5) and C2 = (8, 5).

Step 2: Calculating cost.
The dissimilarity of each non-medoid point with the medoids is calculated and tabulated:

Each point is assigned to the cluster of the medoid to which its dissimilarity is smallest.
The points 1, 2, 5 go to cluster C1 and 0, 3, 6, 7, 8 go to cluster C2.
The Cost = (3 + 4 + 4) + (3 + 1 + 1 + 2 + 2) = 20
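For concreteness, the Step 2 tabulation can be reproduced with a few lines like the following; the Manhattan (L1) distance is assumed as the dissimilarity, and the non-medoid coordinates are placeholders, since the example's data table appears only as an image in the original post.

```python
# Sketch of the Step 2 tabulation, assuming Manhattan distance. C1 and C2 are
# the medoids from the text; the non-medoid points below are placeholders only.
def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

C1, C2 = (4, 5), (8, 5)
non_medoids = [(3, 7), (6, 3), (8, 7)]   # placeholder points, not the article's data

total = 0
for p in non_medoids:
    d1, d2 = manhattan(p, C1), manhattan(p, C2)
    total += min(d1, d2)                  # each point contributes its distance
    print(p, "-> C1" if d1 <= d2 else "-> C2", min(d1, d2))
print("Cost:", total)
```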

Step 3: Randomly select one non-medoid point, swap it with a medoid, and recalculate the cost.
Let the randomly selected point be (8, 4). The dissimilarity of each non-medoid point with the medoids — C1 (4, 5) and C2 (8, 4) is calculated and tabulated.

Each point is assigned to the cluster of the medoid to which its dissimilarity is smaller. So the points 1, 2, 5 go to cluster C1 and 0, 3, 6, 7, 8 go to cluster C2.
The New cost = (3 + 4 + 4) + (2 + 2 + 1 + 3 + 3) = 22
Swap Cost = New Cost − Previous Cost = 22 − 20 = 2 > 0

As the swap cost is not less than zero, we undo the swap. Hence C1 = (4, 5) and C2 = (8, 5) remain the final medoids, and the clustering obtained in Step 2 is final.

The time complexity of each iteration is O(k·(n − k)²).

Advantages:

  1. It is simple to understand and easy to implement.
  2. The k-medoids algorithm is fast and converges in a fixed number of steps.
  3. PAM is less sensitive to outliers than other partitioning algorithms.

Disadvantages:

  1. The main disadvantage of the k-medoids algorithm is that it is not suitable for clustering non-spherical (arbitrarily shaped) groups of objects. This is because it relies on minimizing the distances between the non-medoid objects and the medoid (the cluster centre); briefly, it uses compactness as the clustering criterion instead of connectivity.
  2. It may obtain different results for different runs on the same dataset because the first k medoids are chosen randomly.
