Unsupervised Learning: Three Main Clustering Methods

Sjoerd Vink · Published in CodeX · Jan 28, 2022

Unsupervised learning is a field of machine learning that applies algorithms to data for which labels are unknown. This makes it suitable for specific tasks, such as outlier analysis and data reduction. One of the major paradigms in unsupervised learning is clustering. This article discusses the ins and outs of the three main clustering methods, namely:

  • Partitioning-based methods
  • Hierarchical methods
  • Density-based methods

Clustering basics

But first things first… Before going to the specific methods, it is important to get a general understanding of what clustering actually is. Clustering identifies a finite set of categories, classes or groups in a given data set by grouping data points based on their similarity. The goal is for data points within the same cluster to be as similar as possible, and for data points in different clusters to be as dissimilar as possible.

Cluster types

When comparing different clustering methods, it is important to know the types of clusters that can exist in a data set. Depending on the data, clusters can look very different: some data contains clearly separated clusters, while other data contains overlapping, vague clusters. There are generally two cluster shapes to distinguish:

  • Convex clusters: shapes that bulge outwards, such as a circle or rectangle; any straight line between two points in the cluster stays inside it
  • Non-convex clusters: shapes that bend inwards, such as a star, where a straight line between two points can leave the cluster
Non-convex vs Convex cluster shapes
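
To get a hands-on feel for the difference, scikit-learn's toy data generators can produce both shapes. This is a minimal sketch, assuming scikit-learn is installed; the sample sizes and parameter values are chosen purely for illustration.

```python
# Toy data sets: convex blobs vs. non-convex "moons" (scikit-learn).
from sklearn.datasets import make_blobs, make_moons

# Roughly spherical, convex clusters.
X_convex, _ = make_blobs(n_samples=300, centers=2, random_state=42)

# Two interleaving half-moons: each cluster's boundary bends inward.
X_nonconvex, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

print(X_convex.shape, X_nonconvex.shape)  # (300, 2) (300, 2)
```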

Clustering methods

There are three main clustering methods in unsupervised learning: partitioning-based, hierarchical, and density-based methods. Each method has its own strategy for separating clusters, and depending on the data, one method can be superior to another. Experimentation is key in the process of finding the right strategy for your data set.

Partitioning-based methods

Partitioning-based methods divide the data set into a predefined number of partitions (clusters); a data point belongs to the cluster whose representative it is closest to. There are a lot of algorithms, and variations on algorithms, that use this method. Examples are K-Means, K-Medoids, CLARA (Clustering LARge Applications) and CLARANS (Clustering Large Applications based on RANdomized Search).

One of the most famous partitioning-based methods is the K-Means algorithm. There are a number of variations on this algorithm, but the intuitive idea remains the same; a short code sketch follows the steps below.

  1. Choose the number of clusters (k)
  2. Put k cluster representative points arbitrarily in the data set
  3. With each iteration, the algorithm assigns every data point to its closest representative and then moves each representative to the mean of the points assigned to it
  4. When the representatives stop moving, the algorithm has converged. Note that this is a local optimum: the resulting partitioning can depend on where the representatives were initially placed.
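
As a minimal sketch of these steps, assuming scikit-learn is available (the toy data, k=3 and the other parameter values are illustrative, not prescriptive):

```python
# K-Means on a toy data set with scikit-learn.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Steps 1-2: choose k and place k representatives; n_init restarts the
# random placement several times and keeps the best run.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)

# Steps 3-4: fit_predict alternates assigning points to their closest
# representative and moving each representative to the mean of its
# points, until the representatives stop moving.
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # final cluster representatives
print(labels[:10])              # cluster index per data point
```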

The biggest challenge in partitioning-based methods is that the number of clusters needs to be specified beforehand. It has to be provided as a parameter, which can be quite hard. One way to do this is to use human judgement to estimate the number of clusters. It is also possible to do this more precisely by experimenting with different numbers of clusters and using a quantitative evaluation metric such as the silhouette score, as sketched below.
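
A sketch of that quantitative approach, using scikit-learn's silhouette_score; the candidate range 2-8 is an arbitrary choice for illustration:

```python
# Choose k by comparing silhouette scores across candidate values.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

scores = {}
for k in range(2, 9):  # candidate ks; the range is an assumption
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # higher is better

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

A higher silhouette score means points sit closer to their own cluster than to neighboring ones, so the k with the highest score is a reasonable default choice.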

Partitioning-based methods can be used for a wide variety of applications. An example is the segmentation of customer groups in a commercial application. Document clustering is also possible, but keep in mind that it requires very specific preprocessing steps to transform a set of documents into a representation that the algorithm can understand; one common route is sketched below.
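
One common preprocessing route (by no means the only one) is TF-IDF vectorization. A minimal sketch, assuming scikit-learn and a made-up four-document corpus:

```python
# Turn raw documents into numeric vectors (TF-IDF) before clustering.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [  # toy corpus, purely for illustration
    "the stock market rallied today",
    "investors watched the market closely",
    "the team won the football match",
    "a late goal decided the match",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)  # sparse document-term matrix

labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
print(labels)  # ideally: finance docs in one cluster, sports in the other
```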

Finally, it is almost impossible to find non-convex clusters with partitioning-based methods. Whenever your data set requires finding such clusters, you should look at the other methods.

Hierarchical methods

Hierarchical methods build a hierarchy of clusters. This can be done bottom-up, by repeatedly merging the clusters that are closest to each other (Agglomerative clustering), or top-down, by repeatedly splitting clusters (Divisive clustering).

Let’s zoom in on Agglomerative clustering; the top-down (Divisive) approach follows the same general idea in reverse. The Agglomerative clustering algorithm performs the following steps (a code sketch follows the list):

  1. Calculate the distance between each pair of clusters (in the beginning, each data point is its own cluster)
  2. Merge the clusters that are closest to each other
  3. Repeat until there is one big cluster left
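
A minimal sketch with scikit-learn's AgglomerativeClustering; here the hierarchy is cut at three clusters, but that number (and the toy data) is only illustrative:

```python
# Agglomerative (bottom-up) clustering with scikit-learn.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

# linkage controls how the distance between clusters is measured
# ("ward", "complete", "average" or "single").
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)

# Alternatively, pass n_clusters=None with a distance_threshold to let
# the cut depend on merge distance instead of a fixed cluster count.
print(labels[:10])
```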

The biggest pro of this method is that you don’t have to specify k beforehand, as you do with partitioning-based methods. This reduces the need for human input, which is especially helpful when you don’t know how many clusters you are looking for. Hierarchical methods are also able to find some arbitrarily shaped (non-convex) clusters.

There are several ways to calculate the distance between clusters (such as single linkage, complete linkage and Ward linkage), but I won’t go in depth on those here. It is important to understand, however, that with each iteration the distances between all pairs of clusters need to be recalculated. This makes the computational cost grow quickly (roughly cubic in the number of data points for a naive implementation), so the method is really slow on big data sets.
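
To see the hierarchy that those repeated merges produce, SciPy can build and plot the full merge tree (a dendrogram). A sketch assuming SciPy and matplotlib are installed:

```python
# Build and visualize the merge hierarchy (dendrogram) with SciPy.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=42)

# method selects the linkage criterion: "single", "complete", "ward", ...
Z = linkage(X, method="ward")

dendrogram(Z)                    # each U-shape is one merge step
plt.xlabel("data point index")
plt.ylabel("merge distance")
plt.show()
```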

The biggest con is that this method cannot detect noise. Because it only takes distance into account, outliers are never discarded; in the algorithm’s final iterations, the points that lie furthest away simply end up as small clusters of their own.

Density-based methods

Probably the most promising of all are the density-based methods. Their basic idea is that clusters are dense areas separated by areas of low density. This will become clearer when we zoom in on a specific algorithm. There are many density-based clustering algorithms; examples are DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and OPTICS (Ordering Points To Identify the Clustering Structure).

Let’s zoom in on DBSCAN. DBSCAN has two input parameters: epsilon, the radius of the neighborhood the algorithm searches around a data point, and the minimal points, the number of data points that must lie within that radius for the target to be part of a cluster. The DBSCAN algorithm performs the following steps iteratively (a code sketch follows the list):

  1. Select a random data point (target)
  2. Count how many data points lie within epsilon of the target
  3. Are there at least the minimal number of points within epsilon? Then the algorithm starts to form a cluster and chooses a next target within epsilon of the current one.
  4. Are there not enough points within epsilon of the target? Then the target is considered noise for now, and the algorithm chooses the next target randomly.
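
A minimal sketch with scikit-learn's DBSCAN, where eps and min_samples correspond to the epsilon and minimal points described above; the values chosen here are illustrative and usually need tuning:

```python
# DBSCAN on a non-convex data set; noise points receive the label -1.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# eps = search radius (epsilon), min_samples = minimal points.
db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X)

print(set(labels))  # cluster labels, with -1 marking noise
```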

Key concepts in this algorithm are the differences between core objects, border objects and noise. The image below demonstrates them well.

Different object types in density based clustering

The main advantage of density-based methods over the other methods is that they can detect noise. Also, because they look for dense areas, they can find non-convex clusters. However, they do not perform well on data sets with large differences in density.

Conclusion

Your head is probably about to explode after reading this article, but we’ve managed to come a long way in covering the basics of clustering. This article was intended to give a brief explanation of the three main clustering methods. It isn’t an exhaustive list; there will be much development in this space as technology advances and new research appears. However, in order to understand future methods, it is of utmost importance to get a grasp of the basics.
