Hierarchical Clustering

Vivek Salunkhe
4 min read · Jul 28, 2021


In our previous post, we studied the intuition behind clustering and one of the major types of clustering algorithms, i.e. K-Means Clustering.

In today’s blog, we will take a look at the second major type of clustering, i.e. Hierarchical Clustering.

Hierarchical clustering, also known as hierarchical cluster analysis, is an algorithm that groups similar objects into groups called clusters. The end goal is a set of clusters, where each cluster is distinct from each other cluster, and the objects within each cluster are broadly similar to each other.

Hierarchical Clustering. Image by author

The clusters obtained from Hierarchical Clustering are often similar to those from K-Means; however, the approach used to obtain them is entirely different.

Hierarchical Clustering keeps a memory of the steps followed to obtain the clusters. Its central idea is to build a dendrogram (a tree diagram) that records how clusters are merged or split: you start from one single cluster and work your way down through the different combinations of clusters until the number of clusters equals the number of observations (or the other way around). It is the dendrogram itself that lets us find the best clustering configuration.
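As a concrete illustration, a dendrogram for a tiny made-up dataset can be built and drawn with SciPy. This is only a sketch; the data values and the Ward linkage choice are assumptions for the example:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Made-up 2-D data: two loose groups of three points each
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

# Build the merge tree, then draw it: each leaf is a single observation,
# and every horizontal join marks two clusters being merged.
Z = linkage(X, method="ward")
dendrogram(Z)
plt.title("Dendrogram")
plt.show()
```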

Types of Hierarchical Clustering: It is further divided into two types, namely,

  1. Agglomerative Hierarchical Clustering
  2. Divisive Hierarchical Clustering
Dendrogram. Image by author

Agglomerative Hierarchical Clustering: Bottom-Up Approach. The steps involved in determining the clusters are as follows (a short code sketch follows these steps):

  1. Make each data point a single cluster; this gives a total of N clusters.
  2. Take the 2 closest data points (using appropriate distance metrics) and make them one cluster, now we are left with N-1 clusters.
  3. Take the 2 closest clusters and make them one cluster, now we are left with N-2 clusters.
  4. Repeat Step 3 until there is only one cluster.
  5. STOP
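As an illustration of this bottom-up procedure, here is a minimal sketch using scikit-learn’s AgglomerativeClustering (the toy data and the choice of two clusters with Ward linkage are assumptions made for the example):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy 2-D data: two loose groups of three points each
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Bottom-up: start with 6 singleton clusters and repeatedly merge the
# two closest clusters until only 2 remain.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)

print(labels)  # one cluster label per data point, e.g. [1 1 1 0 0 0]
```

Setting n_clusters simply tells the algorithm where to stop merging; the merge order itself is driven entirely by the linkage criterion, discussed next.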

Different ways to measure the distance between two clusters (a short sketch comparing them follows this list):

  1. Single Linkage: In the Single Linkage method, the distance between two clusters is defined as the minimum distance between an object (point) in one cluster and an object (point) in the other cluster.
  2. Complete Linkage: In the Complete Linkage technique, the distance between two clusters is defined as the maximum distance between an object (point) in one cluster and an object (point) in the other cluster.
  3. Average Linkage: In the Average Linkage technique, the distance between two clusters is the average of the distances between every point in one cluster and every point in the other cluster.
  4. Centroid Linkage: In the Centroid Linkage approach, the distance between the two sets or clusters is the distance between two mean vectors of the sets (clusters).
  5. Ward’s Linkage: In Ward’s method, the distance between two clusters is the increase in the total within-cluster squared error that results from merging them. It behaves similarly to average linkage when the distances between points are squared distances.
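To see how the linkage choice changes the result, SciPy’s hierarchy module can build the same tree under each criterion. A rough sketch, reusing the toy data from above (the cut into two clusters is an arbitrary choice for the comparison):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

# Each method name corresponds to one of the distance definitions above.
for method in ["single", "complete", "average", "centroid", "ward"]:
    Z = linkage(X, method=method)                    # (N-1) x 4 merge table
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
    print(method, labels)
```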

Divisive Hierarchical Clustering: Top-Down Approach. The steps involved in determining the clusters are as follows (a short sketch follows these steps):

  1. Initially, all points in the dataset belong to one single cluster.
  2. Partition the cluster into the two least similar clusters.
  3. Repeat Step 2, i.e. proceed recursively to form new clusters until the desired number of clusters is obtained.
  4. STOP
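scikit-learn does not ship a general divisive hierarchical clustering class, but the top-down idea can be approximated with bisecting K-Means (available as BisectingKMeans in scikit-learn 1.1+), which repeatedly splits one cluster in two. A minimal sketch, with the data and cluster count assumed for illustration:

```python
import numpy as np
from sklearn.cluster import BisectingKMeans  # requires scikit-learn >= 1.1

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

# Top-down: start with one cluster containing all points and repeatedly
# split a cluster in two until n_clusters clusters remain.
model = BisectingKMeans(n_clusters=2, random_state=0)
labels = model.fit_predict(X)
print(labels)
```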

Examples:

  1. Market Segmentation
  2. Fraud detection
  3. Simply identifying clusters of customers in your company or business.

Selecting K or The Appropriate Number of Clusters:

We can use either the Elbow Method or the Dendrogram (it is fast enough to try both), just to double-check the optimal number. However, if we really need to rely on one method, we should go for the elbow method: the dendrogram is not always the easiest way to find the optimal number of clusters, whereas with the elbow method the elbow is usually very obvious to spot, making it easy to select the optimal number of clusters. A quick sketch of both approaches follows.
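Here is a rough sketch of both approaches on the toy data from earlier (the distance threshold and the range of k are arbitrary choices for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

# Dendrogram-based choice: cut the tree at a distance threshold; the number
# of clusters is however many branches the cut leaves.
Z = linkage(X, method="ward")
labels = fcluster(Z, t=5.0, criterion="distance")  # threshold picked by eye
print("clusters from dendrogram cut:", len(set(labels)))

# Elbow method: plot within-cluster sum of squares (WCSS) against k and
# look for the point where the curve bends sharply.
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in range(1, 6)]
plt.plot(range(1, 6), wcss, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("WCSS")
plt.show()
```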

Advantages of Hierarchical Clustering algorithm:

  1. Easy to understand and implement.
  2. No need to specify the number of clusters up front, since we can obtain the desired number of clusters by setting an appropriate threshold and cutting the dendrogram at the proper level.
  3. Dendrograms are very easy to interpret.

Disadvantages of Hierarchical Clustering algorithm:

  1. Does not work well with large datasets.
  2. Selection of an appropriate similarity measure between clusters is critical.
  3. Once a decision is made to merge two clusters, it cannot be undone.
  4. Sensitive to noise and outliers.

Implementation: Refer to the following link for the Python and R implementations of Hierarchical Clustering:

Hierarchical Clustering
