Hierarchical Clustering

Unlike K-means, hierarchical clustering does not require us to specify the number of clusters at the outset.


Types

Agglomerative Hierarchical Clustering

Agglomerative hierarchical clustering is the most common type of hierarchical clustering, used to group objects into clusters based on their similarity. It's a bottom-up approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

Suppose there are 4 data points. We assign each point to its own cluster, so we have 4 clusters at the beginning.

Then, at each iteration, we merge the closest pair of clusters and repeat this step until only a single cluster is left.
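In practice this bottom-up merging is available off the shelf. Here is a minimal sketch using scikit-learn's AgglomerativeClustering; the four 2-D sample points are invented purely for illustration:

```python
# A minimal sketch of agglomerative clustering with scikit-learn.
# The four 2-D points below are made up purely for illustration.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1.0, 1.0],
              [1.5, 1.0],
              [5.0, 5.0],
              [5.5, 4.5]])

# Ask for 2 clusters; merging proceeds bottom-up until 2 remain.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 1 1] -- the two nearby pairs end up together
```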

Divisive Hierarchical Clustering

Divisive hierarchical clustering is rarely used for real-world problems. It works in the opposite direction of agglomerative clustering: we start with all the data points in a single cluster, and at each iteration we split off the farthest (least similar) points or clusters until each data point forms its own cluster. Because we divide a single cluster into n clusters, it is called divisive clustering.

It doesn't matter whether we have 10 or 1,000 data points: all of them belong to the same cluster at the beginning.

Now, at each iteration, we split off the farthest points in the cluster and repeat this process until each cluster contains only a single point.
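Common libraries don't ship a divisive variant, so the sketch below improvises one: it recursively bisects each cluster with 2-means until every point stands alone. This is an illustrative stand-in under that assumption, not a standard library algorithm, and divisive_split is a hypothetical helper name:

```python
# An illustrative divisive (top-down) sketch: recursively bisect each
# cluster with 2-means until every cluster holds a single point.
# This is a simplification for illustration, not a library algorithm.
import numpy as np
from sklearn.cluster import KMeans

def divisive_split(points):
    """Return a list of singleton clusters by recursive bisection."""
    # Stop if only one point (or only duplicate points) remains.
    if len(points) <= 1 or np.unique(points, axis=0).shape[0] == 1:
        return [points]
    halves = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
    left, right = points[halves == 0], points[halves == 1]
    return divisive_split(left) + divisive_split(right)

X = np.array([[1.0, 1.0], [1.5, 1.0], [5.0, 5.0], [5.5, 4.5]])
print(len(divisive_split(X)))  # 4 -- every point ends up alone
```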

Hierarchical clustering relies on two important parameters:

1. Measure of distance (similarity). Similarity can be calculated using metrics such as the following (see the sketch after this list):

  • Hamming Distance
  • Manhattan Distance (Taxicab or City Block)
  • Minkowski Distance
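All three metrics listed above are available in scipy.spatial.distance; a quick sketch with two invented vectors:

```python
# Computing the three distance metrics listed above with SciPy.
# The two sample vectors are made up for illustration.
from scipy.spatial import distance

u = [1, 0, 3, 2]
v = [2, 1, 3, 0]

print(distance.hamming(u, v))        # fraction of positions that differ
print(distance.cityblock(u, v))      # Manhattan: sum of |u_i - v_i|
print(distance.minkowski(u, v, p=3)) # Minkowski of order p (p=2 is Euclidean)
```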

2. Linkage Criteria

After selecting a distance metric, we must decide where that distance is measured from. The linkage criterion defines how the distance between two clusters is computed from the distances between their member points.

Single Linkage

The distance between two clusters is the shortest distance between any point in one cluster and any point in the other.

Complete Linkage

The distance between two clusters is the longest distance between any point in one cluster and any point in the other.

Average Linkage

The distance between two clusters is the average of the distances from each point in one cluster to every point in the other cluster.

Ward Linkage

Ward linkage merges the pair of clusters whose merger produces the smallest increase in the total within-cluster sum of squared differences.
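These four criteria correspond directly to the method argument of SciPy's linkage function. A small comparison on invented points:

```python
# Comparing linkage criteria with SciPy: the `method` argument selects
# how inter-cluster distance is computed from pairwise point distances.
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[1.0, 1.0], [1.5, 1.0], [5.0, 5.0], [5.5, 4.5], [3.0, 3.0]])

for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)  # Z records each merge and its distance
    print(method, "-> first merge distance:", round(Z[0, 2], 3))
```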

Algorithm Workflow

The basic agglomerative algorithm is straightforward:

  • Compute the proximity matrix.
  • Let each data point be a cluster.
  • Repeat: merge the two closest clusters and update the proximity matrix.
  • Stop when only a single cluster remains.

The key operation is computing the proximity of two clusters.
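As a sketch of that loop, here is a naive single-linkage implementation in plain NumPy, written for readability rather than speed (the helper name agglomerate is hypothetical):

```python
# A naive single-linkage agglomerative pass, mirroring the bullets above:
# build a proximity matrix, then repeatedly merge the two closest clusters.
import numpy as np

def agglomerate(X):
    clusters = [[i] for i in range(len(X))]              # each point is a cluster
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # proximity matrix
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: D[np.ix_(clusters[ij[0]], clusters[ij[1]])].min(),
        )
        merges.append((clusters[a], clusters[b]))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]                          # update the cluster list
    return merges

X = np.random.RandomState(0).rand(5, 2)
for left, right in agglomerate(X):
    print("merge", left, "+", right)
```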

To understand this better, let's look at a pictorial representation of the agglomerative hierarchical clustering technique. Say we have six data points {A, B, C, D, E, F}.

  • Step 1: Initially, we calculate the proximity of the individual points and treat all six data points as individual clusters, as shown in the image below.

Agglomerative Hierarchical Clustering Technique

  • Step 2: Similar clusters are merged together to form single clusters. Suppose B, C and D, E are the closest pairs; merging them leaves four clusters: A, BC, DE, and F.
  • Step 3: We recalculate the proximity of the new clusters and merge the closest ones, leaving A, BC, and DEF.
  • Step 4: We calculate the proximity again. BC and DEF are the closest clusters and are merged, leaving two clusters: A and BCDEF.
  • Step 5: Finally, the remaining clusters are merged into a single cluster, ABCDEF.

Hierarchical clustering can be visualized using a dendrogram.

A dendrogram is a tree-like diagram that records the sequence of merges or splits.
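A dendrogram for six labelled points can be drawn with SciPy and Matplotlib. The coordinates below are invented to roughly mirror the A-F walkthrough above; the actual merge order depends on the coordinates chosen:

```python
# Drawing a dendrogram for six labelled points with SciPy and Matplotlib.
# The coordinates are invented to roughly mirror the A..F walkthrough above.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

labels = ["A", "B", "C", "D", "E", "F"]
X = np.array([[0.0, 8.0],   # A sits apart from the rest
              [3.0, 3.0],   # B and C are close
              [3.2, 3.1],
              [6.0, 1.0],   # D and E are close, F nearby
              [6.2, 1.1],
              [6.5, 2.0]])

Z = linkage(X, method="ward")
dendrogram(Z, labels=labels)  # leaves are points; heights are merge distances
plt.ylabel("merge distance")
plt.show()
```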
