Day 31 of 100DaysofML

Charan Soneji · Published in 100DaysofMLcode · Jul 17, 2020

AGNES and DIANA. They sound like siblings, to be honest :P. But no, they aren't.
AGNES and DIANA are both hierarchical clustering algorithms that tackle the same problem, just with opposite approaches: AGNES builds clusters bottom-up, while DIANA splits them top-down. I shall be covering AGNES in today's blog.

What is AGNES?
For all those reading, I just hope y'all have a basic understanding of what clustering algorithms do and how they work. Alright, so I'll try to explain the concept with the help of a simple example. Shoutout to draw.io for helping me with the diagrams.

Okay, let us consider 5 random points, which I have plotted as you may see below:

5 random points

Each of these points is separated from the others by a specific distance, and our aim is to group them into clusters.

Now, let us start grouping them based on the distance between the points. Note that I'm going to show this diagrammatically, but while implementing it, you need to calculate the distance between every pair of points and cluster them based on those distances. If you simply look at the diagram, you can see which points are close to each other (visually) and put them into a common cluster.
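To make this "merge the closest points" step concrete, here is a minimal sketch that computes all pairwise distances and picks the first pair to merge. The coordinates are made up for illustration, since the diagram's exact points aren't given:

```python
import numpy as np

# Five illustrative 2-D points (hypothetical coordinates, just for the sketch)
points = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.0], [5.2, 4.8], [9.0, 1.0]])

# Pairwise Euclidean distance matrix
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Ignore the zero diagonal, then find the closest pair of points
np.fill_diagonal(dist, np.inf)
i, j = np.unravel_index(np.argmin(dist), dist.shape)
print(int(i), int(j))  # 2 3 -> these two points would be clustered first
```

This is exactly the comparison you would otherwise do by eye on the diagram.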

First round of clustering

We have clustered them based on the distance between the nodes and put them into common clusters, but we can merge these clusters further. Let us try to understand using the diagrams given below:

Left with 2 clusters after clustering

Now, we can see that we are left with 2 main clusters, and we can merge these into one main cluster.

One main cluster

So, as you can see, based on the distance between the points, we can group the points into clusters. AGNES works on the same principle, but we also build a diagram known as a dendrogram. See the picture given below to understand the same analogy.

Dendrogram representation

From a given set of points, we extract the dendrogram based on the distance between the points.

Let us try to implement this using Python. We are using an inbuilt dataset generator of sklearn called make_blobs.

#Import basic libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#Importing dataset
from sklearn.datasets import make_blobs

We have imported our basic libraries and the dataset function; we shall now load the dataset into a variable.

#Creating blobs in the dataset with 4 different cluster centers
#The function make_blobs is used to make 4 different clusters
data = make_blobs(n_samples=200, n_features=2, centers=4, cluster_std=1.6, random_state=50)

You can refer to the documentation of sklearn to understand the given line, but we have basically defined 4 different cluster centers.

#Creating an array with all the data points
points = data[0]
#Creating a visual scatter plot with all the points that have been chosen
plt.scatter(data[0][:,0], data[0][:,1], c=data[1], cmap='viridis')
plt.xlim(-15,15)
plt.ylim(-15,15)

The scatter plot that we obtain is shown below:

Scatter plot for the given points.

At this stage, if you get confused between K-Means and AGNES, just understand this. The above plot is just to show the different clusters, but K-Means and Hierarchical Agglomerative Clustering work in very different ways. In AGNES, we begin with every point in our dataset as a "cluster." Then we find the two closest points and combine them into a cluster. Then, we find the next closest pair of clusters, and those merge into a cluster. We repeat the process until we only have one big giant cluster.
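The merging loop described above can be sketched in plain NumPy. This is a simplified single-linkage version (ward linkage, which the sklearn code below uses, weighs clusters differently), with toy coordinates I made up for illustration:

```python
import numpy as np

def agnes_merge_order(points):
    """Repeatedly merge the two closest clusters (single linkage)
    until one cluster remains; return the merge history."""
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members
                d = min(np.linalg.norm(points[p] - points[q])
                        for p in clusters[a] for q in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return history

pts = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0], [10.0, 0.0]])
for left, right, d in agnes_merge_order(pts):
    print(left, right, round(d, 2))
```

Each printed line is one merge: the two tight pairs merge first, then the two resulting clusters, and the far-away point joins last, which is exactly the order a dendrogram records from bottom to top.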

#Import AGNES libraries
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering

We have now imported our AGNES libraries from scipy and sklearn.

#Creating the dendrogram for the given set of data points
dendrogram = sch.dendrogram(sch.linkage(points, method='ward'))
#Creating the clustering classifier with 4 clusters
hc = AgglomerativeClustering(n_clusters=4, affinity='euclidean', linkage='ward')
#Fitting the classifier on the data points and predicting the cluster of each point
y_hc = hc.fit_predict(points)

The dendrogram that we obtain is shown below:

Dendrogram obtained
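As a quick sanity check on the fitted classifier, we can inspect the labels that fit_predict returns. Note this sketch drops the affinity argument and relies on the default Euclidean metric, since recent sklearn versions renamed/removed that parameter:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

# Same blob data as before
points, _ = make_blobs(n_samples=200, n_features=2, centers=4,
                       cluster_std=1.6, random_state=50)

hc = AgglomerativeClustering(n_clusters=4, linkage='ward')
y_hc = hc.fit_predict(points)

print(y_hc.shape)      # (200,) -> one cluster label per point
print(len(set(y_hc)))  # 4 -> the four clusters we asked for
```

You can then color a scatter plot by these labels, e.g. plt.scatter(points[:,0], points[:,1], c=y_hc, cmap='viridis'), to see the clusters AGNES recovered.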

I hope I covered the essentials of AGNES and got the concept across for the most part. Thanks for reading. Keep Learning.

Cheers.
