Day 32 of 100DaysofML

Charan Soneji · Published in 100DaysofMLcode · 2 min read · Jul 18, 2020

DIANA. This is a continuation of yesterday’s blog, in which I covered AGNES. The output of both algorithms is the same — we end up with a dendrogram — but the approach used to build that dendrogram differs: it can be top-down or bottom-up.

DIANA is the top-down (divisive) form of hierarchical clustering: all data points are initially assigned to a single cluster, which is then split into the two least similar subclusters. This is done recursively until the resulting clusters are distinct from one another.
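To make the top-down splitting concrete, here is a minimal NumPy sketch of one DIANA-style split. It follows the classic "splinter group" idea: seed the new cluster with the point that has the largest average dissimilarity to the rest, then move over every point that is on average closer to the splinter group. (This is an illustrative sketch of a single split, not a full DIANA implementation; the data and names are made up for the example.)

```python
import numpy as np

def diana_split(X):
    """One DIANA-style split: peel a 'splinter' group off the full cluster."""
    n = len(X)
    # Pairwise Euclidean distance matrix
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Seed the splinter group with the point farthest (on average) from the rest
    avg = d.sum(axis=1) / (n - 1)
    splinter = [int(avg.argmax())]
    rest = [i for i in range(n) if i != splinter[0]]
    moved = True
    while moved and len(rest) > 1:
        moved = False
        for i in list(rest):
            others = [j for j in rest if j != i]
            if not others:
                break
            # Move i if it is, on average, closer to the splinter group
            if d[i, splinter].mean() < d[i, others].mean():
                rest.remove(i)
                splinter.append(i)
                moved = True
    return splinter, rest

# Two well-separated blobs: a single split should recover them
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
a, b = diana_split(X)
```

Running the split on the toy data above separates the two blobs into `a` and `b`; repeating the split recursively on each subcluster would build the full top-down dendrogram.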

Let us dive into the implementation using Python.

#Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Next, let us initialize and import our dataset:

#Importing library to normalize the data of the dataset
from sklearn.preprocessing import normalize
#Assumes the dataset has already been loaded into a DataFrame called 'data'
data_scaled = normalize(data)
data_scaled = pd.DataFrame(data_scaled, columns=data.columns)
data_scaled.head()
Dataset head
#Importing library to plot the dendrogram
import scipy.cluster.hierarchy as shc
plt.figure(figsize=(10, 7))
plt.title("Dendrograms")
#Plotting dendrogram with normalized data
dendrogram = shc.dendrogram(shc.linkage(data_scaled, method='ward'))
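Besides plotting, the linkage tree can also be cut into a fixed number of flat clusters with `scipy.cluster.hierarchy.fcluster`. Here is a self-contained sketch using synthetic data (the blog's actual dataset isn't shown, so two made-up Gaussian groups stand in for it):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in for the dataset: two well-separated groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, size=(20, 2)),
               rng.normal(5, 0.3, size=(20, 2))])

# Same Ward linkage as used for the dendrogram above
Z = linkage(X, method='ward')

# Cut the tree so that exactly 2 flat clusters remain
labels = fcluster(Z, t=2, criterion='maxclust')
```

`Z` has one row per merge (n − 1 rows for n points), and `labels` assigns each point a cluster id starting at 1 — here, one id per synthetic group.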

In the above syntax, while creating our dendrogram, we have specified the method as ‘ward’. Note that scipy’s linkage builds the tree with the general agglomerative hierarchical clustering procedure; under Ward’s criterion, the pair of clusters merged at each step is the one that minimizes the increase in the total within-cluster variance (sum of squared deviations from the cluster means).
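Ward’s merge cost has a closed form: merging clusters A and B increases the within-cluster sum of squares by |A||B|/(|A|+|B|) · ‖mean(A) − mean(B)‖². A small NumPy check (with made-up toy clusters) shows why Ward prefers merging nearby clusters first:

```python
import numpy as np

def ward_merge_cost(A, B):
    """Increase in total within-cluster sum of squares if A and B are merged.

    Ward's criterion: |A||B| / (|A|+|B|) * ||mean(A) - mean(B)||^2
    """
    na, nb = len(A), len(B)
    diff = A.mean(axis=0) - B.mean(axis=0)
    return na * nb / (na + nb) * float(diff @ diff)

# Two tight clusters far apart vs. two overlapping ones
far_a = np.array([[0.0, 0.0], [0.0, 0.2]])
far_b = np.array([[4.0, 4.0], [4.0, 4.2]])
near_a = np.array([[0.0, 0.0], [0.0, 0.2]])
near_b = np.array([[0.1, 0.0], [0.1, 0.2]])

# Merging the distant pair costs far more than merging the close pair,
# so Ward's procedure would merge the close pair first.
```

You can verify the formula directly: the cost equals the sum of squared deviations of the merged cluster minus the sum for the two clusters kept separate.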

That is just an overview of DIANA, the divisive clustering model.

One important point to note while working with datasets is that divisive clustering tends to be better at identifying large clusters, while agglomerative clustering is better at identifying small clusters.


That’s it for today. Keep Learning.

Cheers.
