What is the Difference Between Hierarchical and Partitional Clustering

Emre Can Yesilyurt
Machine Learning Turkiye
4 min read · Mar 26, 2022

This article explains hierarchical and partitional clustering analysis and the differences between them. To follow it, you should already have a basic idea of cluster analysis.

https://www.mygreatlearning.com/blog/clustering-algorithms-in-machine-learning/

Let’s start with hierarchical methods.

Hierarchical clustering either merges objects that share similar features or, going the other way, splits the data step by step. Its main aim is to find the global optimum by exploring every possibility in the state space.
Hierarchical clustering has two approaches: agglomerative and divisive clustering.

Agglomerative clustering (also called bottom-up) starts by assuming all the data points are different: every data point is its own cluster. It then merges the most similar clusters, step by step, associating data points by similarity at each step. AGNES (AGglomerative NESting) is the best-known agglomerative clustering algorithm.

Flow chart of agglomerative hierarchical clustering. https://www.researchgate.net/figure/Flow-chart-of-agglomerative-hierarchical-clustering_fig1_313238175
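As an illustration, here is a minimal sketch of agglomerative clustering using scikit-learn; the toy data and the linkage choice are my own assumptions for the example:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy 2-D data: two loose groups of points
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])

# Bottom-up: each point starts as its own cluster,
# and the two most similar clusters are merged at each step.
model = AgglomerativeClustering(n_clusters=2, linkage="average")
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1]
```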

Divisive clustering (also called top-down) starts by gathering all the data points in a single cluster. It then splits off the points that differ most from the rest, step by step. DIANA (DIvisive ANAlysis) is the best-known divisive clustering algorithm.

https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781789952292/2/ch02lvl1sec13/agglomerative-versus-divisive-clustering
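DIANA itself is not shipped in scikit-learn, so here is a hedged, hand-rolled sketch of the top-down idea: repeatedly bisect the most spread-out cluster. This is a bisecting strategy chosen for illustration, not DIANA proper, and the function name and data are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_sketch(X, n_clusters):
    """Top-down illustration: start with one cluster holding everything,
    then repeatedly bisect the cluster with the largest spread.
    (A bisecting strategy to show the idea; not the DIANA algorithm.)"""
    clusters = [np.arange(len(X))]  # one cluster containing all points
    while len(clusters) < n_clusters:
        # choose the cluster with the highest total variance to split
        i = max(range(len(clusters)),
                key=lambda j: X[clusters[j]].var(axis=0).sum())
        worst = clusters.pop(i)
        halves = KMeans(n_clusters=2, n_init=10).fit_predict(X[worst])
        clusters.append(worst[halves == 0])
        clusters.append(worst[halves == 1])
    return clusters

X = np.array([[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [5.1, 4.9]])
print(divisive_sketch(X, n_clusters=2))  # e.g. [array([0, 1]), array([2, 3])]
```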

On big data, it is intractable to find the global optimum by examining every possibility. For this reason, we can use partitional cluster analysis instead.

Partitional Cluster Analysis

Partitional cluster analysis takes a heuristic approach: it starts from a candidate solution and optimizes it step by step, rather than exhausting the whole state space.
K-Means, familiar to anyone who does scientific work, is a popular algorithm built on this heuristic approach. Its biggest bottleneck, however, is that the number of clusters must be supplied up front.
The user also sets a maximum number of iterations or a convergence criterion at the start, because the loop has to end somehow. With a convergence criterion, the algorithm stops once it reaches an acceptable error level.
You set the number of cluster centroids through the value of k, or seed them with an initialization scheme such as k-means++. After the centroids are placed, the algorithm computes the arithmetic mean of each cluster’s feature values, then decides whether objects should move to a different centroid, and repeats this process until the maximum iteration count or the convergence criterion is reached.

Partitioning Method (K-Mean) https://www.geeksforgeeks.org/partitioning-method-k-mean-in-data-mining/
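A minimal sketch of this loop with scikit-learn’s KMeans; the data and the values of k, max_iter, and tol below are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

# k must be given up front; k-means++ seeds the initial centroids,
# while max_iter and tol bound the loop (iteration cap / convergence criterion).
km = KMeans(n_clusters=2, init="k-means++", max_iter=300, tol=1e-4,
            n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels)
print(km.cluster_centers_)
```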

What is the difference between hierarchical and partitional algorithms?

The most significant difference between hierarchical and partitional clustering is running time. A partitional algorithm works on a single candidate partition of the data, refining its optimum step by step, which makes it faster than the hierarchical method. (If you are working on an embedded system, this difference can be a life-changer.) Note, however, that the higher the value of k, the higher the computational cost.
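As a rough way to observe the gap yourself (absolute times depend entirely on your machine and data; this is only a sketch on synthetic input):

```python
import time
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# synthetic data, sized so both algorithms finish quickly
X = np.random.default_rng(0).normal(size=(2000, 10))

t0 = time.perf_counter()
KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
t1 = time.perf_counter()
AgglomerativeClustering(n_clusters=5).fit(X)
t2 = time.perf_counter()

print(f"k-means:       {t1 - t0:.3f} s")
print(f"agglomerative: {t2 - t1:.3f} s")
```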
Another difference lies in the input and output, which follows from the two families’ different approaches.
A partitional algorithm visits many candidate optima, and because of this it generally gives us a local optimum, not the global optimum. Occasionally, with luck, that local optimum coincides with the global optimum; otherwise, the global optimum can be sought with techniques such as deterministic annealing or genetic algorithms.

https://math.stackexchange.com/questions/4297323/number-of-global-optima-in-single-solution-metaheuristics
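One hedged way to see this in practice: restart K-Means from different random initializations with no internal restarts (n_init=1) and compare the final inertias; runs that stop at different values have converged to different local optima. The data below is synthetic, made up for the demo:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# three synthetic, well-separated blobs
X = np.vstack([rng.normal(c, 0.3, size=(30, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

# n_init=1 disables internal restarts, so each run keeps
# whatever local optimum its random start converges to
for seed in range(5):
    km = KMeans(n_clusters=3, init="random", n_init=1,
                random_state=seed).fit(X)
    print(f"seed={seed}  inertia={km.inertia_:.2f}")
```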

Hierarchical methods, by contrast, handle all of the data and return the optimum over everything they process. This is the assumption gap between the two families: because a hierarchical method considers all the data in its merge (or split) sequence, its result corresponds to the global optimum.

Additional information: hierarchical clustering is sensitive to outliers. Because clusters merge cumulatively, a cluster containing outlier data accumulates very slowly, and that sluggish growth can tip us off that the cluster holds outliers.
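A small sketch of this effect with SciPy’s linkage function, on toy data I made up: the outlier only joins the rest at a far larger merge distance, which is exactly the slow, late accumulation described above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.1],
              [20.0, 20.0]])  # last row is an outlier

Z = linkage(X, method="average")
# column 2 of Z holds the merge distances; the final merge,
# where the outlier finally joins, is an order of magnitude larger
print(Z[:, 2])
```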

Thank you for reading. You can reach me through the links below.
