UNSUPERVISED LEARNING IN PYTHON: Hierarchical clustering / t-SNE

Hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:

Shawn · 4 min read · Aug 31, 2022
  • Agglomerative: This is a “bottom-up” approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
  • Divisive: This is a “top-down” approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

How it works:

Step 1: First, we assign every point to its own cluster.

Step 2: Next, we find the smallest distance in the proximity matrix and merge the two points (or clusters) with that distance. We then update the proximity matrix.

For example, if the smallest distance in the matrix is 3, between points 1 and 2, we merge points 1 and 2 into a single cluster.

Step 3: Repeat the merge-and-update process until only one cluster remains. Cutting the resulting dendrogram at a chosen distance yields a flat clustering; in the example, cutting at distance 15 leaves 3 clusters.
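The steps above can be sketched in a few lines. This is a minimal, naive single-linkage version on made-up 1-D points (the values are for illustration only, not from the figures referenced above):

```python
import numpy as np

# Toy 1-D points; hypothetical values chosen for illustration
points = np.array([1.0, 4.0, 9.0, 16.0])

# Step 1: every point starts as its own cluster
clusters = [[i] for i in range(len(points))]

def cluster_distance(a, b):
    # Single linkage: cluster distance = closest pair of members
    return min(abs(points[i] - points[j]) for i in a for j in b)

merge_distances = []
# Step 2/3: repeatedly merge the closest pair ("update the proximity matrix")
while len(clusters) > 1:
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    i, j = min(pairs, key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
    merge_distances.append(cluster_distance(clusters[i], clusters[j]))
    merged = clusters[i] + clusters[j]
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

print(merge_distances)  # → [3.0, 5.0, 7.0]
```

The sequence of merge distances is exactly what `linkage()` records, and what a dendrogram plots on its vertical axis.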
The linkage method is explained in detail at https://dataaspirant.com/hierarchical-clustering-algorithm/. Use fcluster() to extract flat clusters at a chosen distance.

Case Study:

Use the linkage() function to obtain a hierarchical clustering of the grain samples, and use dendrogram() to visualize the result. A sample of the grain measurements is provided in the array samples, while the variety of each grain sample is given by the list varieties.
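A sketch of this exercise. In the course, `samples` and `varieties` are provided; here they are replaced with made-up stand-in data so the snippet runs on its own:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit in a notebook
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Stand-ins for the course data: grain measurements and variety names
rng = np.random.default_rng(0)
samples = rng.normal(size=(9, 4))
varieties = ['Kama wheat'] * 3 + ['Rosa wheat'] * 3 + ['Canadian wheat'] * 3

# Complete linkage: cluster distance = farthest pair of points
mergings = linkage(samples, method='complete')

dendrogram(mergings, labels=varieties, leaf_rotation=90, leaf_font_size=6)
plt.show()
```

`linkage()` returns one row per merge, so for 9 samples `mergings` has 8 rows; the dendrogram draws those merges bottom-up.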

Case 2:

Use the fcluster() function to extract the cluster labels for this intermediate clustering, and compare the labels with the grain varieties using a cross-tabulation.

The hierarchical clustering has already been performed and mergings is the result of the linkage() function. The list varieties gives the variety of each grain sample.
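A sketch of this case, again with made-up stand-ins for `samples` and `varieties` (the cut height of 6 is illustrative, chosen to separate the three toy groups):

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Stand-ins for the course data: three well-separated groups of measurements
rng = np.random.default_rng(1)
samples = np.vstack([rng.normal(loc=c, scale=0.1, size=(10, 4))
                     for c in (0.0, 5.0, 10.0)])
varieties = ['Kama wheat'] * 10 + ['Rosa wheat'] * 10 + ['Canadian wheat'] * 10

mergings = linkage(samples, method='complete')

# Cut the tree at height 6: every cluster formed below distance 6 survives
labels = fcluster(mergings, 6, criterion='distance')

# Cross-tabulate cluster labels against the known varieties
df = pd.DataFrame({'labels': labels, 'varieties': varieties})
ct = pd.crosstab(df['labels'], df['varieties'])
print(ct)
```

A good clustering shows each row of the cross-tab dominated by a single variety.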

t-SNE

t-Distributed Stochastic Neighbor Embedding

It is used for dimensionality reduction, like PCA.

PCA is a linear dimensionality reduction method; if the relationships between features are nonlinear, PCA may fail to capture them (underfitting).

t-SNE is also a dimensionality reduction method, but it uses a more complex formulation to relate the high- and low-dimensional spaces. t-SNE models pairwise similarities in the high-dimensional data with a Gaussian distribution, models similarities in the low-dimensional embedding with a Student's t-distribution, measures the mismatch between the two with the KL divergence, and minimizes that divergence with (stochastic) gradient descent to find the embedding.

  • t-SNE is not a linear dimensionality reduction method and takes much longer to run than PCA.
  • Distances between groups in a t-SNE plot may be meaningless.
  • Cluster sizes in a t-SNE plot may also be meaningless.
  • The t-SNE algorithm is stochastic: repeated runs can produce different results, whereas PCA is deterministic and gives the same result every time.
Source: https://sonraianalytics.com/what-is-tsne/
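A minimal sketch of running t-SNE with scikit-learn, on made-up data standing in for the course's grain samples (the `perplexity` and `learning_rate` values are illustrative, and `random_state` is fixed only to make the stochastic result repeatable):

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy high-dimensional data: two separated groups in 10 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 10)) for c in (0.0, 5.0)])

# learning_rate and perplexity are the main knobs to tune;
# without random_state, each run can give a different embedding
model = TSNE(n_components=2, learning_rate=200, perplexity=15, random_state=42)
embedding = model.fit_transform(X)

print(embedding.shape)  # one 2-D point per sample
```

Note that t-SNE has no separate `transform()` for new data, unlike PCA; `fit_transform()` must be rerun on the full dataset.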

If you like my content, please clap and follow me, thank you :)

There’ll be more articles and more content related to Data Science. Hope you enjoy it!

References:

  • Unsupervised Learning in Python, DataCamp, Benjamin Wilson
  • https://mortis.tech/2019/11/program_note/664/
