K-Means Clustering Explained Visually In 5 Minutes

Walk through this unsupervised learning algorithm with images and Python code

GreekDataGuy
DataSeries


Photo by Catarina Sousa from Pexels

K-Means is an unsupervised clustering algorithm that assigns data points to groups based on similarity.

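It’s intuitive, easy to implement, and fast, and its cluster assignments are MECE (mutually exclusive and collectively exhaustive): every point lands in exactly one cluster.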

In my work, I’ve used it to help find clusters (i.e. choose labels) in situations where I wasn’t a domain expert.

Let’s walk through an example, understand how it works, and add it to our data science toolbox.

How does it work?

K-Means is an unsupervised model, so the data is unlabelled. Even without labels, the model mathematically assigns each data point to a cluster.
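To make "mathematically assigns" concrete, here is a minimal sketch of the assignment idea, assuming Euclidean distance (the usual choice for K-Means). The function name `assign_to_nearest_centroid` and the toy points and centroids are made up for illustration and are not from this article's dataset.

```python
import numpy as np

def assign_to_nearest_centroid(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    # Broadcasting gives an array of shape (n_points, n_centroids, n_features)
    diffs = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    # Euclidean distance from every point to every centroid
    distances = np.linalg.norm(diffs, axis=2)
    # The cluster of a point is the centroid with the smallest distance
    return np.argmin(distances, axis=1)

# Hypothetical toy data: four 2-D points and two centroids
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[1.0, 1.5], [8.5, 9.0]])

labels = assign_to_nearest_centroid(points, centroids)
print(labels)  # [0 0 1 1]
```

Each point gets exactly one label, which is why the resulting clusters are mutually exclusive and collectively exhaustive.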

When initializing the model, we must decide on the number of clusters in advance. Having to do this up front is a drawback of the model. I’ll choose k=2 (i.e. 2 clusters) for this example.
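As a rough sketch of what choosing k up front looks like in code, the snippet below uses scikit-learn's `KMeans` with `n_clusters=2`. This assumes scikit-learn is the library in use; the data `X` is synthetic (two well-separated blobs generated purely for illustration), not the data shown in this article.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic, illustrative data: two blobs of 50 points each in 2-D
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2)),
])

# k is fixed when the model is created; here k=2
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)  # one cluster index (0 or 1) per row of X

print(model.cluster_centers_)  # coordinates of the two learned centroids
print(np.bincount(labels))     # number of points assigned to each cluster
```

Setting `random_state` only makes the run repeatable. Which blob ends up as cluster 0 or cluster 1 is arbitrary.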

Here we have some 2-dimensional unlabelled data.

1. Randomly create centroids (cluster centres) in the same…
