Understanding K-means Clustering In Depth

Ipsita Shee
Published in Analytics Vidhya
4 min read · Dec 13, 2020
Photo by Pierre Bamin on Unsplash

In this article, we are going to see how K-means clustering actually works. I have taken a table of sample data points with their X and Y coordinates. We will assign these data points to a specified number of clusters and compute each cluster's centroid by doing the calculations manually.

Suppose we have the following 6 data points (as shown in the diagram).

Sample data points. (Image source: Author)

For K-means clustering, we first decide the value of “K”, the number of clusters. Suppose here we choose to assign the given data points to two clusters, that is, we choose K = 2.

Initially, we will take data point number 1 (185,72) and data point number 2 (170,56) as clusters “K1” and “K2” respectively.

“K1” and “K2” each contain only one point at the moment, so that single point is also the cluster's centroid. The values of the centroids are given in the table below.
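As a sketch of this starting state (the variable names are my own, not from the article, and I use (185, 72) for point 1, the value the later centroid calculations imply):

```python
# Sample data points from the article's table (point 1 .. point 6).
points = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]

# K = 2: the first two points seed the clusters, so each initial
# centroid is simply the single point its cluster contains.
clusters = {"K1": [points[0]], "K2": [points[1]]}
centroids = {"K1": points[0], "K2": points[1]}
```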

Image Source: Author

Now we will find the Euclidean distance between each data point and the two centroids. A data point will belong to the cluster whose centroid lies closer to it.

The formula for calculating Euclidean distance, d = √((x − a)² + (y − b)²), where (x,y) are the values of the initial centroids and (a,b) are the values of the data points. (Image source: Author)
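A direct translation of the formula into code (a hypothetical helper, not code from the article):

```python
import math

def euclidean(p, q):
    """Euclidean distance between two 2-D points p = (x, y) and q = (a, b)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# Distance of point 3 (168, 60) from the two initial centroids:
d1 = euclidean((168, 60), (185, 72))  # from K1's centroid
d2 = euclidean((168, 60), (170, 56))  # from K2's centroid
```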

Let us start calculating!

Our first data point, point 1 (185,72), is the only point in K1 and therefore also the centroid of cluster 1 (K1). The same can be said for our second point (170,56) with respect to cluster 2 (K2).

Now we will calculate the distance of point 3 from K1 and K2, and tabulate the results.

Calculating Euclidean Distances. (Image source: Author)

Since the distance of data point number 3 from K2 is less than its distance from K1, it will belong to the second cluster, which is K2.

Image source: Author

The cluster K2 now has two data points: point 2 (170,56) and point 3 (168,60). Now we will recalculate the centroid for K2.

Image source: Author

The new centroid for K2 is (169,58).
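The new centroid is the coordinate-wise mean of the cluster's points. A minimal sketch (helper name is my own):

```python
def centroid(cluster):
    """Coordinate-wise mean of a list of 2-D points."""
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    return (sum(xs) / len(cluster), sum(ys) / len(cluster))

k2 = [(170, 56), (168, 60)]   # point 2 and point 3
new_centroid = centroid(k2)   # (170 + 168) / 2 = 169, (56 + 60) / 2 = 58
```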

Now we will find the distance of data point 4 from the centroids of K1 and K2 (using K2's new centroid).

Image source: Author

Since the distance of data point number 4 from K1 is less than its distance from K2, it will belong to the first cluster, which is K1.

Image source: Author

The cluster K1 now has two data points: point 1 (185,72) and point 4 (179,68). Now we will recalculate the centroid for K1.

Image source: Author

The new centroid for K1 is (182,70).

Now we will calculate the distances of data point 5 (182,72) from the two centroids.

Image source: Author

We can clearly see that data point 5 belongs to K1. Now we will recalculate the centroid for K1.

Image source: Author

The new centroid for K1 is (182,71). Now we will calculate the distance for our sixth and last point (188,77).

Image source: Author

We can see that data point 6 belongs to the first cluster, that is K1. Now we recalculate the centroid for the K1 cluster.

Image source: Author

The new centroid for K1 is (185,74).
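One way to reproduce the article's centroid values exactly is a running update that averages the previous centroid with the newly added point. Note that with three or more points this weights earlier points differently than a plain mean would; the sketch below (helper name is my own) simply mirrors the arithmetic in the tables above:

```python
def update(old_centroid, new_point):
    """Average the previous centroid with the incoming point."""
    return ((old_centroid[0] + new_point[0]) / 2,
            (old_centroid[1] + new_point[1]) / 2)

c = (182, 70)             # K1 centroid after points 1 and 4
c = update(c, (182, 72))  # point 5 joins K1 -> (182.0, 71.0)
c = update(c, (188, 77))  # point 6 joins K1 -> (185.0, 74.0)
```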

The final clusters and centroids are as shown in the table below.

Image source: Author

Let us summarize the steps.

Image source: Author
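For completeness, the standard batch version of these steps (often called Lloyd's algorithm) reassigns every point to its nearest centroid and then recomputes each centroid from all of its members, repeating until the clusters stop changing. Because it recomputes from all members rather than updating one point at a time, its final K1 centroid differs slightly from the sequential values worked out above. A minimal sketch, with function and variable names of my own:

```python
import math

def kmeans(points, centroids, iters=10):
    """Plain Lloyd's algorithm: assign each point to its nearest
    centroid, recompute centroids as cluster means, repeat."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            for cl in clusters if cl  # skip empty clusters
        ]
    return clusters, centroids

points = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
clusters, centroids = kmeans(points, [points[0], points[1]])
```

On this data the batch version converges to the same two groups as the manual walkthrough: points 1, 4, 5, and 6 in one cluster and points 2 and 3 in the other, with the second cluster's centroid at (169, 58) as before.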

I hope this helps you gain a better understanding of K-means clustering.

Before You Go

Thanks for reading! If you want to get in touch with me, feel free to reach me on ipsitashee4@gmail.com or my LinkedIn Profile.
