Understanding K-Means Clustering In Depth
In this article, we will see how K-means clustering actually works. I have taken a table of sample data points and their coordinates (X and Y values). We will assign these data points to a specified number of clusters and compute each cluster's centroid by working through the calculations manually.
Suppose we have the following six data points (as shown in the diagram): 1 (185,72), 2 (170,56), 3 (168,60), 4 (179,68), 5 (182,72) and 6 (188,77).
For K-means clustering, we first choose the value of “K”, the number of clusters. Here we choose to assign the given data points to two clusters, that is, K = 2.
Initially, we take data point 1 (185,72) and data point 2 (170,56) as clusters “K1” and “K2” respectively.
At this stage, “K1” and “K2” each contain only one point, and that point is also the cluster's centroid: (185,72) for K1 and (170,56) for K2. The values of the centroids are given in the table below.
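To follow along in code, here is a minimal Python sketch of this starting state. The names `points`, `clusters` and `centroids` are our own illustrative choices; the coordinates are the six data points used in this walkthrough.

```python
# The six data points used in this walkthrough: point number -> (X, Y)
points = {
    1: (185, 72),
    2: (170, 56),
    3: (168, 60),
    4: (179, 68),
    5: (182, 72),
    6: (188, 77),
}

K = 2  # number of clusters

# Seed K1 with point 1 and K2 with point 2; each seed is its cluster's centroid
clusters = {"K1": [1], "K2": [2]}
centroids = {"K1": points[1], "K2": points[2]}

print(centroids)  # {'K1': (185, 72), 'K2': (170, 56)}
```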
Next, for each remaining data point we find its Euclidean distance to the centroid of each of the two clusters. The point is assigned to the cluster whose centroid is closer to it.
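The Euclidean distance between a point (x1, y1) and a centroid (x2, y2) is sqrt((x1 - x2)^2 + (y1 - y2)^2). Below is a minimal sketch of the distance and the "closest centroid wins" rule; `euclidean` and `nearest_cluster` are hypothetical helper names. If two distances were exactly equal, this sketch would pick the cluster listed first, an arbitrary tie-break of our own.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two 2-D points given as (x, y)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def nearest_cluster(point, centroids):
    """Return the name of the cluster whose centroid is closest to `point`."""
    return min(centroids, key=lambda name: euclidean(point, centroids[name]))

# Which initial centroid is closest to point 3 (168, 60)?
print(nearest_cluster((168, 60), {"K1": (185, 72), "K2": (170, 56)}))  # K2
```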
Let us start calculating!
Our first data point is 1 (185,72), which is the only point in cluster 1 (K1) and also its centroid. The same holds for our second point, 2 (170,56), with respect to cluster 2 (K2).
Now we calculate the distance of point 3 (168,60) from the centroids of K1 and K2 and tabulate the results.
Since data point 3 is closer to the centroid of K2 than to that of K1, it belongs to the second cluster, K2.
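The two distances for point 3 can be checked with Python's `math.dist` (available from Python 3.8):

```python
import math

k1 = (185, 72)  # centroid of K1 (point 1)
k2 = (170, 56)  # centroid of K2 (point 2)
p3 = (168, 60)

d1 = math.dist(p3, k1)  # sqrt(17**2 + 12**2) = sqrt(433)
d2 = math.dist(p3, k2)  # sqrt(2**2 + 4**2) = sqrt(20)
print(round(d1, 2), round(d2, 2))  # 20.81 4.47
```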
Cluster K2 now has two data points, point 2 (170,56) and point 3 (168,60), so we recalculate its centroid as the mean of their X values and the mean of their Y values.
The new centroid for K2 is (169,58).
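The centroid is simply the mean of the X values and the mean of the Y values of the cluster's points. A small sketch (the `centroid` helper name is our own):

```python
def centroid(pts):
    """Mean X and mean Y of a list of (x, y) points."""
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

# K2 holds points 2 and 3
print(centroid([(170, 56), (168, 60)]))  # (169.0, 58.0)
```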
Next, we find the distance of data point 4 (179,68) from the centroids of K1 and K2 (using K2's new centroid).
Since the distance of data point 4 from K1 is less than its distance from K2, it belongs to the first cluster, K1.
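The same check for point 4, now measured against K2's updated centroid (169,58), assuming Python 3.8+ for `math.dist`:

```python
import math

centroids = {"K1": (185, 72), "K2": (169, 58)}  # K2 uses its updated centroid
p4 = (179, 68)

# Distance from point 4 to each centroid, then the closest cluster
distances = {name: math.dist(p4, c) for name, c in centroids.items()}
print({name: round(d, 2) for name, d in distances.items()})  # {'K1': 7.21, 'K2': 14.14}
print(min(distances, key=distances.get))  # K1
```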
Cluster K1 now has two data points, point 1 (185,72) and point 4 (179,68), so we recalculate its centroid.
The new centroid for K1 is (182,70).
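The K1 update can be verified with `statistics.fmean` from the standard library (Python 3.8+):

```python
from statistics import fmean

k1_points = [(185, 72), (179, 68)]  # points 1 and 4
xs, ys = zip(*k1_points)            # separate the X values from the Y values
new_k1 = (fmean(xs), fmean(ys))
print(new_k1)  # (182.0, 70.0)
```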
Now we calculate the distance of data point 5 (182,72) from the centroids of K1 and K2.
Data point 5 is much closer to the centroid of K1 than to that of K2, so it belongs to K1. Now we recalculate the centroid for K1.
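For point 5, this sketch computes both distances and then K1's new centroid from points 1, 4 and 5. The mean Y value works out to 212/3, so it is not a whole number:

```python
import math
from statistics import fmean

centroids = {"K1": (182, 70), "K2": (169, 58)}
p5 = (182, 72)

# Distance from point 5 to each current centroid
distances = {name: math.dist(p5, c) for name, c in centroids.items()}
print(distances["K1"], round(distances["K2"], 2))  # 2.0 19.1

# K1 now holds points 1, 4 and 5; its centroid is their mean
xs, ys = zip((185, 72), (179, 68), p5)
new_k1 = (fmean(xs), fmean(ys))
print(round(new_k1[0], 2), round(new_k1[1], 2))  # 182.0 70.67
```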
The new centroid for K1 is the mean of points 1, 4 and 5, (182, 70.67), which rounds to (182,71). Now we calculate the distance of our sixth and last point (188,77) from the two centroids.
Data point 6 is closer to the centroid of K1, so it belongs to the first cluster, K1. Now we recalculate the centroid for K1.
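The distances for point 6, keeping K1's centroid unrounded at (182, 212/3) rather than (182,71); with the rounded value the assignment is the same:

```python
import math

centroids = {"K1": (182, 212 / 3), "K2": (169, 58)}  # K1 centroid kept unrounded
p6 = (188, 77)

distances = {name: math.dist(p6, c) for name, c in centroids.items()}
print({name: round(d, 2) for name, d in distances.items()})  # {'K1': 8.72, 'K2': 26.87}
```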
The new centroid for K1 is the mean of its four points (1, 4, 5 and 6): ((185+179+182+188)/4, (72+68+72+77)/4) = (183.5, 72.25).
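After point 6 joins, K1 holds points 1, 4, 5 and 6, and its centroid is the mean of all four. Averaging only the previous centroid (182,71) with the new point (188,77) would give (185,74), which is not the mean of the cluster's points. The sketch below computes the mean directly and also with the equivalent incremental update c_new = c_old + (x - c_old) / n, where n is the cluster size after adding x:

```python
from statistics import fmean

# Direct mean of K1's four points
k1_points = [(185, 72), (179, 68), (182, 72), (188, 77)]
xs, ys = zip(*k1_points)
batch_mean = (fmean(xs), fmean(ys))
print(batch_mean)  # (183.5, 72.25)

# Incremental update from the previous (unrounded) centroid
c_old = (182, 212 / 3)   # exact mean of points 1, 4 and 5
x, n = (188, 77), 4      # new point and cluster size after adding it
incremental = tuple(c + (xi - c) / n for c, xi in zip(c_old, x))
print(tuple(round(v, 2) for v in incremental))  # (183.5, 72.25)
```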
The final clusters are K1 = {1, 4, 5, 6} and K2 = {2, 3}; the clusters and their centroids are shown in the table below.
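Let us summarize the steps. First, choose K, the number of clusters (here K = 2). Second, take the first K data points as the initial clusters, each point serving as its cluster's centroid. Then, for each remaining point, compute its Euclidean distance to every current centroid, assign it to the cluster with the nearest centroid, and recompute that cluster's centroid as the mean of all its points. Continue until every point has been assigned.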
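The whole procedure can be sketched as a single Python function. This is a minimal sketch of the one-pass, point-by-point procedure walked through in this article; `sequential_kmeans` is a hypothetical name. The commonly used batch form of K-means (Lloyd's algorithm) instead reassigns every point and recomputes every centroid repeatedly until the assignments stop changing, which this sketch does not do.

```python
import math

def sequential_kmeans(points, k):
    """One pass: seed k clusters with the first k points, then assign each
    remaining point to the nearest centroid and update that centroid."""
    clusters = [[p] for p in points[:k]]
    centroids = [tuple(map(float, p)) for p in points[:k]]
    for p in points[k:]:
        # Index of the nearest centroid by Euclidean distance
        i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
        clusters[i].append(p)
        # Recompute that centroid as the mean of all points in the cluster
        n = len(clusters[i])
        centroids[i] = (sum(x for x, _ in clusters[i]) / n,
                        sum(y for _, y in clusters[i]) / n)
    return clusters, centroids

data = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
clusters, centroids = sequential_kmeans(data, k=2)
print(clusters)   # [[(185, 72), (179, 68), (182, 72), (188, 77)], [(170, 56), (168, 60)]]
print(centroids)  # [(183.5, 72.25), (169.0, 58.0)]
```

On the six points from this article it reproduces the final clusters K1 = {1, 4, 5, 6} and K2 = {2, 3}.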
I hope this helps you better understand K-means clustering.
Before You Go
Thanks for reading! If you want to get in touch with me, feel free to reach me at ipsitashee4@gmail.com or on my LinkedIn profile.