ML: K-Means Clustering

Jeheonpark · Published in The Startup · 4 min read · Sep 23, 2020

K-means is a partitional clustering method: it partitions n data points into k clusters. The term may sound redundant, since clustering already means partitioning the data, but it marks a distinction. Partitional clustering works on the whole dataset from the start to find the k partitions, whereas hierarchical clustering builds up from single points. Now, let’s look at K-means.

K-means

K-means is simply a search for the k centroids of the clusters. A centroid is the coordinate-wise average of the data points in a cluster. The initialization of the centroids matters a great deal; I will explain that later, and for now we start with random initialization. We pick the first centroids randomly and assign each data point to the centroid closest to it, giving us k groups. We then compute a new centroid for each group by averaging coordinates, reassign the data points, and repeat until the centroids stop changing.

  1. Initialization (randomly pick k points as the centroids)
  2. Assign each data point to its closest centroid.
  3. Calculate the new centroids by averaging coordinates (statistics other than the mean can be used, as in K-medoid).
  4. Repeat 2 and 3 until convergence (the centroids no longer change).

Note: Using K-means with categorical values is not recommended, because defining a distance and a centroid for categorical data is not straightforward. => K-medoid/PAM can be used instead, since it picks an actual data point as the cluster representative.
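To illustrate why a medoid sidesteps the centroid problem: the medoid is the cluster member that minimizes total distance to the others, so it only needs a dissimilarity function, not a mean. The helper below is a hypothetical sketch of mine (not from PAM itself), using Hamming distance on categorical tuples.

```python
def medoid(points, dist):
    """Return the member of `points` minimizing total distance to all others."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))

# Hamming distance: count of positions where two tuples differ.
# It works on categorical features, where an arithmetic mean is meaningless.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

cats = [("red", "S"), ("red", "M"), ("blue", "S")]
print(medoid(cats, hamming))  # → ('red', 'S')
```

Because the representative is always a real data point, the same assign/update loop as K-means can run on any dissimilarity measure.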


Jeheon Park, Software Engineer at Kakao in South Korea