Understand K-Means Classification Algorithm

Understand the K-Means model by creating one from scratch

Andrew Zhu (Shudong Zhu)
Published in CodeX
6 min read · May 24, 2022


K-Means classification

K-Means is an unsupervised machine learning model, typically used to partition observed data into k clusters. You give the model a set of data with defined features and tell it how many clusters you want. The model then groups the dataset into that number of clusters.

Because K-Means is unsupervised, you don’t have to label your training dataset; the model groups the input data automatically.

In this article, you will read:

  1. How the K-Means model works.
  2. How to use the K-Means model from the scikit-learn package.
  3. How to build a K-Means model from scratch using Python.

How K-Means works

The underlying idea is simple and straightforward, yet the results are impressive. The core steps of k-means are:

  1. Guess some center points.
  2. Repeat until the center points stop changing:
    2.1 Assign each point to its nearest current center;
    2.2 Set each center to the mean of the points assigned to it.
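The loop above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the implementation built later in the article: the function name `kmeans`, the `max_iter` safety cap, the toy data, and the choice of Euclidean distance (`math.dist`) are all assumptions made for the example.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Plain k-means loop; `centers` holds the initial guesses (step 1)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # 2.1 Assign every point to its nearest current center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # 2.2 Move each center to the mean of its assigned points.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(center)  # empty cluster: leave center in place
        if new_centers == centers:  # no center moved, so the loop has converged
            break
        centers = new_centers
    return centers, clusters

# Hypothetical toy data: two obvious groups and two rough initial guesses.
toy_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centers, final_clusters = kmeans(toy_points, centers=[(0, 0), (10, 10)])
print(final_centers)
```

Passing the initial centers in explicitly keeps the sketch deterministic; in practice they are usually chosen at random.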

Let’s go through the process step by step with a concrete example. Assume I have 9 data points:

points = [[2, 3], [3, 4], [1, 2],
          [10, 12], [12, 10], [13, 11],
          [10, 3], [11, 2], [12, 4]]

Plot these points:

initialize 9 points

Say I want to group the 9 points into 3 clusters.

Step 1. Randomly generate 3 points to serve as the initial centers of the desired clusters. At this stage, these orange dots are obviously not the final center points.
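One possible way to produce such random starting centers is to draw them uniformly from the bounding box of the data. The article does not say how its random points were generated, so the helper name `random_centers`, the bounding-box strategy, and the `seed` parameter below are assumptions for illustration only.

```python
import random

points = [[2, 3], [3, 4], [1, 2],
          [10, 12], [12, 10], [13, 11],
          [10, 3], [11, 2], [12, 4]]

def random_centers(points, k, seed=None):
    """Draw k random centers inside the bounding box of the data (assumed strategy)."""
    rng = random.Random(seed)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [[rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys))]
            for _ in range(k)]

# The seed is fixed only so that repeated runs give the same starting centers.
centers = random_centers(points, k=3, seed=42)
print(centers)
```

Another common choice is to pick k of the data points themselves as the initial centers, for example with `random.sample(points, k)`.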

Randomly give 3 center points

Step 2. Iterate through all the points (excluding the center points) and calculate each point’s distance to the three center points. Find the nearest center by measuring…
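A minimal sketch of this assignment step, assuming straight-line (Euclidean) distance as the measure. The three `centers` are hypothetical guesses chosen for illustration, not the orange dots from the figure, and `nearest_center` is a name introduced here.

```python
import math

points = [[2, 3], [3, 4], [1, 2],
          [10, 12], [12, 10], [13, 11],
          [10, 3], [11, 2], [12, 4]]

# Hypothetical center guesses, chosen only for illustration.
centers = [[4, 6], [9, 9], [8, 2]]

def nearest_center(point, centers):
    """Return the index of the center closest to `point` (Euclidean distance assumed)."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))

# One label per point: the index of its nearest center.
labels = [nearest_center(p, centers) for p in points]
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```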
