An Introduction to K-Means Clustering

Sean Gahagan
3 min read · Oct 20, 2022


My last note looked at how neural networks can be used for classification models. This week, we’ll step away from supervised learning (which was the focus of the last 7 notes) and look at a specific type of unsupervised learning called K-means clustering.

What is K-Means Clustering?

K-means clustering is a very simple algorithm that partitions a data set into K groups.

How does K-Means work?

For a really great visual of how K-means works, I recommend watching the first 3 minutes and 45 seconds of Andrew Ng’s lecture on the topic.

It’s one of the simplest machine learning algorithms, and it takes only a few steps:

  1. Take all of the data points that you want to cluster/segment.
  2. Cluster Initialization: Randomly place K “cluster centroids” into the data space with your data points. A common choice is to start each centroid at the location of a randomly chosen data point.
  3. Cluster Assignment: Assign each data point to the cluster with the closest centroid.
  4. Move Centroids: Take the average of each cluster’s data points’ locations to find the center of that cluster, then move that cluster’s centroid to the center of the cluster.
  5. Repeat steps 3 & 4 until the clusters’ memberships stop changing (i.e., until the clusters stabilize).
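The steps above can be sketched in a few lines of NumPy. This is an illustrative implementation under simple assumptions (Euclidean distance, centroids initialized at random data points), not production code:

```python
import numpy as np

def kmeans(data, k, max_iters=100, seed=0):
    """Minimal K-means sketch following the steps above."""
    points = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2: initialize centroids at K distinct random data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    assignments = None
    for _ in range(max_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_assignments = distances.argmin(axis=1)
        # Step 5: stop once cluster memberships stabilize.
        if assignments is not None and np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = points[assignments == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, assignments
```

On a toy data set with two well-separated blobs, this converges in a couple of iterations and assigns each blob to its own cluster.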

What can K-Means be used for?

K-means can be used to segment many types of data sets. Even if a data set does not have naturally separated clusters with space in between them, you can use K-means to create clusters based on the different dimensions/features of your data, as is often the case in segmenting markets, customers, or products.

For example, you could use K-means to create a behavioral segmentation of a business’ customers based on features like average weekly spend, average purchase cost, and frequency of purchases.
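As a small sketch of that setup (the feature values below are entirely made up for illustration), you would assemble a customer-by-feature matrix and typically standardize each feature before clustering, so that large dollar-valued features don’t dominate the distance calculation over purchase frequency:

```python
import numpy as np

# Hypothetical customer feature matrix (made-up values).
# Columns: avg weekly spend ($), avg purchase cost ($), purchases per week.
customers = np.array([
    [250.0, 50.0, 5.0],
    [240.0, 48.0, 5.0],
    [ 30.0, 15.0, 2.0],
    [ 25.0, 12.0, 2.0],
])

# Standardize each feature to zero mean and unit variance so all
# features contribute comparably to the Euclidean distances K-means uses.
scaled = (customers - customers.mean(axis=0)) / customers.std(axis=0)
```

The `scaled` matrix is what you would then feed into K-means.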

How do I know it worked?

Before applying K-means to a data set, it’s always good to try to visualize the data. Afterwards, you can visualize the clustered data to see if the clustering passes the common-sense test.

You can run K-means multiple times on your data set to see if you end up with roughly the same clusters.

You can try different values of K to see if one works better for your data set than another. For example, maybe when you use 4 clusters instead of 3, you end up with clusters that are more stable (i.e., more similar from one run to the next).
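One simple way to quantify “roughly the same clusters” across runs is the Rand index: the fraction of point pairs that two runs treat the same way (grouped together in both, or kept apart in both). Cluster label names don’t need to match between runs. A minimal sketch:

```python
import numpy as np

def clustering_agreement(labels_a, labels_b):
    """Rand index: fraction of point pairs on which two clusterings agree."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    n = len(a)
    # For each pair (i, j): are i and j in the same cluster?
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    # Count agreeing pairs, excluding the trivial diagonal (i == i).
    agree = (same_a == same_b).sum() - n
    total = n * (n - 1)
    return agree / total
```

Two runs that produce identical groupings score 1.0 even if the cluster numbers are swapped; unstable runs score lower.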

Scaling Up

In Andrew Ng’s video, the data is clustered based on 2 dimensions (i.e., two features), but K-means will work with as many features as you would like to include in your segmentation. The tradeoff is that higher-dimensional clusterings become more difficult to visualize.

Up Next

The next note in this series will look at how supervised learning can be used for anomaly detection.

Past notes in this series:

  1. Towards a High-Level Understanding of Machine Learning
  2. Building Intuition around Supervised Machine Learning with Gradient Descent
  3. Helping Supervised Learning Models Learn Better & Faster
  4. The Sigmoid function as a conceptual introduction to activation and hypothesis functions
  5. An Introduction to Classification Models
  6. Overfitting, and avoiding it with regularization
  7. An Introduction to Neural Networks
  8. Classification Models using Neural Networks
