Customer Segmentation Using K-Means Clustering in R

A beginner’s guide to the great and powerful k-means algorithm

Published in The Startup

7 min read · Jan 15, 2021

In this article, we will discuss k-means clustering, an unsupervised learning algorithm, and learn how to implement it in R.

Introduction to unsupervised learning and k-means clustering

First of all, what is unsupervised learning?

In contrast to supervised learning, where the label (output) for each observation is specified in advance, unsupervised learning lets the algorithm identify clusters within the data itself and then label the observations accordingly.

K-means clustering is an example of an unsupervised learning algorithm and it works as follows:

Choose the number of clusters, K (this is what the k in k-means stands for), into which the data are to be divided
Arbitrarily choose K initial cluster centres
Assign each data point to its nearest cluster centre, most commonly measured using the Euclidean distance formula
Once all the data points have been assigned to a cluster centre, recalculate the centroid of each cluster, using the mean values of the data points…
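
The steps above can be sketched in a few lines of base R. This is only a minimal illustration under stated assumptions, not the article's own implementation: the data are simulated (two made-up groups of points, not real customer data), the names x, k, centres and cluster are hypothetical, the loop is capped at 10 passes, and empty clusters are not handled. The last lines call base R's built-in kmeans() function on the same data for comparison.

```r
# Minimal sketch of the k-means steps on simulated 2-D data (illustrative only).
set.seed(42)  # make the random start reproducible

# Two made-up groups of 50 points each; NOT real customer data
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2)
)

k <- 2                                             # Step 1: choose K
centres <- x[sample(nrow(x), k), , drop = FALSE]   # Step 2: arbitrary initial centres

for (iter in 1:10) {
  # Step 3: squared Euclidean distance from every point to every centre,
  # then assign each point to the nearest centre
  d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centres[j, ])^2))
  cluster <- max.col(-d, ties.method = "first")

  # Step 4: recompute each centroid as the mean of its assigned points
  # (assumes no cluster ends up empty)
  new_centres <- t(sapply(seq_len(k), function(j) {
    colMeans(x[cluster == j, , drop = FALSE])
  }))

  # Stop early once the centres no longer move
  if (all(abs(new_centres - centres) < 1e-8)) break
  centres <- new_centres
}

# The same idea using base R's built-in function, with 10 random starts
fit <- kmeans(x, centers = k, nstart = 10)
fit$centers
```

Because the initial centres are picked at random, a single hand-rolled run can settle on a poor solution; the nstart argument of kmeans() repeats the procedure from several random starts and keeps the best result.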

Written by Jason Chong