Customer Segmentation Using K-Means Clustering in R

A beginner’s guide to the great and powerful k-means algorithm

Jason Chong
The Startup

Photo by Daniel Bernard on Unsplash

In this article, we will discuss k-means clustering, an unsupervised learning algorithm, and learn how to implement it in R.

Introduction to unsupervised learning and k-means clustering

First of all, what is unsupervised learning?

In supervised learning, every training example comes with a known label (output) that the model learns to predict. In unsupervised learning there are no labels: the algorithm identifies clusters within the data on its own, and those clusters can then be labelled accordingly.

K-means clustering is an example of an unsupervised learning algorithm and it works as follows:

  1. Choose the number of clusters, K, into which the data are to be divided (this is what the k stands for in k-means clustering)
  2. Arbitrarily assign K cluster centres
  3. Assign each data point to the cluster centre nearest to it, most commonly measured using the Euclidean distance
  4. Once all the data points have been assigned to their nearest cluster centres, calculate the centroid of each cluster, using the mean values of the data points…
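
To make the four steps listed above concrete, here is a minimal sketch of a single pass through them in base R. It is an illustration only, not the implementation used later in this article: the toy data and the names `x`, `centres`, `dists`, `assignment` and `new_centres` are assumptions made up for this example.

```r
# Toy data (made up for illustration): two numeric features for six customers
set.seed(42)
x <- rbind(
  c(1.0, 1.2), c(0.8, 1.0), c(1.2, 0.9),
  c(5.0, 5.1), c(5.2, 4.8), c(4.9, 5.3)
)

# Step 1: choose the number of clusters
K <- 2

# Step 2: pick K arbitrary cluster centres (here, K randomly chosen rows of x)
centres <- x[sample(nrow(x), K), , drop = FALSE]

# Step 3: Euclidean distance from every point to every centre,
# giving a matrix with one row per point and one column per centre
dists <- sapply(seq_len(K), function(k) {
  sqrt(rowSums(sweep(x, 2, centres[k, ])^2))
})

# Each point joins the cluster whose centre is nearest
assignment <- apply(dists, 1, which.min)

# Step 4: recompute each centre as the mean of the points assigned to it
new_centres <- t(sapply(seq_len(K), function(k) {
  colMeans(x[assignment == k, , drop = FALSE])
}))

print(assignment)
print(new_centres)
```

In practice you would rarely write this by hand: base R ships a `kmeans()` function in the `stats` package that runs the full algorithm, for example `kmeans(x, centers = 2)`.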
