Customer Segmentation Using K-Means Clustering in R
A beginner’s guide to the great and powerful k-means algorithm
7 min read · Jan 15, 2021
In this article, we will discuss k-means clustering, an unsupervised learning algorithm, and learn how to implement it in R.
Introduction to unsupervised learning and k-means clustering
First of all, what is unsupervised learning?
In supervised learning, each training example comes with a label (the output the model should predict) that is specified in advance. In unsupervised learning there are no labels: the algorithm identifies clusters within the data on its own, and those clusters can then be labelled accordingly.
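To make the distinction concrete, here is a minimal sketch in R. It assumes the built-in `iris` data set as a stand-in (it is not the customer data this article is about) and uses base R's `kmeans()` function. The species column plays the role of a label that a supervised model would be given, and it is deliberately withheld from the algorithm.

```r
# Built-in iris data: four numeric measurements plus a Species label.
# (Assumption: iris is only a stand-in for illustration.)
data(iris)

# Unsupervised: drop the label column so the algorithm only sees the features.
features <- iris[, 1:4]

# k-means starts from random centres, so fix the seed for reproducibility.
set.seed(42)
fit <- kmeans(features, centers = 3)

# The algorithm returns its own cluster number (1, 2 or 3) for every row.
head(fit$cluster)

# Cross-tabulate against the withheld labels, purely to inspect the result.
table(cluster = fit$cluster, species = iris$Species)
```

The cluster numbers are arbitrary identifiers: cluster 1 has no built-in meaning, and it is up to the analyst to interpret and name each group afterwards.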
K-means clustering is an example of an unsupervised learning algorithm, and it works as follows:
- Choose the number of clusters, K (this is what the "k" in k-means stands for), into which the data are to be divided
- Arbitrarily assign K initial cluster centres
- Assign each data point to the cluster centre nearest to it, most commonly measured by Euclidean distance
- Once all the data points have been clustered according to their cluster centres, calculate the centroid of each cluster, using the mean values of the data points…
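The steps listed above can be sketched by hand in a few lines of R. This is a rough illustration rather than a production implementation: the toy data, the variable names (`x`, `centres`, `assignment`) and the stopping rule are all assumptions made for the example. In particular, repeating the assign-and-update steps until the centres stop moving is the usual way to finish the procedure, and is assumed here.

```r
# Toy data (hypothetical): 60 points with two numeric features,
# simulated around three different means.
set.seed(1)
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.5), ncol = 2)
)

# Step 1: choose the number of clusters, K.
k <- 3

# Step 2: pick K data points at random as the initial cluster centres.
centres <- x[sample(nrow(x), k), , drop = FALSE]

for (iter in 1:100) {
  # Step 3: squared Euclidean distance from every point to every centre,
  # then assign each point to its nearest centre.
  d <- sapply(1:k, function(j) rowSums(sweep(x, 2, centres[j, ])^2))
  assignment <- apply(d, 1, which.min)

  # Step 4: recompute each centre as the mean of the points assigned to it.
  new_centres <- centres
  for (j in 1:k) {
    members <- x[assignment == j, , drop = FALSE]
    if (nrow(members) > 0) new_centres[j, ] <- colMeans(members)
  }

  # Assumed stopping rule: finish once the centres no longer move.
  if (all(abs(new_centres - centres) < 1e-8)) break
  centres <- new_centres
}

centres            # final cluster centres, one row per cluster
table(assignment)  # number of points in each cluster
```

In practice, base R's `kmeans()` function does this work for you, and its `nstart` argument reruns the algorithm from several random starting points to reduce the effect of an unlucky initialisation.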