K-Means Clustering for Customer Classification

Sanjoli Chauhan
Capillary Data Science
Mar 26, 2020

Analysing data and generating forward-looking insights from it has become a standard way to grow a business. The K-Means algorithm is one technique for clustering, or segmenting, groups that show similar behaviour.

What is clustering?

Clustering means dividing the given data into segments. The members of a segment show some kind of behavioural similarity, which helps us analyse the data of that segment and relate it to the issues being addressed. Data can likewise be segmented on the basis of dissimilarity, so that points in different segments differ from one another.

Why is clustering used?

Once the data set is divided into clusters, it is easier to analyse. As mentioned earlier, the data in each cluster shows homogeneous behaviour, and this behaviour can be studied further for different purposes. In this case we are considering the retail market, so clustering helps us understand signals such as purchase patterns and other customer-related key performance indicators. This understanding eventually leads to different types of campaigns or plans for better customer engagement and retention.

How is it done?

Various clustering algorithms are available. K-Means is one of the most widely used. It is an unsupervised algorithm: it groups data points by similarity without needing labelled examples.

K-Means Algorithm

K-Means is an iterative algorithm that divides the given data set into K predefined, distinct clusters. It assigns data points to clusters so that the sum of the squared distances between the data points and their cluster's centroid is as small as possible. The less variation we have within clusters, the more homogeneous the data points are within the same cluster.
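
To make that quantity concrete, here is a minimal sketch in plain Python of the within-cluster sum of squared distances. The function names and the four sample customers (visits per month, average spend) are illustrative assumptions, not part of the original data set.

```python
def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def within_cluster_ss(clusters):
    """Sum of squared Euclidean distances from each point to its cluster's centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)
    return total


# Hypothetical customers described by (visits per month, average spend).
tight = [[(1, 10), (2, 12)], [(8, 90), (9, 88)]]   # similar customers grouped together
loose = [[(1, 10), (8, 90)], [(2, 12), (9, 88)]]   # dissimilar customers mixed

print(within_cluster_ss(tight), within_cluster_ss(loose))  # 5.0 6137.0
```

Grouping similar customers together gives a far smaller value than mixing them, which is what "less variation within clusters" means in practice.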

The algorithm starts by randomly generating K centroids, which are the starting points of the K clusters. Being an iterative algorithm, K-Means then repeatedly assigns each point to its nearest centroid and recomputes the centroids, so that their positions are optimised and stabilised. The process stops when either the centroids have stabilised or a predefined number of iterations is reached.
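
The loop described above can be sketched in a few lines of plain Python. This is an illustrative version, not the code used for the analysis: the function names, the `seed` parameter and the choice to draw the starting centroids from the data points themselves are all assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def k_means(points, k, max_iterations=100, seed=0):
    """Return (centroids, labels) after iterating until stable or max_iterations."""
    rng = random.Random(seed)
    # Assumption: random starting centroids are drawn from the data itself.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centroids.append(mean_point(members) if members else centroids[j])
        if new_centroids == centroids:  # centroids have stabilised
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical two-dimensional points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centroids, labels = k_means(points, k=2)
print(labels)
```

Which group receives label 0 depends on the random start, so only the grouping itself is meaningful, not the label numbers.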

Sample Case of Application

We are provided with the data of a pharma retailer that has multiple stores globally. In this case our analysis, and therefore the clustering, is based on different parameters of individual customers along with stores.

The flow of the analysis would be as follows:

· Customer Level Clustering

· Graphical Representation

· Behavioural Analysis

Customer level clustering

From the data set provided, we feed all the relevant customer-wise information into the K-Means code, along with further settings such as the number of iterations. The output gives us distinct clusters, each containing a set of customers who show similar behaviour.
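
As a rough illustration of what "customer-wise information" might look like before it reaches the algorithm, the sketch below rolls hypothetical transaction records up into one feature vector per customer. The record layout (customer id, store id, items, amount), the three chosen features and the standardisation step are assumptions made for this example. Standardising is included because K-Means relies on distances, so a feature with a large numeric range would otherwise dominate.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records: (customer_id, store_id, items_in_basket, amount_spent)
transactions = [
    ("C1", "S1", 2, 20.0),
    ("C1", "S1", 4, 40.0),
    ("C2", "S2", 10, 300.0),
    ("C3", "S1", 3, 30.0),
]


def customer_features(transactions):
    """One vector per customer: (visits, average items, average amount)."""
    per_customer = defaultdict(list)
    for customer_id, _store_id, items, amount in transactions:
        per_customer[customer_id].append((items, amount))
    return {
        cid: (len(rows), mean(r[0] for r in rows), mean(r[1] for r in rows))
        for cid, rows in per_customer.items()
    }


def standardise(features):
    """Rescale every feature to zero mean and unit variance across customers."""
    ids = list(features)
    columns = list(zip(*(features[cid] for cid in ids)))
    scaled = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        scaled.append([(v - mu) / sigma if sigma else 0.0 for v in col])
    return {cid: tuple(col[i] for col in scaled) for i, cid in enumerate(ids)}


features = customer_features(transactions)
vectors = standardise(features)
print(features["C1"])  # raw vector for customer C1
```

The values of `vectors` are what would be handed to a K-Means routine, together with K and the maximum number of iterations.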

Graphical Representation of customers

Behavioural Analysis

The above graph shows the different clusters obtained, each consisting of customers of a single enterprise. On further analysis, the customers in each segment are studied on the basis of the behaviour they demonstrate while shopping. Each cluster of customers is studied using key performance indicators like:

· Average Basket Size

· Average Transaction Value

· Preferred Stores
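
The indicators listed above can be computed per cluster once every customer has a cluster label. The sketch below assumes the following definitions: average basket size is items per transaction, average transaction value is amount per transaction, and the preferred store is the store with the most transactions. The record layout and the `cluster_of` mapping are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_kpis(transactions, cluster_of):
    """Summarise each cluster's transactions into the three KPIs."""
    grouped = defaultdict(list)
    for customer_id, store_id, items, amount in transactions:
        grouped[cluster_of[customer_id]].append((store_id, items, amount))
    kpis = {}
    for cluster, rows in grouped.items():
        n = len(rows)
        kpis[cluster] = {
            "average_basket_size": sum(r[1] for r in rows) / n,
            "average_transaction_value": sum(r[2] for r in rows) / n,
            # Store with the most transactions in this cluster.
            "preferred_store": Counter(r[0] for r in rows).most_common(1)[0][0],
        }
    return kpis


# Hypothetical records: (customer_id, store_id, items_in_basket, amount_spent)
sample_transactions = [
    ("C1", "S1", 2, 20.0),
    ("C1", "S1", 4, 40.0),
    ("C2", "S2", 10, 300.0),
    ("C3", "S1", 3, 30.0),
]
# Hypothetical cluster labels, e.g. as produced by a K-Means run.
sample_cluster_of = {"C1": 0, "C2": 1, "C3": 0}

for cluster, values in sorted(cluster_kpis(sample_transactions, sample_cluster_of).items()):
    print(cluster, values)
```

Comparing these summaries side by side is one way to tell a low-spending segment from a high-spending one.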

Conclusion

Segmentation is one of the most important aspects of analysing customer behaviour. Having homogeneous segments of customers helps us target the right audience with campaigns. We can also target a cluster with product-wise offers and encourage the low-spending segment to shift to the higher-spending segments.

Customer segmentation would also be beneficial in increasing ROI and overall brand performance in the market. By targeting the right set of customers, we would be able to increase campaign propensity and campaign effectiveness, which would eventually lead to customer satisfaction.
