Customer Segmentation using K-Means Clustering

2 min read · May 11, 2022

Learn and implement the k-means algorithm for customer segmentation in less than 5 minutes

Simple and short definition: “Identify similarity among customers based on different attributes and group them together”

Next, a few essential notes on customer segmentation are provided, followed by the Python code.

Types of customer segmentation

RBC of Customer Segmentation

Reason

Failure to understand target audience
Ineffective marketing campaigns, marketing mix and product mix
Limited resources and non-optimised cost allocation
Lack of activity prioritisation
Lack of guidance in product development resulting in longer time-to-market

Benefits

Better understanding of the target audience, leading to improved order fulfilment
Improved customer response, higher customer engagement, loyalty and retention, better brand visibility
Explore and grab new business opportunities
Improved customer lifetime value and an optimised sales funnel

Cautions

Segments must align with business goals and context
Lack of data quality or data volume can lead to inferior segments
Segment performance must be monitored over time for any drift from expectations

Python code for Segmentation using K-Means Clustering

Import necessary Python libraries

Import all the required libraries and update the notebook properties as needed.

Import Python libraries
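
The original snippet for this step is not reproduced here, so the following is only a minimal sketch of what the imports might look like. It assumes pandas, NumPy and scikit-learn are installed; the two display options are illustrative choices, not settings taken from the original notebook.

```python
# Core libraries for data handling
import numpy as np
import pandas as pd

# scikit-learn pieces used in the later steps (scaling and clustering)
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

# Assumed notebook properties: show every column and print floats with 2 decimals
pd.set_option("display.max_columns", None)
pd.set_option("display.float_format", "{:.2f}".format)
```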

Compute Recency, Frequency, Monetary values

Three features are engineered for each customer:

Recency: days since the customer's last purchase
Frequency: number of purchases the customer made over a specified period
Monetary: total amount the customer spent

Compute recency, frequency, monetary
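
The three definitions above can be sketched with a pandas group-by. The transaction table and its column names (customer_id, order_id, order_date, amount) are hypothetical, and measuring recency from the day after the last recorded order is an assumption about the reference date, not something fixed by the method.

```python
import pandas as pd

# Hypothetical transaction log; column names and values are made up for illustration
transactions = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C", "C"],
    "order_id": [1, 2, 3, 4, 5, 6],
    "order_date": pd.to_datetime([
        "2022-01-05", "2022-03-01", "2022-02-10",
        "2022-03-20", "2022-04-02", "2022-04-30",
    ]),
    "amount": [50.0, 30.0, 120.0, 20.0, 25.0, 15.0],
})

# Assumed reference date: one day after the most recent order in the data
snapshot_date = transactions["order_date"].max() + pd.Timedelta(days=1)

# One row per customer with the three R-F-M features
rfm = transactions.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot_date - d.max()).days),
    frequency=("order_id", "nunique"),
    monetary=("amount", "sum"),
)
print(rfm)
```

A low recency value means the customer bought recently, so customer C (last order on the final day of the data) gets a recency of 1.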

Re-Scale Data using Normalisation

Since a distance-based algorithm will be used for modelling, all feature columns must be on a comparable scale; otherwise the feature with the largest range (typically monetary) dominates the distance. In this case, the data is scaled to [0, 1] using the min-max method (min-max normalisation). Other methods, such as standardisation (z-score scaling), can also be used.

Scale data
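
Min-max scaling maps each column to [0, 1] via x' = (x - min) / (max - min). Below is a minimal sketch using scikit-learn's MinMaxScaler; the small R-F-M table is made-up illustration data, not output from the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up R-F-M values for three customers
rfm = pd.DataFrame(
    {"recency": [61, 80, 1], "frequency": [2, 1, 3], "monetary": [80.0, 120.0, 60.0]},
    index=["A", "B", "C"],
)

# Scale every column independently to the range [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
rfm_scaled = pd.DataFrame(
    scaler.fit_transform(rfm), columns=rfm.columns, index=rfm.index
)
print(rfm_scaled)
```

Keeping the fitted scaler around is useful: new customers can later be transformed with scaler.transform so that they share the same scale as the training data.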

Group customer segments using K-Means clustering

Given that the problem has no labelled data, an unsupervised algorithm is required. Here k-means, a popular distance-based algorithm, is chosen: it usually gives good results at a reasonable computational cost. Hierarchical clustering is a good alternative.

K-Means clustering algorithm
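
As a rough sketch of this step, scikit-learn's KMeans can be fitted on the scaled R-F-M columns. The data below is synthetic (three artificial groups of customers), and the choice of 3 clusters, n_init=10 and random_state=42 are assumptions made for the example; the right number of clusters depends on the real data and the business context.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic, already-scaled R-F-M data: three artificial groups of 20 customers
rng = np.random.default_rng(0)
centres = np.array([[0.1, 0.9, 0.9], [0.5, 0.5, 0.4], [0.9, 0.1, 0.1]])
points = np.vstack(
    [c + rng.normal(scale=0.03, size=(20, 3)) for c in centres]
).clip(0, 1)
features = ["recency", "frequency", "monetary"]
rfm_scaled = pd.DataFrame(points, columns=features)

# Fit k-means and attach the cluster label to every customer
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
rfm_scaled["cluster"] = kmeans.fit_predict(rfm_scaled[features])

# Cluster centres in scaled units, one row per segment
centroids = pd.DataFrame(kmeans.cluster_centers_, columns=features)
print(centroids)
```

The cluster numbers themselves carry no meaning; it is the centroid values (for example low recency with high frequency and monetary) that tell you which segment is which.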

Visualise R-F-M distribution based on K-Means cluster

The distribution of recency, frequency and monetary values across clusters should be visualised to make the segments easier to interpret and to reveal any drift over time. This makes it possible to develop strategies that counter unexpected behaviour.
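
One possible way to draw these distributions is a box plot per feature, grouped by cluster, using pandas and matplotlib. The table below is made-up data with a cluster column already attached; the plot type and layout are illustrative choices rather than the original figure.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up R-F-M values with a cluster label per customer
rfm = pd.DataFrame({
    "recency":   [5, 8, 11, 40, 45, 50, 120, 150, 135],
    "frequency": [12, 10, 14, 4, 5, 6, 1, 2, 1],
    "monetary":  [900, 1100, 1000, 300, 350, 250, 40, 60, 50],
    "cluster":   [0, 0, 0, 1, 1, 1, 2, 2, 2],
})
features = ["recency", "frequency", "monetary"]

# One box plot per feature, with one box per cluster
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, feature in zip(axes, features):
    rfm.boxplot(column=feature, by="cluster", ax=ax)
    ax.set_title(feature.capitalize())
    ax.set_xlabel("Cluster")
fig.suptitle("R-F-M distribution by K-Means cluster")
fig.tight_layout()

# Numeric companion to the plot: mean R-F-M value per cluster
summary = rfm.groupby("cluster")[features].mean()
print(summary)
```

Re-running the same plot on a later snapshot of the data and comparing it with the earlier one is a simple way to spot segments that have drifted.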

Conclusion

Based on the resulting customer segments, tailored recommendations can be provided to consumers, which improves the ability to fulfil customer expectations. It is therefore important for every organisation to analyse its customer segments periodically and roll out marketing campaigns and promotions suited to each segment.