Customer Segmentation using K-Means Clustering
Learn and implement the k-means algorithm for customer segmentation in less than 5 minutes
Simple and short definition: “Identify similarity among customers based on different attributes and group them together”
Next, a few essential notes on customer segmentation are provided, followed by the Python code.
Types of customer segmentation
RBC (Reason, Benefits, Cautions) of Customer Segmentation
Reason
- Failure to understand target audience
- Ineffective marketing campaigns, marketing mix and product mix
- Limited resources and non-optimised cost allocation
- Lack of activity prioritisation
- Lack of guidance in product development resulting in longer time-to-market
Benefits
- Better understanding of the target audience and improved order fulfilment
- Improved customer response, higher customer engagement, loyalty and retention, and better brand visibility
- Explore and grab new business opportunities
- Improved customer lifetime value and an optimised sales funnel
Cautions
- Segments must align with business goals and context
- Insufficient data quality or volume can lead to inferior segments
- Segment performance must be monitored over time for any drift from expectations
Python code for Segmentation using K-Means Clustering
Import necessary python libraries
Import all required libraries and set any notebook display properties.
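A minimal sketch of what this setup step might look like. The exact libraries and display options are assumptions (pandas, NumPy, matplotlib and scikit-learn are a common choice for this kind of analysis), not necessarily the ones used originally.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

# Notebook display properties (illustrative choices, adjust to taste)
pd.set_option("display.max_columns", None)               # show every column
pd.set_option("display.float_format", "{:.2f}".format)   # two decimal places
plt.rcParams["figure.figsize"] = (10, 6)                 # default figure size
```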
Compute Recency, Frequency, Monetary values
Feature engineering centres on three features, each computed per customer:
- Recency: days since the customer's last purchase
- Frequency: number of purchases made by the customer over a specified period
- Monetary: total monetary contribution of the customer
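The three features above can be derived from a transaction log with a single group-by. The sketch below uses a tiny made-up dataset; the column names (`customer_id`, `order_date`, `amount`) and the choice of reference date are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical transaction log; column names and values are illustrative.
transactions = pd.DataFrame({
    "customer_id": ["A", "A", "B", "B", "B", "C"],
    "order_date": pd.to_datetime([
        "2023-01-05", "2023-03-10", "2023-02-01",
        "2023-02-15", "2023-03-28", "2022-12-20",
    ]),
    "amount": [50.0, 70.0, 20.0, 35.0, 15.0, 200.0],
})

# Reference date for recency: one day after the latest order in the data
snapshot_date = transactions["order_date"].max() + pd.Timedelta(days=1)

# Aggregate per customer: last purchase date, order count, total spend
rfm = transactions.groupby("customer_id").agg(
    last_purchase=("order_date", "max"),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Recency is the number of days between the snapshot and the last purchase
rfm["recency"] = (snapshot_date - rfm["last_purchase"]).dt.days
rfm = rfm[["recency", "frequency", "monetary"]]
print(rfm)
```

A lower recency means a more recent purchase, so customer B (who bought the day before the snapshot) is the most recent buyer here.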
Re-Scale Data using Normalisation
Since a distance-based algorithm will be used for modelling, all data columns must be brought onto a comparable scale; otherwise the column with the largest range dominates the distance calculation. In this case, the data is scaled to [0, 1] using the min-max method (normalisation). However, other methods such as standardisation (z-score scaling) can also be used.
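As a rough illustration, min-max scaling can be done with scikit-learn's `MinMaxScaler`, which maps each column to [0, 1] via (x - min) / (max - min). The RFM values below are made up for the example.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical RFM table (values are illustrative only)
rfm = pd.DataFrame(
    {
        "recency": [19, 1, 99, 40],
        "frequency": [2, 3, 1, 5],
        "monetary": [120.0, 70.0, 200.0, 310.0],
    },
    index=["A", "B", "C", "D"],
)

# Fit the scaler on the data and rescale each column independently to [0, 1]
scaler = MinMaxScaler()
rfm_scaled = pd.DataFrame(
    scaler.fit_transform(rfm),
    columns=rfm.columns,
    index=rfm.index,
)
print(rfm_scaled)
```

Swapping `MinMaxScaler` for `StandardScaler` would give standardised (zero mean, unit variance) columns instead.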
Group customer segments using K-Means clustering
Given that the problem has no labelled data, an unsupervised algorithm is chosen. In this case, k-means is used, which is a popular distance-based clustering algorithm. K-means usually gives good results at reasonable time and computational cost. Hierarchical clustering is a good alternative.
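A minimal sketch of the clustering step with scikit-learn's `KMeans`. The input is synthetic, already-scaled data with two obvious groups, and the number of clusters (2) is an assumption chosen to match that toy data; on real data the number of clusters has to be chosen for the problem at hand.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

features = ["recency", "frequency", "monetary"]

# Synthetic scaled RFM data: two well-separated groups (illustrative only)
rng = np.random.default_rng(0)
group_a = rng.uniform(0.0, 0.2, size=(20, 3))
group_b = rng.uniform(0.8, 1.0, size=(20, 3))
rfm_scaled = pd.DataFrame(np.vstack([group_a, group_b]), columns=features)

# Fit k-means and attach the cluster label of each customer
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
rfm_scaled["cluster"] = kmeans.fit_predict(rfm_scaled[features])

print(rfm_scaled["cluster"].value_counts())
print(kmeans.cluster_centers_)
```

Fixing `random_state` keeps the cluster assignment reproducible between runs; the cluster numbers themselves carry no meaning beyond identifying the group.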
Visualise R-F-M distribution based on K-Means cluster
The distribution of recency, frequency and monetary values across the clusters should be visualised to make the segments easier to interpret and to reveal any drift of segments over time. This makes it possible to develop strategies to counter unexpected behaviour.
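One possible way to look at these distributions is a box plot per feature, grouped by cluster, alongside a per-cluster summary table. The sketch below uses randomly generated values and evenly assigned cluster labels purely as placeholders; the plot type and layout are assumptions rather than the original charts.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt

features = ["recency", "frequency", "monetary"]

# Placeholder RFM values with an already-assigned cluster label (illustrative)
rng = np.random.default_rng(1)
rfm = pd.DataFrame({
    "recency": rng.integers(1, 365, size=60),
    "frequency": rng.integers(1, 20, size=60),
    "monetary": rng.uniform(10, 1000, size=60).round(2),
    "cluster": np.repeat([0, 1, 2], 20),
})

# Per-cluster averages give a quick numeric profile of each segment
summary = rfm.groupby("cluster")[features].mean()
print(summary)

# One box plot per feature, with one box per cluster
clusters = sorted(rfm["cluster"].unique())
fig, axes = plt.subplots(1, len(features), figsize=(12, 4))
for ax, feature in zip(axes, features):
    data = [rfm.loc[rfm["cluster"] == c, feature].to_numpy() for c in clusters]
    ax.boxplot(data)
    ax.set_xticklabels([str(c) for c in clusters])
    ax.set_xlabel("cluster")
    ax.set_title(feature)
fig.suptitle("R-F-M distribution per cluster")
fig.tight_layout()
```

Re-running the same summary and plots on later snapshots of the data is one simple way to spot segments drifting over time.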
Conclusion
Based on the resulting customer segments, tailored recommendations can be provided to consumers, which improves the ability to fulfil customer expectations. It is therefore important for organisations to analyse their customer segments regularly and roll out marketing campaigns and promotions suited to each segment.