Customer Segmentation using K-Means Clustering

Dr. Samiran Bera (PhD)
2 min read · May 11, 2022

Learn and implement the k-means algorithm for customer segmentation in less than 5 minutes

Simple and short definition: “Identify similarity among customers based on different attributes and group them together”

Next, a few essential notes on customer segmentation are provided, followed by the Python code.

[Figure: Types of customer segmentation]

[Figure: Types of customer segmentation approaches]

RBC (Reason, Benefits, Cautions) of Customer Segmentation

Reason

  • Failure to understand the target audience
  • Ineffective marketing campaigns and a poorly matched marketing and product mix
  • Limited resources and non-optimised cost allocation
  • Lack of activity prioritisation
  • Lack of guidance in product development, resulting in a longer time-to-market

Benefits

  • Better understanding of the target audience and improved order fulfilment
  • Improved customer response, higher customer engagement, loyalty and retention, and better brand visibility
  • Ability to explore and grab new business opportunities
  • Improved customer lifetime value and an optimised sales funnel

Cautions

  • Segments must align with business goals and context
  • Insufficient data quality or volume can lead to inferior segments
  • Segment performance must be monitored over time for any drift from expectations

Python code for Segmentation using K-Means Clustering

Import necessary Python libraries

Import all the required libraries and set any notebook properties.

Import Python libraries
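The original import cell is not reproduced here, so the following is only a plausible sketch of it. It assumes pandas, NumPy, scikit-learn and matplotlib are installed. The display options are illustrative preferences, not settings taken from the article.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Notebook display properties (assumed preferences; adjust as needed)
pd.set_option("display.max_columns", 50)                # show wide tables in full
pd.set_option("display.float_format", "{:.2f}".format)  # two decimals for floats
```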

Compute Recency, Frequency, Monetary values

Three features are engineered for each customer:

  1. Recency: days since the customer's last purchase
  2. Frequency: number of purchases made by the customer over a specified period
  3. Monetary: total amount spent by the customer

Compute recency, frequency, monetary
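The three definitions above can be sketched with pandas as follows. The transaction table and its column names (`customer_id`, `order_date`, `amount`) are made up for illustration. The choice of "one day after the last recorded purchase" as the reference date is also an assumption. A real dataset will need its own names and cut-off.

```python
import pandas as pd

# Toy transaction log; data and column names are illustrative assumptions
transactions = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C", "C"],
    "order_date": pd.to_datetime([
        "2022-01-05", "2022-03-20", "2022-02-11",
        "2022-01-15", "2022-02-28", "2022-04-01",
    ]),
    "amount": [120.0, 80.0, 40.0, 15.0, 25.0, 60.0],
})

# Reference date for recency: one day after the last recorded purchase (assumed)
snapshot_date = transactions["order_date"].max() + pd.Timedelta(days=1)

grouped = transactions.groupby("customer_id")
rfm = pd.DataFrame({
    # Days between the reference date and each customer's latest purchase
    "recency": (snapshot_date - grouped["order_date"].max()).dt.days,
    # Number of purchases per customer
    "frequency": grouped["order_date"].count(),
    # Total spend per customer
    "monetary": grouped["amount"].sum(),
})

print(rfm)
```

Each row of `rfm` describes one customer, which is the shape needed by the scaling and clustering steps.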

Re-Scale Data using Normalisation

Because a distance-based algorithm will be used for modelling, all feature columns must be on comparable scales; otherwise the feature with the largest range (typically the monetary value) dominates the distance. In this case, the data is scaled to [0, 1] using the min-max method. Other methods, such as standardisation (z-score scaling), can also be used.

Scale data
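Min-max scaling maps each value x in a column to (x - min) / (max - min), so every column ends up in [0, 1]. A minimal sketch using scikit-learn's `MinMaxScaler` is shown below. The small `rfm` table is invented for illustration and is not the article's data.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative RFM table (made-up values), one row per customer
rfm = pd.DataFrame(
    {
        "recency": [13, 50, 1],
        "frequency": [2, 1, 3],
        "monetary": [200.0, 40.0, 100.0],
    },
    index=pd.Index(["A", "B", "C"], name="customer_id"),
)

# MinMaxScaler defaults to feature_range=(0, 1)
scaler = MinMaxScaler()
rfm_scaled = pd.DataFrame(
    scaler.fit_transform(rfm),
    index=rfm.index,
    columns=rfm.columns,
)

print(rfm_scaled)
```

Keeping the fitted `scaler` makes it possible to apply the same transformation to new customers later with `scaler.transform`.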

Group customer segments using K-Means clustering

Since the problem has no labelled data, an unsupervised algorithm is chosen. In this case, k-means, a popular distance-based algorithm, is used. K-means usually gives good results within reasonable time and cost. Hierarchical clustering is a good alternative.

K-Means clustering algorithm
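A minimal sketch of the clustering step with scikit-learn's `KMeans` is shown below. The scaled R-F-M values are synthetic, and the number of clusters (`n_clusters=3`) is an assumption made for this example. On real data, the number of segments should be chosen to suit the business context.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

features = ["recency", "frequency", "monetary"]

# Synthetic, already-scaled R-F-M values forming three loose groups (illustrative)
rng = np.random.default_rng(42)
centres = np.array([
    [0.1, 0.8, 0.8],   # recent, frequent, high spend
    [0.5, 0.4, 0.3],   # middle of the range
    [0.9, 0.1, 0.1],   # long inactive, rare, low spend
])
points = np.vstack([c + rng.normal(0, 0.03, size=(20, 3)) for c in centres])
rfm_scaled = pd.DataFrame(np.clip(points, 0, 1), columns=features)

# n_clusters=3 is assumed here; random_state makes the run reproducible
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
rfm_scaled["cluster"] = kmeans.fit_predict(rfm_scaled[features])

print(rfm_scaled["cluster"].value_counts())
print(pd.DataFrame(kmeans.cluster_centers_, columns=features))
```

The rows of `kmeans.cluster_centers_` give the average scaled R-F-M profile of each segment, which is a convenient starting point for naming the segments.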

Visualise R-F-M distribution based on K-Means cluster

The distribution of recency, frequency and monetary values across the clusters should be visualised to make the segments easier to interpret and to reveal any drift in the segments over time. This helps in developing strategies to counter unexpected behaviour.
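One way to inspect these distributions is a box plot per feature, grouped by cluster, alongside a table of per-cluster means. The sketch below uses randomly generated R-F-M values and cluster labels purely for illustration. The plot type and layout are a suggestion, not the article's original figure.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

features = ["recency", "frequency", "monetary"]

# Randomly generated R-F-M values with cluster labels (illustrative only)
rng = np.random.default_rng(0)
rfm = pd.DataFrame({
    "recency": rng.integers(1, 365, size=90),
    "frequency": rng.integers(1, 30, size=90),
    "monetary": rng.uniform(10, 1000, size=90).round(2),
    "cluster": np.repeat([0, 1, 2], 30),
})

# Per-cluster averages give a compact numeric profile of each segment
summary = rfm.groupby("cluster")[features].mean()
print(summary)

# One box plot per feature, split by cluster
fig, axes = plt.subplots(1, len(features), figsize=(12, 4))
for ax, feature in zip(axes, features):
    rfm.boxplot(column=feature, by="cluster", ax=ax)
    ax.set_title(feature.capitalize())
    ax.set_xlabel("Cluster")
fig.suptitle("R-F-M distribution by cluster")
fig.tight_layout()
```

Repeating the same plot on data from successive periods is a simple way to spot the drift mentioned above.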

Conclusion

Based on the resulting customer segments, tailored recommendations can be made to customers, which improves the organisation's ability to meet their expectations. It is therefore important for every organisation to analyse its customer segments periodically and to roll out marketing campaigns and promotions suited to each segment.

Dr. Samiran Bera (PhD)

Senior Data Scientist | PhD | Machine Learning & Optimisation