Introduction to clustering-based customer segmentation

Kaixin Wang
Data Science at Microsoft
Nov 7, 2023
Photo by Guillermo Ferla on Unsplash.

Customer segmentation is a key technique used in business and marketing analysis to help companies better understand the user base and usage patterns of their products and services. Clustering refers to a set of unsupervised learning approaches derived from statistical and mathematical methods, which are frequently used to discover the underlying patterns and groupings of otherwise complex datasets. This article introduces a clustering-based customer segmentation method and a case study where it gets applied to identify customer cohorts in a real-world setting.

Methodology

The market for any given product or service is known as the market potential or the total addressable market (TAM). Because this is the market to be segmented, the first step in the segmentation process is to identify the size of the potential market. Figure 1 defines the domain of marketing.¹

Figure 1: The marketing domain.

Selecting a segmentation base that fits the business justification is also an important step in any type of segmentation analysis. Four major types of segmentation bases are as follows:

  • Demographic: Quantifiable characteristics of the market. Examples include age, gender, income, education, social and economic status, and more.
  • Geographic: Division of the market according to geographic attributes. Examples include country, region, size of population, and more.
  • Psychographic: Incorporation of customer measures such as attitudes, beliefs, or personality attributes. For example, customer attitudes toward a certain statement are typically used to infer their perspectives on particular beliefs that are important to a brand.
  • Behavioral: Historical usage, which can be a good indicator of future actions. Examples include usage frequency, level of user engagement, the tendency to retain the product or service, and so on.

Behavioral segmentation has received particularly high attention in industry due to its special focus on historical usage patterns and usage habits. One of the most frequently applied behavioral segmentation approaches is the Recency, Frequency, Monetary (RFM) model, introduced next through a case study.

Case study: K-means Recency, Frequency, Monetary (RFM) segmentation

RFM segmentation is a commonly used segmentation approach in marketing analysis to discover high-value customers and to improve user retention and engagement.² The building blocks of RFM are as follows:

  1. Recency, which measures how recently the customer used the product or service.
  2. Frequency, which tells how frequently the customer used it.
  3. Monetary, which is related to how much monetary value was invested toward it.

RFM modeling has many variations tailored to different usage scenarios. For instance, Recency, Frequency, Duration (RFD) focuses on the dwell time or usage time of the product instead of its monetary value. Recency, Frequency, Engagement (RFE) extends RFD: Engagement can include visit duration, the type and number of activities during each visit, and many other metrics that reflect the level of user engagement with the product or feature.

Clustering encompasses a set of unsupervised learning methods whose goal is to create clusters in which the within-group difference is minimized and the between-group difference is maximized. Among clustering techniques, K-means is one of the most widely applied. As the name suggests, the algorithm partitions the data into K segments: it assigns each data point to the nearest cluster center (centroid), as measured by a “distance” such as Euclidean distance, and then recomputes each centroid as the mean of its assigned points, repeating until the assignments stabilize. K-means has several variations; for instance, kernel K-means uses a kernel function to discover clusters in data points that are not linearly separable.
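To make the assign-and-update loop concrete, here is a minimal NumPy sketch of the standard K-means iteration (often called Lloyd's algorithm). It is an illustration only, not the implementation used in the case study; the function name `kmeans_lloyd`, its parameters, and the two-blob demo data are assumptions made for this example.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Illustrative K-means: alternate point assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random (a simple, assumed init).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the returned centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centroids

# Hypothetical demo data: two well-separated blobs in two dimensions.
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(10, 0.5, (50, 2))])
demo_labels, demo_centroids = kmeans_lloyd(X_demo, k=2)
```

In practice a library implementation with a more careful initialization and multiple restarts is usually preferable; the sketch is only meant to show what "distance" and "K segments" refer to.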

Because the goal of customer segmentation is to discover cohorts in a large dataset based on customer attributes, with no existing segment labels, clustering is well suited to the task.³ Suppose a company has an e-product that visitors interact with by browsing the content, scrolling up and down the canvas, and clicking on different components. The company wants to segment its users into groups so that it can improve retention and increase the number of highly engaged users. K-means RFE analysis is a good fit here: the objective is to segment users based on their historical usage behaviors (RFE), and no prior segments exist (clustering).

Now assume we are given a dataset where each row represents one customer’s usage history of the product and each column corresponds to one of the RFE metrics observed in a 28-day window: the number of days since the user last invoked the product (Recency), the number of days on which the user used the product (Frequency), and the total number of “sessions” in which the user had any type of interaction with the content (Engagement). Figure 2 depicts the distribution of each predictor variable for a sample dataset of N = 1414 customers.

Figure 2: Exploratory Data Analysis (EDA) of the predictor variables.
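As an illustration of how such a table might be derived, the sketch below aggregates a raw event log into one Recency, Frequency, Engagement row per user with pandas. The input schema (`user_id`, `event_date`, `session_id`), the function name `build_rfe`, and the sample rows are assumptions for this example; the article does not describe the underlying data source.

```python
import pandas as pd

def build_rfe(events, window_end, window_days=28):
    """Aggregate a per-event log into one RFE row per user (hypothetical schema)."""
    window_end = pd.Timestamp(window_end)
    window_start = window_end - pd.Timedelta(days=window_days - 1)
    # Work at day granularity and keep only events inside the observation window.
    dates = pd.to_datetime(events["event_date"]).dt.normalize()
    in_window = events.assign(event_date=dates)[
        (dates >= window_start) & (dates <= window_end)
    ]
    grouped = in_window.groupby("user_id")
    return pd.DataFrame({
        # Days between the end of the window and the user's last active day.
        "recency": (window_end - grouped["event_date"].max()).dt.days,
        # Number of distinct days with any activity.
        "frequency": grouped["event_date"].nunique(),
        # Number of distinct sessions with any interaction.
        "engagement": grouped["session_id"].nunique(),
    })

# Made-up events for two users, for demonstration only.
events = pd.DataFrame({
    "user_id": ["a", "a", "a", "b"],
    "event_date": ["2023-10-01", "2023-10-01", "2023-10-20", "2023-10-05"],
    "session_id": ["s1", "s2", "s3", "s4"],
})
rfe = build_rfe(events, window_end="2023-10-28")
```

Note that users with no activity inside the window do not appear in the result; whether to include them, and with what Recency value, is a design choice the sketch leaves open.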

We now apply K-means to discover the user cohorts. Figure 3 shows the elbow curves for the two criteria used to select the number of clusters (K): distortion (left) and inertia (right). The elbow method is commonly used in clustering analysis to choose K as the value that explains as much of the variation in the data as possible with as few clusters as possible; graphically, it is the point where the curve bends like an elbow and adding further clusters yields only marginal improvement. Distortion and inertia are the measures of variation used in the elbow method. Distortion is the average squared distance between each data point and the center of its assigned cluster, whereas inertia is the sum of squared distances from each sample to its closest cluster centroid. In the plot, the elbow occurs at K = 4 under both criteria, which suggests that four clusters is a suitable choice for segmenting the given set of users.

Figure 3: Determining the number of clusters (K) in K-means.
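The two elbow criteria can be computed along the following lines with scikit-learn. This is a sketch under assumptions: the synthetic data from `make_blobs` stands in for the customer dataset, the helper name `elbow_curves` is hypothetical, and distortion is computed exactly as defined above (the average squared distance to the assigned centroid), which makes it equal to inertia divided by the number of samples.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def elbow_curves(X, k_values, seed=0):
    """Return (distortion, inertia) arrays, one entry per candidate K."""
    n_samples = len(X)
    distortion, inertia = [], []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        # inertia_: sum of squared distances of samples to their closest centroid.
        inertia.append(model.inertia_)
        # Distortion as defined in the text: the average of those squared distances.
        distortion.append(model.inertia_ / n_samples)
    return np.array(distortion), np.array(inertia)

# Synthetic stand-in for the three RFE features (not the article's data).
X_demo, _ = make_blobs(n_samples=400, centers=4, n_features=3,
                       cluster_std=1.0, random_state=0)
k_values = list(range(1, 9))
distortion, inertia = elbow_curves(X_demo, k_values)
```

Plotting `distortion` and `inertia` against `k_values` gives curves of the kind shown in Figure 3. When the features are on very different scales, standardizing them before clustering is a common precaution, since squared Euclidean distance is otherwise dominated by the largest-valued feature.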

Having determined the number of clusters, we apply K-means to create the customer segments. Because the goal is to identify the group of users with high engagement, we name each segment based on one of the three dimensions: the number of usage sessions in the observed 28-day window. We refer to the cluster with the highest number of sessions as Champion, the second most active cohort as Loyalist, the third as Potential, and the last as At Risk. Figure 4 is an animation of the convergence of K-means with K = 4 over the first 15 iterations. Figure 5 shows an animation of the placement of the segments in the three-dimensional feature space.

Figure 4: Three-dimensional animation of the first 15 iterations of K-means clustering using K=4.
Figure 5: Three-dimensional distribution of the segments using 4-means clustering.
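The naming step described above can be sketched as follows: fit K-means with K = 4, rank the clusters by the engagement coordinate of their centroids, and map the ranks to the four segment names. The helper `label_segments`, the column names, and the synthetic data are assumptions for illustration; the features are clustered unscaled here purely to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

FEATURES = ["recency", "frequency", "engagement"]
SEGMENT_NAMES = ["Champion", "Loyalist", "Potential", "At Risk"]

def label_segments(rfe, seed=0):
    """Cluster users into four groups and name them by centroid engagement."""
    model = KMeans(n_clusters=len(SEGMENT_NAMES), n_init=10, random_state=seed)
    cluster_ids = model.fit_predict(rfe[FEATURES])
    centroids = pd.DataFrame(model.cluster_centers_, columns=FEATURES)
    # Rank cluster ids from highest to lowest mean engagement.
    ranked = centroids["engagement"].sort_values(ascending=False).index
    name_of = dict(zip(ranked, SEGMENT_NAMES))
    segments = pd.Series(cluster_ids, index=rfe.index).map(name_of)
    return rfe.assign(segment=segments), centroids.rename(index=name_of)

# Synthetic users drawn around four made-up (recency, frequency, engagement) profiles.
rng = np.random.default_rng(0)
true_centers = np.array([[1, 25, 100], [5, 15, 60], [12, 8, 30], [22, 2, 5]],
                        dtype=float)
rfe_demo = pd.DataFrame(
    np.vstack([c + rng.normal(0, 1.5, size=(50, 3)) for c in true_centers]),
    columns=FEATURES,
)
labeled_demo, centroids_demo = label_segments(rfe_demo)
```

Ranking on the centroid rather than on individual users keeps the names tied to the cluster as a whole, which matches the idea of labeling each segment by a single dimension.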

Notice that a linear monotonic trend is observed across all three dimensions even though the segment labels were assigned based on only one of them. This indicates that the RFE metrics selected to segment the customers represent the usage of the product well (e.g., Champions are the customers who used the product most recently, visited most frequently, and had the highest engagement). Figure 6 shows the centroid of each segment in three-dimensional space. Figure 7 contains the post-segmentation distribution of the feature space, where we see clear cut-offs along each of the three dimensions and a high contrast among the distributions of the Champion, Loyalist, Potential, and At Risk cohorts.

Figure 6: Centroid of each cluster. The size of the scatter corresponds to the volume of each segment.
Figure 7: Post-segmentation EDA.
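One way to check the monotonic trend and to tabulate the segment centroids and volumes is a per-segment profile, sketched below with pandas. The helper names `segment_profile` and `trend_is_monotonic`, the column names, and the toy rows are assumptions for this example and do not come from the case study data.

```python
import pandas as pd

FEATURES = ["recency", "frequency", "engagement"]

def segment_profile(labeled):
    """Mean of each feature and number of users per segment, most engaged first."""
    profile = labeled.groupby("segment")[FEATURES].mean()
    profile["n_users"] = labeled["segment"].value_counts()
    return profile.sort_values("engagement", ascending=False)

def trend_is_monotonic(profile):
    """With segments ordered from most to least engaged, frequency should
    fall and recency (days since last use) should rise."""
    return bool(profile["frequency"].is_monotonic_decreasing
                and profile["recency"].is_monotonic_increasing)

# Toy labeled users (made-up values).
labeled_demo = pd.DataFrame({
    "segment": ["Champion", "Champion", "Loyalist", "Potential", "At Risk", "At Risk"],
    "recency": [1, 2, 5, 12, 20, 24],
    "frequency": [26, 24, 15, 8, 2, 1],
    "engagement": [110, 90, 60, 30, 6, 4],
})
profile_demo = segment_profile(labeled_demo)
```

The per-segment means correspond to the centroids of the kind plotted in Figure 6, and `n_users` corresponds to the segment volume encoded there by marker size.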

Summary

This article looked at a clustering-based customer segmentation approach that combines K-means clustering with RFE modeling. The approach is simple and intuitive, yet it groups customers with similar usage behaviors and habits together, which provides insight into the usage patterns of the product or service and supports the goal of converting more users into the “champion” state. An additional merit of this methodology is its versatility: depending on the nature of the product or service and the goal of the segmentation, the attributes or metrics used to create the segments can be easily adjusted. Have fun segmenting!

Kaixin Wang is on LinkedIn.

References

1. Market segmentation — the bedrock of successful marketing. in Market segmentation 1–19 (John Wiley & Sons, Ltd, 2012). doi: https://doi.org/10.1002/9781119207863.ch1.

2. Christy, A. J., Umamakeswari, A., Priyatharsini, L. & Neyaa, A. RFM ranking — an effective approach to customer segmentation. Journal of King Saud University — Computer and Information Sciences 33, 1251–1257 (2021).

3. Nabeel Mustafa, S. M., Akhtar, A., Peter Noronha, J. T., Salman, M. & Baig, M. A. Customer segmentation using machine learning techniques. in 2023 international multi-disciplinary conference in emerging research trends (IMCERT) vol. I 1–7 (2023).

4. A, R. S., Jaiswal, A., P, S. & L, S. Customer segmentation using machine learning. in 2023 third international conference on advances in electrical, computing, communication and sustainable technologies (ICAECT) 1–5 (2023). doi: 10.1109/ICAECT57570.2023.10117924.

5. Ketchen, D. J. & Shook, C. L. The application of cluster analysis in strategic management research: An analysis and critique. Strategic Management Journal 17, 441–458 (1996).

6. Pedregosa, F. et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011).
