Introduction to the Practice of Customer Segmentation

May 27, 2020

Customisation… This word is heard in the meeting rooms of many businesses these days. Well-executed customisation can improve your customers’ loyalty and make them more engaged with your campaigns. Unfortunately, not many businesses do it right; most still practise simple spray-and-pray dressed up in nice phrases.

Fortunately, the growing amount of customer data collected every year makes it easier than ever to create meaningful customisation, and the first step towards customising the customer experience is to build a Customer Segmentation.

Customer segmentation helps you improve your Lifetime Value to Customer Acquisition Cost ratio by decreasing the churn rate and maximising the value of existing customers. On the acquisition side, it helps you focus on the customer profiles that really fit your business model.

This article explains what Customer Segmentation is and what it should look like from the commercial and analytical standpoint. It focuses on marketing considerations from the data side rather than on scripts or the logic behind particular algorithms. I will explain those in a separate article.

What are customer segments?

In the simplest terms, Customer Segments are groups of customers sharing similar characteristics or attributes. The characteristics on which customers are segmented vary between companies, depending on the industry and business model.

The attributes may be of various kinds and are derived from the customer data the business collects.

There are multiple places you can source your data from. Customers might provide some of it during registration, and some can be derived from their transaction history. Product or web analytics data describing their interaction with your digital services can be tracked by tools designed for the purpose. External providers, such as credit reporting agencies, can also help, for a considerable fee.

As a thought experiment, we might take Medium as an example and think about how they would want to segment their subscribers. The attributes they could use include:

  • Age
  • Geographical location
  • Interest area
  • Tenure (time since the first subscription)
  • Volatility (loyal customers vs on and off subscribers)
  • Reading engagement
  • Posting engagement
  • Output of product or churn propensity Machine Learning models

Once subscribers are described by each of the above characteristics, we can start splitting the whole base into segments, for example non-posting readers interested in marketing, or long-term subscribers with high churn risk. We can then analyse those groups separately to identify the opportunities associated with them: encouraging the non-writers to write about their marketing experience, which would drive engagement, or providing loyalty incentives to customers at high churn risk.
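To make the idea concrete, here is a minimal Python sketch of how such a split could be expressed as simple rules. It is only an illustration: the `Subscriber` fields, the thresholds and the segment names are my own assumptions, not Medium's actual data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subscriber:
    # All fields are hypothetical stand-ins for the attributes listed above.
    subscriber_id: str
    interest_area: str
    tenure_months: int
    posts_last_90d: int
    churn_score: float  # assumed model output between 0 and 1


def assign_segments(s: Subscriber) -> List[str]:
    """Return every illustrative segment the subscriber falls into."""
    segments = []
    if s.posts_last_90d == 0 and s.interest_area == "marketing":
        segments.append("non-posting marketing reader")
    # The thresholds (24 months, 0.7) are arbitrary assumptions for the example.
    if s.tenure_months >= 24 and s.churn_score >= 0.7:
        segments.append("long-term subscriber at high churn risk")
    return segments


subscribers = [
    Subscriber("a1", "marketing", 6, 0, 0.20),
    Subscriber("b2", "data science", 36, 4, 0.85),
    Subscriber("c3", "marketing", 30, 0, 0.90),
]

segments_by_id = {s.subscriber_id: assign_segments(s) for s in subscribers}
```

Note that a subscriber can land in more than one segment here (`c3` matches both rules), so in practice you would need to decide whether segments may overlap or which one takes priority.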

Key design concerns

When planning what the segments should look like, the requirements to take into account are that the segments should be:

  • Related to business problems — aligned with business’ KPIs and contact strategy
  • Explainable — non technical users need to understand who is in each segment
  • Robust — deployed in an environment enabling error free and regular execution
  • Stable — the proportion of customers in particular segments should not radically change in short periods of time
  • Thorough — all the customers with enough data points should be assigned to some segment

Now, I will explain each of those requirements in more detail.

Relate to business problems

Why bother doing anything that is not going to generate any ROI (at least at work)? Make sure your segments are designed so you can drive your KPIs using them!

To make sure the segmentation solves business problems it is important to look at it in the context of the company’s contact strategy and the KPIs leadership wants to drive.

In the simplest example, one of the company’s priorities might be to minimise churn in a certain age group. With this in mind, the analyst should make sure that churn metrics are included in the input, and that the output of the segmentation indicates customers’ churn scores by age, so the segment can be targeted efficiently.
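As a rough illustration of what "churn scores by age" could look like in the output, the sketch below averages a churn score per age band. The band cut-offs, the function names and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def age_band(age: int) -> str:
    """Bucket an age into illustrative bands; the cut-offs are assumptions."""
    if age < 25:
        return "under 25"
    if age < 35:
        return "25-34"
    if age < 50:
        return "35-49"
    return "50+"


def churn_by_age_band(customers):
    """Average churn score per age band from (age, churn_score) pairs."""
    scores = defaultdict(list)
    for age, churn_score in customers:
        scores[age_band(age)].append(churn_score)
    return {band: round(mean(values), 2) for band, values in scores.items()}


# Hypothetical (age, churn score) pairs; scores assumed to lie between 0 and 1.
customers = [(22, 0.8), (24, 0.6), (31, 0.3), (45, 0.2), (47, 0.4), (63, 0.1)]
summary = churn_by_age_band(customers)
```

A table like `summary` is something a commercial stakeholder can read directly, which makes it easier to agree on which age group the campaign should target.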

Segmenting customers should be a collaboration between the Analytics/Data Science and Commercial teams, to make sure that the right requirements have been collected on the one hand, and that the data experts are providing an efficient and accurate solution on the other.

Start the project with a brainstorming session between the two sides and then keep them involved in periodic steering meetings. I have seen many projects fail because the key stakeholders did not feel sufficiently involved in the planning process or because the output did not satisfy their requirements. Make it easy for everyone to relate to the project!


Explainable

Many of the users of your work will not be very technical. They also need to understand who is in a particular segment.

Criteria like being between the mean and the mean + 1 standard deviation are fine as long as they are explained in plain human language (e.g. medium-high) and the description doesn’t combine too many features.
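One way to turn such statistical criteria into plain-language labels is sketched below. The four band names and the sample data are assumptions for illustration; the only part taken from the text is that values between the mean and the mean + 1 standard deviation read as "medium-high".

```python
from statistics import mean, stdev


def band_label(value: float, mu: float, sigma: float) -> str:
    """Translate a position relative to the mean into a plain-language band."""
    if value < mu - sigma:
        return "low"
    if value < mu:
        return "medium-low"
    if value < mu + sigma:
        return "medium-high"
    return "high"


# Hypothetical monthly reading minutes for a handful of customers.
reading_minutes = [10, 40, 55, 60, 65, 70, 120]
mu = mean(reading_minutes)
sigma = stdev(reading_minutes)  # sample standard deviation

labels = [band_label(v, mu, sigma) for v in reading_minutes]
```

The segment description shown to marketers would then use only the label ("medium-high reading engagement"), while the mean and standard deviation stay in the technical documentation.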

Even if you used Machine Learning algorithms like clustering, make sure you are able to easily explain the contents of each cluster and point out the differences, opportunities and risks associated with the customers in it.


Robust

How can you tell your product was successful? It is when the users do not want to go back to the times when your solution was not there for them to use. Once the segments become a vital part of the marketing strategy, there is a high demand to keep them regularly refreshed and error-free.

This is probably true for every productionised data processing pipeline, but make sure the scripts are scheduled on production-grade VMs rather than run as an ad hoc process on the laptop of an analyst who happens to be on annual leave.

The data sources the models use need to be reliable. If you are not sure whether certain data will be available in the future, it might be worth either improving the data pipeline or reconsidering its inclusion in the algorithms.


Stable

Some segmentation techniques, like clustering, might produce very different results when run over the customer base at different points in time. This is especially true for highly seasonal industries or those relying heavily on big irregular events.

Any radical change in the proportions between segments needs a good business explanation, and the models should be tested for how well they handle seasonality and other factors that introduce volatility.
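A simple stability check can be sketched as a comparison of segment shares between two runs. The 10 percentage point threshold, the function names and the sample assignments below are assumptions; the right tolerance depends on your business.

```python
from collections import Counter


def segment_shares(assignments):
    """Map each segment to its share of the customer base (0 to 1)."""
    counts = Counter(assignments.values())
    total = sum(counts.values())
    return {segment: n / total for segment, n in counts.items()}


def unstable_segments(previous, current, max_shift=0.10):
    """Return segments whose share moved by more than max_shift between runs."""
    before = segment_shares(previous)
    after = segment_shares(current)
    shifts = {}
    for segment in set(before) | set(after):
        shift = after.get(segment, 0.0) - before.get(segment, 0.0)
        if abs(shift) > max_shift:
            shifts[segment] = round(shift, 2)
    return shifts


# Hypothetical customer-to-segment assignments from two consecutive runs.
january = {"c1": "loyal", "c2": "loyal", "c3": "at risk", "c4": "new", "c5": "loyal"}
february = {"c1": "loyal", "c2": "at risk", "c3": "at risk", "c4": "at risk", "c5": "loyal"}

flagged = unstable_segments(january, february)
```

Anything that ends up in `flagged` is a prompt to look for a business explanation (a campaign, a seasonal peak) before the new segments are released to their users.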

Inconsistencies can also be caused by changes in the data processing logic. Those might require recalculating the historical views using the new logic to get consistency over time.


Thorough

Each customer needs to be assigned to some segment. There should be a strategy in place to handle customers with too little data, new customers and long-lapsed ones.

K-means will assign a cluster to every customer, even if they do not show any similarity to others. Techniques like density-based clustering might cluster only customers showing some similarity to others while leaving “loners” without any cluster assigned. Neither situation is ideal: you might either be targeting some customers inaccurately or see your reach go down. These considerations need to be addressed either by providing another layer of segments or by choosing a different method.
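The trade-off can be illustrated with a toy sketch. It is not an implementation of K-means or of a density-based algorithm: it only mimics the assignment behaviour of each with a nearest-centroid rule and an optional distance cut-off. The centroids, the cut-off and the "unclustered" catch-all segment are assumptions.

```python
import math

# Hypothetical centroids in a two-feature space (e.g. scaled engagement, spend).
CENTROIDS = {"casual": (1.0, 1.0), "power user": (8.0, 9.0)}


def nearest_segment(point, max_distance=None):
    """Return the closest centroid's segment, or None when it is too far away.

    With max_distance=None every customer gets a segment, as in the
    assignment step of K-means. A finite cut-off leaves outliers unassigned,
    which loosely mimics the noise points of density-based methods.
    """
    segment, distance = min(
        ((name, math.dist(point, centre)) for name, centre in CENTROIDS.items()),
        key=lambda pair: pair[1],
    )
    if max_distance is not None and distance > max_distance:
        return None
    return segment


def segment_with_fallback(point, max_distance=3.0):
    """Second layer: give loners an explicit catch-all segment."""
    return nearest_segment(point, max_distance) or "unclustered"


loner = (4.5, 20.0)
forced = nearest_segment(loner)                     # always gets a segment
strict = nearest_segment(loner, max_distance=3.0)   # left without a segment
layered = segment_with_fallback(loner)              # explicit catch-all
```

The catch-all layer keeps every customer reachable while making it visible that the targeting for that group is less precise.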

Data considerations

Getting the right amount and quality of data is the key to building the right segmentation. New businesses in particular, or businesses that have only just started to think seriously about their data strategy, might struggle to obtain enough transactional data.

Some attributes, especially the ones driven by models, like propensity or churn scores, will require a large number of data points, probably covering at least a full yearly cycle in order to factor in seasonality.

If you do not have enough data yet, you can start building a segmentation with what you currently have. Do plan your desired state, though, so you can start preparing right now, because it is going to take a while before your data matures.

Summary

Building a segmentation is not simply a task of running a clustering algorithm and analysing the output. It requires a significant amount of work from both the business and the data perspective.

It is a collective effort between the commercial, analytics and engineering teams to ensure that the segments not only meet the business requirements but are also methodologically correct and productionised.



