Identify Potential Customers With Unsupervised and Supervised Machine Learning

Aigerim Shopenova
Published in The Startup · 10 min read · Dec 3, 2020


Unleash new opportunities with data

Photo by Hello I’m Nik 🎞 on Unsplash

Attracting new customers to a service is a common business problem. One way to approach it is to analyze data on existing customers alongside data on the general population.

In this blog post, I would like to talk about how a company can bring in new customers by segmenting its existing customers and analyzing the general population, using both unsupervised and supervised machine learning.

Outline of the post:

1. Explaining types of customer segmentation

2. Getting to know the data

3. Data cleaning

4. Data preprocessing

5. Applying k-means clustering to find segments within existing customers

6. Calculating Euclidean distances to find similar people in the general population

7. Predicting customer subscriptions with XGBoost and Gradient Boosting classifiers, and a Kaggle competition
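To make steps 5 and 6 of the outline more concrete, here is a minimal sketch of the idea: cluster existing customers with k-means, then rank people from the general population by their Euclidean distance to the nearest customer centroid. Everything in it is an assumption made for illustration: the toy two-feature data, the function names `kmeans` and `rank_by_similarity`, and the simple "first k points" initialization. It is not the code used in this post, and in practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
import math


def kmeans(points, k, n_iter=20):
    """Tiny k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Assumption: the first k points serve as initial centroids.
    # This keeps the sketch deterministic; real code would use k-means++.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids


def rank_by_similarity(population, centroids):
    """Sort people by Euclidean distance to their nearest customer centroid."""
    scored = [(min(math.dist(p, c) for c in centroids), p) for p in population]
    return sorted(scored)


# Hypothetical, already-scaled features (for example age and spending score).
customers = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9)]
population = [(0.5, 0.5), (0.2, 0.2), (0.95, 0.9)]

centroids = kmeans(customers, k=2)
ranked = rank_by_similarity(population, centroids)

for distance, person in ranked:
    print(f"{person} is {distance:.3f} away from the nearest customer segment")
```

People at the top of `ranked` look most like an existing customer segment, so they would be the first candidates for a campaign. Both steps rely on distances, which is why the features are assumed to be on a comparable scale before clustering.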

1. Types of customer segmentation

As competition within markets increases, it is important to understand the different behaviours, types, and interests of customers. With market segmentation, marketers can tailor their campaigns and focus on one specific audience at a time. This approach helps to target specific groups of customers with different pricing options, promotions, and product placements, which can capture a wider audience in the most cost-effective way [2].

Market segmentation can be done in three ways, by dividing a market into:

1) Demographic groups

For example, “men between the ages of 25 and 35”. This approach is easy to implement. However, there is no reason to believe that those men have similar needs or similar reasons to buy or subscribe.

2) Need groups

Such as “a man who wants to save time on transportation”.

3) Behavior groups

Such as “a woman who buys fashion goods on an e-commerce platform”. This group is defined not just by needs, which can be helpful to find common characteristics that they…