
Guide for applying clustering to Marketing Sales strategy

Leveraging marketing campaigns for your product.

Marketing plays a critical role in bringing a company customers who are a good fit for the product or service it offers. To do that properly, it is important to understand your target audience by defining one or more ICPs, or Ideal Customer Profiles.

An ideal customer profile (ICP) is an attribute-based description of the type of customer that best fits your company strategy.

For example, Bravo Studio’s ICP is non-technical founders who want to build apps.

ICP generation is all about creating different groups or segments of customers based on their behaviors.

Understanding your ICP will help you to adjust your marketing strategy and lead generation tactics.

This segmentation helps businesses understand their customers, allowing them to:

  • Identify pain points and concerns in customers.
  • Adapt products and behaviors to specific needs.
  • Improve marketing campaigns and targeting.

Back to the basics: The supermarket problem

There are many ways of extracting your ICP, such as interviewing your customers or running surveys and polls. However, I will walk through a basic go-to example of how to do it using Python and clustering.

The process is simple to follow, and you can adapt each step in this guide to your own data.

That being said, let’s start with the following challenge. A supermarket provides us with a data file containing its customers and some of their attributes. We have two goals:

  1. The main goal is to generate the main ICPs.
  2. We also need to think about how to improve the marketing campaign strategy in the future.

The general workflow for this kind of problem is:

  1. 📊 Exploratory Data Analysis
  2. 🧪Feature engineering + Data Checking
  3. 📐Preprocessing : Label encoding + Scaling + Dimensionality reduction
  4. 🧬Clustering
  5. 📝Model Evaluation
  6. 💡Generate ICPs
  7. 📢Marketing Strategy

📊 Exploratory Data Analysis

This stage covers the first analysis of the data. The goal is to better understand the initial data and to do a first round of cleaning:

  • Filling empty values
  • Removing outliers
  • Dropping uninformative columns

Top 5 rows in the dataset

Taking a quick look at the data it is possible to see the following:

  • We don’t have the age of the customer; instead we have Year_Birth.
  • Column Dt_Customer is not parsed as a datetime. Also, this column holds the date the customer enrolled with the company, but we don’t have the customer’s seniority as a number. We’ll leave that for the feature engineering stage later.
  • Column Income has 1.07% empty values. As this percentage represents just 24 rows, we’ll remove those rows for now.
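These first checks can be sketched with pandas as follows. This is a minimal sketch on a tiny made-up frame that stands in for the supermarket file: the values are invented for illustration, and the day-first date format is an assumption about how Dt_Customer is stored.

```python
import pandas as pd

# Toy stand-in for the supermarket file; all values are made up for illustration.
df = pd.DataFrame({
    "Year_Birth": [1957, 1984, 1965, 1893],
    "Dt_Customer": ["04-09-2012", "08-03-2014", "21-08-2013", "10-02-2014"],
    "Income": [58138.0, None, 71613.0, 26646.0],
})

# Share of missing values per column, as a percentage.
missing_pct = df.isna().mean() * 100

# Drop the few rows without Income instead of imputing them.
df = df.dropna(subset=["Income"]).reset_index(drop=True)

# Parse the enrollment date (day-first format is an assumption).
df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"], format="%d-%m-%Y")
```

Dropping rows is reasonable here only because so few are affected; with a larger share of missing values, imputing would be worth considering.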

Last but not least, we have several columns with categorical information. We need to check the proportion of each category and also encode the categories as numbers. We’ll leave the encoding for the preprocessing stage later.

Marital_Status is currently quite granular, so we’ll reduce it to just 3 categories in order to simplify the data.

Categorical Features that need to be encoded
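Checking the proportion of each category could look like the sketch below. The category labels are placeholders I made up; only the column names Education and Marital_Status come from the dataset described here.

```python
import pandas as pd

# Toy data; the category labels are illustrative placeholders.
df = pd.DataFrame({
    "Education": ["Graduation", "PhD", "Graduation", "Master"],
    "Marital_Status": ["Single", "Married", "Together", "Married"],
})

categorical_cols = ["Education", "Marital_Status"]

# Relative frequency of each category, per column.
proportions = {
    col: df[col].value_counts(normalize=True) for col in categorical_cols
}

for col, shares in proportions.items():
    print(col)
    print(shares)
```

Categories with a very small share are good candidates for merging into a broader group before encoding.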

🧪Feature engineering + Data Checking

After having cleaned the data, we can focus on generating new features about these customers. In order to do that, we will:

  • Create a new feature Age from the Year_Birth column.
  • Create a new feature Seniority from the Dt_Customer column.
  • Create a new feature Partner in order to replace Marital_Status. We just want to know if the customer has a partner or not.
  • Create a new feature Children to replace Kidhome and Teenhome.
  • Create a new feature FamilySize in order to know how many people live with the customer.
  • Create a new feature IsParent from the previous feature.
  • Create a new feature Bill as the sum of the amounts spent by the customer in the different categories over the 2 years.

Feature Engineering — Adding columns
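The features listed above could be derived roughly as follows. Treat this as a sketch under several assumptions that are mine rather than the dataset's: the reference year for Age, the set of statuses that count as having a partner, Seniority measured in days relative to the most recent enrollment, IsParent derived from Children, and spending columns that share a `Mnt` prefix (`MntWines` and `MntMeatProducts` are placeholder names).

```python
import pandas as pd

# Toy rows; values and the Mnt* column names are illustrative assumptions.
df = pd.DataFrame({
    "Year_Birth": [1957, 1984],
    "Dt_Customer": pd.to_datetime(["2012-09-04", "2014-03-08"]),
    "Marital_Status": ["Single", "Married"],
    "Kidhome": [0, 1],
    "Teenhome": [1, 0],
    "MntWines": [635, 11],
    "MntMeatProducts": [546, 6],
})

REFERENCE_YEAR = 2021                        # assumption: year used to compute Age
PARTNER_STATUSES = {"Married", "Together"}   # assumption: statuses meaning "has a partner"
spend_cols = [c for c in df.columns if c.startswith("Mnt")]

df["Age"] = REFERENCE_YEAR - df["Year_Birth"]
# Days enrolled, relative to the most recent enrollment in the data.
df["Seniority"] = (df["Dt_Customer"].max() - df["Dt_Customer"]).dt.days
df["Partner"] = df["Marital_Status"].isin(PARTNER_STATUSES).map({True: "Yes", False: "No"})
df["Children"] = df["Kidhome"] + df["Teenhome"]
# The customer, plus a partner if any, plus children.
df["FamilySize"] = 1 + df["Partner"].eq("Yes").astype(int) + df["Children"]
df["IsParent"] = (df["Children"] > 0).astype(int)
df["Bill"] = df[spend_cols].sum(axis=1)
```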

I will also remove the columns that are not useful for the project.

Feature Engineering — Removing columns

Once we have finished generating features, it is time to check the final dataset.

Data checking usually covers three aspects:

1. Data completeness: How many empty values do we have after the feature engineering?

2. Data outliers: How many outliers do we have, and how should we handle them?

3. Data coherence: Is each feature coherent with the rest of the dataset?

Dataset main stats

It is possible to see that there are some incoherences in our data, such as:

  • 👵 Ages above 128 years old.
  • 💰 The mean Income is quite high and might be inflated by outliers.

We can see this with a pair plot in seaborn.

Columns relationships in the dataset

So let’s remove the outliers in our dataset. Here we only remove outliers that are easy to detect, such as extreme age and income values, but I would recommend using more rigorous methods for detecting outliers.

Removing outliers in Age and Income
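As one example of a more systematic approach, the sketch below uses the interquartile range (IQR) rule instead of hand-picked cutoffs. This is a swapped-in technique, not the filter used for this article, and the toy values and the helper name `iqr_mask` are made up for illustration.

```python
import pandas as pd

# Toy data with one implausible age and one extreme income (values are made up).
df = pd.DataFrame({
    "Age": [35, 42, 51, 128, 29, 47],
    "Income": [52000, 61000, 48000, 55000, 666666, 58000],
})

def iqr_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series.between(q1 - k * iqr, q3 + k * iqr)

# Keep only the rows that are inliers in both columns.
keep = iqr_mask(df["Age"]) & iqr_mask(df["Income"])
clean = df[keep].reset_index(drop=True)
```

The multiplier `k = 1.5` is the usual convention; a larger value makes the filter more tolerant.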

Now that the data is clean of outliers and odd values, it is time to preprocess the dataset before feeding it to the models.

📐Preprocessing Stage

Before feeding the models with our data, we need to preprocess the current dataset a little. The preprocessing stage usually consists of the following steps:

1. Label encoding: Encoding categorical features, that is, mapping each string category to an integer that represents it.

Label Encoding — Columns Education and Partner
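A minimal sketch of this step with scikit-learn's `LabelEncoder`, using made-up category labels for the two columns:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data; the category labels are illustrative placeholders.
df = pd.DataFrame({
    "Education": ["Graduate", "Postgraduate", "Undergraduate", "Graduate"],
    "Partner": ["Yes", "No", "Yes", "Yes"],
})

# Keep one fitted encoder per column so the codes can be mapped back later.
encoders = {}
for col in ["Education", "Partner"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Classes are sorted alphabetically, so "No" -> 0 and "Yes" -> 1.
print(dict(zip(encoders["Partner"].classes_, range(2))))
```

Keep in mind that the integer codes imply an order between categories, and distance-based algorithms will treat that order as meaningful.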

2. Scaling features: Clustering algorithms compare customers by the distance between data points, so features measured on larger scales would dominate that distance. Scaling puts all features on a comparable range so that each one contributes fairly.

Scaling features using StandardScaler from sklearn library
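A short sketch of the scaling step with `StandardScaler`, on made-up values. After scaling, every column has mean 0 and unit standard deviation, so Income no longer dominates just because its raw numbers are larger.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data on very different scales (values are made up).
df = pd.DataFrame({
    "Income": [30000.0, 52000.0, 75000.0, 61000.0],
    "Age": [25, 41, 58, 36],
    "Bill": [120, 640, 1500, 900],
})

scaler = StandardScaler()
# fit_transform returns a NumPy array, so wrap it back into a DataFrame.
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns, index=df.index)
```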

3. Dimensionality reduction: The current dataset has too many features for the clustering, so algorithms might struggle with the calculations.

Several features are correlated and therefore redundant. Using dimensionality reduction, we’ll keep just the information that is worth considering for the calculation. With this step we aim to:

1. Reduce the dataset size

2. Increase interpretability and make the features easier to manage

3. Minimize the loss of information

To do this, I will use one of the most common algorithms for reducing datasets, Principal Component Analysis (PCA).

For interpretability reasons, I will reduce the dimensionality to 3 components.

Dimensionality Reduction using PCA
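The reduction can be sketched as below. The input is synthetic: six features, three of which are noisy copies of the other three, to mimic correlated columns. The names `PC1` to `PC3` are just labels I chose for the components.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 3 independent features plus 3 noisy copies of them.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = np.hstack([base, base + 0.1 * rng.normal(size=(200, 3))])
X_scaled = StandardScaler().fit_transform(X)

# Project the scaled data onto its first 3 principal components.
pca = PCA(n_components=3)
components = pd.DataFrame(
    pca.fit_transform(X_scaled), columns=["PC1", "PC2", "PC3"]
)

# Fraction of the original variance kept by the 3 components.
explained = pca.explained_variance_ratio_.sum()
print(f"Variance retained: {explained:.3f}")
```

Checking `explained_variance_ratio_` on the real data is a quick way to see how much information the 3 components actually retain.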

🧬Applying clustering to our data

Once the preprocessing stage is finished, we can feed our data to the algorithm.

As we don’t have labels in our data to predict, I will use the Agglomerative Clustering algorithm to group the customers. This is a hierarchical clustering method that starts with each example as its own cluster and keeps merging the closest clusters until the desired number of clusters is reached.

In order to do that, it is necessary to perform several steps:

1. Find the number of clusters using the elbow method. I recommend using the Yellowbrick Python package for that.

2. Apply the Agglomerative Clustering algorithm.

3. Plot the clusters in order to analyze them.

Elbow calculation for pulling the number of clusters
We can see that the elbow is at 4 clusters.
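To make the idea behind the elbow method concrete, here is a sketch that computes the curve by hand with scikit-learn's `KMeans` instead of Yellowbrick. The data is synthetic, generated with four groups, so it only illustrates the mechanics and not the result on the supermarket data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 groups, standing in for the PCA output.
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.6, random_state=42)

# Inertia = within-cluster sum of squared distances; lower means tighter clusters.
inertia = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertia[k] = model.inertia_

# The "elbow" is the k after which adding clusters stops paying off.
for k, value in inertia.items():
    print(k, round(value, 1))
```

Plotting `inertia` against `k` gives the familiar elbow chart: the curve drops steeply at first and then flattens.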
3D clusters representation
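Applying the clustering itself takes only a few lines. In this sketch the three input columns are synthetic stand-ins for the PCA components, and `pca_df` and `Cluster` are names I picked for illustration.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Synthetic 3-dimensional data standing in for the 3 PCA components.
X, _ = make_blobs(n_samples=300, centers=4, n_features=3,
                  cluster_std=0.5, random_state=0)
pca_df = pd.DataFrame(X, columns=["PC1", "PC2", "PC3"])

# Ward linkage is the scikit-learn default; n_clusters comes from the elbow.
model = AgglomerativeClustering(n_clusters=4)
pca_df["Cluster"] = model.fit_predict(pca_df[["PC1", "PC2", "PC3"]])

print(pca_df["Cluster"].value_counts().sort_index())
```

The `Cluster` column can then be copied back onto the original, unscaled dataframe so the groups can be described in terms of real attributes such as Income or Age.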

📝Model Evaluation

As this is an unsupervised algorithm and we don’t have labels for evaluating the results, we need to examine the output and analyze the clusters to check that the patterns make sense.

There are several approaches for doing that:

1. Cluster distribution + boxplot

2. Clusters against the most interesting features: Income vs Bill, and also Age vs Bill

Clusters distribution
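The numbers behind a distribution chart and a boxplot can be computed directly with pandas. The sketch below uses a handful of made-up customers with cluster labels already assigned.

```python
import pandas as pd

# Toy clustered customers (values are made up).
df = pd.DataFrame({
    "Cluster": [0, 1, 1, 2, 0, 3, 1, 0],
    "Bill": [1500, 300, 250, 900, 1700, 60, 400, 1600],
})

# How many customers fall in each cluster.
cluster_sizes = df["Cluster"].value_counts().sort_index()

# Quartiles per cluster: the same numbers a boxplot draws.
bill_summary = df.groupby("Cluster")["Bill"].describe()[
    ["min", "25%", "50%", "75%", "max"]
]

print(cluster_sizes)
print(bill_summary)
```

Very small clusters, or clusters whose boxes overlap almost completely, are a hint that the number of clusters may need revisiting.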

So let’s take a look at the different clusters, comparing Income vs Bill and Age vs Income:

2D clusters representation
Clusters Insights — Bill vs Income

If we look at just the main clusters, 0 and 1, the difference between the 2 main types of customers is easier to see.

2D clusters 0 and 1 representation

💡Generate ICPs

Now that we have identified the clusters, let’s take a look at the previous campaigns using the AcceptedCmpX columns (where X is the number of the campaign).

  • It is possible to see that the marketing campaigns aren’t very effective. Most customers accepted 0 campaigns.
  • Also, although there are 5 different campaigns, no customer has accepted all five.

Clearly, some course of action is needed to improve the performance of the campaigns.

Marketing Campaigns performance per cluster
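Campaign performance per cluster can be summarized as in this sketch. The rows are made up, and `TotalAccepted` is a helper column name I introduced; only the AcceptedCmpX column pattern comes from the dataset.

```python
import pandas as pd

# Toy clustered customers with campaign flags (values are made up).
df = pd.DataFrame({
    "Cluster": [0, 0, 1, 1, 2, 3],
    "AcceptedCmp1": [1, 0, 0, 0, 0, 0],
    "AcceptedCmp2": [0, 0, 0, 0, 0, 0],
    "AcceptedCmp3": [1, 0, 1, 0, 0, 0],
    "AcceptedCmp4": [0, 0, 0, 0, 0, 0],
    "AcceptedCmp5": [1, 1, 0, 0, 0, 0],
})

campaign_cols = [f"AcceptedCmp{i}" for i in range(1, 6)]

# Number of campaigns each customer accepted (hypothetical helper column).
df["TotalAccepted"] = df[campaign_cols].sum(axis=1)

# Customers per cluster, broken down by how many campaigns they accepted.
accepted_by_cluster = pd.crosstab(df["Cluster"], df["TotalAccepted"])

# Acceptance rate of each campaign within each cluster.
acceptance_rate = df.groupby("Cluster")[campaign_cols].mean()
```

The crosstab shows how concentrated the acceptances are, while the rate table shows which campaign worked for which cluster.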

So we need to create the Ideal Customer Profiles using all of this information. To do that, I will go through each customer attribute and look at the clusters in a jointplot.

Jointplots for Age, IsParent, Children and Education columns
Jointplots for FamilySize , Seniority, and Partner columns

So looking at these charts, we can create four different ICPs:

Ideal Customer Profiles
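Besides reading the charts, a compact profile table per cluster can help when writing up each ICP. This is a sketch on made-up rows, and the output column names (`customers`, `median_age`, and so on) are my own labels.

```python
import pandas as pd

# Toy clustered customers (values are made up).
df = pd.DataFrame({
    "Cluster": [0, 0, 1, 1],
    "Age": [52, 48, 35, 39],
    "Income": [78000, 82000, 31000, 35000],
    "Bill": [1500, 1700, 120, 200],
    "IsParent": [0, 0, 1, 1],
})

# One row per cluster, summarizing the attributes that define the profile.
profiles = df.groupby("Cluster").agg(
    customers=("Age", "size"),
    median_age=("Age", "median"),
    median_income=("Income", "median"),
    median_bill=("Bill", "median"),
    share_parents=("IsParent", "mean"),
)

print(profiles)
```

Medians are used here because they are less sensitive than means to the few extreme values that may remain in each cluster.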

📢Marketing Strategy

The last step, once we have created our Ideal Customer Profiles, is targeting our marketing campaigns. In this case, the most profitable customers seem to be the ones in clusters 0 and 1, as they spend the most money.

However, our campaigns didn’t perform very well for cluster 1, as we can see in the results of the first campaign, so it will be important to change the approach of the marketing campaigns.



