User Segmentation for Better Marketing Strategy

William Ong
6 min read · Aug 19, 2021


👈 Part III-I | TOC

Customer Segmentation with Clustering Illustration

How do we know our customers? How do we group them together? How do we target users with similar behavior?

Getting more information about user similarity will help our e-commerce business in many ways. It helps us make more personalized product recommendations and marketing strategies. Not only that, we also benefit from reduced cost by directing our actions toward the right targets: the users who are most likely to convert into customers.

Data Overview

First of all, we need to understand the structure of our dataset. The dataset we use is composed of 12,803 rows (users) and 3 columns (features), built from the features in our data warehouse from Part I.

ps: The user base of our e-commerce platform is too large to fit into our model all at once. Thus, this project only builds a PoC (Proof of Concept) showing that we can segment our users and get much more insightful information from the clustering process.

The features included in our customer segmentation are explained below (a sketch of how they might be computed from raw transactions follows the list):

  • Recency: Who has purchased recently? The number of days since the last purchase (lower recency is better).
  • Frequency: Who has purchased frequently? The total number of purchases (higher frequency is better).
  • Monetary: Who has a high purchase amount? The total amount of money the customer has spent (higher monetary value is better).
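To make this concrete, here is a minimal sketch of how such an RFM table could be derived from raw order data. The transactions frame and its column names (user_id, order_id, order_date, order_amount) are assumptions for illustration, not the actual warehouse schema from Part I.

```python
import pandas as pd

# Hypothetical order-level table; column names are placeholders.
transactions = pd.read_csv("transactions.csv", parse_dates=["order_date"])

# Reference date: one day after the latest order in the data.
snapshot_date = transactions["order_date"].max() + pd.Timedelta(days=1)

rfm = (
    transactions.groupby("user_id")
    .agg(
        recency=("order_date", lambda d: (snapshot_date - d.max()).days),
        frequency=("order_id", "nunique"),
        monetary=("order_amount", "sum"),
    )
    .reset_index()
)
```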

Exploratory Data Analysis

Before we perform any kind of clustering, let’s check the distribution of our data.

Monetary Distribution
Frequency Distribution
Recency Distribution

Normalization

To deal with the skewness of the monetary and recency features, we apply normalization strategies to these numerical features:

  • Log Transformation (for monetary feature)
  • Sqrt Transformation (for recency feature)
Monetary Feature after normalization
Recency Feature after normalization

This transformation is important because one of the approaches we will use (KMeans) is a distance-based algorithm. Thus, it is important to normalize and scale our data before feeding it into the model.

ps: We don’t apply any normalization to the frequency feature because it lacks variance (most of the time, a user orders only once). Any normalization of this feature would produce the same distribution (only with the skew shifted).
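As a rough sketch (continuing from the hypothetical rfm frame above), the two transformations can be applied with NumPy:

```python
import numpy as np

# log1p keeps zero-spend users valid; sqrt softens the recency skew.
rfm["monetary_log"] = np.log1p(rfm["monetary"])
rfm["recency_sqrt"] = np.sqrt(rfm["recency"])
```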

Clustering

Before we cluster our data, we need to scale it. In our case, I use MinMaxScaler, because I would like all the features to share a similar range (0–1).
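A minimal sketch of the scaling step, assuming the transformed columns from the normalization section:

```python
from sklearn.preprocessing import MinMaxScaler

feature_cols = ["recency_sqrt", "frequency", "monetary_log"]

scaler = MinMaxScaler()  # rescales each feature to the 0-1 range
X = scaler.fit_transform(rfm[feature_cols])
```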

As a side note, we will use the silhouette score when trying to figure out the number of clusters for each approach. The silhouette score is a metric used to measure the goodness of a clustering. Its value ranges from -1 to 1; the higher the score, the better separated and more clearly distinguished the clusters are.
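The silhouette score can be computed for a range of candidate k values; this is a sketch of the general pattern (shown here with KMeans), not the exact code behind the plots below.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Score each candidate number of clusters on the scaled features X.
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.3f}")
```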

Approach I : KMeans

KMeans Illustration

To determine the number of clusters when using KMeans as the clustering algorithm, check the plot below:

We can see that the best number of clusters (after 2 clusters) is 4 (k=4). We will use k=4 when segmenting our users with KMeans. To see the distribution of each attribute, check the plots below.

Scatterplots of attribute distributions for each cluster (KMeans)
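A minimal sketch of fitting KMeans with the chosen k=4 on the scaled features:

```python
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
rfm["cluster_kmeans"] = kmeans.fit_predict(X)
```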

Approach II : KMedoids

KMedoids Illustration

To determine the number of clusters when using KMedoids as the clustering algorithm, check the plot below:

We can see that the best number of clusters (after 2 clusters) is 4 (k=4). We will use k=4 when segmenting our users with KMedoids. To see the distribution of each attribute, check the plots below.

Scatterplots of attribute distributions for each cluster (KMedoids)
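The post does not name a specific KMedoids implementation; one option is the KMedoids estimator from the scikit-learn-extra package, sketched here under that assumption:

```python
from sklearn_extra.cluster import KMedoids  # pip install scikit-learn-extra

kmedoids = KMedoids(n_clusters=4, random_state=42)
rfm["cluster_kmedoids"] = kmedoids.fit_predict(X)
```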

Approach III : Agglomerative Clustering

Agglomerative Clustering Illustration

To determine the number of clusters when using Agglomerative Clustering as the clustering algorithm, check the plot below:

We can see that the best number of clusters (after 2 clusters) is 4 (k=4). We will use k=4 when segmenting our users with Agglomerative Clustering. To see the distribution of each attribute, check the plots below.

Scatterplots of attribute distributions for each cluster (Agglomerative)
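A minimal sketch of the agglomerative approach with scikit-learn (Ward linkage is the library default; the post does not state which linkage was actually used):

```python
from sklearn.cluster import AgglomerativeClustering

agglo = AgglomerativeClustering(n_clusters=4, linkage="ward")
rfm["cluster_agglo"] = agglo.fit_predict(X)
```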

Results : Cluster Analysis

All of the clustering algorithms above use 4 as the number of clusters. We can see that KMeans separates our users into 4 distinct groups better than the others. Also, because KMeans is the most commonly used algorithm for user segmentation, we will pick KMeans as well.

Statistics for each cluster
  • Cluster 0: Users who have used our application recently and have an average frequency above 1. While their spending is not as high as cluster 3, these users might be our loyal user base.
  • Cluster 1: Users whose last order was not too long ago (roughly half a year to a year ago). Judging from their spending (monetary) and frequency, these users seem to have used the app only once.
  • Cluster 2: Users who have lapsed (have not used the app in a long time).
  • Cluster 3: Users whose last order was not too long ago (roughly half a year to a year ago). Since this cluster has a really high average spending, it might be the one we need to target most to bring users back.
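The per-cluster statistics behind these descriptions can be reproduced with a simple groupby on the original (untransformed) RFM columns; this is a sketch assuming the KMeans labels from above:

```python
# Summary statistics per KMeans cluster on the raw RFM features.
summary = (
    rfm.groupby("cluster_kmeans")[["recency", "frequency", "monetary"]]
       .agg(["mean", "median", "count"])
)
print(summary)
```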

It’s always sad to say goodbye 😢. But we have reached the end of the project I have been working on for the past 5–6 months as a side project (for Blibli) alongside my university work. In the future, I would like to create more series like this for my other side projects. Feel free to contact/follow me if you want to see more. Thank you & have fun~~

