Customer segmentation for an e-commerce retailer

Figure 1: Source: Unsplash, by Blake Wisz.

I had 10 minutes to present a case I was proud of, and I chose this customer segmentation problem, mainly because I learned a lot by building it: from defining my choices to interpreting the results in light of the e-commerce business.

So here’s the problem: the marketing team of a retail e-commerce platform asked me to identify types of customers so that they could take specific actions to foster greater consumption and reduce churn, in other words, users migrating to other e-commerce platforms.

Questions to be answered: What types of customers are there? What are their main characteristics? Which actions could the marketing team take to increase revenue?

The framework I used to segment the clients was the RFM methodology that stands for recency, frequency and monetary value.

I structured my solution in 5 parts:

1 | Hypothesis & Exploratory data analysis (EDA)

2 | Data manipulation

3 | Preprocessing & Clustering

4 | Business metrics evaluation

5 | Conclusions

1 | Hypothesis & Exploratory data analysis (EDA)

Figure 2 shows the available data. Each row is a product or service bought by a customer in a transaction. The assumptions I made were:

  • Keeping only the rows with an assigned CustomerID, so that I could follow each customer’s purchasing behavior over time;
  • Keeping only rows with positive quantity and price, because I am interested in the revenue from products that were bought, not in returns; and
  • Assuming that the shipping fees paid by customers give the company no margin (each shipping fee was listed in its own row). This way, I analyze only the customers’ purchasing power and willingness to buy products.

Hence, I removed the rows that did not satisfy these assumptions (25% of the 541,909 rows).

Figure 2: Available columns in the dataset. From left to right: transaction number, stock code of a product, description of the product/service, quantity bought, date of the transaction, unit price, customer ID and country.
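The three filtering assumptions above can be sketched in pandas as follows. The column names (`InvoiceNo`, `StockCode`, `Quantity`, `UnitPrice`, `CustomerID`) are my assumption, modeled on the columns listed in Figure 2, and so is the stock code `POST` used here to flag shipping-fee rows; the toy data is made up.

```python
import pandas as pd

# Assumption: "POST" is a hypothetical stock code marking shipping-fee rows.
SHIPPING_CODES = {"POST"}


def clean_lines(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Apply the filtering assumptions and return (product_rows, shipping_rows)."""
    # Keep only rows with an assigned customer ID.
    df = df.dropna(subset=["CustomerID"])
    # Keep only positive quantities and prices, which drops returns.
    df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)]
    # Set shipping-fee rows aside instead of counting them as product revenue.
    is_shipping = df["StockCode"].isin(SHIPPING_CODES)
    return df[~is_shipping].copy(), df[is_shipping].copy()


# Made-up product lines: A2 is a return, A3 has no customer ID.
lines = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A1", "A2", "A3"],
    "StockCode": ["P1", "P2", "POST", "P1", "P3"],
    "Quantity": [2, 1, 1, -1, 3],
    "UnitPrice": [5.0, 10.0, 18.0, 5.0, 2.0],
    "CustomerID": [101.0, 101.0, 101.0, 101.0, None],
})
products, shipping = clean_lines(lines)
print(len(products), len(shipping))  # 2 1
```

Keeping the shipping rows in a separate frame, rather than dropping them, leaves their revenue available for the shipping-fee analysis that follows.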

A special remark about the shipping fee rows: I stored their revenue in a separate column so that I could evaluate their impact on the customer. Considering where the company is based, about 90% of the transactions were made in the UK and 10% abroad. On average, the shipping fee paid by customers in England represented 0.3% of the revenue, while foreign customers paid 4.1%. Figure 3 shows that the shipping fee can reach up to 50% of the amount paid in a transaction, but for transactions above 12,500 m.u. it drops below 10%.

Figure 3: Percentage of shipping fees in the revenue versus revenue per transaction. Blue points are transactions made abroad; orange points are transactions made in the UK.
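The shipping-fee share per transaction, the quantity on the vertical axis of Figure 3, could be computed roughly as below. This is a sketch: I assume the share is the fee divided by the total amount paid (products plus fee), and the column names, the `"United Kingdom"` country label and the data are hypothetical.

```python
import pandas as pd

# Made-up product rows and shipping-fee rows, already separated.
products = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "B1"],
    "Quantity": [2, 1, 4],
    "UnitPrice": [5.0, 10.0, 25.0],
    "Country": ["France", "France", "United Kingdom"],
})
shipping = pd.DataFrame({
    "InvoiceNo": ["A1", "B1"],
    "Quantity": [1, 1],
    "UnitPrice": [20.0, 1.0],
})


def shipping_share(products: pd.DataFrame, shipping: pd.DataFrame) -> pd.DataFrame:
    """One row per transaction with product revenue, shipping fee and fee share."""
    tx = (products.assign(Revenue=products["Quantity"] * products["UnitPrice"])
          .groupby("InvoiceNo")
          .agg(ProductRevenue=("Revenue", "sum"), Country=("Country", "first")))
    fees = (shipping["Quantity"] * shipping["UnitPrice"]).groupby(shipping["InvoiceNo"]).sum()
    # Transactions without a shipping row get a fee of 0.
    tx["ShippingFee"] = fees.reindex(tx.index, fill_value=0.0)
    # Assumption: share = fee / total amount paid (products + fee).
    tx["ShippingShare"] = tx["ShippingFee"] / (tx["ProductRevenue"] + tx["ShippingFee"])
    tx["Abroad"] = tx["Country"] != "United Kingdom"
    return tx


result = shipping_share(products, shipping)
print(result[["ShippingShare", "Abroad"]])
```

Grouping `result` by `Abroad` then gives a UK versus abroad comparison like the one quoted above.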

Finally, I found it important to analyze revenue over time. Figure 4 shows the monthly central tendency of revenue (the bars) and its uncertainty (the gray lines), revealing the seasonality of revenue.

Figure 4: Revenue seasonality. Revenue per transaction per month in 2016 and 2017.
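The per-month numbers behind a plot like Figure 4 can be obtained with a `groupby` on the calendar month. This sketch assumes a transaction dataframe with hypothetical `InvoiceDate` and `Revenue` columns, and it reports the median and mean as central tendency and the standard deviation as spread, which may differ from what the original plot used.

```python
import pandas as pd

# Made-up transactions (one row per transaction).
transactions = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(
        ["2017-01-05", "2017-01-20", "2017-02-03", "2017-02-14", "2017-02-28"]),
    "Revenue": [100.0, 300.0, 50.0, 70.0, 90.0],
})

# Group by calendar month and summarize revenue per transaction.
month = transactions["InvoiceDate"].dt.to_period("M")
monthly = transactions.groupby(month)["Revenue"].agg(["median", "mean", "std", "count"])
print(monthly)
```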

2 | Data manipulation

In this step, I grouped the information, initially available as product lines, into a transaction dataframe: all the lines with products belonging to a transaction were grouped into a single transaction line. Then, I grouped the transaction lines belonging to each customer into a single customer line, producing a customer dataframe.

Figure 5: Data manipulation. From left to right: ‘groupby’ and ‘merge’ operations build a dataframe with one unique transaction per row; these transactions are then grouped into a customer dataframe, with one unique customer ID per row along with its history data.
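The two grouping steps can be sketched with pandas `groupby` and named aggregation. Column names and data are hypothetical. The ‘merge’ mentioned in Figure 5 would be needed to attach columns computed separately (for example, the shipping fee per transaction), which this sketch leaves out.

```python
import pandas as pd

# Made-up product lines that already passed the filtering step.
lines = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(["2017-03-01", "2017-03-01", "2017-04-10", "2017-03-15"]),
    "Quantity": [2, 1, 3, 5],
    "UnitPrice": [5.0, 10.0, 4.0, 2.0],
    "CustomerID": [101, 101, 101, 202],
})
lines["Revenue"] = lines["Quantity"] * lines["UnitPrice"]

# Step 1: product lines -> one row per transaction.
transactions = (lines.groupby("InvoiceNo", as_index=False)
                .agg(CustomerID=("CustomerID", "first"),
                     InvoiceDate=("InvoiceDate", "first"),
                     Revenue=("Revenue", "sum")))

# Step 2: transactions -> one row per customer, keeping the history needed later.
customers = (transactions.groupby("CustomerID")
             .agg(FirstPurchase=("InvoiceDate", "min"),
                  LastPurchase=("InvoiceDate", "max"),
                  NumTransactions=("InvoiceNo", "count"),
                  TotalRevenue=("Revenue", "sum")))
print(customers)
```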

I defined the features for the clustering model as follows (an RFM-style set, where frequency is expressed as the average time between transactions):

Recency: number of days since the customer’s last transaction, counted from “today” (the last date registered in the transaction dataframe).

Average time between transactions: number of days between the customer’s first and last transactions, divided by the number of transactions. Here I assumed regular purchases over time.

Mean revenue per customer: average revenue over all transactions made by the customer.
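Under the definitions above, the three features could be computed as follows. This is a sketch on made-up data with hypothetical column names. Following the text, the average time between transactions divides the first-to-last span by the number of transactions; dividing by the number of gaps (n - 1) would give the mean gap between consecutive purchases instead.

```python
import pandas as pd

# Made-up transactions (one row per transaction).
transactions = pd.DataFrame({
    "CustomerID": [101, 101, 101, 202],
    "InvoiceDate": pd.to_datetime(["2017-01-01", "2017-01-31", "2017-03-02", "2017-02-15"]),
    "Revenue": [30.0, 60.0, 90.0, 40.0],
})


def customer_features(transactions: pd.DataFrame) -> pd.DataFrame:
    # "Today" is the last date registered in the transaction dataframe.
    today = transactions["InvoiceDate"].max()
    g = transactions.groupby("CustomerID")
    first = g["InvoiceDate"].min()
    last = g["InvoiceDate"].max()
    n = g["InvoiceDate"].count()
    return pd.DataFrame({
        # Days since the customer's last transaction.
        "Recency": (today - last).dt.days,
        # Days between first and last transaction, divided by the number of transactions.
        "AvgTimeBetween": (last - first).dt.days / n,
        # Average revenue per transaction.
        "MeanRevenue": g["Revenue"].mean(),
    })


feats = customer_features(transactions)
print(feats)
```

Note that a customer with a single purchase gets an average time between transactions of 0 under this definition.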

3 | Preprocessing & Clustering

In this case, keeping the outliers when using k-Means results in clusters containing only one or two points. So I removed them using the interquartile range (IQR) rule, which removed 12.22% of the customer base, or 530 customers.
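A minimal sketch of IQR-based filtering, assuming the conventional fences at 1.5 × IQR below the first quartile and above the third (the text does not state the multiplier), and assuming a customer is dropped if any feature falls outside its fences. The `k` parameter and the data are illustrative.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Drop rows where any listed column falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Made-up customer features; the last customer has an extreme recency.
feats = pd.DataFrame({
    "Recency": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
    "MeanRevenue": [10, 12, 11, 13, 12, 11, 10, 12, 13, 11],
})
clean = remove_iqr_outliers(feats, ["Recency", "MeanRevenue"])
print(f"removed {len(feats) - len(clean)} of {len(feats)} customers")
```

The quartiles of each column are computed on the full data, so the order of `columns` does not change the result.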

Another good practice when using k-Means is scaling the features (unless, e.g., the features are positions on a map, where the original units are meaningful). Scaling gives each feature equal weight in the distance computation. In this project, I chose the standard scaler, which removes the mean and scales to unit variance.
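The standard scaler described here matches scikit-learn’s `StandardScaler`; I am assuming that is the one used. In the made-up example below, the second column’s much larger range would dominate Euclidean distances before scaling; afterwards, both columns have zero mean and unit variance.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up features on very different scales: days vs monetary units.
X = np.array([[10.0, 100.0],
              [20.0, 300.0],
              [30.0, 500.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Equivalent manual computation: subtract the column mean, divide by the
# column standard deviation (population std, ddof=0, as StandardScaler does).
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled)
```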

One drawback of k-Means is that the number of clusters must be chosen manually. So, after these preprocessing steps, I wrote a function that stores the inertia and the silhouette score for each number of clusters, plotted in Figures 6 and 7. Inertia measures the “sum of squared distances of samples to their closest cluster center”. The silhouette score ranges from -1 to +1: values closer to +1 indicate a model with “better defined clusters”, values near 0 indicate overlapping clusters, and values closer to -1 indicate worse models.

Figure 6: Inertia (sum of squared distances of the points to their closest centroid) versus the number of clusters.
Figure 7: Silhouette score versus the number of clusters.
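The function described above might look like the sketch below, using scikit-learn’s `KMeans` (its `inertia_` attribute) and `silhouette_score`. The synthetic data, the range of k and the `n_init`/`random_state` values are my choices; plotting the two lists against k gives charts like Figures 6 and 7.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def scan_k(X, k_values, random_state=0):
    """Fit k-Means for each k and record inertia and silhouette score."""
    inertias, silhouettes = [], []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        inertias.append(model.inertia_)  # sum of squared distances to closest center
        silhouettes.append(silhouette_score(X, model.labels_))
    return inertias, silhouettes


# Synthetic data: four tight groups at the corners of a square.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

ks = list(range(2, 9))
inertias, silhouettes = scan_k(X, ks)
best_k = ks[int(np.argmax(silhouettes))]
print(best_k)
```

Inertia keeps decreasing as k grows, which is why it is read with the elbow method, while the silhouette score peaks when the clusters match the data’s structure.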

The number of clusters to choose is the one that combines low inertia (following the elbow method and considering its suitability to the problem) with a high silhouette score. I therefore chose 4 clusters.
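Once k = 4 is fixed, the final model can be fitted and summarized. The sketch below computes the share of customers per cluster (the kind of percentage quoted for each cluster in the next section) and the mean of each feature per cluster in the original units, which is what the cluster interpretation relies on. The data is synthetic and the feature names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up customer features: four groups of 25 customers each.
rng = np.random.default_rng(1)
group_centers = [(300, 30, 200), (30, 120, 250), (30, 30, 150), (30, 30, 600)]
noise = np.array([5.0, 5.0, 10.0])
rows = [np.array(c) + rng.normal(size=(25, 3)) * noise for c in group_centers]
feats = pd.DataFrame(np.vstack(rows), columns=["Recency", "AvgTimeBetween", "MeanRevenue"])

# Scale, fit the final model with k = 4, and attach the labels.
X = StandardScaler().fit_transform(feats)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
feats["Cluster"] = labels

# Share of customers per cluster and per-cluster profile in the original units.
shares = feats["Cluster"].value_counts(normalize=True).sort_index()
profile = feats.groupby("Cluster").mean()
print(shares)
print(profile.round(1))
```

Cluster numbers are arbitrary labels, so the profile table, not the label, is what tells you which cluster is which.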

4 | Business metrics evaluation

The result is shown in Figure 8: 4 clusters that segment the customer base well, as I discuss below.

Figure 8: 3D plot of the segmented customer data, on scaled axes.
Figure 9: Pairplot of the customer segmentation.

Figure 9 provides a lot of insight into the clusters. The diagonal shows the univariate distribution of each scaled feature, and the remaining plots are projections of the 3D plot in Figure 8 onto each pair of features. It also helps explain the low silhouette score: in some views, points of one cluster intrude into the volume of a larger cluster, e.g. the red points in the plot in row 2, column 1.

Evaluating the clusters, one by one:

Red (cluster 0), 20.5% of the considered customers: higher recency -> we probably have churn here. Can I recover these customers? At what cost?

Orange (cluster 1), 24.3%: longer time between transactions -> this may indicate a tendency to churn OR simply be a characteristic of these customers. The company should strive to shorten this interval.

Purple (cluster 3), 35.6%: feature ranges similar to the other clusters, but lower average revenue. The marketing team should try to increase the average revenue through cross-selling tactics (using association rule learning, which finds products that are frequently bought together) or product recommendations.

Green (cluster 1), 19.6%: ideal customers! Their average revenue range is the highest and their time-related feature ranges are satisfactory. I would need to create more features to find out whether they mostly buy premium products or bulk baskets.
The big challenge is attracting more customers like these. Look-alike models extract the main characteristics of these customers (behavioral and demographic information) and allow marketing teams to target an audience with similar characteristics. It is also important to nurture the ideal customers who are already with us. In this context, special-interest clubs could help, letting them join communities and stay exclusive to the platform.
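The cross-selling idea suggested for the purple cluster relies on association rules. Below is a minimal pure-Python sketch restricted to pairs of products, computing support, confidence and lift for rules of the form “customers who buy X also buy Y”. The baskets are made up, and this is a simplification: a full Apriori implementation (for example, the `frequent_patterns` module of the third-party mlxtend library) also handles larger itemsets and pruning.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated items inside one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # P(y in basket | x in basket)
            lift = confidence / (item_counts[y] / n)  # > 1: bought together more than by chance
            rules.append((x, y, support, confidence, lift))
    # Strongest rules first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Made-up baskets, one per transaction.
baskets = [
    ["mug", "teapot"],
    ["mug", "teapot", "napkins"],
    ["mug", "napkins"],
    ["teapot"],
]
rules = pair_rules(baskets)
for rule in rules[:3]:
    print(rule)
```

Here every basket with napkins also contains a mug, so “napkins -> mug” comes first with confidence 1.0, a candidate for a recommendation shown to customers who add napkins.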

5 | Conclusions

Following good practices, it was possible to segment customers with the k-Means method and suggest some actions for the marketing team to increase revenue.

However, k-Means is a hyperspherical model: it partitions the space into roughly spherical groups (circles in 2D, spheres in 3D, and so on), which means it relies on the compactness of the points to assign clusters. Possible improvements would be testing a non-hyperspherical model, or k-medoids, which seems to be less sensitive to outliers. For k-Means itself, a complete analysis would require treating the outliers with feature engineering.
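To illustrate why k-medoids is less sensitive to outliers: each cluster center must be an actual data point (the medoid, which minimizes the total distance to the other members of its cluster) rather than a mean. Below is a minimal NumPy sketch of the simple alternating (Voronoi-iteration) variant of k-medoids, not the PAM swap algorithm; the data is made up. With one cluster and one extreme value, the medoid stays with the bulk of the data while the mean is pulled toward the outlier.

```python
import numpy as np


def k_medoids(X, k, n_iter=100, seed=0):
    """Alternating k-medoids: assign points to nearest medoid, then re-pick medoids."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # Member with the smallest total distance to the rest of its cluster.
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels


X = np.array([[1.0], [2.0], [3.0], [100.0]])
medoids, labels = k_medoids(X, k=1)
print(X[medoids, 0], X.mean())  # medoid vs mean
```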




Rhosane Santos

Brazilian mechanical engineer | Sharing my learning in Data Science and ML applications