RFM Customer Segmentation: An Analysis of an Online Retail Store

Francisca Ubah
5 min readAug 12, 2022

--

Photo Credit: Pexels-Cottonbro

Overview

We will be analyzing transaction data to uncover the various customer segments for this client and offer actionable steps based on those findings.

The customers will be ranked based off three main criteria: Recency of purchase, Frequency of purchase and Monetary value of purchase. That is, we will explore the data set to figure out how long it’s been since each customer’s last purchase, how frequently they purchase and the total monetary value of their purchases. For this ranking, 5 is the highest score and 1 is the lowest.

Introduction

The client is a UK-based and registered non-store online retailer that mainly sells unique all-occasion gifts. They have consulted our services because they believe their customer churn rate is high and are looking for ways to maintain, and possibly improve, existing patronage. Customer churn is the fraction of customers that no longer use your company’s product or service during a given period.

The cofounders have asked us to look at their transaction data and come up with recommendations on how to turn things around, they also want to understand their customer groups.

Business Questions

Who are our best customers?
Which customers should we work to retain?
Which customers have we already lost?
Where should we look to improve sales?
Which customers are most likely to respond positively to campaigns and promotions?

Data Background

The data was gotten from Kaggle, it’s a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011. Data credit: Online Retail Dataset

Upon my data exploration, I found the data set had a limitation of incompleteness. Some cancelled orders from 2010 could not be matched against any available order leaving customers with negative total sales.

Data Cleaning and Exploration

This was done using Microsoft Excel and SQL, see query codes here. The data format was in csv and imported into PostgreSQL through pgadmin. During the cleaning process, I checked to make sure there was no null element that would negatively impact my analysis, no trailing spaces, and all available transaction data was linked to a customer via customer id.

During my exploration, I noticed that the data included some nonsense figures like negative quantities, invoice numbers starting with letter C (inconsistent format as data pattern showed 6-digit numbers only) and negative sales. This led me to do some digging and after going down a rabbit hole in the Discussion and Code sections of the Kaggle page, it became clear these were linked to cancelled orders. After matching the cancelled orders against available sales, I excluded customers whose total sales amount still remained negative because it was obvious some data was missing. While it is common practice for online vendors to refund customers on cancelled orders, it is highly unreasonable to cancel an order that’s not was never made.

To get day count and measure recency, I subtracted each customer’s last transaction date (MAX date) from 01/01/2012 and counted the days in between. Additionally, I extracted the total sales and transaction count for each customer, grouping and subqueries were used extensively to calculate these aggregations. For easy reference, I created a temporary table using the VIEW query to avoid repeating the exhaustive process of getting these values.

I used the NTILE query to divide the customers into ranked groups of 5 as equally as possible. I exported the cleaned and refined data into MS Excel to categorize the data into segment labels, the sorting and filtering function was used for this. The Text-to-column feature to split the ‘day_count’ column and separate the number of days from the string.

Data Insights

Customer Segmentation

This chart shows the client has about 4300 customers of which 1343 are either Lost or Others, and an additional 361 About to Sleep, that’s about 40% of their customer base. They were right about their customer churn rate.

Average Customer Spending

Surprisingly, despite the churn rate we can see here that the high monetary values come from the customer groups who are already very loyal or could easily be converted to become loyal.

Average Purchase Frequency

Here we see the Potential Loyalists outpacing the Loyal Customers, this is promising.

Charts were made with Tableau Public and can be found here.

Segment description and actionable steps to take

• Champions — These are the best customers and highest scorers, mostly 5s across the board. They need to be rewarded for their consistent patronage and spending, this could be done by awarding them redeemable points or providing early access to new products.

• Loyal customers — They buy on a regular basis and often spend good money too! This group needs continuous engagement and will respond well to promotions and campaigns. Provide personalized value-added offers based off previous orders, as well as loyalty discounts.

• Potential loyalists — They are recent customers who spend sizable amounts frequently, we can see the average spend and purchase frequency of this group is quite high when compared to loyalists. Quickly engage them as they could become loyalists or champions, they show readiness to spend so consider recommending products to them. Offering membership programs can also be the gentle push that bumps them into loyalists.

• New customers — They show to have purchased recently but not often, they’ll need to be won over. Focus on building relationships and incentivizing patronage; this can be achieved by sending welcome emails or offering points/discounts to be redeemed with next few purchases over a certain threshold.

• Promising customers — They show commitment on some level so concentrate on offering incentives to increase monetization; this can be done by providing free shipping for purchases over a certain threshold.

• At Risk customers — They used to be regular big spenders who have now massively declined in recency. Investigate customer service records to check for dissatisfaction with service/product, check for returned orders, survey them to better understand discontinued patronage. Concentrate on reconnecting with them by sending personalized offers on products.

• Can’t lose them — This group used to be regulars with considerable spending but haven’t bought anything in a good while. Apply similar efforts as with at risk customers.

• Needs Attention — This group has the mid numbers, not new but not quite big spenders or regulars, they need to be revived. Recommend other relevant products and promotions based on their purchase history.

• About to sleep & Others — They are the low, below-average, scorers. Offer discounted renewals and promotions to recapture.

• Lost — This group are the lowest scorers, mostly 1s across the board, they bought maybe one cheap item over a year ago and unsubscribed from the mailing list. No action necessary.

--

--

Francisca Ubah

Data Storyteller. Excited about translating data into useful information.