RFM Analysis for Customer Segmentation

Busra Y.
11 min read · Jul 6, 2020


Hello data enthusiasts!

I am back with another article. In this post, I will describe RFM analysis and show how to use it for customer segmentation by analyzing an online retail shop’s data set in Python. Based on the results of the RFM analysis, I will illustrate what kinds of actions can be taken for different kinds of customers.
Let’s start with background information!

What is RFM?

RFM is an abbreviation for Recency, Frequency and Monetary. It is a technique that helps determine marketing and sales strategies based on customers’ buying habits.

Recency: Time passed since the customer’s last purchase. In other words, it is the “time passed since the last contact of the customer”.

It is found from the formula:

Recency = RFM analysis date − Last purchase date

For example, if we are doing this analysis today, then the analysis date is today’s date.

Frequency: Total number of purchases. It shows how often the customer shops and can be found from the number of invoices the customer has.

Monetary (Monetary Value): Total spending by the customer.
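The three definitions can be made concrete with a small worked example. The sketch below computes R, F and M for one hypothetical customer in plain Python; the invoice numbers, dates, amounts and the analysis date are made-up values for illustration only.

```python
from datetime import date

# Hypothetical purchase history of one customer: (invoice number, date, amount in pounds)
purchases = [
    ("536365", date(2011, 3, 1), 120.50),
    ("540001", date(2011, 6, 15), 80.00),
    ("560123", date(2011, 11, 20), 45.25),
]
analysis_date = date(2011, 12, 10)  # assumed RFM analysis date

# Recency: analysis date minus the last purchase date, in days
last_purchase = max(purchase_date for _, purchase_date, _ in purchases)
recency = (analysis_date - last_purchase).days

# Frequency: number of distinct invoices
frequency = len({invoice for invoice, _, _ in purchases})

# Monetary: total spending
monetary = sum(amount for _, _, amount in purchases)

print(recency, frequency, monetary)  # 20 3 245.75
```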

For customer segmentation, each of these values is scored from 1 to 5, and the customers are then separated into groups depending on these scores. These groups can be shown on the Recency and Frequency Grid as follows:

A low recency and frequency score (bottom left) marks the hibernating customers, who haven’t purchased anything recently or frequently. A high recency and frequency score (top right) marks the champions, who have been purchasing recently and frequently.

Customers are scored from 1 to 5 based on their percentile, with 5 being the highest and 1 the lowest:

  • Champions [R(4–5), F(4–5)]
  • Loyal Customers [R(3–4), F(4–5)]
  • Potential Loyalists [R(4–5), F(2–3)]
  • Promising [R(3–4), F(0–1)]
  • Can’t Lose Them [R(1–2), F(4–5)]
  • At Risk [R(1–2), F(3–4)]
  • About to Sleep [R(2–3), F(1–2)]
  • Hibernating [R(1–2), F(1–2)]
  • New Customers [R(4–5), F(0–1)]
  • Need Attention [R(2–3), F(2–3)]

After deciding which customers belong to which group, customer-specific sales and marketing techniques are developed.

Now, I will apply RFM analysis to an online retail shop’s data set and suggest a few marketing strategies based on the different customer segments. The analysis is done in Python. You can find my code on Kaggle:

https://www.kaggle.com/busrayaman/customer-segmentation-with-rfm-analysis

Business Case: RFM Analysis on an Online Retail data set

An e-commerce company which sells souvenirs wants to segment its customers and determine marketing strategies according to these segments.

For this purpose, we will define the behavior of the customers, put customers who exhibit common behaviors into the same groups, and then try to develop sales and marketing techniques specific to these groups.

The data set “online_retail_II” includes the sales of this online shop between 01/12/2009 and 09/12/2011. For this study, the 2010–2011 data is used.

Variables:
Invoice: Invoice number, unique for each invoice. If it starts with C, the invoice is a refund (cancellation).
StockCode: Product code, unique for each product
Description: Product name
Quantity: Number of units of the product sold on the invoice. Refund invoices (those starting with C) have negative quantities
InvoiceDate: Invoice date and time
Price: Unit price of the product (in pounds)
Customer ID: Customer number, unique for each customer
Country: Country name. Refers to the country where the customer lives

Data Understanding

First we need to import required libraries:

Read the input Excel file’s sheet ‘Year 2010–2011’ and save it as the data frame ‘df_2010_2011’:

Then a copy of this data frame is assigned to a data frame called ‘df’, so that the original data frame is kept intact in case we need it again.
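A minimal sketch of the import and loading steps, assuming pandas with openpyxl installed (pandas needs it to read .xlsx files). The file name ‘online_retail_II.xlsx’ and the helper ‘load_retail_data’ are hypothetical, and the sheet name is assumed to use a plain hyphen in the workbook.

```python
import pandas as pd

SHEET_NAME = "Year 2010-2011"  # sheet named in the article (plain hyphen assumed)

def load_retail_data(path="online_retail_II.xlsx"):
    """Hypothetical helper: read the 2010-2011 sheet and return (original, working copy)."""
    df_2010_2011 = pd.read_excel(path, sheet_name=SHEET_NAME)
    # Work on a copy so that the original data frame stays untouched.
    df = df_2010_2011.copy()
    return df_2010_2011, df

# Usage (requires the Excel file on disk):
# df_2010_2011, df = load_retail_data("online_retail_II.xlsx")
```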

To get a first look at the data, we use the usual data frame inspection functions:
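A first look usually means checking a few rows, the shape, the column types and the summary statistics. The helper below, ‘first_look’, is a hypothetical name that bundles those pandas calls; the toy frame only illustrates the call.

```python
import pandas as pd

def first_look(df, n=5):
    """Print a quick overview of a data frame."""
    print(df.head(n))       # first n rows
    print(df.shape)         # (number of rows, number of columns)
    df.info()               # column types and non-null counts (prints directly)
    print(df.describe().T)  # summary statistics of the numeric columns

# Toy frame for illustration (values are made up)
first_look(pd.DataFrame({"Quantity": [6, 8, 2], "Price": [2.55, 3.39, 1.85]}))
```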

The customer ID is stored as float because the column contains missing values, so we convert it to integer:
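A plain astype(int) raises an error while the column still contains NaN, so the conversion has to handle the missing IDs. The sketch shows two options on made-up IDs; which one the original code used is not shown, so treat both as assumptions.

```python
import numpy as np
import pandas as pd

# Toy data: the NaN is what forces the IDs to be stored as float
raw = pd.DataFrame({"Customer ID": [13085.0, np.nan, 12583.0]})

# Option A: nullable integer dtype, keeps the missing value as <NA>
nullable_ids = raw["Customer ID"].astype("Int64")

# Option B: drop missing IDs first; a plain int cast then succeeds
# (astype(int) raises on NaN)
clean = raw.dropna(subset=["Customer ID"]).copy()
clean["Customer ID"] = clean["Customer ID"].astype(int)
```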

Let’s check how many products we have:
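A sketch of the product count on a few toy rows. Counting distinct ‘StockCode’ values is an assumption (counting distinct ‘Description’ values is an alternative); the ranking of units sold per product is an optional extra.

```python
import pandas as pd

# Toy rows for illustration
df = pd.DataFrame({
    "StockCode": ["85123A", "71053", "85123A", "22752"],
    "Description": ["WHITE HANGING HEART T-LIGHT HOLDER", "WHITE METAL LANTERN",
                    "WHITE HANGING HEART T-LIGHT HOLDER", "SET 7 BABUSHKA NESTING BOXES"],
    "Quantity": [6, 6, 8, 2],
})

# Number of distinct products
n_products = df["StockCode"].nunique()

# Units sold per product, best sellers first
units_per_product = (df.groupby("Description")["Quantity"].sum()
                     .sort_values(ascending=False))
print(n_products)  # 3
print(units_per_product)
```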

Let’s check whether there are null values. If there are, we drop them, because incomplete rows, especially those without a customer ID, cannot be used in the RFM analysis.
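A sketch of the null check and drop on a toy frame. Dropping every incomplete row mirrors the text; the subset option in the comment is a narrower alternative, not something the article states.

```python
import numpy as np
import pandas as pd

# Toy frame with a missing description and a missing customer ID
df = pd.DataFrame({
    "Invoice": ["536365", "536366", "536367"],
    "Description": ["WHITE METAL LANTERN", None, "CREAM CUPID HEARTS COAT HANGER"],
    "Customer ID": [17850.0, np.nan, 13047.0],
})

print(df.isnull().sum())  # missing values per column

# Drop every row with a missing value; dropna(subset=["Customer ID"]) would
# drop only the rows whose customer cannot be identified.
df = df.dropna()
```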

Let’s find out how many invoices there are:
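Since one invoice spans several rows (one per product line), the invoice count is the number of distinct values in the invoice column. A sketch on toy rows, assuming the column is named ‘Invoice’ as in the later steps:

```python
import pandas as pd

# Toy invoice lines: one invoice can span several rows
df = pd.DataFrame({"Invoice": ["536365", "536365", "536366", "C536379"]})

n_invoices = df["Invoice"].nunique()              # distinct invoices, refunds included
lines_per_invoice = df["Invoice"].value_counts()  # rows per invoice
print(n_invoices)  # 3
```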

How much money has each invoice brought in? To answer this, we create a new variable, ‘TotalPrice’, by multiplying the two variables ‘Quantity’ and ‘Price’:
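A sketch of the new variable on made-up invoice lines, plus the per-invoice sum that answers the question above. Column names follow the article (‘Invoice’, ‘Quantity’, ‘Price’, ‘TotalPrice’); the values are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "Invoice": ["536365", "536365", "536366"],
    "Quantity": [6, 8, 2],
    "Price": [2.55, 3.39, 1.85],
})

# Revenue of each invoice line
df["TotalPrice"] = df["Quantity"] * df["Price"]

# Revenue of each invoice: the sum of its lines
revenue_per_invoice = df.groupby("Invoice")["TotalPrice"].sum()
print(revenue_per_invoice)
```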

Data Preparation

In the previous section, we tried to understand the data set. In this section, we will prepare the data set for RFM analysis.

First, we remove the rows of refund invoices (those whose invoice number starts with ‘C’) and assign the result to a new data frame:
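A sketch of the refund filter on toy rows. Converting the column to string first is a precaution based on an assumption about the data: when the sheet is read, ordinary invoice numbers may come in as integers while refunds stay strings, and the .str accessor only works on strings.

```python
import pandas as pd

# Toy rows: a mixed int/str invoice column, as it may look after reading Excel
df = pd.DataFrame({
    "Invoice": [536365, "C536379", 536366],
    "Quantity": [6, -1, 2],
})

# True for refund invoices, whose number starts with "C"
is_refund = df["Invoice"].astype(str).str.startswith("C")

# Keep only regular sales
df_no_refund = df[~is_refund].copy()
```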

Then we check its shape, info and descriptive statistics:

Customer Segmentation with RFM Scores

1) Finding the Recency values:

Let’s see the earliest and the latest purchase dates of our data set:

If we took today’s date as the analysis date, every recency value would be inflated by the time that has passed since the data was collected. For this reason, we set the analysis date (“today’s date”) to the day after the latest date in the data set. Now we can calculate recency relative to the last recorded day:
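A sketch of setting the analysis date, using toy timestamps whose maximum falls on 09/12/2011 like the data set. Deriving the date from the data with normalize() is an assumption; hard-coding it works just as well.

```python
import pandas as pd

df = pd.DataFrame({"InvoiceDate": pd.to_datetime([
    "2010-12-01 08:26", "2011-06-15 12:00", "2011-12-09 12:50",
])})

# Earliest and latest purchase dates
print(df["InvoiceDate"].min(), df["InvoiceDate"].max())

# Analysis date: midnight of the day after the latest invoice date
today_date = df["InvoiceDate"].max().normalize() + pd.Timedelta(days=1)
print(today_date)  # 2011-12-10 00:00:00
```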

For each customer, we subtract the customer’s last purchase date from the analysis date and keep the result in a temporary data frame:

We rename the column ‘InvoiceDate’ to ‘Recency’:

Next, we take the number of days from every row. Since each value is a time difference (a timedelta), x.days returns the number of whole days. Timedeltas have no month or year attributes, because months and years vary in length, so days are the unit used for recency.
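The recency steps above can be sketched as follows, on made-up purchases of two hypothetical customers and with the analysis date assumed to be 10/12/2011:

```python
import pandas as pd

today_date = pd.Timestamp("2011-12-10")  # assumed analysis date

df = pd.DataFrame({
    "Customer ID": [12347, 12347, 12348],
    "InvoiceDate": pd.to_datetime(["2011-10-31 12:25", "2011-12-07 15:52",
                                   "2011-09-25 13:13"]),
})

# Analysis date minus each customer's last purchase date (one timedelta per customer)
temp_df = (today_date - df.groupby("Customer ID")["InvoiceDate"].max()).to_frame()

# The column still carries the name 'InvoiceDate', so rename it
temp_df = temp_df.rename(columns={"InvoiceDate": "Recency"})

# Keep whole days only; equivalent to .apply(lambda x: x.days)
temp_df["Recency"] = temp_df["Recency"].dt.days
print(temp_df)
```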

2) Finding the Frequency values

If we count the distinct invoices issued to each customer, we get the customer’s total number of purchases. This is our frequency value.
First, by grouping on ‘Customer ID’ and ‘Invoice’, we count how many rows (product lines) each of a customer’s invoices contains. Let’s look at 5 random observations:

Next, let’s count how many times each customer shopped:

We assign this to a new data frame called ‘freq_df’ and rename the ‘Invoice’ column to ‘Frequency’:
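A sketch of both frequency steps on toy rows. Using size() for the rows per invoice and nunique for the distinct invoices is an assumption about how the original code did it:

```python
import pandas as pd

df = pd.DataFrame({
    "Customer ID": [12347, 12347, 12347, 12348, 12348],
    "Invoice": ["537626", "537626", "542237", "539318", "539318"],
})

# Rows (product lines) per customer and invoice
lines_per_invoice = df.groupby(["Customer ID", "Invoice"]).size()
print(lines_per_invoice.head())  # on the full data, .sample(5) shows 5 random rows

# Frequency: number of distinct invoices per customer
freq_df = df.groupby("Customer ID").agg({"Invoice": "nunique"})
freq_df = freq_df.rename(columns={"Invoice": "Frequency"})
```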

3) Finding the Monetary values:

Monetary is the customer’s total spending. The sum of the ‘TotalPrice’ values, which we calculated earlier and added as a new column, gives each customer’s monetary value:

We rename the ‘TotalPrice’ column to ‘Monetary’:
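A sketch of the monetary step, assuming ‘TotalPrice’ was added as in the data understanding section; the values are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "Customer ID": [12347, 12347, 12348],
    "TotalPrice": [15.30, 27.12, 3.70],
})

# Total spending per customer
monetary_df = df.groupby("Customer ID").agg({"TotalPrice": "sum"})
monetary_df = monetary_df.rename(columns={"TotalPrice": "Monetary"})
```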

4) Defining the score values:

Let’s bring all the values together in a new data frame called ‘rfm’:
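A sketch of combining the three frames. Because all three are indexed by ‘Customer ID’, pd.concat with axis=1 aligns them row by row; the toy frames stand in for the real recency, frequency and monetary frames:

```python
import pandas as pd

idx = pd.Index([12347, 12348], name="Customer ID")
recency_df = pd.DataFrame({"Recency": [2, 75]}, index=idx)
freq_df = pd.DataFrame({"Frequency": [2, 1]}, index=idx)
monetary_df = pd.DataFrame({"Monetary": [42.42, 3.70]}, index=idx)

# Align the three frames on Customer ID, side by side
rfm = pd.concat([recency_df, freq_df, monetary_df], axis=1)
print(rfm)
```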

Normally, a smaller recency value is better, because it means the customer shopped more recently; ascending scoring would therefore give the best customers a score of 1. We reverse this and give 5 to the most recent customers, so that recency follows the same order as the other scores:
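A sketch of the reversed recency scoring with pd.qcut on ten made-up recency values. The column name ‘RecencyScore’ is an assumption:

```python
import pandas as pd

# Toy recency values (days since last purchase) for ten customers
rfm = pd.DataFrame({"Recency": [2, 5, 10, 20, 40, 60, 90, 150, 250, 370]})

# Five equal-sized groups; fewer days means a better score, so labels run 5 -> 1
rfm["RecencyScore"] = pd.qcut(rfm["Recency"], 5, labels=[5, 4, 3, 2, 1])
print(rfm)
```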

For frequency, the ‘qcut’ method raised an error, because many customers share the same frequency value (for example, a single purchase), so some quantile edges coincide. Instead, I used the ‘cut’ method and defined the cut values manually. To split the customers as evenly as possible, I checked the descriptive statistics of the frequency values and used 0, the minimum, the 50th, 60th and 90th percentiles, and the maximum as the cut values.
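The qcut failure and the manual cut can be reproduced on a toy frequency series. The first part confirms that qcut raises a ValueError when quantile edges repeat; the second applies the cut values described above (they must be strictly increasing, which should be checked on the real data). The last line shows a different, swapped-in technique: ranking first breaks ties so that qcut works, at the cost of putting customers with equal frequencies into different scores.

```python
import pandas as pd

# Toy frequencies: many customers bought only once
freq = pd.Series([1, 1, 1, 1, 2, 2, 3, 4, 7, 15], name="Frequency")

# Quintile edges repeat (several equal 1), so qcut refuses to bin
try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
    qcut_failed = False
except ValueError:
    qcut_failed = True

# Manual cut values: 0, min, 50%, 60%, 90%, max
bins = [0, freq.min(), freq.quantile(0.5), freq.quantile(0.6),
        freq.quantile(0.9), freq.max()]
frequency_score = pd.cut(freq, bins=bins, labels=[1, 2, 3, 4, 5])

# Alternative (not the article's method): rank first to break ties
frequency_score_ranked = pd.qcut(freq.rank(method="first"), 5, labels=[1, 2, 3, 4, 5])
```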

The monetary values are scored from 5 as the best to 1 as the worst:
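The same pd.qcut call with ascending labels scores the monetary values. A sketch on ten made-up values, with ‘MonetaryScore’ as an assumed column name:

```python
import pandas as pd

# Toy total spending (pounds) for ten customers
rfm = pd.DataFrame({"Monetary": [15.0, 80.0, 120.0, 250.0, 400.0,
                                 650.0, 900.0, 1500.0, 3000.0, 12000.0]})

# Higher spending means a better score, so labels run 1 -> 5
rfm["MonetaryScore"] = pd.qcut(rfm["Monetary"], 5, labels=[1, 2, 3, 4, 5])
```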

The three scores are combined and added to the data frame as a new column:

Let’s see the best and the worst customers:
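A sketch of the two steps above: the three scores are concatenated into a string (‘RFM_SCORE’ is an assumed column name), and the best and worst customers are those scoring “555” and “111”. The scores are made up; qcut and cut return categoricals, but astype(str) handles those too.

```python
import pandas as pd

rfm = pd.DataFrame(
    {"RecencyScore": [5, 1, 3], "FrequencyScore": [5, 1, 3], "MonetaryScore": [5, 1, 2]},
    index=pd.Index([12347, 12348, 12349], name="Customer ID"),
)

# Concatenate the scores into one string, e.g. 5, 5, 5 -> "555"
rfm["RFM_SCORE"] = (rfm["RecencyScore"].astype(str)
                    + rfm["FrequencyScore"].astype(str)
                    + rfm["MonetaryScore"].astype(str))

best_customers = rfm[rfm["RFM_SCORE"] == "555"]
worst_customers = rfm[rfm["RFM_SCORE"] == "111"]
```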

To segment the customers by their Recency and Frequency scores, we define a dictionary, ‘seg_map’, whose keys are regular expression (regex) patterns over the scores and whose values are the segment names:

Since monetary value tends to move together with frequency, it is not used for segmentation. The combined recency and frequency score is obtained with the following code:

We then look up each score in ‘seg_map’ and add the matching segment name as a new column named “Segment”:
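The article does not show the contents of ‘seg_map’, so the dictionary below is an assumption: one widely used mapping over the two-character “RF” score in which every score combination belongs to exactly one segment. Its boundaries follow the list at the top of the article only approximately, because the ranges listed there overlap.

```python
import pandas as pd

# Hypothetical seg_map: keys are regexes over the "RF" string (recency digit first)
seg_map = {
    r"[1-2][1-2]": "Hibernating",
    r"[1-2][3-4]": "At Risk",
    r"[1-2]5": "Can't Lose",
    r"3[1-2]": "About to Sleep",
    r"33": "Need Attention",
    r"[3-4][4-5]": "Loyal Customers",
    r"41": "Promising",
    r"51": "New Customers",
    r"[4-5][2-3]": "Potential Loyalists",
    r"5[4-5]": "Champions",
}

# Toy scores for five customers
rfm = pd.DataFrame({"RecencyScore": [5, 1, 3, 2, 4], "FrequencyScore": [5, 1, 3, 5, 1]})

# Recency and frequency scores as one string, e.g. "55"
rfm["Segment"] = rfm["RecencyScore"].astype(str) + rfm["FrequencyScore"].astype(str)

# Replace each "RF" string with the name of the pattern it matches
rfm["Segment"] = rfm["Segment"].replace(seg_map, regex=True)
print(rfm)
```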

Now our data frame is ready and all the customers are grouped. We can retrieve the data of each group one by one and save it as a new data frame. For example, let’s do this for the ‘Need Attention’ segment:

We save this group as a new data frame named ‘need_attention_df’:

We can export this data frame to an Excel file:

The same process can be repeated for all the segments. These Excel files can then be sent to the Sales and Marketing departments so that they can take further action.
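A sketch of extracting one segment and exporting every segment to Excel. The file-naming helper ‘segment_file_name’ and the loop ‘export_segments’ are hypothetical; writing .xlsx files requires openpyxl.

```python
from pathlib import Path

import pandas as pd

rfm = pd.DataFrame(
    {"Segment": ["Need Attention", "Champions", "Need Attention"], "Recency": [45, 3, 60]},
    index=pd.Index([12347, 12348, 12349], name="Customer ID"),
)

# One segment as its own data frame
need_attention_df = rfm[rfm["Segment"] == "Need Attention"].copy()

def segment_file_name(segment):
    """Hypothetical naming scheme: "Can't Lose" -> "cant_lose_df.xlsx"."""
    return segment.lower().replace("'", "").replace(" ", "_") + "_df.xlsx"

def export_segments(rfm, folder="."):
    """Write one Excel file per segment."""
    for segment, group in rfm.groupby("Segment"):
        group.to_excel(Path(folder) / segment_file_name(segment))

# Usage: need_attention_df.to_excel("need_attention_df.xlsx") or export_segments(rfm)
```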

Comments and Actions

Let’s retrieve statistical values of recency, frequency and monetary values according to the customer segments:
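A sketch of the per-segment summary on made-up rows. Asking for both the mean and the count gives the kind of averages and group sizes discussed for each segment:

```python
import pandas as pd

rfm = pd.DataFrame({
    "Segment": ["At Risk", "At Risk", "Champions"],
    "Recency": [120, 150, 2],
    "Frequency": [4, 3, 20],
    "Monetary": [1500.0, 1700.0, 9000.0],
})

# Mean and number of customers of each metric, per segment
summary = rfm.groupby("Segment")[["Recency", "Frequency", "Monetary"]].agg(["mean", "count"])
print(summary)
```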

To illustrate, let’s pick three groups and suggest actions for them.

  1. Can’t Lose Segment [R(1–2), F(4–5)]:
  • There are 12 customers in this group.
  • They have not been shopping for an average of 136.83 days.
  • They shopped an average of 14.92 times.
  • We have gained an average of 4150.68 pounds from them.

We need to win this group back no matter what, because these customers used to shop frequently and spend large amounts, but it was a long time ago. We need to remind them of us. We can send them campaign emails offering discounts on products similar to what they have bought before. We can also send short surveys to get feedback about their shopping experience and make corrections based on the results. We can offer them renewals or newer products as well.

2. Need Attention Segment [R(2–3), F(2–3)]:

  • There are 121 customers in this group.
  • They have not been shopping for an average of 50.29 days.
  • They shopped an average of 3 times.
  • We have gained an average of 1167.57 pounds from them.

These customers have average recency and frequency scores. We can win them back with little effort because they have spent a good amount of money per transaction, which suggests that product prices are not a big issue for them. They have stopped shopping for some reason, so we need to find out why, and we can remind them of us by offering special discounts and free fast delivery for a limited time. This way, we can increase their shopping frequency. We can also let them know that they are only a few steps away from becoming loyal customers, who get more discounts. This can encourage them to buy more frequently.

3. At Risk Segment [R(1–2), F(3–4)]:

  • There are 380 customers at risk.
  • They have not been shopping for an average of 135.37 days.
  • They shopped an average of 4 times.
  • We have gained an average of 1603.31 pounds from them.

These are customers who have not shopped for a long time. After the ‘Can’t Lose’ group, they are the most important group to pay attention to, because they also shopped frequently, but a long time ago. Now they are about to leave us. We can contact them by email or phone to find out why they have stopped buying from us. Judging by the average amount of their purchases, money does not seem to be the main problem for them. It could be delivery problems or offers from competitors. We can offer them better and more convenient delivery terms (cheaper and faster), and big discounts on purchases over a certain amount. They might also have had problems with the products they purchased, so we can offer them renewals or provide helpful manuals.

Conclusion

RFM analysis, which is based on customers’ past behavior, is a powerful and frequently used method for companies. Understanding it and knowing how to use it is very important for increasing sales, because it helps you understand your customers and develop personalized marketing strategies for them.

I hope this article clearly explained how to perform RFM analysis on a data set using Python. See you in my next articles!

References:

  1. Veri Bilimi Okulu (Data Science School) class notes: https://www.linkedin.com/in/veribilimiokulu/
  2. https://clevertap.com/blog/automate-user-segmentation-with-rfm-analysis/
  3. https://www.blastanalytics.com/blog/rfm-analysis-boosts-sales
  4. Online retail data set: https://www.kaggle.com/hikne707/online-retail
  5. My Kaggle notebook: https://www.kaggle.com/busrayaman/customer-segmentation-with-rfm-analysis
