RFM Analysis: An Effective Customer Segmentation technique using Python

Anand Singh
Capillary Data Science
May 28, 2020

RFM analysis enables personalized marketing, increases engagement, and allows you to create specific, relevant offers to the right groups of customers.

This post explores the benefits of RFM analysis, gives step-by-step instructions for performing RFM analysis in Python, and finally showcases the resulting RFM customer segments, which can be used to maximize ROI.

What is RFM Analysis?

RFM analysis is a data-driven customer behavior segmentation technique where RFM stands for recency, frequency, and monetary value.

The idea is to segment customers based on when they last purchased (Recency), how often they have purchased in the past (Frequency), and how much they have spent (Monetary). All three of these measures have proven to be effective predictors of a customer’s willingness to engage with marketing messages and offers.
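The dataset used below already has one row per customer with these measures. If you start from a raw transaction log instead, the three measures can be derived with a pandas groupby. This is a minimal sketch: the column names customer_id, order_date and amount, the toy data, and the choice of the day after the latest purchase as the reference date are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical transaction log: one row per purchase.
# The column names customer_id, order_date and amount are assumptions.
transactions = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C", "C"],
    "order_date": pd.to_datetime([
        "2020-01-05", "2020-03-01", "2020-02-10",
        "2020-01-20", "2020-02-15", "2020-03-10",
    ]),
    "amount": [50.0, 70.0, 200.0, 20.0, 30.0, 40.0],
})

# Reference date for recency (assumed): the day after the latest purchase in the data.
snapshot_date = transactions["order_date"].max() + pd.Timedelta(days=1)

# One row per customer with the three RFM measures.
rfm = transactions.groupby("customer_id").agg(
    Recency=("order_date", lambda d: (snapshot_date - d.max()).days),  # days since last purchase
    Frequency=("order_date", "count"),                                  # number of purchases
    Monetary=("amount", "sum"),                                         # total spend
)
print(rfm)
```

The article's own data uses spend per visit as the Monetary field; replacing "sum" with "mean" gives that average instead of the total.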

How to do an RFM Analysis in Python?

We will perform RFM analysis in five steps, explained below, using data from an apparel retail store.

Importing the required Python libraries:

import numpy as np
import pandas as pd
import datetime as dt
from datetime import datetime
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns
import squarify
from sklearn.cluster import KMeans

Step 1: Data Import

rfm_single_view=pd.read_csv('RFM Data.csv')
rfm_single_view.head()

Sample records of the imported data

Step 2: Data Preprocessing

It includes two steps:

I- Dropping records with empty values

rfm_single_view.dropna(axis=0,inplace=True)

II- Removing the top 1% of records from the analysis, as they might skew it. These customers can be studied separately to find out whether they are outliers or genuine bulk buyers.

# Creating individual RFM tables after removing the top 1%
recency_cleaned = rfm_single_view[rfm_single_view['Recency'] < rfm_single_view['Recency'].quantile(0.99)]
frequency_cleaned = rfm_single_view[rfm_single_view['Visits'] < rfm_single_view['Visits'].quantile(0.99)]
monetary_cleaned = rfm_single_view[rfm_single_view['Spend Per Visit'] < rfm_single_view['Spend Per Visit'].quantile(0.99)]
# Merging the three dataframes to create the RFM table (inner joins keep customers below the cut-off on all three measures)
rfm_table = pd.merge(pd.merge(recency_cleaned[['Cap User ID', 'Recency', 'Visits', 'Spend Per Visit']],
                              frequency_cleaned[['Cap User ID']], on='Cap User ID'),
                     monetary_cleaned[['Cap User ID']], on='Cap User ID')
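The three quantile filters followed by two inner merges keep exactly the customers that fall below the 99th percentile on all three measures, assuming each Cap User ID appears only once. A single boolean mask expresses the same rule more directly. The sketch below uses a made-up stand-in for rfm_single_view; only the column names come from the post.

```python
import numpy as np
import pandas as pd

# Toy stand-in for rfm_single_view; the values are randomly generated for illustration.
rng = np.random.default_rng(0)
rfm_single_view = pd.DataFrame({
    "Cap User ID": np.arange(1, 501),
    "Recency": rng.integers(1, 365, size=500),
    "Visits": rng.integers(1, 50, size=500),
    "Spend Per Visit": rng.gamma(2.0, 50.0, size=500).round(2),
})

cols = ["Recency", "Visits", "Spend Per Visit"]
# 99th-percentile cut-off for each measure (a Series indexed by column name).
cutoffs = rfm_single_view[cols].quantile(0.99)

# Keep a row only if it is strictly below the cut-off on all three measures.
mask = (rfm_single_view[cols] < cutoffs).all(axis=1)
rfm_table = rfm_single_view.loc[mask, ["Cap User ID"] + cols].reset_index(drop=True)
print(len(rfm_single_view), "->", len(rfm_table))
```

Because an inner merge preserves the order of the left keys, this produces the same rows in the same order as the merge-based version, without building three intermediate tables.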

Sample records of the data after preprocessing. Visits serves as the Frequency measure, and Spend Per Visit is taken as the Monetary measure.

Step 3: Deciding RFM Clusters

First, we decide on the optimal number of clusters. Here, the optimum is 3 clusters, which means that recency, frequency, and monetary value are each split into three groups. The clustering itself is done with the K-means algorithm.
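The post does not say how the number of clusters was chosen. A common approach, assumed here, is the elbow method: fit K-means for a range of k values on one measure and look for the k after which the within-cluster sum of squares (scikit-learn's inertia_) stops dropping sharply. The helper name elbow_inertias and the recency values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_inertias(values, k_range=range(1, 9), random_state=42):
    """Return the K-means inertia (within-cluster sum of squares) for each k."""
    X = np.asarray(values, dtype=float).reshape(-1, 1)  # one RFM measure as a single column
    inertias = {}
    for k in k_range:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        inertias[k] = km.inertia_
    return inertias

# Made-up recency values (days) with three clear groups.
rng = np.random.default_rng(1)
recency = np.concatenate([
    rng.normal(10, 2, 100),
    rng.normal(60, 5, 100),
    rng.normal(200, 10, 100),
])

inertias = elbow_inertias(recency)
for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

Plotting these values against k shows the "elbow"; for this toy data the large drops stop at k = 3. The same check would be repeated for Visits and Spend Per Visit.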

Python Code for finding RFM Clusters
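A minimal sketch of one way to turn K-means clusters into ordered 0 to 2 scores, offered as an assumption rather than the original code. K-means cluster ids are arbitrary, so each cluster is relabelled by the position of its centre: for Visits and Spend Per Visit a higher centre gives a higher score, while for Recency the order is reversed, because fewer days since the last purchase is better. The helper name kmeans_score and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def kmeans_score(series, n_clusters=3, ascending=True, random_state=42):
    """Cluster one RFM measure and relabel clusters 0..n_clusters-1 by their centre.

    ascending=True  -> smallest centre gets 0 (Frequency, Monetary).
    ascending=False -> largest centre gets 0 (Recency: more days since purchase is worse).
    """
    X = series.to_numpy(dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)
    centres = km.cluster_centers_.ravel()
    order = np.argsort(centres if ascending else -centres)  # cluster ids from worst to best
    rank = np.empty(n_clusters, dtype=int)
    rank[order] = np.arange(n_clusters)                     # cluster id -> score
    return pd.Series(rank[km.labels_], index=series.index)

# Toy RFM table using the post's column names.
rfm_table = pd.DataFrame({
    "Recency": [5, 7, 6, 90, 95, 300, 310],
    "Visits": [20, 22, 10, 11, 9, 1, 2],
    "Spend Per Visit": [100.0, 110.0, 40.0, 45.0, 42.0, 10.0, 12.0],
})
rfm_table["R"] = kmeans_score(rfm_table["Recency"], ascending=False)
rfm_table["F"] = kmeans_score(rfm_table["Visits"])
rfm_table["M"] = kmeans_score(rfm_table["Spend Per Visit"])
print(rfm_table)
```

Clustering each measure separately keeps the three scores independent, which is what the digit-by-digit segment rules rely on.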

Data visualizations after deciding on the RFM clusters

Distribution of customer counts across the RFM clusters

Step 4: Finding a Combined RFM Score

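Now the individual R, F, and M scores, each ranging from 0 to 2 because we chose 3 clusters, are combined into a single RFM score for each customer, written as a three-digit code in R, F, M order (for example, 222 for a customer in the top cluster on all three measures).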
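A short sketch of building the combined score, assuming per-customer columns R, F and M that hold the 0 to 2 cluster scores (the column names and values are hypothetical). The segment rules read the score digit by digit (for example 222 or X2X), so the code is built by string concatenation; a plain sum is computed alongside as a single overall value.

```python
import pandas as pd

# Hypothetical per-customer cluster scores (0 = worst, 2 = best).
scores = pd.DataFrame({"R": [2, 0, 2, 1], "F": [2, 0, 0, 2], "M": [2, 1, 1, 0]})

# Three-digit code in R, F, M order, as used by the segment rules (e.g. "222").
scores["RFM_Score"] = scores["R"].astype(str) + scores["F"].astype(str) + scores["M"].astype(str)

# Sum of the three scores (0 to 6), a single overall value.
scores["RFM_Total"] = scores[["R", "F", "M"]].sum(axis=1)
print(scores)
```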

Sample records of the data with the combined RFM score

Step 5: Generating Unique Customer Segments based on RFM Score

In the scores below, the digits are the R, F, and M scores in that order, and X stands for any score.

1. Core - Your Best Customers
RFM Score: 222
Who They Are: Highly engaged customers who have bought most recently, most often, and generated the most revenue.

2. Loyal - Your Most Loyal Customers
RFM Score: X2X
Who They Are: Customers who buy most often from your store.

3. Whales - Your Highest Paying Customers
RFM Score: XX2
Who They Are: Customers who have generated the most revenue for your store.

4. Rookies - Your Newest Customers
RFM Score: 20X
Who They Are: First-time buyers on your site.

5. Slipping - Once Loyal, Now Gone
RFM Score: 00X
Who They Are: Great past customers who haven't bought in a while.

6. Regular - Customers with typical behaviour across these metrics
RFM Score: All remaining scores
Who They Are: Customers with average values across the R, F, and M scores.
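The segment rules above can be applied with a small function in which X matches any digit. The rules are checked in the order listed, so a customer scoring 222, which also fits X2X and XX2, lands in Core. That first-match order is an assumption, since the post does not say how overlapping rules are resolved, and the names SEGMENT_RULES and assign_segment are hypothetical.

```python
import re

# Segment rules from the list above, checked in order; "X" means any score.
# First-match order is an assumption: 222 also fits X2X and XX2.
SEGMENT_RULES = [
    ("Core", "222"),
    ("Loyal", "X2X"),
    ("Whales", "XX2"),
    ("Rookies", "20X"),
    ("Slipping", "00X"),
]

def assign_segment(rfm_score: str) -> str:
    """Map a three-digit RFM score (R, F, M order) to a segment name."""
    for name, pattern in SEGMENT_RULES:
        # Turn a rule such as "X2X" into the regular expression "\d2\d".
        if re.fullmatch(pattern.replace("X", r"\d"), rfm_score):
            return name
    return "Regular"  # all remaining scores

print([assign_segment(s) for s in ["222", "120", "012", "201", "001", "111"]])
```

With a pandas column of three-digit score strings (for example a hypothetical RFM_Score column), the segments follow from `df["RFM_Score"].map(assign_segment)`.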

A snapshot of some KPIs for each customer segment shows that the best groups are the Core and Loyal segments.
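A per-segment KPI table of this kind can be built with groupby and named aggregation. The KPIs chosen here (customer count and the average of each measure), the Segment column name and the toy data are assumptions; the CustomerCount column mirrors the one the treemap code expects from rfm_single_view_after_tags_v1.

```python
import pandas as pd

# Hypothetical scored customers; "Segment" would come from the segment rules.
customers = pd.DataFrame({
    "Cap User ID": [1, 2, 3, 4, 5, 6],
    "Recency": [5, 8, 3, 300, 40, 60],
    "Visits": [25, 18, 1, 3, 20, 4],
    "Spend Per Visit": [90.0, 80.0, 30.0, 20.0, 35.0, 150.0],
    "Segment": ["Core", "Core", "Rookies", "Slipping", "Loyal", "Whales"],
})

# One row per segment: customer count plus the average of each RFM measure.
segment_kpis = (
    customers.groupby("Segment")
    .agg(
        CustomerCount=("Cap User ID", "nunique"),
        AvgRecency=("Recency", "mean"),
        AvgVisits=("Visits", "mean"),
        AvgSpendPerVisit=("Spend Per Visit", "mean"),
    )
    .reset_index()
)
print(segment_kpis)
```

By default groupby sorts the segment names alphabetically, which is the same order as the hard-coded labels in the treemap code.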

Let’s visualize the customer segments as a treemap.

#Create our RFM Segment plot and resize it.
fig, ax = plt.subplots(figsize=(12, 8))
# squarify matches labels to sizes by position, so this list must follow
# the row order of rfm_single_view_after_tags_v1 (one row per segment).
squarify.plot(sizes=rfm_single_view_after_tags_v1['CustomerCount'],
              label=['Core',
                     'Loyal',
                     'Regular',
                     'Rookies',
                     'Slipping',
                     'Whales'],
              alpha=0.8,
              ax=ax)
ax.set_title("RFM Segments", fontsize=18, fontweight="bold")
ax.axis('off')
plt.show()
TreeMap visualization of RFM Customer Segments

Conclusion

The RFM technique is a proven marketing model that helps retailers and e-commerce businesses maximize the return on their marketing investments.

The RFM customer segments generated above can easily be used to identify high-ROI segments and engage them with personalized offers.
