Ogunbajo Adeyinka
7 min read · Jul 16, 2021

Building An RFM Model in Python

A step-by-step approach to building an RFM model for customer segmentation in Python

Photo by Firmbee.com on Unsplash

Companies spend a lot of money on market research, but because technology keeps changing both customers’ behaviour and research methodologies, that research needs constant improvement. Marketing teams have long recognized the significance of customer orientation, since knowing, serving, and influencing consumers is critical to accomplishing marketing goals and objectives.

Customer segmentation helps a company’s marketers, as well as individual marketers, reach out to each customer in the most efficient way possible. A customer segmentation study uses the large amount of data available on customers (and prospective customers) to identify distinct groups of consumers, with a high degree of accuracy, based on demographic, behavioural, and other factors.

RFM stands for recency, frequency, and monetary value. RFM segmentation enables marketers to target specific groups of consumers with communications that are far more relevant to their individual behaviours, which can lead to higher response rates and improved loyalty and customer lifetime value. Like other segmentation approaches, it is an effective tool for identifying groups of consumers who should be treated differently.

There are several approaches to segmentation. However, I chose the RFM model for the following reasons:

  1. It employs objective numerical scales to produce a high-level picture of consumers that is both succinct and instructive.
  2. It’s simple enough that marketers can use it without expensive tools.
  3. It’s intuitive: the output of the segmentation method is easy to comprehend and analyze.

Basis

For this project, I will be building an RFM (Recency, Frequency, Monetary) model using an FMCG data set I downloaded from Kaggle for this purpose (someone must have put it out there for free use, so a big thank you to the anonymous contributor). There are countless free data sets on Kaggle that you can use for practice as well.

Intended Outcome

The purpose of this project is to build an RFM model that segments customers into the groups listed below:

  • Can’t Lose Them
  • Champions
  • Loyal/Committed
  • Requires Attention
  • Potential
  • Promising
  • Demands Activation

So, let’s get started right away.

Step 1: Importing Required Libraries

  • Before getting started, let’s import the libraries needed for the project using the Python script below:
Python Script to import the Python libraries needed (Code by Ogunbajo Adeyinka)
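The original import script is not reproduced in this text. As a minimal sketch, assuming pandas and the standard-library datetime module are enough for the data-preparation steps (the plotting libraries mentioned later in the article would be imported in the same way), the imports might look like this:

```python
# Assumed minimal set of imports for the data-preparation steps.
import datetime as dt  # date arithmetic, e.g. adding one day to a timestamp

import pandas as pd    # DataFrame handling

# Confirm the library loaded before going any further.
print("pandas version:", pd.__version__)
```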

Step 2: Exploratory Data Analysis (EDA)

I consider this step sacred and important in all data science projects. Performing a detailed EDA helps you understand your data and choose the best approach to tackling a project. You will get to know the missing values, the correlated features, and other trends present in the data set. Below is what the FMCG data set looks like:

Pandas Data frame Showing a Supermarket dataset (Screenshot from Jupyter Notebook written by Ogunbajo Adeyinka)

Now that we have our data in a suitable environment, it’s often a good idea to take a look at the first few samples (just to see what our data looks like). This dataset is used to analyze merchant behaviour. Here are a few details about the features:

  • InvoiceNo: This is a unique number generated by the FMCG store to help trace payment details.
  • StockCode: This is a unique number assigned to each product in a particular category to help with stock keeping and tracking.
  • Description: This describes the product and provides information about it.
  • Quantity: This gives the number of products purchased.
  • InvoiceDate: This represents the time stamp (time and date) on which the invoice has been billed and the transaction officially recorded.
  • UnitPrice: This refers to the price of each product.
  • CustomerID: This refers to the unique number assigned to each customer.
  • Country: This refers to the country in which the purchase is being made.
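A first look of this kind can be sketched as follows. The small hand-made frame below is only a stand-in for the Kaggle file: its values and the name `df` are illustrative assumptions, not the article's data.

```python
import pandas as pd

# Hand-made stand-in for the FMCG transactions (illustrative values only).
df = pd.DataFrame({
    "InvoiceNo": ["A001", "A001", "A002", "A003", "A004"],
    "StockCode": ["S10", "S11", "S10", "S12", "S11"],
    "Description": ["Soap", "Rice", "Soap", "Milk", None],
    "Quantity": [6, 2, 3, 1, 4],
    "InvoiceDate": ["2021-01-05 08:26", "2021-01-05 08:26",
                    "2021-02-10 14:02", "2021-03-01 09:15",
                    "2021-03-20 17:40"],
    "UnitPrice": [2.5, 3.0, 2.5, 7.0, 3.0],
    "CustomerID": [101, 101, 102, 101, 103],
    "Country": ["Nigeria"] * 5,
})

print(df.head())                 # first rows of the data
print(df.shape)                  # (number of rows, number of columns)

missing = df.isnull().sum()      # missing values per column
print(missing)

n_customers = df["CustomerID"].nunique()   # distinct customers
print(n_customers)
```

Note that one customer can appear on several rows, which is why the later steps group the transactions by CustomerID.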

One question that should come to mind is “What is the unique identifier of each row in the data?” A unique identifier can be a column or set of columns that is guaranteed to be unique across rows in your dataset. This is key for differentiating rows and referencing them in our EDA. For the FMCG data set used in this project, CustomerID is the key we will use to identify each customer.

Step 3: Data Preprocessing

Here, we have our data ready and will perform some basic pre-processing on the data set:

  • We’ll use the Python script below to convert the InvoiceDate feature from object format to DateTime format:
Python Script to Convert InvoiceDate feature from object to DateTime format (Code by Ogunbajo Adeyinka)
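The original script is not reproduced in this text. A minimal sketch of such a conversion, assuming the dates arrive as text and using `pd.to_datetime` (the sample values and the name `df` are illustrative):

```python
import pandas as pd

# Illustrative stand-in; in the article the dates come from the Kaggle file.
df = pd.DataFrame({"InvoiceDate": ["2021-01-05 08:26", "2021-02-10 14:02"]})

# Parse the text timestamps into pandas datetime values.
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])

print(df["InvoiceDate"].dtype)

# Datetime columns expose the .dt accessor, e.g. to pull out the month.
months = df["InvoiceDate"].dt.month.tolist()
print(months)
```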
  • We create a new TotalSum column with the Python script below:
Python Script to Create a TotalSum column (Code by Ogunbajo Adeyinka)
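The text does not say how TotalSum is computed. A minimal sketch, assuming it is the line total, i.e. Quantity multiplied by UnitPrice (sample values are illustrative):

```python
import pandas as pd

# Illustrative rows; each row is one line item on an invoice.
df = pd.DataFrame({
    "Quantity": [6, 2, 3],
    "UnitPrice": [2.5, 3.0, 2.5],
})

# Assumption: TotalSum is the amount spent on the line (quantity x unit price).
df["TotalSum"] = df["Quantity"] * df["UnitPrice"]

print(df)
```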
  • We then create a snapshot date from our FCMG_data with the Python script below:
Python Script to Create a snapshot date (Code by Ogunbajo Adeyinka)
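The original script is not shown here. A common convention, assumed in this sketch, is to set the snapshot date to one day after the most recent invoice so that every customer has a recency of at least one day (the name `snapshot_date` and the sample dates are illustrative):

```python
import datetime as dt

import pandas as pd

# Illustrative invoice timestamps.
df = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2021-01-05 08:26",
        "2021-03-01 09:15",
        "2021-03-20 17:40",
    ])
})

# Assumption: the snapshot is one day after the latest invoice in the data.
snapshot_date = df["InvoiceDate"].max() + dt.timedelta(days=1)

print(snapshot_date)
```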
  • After creating the snapshot date, we can group the data by CustomerID using the Python script below:
Python Script to Group by CustomerID (Code by Ogunbajo Adeyinka)
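The grouping step can be sketched as below, under stated assumptions: the days since the last purchase are measured against a snapshot date one day after the latest invoice, distinct invoices are counted with `nunique` (plain `count` would count line items instead), and TotalSum is summed. The data and the name `grouped` are illustrative.

```python
import datetime as dt

import pandas as pd

# Illustrative transactions: customer 101 has two invoices, one with two lines.
df = pd.DataFrame({
    "CustomerID": [101, 101, 102, 101, 103],
    "InvoiceNo": ["A001", "A001", "A002", "A003", "A004"],
    "InvoiceDate": pd.to_datetime([
        "2021-01-05 08:26", "2021-01-05 08:26", "2021-02-10 14:02",
        "2021-03-01 09:15", "2021-03-20 17:40",
    ]),
    "TotalSum": [15.0, 6.0, 7.5, 7.0, 12.0],
})

snapshot_date = df["InvoiceDate"].max() + dt.timedelta(days=1)

# One row per customer: days since last purchase, invoice count, total spend.
grouped = df.groupby("CustomerID").agg({
    "InvoiceDate": lambda dates: (snapshot_date - dates.max()).days,
    "InvoiceNo": "nunique",
    "TotalSum": "sum",
})

print(grouped)
```

The three columns still carry their original names at this point; they hold what will become Recency, Frequency and Monetary.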

Next, we rename the columns InvoiceDate, InvoiceNo and TotalSum to Recency, Frequency and Monetary respectively. But just before that, let’s define some terms:

  • Recency: How recently a customer has interacted or transacted with a brand, that is, how long it has been since the customer engaged in an activity or made a purchase with the brand. For an FMCG store the most common activity is a purchase, though in other scenarios or industries it could be the most recent visit to a website or use of a mobile app.
  • Frequency: How many times a consumer has transacted or interacted with the brand during a given time period. Customers who engage regularly are generally more involved and loyal than those who do so infrequently. It answers the question: how often?
  • Monetary: This factor, also known as “monetary value,” reflects how much a customer has spent with the brand over a given period of time. Those who spend a lot of money should be handled differently from customers who spend a little. The average purchase amount, calculated by dividing monetary value by frequency, is a significant secondary factor to consider when segmenting customers.

Now we can see how each pair relates: InvoiceDate and Recency, InvoiceNo and Frequency, TotalSum and Monetary.

  • Here is a Python script to rename the columns:
Python Script to rename the columns (Code by Ogunbajo Adeyinka)
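The original script is not reproduced here. A minimal sketch of the renaming with `DataFrame.rename`, assuming a per-customer frame that still has the original column names (the values and the names `grouped` and `rfm` are illustrative):

```python
import pandas as pd

# Illustrative per-customer aggregates, still under the original column names.
grouped = pd.DataFrame(
    {
        "InvoiceDate": [20, 39, 1],      # days since last purchase
        "InvoiceNo": [2, 1, 1],          # number of invoices
        "TotalSum": [28.0, 7.5, 12.0],   # total spend
    },
    index=pd.Index([101, 102, 103], name="CustomerID"),
)

# Rename to the RFM vocabulary.
rfm = grouped.rename(columns={
    "InvoiceDate": "Recency",
    "InvoiceNo": "Frequency",
    "TotalSum": "Monetary",
})

print(rfm.head())
print(rfm.shape)
```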
  • We can explore the rows and shape of the data frame using the Python script below:
Python Script to print the shape of the Pandas data frame (Code by Ogunbajo Adeyinka)
  • Then we have the top 5 rows of our data frame:
Pandas Data frame Showing the top 5 rows and shape of the data frame (Screenshot from Jupyter Notebook written by Ogunbajo Adeyinka)
  • We can plot the distribution using the Python Script below:
Python Script to Plot the RFM distributions (Code by Ogunbajo Adeyinka)
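The plotting script itself is not reproduced in this text, and the library used for it is not named. As a stand-in, this sketch draws one plain Matplotlib histogram per metric; the data values are illustrative and the `Agg` backend line is only there so the code also runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; this line can be dropped in a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Illustrative RFM table.
rfm = pd.DataFrame({
    "Recency": [1, 5, 10, 20, 40, 80, 160, 320],
    "Frequency": [8, 2, 6, 5, 1, 3, 7, 4],
    "Monetary": [800, 100, 450, 300, 50, 200, 900, 250],
})

# One histogram per metric, stacked vertically.
fig, axes = plt.subplots(3, 1, figsize=(8, 9))
for ax, col in zip(axes, ["Recency", "Frequency", "Monetary"]):
    ax.hist(rfm[col], bins=5)
    ax.set_title(f"{col} distribution")
fig.tight_layout()
```

In a notebook the figure is displayed automatically; in a script, `fig.savefig(...)` would write it to a file.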
  • We have our RFM distribution plots below:
A Plot of Recency and Frequency distributions (Screenshot from Jupyter Notebook written by Ogunbajo Adeyinka)
A Plot of Frequency and Monetary distributions (Screenshot from Jupyter Notebook written by Ogunbajo Adeyinka)

Step 4: Building the RFM Groups

  • We’ll be calculating the R, F and M groups,
  • creating labels for Recency, Frequency and Monetary value,
  • assigning those labels to 4 equal percentile groups (quartiles),
  • then creating the new columns R, F and M.

Here is the Python script to create the RFM groups:

Python Script to calculate the RFM Groups (Code by Ogunbajo Adeyinka)
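The original script is not shown in this text. A minimal sketch of the quartile step using `pd.qcut`, assuming labels 1 to 4 where 4 is best, so the label order is reversed for Recency because a smaller recency is better (the data is illustrative):

```python
import pandas as pd

# Illustrative RFM table for eight customers.
rfm = pd.DataFrame(
    {
        "Recency": [1, 5, 10, 20, 40, 80, 160, 320],
        "Frequency": [8, 2, 6, 5, 1, 3, 7, 4],
        "Monetary": [800, 100, 450, 300, 50, 200, 900, 250],
    },
    index=pd.Index(range(101, 109), name="CustomerID"),
)

# Labels for the four quartiles; recent buyers (low Recency) get the top label.
r_labels = [4, 3, 2, 1]
f_labels = [1, 2, 3, 4]
m_labels = [1, 2, 3, 4]

# qcut splits each metric into 4 equal-sized groups and attaches the labels.
rfm["R"] = pd.qcut(rfm["Recency"], q=4, labels=r_labels).astype(int)
rfm["F"] = pd.qcut(rfm["Frequency"], q=4, labels=f_labels).astype(int)
rfm["M"] = pd.qcut(rfm["Monetary"], q=4, labels=m_labels).astype(int)

print(rfm)
```

On real data with many tied values, `pd.qcut` can fail because quartile edges coincide; ranking the column first or passing `duplicates="drop"` are two possible workarounds.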
  • We will have an output showing the updated data frame below:
Pandas Data frame Showing the Calculated R, F and M groups of the data frame (Screenshot from Jupyter Notebook written by Ogunbajo Adeyinka)

Step 5: Building the RFM Model

  • We concatenate the R, F and M quartile values to create RFM segments, using the Python script below:
Python Script to create RFM Segments (Code by Ogunbajo Adeyinka)
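The original script is not reproduced here. A minimal sketch of the concatenation, assuming integer R, F and M columns and a hypothetical column name `RFM_Segment` (values are illustrative):

```python
import pandas as pd

# Illustrative quartile labels for eight customers.
rfm = pd.DataFrame(
    {
        "R": [4, 4, 3, 3, 2, 2, 1, 1],
        "F": [4, 1, 3, 3, 1, 2, 4, 2],
        "M": [4, 1, 3, 3, 1, 2, 4, 2],
    },
    index=pd.Index(range(101, 109), name="CustomerID"),
)

# Join the three labels as text, e.g. R=4, F=1, M=1 becomes "411".
rfm["RFM_Segment"] = (
    rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
)

print(rfm)
```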
  • We will have a data frame showing the created RFM segments below:
Pandas Data frame Showing the Created RFM Segments of the data frame (Screenshot from Jupyter Notebook written by Ogunbajo Adeyinka)
  • Now let’s count the number of unique segments.
  • Then we calculate the RFM score with the Python script below:
Python Script to calculate the RFM score (Code by Ogunbajo Adeyinka)
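The original script is not shown in this text. A minimal sketch of both steps, assuming the score is the sum of the three quartile labels and using the hypothetical column names `RFM_Segment` and `RFM_Score` (values are illustrative):

```python
import pandas as pd

# Illustrative quartile labels and segments for eight customers.
rfm = pd.DataFrame(
    {
        "R": [4, 4, 3, 3, 2, 2, 1, 1],
        "F": [4, 1, 3, 3, 1, 2, 4, 2],
        "M": [4, 1, 3, 3, 1, 2, 4, 2],
    },
    index=pd.Index(range(101, 109), name="CustomerID"),
)
rfm["RFM_Segment"] = (
    rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
)

# Number of distinct segments present in the data.
n_segments = rfm["RFM_Segment"].nunique()
print(n_segments)

# Assumption: the RFM score is R + F + M, so it ranges from 3 to 12.
rfm["RFM_Score"] = rfm[["R", "F", "M"]].sum(axis=1)
print(rfm)
```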
  • We have now calculated the RFM score of each customer, and the data frame of RFM scores looks like this:
Pandas Data frame Showing the Calculated RFM score of each customer in the data frame (Screenshot from Jupyter Notebook written by Ogunbajo Adeyinka)
  • Then we create a conditional statement, using the Python script below, to segment customers (by the CustomerID column) as Can’t Lose Them, Champions, Loyal/Committed, Potential, Promising, Requires Attention, or Demands Activation:
Python Script to calculate the RFM Level (Code by Ogunbajo Adeyinka)
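The score cut-offs for each level are not given in this text. The sketch below uses purely illustrative thresholds, one score point per level from 9 downwards; the function name `rfm_level` and the column `RFM_Score` are also assumptions, and the cut-offs should be tuned to the business.

```python
import pandas as pd


def rfm_level(score):
    """Map a total RFM score (3 to 12) to a segment name.

    The thresholds are illustrative assumptions, not the article's values.
    """
    if score >= 9:
        return "Can't Lose Them"
    if score >= 8:
        return "Champions"
    if score >= 7:
        return "Loyal/Committed"
    if score >= 6:
        return "Potential"
    if score >= 5:
        return "Promising"
    if score >= 4:
        return "Requires Attention"
    return "Demands Activation"


# Illustrative scores for eight customers.
rfm = pd.DataFrame(
    {"RFM_Score": [12, 6, 9, 9, 4, 6, 9, 5]},
    index=pd.Index(range(101, 109), name="CustomerID"),
)

# Apply the rule to every customer's score.
rfm["RFM_Level"] = rfm["RFM_Score"].apply(rfm_level)
print(rfm)
```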
  • We have a data frame showing the calculated RFM level of each customer below:
Pandas Data frame Showing the Calculated RFM Level of each customer in the data frame (Screenshot from Jupyter Notebook written by Ogunbajo Adeyinka)
  • We then calculate the average values for each RFM_Level and return the size of each segment, using the Python script below:
Python Script to calculate the average RFM Level and to return the size of each segment (Code by Ogunbajo Adeyinka)
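The original script is not reproduced here. A minimal sketch using pandas named aggregation, assuming a frame with Recency, Frequency, Monetary and RFM_Level columns (the values and the name `summary` are illustrative):

```python
import pandas as pd

# Illustrative RFM table with a level already assigned to each customer.
rfm = pd.DataFrame({
    "Recency": [1, 5, 10, 20, 40, 80, 160, 320],
    "Frequency": [8, 2, 6, 5, 1, 3, 7, 4],
    "Monetary": [800, 100, 450, 300, 50, 200, 900, 250],
    "RFM_Level": [
        "Can't Lose Them", "Potential", "Can't Lose Them", "Can't Lose Them",
        "Requires Attention", "Potential", "Can't Lose Them", "Promising",
    ],
})

# Mean of each metric per level, plus the number of customers in the level.
summary = rfm.groupby("RFM_Level").agg(
    Recency=("Recency", "mean"),
    Frequency=("Frequency", "mean"),
    Monetary=("Monetary", "mean"),
    Count=("Monetary", "count"),
)

print(summary)
```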
  • We have a data frame showing the calculated values for each RFM_Level below:
Pandas Data frame Showing the Calculated values for each RFM_Level of each customer in the data frame (Screenshot from Jupyter Notebook written by Ogunbajo Adeyinka)

Step 6: The Data Visualization of Customers Segmented Using the RFM Model

  • Squarify library: I chose Squarify because it is built on top of Matplotlib and uses space efficiently.
  • Plotting the RFM level on the Squarify plot using the Python Script below:
Python Script to Plot the RFM with Squarify (Code by Ogunbajo Adeyinka)
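The plotting script itself is not reproduced in this text. This sketch covers only the preparation of the two inputs a treemap needs, the box sizes and their labels, assuming a frame with an `RFM_Level` column (values illustrative). The Squarify call is left as a comment so that the example runs without the library installed.

```python
import pandas as pd

# Illustrative level assignment for eight customers.
rfm = pd.DataFrame({
    "RFM_Level": [
        "Can't Lose Them", "Potential", "Can't Lose Them", "Can't Lose Them",
        "Requires Attention", "Potential", "Can't Lose Them", "Promising",
    ],
})

# Number of customers per level; each count becomes the area of one box.
level_sizes = rfm.groupby("RFM_Level").size()

sizes = level_sizes.tolist()
labels = level_sizes.index.tolist()
print(labels)
print(sizes)

# With Squarify and Matplotlib installed, the treemap could then be drawn with:
#   import matplotlib.pyplot as plt
#   import squarify
#   squarify.plot(sizes=sizes, label=labels, alpha=0.6)
#   plt.axis("off")
#   plt.show()
```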
  • Here we have our final dashboard showing how we’ve segmented customers using the RFM model:
A Plot of Customer RFM Segmentation (Screenshot from Jupyter Notebook written by Ogunbajo Adeyinka)

Final Thoughts

Thanks for taking the time to read this article. You can read more articles by going to my profile (more will be available soon).

Remarks

All the references used are hyperlinked within the article. To see the complete Python code in the Jupyter Notebook, my GitHub, and my social media pages, kindly use the links below:

Ogunbajo Adeyinka

Artificial Intelligence 🤖 | Data Science 🔬 📈 | Product Management 🎨