Building an RFM Model in Python
A Step-by-Step Approach to Building an RFM Model for Customer Segmentation in Python
Companies spend a lot of money on market research, but because technology keeps changing both customer behaviour and research methodologies, that research needs constant improvement. Marketing teams have long recognized the importance of customer orientation, since knowing, serving, and influencing consumers is critical to achieving marketing goals and objectives.
Customer segmentation can help a company's marketing team, as well as individual marketers, reach each customer in the most efficient way possible. A customer segmentation study uses the large amount of data available on customers (and prospective customers) to identify distinct groups of consumers with a high degree of accuracy, based on demographic, behavioural, and other factors.
RFM stands for Recency, Frequency, and Monetary value. Like other segmentation approaches, RFM segmentation is an effective way to identify groups of consumers who should be treated differently. It enables marketers to target specific groups of consumers with communications that are far more relevant to their individual behaviours, which can lead to much higher response rates, improved loyalty, and greater customer lifetime value.
There are several approaches to segmentation; I chose the RFM model for the following reasons:
- It employs objective numerical scales to produce a high-level picture of consumers that is both succinct and instructive.
- It’s simple enough for marketers to use without expensive tools.
- Its output is easy to understand and analyze.
Basis
For this project, I will be building an RFM (Recency, Frequency, Monetary) model using an FMCG (fast-moving consumer goods) dataset I downloaded from Kaggle (a big thank you to whoever shared it for free use). There are countless other free datasets on Kaggle you can practice with as well.
Intended Outcome
The purpose of this project is to build an RFM model that segments customers into the following groups:
- Can’t Lose Them
- Champions
- Loyal/Committed
- Requires Attention
- Potential
- Promising
- Demands Activation
So, let’s get started right away.
Step 1: Importing Required Libraries
- Before getting started, let’s import the libraries needed for the project using the Python script below:
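A minimal sketch of the imports, assuming the later steps rely on pandas and NumPy for data handling and on datetime for date arithmetic; it is not necessarily the author's exact list. The plotting steps would also need matplotlib and, for the Step 6 treemap, squarify (installable with pip install squarify).

```python
# Core libraries for the pandas-based steps in this walkthrough.
import datetime as dt  # date arithmetic for the snapshot date

import numpy as np
import pandas as pd

# Show every column when printing wide DataFrames.
pd.set_option("display.max_columns", None)

print("pandas", pd.__version__, "| numpy", np.__version__)
```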
Step 2: Exploratory Data Analysis (EDA)
I consider this step essential in every data science project. A detailed EDA helps you understand your data and choose the best approach to the problem: you learn where values are missing, which features are correlated, and what other trends are present in the dataset. Below is what the FMCG dataset looks like:
Now that the data is loaded, it's a good idea to look at the first few rows to get a sense of what it contains. This dataset is used to analyze customer purchasing behaviour. Here are a few details about the features:
- InvoiceNo: A unique number the FMCG store generates for each invoice to help trace payment details.
- StockCode: A unique code assigned to each product in a particular category for stock-keeping and tracking purposes.
- Description: A short description of the product.
- Quantity: The number of units of the product purchased.
- InvoiceDate: The timestamp (date and time) at which the invoice was billed and the transaction officially recorded.
- UnitPrice: This refers to the price of each product.
- CustomerID: This refers to the unique number assigned to each customer.
- Country: This refers to the country in which the purchase is being made.
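A minimal EDA sketch on a few hypothetical rows that share the dataset's column names (the values are invented for illustration; with the real Kaggle file you would load it with pd.read_csv, and the file name shown is a placeholder). It prints the first rows, the shape, the column types, missing values per column, and summary statistics.

```python
import pandas as pd

# Hypothetical rows with the same columns as the FMCG dataset. With the real
# Kaggle file you would load it instead, e.g.
#     fmcg_data = pd.read_csv("fmcg_data.csv")  # placeholder file name
fmcg_data = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "1002", "1003", "1004"],
    "StockCode": ["P01", "P02", "P01", "P03", "P02"],
    "Description": ["Product A", "Product B", "Product A", "Product C", "Product B"],
    "Quantity": [2, 1, 5, 3, 4],
    "InvoiceDate": ["2011-11-01 09:15", "2011-11-01 09:15", "2011-11-20 14:02",
                    "2011-12-05 11:30", "2011-12-09 12:50"],
    "UnitPrice": [2.50, 4.00, 1.25, 3.00, 4.00],
    "CustomerID": [12345.0, 12345.0, 12345.0, 67890.0, None],
    "Country": ["United Kingdom"] * 5,
})

print(fmcg_data.head())          # first 5 rows
print(fmcg_data.shape)           # (rows, columns)
print(fmcg_data.dtypes)          # InvoiceDate is read as text, not datetime
print(fmcg_data.isnull().sum())  # missing values per column
print(fmcg_data.describe())      # summary statistics of numeric columns
```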
One question that should come to mind is “What is the unique identifier of each row in the data?” A unique identifier is a column, or set of columns, whose values are guaranteed to be unique across rows in your dataset. It is key for differentiating rows and referencing them during EDA. In this FMCG dataset, CustomerID does not identify individual rows, because one customer appears on many invoice lines, but it is the key we will aggregate on: after grouping, each row of the RFM table corresponds to exactly one CustomerID.
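A quick way to test whether a column (or combination of columns) is a row-level identifier is pandas' is_unique and duplicated. The toy invoice lines below are hypothetical; InvoiceNo plus StockCode happens to be unique here, but that is not guaranteed in real transaction data.

```python
import pandas as pd

# Hypothetical invoice lines: one customer can appear on many rows.
fmcg_data = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "1002", "1003"],
    "StockCode": ["P01", "P02", "P01", "P03"],
    "CustomerID": [12345, 12345, 12345, 67890],
})

# Is any single column a row-level unique identifier?
print(fmcg_data["CustomerID"].is_unique)  # False: repeated across invoice lines
print(fmcg_data["InvoiceNo"].is_unique)   # False: one invoice has several lines

# A candidate combination of columns can be checked the same way.
print(not fmcg_data.duplicated(subset=["InvoiceNo", "StockCode"]).any())  # True

# After aggregating per customer, CustomerID is unique by construction.
per_customer = fmcg_data.groupby("CustomerID").size()
print(per_customer.index.is_unique)  # True
```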
Step 3: Data Preprocessing
With our data ready, we perform some basic preprocessing on the dataset:
- We use the Python script below to convert the InvoiceDate feature from object (text) format to datetime format.
- We create a new TotalSum column with the Python script below:
- We then create a snapshot date from our FCMG_data with the Python script below:
- After creating the snapshot date, we group the data by CustomerID using the Python script below:
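The four preprocessing steps above can be sketched as follows, on hypothetical invoice lines. Three choices here are assumptions rather than the author's confirmed code: TotalSum is taken as Quantity × UnitPrice, the snapshot date is one day after the last invoice (so the most recent customer gets a Recency of 1 rather than 0), and Frequency counts distinct invoices with "nunique" (aggregating InvoiceNo with "count" would count invoice lines instead).

```python
import datetime as dt

import pandas as pd

# Hypothetical invoice lines standing in for the Kaggle FMCG data.
fmcg_data = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "1002", "1003", "1004", "1005"],
    "Quantity": [2, 1, 5, 3, 10, 1],
    "UnitPrice": [2.50, 4.00, 1.25, 3.00, 0.75, 12.00],
    "InvoiceDate": ["2011-11-01 09:15", "2011-11-01 09:15", "2011-11-20 14:02",
                    "2011-12-05 11:30", "2011-12-09 12:50", "2011-10-15 10:00"],
    "CustomerID": [12345, 12345, 12345, 67890, 67890, 24680],
})

# 1. Convert InvoiceDate from text to datetime64.
fmcg_data["InvoiceDate"] = pd.to_datetime(fmcg_data["InvoiceDate"])

# 2. TotalSum: revenue per invoice line (assumed Quantity x UnitPrice).
fmcg_data["TotalSum"] = fmcg_data["Quantity"] * fmcg_data["UnitPrice"]

# 3. Snapshot date: one day after the last recorded invoice.
snapshot_date = fmcg_data["InvoiceDate"].max() + dt.timedelta(days=1)

# 4. Aggregate to one row per customer.
customers = fmcg_data.groupby("CustomerID").agg({
    "InvoiceDate": lambda dates: (snapshot_date - dates.max()).days,  # days since last purchase
    "InvoiceNo": "nunique",  # number of distinct invoices
    "TotalSum": "sum",       # total spend
})
print(customers)
```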
We then rename the aggregated columns (InvoiceDate, InvoiceNo, TotalSum) to Recency, Frequency, and Monetary respectively. Just before that, let’s define these terms:
- Recency: How recently a customer has interacted or transacted with a brand. How long has it been since the customer engaged in an activity or made a purchase with the brand? For an FMCG store the most common activity is a purchase; in other industries it could be the most recent visit to a website or the use of a mobile app.
- Frequency: During a given time period, how many times has a consumer transacted or interacted with the brand? Customers who participate in activities regularly are generally more engaged and loyal than those who do so infrequently. It answers the question: how often?
- Monetary: This factor, also known as “monetary value,” reflects how much a customer has spent with the brand over a given period of time. Customers who spend a lot should be handled differently from customers who spend a little. Dividing Monetary by Frequency gives the average purchase amount, an important secondary factor to consider when segmenting customers.
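The average purchase amount described above is a one-line column operation. This sketch uses a hypothetical three-customer RFM table; the column name AvgPurchase is illustrative.

```python
import pandas as pd

# Hypothetical RFM table (one row per customer).
rfm = pd.DataFrame(
    {"Recency": [19, 1, 56], "Frequency": [2, 2, 1], "Monetary": [15.25, 16.5, 12.0]},
    index=pd.Index([12345, 67890, 24680], name="CustomerID"),
)

# Average purchase amount = Monetary / Frequency (spend per transaction).
rfm["AvgPurchase"] = rfm["Monetary"] / rfm["Frequency"]
print(rfm)
```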
Now we can see how the columns map to the RFM metrics: InvoiceDate to Recency, InvoiceNo to Frequency, and TotalSum to Monetary.
- Here is a Python script to rename the columns:
- We can explore the rows and shape of the DataFrame using the Python script below:
- Here are the top 5 rows of our DataFrame:
- We can plot the distribution of each RFM metric using the Python script below:
- Here is our RFM distribution plot:
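A minimal sketch of the rename, shape/head inspection, and distribution plot, on a hypothetical per-customer table. Plain matplotlib histograms stand in for whatever plotting call the original used, and the Agg backend is set only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical output of the groupby step: one row per customer, columns
# still named after the raw fields they were aggregated from.
customers = pd.DataFrame(
    {
        "InvoiceDate": [19, 1, 56, 7, 120, 33, 2, 88],
        "InvoiceNo": [2, 2, 1, 5, 1, 3, 9, 1],
        "TotalSum": [15.25, 16.5, 12.0, 80.0, 4.5, 27.0, 310.0, 9.75],
    },
    index=pd.Index(range(1, 9), name="CustomerID"),
)

# Rename the aggregated columns to the RFM metrics they now represent.
rfm = customers.rename(columns={
    "InvoiceDate": "Recency",
    "InvoiceNo": "Frequency",
    "TotalSum": "Monetary",
})

print(rfm.shape)   # (number of customers, 3)
print(rfm.head())  # top 5 rows

# One histogram per metric to inspect the (often right-skewed) distributions.
fig, axes = plt.subplots(3, 1, figsize=(8, 9))
for ax, column in zip(axes, ["Recency", "Frequency", "Monetary"]):
    ax.hist(rfm[column], bins=10)
    ax.set_title(f"{column} distribution")
fig.tight_layout()
# In a notebook, call plt.show() to display the figure.
```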
Step 4: Building the RFM Groups
- Calculate the R, F, and M groups,
- Create labels for Recency, Frequency, and Monetary value,
- Assign the labels to 4 equal percentile groups (quartiles),
- Then create new columns R, F, and M.
Here is the Python script to create the RFM groups:
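A minimal sketch of the quartile grouping with pd.qcut, on a hypothetical RFM table. Recency labels run 4 to 1 because a smaller Recency (a more recent purchase) is better. Ranking the values with rank(method="first") before binning is an added safeguard, not something the text specifies: it breaks ties so qcut can always form four bins.

```python
import pandas as pd

# Hypothetical RFM table (one row per customer).
rfm = pd.DataFrame(
    {
        "Recency": [19, 1, 56, 7, 120, 33, 2, 88],
        "Frequency": [2, 2, 1, 5, 1, 3, 9, 1],
        "Monetary": [15.25, 16.5, 12.0, 80.0, 4.5, 27.0, 310.0, 9.75],
    },
    index=pd.Index(range(1, 9), name="CustomerID"),
)

# Labels: low Recency is good, so Recency labels run 4 -> 1;
# for Frequency and Monetary, higher values earn higher labels.
r_labels = [4, 3, 2, 1]
f_labels = [1, 2, 3, 4]
m_labels = [1, 2, 3, 4]

def quartile_labels(values, labels):
    # rank(method="first") breaks ties so qcut always finds 4 distinct bin
    # edges; without it, many repeated values (e.g. Frequency == 1) can make
    # qcut raise "Bin edges must be unique".
    return pd.qcut(values.rank(method="first"), q=4, labels=labels).astype(int)

rfm["R"] = quartile_labels(rfm["Recency"], r_labels)
rfm["F"] = quartile_labels(rfm["Frequency"], f_labels)
rfm["M"] = quartile_labels(rfm["Monetary"], m_labels)
print(rfm)
```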
- The output shows the updated DataFrame:
Step 5: Building the RFM Model
- We concatenate the R, F, and M quartile values to create RFM segments using the Python script below:
- The resulting pandas DataFrame shows the created RFM segments:
- Now let's count the number of unique segments.
- Then we calculate the RFM score with the Python script below:
- With the RFM score of each customer calculated, the DataFrame of RFM scores looks like this:
- Then we use a conditional statement (Python script below) to assign each customer (by the CustomerID column) to one of the segments: Can’t Lose Them, Champions, Loyal/Committed, Potential, Promising, Requires Attention, or Demands Activation:
- The pandas DataFrame below shows the calculated RFM level of each customer:
- Next, we calculate the average values for each RFM_Level and return the size of each segment using the Python script below:
- The resulting pandas DataFrame shows the calculated values for each RFM_Level:
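The steps in Step 5 can be sketched end to end on hypothetical quartile scores. Two choices are assumptions: the RFM score is taken as the sum R + F + M (a common convention, giving a range of 3 to 12), and the score cut-offs in rfm_level are illustrative placeholders, not the author's thresholds. The summary uses pandas named aggregation to return the mean of each metric and the segment size.

```python
import pandas as pd

# Hypothetical per-customer metrics and quartile scores.
rfm = pd.DataFrame(
    {
        "Recency": [19, 1, 56, 7, 120, 33, 2, 88],
        "Frequency": [2, 2, 1, 5, 1, 3, 9, 1],
        "Monetary": [15.25, 16.5, 12.0, 80.0, 4.5, 27.0, 310.0, 9.75],
        "R": [3, 4, 2, 3, 1, 2, 4, 1],
        "F": [2, 3, 1, 4, 1, 3, 4, 2],
        "M": [2, 3, 2, 4, 1, 3, 4, 1],
    },
    index=pd.Index(range(1, 9), name="CustomerID"),
)

# RFM segment: concatenate the quartile values, e.g. R=4, F=4, M=4 -> "444".
rfm["RFM_Segment"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)

# Number of unique segments present in the data.
print(rfm["RFM_Segment"].nunique())

# RFM score (assumed): sum of the three quartile values, from 3 to 12.
rfm["RFM_Score"] = rfm[["R", "F", "M"]].sum(axis=1)

def rfm_level(score):
    # Illustrative cut-offs only; tune them for your own data.
    if score >= 9:
        return "Can't Lose Them"
    if score == 8:
        return "Champions"
    if score == 7:
        return "Loyal/Committed"
    if score == 6:
        return "Potential"
    if score == 5:
        return "Promising"
    if score == 4:
        return "Requires Attention"
    return "Demands Activation"

rfm["RFM_Level"] = rfm["RFM_Score"].apply(rfm_level)
print(rfm[["RFM_Segment", "RFM_Score", "RFM_Level"]])

# Average Recency/Frequency/Monetary per level, plus the number of customers.
rfm_level_agg = rfm.groupby("RFM_Level").agg(
    Recency=("Recency", "mean"),
    Frequency=("Frequency", "mean"),
    Monetary=("Monetary", "mean"),
    Count=("Monetary", "count"),
).round(1)
print(rfm_level_agg)
```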
Step 6: Visualizing the Customers Segmented Using the RFM Model
- Squarify library: I chose Squarify because it is built on top of Matplotlib and its treemaps use space efficiently.
- We plot the RFM levels on a Squarify treemap using the Python script below:
- Here is our final dashboard showing how we’ve segmented customers using the RFM model:
Final Thoughts
Thanks for taking the time to read this article. You can find more of my articles on my profile (more will be available soon).
Remarks
All the references used are hyperlinked within the article. To see the complete Python code, written in a Jupyter Notebook, visit my GitHub and social media pages using the links below: