Segmenting Customers using RFM Analysis

What is RFM analysis, how do we segment customers with it, and what can we do with the resulting segments?

Harrisun Raj Mohan
3 min read · Jul 23, 2021

What is RFM Analysis and how is it used in customer segmentation?

RFM stands for Recency, Frequency and Monetary value, three traits measured for each customer. Recency shows how recently a customer made their last purchase. Frequency shows how often a customer makes purchases. Monetary value shows the total amount of money a customer has spent on purchases. RFM analysis is a tool that lets a firm know who its best customers are.

Steps involved

The following are the steps involved in doing RFM analysis using Python:

  1. Importing libraries
  2. Data collection and exploration
  3. Data Cleaning
  4. Calculating Recency, Frequency and Monetary value for each CustomerId
  5. Finding the Best Customers, Almost Lost Customers and Lost Customers and exporting each list to a separate file.

Importing Libraries:

Libraries are imported to perform various functions: NumPy for working with arrays, and Pandas for its data structures and data analysis tools.
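
A minimal sketch of the imports, assuming the conventional aliases np and pd (the original import cell is not shown here, so the aliases are an assumption):

```python
import numpy as np   # array operations
import pandas as pd  # DataFrames and data analysis tools

# Quick sanity check that both libraries are usable.
values = np.array([1, 2, 3])
series = pd.Series(values)
print(values.sum(), series.mean())
```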

Data collection and Exploration:

The data we are going to use for the analysis records the purchase transactions of customers (link to the dataset). The dataset has the attributes InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerId and Country. Since the dataset is in Excel format, we use read_excel to read the file. We also set the encoding to ‘latin-1’: the default encoding is ‘utf-8’, so when reading a file with any other encoding, that encoding should be specified.
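
The loading and exploration step might look roughly like the sketch below. The helper name load_transactions and the made-up rows are assumptions for illustration, and the exploration calls are typical choices rather than the exact ones used in the original notebook. Depending on your pandas version, read_excel may not accept an encoding argument, so the sketch leaves it out.

```python
import pandas as pd

def load_transactions(path):
    """Read the Excel workbook at `path` into a DataFrame (hypothetical helper)."""
    # Reading .xlsx files needs an Excel engine such as openpyxl installed.
    return pd.read_excel(path)

# Made-up rows that only mimic the column names listed above, so the
# exploration calls run without the real file.
df = pd.DataFrame({
    "InvoiceNo": ["A1", "A2", "A3", "A4"],
    "StockCode": ["S1", "S2", "S3", "S4"],
    "Description": ["ITEM A", "ITEM B", "ITEM C", "ITEM D"],
    "Quantity": [6, 2, 8, 1],
    "InvoiceDate": pd.to_datetime(
        ["2021-01-05", "2021-02-10", "2021-02-10", "2021-03-01"]
    ),
    "UnitPrice": [2.5, 3.0, 2.75, 7.5],
    "CustomerId": [101.0, 102.0, None, 101.0],
    "Country": ["United Kingdom"] * 4,
})

print(df.head())      # first rows
print(df.shape)       # (rows, columns)
df.info()             # dtypes and non-null counts per column
print(df.describe())  # summary statistics
```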

Data Cleaning:

Dropping unwanted columns:

InvoiceNo, StockCode, Description and Country are not relevant to the RFM calculation, so we drop them.
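
A small sketch of the drop, run on made-up rows that only mimic the dataset's column names:

```python
import pandas as pd

# Made-up rows with the dataset's column names (values are illustrative).
df = pd.DataFrame({
    "InvoiceNo": ["A1", "A2"],
    "StockCode": ["S1", "S2"],
    "Description": ["ITEM A", "ITEM B"],
    "Quantity": [6, 2],
    "InvoiceDate": pd.to_datetime(["2021-01-05", "2021-02-10"]),
    "UnitPrice": [2.5, 3.0],
    "CustomerId": [101.0, 102.0],
    "Country": ["United Kingdom", "United Kingdom"],
})

# Keep only the columns needed for Recency, Frequency and Monetary value.
df = df.drop(columns=["InvoiceNo", "StockCode", "Description", "Country"])
print(df.columns.tolist())
```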

Checking for null values:
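
One common way to count missing values per column is isnull().sum(); the rows below are made up for illustration:

```python
import pandas as pd

# Made-up rows; one CustomerId is deliberately missing.
df = pd.DataFrame({
    "Quantity": [6, 2, 8, 1],
    "InvoiceDate": pd.to_datetime(
        ["2021-01-05", "2021-02-10", "2021-02-10", "2021-03-01"]
    ),
    "UnitPrice": [2.5, 3.0, 2.75, 7.5],
    "CustomerId": [101.0, 102.0, None, 101.0],
})

# Number of missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)
```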

Dropping the rows with missing CustomerId:

Recheck for null values:
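
The drop and the recheck can be sketched together as follows, again on made-up rows. Rows without a CustomerId cannot be attributed to any customer, which is why they are removed before the per-customer calculations.

```python
import pandas as pd

# Made-up rows; one CustomerId is deliberately missing.
df = pd.DataFrame({
    "Quantity": [6, 2, 8, 1],
    "UnitPrice": [2.5, 3.0, 2.75, 7.5],
    "CustomerId": [101.0, 102.0, None, 101.0],
})

# Drop only the rows where CustomerId is missing.
df = df.dropna(subset=["CustomerId"])

# Recheck: every column should now report zero missing values.
remaining_nulls = df.isnull().sum()
print(remaining_nulls)
```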

Calculating Recency, Frequency and Monetary value for each CustomerId

Calculating Recency:
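
A possible way to compute Recency is the number of days between a reference date and each customer's last purchase. Taking the latest InvoiceDate in the data as the reference date is an assumption of this sketch, and the rows are made up.

```python
import pandas as pd

# Made-up transactions for three customers.
df = pd.DataFrame({
    "CustomerId": [101, 102, 101, 103],
    "InvoiceDate": pd.to_datetime(
        ["2021-01-05", "2021-02-10", "2021-03-01", "2021-01-20"]
    ),
})

# Assumption: measure recency relative to the latest purchase in the data.
reference_date = df["InvoiceDate"].max()

# Most recent purchase date of each customer.
last_purchase = df.groupby("CustomerId")["InvoiceDate"].max()

# Days elapsed since that purchase; smaller means more recent.
recency = (reference_date - last_purchase).dt.days.rename("Recency").reset_index()
print(recency)
```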

Calculating Frequency:
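
Frequency can be sketched as the number of purchase rows per customer. Counting rows is an assumption here: since InvoiceNo was dropped earlier, each remaining row is treated as one purchase. Counting distinct InvoiceDate values with nunique() would be an alternative.

```python
import pandas as pd

# Made-up transactions for three customers.
df = pd.DataFrame({
    "CustomerId": [101, 102, 101, 103],
    "InvoiceDate": pd.to_datetime(
        ["2021-01-05", "2021-02-10", "2021-03-01", "2021-01-20"]
    ),
})

# Number of purchase rows recorded for each customer.
frequency = (
    df.groupby("CustomerId")["InvoiceDate"]
    .count()
    .rename("Frequency")
    .reset_index()
)
print(frequency)
```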

Calculating Monetary value:
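
Monetary value, the total amount each customer has spent, can be sketched as the sum of Quantity times UnitPrice per customer. The column name TotalPrice is a hypothetical helper column, and the rows are made up.

```python
import pandas as pd

# Made-up transactions for three customers.
df = pd.DataFrame({
    "CustomerId": [101, 102, 101, 103],
    "Quantity": [6, 2, 1, 4],
    "UnitPrice": [2.5, 3.0, 10.0, 1.25],
})

# Amount spent on each row (TotalPrice is a hypothetical helper column).
df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]

# Total amount spent by each customer.
monetary = (
    df.groupby("CustomerId")["TotalPrice"]
    .sum()
    .rename("Monetary")
    .reset_index()
)
print(monetary)
```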

Merging calculated Recency, Frequency and Monetary value fields:
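
Assuming the three results are held in separate DataFrames keyed by CustomerId (the frame names and values below are illustrative), they can be combined with merge:

```python
import pandas as pd

# Illustrative per-customer results from the three calculations.
recency = pd.DataFrame({"CustomerId": [101, 102, 103], "Recency": [0, 19, 40]})
frequency = pd.DataFrame({"CustomerId": [101, 102, 103], "Frequency": [2, 1, 1]})
monetary = pd.DataFrame({"CustomerId": [101, 102, 103], "Monetary": [25.0, 6.0, 5.0]})

# Join the three tables on CustomerId to get one row per customer.
rfm = recency.merge(frequency, on="CustomerId").merge(monetary, on="CustomerId")
print(rfm)
```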

Assigning zone values:
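
How the zone values were assigned is not shown here. One common approach, used in this sketch as an assumption, is to split each metric into quartiles and label them 1 (best) to 4 (worst). The helper quartile_zone, the column names R, F and M, and the data are all hypothetical.

```python
import pandas as pd

# Made-up RFM table for eight customers.
rfm = pd.DataFrame({
    "CustomerId": [101, 102, 103, 104, 105, 106, 107, 108],
    "Recency": [2, 10, 35, 60, 5, 90, 20, 45],
    "Frequency": [12, 8, 3, 1, 10, 1, 5, 2],
    "Monetary": [900.0, 450.0, 120.0, 30.0, 700.0, 15.0, 300.0, 60.0],
})

def quartile_zone(values, best_is_low):
    """Return zones 1 (best) to 4 (worst) from the quartiles of `values`."""
    # Ranking with method="first" breaks ties, so qcut gets distinct bin edges.
    ranked = values.rank(method="first", ascending=best_is_low)
    return pd.qcut(ranked, q=4, labels=[1, 2, 3, 4]).astype(int)

# Low recency is good; high frequency and high spend are good.
rfm["R"] = quartile_zone(rfm["Recency"], best_is_low=True)
rfm["F"] = quartile_zone(rfm["Frequency"], best_is_low=False)
rfm["M"] = quartile_zone(rfm["Monetary"], best_is_low=False)
print(rfm)
```

With this labelling, a customer scoring 1 on all three metrics purchased recently, purchases often and spends the most.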

Finding the Best Customers, Almost Lost Customers and Lost Customers and exporting each list to a separate file
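
The exact rules used to pick the three segments are not shown here, so the thresholds below are hypothetical: Best means zone 1 on all three metrics, Almost Lost means a recency zone of 3, and Lost means a recency zone of 4. The helper export_segments and the file names are also assumptions. The sketch writes to a temporary directory to avoid side effects; point out_dir at a real folder in practice.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up zone values (1 = best, 4 = worst) for six customers.
rfm = pd.DataFrame({
    "CustomerId": [101, 102, 103, 104, 105, 106],
    "R": [1, 2, 3, 4, 1, 4],
    "F": [1, 2, 3, 4, 1, 4],
    "M": [1, 2, 3, 4, 1, 4],
})

# Hypothetical segment rules; adjust them to your own business model.
segments = {
    "best_customers": rfm[(rfm["R"] == 1) & (rfm["F"] == 1) & (rfm["M"] == 1)],
    "almost_lost_customers": rfm[rfm["R"] == 3],
    "lost_customers": rfm[rfm["R"] == 4],
}

def export_segments(segments, out_dir):
    """Write each segment to its own CSV file and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, frame in segments.items():
        paths[name] = out_dir / f"{name}.csv"
        frame.to_csv(paths[name], index=False)
    return paths

with tempfile.TemporaryDirectory() as tmp:
    written = export_segments(segments, tmp)
    for name, path in written.items():
        print(name, len(pd.read_csv(path)))
```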

Inference:

Out of all the types of customers, why did we choose only the Best, Almost Lost and Lost customers?

Why Best Customers?

The best customers are the store’s loyal customers. From them you can learn what keeps them with the store, and the store can offer them a loyalty bonus, which might turn them into promoters.

Why Lost Customers?

The lost customers are those who may have left you for some reason. Reaching out to them will help you understand the problems they faced with the store or with the product, during or after the purchase.

Why Almost Lost Customers?

The almost lost customers are the customers that the store is on the verge of losing. Offers, focused advertisements and similar measures can be used to keep them from leaving.

For this business model, these three customer segments are considered important for analysis. There is no single best fit for customer segmentation; the choice of customer segments will vary with one’s business model and problem scenario.

Please visit my GitHub repository to access all of my code.

Harrisun Raj Mohan

Product Management enthusiast who revels in creating ingenious, customer-centric, data-driven products that make people’s lives better.