RFM Analysis for Customer Segmentation and Profiling using Excel

12 min readSep 10, 2023

RFM analysis is a customer segmentation and profiling technique that helps businesses better understand and categorize their customers based on their past purchasing behavior. It consists of three key factors:

1. Recency (R): This measures how recently a customer made a purchase, with more recent purchases receiving higher scores.
2. Frequency (F): It assesses how often a customer makes purchases, with more frequent buyers getting higher scores.
3. Monetary Value (M): This evaluates how much money a customer spends on purchases, giving higher scores to those who spend more.

After scoring customers on these factors, businesses can group them into different segments. These segments help tailor marketing and customer service strategies to meet each group’s specific needs. For example, “High-Value” customers who make large, frequent purchases may receive premium offers, while “Low-Engagement” customers who haven’t bought recently can be targeted with re-engagement campaigns.

In essence, RFM analysis simplifies customer data into when they last bought, how often they buy, and how much they spend. This helps businesses make informed decisions to improve customer relationships and revenue across various customer groups.

Introduction

In a quiet neighborhood, NeighborMart faced a crucial business problem. They had rich customer data but needed to segment their customers for targeted marketing campaigns. The challenge was to distinguish high-value customers from low-engagement ones.

Business Problem

NeighborMart needed to identify and categorize its customers into four distinct segments based on their buying habits and engagement levels. This segmentation would enable them to create effective marketing strategies tailored to each group.

Methodology for RFM analysis at NeighborMart

For this project, we will be using Microsoft Excel for cleaning, analysis, and Visualization.

1. Data Cleaning
Clean the dataset by filling in missing values, correcting wrongly spelled values, and removing duplicates.

2. Data Transformation
Create new columns based on existing columns, Calculate the customer's age based on their year of birth; create a new column for the total number of children; Calculate the total monetary value spent by each customer; and the frequency (number of purchases) for each customer.

3. Feature Selection
Identify and select the relevant features (columns) from the dataset that will be used for customer segmentation.

4. Percentrank
Create a new column for Percentrank for Monetary Value, Frequency, and Recency to normalize these metrics.

5. Create RFM Score
Calculate an RFM score based on the Percentrank values, summing the three columns created in the previous step, and create a new column for Percentrank.inc for the RFM score to segment customers based on their combined RFM values.

6. Customer Segment
Create a new column called “Customer Segment” and segment customers based on the RFM score using VLOOKUP. Update segment names as follows:
— Top customers
— Loyal customers
— At risk
— Immediate Attention

7. Pivot Chart for Customer Segments
Create a pivot chart displaying the count of customers in each segment (Top customers, Loyal customers, At-risk, and immediate attention).

8. Visualize and Analyze
Create visualizations that explain each segment and add key performance indicators (KPIs). Include demographic values (average age, income, number of children, marital status) for the top customers and the “Immediate Attention” segment to understand customer profiles better.

This methodology will enable NeighborMart to effectively segment customers into four categories and gain valuable insights for targeted marketing strategies to maximize customer value and engagement.

Dataset

For this project, we used a dataset from KAGGLE. It contains information on 2,240 customers, with each customer’s details organized in 27 columns. These columns include various pieces of information about each customer, and one of them is a unique customer ID.

Data Dictionary

Let’s begin…

Data Cleaning

Handling missing values

In order to check for missing values, we will use conditional formatting. This will be done by highlighting the entire dataset, creating a new formatting rule, formatting only cells that contain blanks, and filtering to only see cells with missing values.

By doing this, we noticed 24 missing values in the income column.

These missing values constitute 1.07% of our entire dataset, so removing the rows that contain these missing values will not cause harm to our analysis. They will not be missed.

2. Checking for duplicate rows

To identify duplicate rows, we will utilize the “Remove Duplicate” feature located in the Data tab section.

There were no duplicate rows in this dataset.

3. Checking values in each column for correctness and accuracy

To address this issue, we will begin by examining non-numeric columns, as the data within these columns can frequently contain spelling errors or redundancy.

In the dataset, Education and Marital_status are the only categorical features. After filtering both columns to identify the distinct categories, we observed certain inconsistencies.

3.1 Education column

The values under the Education column are Graduation, PhD, Master, 2n Cycle, Basic. These values can be further grouped into three distinct groups. Undergraduate, graduate, and postgraduate by using the “Find and Replace” feature.

Graduation=Graduate

2n Cycle, Basic = undergraduate

PhD, Master = postgraduate

Example of the find and replace feature. This will be done separately for all 5 groups in the education column

3.2 Marital status column

The values under the Marital Status column are Married, Together, Single, Divorced, Widow, Alone, Absurd, and YOLO. These values can be further grouped into two distinct groups. “Partner” and “Alone” by using the “Find and Replace” feature.

Married, Together = Partner

Single, Divorced, Widow, Alone, Absurd, and YOLO = Alone

Example of find and replace for marital status column

4. Check for outliers

As we review the numeric columns for unusual data points, we’ve come across ages listed as 123 and 124, with the highest age recorded as 130 years. While optimism is a great attitude, these particular values are certainly outliers that could have a detrimental impact on our dataset. We will remove them from our dataset

Data Transformation

Change the year of birth to age

A new column called age will be created from the year_birth column using the “=YEAR(TODAY()) — B2” formula in Excel.

2. Create a new column for the total number of children

We have two columns named ‘Kidhome’ and ‘Teenhome,’ and managing them separately isn’t very efficient. To improve this, we’ll create a new column that adds up the values from both ‘Kidhome’ and ‘Teenhome’ using the formula “=SUM(G2:H2)”. Once we calculate this sum, we’ll copy it and paste it as a fixed number in a new column.

3. Create a column for monetary value

Monetary Value, which represents the “M” in our RFM analysis, reflects the amount of money a customer spends on purchases. As previously mentioned, this can be computed by summing up the values from the “Mnt xxx” columns, each of which represents the amount spent on specific items in the shop, using the SUM function.

4. Create a column for Frequency

Frequency, representing the “F” in our RFM analysis, evaluates how frequently a customer makes purchases. Similar to the previous section, we will sum up data from the columns NumWebPurchases, NumCatalogPurchases, and NumStorePurchases to calculate Frequency.

Feature Selection

In the context of RFM (Recency, Frequency, Monetary) Analysis, we will focus exclusively on the relevant columns. This includes the demographic information columns such as Age, Education, Marital Status, Income, and Number of Children, in addition to the Recency, Frequency, and Monetary Value columns that were computed in the previous section.

Percentrank

In this section, we will create three new columns that compute the percent rank for Recency, Frequency, and Monetary Value. To achieve this, we will use the Percentrank.inc function in Excel. We chose this function because it includes the value itself in the ranking and produces results within the range of 0 to 1. To obtain whole numbers ranging from 0 to 10, we will multiply the result by 10.

Here’s the formula we will use:

=PERCENTRANK.INC($A:$A, A2, 1) * 10

- $A:$A: This represents the fixed range of values for which we want to calculate the percentile rank.
- A2: This corresponds to the specific value for which we wish to compute the percentile rank.
- 1: This signifies the number of significant digits to be used in the calculation.

By multiplying the result by 10, we ensure that the final output is expressed as a whole number between 0 and 10.

Our ranking formula assigns a score of 0 to the lowest and 10 to the highest values. This method works well for evaluating frequency and monetary value. However, when considering the recency column, we must factor in the concept of time within a year, which comprises 365 days. For instance, if one individual last made a purchase 365 days ago, and another did so just 2 days ago, we want the person who shopped 365 days ago to receive a lower score, while the person who shopped 2 days ago should receive a higher score based on recency.

To address this, we will implement the following formula for the recency rank column:

= (1 — PERCENTRANK.INC($A:$A, A2, 1)) * 10

percentrank value for recency, frequency, monetary value

Create RFM Score

We will generate a new column named “RFM” by summing the individual scores for recency, frequency, and monetary value. Additionally, we will use the Percentrank.inc function to determine the ranking score for this newly created RFM column.

Customer Segmentation

We will create a look-up table showing our customer type and RFM score.

Using the lookup table as a reference, we will construct a VLOOKUP formula that examines the RFM score for each customer and fills in the associated customer segment value based on the lookup table.

VLOOKUP formula for this:

=VLOOKUP(M2,$R$4:$S$15,2,FALSE)

Pivot Chart for Customer Segments

To get the statistics for each customer group, we’ll use a pivot table. This table will sort customers into their respective groups and provide key statistics for each one, including the total customer count, average( age, income, number of children), and most common marital status and education level within each group.

1. Pivot table for the total number of customers in each segment

We want to get the total number of top customers, loyal customers, at-risk customers, and immediate attention customers using a pivot table.

2. Pivot table for the average recency, frequency, and monetary value for each customer segment

We will get the average value for recency, frequency, and monetary value for each customer segment.

3. Pivot table for the average age, income, and number of children by customer segment

To construct a profile for each customer segment, we will calculate the average values for age, income, and the number of children within each segment.

We observe that the average age of Top Customers is 55, with an average income of $72,790, and no children.

4. Pivot table for the Customer Count by Education Group within each Segment

The Education column is a categorical variable, and it is not suitable for calculating an average value. Instead, we should determine the mode, which represents the most frequently occurring value within this categorical column.

However, it’s important to note that the mode function is typically designed for numerical data and doesn’t work directly with categorical values. To find the mode for each customer segment, we will create a pivot table. Within this pivot table, we will group the data by customer segments and further subgroup them by education categories within each segment. Then, we will calculate the total count for each education category within these subgroups. The category with the highest count within each segment will be considered the mode value for that particular segment.

The most prevalent education group among the different customer segments is the “Graduate” group. Specifically, within the top customers, there are 212 individuals in the Graduate group. Among loyal customers, there are 307 individuals in the Graduate group. For at-risk customers, there are 366 individuals in the Graduate group. Finally, among customers needing immediate attention, there are 231 individuals in the Graduate group. This data clearly indicates that the “Graduate” group is the most frequently occurring education category across all customer segments.

5. Pivot table for the Customer Count by Marital Status Group within each Segment

The marital status column is also a categorical variable. Therefore, the above section applies here also.

The most prevalent marital status group among the different customer segments is the “Partner” group. Specifically, within the top customers, there are 269 individuals in the partner group. Among loyal customers, there are 365 individuals in the partner group. For at-risk customers, there are 495 individuals in the partner group. Finally, among customers needing immediate attention, there are 300 individuals in the partner group. This data clearly indicates that the “Partner” group is the most frequently occurring marital status category across all customer segments.

Customer Segment Profile

We will combine information from the pivot tables to build the customer profile for each segment.

Insight Section

A total of 2,213 customers were analyzed, and their distribution across different segments is as follows:
- 19% were categorized as top customers.
- 26% were classified as loyal customers.
- 33% were identified as at-risk/need attention customers.
- 21% fell into the immediate attention customer category
For Top Customers:
On average, they made 21 purchases, their average spending per purchase was $1303, and their last product purchase by top customers occurred, on average, 21 days ago.
At-Risk and Immediate Attention Customers (combined):
The combined average for customers in the at-risk and immediate attention segments was 7 purchases, these customers spent an average of $196 on their purchase, and their most recent product purchase took place, on average, 56 days ago.

Recommendations for marketing strategies

Top Customers

1. Exclusive Offers: Provide special offers or discounts exclusively for your top customers to make them feel appreciated. This could include early access to sales, loyalty rewards, or VIP treatment.
2. Personalized Recommendations: Use their purchase history and preferences to offer personalized product recommendations. Show them products related to their past purchases or complementary items.
3. Priority Support: Offer priority customer support to resolve any issues or questions they may have quickly. Excellent customer service can enhance their loyalty.
4. Loyalty Program: Consider implementing a loyalty program that rewards them for their continued patronage. Accumulated points, tier-based rewards, or cashback incentives can keep them engaged.

At Risk/Need Attention and Immediate Attention Customers Combined

1. Re-Engagement Campaign: Launch a re-engagement campaign targeting these two segments. Send them personalized emails or messages with special offers or discounts to entice them to make a purchase.
2. Feedback Request: Ask for feedback on their previous experiences and how you can improve. Showing that you value their opinion can help rekindle their interest.
3. Customer Service Outreach: Proactively reach out to these customers to address any issues they may have had in the past. Make sure their concerns are resolved and their needs are met.
4. Win-Back Offers: For customers who haven’t made a purchase in a while, offer attractive win-back offers. This could be a discount, free shipping, or a bundle deal to tempt them back.