RFM Analysis for Customer Segmentation and Profiling using Excel

Sefiat Kemi Adedeji
11 min read · Jan 11, 2024


Introduction

In today’s data-driven landscape, understanding customer behavior is key to navigating the competitive market. Among the myriad methods available, RFM Analysis stands tall as a powerful tool, offering invaluable insights into customer engagement and purchasing patterns.

RFM, short for Recency, Frequency, Monetary, delves deep into the core of customer interactions. This analytical technique segments customers based on three fundamental dimensions:

  1. Recency (R): This refers to the measure of how recently a customer has engaged with a business, made a purchase, or interacted with a product or service.
  2. Frequency (F): This denotes the rate or number of times a customer engages with the business within a specified period. A higher frequency signifies a more engaged and loyal customer base.
  3. Monetary (M): This reflects the total value or monetary contribution of a customer to the business within a given period. Monetary also emphasizes the value each customer brings to the business.

By leveraging these dimensions, RFM Analysis enables businesses to categorize customers into distinct segments, unveiling hidden trends and preferences that might otherwise remain obscured.

Objectives

To understand the distribution and size of customers across segments, highlighting the prevalence of each group.

To compare RFM values among segments and discern differences in engagement, purchase behavior, and spending patterns.

To analyze demographic factors within segments to identify preferences or correlations between customer profiles and behavior.

To examine how educational background and marital status influence customer behavior within different segments, revealing potential trends or associations.

Methodology

Data Source

In this analysis, we’ll employ a dataset sourced from Kaggle, encompassing data on 2,240 customers and their purchasing behavior at a grocery store. The dataset comprises 27 columns, each describing one attribute of a customer. You can download the dataset here.

Tool

We’ll utilize Microsoft Excel as our primary data tool for the process of data cleaning, analysis, and visualization.

Data Dictionary

Step-by-step for RFM Analysis

1. Data Cleaning: This involves handling missing values, rectifying inaccuracies in spellings, and eliminating duplicate entries from the dataset.

2. Data Transformation: Create new columns based on existing columns: calculate each customer’s age from their year of birth; create a new column for the total number of children; calculate the total monetary value spent by each customer; and calculate the frequency (number of purchases) for each customer.

3. Feature Selection: Selecting the precise data fields from the dataset that hold relevance and significance for the analysis at hand.

4. Percentrank: Create a new column for Percentrank for Monetary Value, Frequency, and Recency to normalize these metrics.

5. Create RFM Score: Calculate an RFM score based on the Percentrank values, summing the three columns created in the previous step, and create a new column for Percentrank.inc for the RFM score to segment customers based on their combined RFM values.

6. Customer Segmentation: Create a new column called “Customer Segment” and segment customers based on the RFM score using VLOOKUP.

7. Pivot Table: Create a pivot chart displaying the count of customers in each segment (Top Customers, Loyal Customers, At Risk, and Immediate Attention). Create a customer profile table that shows, for all four customer segments, the percentage of customers, average age, average income, average number of children, and the most common Education and Marital Status values.

8. Visualization: Create visualizations that explain each segment and add key performance indicators (KPIs). Include demographic values (average age, income, number of children, marital status) for the top customers and the “Immediate Attention” segment to understand customer profiles better.

1. Data Cleaning

1.1 Separating data into columns.

We employed the “Text-to-Columns” function to split the CSV-formatted data into distinct columns, streamlining the organization and accessibility of the dataset for further analysis.

1.2 Checking for missing values.

Utilizing the COUNTBLANK() function, we scanned our dataset for any missing values. The analysis revealed 24 missing data points.

Subsequent column-wise filtering identified these gaps exclusively in the ‘income’ column.

Since our dataset has over 2,000 rows, we prioritized data integrity and excluded the 24 rows with missing values from further analysis.
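For readers who want to cross-check the Excel result outside the spreadsheet, the missing-value check and row removal can be sketched in Python. This is only an illustration under stated assumptions: the sample rows are made up, and the column names (ID, Year_Birth, Income) and the helper name split_missing are assumptions, not part of the original workbook.

```python
import csv
import io

# Tiny made-up sample standing in for the Kaggle file; column names are assumed.
SAMPLE = """ID,Year_Birth,Income
1,1970,58138
2,1965,
3,1984,71613
4,1954,
"""


def split_missing(rows, column):
    """Return (kept_rows, dropped_count) after removing rows where `column` is blank."""
    kept = [row for row in rows if row[column].strip() != ""]
    return kept, len(rows) - len(kept)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
clean_rows, n_missing = split_missing(rows, "Income")
print(n_missing, len(clean_rows))
```

The count of dropped rows plays the role of COUNTBLANK(), and the filtered list corresponds to the dataset after the incomplete rows are excluded.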

1.3 Checking for duplicate rows.

Next, we identified and eliminated duplicate rows within the dataset. This process ensures the accuracy and reliability of our data, mitigating redundancy and potential errors in subsequent analyses.

We used the ‘Remove Duplicates’ feature, which can be found in the Data tab.

No values were duplicated.

1.4 Education Level Column

The Education column has five levels: 2n Cycle, Basic, Graduation, Masters, and PhD.

We condensed these levels into three broader categories (Undergraduate, Graduate, and Post-graduate) using the “Find and Replace” feature:

a. Basic and 2n Cycle: Undergraduate

b. Graduation: Graduate

c. Masters and PhD: Post-graduate

1.5 Marital Status Column

The Marital Status column has eight different values: Absurd, Alone, Divorced, Married, Single, Together, Widow, and YOLO. We condensed these to just two:

a. Absurd, Alone, Divorced, Single, Widow, and YOLO: Single

b. Married and Together: Married

To achieve this we need to use the “Find and Replace” feature.
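The two recoding steps above (Education and Marital Status) amount to a lookup from each raw label to its broader group. A minimal Python sketch of that idea follows; the labels are the ones listed in this article, while the dictionary and function names are hypothetical.

```python
# Raw label -> broader group, following the groupings described in sections 1.4 and 1.5.
EDUCATION_GROUPS = {
    "Basic": "Undergraduate",
    "2n Cycle": "Undergraduate",
    "Graduation": "Graduate",
    "Masters": "Post-graduate",
    "PhD": "Post-graduate",
}

MARITAL_GROUPS = {
    "Absurd": "Single",
    "Alone": "Single",
    "Divorced": "Single",
    "Single": "Single",
    "Widow": "Single",
    "YOLO": "Single",
    "Married": "Married",
    "Together": "Married",
}


def recode(value, mapping):
    """Return the broader group for a raw label; unexpected labels raise KeyError."""
    return mapping[value.strip()]


print(recode("2n Cycle", EDUCATION_GROUPS), recode("Together", MARITAL_GROUPS))
```

Raising an error on an unknown label is a deliberate choice here: it surfaces any spelling variant that a manual Find and Replace pass might silently miss.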

2. Data Transformation

2.1 Create a Age column

This step calculates each customer’s age from their birth year, providing a convenient reference for age-related analysis and segmentation within the dataset. To create a new column labeled “Age,” one approach is to use the formula “=YEAR(TODAY())-B2”; another is to subtract the values in the “Year_birth” column from the year of the analysis (2023).

We detected three outliers in the Age column (123, 124, and 130) and removed them from the dataset.
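The age calculation and outlier removal can be sketched as follows. The reference year 2023 comes from the text above; the cutoff of 100 years is an assumption made for this sketch (the article only lists the removed ages), and the birth years are made-up values chosen so that three of them reproduce the reported ages.

```python
REFERENCE_YEAR = 2023     # the year the article subtracts birth years from
MAX_PLAUSIBLE_AGE = 100   # assumed cutoff; not stated in the article


def age_from_birth_year(year_birth, reference_year=REFERENCE_YEAR):
    """Age in whole years, computed the same way as =2023-Year_birth."""
    return reference_year - year_birth


# Made-up sample of birth years.
birth_years = [1957, 1893, 1984, 1899, 1900, 1965]
ages = [age_from_birth_year(y) for y in birth_years]

kept = [a for a in ages if a <= MAX_PLAUSIBLE_AGE]
removed = sorted(a for a in ages if a > MAX_PLAUSIBLE_AGE)
print(kept, removed)
```

Note that a formula based on TODAY() recalculates every year, so pinning the reference year keeps the ages reproducible.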

2.2. Create a new column for the Number of children.

To derive a new column labeled “Number_of_children,” we will sum the values present in the “Kidhome” and “Teenhome” columns using the formula “=SUM(G2,H2)”. This calculation intends to consolidate the count of children across both columns, providing a comprehensive metric for child-related considerations within the dataset.

2.3 Create a column for monetary value

This is derived by summing all the Mnt columns. Each of these columns represents expenditure on a specific category of items within the shop: =SUM(L2:Q2)

2.4 Create a column for Frequency

Frequency measures how often a customer makes purchases. We sum the columns NumWebPurchases, NumCatalogPurchases, and NumStorePurchases to calculate it; for example, for row 10: =SUM(T10:V10)
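The three derived columns from sections 2.2 to 2.4 can be sketched for a single customer row as below. Kidhome, Teenhome, and the three purchase-count columns are named in the text; the individual Mnt column names and all numbers in the sample row are assumptions for illustration, which is why the sketch selects spending columns by their "Mnt" prefix rather than by name.

```python
PURCHASE_COLUMNS = ["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]


def add_derived_columns(row):
    """Return a copy of `row` with Number_of_children, Monetary, and Frequency added."""
    out = dict(row)
    out["Number_of_children"] = row["Kidhome"] + row["Teenhome"]
    # Monetary: total of every spending column, identified by the "Mnt" prefix.
    out["Monetary"] = sum(v for k, v in row.items() if k.startswith("Mnt"))
    # Frequency: total purchases across the three channels named in the article.
    out["Frequency"] = sum(row[c] for c in PURCHASE_COLUMNS)
    return out


# Made-up row; the Mnt* names are assumed, not quoted from the article.
sample = {
    "Kidhome": 1, "Teenhome": 1,
    "MntWines": 11, "MntFruits": 1, "MntMeatProducts": 6,
    "MntFishProducts": 2, "MntSweetProducts": 1, "MntGoldProds": 6,
    "NumWebPurchases": 1, "NumCatalogPurchases": 1, "NumStorePurchases": 2,
}
result = add_derived_columns(sample)
print(result["Number_of_children"], result["Monetary"], result["Frequency"])
```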

3. Feature Selection

We’re pinpointing the columns crucial for our RFM analysis, strategically eliminating non-essential ones. The finalized selection of columns vital for the RFM analysis comprises Age, Education, Marital Status, Income, and Number of Children, complemented by the Recency, Frequency, and Monetary Value columns. This streamlined dataset focuses on key metrics essential for our targeted customer segmentation and profiling.

4. Percentrank

Our aim is to generate three additional columns that calculate the percentile rank for Recency, Frequency, and Monetary Value. To accomplish this task, we’ll use the PERCENTRANK.INC function. We opted for this function because it is inclusive: it returns ranks from 0 to 1 with both endpoints included, so the lowest value ranks 0 and the highest ranks 1. The third argument, 1, limits the result to one significant digit, so multiplying the result by 10 gives whole numbers from 0 to 10.

For the frequency and monetary value scores, the formulas below were used, respectively.

=PERCENTRANK.INC($G:$G,G2,1)*10

=PERCENTRANK.INC($H:$H,H2,1)*10

Our ranking formula assigns a score of 0 to the lowest and 10 to the highest values.

In calculating the Recency score, we adjusted the formula to account for the temporal aspect: Recency measures the time elapsed since the customer’s last transaction, so a smaller value is better. Our objective is to give a higher score to a recent purchase and a lower score to a longer gap since the last transaction.

To address this, we will use the following formula for the recency rank column:

=(1-PERCENTRANK.INC($F:$F,F2,1))*10
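To make the scoring logic concrete, here is a small Python approximation of the two formulas above. It assumes that PERCENTRANK.INC ranks a value by the share of values strictly below it, (count below) / (n - 1), and that a significance of 1 cuts the result to one decimal place; the function names and sample data are made up for illustration, and edge cases may differ from Excel.

```python
def percentrank_score(values, x):
    """Approximate =PERCENTRANK.INC(values, x, 1)*10 for an x that occurs in `values`.

    Rank = (count of values strictly below x) / (n - 1), cut to one decimal
    place and scaled by 10; integer floor division does both steps at once.
    """
    if len(values) < 2 or x not in values:
        raise ValueError("x must be one of at least two values")
    below = sum(1 for v in values if v < x)
    return (below * 10) // (len(values) - 1)


def recency_score(values, x):
    """Approximate =(1-PERCENTRANK.INC(values, x, 1))*10: fewer days, higher score."""
    return 10 - percentrank_score(values, x)


# Made-up sample data, not taken from the Kaggle dataset.
frequency = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
recency_days = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print([percentrank_score(frequency, f) for f in frequency])
print([recency_score(recency_days, d) for d in recency_days])
```

With these samples, the most frequent buyer scores 10 on Frequency, and the customer who purchased most recently (0 days ago) scores 10 on Recency, which is the inversion the recency formula is meant to achieve.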

5. Create RFM Score

To compute our RFM Score, a column for RFM needs to be created. This score for each customer will be derived by summing the individual scores corresponding to Recency, Frequency, and Monetary Value. This allows us to synthesize a comprehensive RFM Score, consolidating the three key metrics into a singular evaluative measure for each customer.

We then apply the PERCENTRANK.INC formula to this summed value and add the result to the table as the RFM score column.
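A compact Python sketch of this step, under the same assumptions about PERCENTRANK.INC as in the previous section, and assuming the combined score is scaled by 10 in the same way as the individual scores (the text does not say so explicitly). The customer IDs and scores are made up.

```python
def rank_score(values, x):
    """Approximate =PERCENTRANK.INC(values, x, 1)*10 for an x that occurs in `values`."""
    below = sum(1 for v in values if v < x)
    return (below * 10) // (len(values) - 1)


# Made-up (recency, frequency, monetary) scores per customer, each on a 0-10 scale.
scores = {
    "A": (10, 9, 10),
    "B": (2, 1, 0),
    "C": (5, 6, 4),
    "D": (7, 3, 8),
    "E": (0, 0, 1),
}

# Step 1: sum the three scores for each customer.
rfm_sum = {cid: sum(triple) for cid, triple in scores.items()}

# Step 2: rank each sum against all sums to get the final 0-10 RFM score.
all_sums = list(rfm_sum.values())
rfm_score = {cid: rank_score(all_sums, s) for cid, s in rfm_sum.items()}
print(rfm_sum, rfm_score)
```

Re-ranking the sum puts the combined score back on the same 0 to 10 scale as its components, which keeps the segment cut-offs easy to read.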

6. Customer Segmentation

We will create a table showing customer segment and RFM score. The criteria are delineated in the table provided below:

=VLOOKUP(M2,$S$16:$T$17,2,FALSE)

This VLOOKUP formula will be used to assign distinct customer segment categories to the RFM scores obtained.
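An exact-match VLOOKUP (last argument FALSE) behaves like a dictionary lookup: every possible RFM score must appear as a key. The sketch below illustrates that idea in Python. The score bands are HYPOTHETICAL, since the criteria table is not reproduced in this text; only the four segment names come from the article.

```python
# HYPOTHETICAL score bands for illustration only; the article's actual criteria
# table is not available in the text.
HYPOTHETICAL_BANDS = [
    (range(0, 3), "Immediate Attention"),
    (range(3, 6), "At Risk/Need Attention"),
    (range(6, 8), "Loyal Customer"),
    (range(8, 11), "Top Customer"),
]

# Expand the bands into one entry per score, like a lookup table with a row per score.
SEGMENT_BY_SCORE = {score: name for band, name in HYPOTHETICAL_BANDS for score in band}


def segment_for(score):
    """Exact-match lookup; a score missing from the table raises KeyError (Excel shows #N/A)."""
    return SEGMENT_BY_SCORE[score]


print([segment_for(s) for s in (0, 4, 7, 10)])
```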

7. Pivot Table

These tables serve as powerful tools, enabling the extraction of nuanced insights into the distinct traits, behaviors, and preferences exhibited within each segmented group.

7.1 Pivot table for the total number of customers and percentage of customer in each segment

7.2 Pivot Table showing the average Recency, Frequency, and Monetary value for each customer segment

7.3 Pivot Table showing the average age, income, and number of children by customer segment

7.4 Pivot Table showing the Customer Count by Education Group within each Segment.

The Graduate group is the most prevalent across all four customer segments.

7.5 Pivot Table showing the Customer Count by Marital Status Group within each Segment.

7.6 Customer profile table showing, for all four customer segments, the percentage of customers, average age, average income, average number of children, and the most common Education and Marital Status values.
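The pivot tables in this section boil down to grouping customers by segment and then counting, averaging, or taking the most common value. The following Python sketch builds a profile table along those lines; the customer records, field names, and the function name are all made up for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

# Made-up customer records; field names are assumptions for this sketch.
customers = [
    {"segment": "Top Customer", "age": 50, "income": 70000, "children": 0,
     "education": "Graduate", "marital": "Married"},
    {"segment": "Top Customer", "age": 60, "income": 80000, "children": 1,
     "education": "Graduate", "marital": "Single"},
    {"segment": "Top Customer", "age": 55, "income": 75000, "children": 1,
     "education": "Post-graduate", "marital": "Married"},
    {"segment": "Immediate Attention", "age": 40, "income": 30000, "children": 2,
     "education": "Graduate", "marital": "Married"},
]


def profile_by_segment(records):
    """Group records by segment and summarize each group, like a pivot table."""
    groups = defaultdict(list)
    for record in records:
        groups[record["segment"]].append(record)

    total = len(records)
    profile = {}
    for segment, members in groups.items():
        profile[segment] = {
            "count": len(members),
            "share_pct": round(100 * len(members) / total, 1),
            "avg_age": mean(m["age"] for m in members),
            "avg_income": mean(m["income"] for m in members),
            "avg_children": mean(m["children"] for m in members),
            "top_education": Counter(m["education"] for m in members).most_common(1)[0][0],
            "top_marital": Counter(m["marital"] for m in members).most_common(1)[0][0],
        }
    return profile


profile = profile_by_segment(customers)
for segment, row in profile.items():
    print(segment, row)
```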

Insights

1. Upon analyzing our dataset, a significant observation emerged regarding customer demographics: the majority, comprising 1,429 individuals, are married, while 784 customers are single. Moreover, a noteworthy trend in education groups was revealed, with a substantial portion of our customer base categorized as Graduates.

2. At Risk/Need Attention

These customers account for 33% of our customer base. On average, this segment has made 9 purchases, with an average total spend (Monetary value) of approximately $285. With an average age of 54, their income stands at $44,105. Identifying this segment allows us to focus on targeted strategies to re-engage and retain these customers, considering their purchasing behavior and demographic profile.

3. Immediate Attention

Customers needing Immediate Attention represent 21% of our total customer base. This segment has an average total spend (Monetary value) of only $55 and exhibits the lowest scores in the frequency, monetary value, and recency metrics. On further examination, this segment, with an average age of 51, also has the lowest average income of all the segments, at $32,364.

4. Loyal Customer

Accounting for 26% of our customer base, the Loyal Customers segment emerges as a robust cohort, showcasing notable engagement metrics. This segment demonstrates a commendable frequency of 17 purchases, following closely behind our Top Customers. Additionally, their substantial total spend (Monetary value), averaging $960, signifies a consistent investment in our products or services.

5. Top Customer

The Top Customers segment showcases compelling attributes within our dataset. On average, these customers made 21 purchases, illustrating a robust level of engagement. Notably, their average total spend (Monetary value) stands impressively high at $1,303, indicating their propensity for high-value transactions.

Moreover, this segment boasts an exceptional average income of $72,790, the highest among the customer segments analyzed. This noteworthy financial capacity highlights their potential for substantial purchasing power and long-term value to our business.

Recommendations

At Risk/Need Attention

  1. Customer Experience Enhancement: Focus on enhancing customer experience through improved service, seamless transactions, or dedicated customer support. Positive experiences can reignite their engagement and loyalty.
  2. Loyalty Programs: Introduce loyalty programs with incentives tied to repeat purchases. Offer rewards, points, or tier-based benefits to incentivize continued engagement.
  3. Feedback and Surveys: Seek their feedback through surveys or direct communication. Understanding their concerns or preferences can guide tailored improvements and offerings.

Immediate Attention

1. Value-Oriented Offers: Develop cost-effective offerings or bundled deals that provide value without demanding substantial spending, aligning with their lower average spend.

2. Price Sensitivity: Consider pricing strategies that accommodate their income level, ensuring products or services remain accessible without compromising quality.

Loyal Customer

  1. Referral Incentives: Encourage these loyal customers to refer others through referral programs, offering incentives or benefits for successful referrals.
  2. Engagement-based Rewards: Offer rewards tied to continued engagement, such as milestone-based incentives or anniversary offers to celebrate their loyalty tenure.

Top Customer

1. Personalized Recommendations: Use data-driven insights to offer personalized recommendations or bespoke offerings, showcasing an understanding of their preferences.

2. Specialized Customer Service: Provide a dedicated account manager or concierge service, ensuring their unique needs and preferences are catered to promptly.

3. VIP Treatment and Exclusive Services: Offer personalized VIP treatment, exclusive services, or dedicated customer support to acknowledge and appreciate their high-value patronage.

Dashboard

Thanks for reading.

Link to the Excel file
