Maximizing Sales Revenue with RFM-Based Customer Analysis.

Introduction

Samuel Abu
11 min readDec 28, 2023

RFM analysis, a powerful tool in the realm of customer segmentation and behavioral targeting, offers businesses a strategic way to categorize and prioritize their customers. By delving into the dimensions of recency, frequency, and monetary value of transactions, RFM marketing serves as a dynamic compass for marketers and savvy entrepreneurs alike.

It empowers them to unravel the code of customer behavior, enabling a more precise allocation of resources and a laser-focused approach to making every marketing dollar count.

In this project, I will conduct an RFM (recency, frequency, and monetary value) analysis using a sample sales dataset sourced from Kaggle. For reference, you can access the original dataset through the following link.

Project Objective

The project’s main goal is to illustrate how RFM analysis can be a valuable tool for businesses in pinpointing their target audience. With RFM analysis, we can figure out the percentage of customers that make a difference in our business. Armed with this knowledge, we can create more effective marketing campaigns that boost brand loyalty and increase conversions as customers move through their journey with us. In simple terms, it’s about getting the most out of every customer relationship.

About the data

The dataset is acompany’s historic sales data with 23 attributes (columns) providing detailed information about each order :

  1. ORDERNUMBER: A unique number assigned to each order.
  2. QUANTITYORDERED: The quantity of items ordered.
  3. PRICEEACH: The price of each item.
  4. ORDERLINENUMBER: A line number for the order.
  5. SALES: The total sales amount for the order.
  6. ORDERDATE: The date on which the order was placed.
  7. STATUS: The current status of the order (e.g., ‘Shipped’).
  8. QTR_ID: The quarter in which the order was placed.
  9. MONTH_ID: The month in which the order was placed.
  10. YEAR_ID: The year in which the order was placed.
  11. PRODUCTLINE: The line of products ordered.
  12. MSRP: Manufacturer’s Suggested Retail Price.
  13. PRODUCTCODE: A unique code for the product.
  14. CUSTOMERNAME: The name of the customer.
  15. PHONE: The phone number of the customer.
  16. ADDRESSLINE1 and ADDRESSLINE2: The address of the customer.
  17. CITY: The city of the customer’s address.
  18. STATE: The state of the customer’s address.
  19. POSTALCODE: The postal code of the customer’s address.
  20. COUNTRY: The country of the customer’s address.
  21. TERRITORY: The sales territory.
  22. CONTACTLASTNAME and CONTACTFIRSTNAME: The name of the contact person for the customer.
  23. DEALSIZE: The size of the deal (e.g., ‘Small’, ‘Medium’).

Inspecting the Data.

For analyzing this dataset, SQL will be utilized for data querying and manipulation, while Tableau will be employed for visualizing and interpreting the data.

The initial stage involved examining the dataset and verifying the distinct values within specific columns;

--- Inspecting the Data 
Select *
From sales_data_sample

--- Checking Unique values
select distinct status from [dbo].[sales_data_sample]
select distinct year_id from [dbo].[sales_data_sample]
select distinct PRODUCTLINE from [dbo].[sales_data_sample]
select distinct COUNTRY from [dbo].[sales_data_sample]
select distinct DEALSIZE from [dbo].[sales_data_sample]
select distinct TERRITORY from [dbo].[sales_data_sample]

Findings from the query above;

  • There are six unique statuses in the dataset.
  • There are three distinct years represented in the dataset, indicating that the transactions cover the time period from 2003 to 2005.
  • There are seven distinct items in the product line, indicating that the company offers a total of seven different products.
  • Customers in this dataset represent a diverse group hailing from 19 different countries.
  • There are three deal sizes used to categorize transactions.
  • There are 4 terriroties

Analyzing the data.

What is the highest-selling product?

This could be answered by creating a query that utilizes the SUM aggregate function on sales while grouping the data by the product line.

-- Analyzing the data 
-- Grouping Sales by product line
select PRODUCTLINE, sum(sales) as Revenue
from [dbo].[sales_data_sample]
group by PRODUCTLINE
order by 2 desc
Results for the executed query above.

The results indicate that classic cars are the highest-selling product with a revenue of $3,919,615.33.

What year had the most sales?

This information could also be extracted by crafting a query that utilizes the SUM aggregate function on sales and groups the data by the “year_Id” column.

select YEAR_ID, sum(sales) Revenue
from [dbo].[sales_data_sample]
group by YEAR_ID
order by 2 desc
Results for the executed query above.

The results indicate that 2004 was the company's best year from a revenue standpoint.

The revenue for the year 2005 appears significantly lower when compared to other years. To investigate further, I executed a query to determine the number of months represented in the dataset for the year 2005.

Select distinct month_id from sales_data_sample
where YEAR_ID = 2005
Results for the executed query above.

Based on the findings, it appears that the company was operational for only five months during the year 2005, which explains the lower revenue in comparison to the years 2003 and 2004.

What is the largest deal size?

This information could also be obtained by creating a query that utilizes the SUM aggregate function on sales while grouping the data by “deal size.”

select  DEALSIZE,  sum(sales) Revenue
from [PortfolioDB].[dbo].[sales_data_sample]
group by DEALSIZE
order by 2 desc
Results for the executed query above.

The results show that the company’s biggest deal size category is medium.

In a particular year, which month saw the highest sales performance?

This could be done by crafting a query that calculates the count of order numbers, utilizes the SUM aggregate function on sales, and groups the data by “month_id.”

select  MONTH_ID, sum(sales) as Revenue, count(ORDERNUMBER) as Frequency
from [PortfolioDB].[dbo].[sales_data_sample]
where YEAR_ID = 2004
group by MONTH_ID
order by 2 desc
Results for the executed query above.

The highest selling month in the year 2004 is November.

The same steps are repeated for the years 2003.

select  MONTH_ID, sum(sales) as Revenue, count(ORDERNUMBER) as Frequency
from [PortfolioDB].[dbo].[sales_data_sample]
where YEAR_ID = 2003
group by MONTH_ID
order by 2 desc
Results for the executed query above.

The highest selling month in the year 2003 is also November.

We won’t be conducting this analysis for the year 2005 due to the limited 5 months of data available for that year.

The data reveals that November was the highest-selling month for this company in the years 2004 and 2005. Next, we aim to determine which product performed the best in November.

This inquiry can be addressed by creating a query that tallies the order numbers, employs the SUM aggregate function for sales, and groups the data by “month_id” and “product line.” Furthermore, the query filters for the month of November and the year 2004.

select  MONTH_ID, PRODUCTLINE, sum(sales) Revenue, count(ORDERNUMBER)
from [PortfolioDB].[dbo].[sales_data_sample]
where YEAR_ID = 2004 and MONTH_ID = 11
group by MONTH_ID, PRODUCTLINE
order by 3 desc
Results for the executed query above.

The results show that in the year 2004, the best-selling product in November was classic cars.

The same step was repeated for the year 2003.

select  MONTH_ID, PRODUCTLINE, sum(sales) Revenue, count(ORDERNUMBER)
from [PortfolioDB].[dbo].[sales_data_sample]
where YEAR_ID = 2003 and MONTH_ID = 11
group by MONTH_ID, PRODUCTLINE
order by 3 desc
Results for the executed query above.

The results show that in the year 2003, the best-selling product in November was also classic cars.

Who is the company’s best customer?

Determining the company’s top customer can be accomplished by composing a query that computes the total and average sales, counts the order number as a frequency indicator, and identifies the maximum order date. To retrieve the maximum order date, a sub-query was employed.

Calculation of RFM Metrics

To ascertain recency, the Date Difference function was utilized to subtract the customer’s most recent order date from the initial subquery’s result.

Select 
CUSTOMERNAME,
sum(sales) MonetaryValue,
avg(sales) AvgMonetaryValue,
count(ORDERNUMBER) Frequency,
max(ORDERDATE) last_order_date,
(select max(ORDERDATE) from [dbo].[sales_data_sample]) max_order_date,
DATEDIFF(DD, max(ORDERDATE), (select max(ORDERDATE) from [dbo].[sales_data_sample])) Recency
from [PortfolioDB].[dbo].[sales_data_sample]
group by CUSTOMERNAME
Results for the executed query above.

I established a Common Table Expression (CTE) named “RFM” based on the previous table. Subsequently, I employed the SQL tile function to partition the table into four distinct groups or buckets. I designated the measurements 1 through 4 to these tiles and allocated the frequency, recency, and monetary values accordingly to these four designated tiles.

;with rfm as 
(
select
CUSTOMERNAME,
sum(sales) MonetaryValue,
avg(sales) AvgMonetaryValue,
count(ORDERNUMBER) Frequency,
max(ORDERDATE) last_order_date,
(select max(ORDERDATE) from [dbo].[sales_data_sample]) max_order_date,
DATEDIFF(DD, max(ORDERDATE), (select max(ORDERDATE) from [dbo].[sales_data_sample])) Recency
from [PortfolioDB].[dbo].[sales_data_sample]
group by CUSTOMERNAME
)
select r.*,
NTILE(4) OVER (order by Recency desc) rfm_recency,
NTILE(4) OVER (order by Frequency) rfm_frequency,
NTILE(4) OVER (order by MonetaryValue) rfm_monetary
from rfm r

Afterward, I established an additional Common Table Expression (CTE) for the query described above. Within this CTE, I utilized the CAST function to modify the data types of the “rfm_recency,” “rfm_frequency,” and “rfm_monetary” columns.

;with rfm as 
(
select
CUSTOMERNAME,
sum(sales) MonetaryValue,
avg(sales) AvgMonetaryValue,
count(ORDERNUMBER) Frequency,
max(ORDERDATE) last_order_date,
(select max(ORDERDATE) from [dbo].[sales_data_sample]) max_order_date,
DATEDIFF(DD, max(ORDERDATE), (select max(ORDERDATE) from [dbo].[sales_data_sample])) Recency
from [PortfolioDB].[dbo].[sales_data_sample]
group by CUSTOMERNAME
),
rfm_calc as
(
select r.*,
NTILE(4) OVER (order by Recency desc) rfm_recency,
NTILE(4) OVER (order by Frequency) rfm_frequency,
NTILE(4) OVER (order by MonetaryValue) rfm_monetary
from rfm r
)
Select c.*, rfm_recency + rfm_frequency+ rfm_monetary as rfm_cell,
cast(rfm_recency as varchar) + cast(rfm_frequency as varchar) + cast(rfm_monetary as varchar)rfm_cell_string
from rfm_calc c

I then created a temporary table called #rfm for the CTEs in my query.

DROP TABLE IF EXISTS #rfm
;with rfm as
(
select
CUSTOMERNAME,
sum(sales) MonetaryValue,
avg(sales) AvgMonetaryValue,
count(ORDERNUMBER) Frequency,
max(ORDERDATE) last_order_date,
(select max(ORDERDATE) from [dbo].[sales_data_sample]) max_order_date,
DATEDIFF(DD, max(ORDERDATE), (select max(ORDERDATE) from [dbo].[sales_data_sample])) Recency
from [PortfolioDB].[dbo].[sales_data_sample]
group by CUSTOMERNAME
),
rfm_calc as
(
select r.*,
NTILE(4) OVER (order by Recency desc) rfm_recency,
NTILE(4) OVER (order by Frequency) rfm_frequency,
NTILE(4) OVER (order by MonetaryValue) rfm_monetary
from rfm r
)
Select c.*, rfm_recency + rfm_frequency+ rfm_monetary as rfm_cell,
cast(rfm_recency as varchar) + cast(rfm_frequency as varchar) + cast(rfm_monetary as varchar)rfm_cell_string
into #rfm
from rfm_calc c

Creating RFM Segments

I used a case statement to segment the customers based on their respective RFM columns. I used 5 segments:

  • Lost customers
  • Slipping away, cannot lose
  • potential churners
  • active customers
  • loyal customers
Select CUSTOMERNAME, rfm_recency, rfm_frequency, rfm_monetary,
case
when rfm_cell_string in (111, 112 , 121, 122, 123, 132, 211, 212, 114, 141) then 'lost_customers' --lost customers
when rfm_cell_string in (133, 134, 143, 244, 334, 343, 344, 144) then 'slipping away, cannot lose' -- (Big spenders who haven’t purchased lately) slipping away
when rfm_cell_string in (311, 411, 331) then 'new customers'
when rfm_cell_string in (222, 223, 233, 322) then 'potential churners'
when rfm_cell_string in (323, 333,321, 422, 332, 432) then 'active' --(Customers who buy often & recently, but at low price points)
when rfm_cell_string in (433, 434, 443, 444) then 'loyal'
end rfm_segment
From #rfm

Visualizaing the Data

The initial visualization I generated displayed revenue data grouped by country to help provide a clear and concise overview of revenue distribution across different countries.

The next visualization was a circle chart showing the deal size distribution allowing the viewer to quickly grasp the distribution patterns.

Next was a bar chart showing the company's annual revenue to help provide a clear and concise overview of the company’s financial performance over time.

Second Dashboard

The first visualization I created was a pie chart showing the company's revenue by product line to provide a visual breakdown of revenue contribution from different product lines.

Next was a line chart showing the company’s quantity distribution to provide a visual representation of how quantities of products are distributed over time.

The visualizations mentioned earlier were employed to craft the following dashboard called the sales dashboard below.

Second Dashboard

The Second visualization I created was a circle chart showing the company's revenue based on order status to help provide a visual breakdown of revenue attributed to different order statuses, allowing for a quick assessment of how orders are distributed across different states.

Next was a line chart showing the sales by product line and order year to help track the sales performance of different product lines over multiple years.

Next was a line chart showing monthly sales from all three years by country to help identify seasonal sales patterns in different countries. understanding when sales tend to peak or decline.

The visualizations above were used to create the following dashboard below.

A link to both dashboards can be found here.

Recommendations

Based on my findings, my recommendations to the company are as follows:

Identify high-value customers (high recency and frequency scores) for targeted retention campaigns as they are more often to respond to engagement efforts. Develop loyalty programs offering exclusive benefits or discounts to these customers to encourage repeat purchases. Lastly, the company should analyze purchase history to provide personalized recommendations.

Secondly, the company should encourage customers who buy often but spend less to increase their purchase value. This could be done by implementing upselling strategies by recommending higher-end products related to the customer's interests. Another way this could be done is by employing cross-selling by suggesting complementary products that enhance their primary purchases.

Thirdly the company should re-engage customers who have not made recent purchases by creating special offers or incentives to encourage them to revisit and make a purchase as well as sending personalized messages reminding them of the value your company offers.

The fourth recommendation is to enhance customer engagement through personalized communication, leverage RFM insights for tailoring marketing messages to various customer segments, customize communication to align with each customer’s purchase history and preferences, and offer product recommendations that match their interests. This approach should involve personalized email marketing, social media targeting, and direct mail, specifically designed for distinct RFM segments.

To maintain the relevance of the RFM model, it is essential to regularly update the analysis, possibly on a quarterly or annual basis, to reflect the most recent customer interactions and behaviors. Monitoring changes in customer segments over time is key to identifying trends and shifts in behavior, which in turn should inform adjustments in marketing and sales strategies.

Thank you for reading.

--

--