Customer Segmentation and Strategy using RFM Analysis in RStudio
What is RFM Analysis?
RFM (Recency, Frequency, Monetary) analysis is a proven marketing model for behavior-based customer segmentation. It groups customers by their transaction history: how recently, how often, and how much they bought.
RFM divides customers into groups so we can identify the customers most likely to respond to promotions and target them with personalized services in the future.
RECENCY (R): Days since last purchase
FREQUENCY (F): Total number of purchases
MONETARY VALUE (M): Total money this customer spent
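To make the three definitions concrete, here is a minimal base-R sketch on a small hypothetical transaction log (the customers, invoices, dates, and amounts are made up for illustration). It assumes an analysis date of 2012-01-01, measures recency in days from each customer's latest purchase, counts frequency as distinct invoices, and sums spend for the monetary value.

```r
# Hypothetical transaction log (illustrative data only, not from the real dataset)
trx_toy <- data.frame(
  CustomerID  = c("A", "A", "B", "C", "C", "C"),
  InvoiceNo   = c("1001", "1002", "1003", "1004", "1004", "1005"),
  InvoiceDate = as.Date(c("2011-11-01", "2011-12-05", "2011-06-15",
                          "2011-12-01", "2011-12-01", "2011-12-08")),
  GMV         = c(20, 35, 120, 10, 15, 40),
  stringsAsFactors = FALSE
)
analysis_date <- as.Date("2012-01-01")  # assumed reference date

# Recency: days from the customer's latest purchase to the analysis date
last_purchase <- tapply(as.numeric(trx_toy$InvoiceDate), trx_toy$CustomerID, max)
recency <- as.numeric(analysis_date) - last_purchase

# Frequency: number of distinct invoices per customer
# (two lines of the same invoice count as one order)
frequency <- tapply(trx_toy$InvoiceNo, trx_toy$CustomerID,
                    function(x) length(unique(x)))

# Monetary: total spend per customer
monetary <- tapply(trx_toy$GMV, trx_toy$CustomerID, sum)

rfm_toy <- data.frame(
  CustomerID = names(recency),
  recency    = as.numeric(recency),
  frequency  = as.numeric(frequency),
  monetary   = as.numeric(monetary),
  stringsAsFactors = FALSE
)
print(rfm_toy)
```

Customer A, for example, last bought on 2011-12-05, so their recency is 27 days; they placed 2 invoices totalling 55.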
Example of RFM Scores by Segment
Order Recency Score
• Order Recency = 4 (ordered in the last 1 to 7 days)
• Order Recency = 3 (ordered in the last 8 to 14 days)
• Order Recency = 2 (ordered in the last 15 to 30 days)
• Order Recency = 1 (ordered 31 or more days ago)
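The day-based bins above map neatly onto base R's findInterval(). This is a sketch that assumes recency is measured in whole days; the helper name recency_score is hypothetical, and the bins mirror the list above rather than the quartile thresholds used later in the scoring code.

```r
# Hypothetical helper: map whole days since the last order to a 1-4 recency score
# Bins follow the list above: up to 7 days -> 4, 8-14 -> 3, 15-30 -> 2, 31 or more -> 1
recency_score <- function(days) {
  # findInterval() returns how many cut points are <= days (0, 1, 2 or 3)
  4L - findInterval(days, c(8, 15, 31))
}

recency_score(c(1, 7, 8, 14, 15, 30, 31, 90))  # 4 4 3 3 2 2 1 1
```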
Order Frequency Score
Frequency is based on quartiles.
• Order Frequency = 4 (top 25% by number of orders, quartile 4)
• Order Frequency = 3 (quartile 3)
• Order Frequency = 2 (quartile 2)
• Order Frequency = 1 (1 order)
Monetary Score
Monetary is based on quartiles.
• Monetary = 4 (quartile 4)
• Monetary = 3 (quartile 3)
• Monetary = 2 (quartile 2)
• Monetary = 1 (quartile 1)
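Quartile scoring can be sketched with quantile() and findInterval(). This is an illustration, not the exact code used later in the article (which hard-codes the cut points read from summary()); the helper name quartile_score is hypothetical, and values that land exactly on a cut point go to the lower score here. The last example shows a caveat for heavily tied data such as order counts: when two quartiles coincide, one score is never assigned.

```r
# Hypothetical helper: score a metric 1-4 by the quartile it falls into.
# Uses R's default quantile() definition (type 7); other definitions give
# slightly different cut points.
quartile_score <- function(x, higher_is_better = TRUE) {
  cuts <- quantile(x, probs = c(0.25, 0.50, 0.75), names = FALSE)
  # left.open = TRUE: x <= Q1 -> 1, Q1 < x <= Q2 -> 2, Q2 < x <= Q3 -> 3, x > Q3 -> 4
  score <- findInterval(x, cuts, left.open = TRUE) + 1L
  # For recency, fewer days is better, so the scale is reversed
  if (higher_is_better) score else 5L - score
}

monetary <- c(10, 20, 30, 40, 50, 60, 70, 80)
quartile_score(monetary)                            # 1 1 2 2 3 3 4 4
quartile_score(monetary, higher_is_better = FALSE)  # 4 4 3 3 2 2 1 1

# Heavily tied data: Q1 and the median are both 1, so score 2 is never used
frequency <- c(1, 1, 1, 1, 1, 2, 3, 8)
quartile_score(frequency)                           # 1 1 1 1 1 3 4 4
```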
Dataset
The Online Retail dataset can be downloaded from the UCI Machine Learning Repository.
library(readxl)
# Read the downloaded Excel file from the working directory
trx <- read_excel("Online Retail.xlsx")
The dataset covers transactions from 01/12/2010 to 09/12/2011 (DD/MM/YYYY), so I set the analysis date to 01/01/2012.
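Because recency is measured from the analysis date, the gap between the last transaction and 01/01/2012 sets a floor on recency: even a customer who bought on the final day of the data ends up with a recency of 23 days. A quick base-R check, reading the dates above as DD/MM/YYYY:

```r
# Last transaction date stated above, and the chosen analysis date
last_invoice_date <- as.Date("2011-12-09")
analysis_date <- as.Date("2012-01-01")
# Subtracting two Dates gives a difftime measured in days
min_recency <- as.numeric(analysis_date - last_invoice_date)
min_recency  # 23
```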
Data Cleaning
Remove every row with a zero or negative Quantity or UnitPrice, along with every row that contains a missing value.
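library(dplyr)
# Turn non-positive quantities (e.g. returns) and prices into NA
trx <- trx %>%
  mutate(Quantity = replace(Quantity, Quantity <= 0, NA),
         UnitPrice = replace(UnitPrice, UnitPrice <= 0, NA))
# Drop every row that has an NA in any column (including a missing CustomerID)
trx <- trx %>% na.omit()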
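The same cleaning rule can be checked on a few hypothetical order lines (all values below are made up). This base-R sketch uses subset() instead of the replace-then-na.omit() approach; note that it only looks at Quantity, UnitPrice, and CustomerID, whereas na.omit() drops a row if any column is missing.

```r
# Hypothetical order lines (illustrative values only)
lines_toy <- data.frame(
  InvoiceNo  = c("1001", "1002", "1003", "1004", "1005"),
  Quantity   = c(6, -1, 56, 3, 2),
  UnitPrice  = c(2.55, 27.50, 0, 1.25, 0.85),
  CustomerID = c(17850, 14527, NA, 12583, NA),
  stringsAsFactors = FALSE
)
# Keep positive quantities and prices that belong to a known customer
clean_toy <- subset(lines_toy, Quantity > 0 & UnitPrice > 0 & !is.na(CustomerID))
clean_toy$InvoiceNo  # "1001" "1004"
```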
Convert the character variables to factors, convert InvoiceDate to a date, and calculate the GMV (gross merchandise value, Quantity times UnitPrice) of each order line. This gives us each customer's purchase history.
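trx <- trx %>%
  mutate(InvoiceNo = as.factor(InvoiceNo),
         StockCode = as.factor(StockCode),
         # Assumes read_excel() returned InvoiceDate as a date-time (POSIXct),
         # so no format string is needed to drop the time of day
         InvoiceDate = as.Date(InvoiceDate),
         CustomerID = as.factor(CustomerID),
         Country = as.factor(Country))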
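# GMV of each order line: quantity times unit price
trx <- trx %>% mutate(GMV = Quantity * UnitPrice)
# Keep only the columns needed for the customer purchase history
df_customer <- trx %>% select(CustomerID, InvoiceDate, GMV)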
RFM Analysis
Next, aggregate the transactions to get the recency, frequency, and monetary value of each customer.
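# Reference date for measuring recency
analysis_date <- lubridate::as_date("2012-01-01", tz = "UTC")
df_RFM <- trx %>%
  group_by(CustomerID) %>%
  summarise(recency = as.numeric(analysis_date - max(InvoiceDate)),  # days since last purchase
            frequency = n_distinct(InvoiceNo),                        # number of distinct invoices
            monetary = sum(GMV))                                      # total spend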
Check the quartiles
summary(df_RFM)
Calculate the score based on the quartiles
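# Scoring: the thresholds are the quartiles reported by summary(df_RFM)
# R_Score: fewer days since the last purchase earns a higher score
df_RFM$R_Score[df_RFM$recency > 164.8] <- 1
df_RFM$R_Score[df_RFM$recency > 73 & df_RFM$recency <= 164.8] <- 2
df_RFM$R_Score[df_RFM$recency > 40 & df_RFM$recency <= 73] <- 3
df_RFM$R_Score[df_RFM$recency <= 40] <- 4
# F_Score: every customer has at least one invoice, so frequency < 1 never
# matches and no customer receives F_Score = 1 with these thresholds
df_RFM$F_Score[df_RFM$frequency < 1] <- 1
df_RFM$F_Score[df_RFM$frequency >= 1 & df_RFM$frequency < 2] <- 2
df_RFM$F_Score[df_RFM$frequency >= 2 & df_RFM$frequency < 5] <- 3
df_RFM$F_Score[df_RFM$frequency >= 5] <- 4
# M_Score
df_RFM$M_Score[df_RFM$monetary < 307.42] <- 1
df_RFM$M_Score[df_RFM$monetary >= 307.42 & df_RFM$monetary < 674.49] <- 2
df_RFM$M_Score[df_RFM$monetary >= 674.49 & df_RFM$monetary < 1661.74] <- 3
df_RFM$M_Score[df_RFM$monetary >= 1661.74] <- 4
# RFM_Score: the three scores as digits, e.g. R = 4, F = 3, M = 2 gives 432
df_RFM <- df_RFM %>% mutate(RFM_Score = 100 * R_Score + 10 * F_Score + M_Score)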
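Combining the scores as 100 * R + 10 * F + M works because each score is a single digit from 1 to 4, so every combination gets its own three-digit code and the digits can be read back. A small base-R check; decode_rfm is a hypothetical helper, not part of the article's workflow:

```r
# Every combination of the three 1-4 scores
grid <- expand.grid(R_Score = 1:4, F_Score = 1:4, M_Score = 1:4)
grid$RFM_Score <- 100 * grid$R_Score + 10 * grid$F_Score + grid$M_Score
# 4 x 4 x 4 = 64 combinations, each with its own three-digit code
length(unique(grid$RFM_Score))  # 64

# Hypothetical helper: split a combined code back into its three digits
decode_rfm <- function(code) {
  data.frame(R_Score = code %/% 100,
             F_Score = (code %/% 10) %% 10,
             M_Score = code %% 10)
}
decode_rfm(c(444, 211, 134))
```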
Segments
Let’s classify the customers based on the recency, frequency and monetary scores.
# Customer segmentation: start every customer with an empty (NA) label
df_RFM$segmentRFM <- NA_character_
champions <- c(444)
loyal_customers <- c(334, 342, 343, 344, 433, 434, 443)
potential_loyalist <- c(332,333,341,412,413,414,431,432,441,442,421,422,423,424)
recent_customers <- c(411)
promising <- c(311, 312, 313, 331)
needing_attention <- c(212,213,214,231,232,233,241,314,321,322,323,324)
about_to_sleep <- c(211)
at_risk <- c(112,113,114,131,132,133,142,124,123,122,121,224,223,222,221)
cant_lose <- c(134,143,144,234,242,243,244)
hibernating <- c(141)
lost <- c(111)
# Label each customer with the segment whose list contains their RFM_Score
df_RFM$segmentRFM[df_RFM$RFM_Score %in% champions] <- "Champions"
df_RFM$segmentRFM[df_RFM$RFM_Score %in% loyal_customers] <- "Loyal Customers"
df_RFM$segmentRFM[df_RFM$RFM_Score %in% potential_loyalist] <- "Potential Loyalist"
df_RFM$segmentRFM[df_RFM$RFM_Score %in% recent_customers] <- "Recent customers"
df_RFM$segmentRFM[df_RFM$RFM_Score %in% promising] <- "Promising"
df_RFM$segmentRFM[df_RFM$RFM_Score %in% needing_attention] <- "Customer Needing Attention"
df_RFM$segmentRFM[df_RFM$RFM_Score %in% about_to_sleep] <- "About to Sleep"
df_RFM$segmentRFM[df_RFM$RFM_Score %in% at_risk] <- "At Risk"
df_RFM$segmentRFM[df_RFM$RFM_Score %in% cant_lose] <- "Can't Lose Them"
df_RFM$segmentRFM[df_RFM$RFM_Score %in% hibernating] <- "Hibernating"
df_RFM$segmentRFM[df_RFM$RFM_Score %in% lost] <- "Lost"
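The eleven assignments can also be driven by a single lookup table. This base-R sketch is an alternative, not the code behind the results below: it stacks the same score lists into one table, checks that each of the 64 possible scores belongs to exactly one segment, and labels scores with match(). The names segment_codes and assign_segment are hypothetical.

```r
# Same score lists as above, keyed by segment label
segment_codes <- list(
  "Champions"                  = c(444),
  "Loyal Customers"            = c(334, 342, 343, 344, 433, 434, 443),
  "Potential Loyalist"         = c(332, 333, 341, 412, 413, 414, 431, 432,
                                   441, 442, 421, 422, 423, 424),
  "Recent customers"           = c(411),
  "Promising"                  = c(311, 312, 313, 331),
  "Customer Needing Attention" = c(212, 213, 214, 231, 232, 233, 241, 314,
                                   321, 322, 323, 324),
  "About to Sleep"             = c(211),
  "At Risk"                    = c(112, 113, 114, 131, 132, 133, 142, 124,
                                   123, 122, 121, 224, 223, 222, 221),
  "Can't Lose Them"            = c(134, 143, 144, 234, 242, 243, 244),
  "Hibernating"                = c(141),
  "Lost"                       = c(111)
)
# One row per score: values = RFM score, ind = segment label
lookup <- stack(segment_codes)

# Every possible score from 111 to 444 should appear exactly once
all_codes <- with(expand.grid(r = 1:4, f = 1:4, m = 1:4), 100 * r + 10 * f + m)
stopifnot(setequal(lookup$values, all_codes), !anyDuplicated(lookup$values))

# Hypothetical helper: look up the segment label for each score
assign_segment <- function(rfm_score) {
  as.character(lookup$ind)[match(rfm_score, lookup$values)]
}
assign_segment(c(444, 211, 141))  # "Champions" "About to Sleep" "Hibernating"
```

With df_RFM from above, this would be applied as df_RFM$segmentRFM <- assign_segment(df_RFM$RFM_Score).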
Segment Size
df_RFM %>%
  count(segmentRFM) %>%
  arrange(desc(n)) %>%
  rename(Count = n)
Visualization
library(ggplot2)
# Horizontal bar chart of the number of customers in each segment
ggplot(data = df_RFM) +
  aes(x = segmentRFM, fill = segmentRFM) +
  geom_bar() +
  labs(title = "Customer Segmentation",
       x = "Segment",
       y = "Total Customers") +
  coord_flip() +
  theme_minimal()
What should we do next?
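We can now plan an action for each segment.
Our customers fall into 8 of the 11 defined segments. Because no customer receives an F_Score of 1, the three segments that require it (Recent customers, About to Sleep, and Lost) are empty.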
- 494 Champions: Reward them. They can be early adopters of new products and are likely to promote our brand.
- 728 Loyal Customers: Up-sell higher-value products, ask for reviews, and keep them engaged.
- 635 Potential Loyalist: Offer a membership or loyalty program and recommend other products.
- 56 Promising: Build brand awareness and offer free trials.
- 706 Customers Needing Attention: Make limited-time offers, recommend products based on past purchases, and reactivate them.
- 10 At Risk: Send personalized emails to reconnect, offer renewals, and provide helpful resources.
- 270 Can’t Lose Them: Win them back with renewals or newer products, talk to them, and don't lose them to the competition.
- 43 Hibernating: Offer other relevant products and special discounts, and rebuild brand value.