Customer Segmentation and Strategy using RFM Analysis in RStudio

Tri Imam Wicaksono
Apr 7, 2019


What is RFM Analysis?

Source: www.blastam.com

RFM (Recency, Frequency, Monetary) analysis is a proven marketing model for behavior-based customer segmentation. It groups customers by their transaction history: how recently, how often, and how much they bought.

RFM divides customers into categories or clusters, which helps identify the customers most likely to respond to promotions and supports future personalization services.

RECENCY (R): Days since last purchase
FREQUENCY (F): Total number of purchases
MONETARY VALUE (M): Total money this customer spent

Example of RFM Scores by Segment

Order Recency Score

• Order Recency = 4 (ordered in the last 1 to 7 days)
• Order Recency = 3 (ordered in the last 8 to 14 days)
• Order Recency = 2 (ordered in the last 15 to 30 days)
• Order Recency = 1 (ordered 31 or more days ago)
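
The recency bands above can be sketched in base R with cut(). This is only an illustration: the vector `days_since_order` and its values are hypothetical, and anything beyond 30 days is assumed to fall into score 1.

```r
# Hypothetical days since each customer's last order (illustrative values only)
days_since_order <- c(3, 7, 8, 14, 20, 30, 31, 90)

# Right-closed bands: [0,7], (7,14], (14,30], (30,Inf]
# include.lowest = TRUE keeps a value of 0 in the first band instead of returning NA
recency_score <- cut(
  days_since_order,
  breaks = c(0, 7, 14, 30, Inf),
  labels = c(4, 3, 2, 1),
  right = TRUE,
  include.lowest = TRUE
)

# cut() returns a factor, so convert the labels back to numbers
recency_score <- as.numeric(as.character(recency_score))
print(recency_score)
```

Customers who ordered within the last week get a 4, and the score drops by one for each older band.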

Order Frequency Score

Frequency is based on quartiles.
• Order Frequency = 4 (quartile 4, the top 25% by number of orders)
• Order Frequency = 3 (quartile 3)
• Order Frequency = 2 (quartile 2)
• Order Frequency = 1 (1 order)

Monetary Score

Monetary is based on quartiles.
• Monetary = 4 (quartile 4)
• Monetary = 3 (quartile 3)
• Monetary = 2 (quartile 2)
• Monetary = 1 (quartile 1)
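
As a rough sketch of quartile-based scoring, base R's quantile() can supply the break points and cut() can assign the scores. The `monetary` vector below is made up for illustration and is not taken from the dataset.

```r
# Hypothetical total spend per customer (illustrative values only)
monetary <- c(50, 120, 300, 450, 700, 900, 1500, 4000)

# Quartile boundaries at 0%, 25%, 50%, 75% and 100%
breaks <- quantile(monetary, probs = seq(0, 1, by = 0.25))

# labels = FALSE returns integer codes: 1 = lowest quartile, 4 = highest quartile
monetary_score <- cut(monetary, breaks = breaks, labels = FALSE, include.lowest = TRUE)
print(monetary_score)
```

One caveat: cut() stops with an error when two break points are equal, which can happen when many customers share the same value (for example, a single order). In that case the thresholds have to be set by hand.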

Dataset

The dataset can be downloaded from UCI Machine Learning Repository.

library(readxl)
trx <- read_excel("Online Retail.xlsx")

The dataset covers transactions between 01/12/2010 and 09/12/2011 (day/month/year), so I set the analysis date to 01/01/2012.
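
Because the analysis date is later than the last transaction, even the most recent buyer has a recency above zero. A quick check of that offset in base R, assuming the last invoice date is 09/12/2011 (day/month/year) as stated above:

```r
# Dates written in ISO format (year-month-day) to avoid any day/month ambiguity
last_invoice <- as.Date("2011-12-09")
analysis_date <- as.Date("2012-01-01")

# Subtracting two Date objects gives a difference in days
min_recency <- as.numeric(analysis_date - last_invoice)
print(min_recency)  # 23
```

So the smallest recency any customer can have under this choice is 23 days.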

Data Cleaning

Remove all non-positive values in Quantity and UnitPrice. We also need to remove every row that contains an NA value.

library(dplyr)
# Mark non-positive quantities and prices as NA, then drop every row that contains an NA
trx <- trx %>%
  mutate(Quantity = replace(Quantity, Quantity <= 0, NA),
         UnitPrice = replace(UnitPrice, UnitPrice <= 0, NA))
trx <- trx %>% na.omit()
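
To make the effect of this step concrete, here is a small sketch on a made-up data frame. It uses a plain base R filter instead of the replace-then-omit approach above, which should keep the same rows; the `toy` data and its values are hypothetical.

```r
# Hypothetical transactions: one valid row, one negative quantity,
# one zero price, and one missing customer
toy <- data.frame(
  Quantity = c(2, -1, 5, 3),
  UnitPrice = c(1.5, 2.0, 0, 4.0),
  CustomerID = c(101, 102, 103, NA)
)

# Keep rows that have no NA anywhere and strictly positive quantity and price
clean <- toy[complete.cases(toy) & toy$Quantity > 0 & toy$UnitPrice > 0, ]
print(clean)
```

Only the first row survives, because each of the other three fails one of the conditions.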

Convert the categorical variables to factors and calculate the GMV (gross merchandise value, Quantity * UnitPrice) for each row. This gives us the customers' historical purchase dataset.

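trx <- trx %>%
  mutate(InvoiceNo = as.factor(InvoiceNo), StockCode = as.factor(StockCode),
         # read_excel() normally parses InvoiceDate as a date-time, so as.Date() needs no format string
         InvoiceDate = as.Date(InvoiceDate), CustomerID = as.factor(CustomerID),
         Country = as.factor(Country))
# GMV (gross merchandise value) of each line item
trx <- trx %>% mutate(GMV = Quantity * UnitPrice)
df_customer <- trx %>% select(CustomerID, InvoiceDate, GMV)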

RFM Analysis

We have to process the data to get the recency, frequency, and monetary values for each customer.

analysis_date <- lubridate::as_date("2012-01-01")
df_RFM <- trx %>%
  group_by(CustomerID) %>%
  summarise(recency = as.numeric(analysis_date - max(InvoiceDate)),  # days since last purchase
            frequency = n_distinct(InvoiceNo),                       # number of distinct invoices
            monetary = sum(GMV))                                     # total spend
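
To see what this summary produces, here is a small hand-checkable sketch. The toy transactions are made up for illustration, and base R's split() and lapply() stand in for the dplyr pipeline so the example runs without extra packages.

```r
# Hypothetical line items: customer A has two invoices, customer B has one
toy <- data.frame(
  CustomerID = c("A", "A", "A", "B"),
  InvoiceNo = c("1", "1", "2", "3"),
  InvoiceDate = as.Date(c("2011-11-01", "2011-11-01", "2011-12-01", "2011-06-15")),
  GMV = c(10, 5, 20, 100)
)
analysis_date <- as.Date("2012-01-01")

# One summary row per customer, then stack the rows into a single data frame
toy_rfm <- do.call(rbind, lapply(split(toy, toy$CustomerID), function(d) {
  data.frame(
    CustomerID = d$CustomerID[1],
    recency = as.numeric(analysis_date - max(d$InvoiceDate)),  # days since last purchase
    frequency = length(unique(d$InvoiceNo)),                    # distinct invoices
    monetary = sum(d$GMV)                                       # total spend
  )
}))
print(toy_rfm)
```

Customer A ends up with a recency of 31 days, a frequency of 2, and a monetary value of 35. Note that the two line items on invoice 1 count as a single purchase.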

Check the quartiles

summary(df_RFM)
Summary from dataset

Calculate the score based on the quartiles

RFM Scoring
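# Scoring: thresholds are taken from the summary(df_RFM) output
# R_Score: a smaller recency (more recent purchase) earns a higher score
df_RFM$R_Score[df_RFM$recency > 164.8] <- 1
df_RFM$R_Score[df_RFM$recency > 73 & df_RFM$recency <= 164.8] <- 2
df_RFM$R_Score[df_RFM$recency > 40 & df_RFM$recency <= 73] <- 3
df_RFM$R_Score[df_RFM$recency <= 40] <- 4
# F_Score: every customer has at least one invoice, so frequency < 1 never matches
# and no customer receives an F_Score of 1 with these thresholds
df_RFM$F_Score[df_RFM$frequency < 1] <- 1
df_RFM$F_Score[df_RFM$frequency >= 1 & df_RFM$frequency < 2] <- 2
df_RFM$F_Score[df_RFM$frequency >= 2 & df_RFM$frequency < 5] <- 3
df_RFM$F_Score[df_RFM$frequency >= 5] <- 4
# M_Score: the first band uses < so that it no longer overlaps the second at 307.42
df_RFM$M_Score[df_RFM$monetary < 307.42] <- 1
df_RFM$M_Score[df_RFM$monetary >= 307.42 & df_RFM$monetary < 674.49] <- 2
df_RFM$M_Score[df_RFM$monetary >= 674.49 & df_RFM$monetary < 1661.74] <- 3
df_RFM$M_Score[df_RFM$monetary >= 1661.74] <- 4
# RFM_Score: combine the three scores into one three-digit code
df_RFM <- df_RFM %>% mutate(RFM_Score = 100 * R_Score + 10 * F_Score + M_Score)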
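
The combined RFM_Score simply places the three scores side by side as digits. A short sketch with made-up scores shows how the code is built and how the individual scores can be recovered from it:

```r
# Hypothetical scores for a single customer
r <- 4
f <- 3
m <- 2

# Hundreds digit = R, tens digit = F, units digit = M
rfm_score <- 100 * r + 10 * f + m
print(rfm_score)  # 432

# Integer division and modulo pull the digits apart again
r_back <- rfm_score %/% 100
f_back <- (rfm_score %/% 10) %% 10
m_back <- rfm_score %% 10
```

This only works because each score is a single digit between 1 and 4.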

Segments

Let’s classify the customers based on the recency, frequency and monetary scores.

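# Customer Segmentation
# Start from an all-NA character column so the indexed assignments below always have full length
df_RFM$segmentRFM <- NA_character_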
champions <- c(444)
loyal_customers <- c(334, 342, 343, 344, 433, 434, 443)
potential_loyalist <- c(332,333,341,412,413,414,431,432,441,442,421,422,423,424)
recent_customers <- c(411)
promising <- c(311, 312, 313, 331)
needing_attention <- c(212,213,214,231,232,233,241,314,321,322,323,324)
about_to_sleep <- c(211)
at_risk <- c(112,113,114,131,132,133,142,124,123,122,121,224,223,222,221)
cant_lose <- c(134,143,144,234,242,243,244)
hibernating <- c(141)
lost <- c(111)
df_RFM$segmentRFM[which(df_RFM$RFM_Score %in% champions)] <- "Champions"
df_RFM$segmentRFM[which(df_RFM$RFM_Score %in% loyal_customers)] <- "Loyal Customers"
df_RFM$segmentRFM[which(df_RFM$RFM_Score %in% potential_loyalist)] <- "Potential Loyalist"
df_RFM$segmentRFM[which(df_RFM$RFM_Score %in% recent_customers)] <- "Recent customers"
df_RFM$segmentRFM[which(df_RFM$RFM_Score %in% promising)] <- "Promising"
df_RFM$segmentRFM[which(df_RFM$RFM_Score %in% needing_attention)] <- "Customer Needing Attention"
df_RFM$segmentRFM[which(df_RFM$RFM_Score %in% about_to_sleep)] <- "About to Sleep"
df_RFM$segmentRFM[which(df_RFM$RFM_Score %in% at_risk)] <- "At Risk"
df_RFM$segmentRFM[which(df_RFM$RFM_Score %in% cant_lose)] <- "Can't Lose Them"
df_RFM$segmentRFM[which(df_RFM$RFM_Score %in% hibernating)] <- "Hibernating"
df_RFM$segmentRFM[which(df_RFM$RFM_Score %in% lost)] <- "Lost"
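
One way to double-check a mapping like this is to confirm that every possible score combination lands in exactly one segment. The sketch below restates the same eleven lists as a named list in base R; the names `segments`, `segment_lookup`, and `example_scores` are introduced here for illustration only.

```r
# Score combinations for each segment, copied from the vectors defined above
segments <- list(
  "Champions" = c(444),
  "Loyal Customers" = c(334, 342, 343, 344, 433, 434, 443),
  "Potential Loyalist" = c(332, 333, 341, 412, 413, 414, 431, 432, 441, 442, 421, 422, 423, 424),
  "Recent customers" = c(411),
  "Promising" = c(311, 312, 313, 331),
  "Customer Needing Attention" = c(212, 213, 214, 231, 232, 233, 241, 314, 321, 322, 323, 324),
  "About to Sleep" = c(211),
  "At Risk" = c(112, 113, 114, 131, 132, 133, 142, 124, 123, 122, 121, 224, 223, 222, 221),
  "Can't Lose Them" = c(134, 143, 144, 234, 242, 243, 244),
  "Hibernating" = c(141),
  "Lost" = c(111)
)

# Every possible code when R, F and M each range over 1..4
grid <- expand.grid(r = 1:4, f = 1:4, m = 1:4)
all_codes <- 100 * grid$r + 10 * grid$f + grid$m

# Flatten the list into one vector of codes and a parallel vector of segment names
assigned_codes <- unlist(segments, use.names = FALSE)
assigned_names <- rep(names(segments), lengths(segments))

# Each code should belong to exactly one segment, and no code should be left out
stopifnot(anyDuplicated(assigned_codes) == 0)
stopifnot(setequal(assigned_codes, all_codes))

# Named lookup vector: code (as text) -> segment name
segment_lookup <- setNames(assigned_names, assigned_codes)
example_scores <- c(444, 111, 324)
example_segments <- unname(segment_lookup[as.character(example_scores)])
print(example_segments)
```

The same lookup vector could also label the whole table in one step, for example with `unname(segment_lookup[as.character(df_RFM$RFM_Score)])`, instead of eleven separate assignments.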

Segment Size

df_RFM %>%
  count(segmentRFM) %>%
  arrange(desc(n)) %>%
  rename(Count = n)
Segment Size

Visualization

library(ggplot2)
ggplot(data = df_RFM) +
  aes(x = segmentRFM, fill = segmentRFM) +
  geom_bar() +
  labs(title = "Customer Segmentation",
       x = "Segment",
       y = "Total Customer") +
  coord_flip() +
  theme_minimal()
Segment

What should we do next?

Our dataset contains 8 segments, and we can take a specific action for each one.

  • 494 Champions: Reward them. Can be early adopters for new products. Will promote our brand.
  • 728 Loyal Customers: Up-sell higher value products. Ask for reviews. Engage them.
  • 635 Potential Loyalist: Offer membership / loyalty program, recommend other products.
  • 56 Promising: Create brand awareness. Offer free trials.
  • 706 Customers Needing Attention: Make limited-time offers. Recommend based on past purchases. Reactivate them.
  • 10 At Risk: Send personalized emails to reconnect, offer renewals, provide helpful resources.
  • 270 Can’t Lose Them: Win them back via renewals or newer products. Don’t lose them to the competition. Talk to them.
  • 43 Hibernating: Offer other relevant products and special discounts. Recreate brand value.
