CRM Analytics: Customer Segmentation with RFM

zeynep beyza ayman
Oct 26, 2022

CRM analytics covers tasks such as analyzing customer data, getting to know customers better, segmenting customers, and making decisions based on these segments. The aim is to understand customer behaviour from historical data and to set a strategy based on that behaviour.

As a first step into CRM analytics, in this article I will explain RFM analysis with a Python project.

Description of Dataset:

The Online Retail II dataset contains all the transactions occurring for a UK-based, registered, non-store online retailer between 01/12/2009 and 09/12/2011. The company mainly sells unique all-occasion giftware, and many of its customers are wholesalers. You can access more information about the dataset here.

Attribute Information:

  • Invoice: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter ‘C’, it indicates a cancellation.
  • StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.
  • Description: Product (item) name. Nominal.
  • Quantity: The quantities of each product (item) per transaction. Numeric.
  • InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated.
  • UnitPrice: Unit price. Numeric. Product price per unit in sterling (£).
  • CustomerID: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.
  • Country: Country name. Nominal. The name of the country where a customer resides.
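Before computing RFM metrics, the transactions usually need some cleaning. Below is a minimal pandas sketch, assuming the column names from the attribute list above (the downloaded file may label some of them differently, e.g. "Price" or "Customer ID", in which case rename them first). The function name `clean_transactions`, the toy rows, and the decision to drop non-positive quantities and prices are illustrative assumptions, not steps confirmed by this article.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only completed purchases that can be attributed to a customer."""
    # Rows without a CustomerID cannot be assigned to any customer.
    df = df.dropna(subset=["CustomerID"])
    # Invoices whose code starts with "C" are cancellations.
    df = df[~df["Invoice"].astype(str).str.startswith("C")]
    # Assumption: non-positive quantities and prices are treated as invalid.
    df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)].copy()
    # Revenue of each line item; summed per customer later for Monetary.
    df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]
    return df


# Hypothetical toy transactions (not taken from the real dataset).
raw = pd.DataFrame({
    "Invoice": ["100001", "100001", "C100002", "100003"],
    "CustomerID": [12345.0, 12345.0, 12346.0, None],
    "Quantity": [12, 48, -12, 6],
    "UnitPrice": [2.5, 1.0, 4.0, 3.0],
})

clean = clean_transactions(raw)
print(clean[["Invoice", "CustomerID", "TotalPrice"]])
```

Only the two line items of invoice 100001 survive: the cancellation and the row without a customer are dropped.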

RFM (Recency, Frequency, Monetary):

RFM analysis is a technique used for customer segmentation. It divides customers into groups based on their purchasing habits so that strategies specific to each group can be developed, which enables data-driven actions on many CRM topics.

  • Recency: the freshness of the customer, i.e. how recently they purchased. It is calculated by subtracting the customer's last interaction (purchase) date from the analysis date. If the dataset is historical, the analysis date can be set close to the end of the data collection period. The more recent the purchase, the more responsive the customer is likely to be to promotions.
  • Frequency: the total number of purchases (transactions) made by the customer. The more frequently a customer buys, the more engaged and satisfied they are likely to be.
  • Monetary: the total amount spent by the customer across their purchases. Monetary value differentiates heavy spenders from low-value purchasers.

[Image: calculation of RFM metrics]
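The original calculation was shown as an image. As a hedged sketch, the three metrics can be computed per customer with a pandas groupby, assuming cleaned transactions that already have a TotalPrice (Quantity * UnitPrice) column. The helper `rfm_metrics`, the toy data, and the analysis date of 11/12/2011 (two days after the last transaction in the dataset) are assumptions for illustration.

```python
import pandas as pd


def rfm_metrics(df: pd.DataFrame, analysis_date: pd.Timestamp) -> pd.DataFrame:
    """Compute Recency, Frequency and Monetary per customer from cleaned transactions."""
    return df.groupby("CustomerID").agg(
        # Days between the analysis date and the customer's last purchase.
        recency=("InvoiceDate", lambda dates: (analysis_date - dates.max()).days),
        # Number of distinct invoices (transactions), not line items.
        frequency=("Invoice", "nunique"),
        # Total spending across all of the customer's line items.
        monetary=("TotalPrice", "sum"),
    )


# Hypothetical cleaned transactions, including the TotalPrice column.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "Invoice": ["1001", "1001", "1002", "1003"],
    "InvoiceDate": pd.to_datetime(["2011-12-01", "2011-12-01", "2011-12-05", "2011-11-01"]),
    "TotalPrice": [30.0, 20.0, 50.0, 10.0],
})

# Assumption: analysis date set two days after the last date in the data (09/12/2011).
rfm = rfm_metrics(tx, pd.Timestamp("2011-12-11"))
print(rfm)
```

Counting distinct invoices rather than rows matters here: customer 1 has three line items but only two purchases.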

Taken individually, the best customer has the lowest Recency value and the highest Frequency and Monetary values. However, the three metrics are measured on different scales (days, number of purchases, and currency), so they cannot be compared with one another directly. Therefore, when performing RFM analysis, it is useful to score each metric relative to the rest of the dataset, bringing the values onto a common, standard scale. In this scoring, each of the Recency, Frequency and Monetary values is assigned a new value between 1 and 5, where 5 is best (so the most recent customers get a Recency score of 5). Combining the Recency and Frequency scores gives the RF score (for example, R = 5 and F = 4 give "54"); appending the Monetary score gives the RFM score.

[Image: calculation of RFM score]
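A minimal sketch of the 1 to 5 scoring, assuming quintile-based scoring with `pandas.qcut` (a common choice; the article's exact code was in an image). The helper name `rfm_scores`, the column names, and the toy values are hypothetical.

```python
import pandas as pd


def rfm_scores(rfm: pd.DataFrame) -> pd.DataFrame:
    """Score each metric from 1 to 5 by quintiles and build the RF / RFM scores."""
    out = rfm.copy()
    # A lower recency is better, so the lowest quintile gets score 5.
    out["recency_score"] = pd.qcut(out["recency"], 5, labels=[5, 4, 3, 2, 1])
    # Frequency usually has many ties (e.g. one-time buyers); ranking with
    # method="first" breaks ties so qcut can form five distinct bins.
    out["frequency_score"] = pd.qcut(
        out["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]
    )
    out["monetary_score"] = pd.qcut(out["monetary"], 5, labels=[1, 2, 3, 4, 5])
    # RF score: recency digit followed by frequency digit, e.g. "54".
    out["RF_SCORE"] = out["recency_score"].astype(str) + out["frequency_score"].astype(str)
    # RFM score appends the monetary digit.
    out["RFM_SCORE"] = out["RF_SCORE"] + out["monetary_score"].astype(str)
    return out


# Hypothetical RFM metrics for five customers.
rfm = pd.DataFrame(
    {
        "recency": [2, 10, 30, 90, 300],
        "frequency": [12, 8, 1, 3, 1],
        "monetary": [900.0, 400.0, 50.0, 120.0, 20.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="CustomerID"),
)

scores = rfm_scores(rfm)
print(scores[["RF_SCORE", "RFM_SCORE"]])
```

Ranking frequency before binning is a workaround for repeated values: on real data, `qcut` raises an error when several quantile edges coincide, which is common for frequency.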

Customer Segments:

In the next step of the RFM analysis, we define customer segments according to their RF scores, using regular expressions (regex) to match score patterns.

[Image: segmentation according to RF scores]

[Image: segments]

  • hibernating: sleeping customers who have shopped rarely and have not shopped for a long time.
  • at_risk: customers who shop relatively often but have not shopped for a long time.
  • cant_loose: customers who haven't shopped for a long time but used to shop very often and shouldn't be lost.
  • about_to_sleep: customers who have not shopped frequently and whose last purchase was some time ago; they are moving towards sleep.
  • need_attention: customers in the middle of the RF grid; without attention, they move towards the risky segments.
  • loyal_customers: customers who shop frequently and made their last purchase recently.
  • promising: customers who do not shop frequently but have purchased recently; one level above new customers.
  • new_customer: customers who have shopped only rarely and made their first purchases recently.
  • potential_loyalists: customers who shop occasionally and made their last purchase not long ago.
  • champions: customers who shop frequently and made their last purchase very recently.
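The regex mapping from the article was also shown as an image. The sketch below uses a commonly seen mapping, assumed here, in which the first digit of the RF score is the Recency score and the second is the Frequency score; the patterns were chosen to roughly follow the segment descriptions above and to cover all 25 combinations. `SEG_MAP` and `segment` are hypothetical names.

```python
import re

import pandas as pd

# Assumed mapping: RF score pattern (recency digit, frequency digit) -> segment.
SEG_MAP = {
    r"[1-2][1-2]": "hibernating",
    r"[1-2][3-4]": "at_risk",
    r"[1-2]5": "cant_loose",
    r"3[1-2]": "about_to_sleep",
    r"33": "need_attention",
    r"[3-4][4-5]": "loyal_customers",
    r"41": "promising",
    r"51": "new_customer",
    r"[4-5][2-3]": "potential_loyalists",
    r"5[4-5]": "champions",
}


def segment(rf_score: str) -> str:
    """Return the first segment whose pattern matches the whole RF score."""
    for pattern, name in SEG_MAP.items():
        if re.fullmatch(pattern, rf_score):
            return name
    raise ValueError(f"no segment for RF score {rf_score!r}")


rf = pd.Series(["55", "44", "31", "23", "12"], name="RF_SCORE")
labels = rf.map(segment).tolist()
print(labels)  # ['champions', 'loyal_customers', 'about_to_sleep', 'at_risk', 'hibernating']
```

Using `re.fullmatch` ties each pattern to the whole two-character score, so a score can only be assigned by a pattern that describes both of its digits.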

[Image: statistics of segments]

[Image: percentage of segments]
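Tables like the segment statistics and percentages can be produced with a groupby and `value_counts`. This is a hedged sketch on made-up data (the numbers are not from the dataset), assuming a DataFrame that already has `segment`, `recency`, `frequency` and `monetary` columns.

```python
import pandas as pd

# Hypothetical customers that already carry a segment label.
rfm = pd.DataFrame({
    "segment": ["champions", "champions", "hibernating", "hibernating", "hibernating", "at_risk"],
    "recency": [3, 5, 200, 250, 310, 120],
    "frequency": [10, 14, 1, 2, 1, 6],
    "monetary": [1500.0, 2500.0, 80.0, 120.0, 40.0, 900.0],
})

# Mean and number of customers for each RFM metric, per segment.
stats = rfm.groupby("segment")[["recency", "frequency", "monetary"]].agg(["mean", "count"])

# Share of customers in each segment, as a percentage.
share = rfm["segment"].value_counts(normalize=True).mul(100).round(1)

print(stats)
print(share)
```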

When the statistics and percentages of the segments are examined together, we see that the champions, cant_loose and loyal_customers segments have the highest monetary values (spending), even though monetary values were not used when creating the segments.

Customers in the champions and loyal_customers segments shop both frequently and recently. However, if they stop shopping, they will move towards cant_loose. Before that happens, initiatives can be carried out to make customers in these segments feel special and to increase their loyalty to the company.

In addition, the percentages and statistics of the segments show that most customers are in the hibernating segment, and that the cant_loose and at_risk segments have not made a purchase for a long time despite having shopped frequently. Acquiring new customers is more costly than retaining existing ones. Therefore, it is necessary to carry out initiatives that get customers in these three segments shopping again.

We have reached the end! Got questions, got stuck, or just want to say hi? Kindly use the comment box. 🦖
