Customer Segmentation with RFM Analysis & Kmeans Clustering

Anugrah Nurhamid
Published in Analytics Vidhya
9 min read · Jul 20, 2020
spotonwifi.com

Hello everyone! I want to share some notes on customer segmentation. I’ll compare RFM analysis and K-means clustering as two ways to assign each customer to a segment. Before we go to the analysis, let’s see why we need customer segmentation at all.

Customer segmentation is very important for decision-making: it tells us what action is needed to increase revenue, build good relationships with customers, and optimize sales. In my humble opinion, customer segmentation is important for three reasons:

  • Making Good Strategy

Customer segmentation gives us a reference for the action to take with each customer in a segment, such as product differentiation, a focused campaign for each segment, or any other strategy we have.

  • Optimize Resource

Customer segmentation lets a company work according to a priority scale. It lets us reach everyone from the “star” customers with big purchases down to the “rare” customers with low purchases, so the company can focus its energy, costs, and attention on a particular segment.

  • Key Factor

A company must be able to segment its customers from perspectives, and in ways, that differ from those of its competitors.

This Online Retail data set contains all the transactions of a UK-based, registered, non-store online retailer between 01/12/2009 and 09/12/2011. The company mainly sells unique all-occasion giftware. Many of its customers are wholesalers. The data is incomplete in many ways: missing values, transactions labeled “manual”, and other gaps. I cleaned the dataset and dropped all canceled transactions.

Exploratory Data Analysis

Before we go to the analysis, let’s explore this dataset. I found many outliers, especially in the total purchase for each transaction.

Note: for this exploratory data analysis, the RFM score had already been calculated. RFM analysis is explained in a later section.

Total purchase for each transaction (grouped by invoice number)

Looking at the total for each invoice number, we can conclude that the largest transactions contain a single item bought in a very high quantity. The highest spend at this retailer comes from customer ID 16446, who bought 3 times.

Count plot of total transactions (grouped by invoice number) for each month
Count plot of total customer purchases (grouped by invoice number) for each month

From the two graphs above we can conclude that November has the most transactions, and the pattern shows sales at our retailer rising toward the end of the year, from September through November. Over that period both the number of invoices and the total purchase per transaction increase significantly (around 25%) each month (Sep to Nov).
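The monthly figures behind plots like these can be computed with two group-bys. This is only a minimal sketch on toy data: the column names (`Invoice`, `InvoiceDate`, `Quantity`, `Price`) and the values are assumptions for illustration, not the actual schema or numbers of the dataset.

```python
import pandas as pd

# Toy invoice lines; column names and values are assumptions for illustration.
lines = pd.DataFrame({
    "Invoice": ["A1", "A1", "A2", "B1", "B2", "B2"],
    "InvoiceDate": pd.to_datetime([
        "2011-09-03", "2011-09-03", "2011-10-11",
        "2011-11-02", "2011-11-25", "2011-11-25",
    ]),
    "Quantity": [2, 1, 5, 10, 1, 3],
    "Price": [2.5, 10.0, 1.0, 4.0, 20.0, 5.0],
})
lines["TotalPrice"] = lines["Quantity"] * lines["Price"]

# Step 1: one row per invoice (total purchase for each transaction).
invoices = lines.groupby("Invoice").agg(
    InvoiceDate=("InvoiceDate", "first"),
    Total=("TotalPrice", "sum"),
)

# Step 2: number of invoices and total purchase for each month.
monthly = invoices.groupby(invoices["InvoiceDate"].dt.month).agg(
    Invoices=("Total", "count"),
    Revenue=("Total", "sum"),
)
print(monthly)
```

Grouping by invoice first matters: counting raw rows would count items, not transactions.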

Checking the distribution and outliers of Recency, Frequency, and Monetary

There are many outliers in the per-customer values (Recency, Frequency, and Monetary). I examined each outlier and kept them all for modeling, because the values make sense for transactions at a retailer. Because there are so many outliers, we use a robust scaler to scale this dataset. Before we jump to modeling, let’s try RFM analysis segmentation for our retailer and see how it works.

What is RFM Analysis?

RFM stands for Recency, Frequency, and Monetary value, each corresponding to some key customer trait. These RFM metrics are important indicators of a customer’s behavior because frequency and monetary value affect a customer’s lifetime value, and recency affects retention, a measure of engagement.

clevertap.com
  • Recency: When was the last time they purchased?
  • Frequency: How often and for how long have they purchased?
  • Monetary Value/Sales: How much have they purchased?
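The three questions above map onto one group-by per customer. Here is a minimal sketch, assuming a transaction table with hypothetical columns `CustomerID`, `Invoice`, `InvoiceDate`, and `TotalPrice`, and assuming recency is measured in days from a snapshot date one day after the last transaction (both are my assumptions, not necessarily what the original notebook does).

```python
import pandas as pd

# Toy transactions; the column names are assumptions, not the dataset's schema.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2, 3],
    "Invoice": ["A1", "A2", "B1", "B2", "B3", "C1"],
    "InvoiceDate": pd.to_datetime([
        "2011-01-05", "2011-11-20", "2011-06-01",
        "2011-09-15", "2011-12-01", "2010-12-10",
    ]),
    "TotalPrice": [50.0, 20.0, 10.0, 15.0, 30.0, 500.0],
})

# Assumed snapshot date: one day after the last transaction in the data.
snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("CustomerID").agg(
    # Recency: days since the customer's last purchase.
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
    # Frequency: number of distinct invoices.
    Frequency=("Invoice", "nunique"),
    # Monetary: total amount spent.
    Monetary=("TotalPrice", "sum"),
)
print(rfm)
```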

After calculating the RFM values, we group them into ranks. I rank customers from 1 to 4 on each RFM value. Before scoring, we sort the values (of recency, for example) and then assign the score: the first 25% of the data gets rank 1, the next 25% gets rank 2, and so on. I use pandas qcut to assign the ranks.
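A quartile ranking like this can be sketched with `pd.qcut` as follows. The data is made up, and the direction of the labels is my assumption: I give rank 4 to the most recent, most frequent, and highest-spending quarter so that a higher rank is always better.

```python
import pandas as pd

# Toy RFM table for eight customers (values are made up).
rfm = pd.DataFrame(
    {
        "Recency": [2, 10, 35, 60, 120, 200, 300, 365],
        "Frequency": [30, 12, 9, 7, 5, 3, 2, 1],
        "Monetary": [5000.0, 2400.0, 900.0, 700.0, 450.0, 200.0, 90.0, 40.0],
    },
    index=pd.Index(range(1, 9), name="CustomerID"),
)

# Recency: fewer days since the last purchase is better, so labels run 4..1.
rfm["R"] = pd.qcut(rfm["Recency"], q=4, labels=[4, 3, 2, 1]).astype(int)

# Frequency and Monetary: higher is better, so labels run 1..4.
# Ranking first avoids duplicate bin edges when many customers share a value.
rfm["F"] = pd.qcut(
    rfm["Frequency"].rank(method="first"), q=4, labels=[1, 2, 3, 4]
).astype(int)
rfm["M"] = pd.qcut(
    rfm["Monetary"].rank(method="first"), q=4, labels=[1, 2, 3, 4]
).astype(int)

print(rfm[["R", "F", "M"]])
```

On real data, frequency is often 1 for a large share of customers, which is why the sketch ranks before cutting; without that step `pd.qcut` can fail on duplicate bin edges.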

The resulting data frame used to build customer segmentation with RFM analysis

I drew on some sources for the customer segmentation. In this case the customers are divided into seven groups, and each group is characterized by the RFM_Score in the resulting data frame above. Calculating the RFM score is very simple: we sum the ranks of the three RFM metrics. The highest score is 12 and the lowest is 3; from that score we can determine the segment and the characteristics of each customer.

Distribution of each RFM feature in a 3D visualization.
Summary table of the seven customer segments

As seen above, we have seven segments, each with its own customer characteristics. I’ll explain each segment, from the highest level to the lowest.

  • Champions

These customers bought most recently, which is good for us, because they visit our online retail often and have the highest total purchase. We can retain them with many strategies, for example reward points for each transaction and a priority level after that. These are your best customers, and we must keep them!

  • Can’t Lose Them

These customers used to visit and purchase quite often but haven’t visited recently. Bring them back with relevant promotions and with daily activities that earn points or rewards for their transactions.

  • Loyal

These customers have visited recently and more often than “potential” customers, but their purchases are smaller than those of “potential” customers. We can give them more product promotions and offer product bundles.

  • Potential

These customers have visited recently but less often than “loyal” customers, though their purchases are larger than those of “loyal” customers. We can reach them with more product promotions, membership, and product recommendations.

  • Promising

These customers visit our online retail more often than “needs attention” customers, and their purchases are good enough. They are called promising because we hope they will become loyal or potential customers. We can reach them with recommendations based on products they have purchased before and with information about the benefits of the membership program.

  • Needs Attention

These customers visit and purchase below the average of the other levels. They need a direct approach from our team. We could start by sending a product catalog, daily products, recent purchases by other customers, products in high demand at our retail, and so on.

  • Require Activation

These customers could be called “rare”, because they have the lowest score on every RFM metric. They may be new customers who never came back to our online retail, so this segment is very hard to reach. Their last visit was almost 1 year ago. We could reach them with an introduction to our core business, then follow up by email with this month’s promotions and recommendations of products in high demand.
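Summing the three ranks and mapping the sum to one of the seven levels above can be sketched like this. The score bands in `bins` are hypothetical: the article does not list the exact cut-offs, so treat them as placeholders to be replaced with your own.

```python
import pandas as pd

# Toy rank table (R, F, M each between 1 and 4).
scores = pd.DataFrame({
    "R": [4, 3, 3, 2, 1],
    "F": [4, 4, 3, 1, 1],
    "M": [4, 3, 2, 2, 1],
})

# RFM_Score is the sum of the three ranks, so it ranges from 3 to 12.
scores["RFM_Score"] = scores[["R", "F", "M"]].sum(axis=1)

# Hypothetical score bands (right-inclusive): 3, 4, 5-6, 7-8, 9, 10-11, 12.
bins = [2, 3, 4, 6, 8, 9, 11, 12]
labels = [
    "Require Activation", "Needs Attention", "Promising", "Potential",
    "Loyal", "Can't Lose Them", "Champions",
]
scores["Segment"] = pd.cut(scores["RFM_Score"], bins=bins, labels=labels)

print(scores)
```

A plain sum treats recency, frequency, and monetary value as equally important; if one matters more for your business, a weighted sum is a natural variation.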

Customer Segmentation with Kmeans Clustering

Here we compare RFM analysis with K-means clustering and ask how many clusters the K-means model should have. First, the dataset benefits from scaling and centering; a robust scaler is the right choice here because the data has so many outliers. After that, we look for the best n_cluster, which determines how many segments to build, using the silhouette score for each n_cluster. Finally, we visualize the K-means segmentation.
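To see why a robust scaler suits data with outliers, here is a minimal sketch using scikit-learn’s `RobustScaler` on made-up values. It centers each column on its median and divides by the interquartile range, so one extreme customer does not squash everyone else toward zero.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Two made-up features; the last row is an extreme outlier.
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
    [100.0, 5000.0],
])

# Each column becomes (x - median) / IQR, where IQR = 75th - 25th percentile.
scaled = RobustScaler().fit_transform(X)
print(scaled)
```

The four ordinary rows end up between -1 and 0.5 in both columns, while the outlier stays clearly visible as a large value instead of distorting the scale.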

Elbow method showing what ‘K’ should be set in our model.

Basically, the elbow method picks the ‘K’ at which the graph bends like an elbow, after which the inertia decreases only slowly. Besides that, we can choose the best ‘K’ with the silhouette score: the highest silhouette score indicates well-separated groups. (Note: the highest silhouette score in this case is at K = 4, with an average score of 0.60.)
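Both checks can be run in one loop. The sketch below uses synthetic data with four well-separated groups (my own toy data, not the retail dataset), so it only illustrates the procedure; the scores it prints are not the article’s results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic 3-feature data drawn around four centers (illustration only).
rng = np.random.default_rng(0)
centers = np.array(
    [[0, 0, 0], [10, 10, 10], [0, 10, 0], [10, 0, 10]], dtype=float
)
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])

inertias, silhouettes = {}, {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_                         # for the elbow plot
    silhouettes[k] = silhouette_score(X, model.labels_)  # in [-1, 1]

# Pick the K with the highest average silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print("best K by silhouette:", best_k)
```

Plotting `inertias` against K gives the elbow curve. On real RFM data, pass the scaled features as `X` instead of the synthetic array.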

Distribution of each feature in a 3D visualization for K-means clustering (n_cluster is 4).
Pairplot showing the distribution and correlation of each feature (Recency, Frequency, and Monetary)
Summary table of the four customer segments
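A summary table of this kind can be built by grouping the customers by their cluster label. This is a sketch on made-up values; the `Cluster` column stands in for the labels returned by the fitted K-means model.

```python
import pandas as pd

# Made-up RFM values with a cluster label per customer.
rfm = pd.DataFrame({
    "Recency": [10, 20, 300, 340, 360, 5],
    "Frequency": [8, 6, 1, 1, 2, 40],
    "Monetary": [900.0, 700.0, 50.0, 30.0, 40.0, 20000.0],
    "Cluster": [0, 0, 1, 1, 1, 2],
})

# Average RFM values and customer count for each cluster.
summary = rfm.groupby("Cluster").agg(
    Recency=("Recency", "mean"),
    Frequency=("Frequency", "mean"),
    Monetary=("Monetary", "mean"),
    Customers=("Recency", "count"),
)
# Share of all customers that falls in each cluster.
summary["Share"] = summary["Customers"] / summary["Customers"].sum()
print(summary)
```

Compute the averages on the original, unscaled values so that the table reads in days, orders, and currency.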

From the result we have four segments for our online retail customers. Almost 75% of customers fall in the segment called “Disengaged”. I treat this level as having high potential value because most of our customers are in it. We can reach these customers with many strategies to turn them into “loyal” customers, or even “star” customers.

The segment definitions come from some sources, which I modified to fit our case.
  • Star

These are like superior customers: they are not the most recent visitors, but they have the highest total purchase across their transactions. We must keep these customers with many actions. Rewards are a must for this segment, along with free membership with generous monthly benefits and any other action that helps us hold on to them!

  • Loyal

These customers visit our online retail often and purchase well, but we have only four of them. We hope to keep these customers and bring the “Disengaged” customers up to this level. Do not forget the customers in this segment.

  • Disengaged

I group these customers as Disengaged, but they have high potential. We can reach them with promotions, content about their daily needs, and more direct approaches through social media, email, etc. Moving this segment to the “next level” is a must for higher revenue, since our customers are concentrated here (75% of customers are at this level).

  • Require Activation

This segment has the highest recency, meaning the average last visit of these customers was almost 1 year ago. As explained above, these customers should be moved to the “next level”, although this segment is very hard to reach. Still, we can try an introduction to our core business, then follow up by email with this month’s promotions and recommendations of products in high demand.

Conclusion

This was a comparison of customer segmentation with two analyses. Which is best? The best choice is to use both approaches. We can pick the analysis based on the objectives of our company. The characteristics of each transaction and customer are never the same, so we need strong domain knowledge of our business; with that we can easily understand and set our objectives.

That is all from me; I hope you can take some insight from this dataset. There are still many mistakes and shortcomings in every analysis I do. This analysis is not perfect at all, and I’m not good at marketing either. Building great campaigns and optimizing the marketing process would give you even better results. Maybe this analysis can be one of your references.

For more detail about this data, the code, and more visualizations, you can visit my GitHub at https://github.com/Anugrahn. Feel free to ask, and let’s start a discussion, guys!

Thank you, I hope you enjoyed it, guys. See you in the next story. Have a nice day! :)
