Customer Segmentation with Python
Customer segmentation can help make your business strategy more effective, but what is it and how do you do it?
Customer segmentation is the process by which you divide your customers into segments up based on common characteristics. One of them is RFM Analysis. In this exercise, we will use data from e-commerce data.
Now let’s get started to implement this method on Python. First, we import all required library.
Next step, we load the data set and see the top 10 rows of data. We used alias retail_df as the retail data frame.
Data E-Commerce consists of Invoice number, Stock Code, Description, Quantity, Invoice date, Unit Price, Customer ID, and Country. The following script is to show the data type for each column and the null values for each column.
As customer clusters may vary by geography, I’ll restrict the data to only United Kingdom customers, which contains most of our customers' historical data. We will make a new data frame by filtering the “United Kingdom” as a Country. After filtering we found the row data are reduced from 514909 to 495478.
In e-commerce, we know that the company has some business processes for canceled or returned orders. We assume that the Quantity of canceled orders is represented by a negative value.
Next step, we will remove all canceled orders from the data frame. After filtering we found the row data are reduced from 495478 to 486286.
After get some clear data that we required, we need to check whether any null values in any column
Since Customer ID is a mandatory column for customer segmentation, We need to cleanse the null values on Customer ID. Besides that, we can see the invoice date are still in DateTime format. We also need to add a new column for extracting the date only.
Next, we need to define how long the period of this observation is. If we decided to limit data to 1 year only. We need to see the oldest and newest Date on data frame
The final data to be analyzed with RFM are 16017 Transactions (1 Transaction number can contain more than 1 Product), 3611 products, and 3863 Customers.
RFM (Recency, Frequency, Monetary) analysis is a customer segmentation technique that uses past purchase behavior to divide customers into groups. RFM helps divide customers into various categories or clusters to identify customers who are more likely to respond to promotions and also for future personalization services.
- RECENCY (R): Days since last purchase
- FREQUENCY (F): Total number of purchases
- MONETARY VALUE (M): Total money this customer spent.
We will create those 3 customer attributes for each customer.
To calculate recency, we need to choose a date point from which we evaluate how many days ago was the customer’s last purchase. The last date we have is 2011–12–09 so we will use it as a reference.
Now we need to make a new data frame named recency, which only contains 2 values: Customer ID and Last Purchase.
After that, we can compare the Last purchase date to the current date to get Recency values.
Now we can move on to Frequency analysis, frequency helps us to know how many times a customer purchased from us. To do that we need to check how many invoices are registered by the same customer.
The monetary attribute is showing how much money the customer spent over time.
To do that, first, we will create a new column total cost to have the total price per invoice.
After having the all RFM data, now we can merge the data with the primary key Customer ID.
The simplest way to create customer segments from RFM Model is to use Tertiles. We assign a score from 1 to 3 to Recency, Frequency, and Monetary. Four is the best/highest value, and one is the lowest/worst value. A final RFM score is calculated simply by combining individual RFM score numbers. The final Segmentation will be 3*3 Matrix from RFM.
The Tertiles value is used to Define the Recency Score, Frequency Score, and Monetary Score.
A Fewer number of Recency values means a higher Recency score.
More number of Frequency and Monetary Values means a Higher Frequency_Recency Score.
Now that we knew our customer segments we can choose how to target or deal with each segment. We can convert each RFM Matrix score to customer segment.
For example:
Best Customers — Champions: Reward them. They can be early adopters of new products. Suggest to them “Refer a friend”.
Export the data to csv, and you can visualized on Tableau
rfm_segmentation.to_csv(‘rfm_segment.csv’)
You can visualized with Seaborn on python or Tableau
Thats all for today, Thank you, Happy Learning!