Customer Segmentation with Python

Alaind Fadrian
5 min readJun 30, 2022

--

Customer segmentation can help make your business strategy more effective, but what is it and how do you do it?

Customer segmentation is the process by which you divide your customers into segments up based on common characteristics. One of them is RFM Analysis. In this exercise, we will use data from e-commerce data.

RFM Method

Now let’s get started to implement this method on Python. First, we import all required library.

Import Library

Next step, we load the data set and see the top 10 rows of data. We used alias retail_df as the retail data frame.

Data E-Commerce

Data E-Commerce consists of Invoice number, Stock Code, Description, Quantity, Invoice date, Unit Price, Customer ID, and Country. The following script is to show the data type for each column and the null values for each column.

Data Frame information and shape

As customer clusters may vary by geography, I’ll restrict the data to only United Kingdom customers, which contains most of our customers' historical data. We will make a new data frame by filtering the “United Kingdom” as a Country. After filtering we found the row data are reduced from 514909 to 495478.

Data Frame for the UK

In e-commerce, we know that the company has some business processes for canceled or returned orders. We assume that the Quantity of canceled orders is represented by a negative value.

Sorting data to find out the negative value

Next step, we will remove all canceled orders from the data frame. After filtering we found the row data are reduced from 495478 to 486286.

Removing the canceled order

After get some clear data that we required, we need to check whether any null values in any column

Null value on Retail UK Data Frame

Since Customer ID is a mandatory column for customer segmentation, We need to cleanse the null values on Customer ID. Besides that, we can see the invoice date are still in DateTime format. We also need to add a new column for extracting the date only.

Drop Null Values on Customer ID and add column for Date only
Data Frame Info after cleansing

Next, we need to define how long the period of this observation is. If we decided to limit data to 1 year only. We need to see the oldest and newest Date on data frame

Minimum and Maximum value of Data Frame Retail UK
Restrict data to 1 year
Detail data Retail UK

The final data to be analyzed with RFM are 16017 Transactions (1 Transaction number can contain more than 1 Product), 3611 products, and 3863 Customers.

RFM (Recency, Frequency, Monetary) analysis is a customer segmentation technique that uses past purchase behavior to divide customers into groups. RFM helps divide customers into various categories or clusters to identify customers who are more likely to respond to promotions and also for future personalization services.

  • RECENCY (R): Days since last purchase
  • FREQUENCY (F): Total number of purchases
  • MONETARY VALUE (M): Total money this customer spent.

We will create those 3 customer attributes for each customer.

To calculate recency, we need to choose a date point from which we evaluate how many days ago was the customer’s last purchase. The last date we have is 2011–12–09 so we will use it as a reference.

Define Latest date

Now we need to make a new data frame named recency, which only contains 2 values: Customer ID and Last Purchase.

After that, we can compare the Last purchase date to the current date to get Recency values.

Recency data (in metric days)

Now we can move on to Frequency analysis, frequency helps us to know how many times a customer purchased from us. To do that we need to check how many invoices are registered by the same customer.

Frequency Data

The monetary attribute is showing how much money the customer spent over time.

To do that, first, we will create a new column total cost to have the total price per invoice.

Monetary Table

After having the all RFM data, now we can merge the data with the primary key Customer ID.

Merge Data RFM

The simplest way to create customer segments from RFM Model is to use Tertiles. We assign a score from 1 to 3 to Recency, Frequency, and Monetary. Four is the best/highest value, and one is the lowest/worst value. A final RFM score is calculated simply by combining individual RFM score numbers. The final Segmentation will be 3*3 Matrix from RFM.

Tertiles RFM

The Tertiles value is used to Define the Recency Score, Frequency Score, and Monetary Score.

A Fewer number of Recency values means a higher Recency score.
More number of Frequency and Monetary Values means a Higher Frequency_Recency Score.

RFM Score

Now that we knew our customer segments we can choose how to target or deal with each segment. We can convert each RFM Matrix score to customer segment.

For example:

Best Customers — Champions: Reward them. They can be early adopters of new products. Suggest to them “Refer a friend”.

Export the data to csv, and you can visualized on Tableau

rfm_segmentation.to_csv(‘rfm_segment.csv’)

You can visualized with Seaborn on python or Tableau

Seaborn
Tableau Visualization 1
Tableau Visualization 2

Thats all for today, Thank you, Happy Learning!

--

--

Alaind Fadrian

I’m Senior Staff of Mobile Network Operator, which enthusiast to Data Science.