RFM Analysis Using PostgreSQL

Rafiqa Irwandi
5 min readAug 17, 2020

--

RFM, stands for recency, frequency and monetary, is one method of customer segmentation, a process of dividing customers into groups based on similar characteristics. This method can help companies identify their customers who are most likely to respond the marketing campaign. Thus, company can make marketing campaigns that are more suitable for each groups.

Unlike traditional method that using demographic data, the idea of RFM analysis is to segment customers by transaction data. This makes RFM more practical than the traditional method, also RFM can analyse the entire population that available in the transaction data history. However, not all features in the data set will be used in RFM analysis, we only use these three features:

  • The last time customer purchased a product/service (Recency)
  • The number of purchases made by a customer during a certain period of time (Frequency)
  • The amount of money spent by a customer during a certain period of time (Monetary)

In this article, I will do RFM analysis using PostgreSQL, the data set is an online retail data set from here. The table contains 8 attributes as follow:

  1. InvoiceNo : Invoice number. Nominal, a 6-digit integral number that uniquely assigned to each transaction. Any number that starts with letter ‘c’ indicates a cancellation.
  2. StockCode : Product code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.
  3. Description : Product name. Nominal.
  4. Quantity : The quantities of each product (item) per transaction. Numeric.
  5. InvoiceDate : Invoice Date and time. Numeric, the day and time when each transaction was generated.
  6. UnitPrice : Numeric, product price per unit in sterling.
  7. CustomerID : Nominal, a 5-digit integral number uniquely assigned to each customer.
  8. Country : Nominal, the name of the country where each customer resides.

The first thing to do is create the table and import the data to PostgreSQL. Since there are columns containing both numbers and letters, the encoding must be set to windows-1251.

And here is a glimpse of the database, since it won’t be so clear to put all the columns, I’ll just point out the columns that will need some treatments.

As mentioned, the data include cancelled transaction that can be seen from the table above, therefore any invoiceno containing ‘C’, NULLs in customerid and 0 in unitprice must be omitted.

The data set is now ready for RFM analysis with 4 tier groups for each dimensions (R, F and M). 1 indicates longest time of transaction, least transaction and lowest spend while 4 indicates most recent, most transaction and highest spend.

There are 4338 numbers of customers segmented to one of 64 groups (4x4x4). Now I’d like to create a tree map for visualization, but I’ll do it in Python. Before that, it’s crucial to define the groups into some levels because having many groups is not practical.

import pandas as pd
import matplotlib.pyplot as plt
import squarify
def rfm_level(df):
if ((df['rfm_recency'] >= 4) and (df['rfm_frequency'] >= 4)
and (df['rfm_monetary'] >= 4)):
return 'Best Customers'
elif ((df['rfm_recency'] >= 3) and (df['rfm_frequency'] >= 3)
and (df['rfm_monetary'] >= 3)):
return 'Loyal'
elif ((df['rfm_recency'] >= 3) and (df['rfm_frequency'] >= 1)
and (df['rfm_monetary'] >= 2)):
return 'Potential Loyalist'
elif ((df['rfm_recency'] >= 3) and (df['rfm_frequency'] >= 1)
and (df['rfm_monetary'] >= 1)):
return 'Promising'
elif ((df['rfm_recency'] >= 2) and (df['rfm_frequency'] >= 2)
and (df['rfm_monetary'] >= 2)):
return 'Customers Needing Attention'
elif ((df['rfm_recency'] >= 1) and (df['rfm_frequency'] >= 2)
and (df['rfm_monetary'] >= 2)):
return 'At Risk'
elif ((df['rfm_recency'] >= 1) and (df['rfm_frequency'] >= 1)
and (df['rfm_monetary'] >= 2)):
return 'Hibernating'
else:

return 'Lost'
#Create a new variable rfm_level
data['rfm_level'] = data.apply(rfm_level, axis=1)
data

The customers now divided into 8 segments, here are what each segments means:

  1. Best Customers
    Customers in this segment bought product very recent, very often and spend the most among others.
  2. Loyal
    Loyal customers are those who spent good amount of money and they usually responsive to promotions.
  3. Potential Loyalist
    Potential loyalists are recent customers that bought product more than once and spent good amount of money. We can offer membership and recommend other products to customers in this segment.
  4. Promising
    This segment is for new shoppers that haven’t spent much. We can create brand awareness and offer free trials for them.
  5. Customers Needing Attention
    This customers bought much and spent good money but have not bought very recently. We can reactive them by make limited time offers and recommend products based on their past purchases.
  6. At Risk
    This group of customers spent big money and purchased often a long time ago. We need to bring them back by sending emails to reconnect and offer renewals.
  7. Hibernating
    This customers have been inactive for a long time, they have low number of orders and low spenders. We can offer other relevant products and special discounts.
  8. Lost
    Customers with lowest performance of RFM, we can ignore them or reach them out to get back their interest.
# Calculate total customers in each segment
rfm_agg = data.groupby('rfm_level').agg({'customerid':'count'})
print(rfm_agg)

From the data tabulated above, there are 43% customers from online retail data set that place the top tier of RFM levels (“Best Customers”, “Loyal”, “Potential Loyalist”). The size of customers in each level can also be seen from the visualization below.

#RFM visualization
fig = plt.gcf()
ax = fig.add_subplot()
fig.set_size_inches(13, 7)
squarify.plot(sizes=rfm_agg['customerid'],
label=['At Risk',
'Best Customers',
'Customers Needing Attention',
'Hibernating',
'Lost',
'Loyal',
'Potential Loyalist',
'Promising'], alpha=0.6)
plt.title("RFM Segments",fontsize=20)
plt.axis('off')
plt.show()

That’s all I can share about RFM analysis, thank you for taking your time to read my article. If you have any comments, feel free to leave your feedback below or reach me on LinkedIn. Have a nice day!

References

https://www.putler.com/rfm-analysis/

https://www.silota.com/docs/recipes/sql-recency-frequency-monetary-rfm-customer-analysis.html

https://towardsdatascience.com/recency-frequency-monetary-model-with-python-and-how-sephora-uses-it-to-optimize-their-google-d6a0707c5f17

https://www.moengage.com/blog/rfm-analysis-using-predictive-segments/

--

--

Rafiqa Irwandi

Newcomer in Data Science | Posting to save my learnings