
Photo by William Iven on Unsplash

Fundamental Marketing Analytics: RFM Analysis

Through the Recency-Frequency-Monetary model and managerial segmentation with hierarchical clustering in Python

4 min read · Feb 12, 2020


Marketing analytics is a must-have skill for a successful marketing professional nowadays, and marketers with an economics or statistics background are generally seen as having a definite advantage over the rest. But regardless of your background, there is one easy analysis technique you can make use of: the RFM (Recency-Frequency-Monetary) model.

RFM analysis is a marketing analysis tool used to determine which of your customers are the best ones. The RFM model is based on three quantitative factors: Recency (R), how recently a customer has purchased; Frequency (F), how often the customer purchases; and Monetary (M), how much the customer spends.
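As a quick, hypothetical illustration (the toy numbers below are made up, not taken from the data set used later): a customer who last ordered 12 days ago, placed 3 orders in total, and spent $50 per order on average has R = 12, F = 3, and M = 50. In pandas, such a summary might look like this:

import pandas as pd

# Hypothetical toy transactions for two customers (not the article's data set)
toy = pd.DataFrame({
    'customer_id': ['A', 'A', 'A', 'B'],
    'days_since':  [40, 25, 12, 300],  # days since each order
    'sales':       [30, 60, 60, 500],  # order amount
})

# R = days since the most recent order, F = number of orders, M = average order value
toy_rfm = toy.groupby('customer_id').agg(
    recency=('days_since', 'min'),
    frequency=('days_since', 'size'),
    monetary=('sales', 'mean'),
)
print(toy_rfm)
# A: recency=12, frequency=3, monetary=50.0
# B: recency=300, frequency=1, monetary=500.0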

In this short article, we will demonstrate this marketing analytics skill by exploring a retail data set in a Python environment (a very simple SQL statement will also be employed along the way).

There is nothing more rewarding than getting our hands dirty, so let’s go!

The Data

  • The data set used in the analysis can be downloaded here.
  • We are not using all the columns. We will mainly focus on customer_id, order_date, and sales.
  • sales: the total amount of each order.

The RFM Analysis and Segmentation

  • As usual, the first thing to do is to import the necessary libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandasql import sqldf
from sklearn.preprocessing import scale
from scipy.cluster.hierarchy import linkage, dendrogram, cut_tree
  • Let’s import the data and check its first 5 lines.
df = pd.read_csv('data.csv')
df.head()
check the first 5 lines of the data
  • Convert the order_date column to date format.
  • Then we need to find out the duration of the dataset.
df["order_date"] = df["order_date"].astype('datetime64[ns]')
print(df["order_date"].max(),"\n",df["order_date"].min())
  • Create a new column named days_since, showing how many days before the reference date (2020-01-01) each order was placed.
df['days_since'] = (pd.Timestamp('2020-01-01') - df['order_date']).dt.days

Personally, I think this is the most interesting part of the article. We will use a single line of SQL to compute each customer's recency, frequency, and average purchase amount.

  • “GROUP BY 1” means grouping by the first column of the SELECT list, which here is customer_id. This is quite different from Python: no column is actually given the alias 1. (A pandas equivalent is sketched after the code below.)
  • Then we check the new data set “rfm”.
rfm = sqldf("SELECT customer_id, MIN(days_since) AS 'recency', COUNT(*) AS 'frequency', AVG(sales) AS 'amount' FROM df GROUP BY 1", globals())
rfm.describe()
dataframe “rfm”
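For comparison only (this sketch is not part of the original notebook), the same aggregation can be written in pure pandas with groupby on the df built above:

# Pandas equivalent of the SQL above: one row per customer_id with
# recency = days since the most recent order, frequency = number of orders,
# amount = average order value
rfm_pd = df.groupby('customer_id').agg(
    recency=('days_since', 'min'),
    frequency=('order_date', 'size'),
    amount=('sales', 'mean'),
).reset_index()
rfm_pd.describe()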
  • Visualize the data now.
rfm.recency.hist(bins=20)
Recency
rfm.frequency.hist(bins=6)
Frequency
rfm.amount.hist()
Amount

Observations:

  1. Most customers bought recently.
  2. Most customers came to shop only once and seldom visited more than twice over the year, which means a lot of work has to be done on CS.
  3. The average order values are concentrated toward the left of the Amount histogram (most customers spend relatively little per order), which is reasonable.
  • Now we copy customer data into a new data frame.
  • And set customer_id as the index.
new_data = rfm.copy()
new_data.head()
new_data = new_data.set_index(new_data.customer_id).iloc[:,1:4]
new_data.head()
  • Then we take the log-transform of the amount and plot.
new_data.amount = np.log(new_data.amount)
new_data.amount.hist(bins=7)
Amount after log transform
  • Standardize the variables.
new_data = pd.DataFrame(scale(new_data), index=new_data.index, columns=new_data.columns)
new_data.head()
new_data dataframe

Running a Hierarchical Segmentation

  • Take a 10% sample of the standardized data.
new_data_sample = new_data.iloc[::10, :]
  • Perform hierarchical clustering with Ward linkage on pairwise distances (an optional dendrogram check is sketched after the code below).
c = linkage(new_data_sample, method='ward')
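The dendrogram function imported at the start is not used in the walkthrough itself; as an optional visual check of how many clusters to cut, here is a minimal sketch:

# Optional: plot a truncated dendrogram to eyeball a sensible number of clusters
plt.figure(figsize=(10, 5))
dendrogram(c, truncate_mode='lastp', p=30, no_labels=True)
plt.title('Ward linkage dendrogram (truncated to 30 leaves)')
plt.ylabel('Ward distance')
plt.show()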
  • Set the number of segments to 3,
  • and check the frequency table.
members = pd.DataFrame(cut_tree(c, n_clusters=3), index=new_data_sample.index, columns=['ClusterNumber'])
members.ClusterNumber.value_counts(sort=False)
  • Show the first 10 customers.
members.iloc[0:10]
Customer table

Conclusion:

Isn't it easy? A simple script and a simple workflow!

With this customer table (the table just above), marketing managers will be able to target each individual customer according to their segment, with the related marketing campaigns.

For example, customers in segment 2 are those with a higher purchase frequency and a medium spending amount. Marketers can target this segment with active coupon marketing or frequent incentive emails. You may also run small campaigns with varied parameters, such as discount value, duration, and discounted products, to test this group of customers.

Those in segment 1, on the other hand, are dormant customers with little spending. It is not easy to revive them, so skipping them for the moment may be a good option.
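To see which segment is which in R/F/M terms, one option (a sketch building on the objects above, not part of the original notebook) is to join the cluster labels back onto the un-transformed rfm table and average per cluster:

# Attach the cluster labels to the raw RFM values of the sampled customers
profile = rfm.set_index('customer_id').loc[members.index, ['recency', 'frequency', 'amount']].copy()
profile['ClusterNumber'] = members['ClusterNumber'].values
# Mean recency / frequency / amount per segment
profile.groupby('ClusterNumber').mean().round(1)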

Finally, the Jupyter Notebook of the analysis can be found on GitHub.

Happy coding!

To view my other marketing blogs in Traditional Chinese, here are the links:

Marketing Blog

Digital Marketing Blog
