Customer Lifetime Value Prediction: Dataset Exploration (Part 1)
Gain hands-on experience to understand primary revenue trends and seasonal patterns. It helps you improve your strategic planning skills.
Welcome to the second lesson. In this lesson, we will focus on exploring monthly revenue trends using the CD Now dataset. By analyzing monthly revenue, we can identify patterns and seasonality, which are crucial for understanding business performance and planning future strategies.
Welcome back! In this lesson, we will work on an online Compact Disk (CD) selling dataset that contains customer transaction records. By analyzing monthly revenue, we can identify patterns and seasonality, which are crucial for understanding business performance and planning future strategies. Later, we’ll learn the techniques to calculate Customer Lifetime Value (CLV) using machine learning.
Feature details
Here is some basic information about the dataset:
Let’s dive in. First import the dataset, check the dimension and inspect the data types of each column. As we inspect the dataset, we see there are 69,659 records of customer transactions in the dataset. We check further details of the dataset using the info()
and describe()
method of the dataframe.
import pandas as pd
df_cdnow_tr = pd.read_csv('data/cdnow.csv', index_col=0)
print('The dimention of the dataset')
print(df_cdnow_tr.shape)
print('\nDetail description of the dataset')
print(df_cdnow_tr.info())
print('\nNumeric details')
print(df_cdnow_tr.describe())
Data processing
We want to observe monthly transactions, revenues, and other customer activities. To do that, we have to process and shape the dataset.
First, we convert the type of the date
column from string to datetime. Then we create a new column year_month
from the date
column. This will be helpful in preparing the monthly transaction reports.
# fix the data type and parse datetime
df_cdnow_tr['date'] = pd.to_datetime(df_cdnow_tr['date'])
# calculate transaction month-year
df_cdnow_tr['year_month'] = df_cdnow_tr['date'].dt.to_period('M')
df_cdnow_tr['revenue'] = df_cdnow_tr['price'] * df_cdnow_tr['quantity']
df_cdnow_tr.head()
Aggregate monthly transactions
Let’s aggregate transactions on a monthly basis and count the number of invoices. Then we rename the aggregated columns to increase readability.
# aggregate monthly sale and count monthly unique invoices
df_monthly_revenue = df_cdnow_tr.groupby(['year_month']).agg({'price': 'sum', 'date': 'count'})
df_monthly_revenue.rename(columns={'price': 'sale', 'date': 'invoice_count'}, inplace=True)
df_monthly_revenue.reset_index(inplace=True)
print(df_monthly_revenue.head())
Monthly revenue and transactions
Let’s observe the monthly total sales and the monthly total number of unique transactions side-by-side.
# plt.figure(figsize=(12, 8))
fig, ax = plt.subplots(1, 2, figsize=(16, 6))
# monthly total sales
s1 = sns.barplot(data=df_monthly_revenue, x='year_month', y='sale', ax=ax[0], color='deepskyblue')
# monthly total number of unique transactions
s2 = sns.barplot(data=df_monthly_revenue, x='year_month', y='invoice_count', ax=ax[1], color='seagreen')
s1.set_xticklabels(s1.get_xticklabels(), rotation=90)
s2.set_xticklabels(s2.get_xticklabels(), rotation=90)
s1.set_ylabel('Total sales')
s2.set_ylabel('Number of invoices')
plt.show()
In both plots, we see a similar pattern. In the beginning, there are a high number of transactions and revenue, but that momentum fades away over time.
That’s all for now! Practice along by following the accompanying code.
What’s next?
- Lesson 5.3: Data Exploration — Customer Revenue Analysis: This lesson will focus on individual customer revenue, segmenting customers based on their purchase behavior, and understanding the revenue contribution of different customer segments.
- Lesson 5.4: Predict Quarterly Sales with RFM Model: We’ll introduce the RFM (Recency, Frequency, Monetary) model to predict future sales and revenue. This will include building and validating a predictive model to forecast quarterly sales.
Join the community
Join our vibrant learning community on Discord! You’ll find a supportive space to ask questions, share insights, and collaborate with fellow learners. Dive in, collaborate, and let’s grow together! We can’t wait to see you there!
Thanks for reading! If you like this tutorial series make sure to clap (up to 50!) and let’s connect on LinkedIn and follow me on Medium to stay updated with my new articles.
Support me at no extra cost by joining Medium via this referral link.