Customer Lifetime Value Prediction: Dataset Exploration (Part 1)

Gain hands-on experience to understand primary revenue trends and seasonal patterns. It helps you improve your strategic planning skills.

Asish Biswas
AnalyticSoul
3 min readJun 18, 2024

--

Welcome to the second lesson. In this lesson, we will focus on exploring monthly revenue trends using the CD Now dataset. By analyzing monthly revenue, we can identify patterns and seasonality, which are crucial for understanding business performance and planning future strategies.

Welcome back! In this lesson, we will work on an online Compact Disk (CD) selling dataset that contains customer transaction records. By analyzing monthly revenue, we can identify patterns and seasonality, which are crucial for understanding business performance and planning future strategies. Later, we’ll learn the techniques to calculate Customer Lifetime Value (CLV) using machine learning.

Feature details

Here is some basic information about the dataset:

Let’s dive in. First import the dataset, check the dimension and inspect the data types of each column. As we inspect the dataset, we see there are 69,659 records of customer transactions in the dataset. We check further details of the dataset using the info() and describe() method of the dataframe.

import pandas as pd

df_cdnow_tr = pd.read_csv('data/cdnow.csv', index_col=0)

print('The dimention of the dataset')
print(df_cdnow_tr.shape)

print('\nDetail description of the dataset')
print(df_cdnow_tr.info())

print('\nNumeric details')
print(df_cdnow_tr.describe())

Data processing

We want to observe monthly transactions, revenues, and other customer activities. To do that, we have to process and shape the dataset.

First, we convert the type of the date column from string to datetime. Then we create a new column year_month from the date column. This will be helpful in preparing the monthly transaction reports.

# fix the data type and parse datetime
df_cdnow_tr['date'] = pd.to_datetime(df_cdnow_tr['date'])

# calculate transaction month-year
df_cdnow_tr['year_month'] = df_cdnow_tr['date'].dt.to_period('M')
df_cdnow_tr['revenue'] = df_cdnow_tr['price'] * df_cdnow_tr['quantity']

df_cdnow_tr.head()

Aggregate monthly transactions

Let’s aggregate transactions on a monthly basis and count the number of invoices. Then we rename the aggregated columns to increase readability.

# aggregate monthly sale and count monthly unique invoices
df_monthly_revenue = df_cdnow_tr.groupby(['year_month']).agg({'price': 'sum', 'date': 'count'})
df_monthly_revenue.rename(columns={'price': 'sale', 'date': 'invoice_count'}, inplace=True)

df_monthly_revenue.reset_index(inplace=True)
print(df_monthly_revenue.head())

Monthly revenue and transactions

Let’s observe the monthly total sales and the monthly total number of unique transactions side-by-side.

# plt.figure(figsize=(12, 8))
fig, ax = plt.subplots(1, 2, figsize=(16, 6))
# monthly total sales
s1 = sns.barplot(data=df_monthly_revenue, x='year_month', y='sale', ax=ax[0], color='deepskyblue')
# monthly total number of unique transactions
s2 = sns.barplot(data=df_monthly_revenue, x='year_month', y='invoice_count', ax=ax[1], color='seagreen')

s1.set_xticklabels(s1.get_xticklabels(), rotation=90)
s2.set_xticklabels(s2.get_xticklabels(), rotation=90)
s1.set_ylabel('Total sales')
s2.set_ylabel('Number of invoices')
plt.show()

In both plots, we see a similar pattern. In the beginning, there are a high number of transactions and revenue, but that momentum fades away over time.

That’s all for now! Practice along by following the accompanying code.

What’s next?

Join the community

Join our vibrant learning community on Discord! You’ll find a supportive space to ask questions, share insights, and collaborate with fellow learners. Dive in, collaborate, and let’s grow together! We can’t wait to see you there!

Thanks for reading! If you like this tutorial series make sure to clap (up to 50!) and let’s connect on LinkedIn and follow me on Medium to stay updated with my new articles.

Support me at no extra cost by joining Medium via this referral link.

--

--