# Integrated Approach of RFM, Clustering, CLTV & Machine Learning Algorithms for Forecasting

**CLTV** is a customer relationship management (CRM) concept: an enterprise approach to understanding and influencing customer behavior through meaningful communication in order to improve customer acquisition, customer retention, customer loyalty, and customer profitability. The idea is that a business wants to predict the average amount of money customers will spend on the business over the entire life of the relationship.

Although statistical methods can be very powerful, they make several stringent assumptions about the types of data and their distribution, and typically can only handle a limited number of variables. Regression-based methods are usually based on a fixed-form equation and assume a single best solution, which means that we can compare only a few alternative solutions manually. Further, when the models are applied to real data, the key assumptions of the methods are often violated. Here, I will show **Machine Learning** (ML) methods that integrate the *CLTV* and customer transaction variables with the *RFM* variables to forecast consumer purchases.

I will use two approaches here:

- 1st approach: the RFM (Recency, Frequency, and Monetary) marketing analysis method is used to segment customers.
- 2nd approach: using Customer Lifetime Value (CLTV), I will train an ML algorithm for prediction. I will use 3 months of data to calculate RFM and use it to predict the next 6 months.

**RFM** is a scoring model that attempts to predict customers' future behavior and is implicitly linked to *CLTV*. One key limitation of *RFM* models is that they are scoring models and do not explicitly provide a $ number for customer value. A simple equation to derive *CLTV* for a customer is:

$$
CLTV = \sum_{t=0}^{T} \frac{(p_t - c_t)\, r_t}{(1+i)^t} - AC
$$

where:

- $p_t$ = price paid by a consumer at time $t$,
- $c_t$ = direct cost of servicing the customer at time $t$,
- $i$ = discount rate or cost of capital for the firm,
- $r_t$ = probability of customer repeat buying or being "alive" at time $t$,
- $AC$ = acquisition cost, and
- $T$ = time horizon for estimating *CLTV*.
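To make the discounted CLTV formula concrete, CLTV = Σ from t = 0 to T of (p_t − c_t)·r_t / (1 + i)^t − AC, here is a minimal Python sketch using the symbols defined above. The function name `cltv` and the toy numbers are hypothetical and only illustrate the arithmetic; they are not taken from the dataset.

```python
def cltv(prices, costs, retention, discount_rate, acquisition_cost):
    """Discounted customer lifetime value.

    prices[t], costs[t] and retention[t] hold p_t, c_t and r_t for t = 0..T.
    """
    if not (len(prices) == len(costs) == len(retention)):
        raise ValueError("prices, costs and retention must have the same length")
    value = 0.0
    for t, (p, c, r) in enumerate(zip(prices, costs, retention)):
        # margin at time t, weighted by the probability the customer is still "alive",
        # discounted back to t = 0 with the firm's discount rate i
        value += (p - c) * r / (1 + discount_rate) ** t
    return value - acquisition_cost


# hypothetical customer: 3 periods, margin 60 per period, falling retention, i = 10%, AC = 50
print(round(cltv([100, 100, 100], [40, 40, 40], [1.0, 0.8, 0.6], 0.10, 50), 2))
```

Each later period contributes less because it is both less likely to happen (lower $r_t$) and discounted more heavily.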

# Data Mining

Let’s load and see the data.

We have all the necessary information that we need:

- Customer ID
- Unit Price
- Quantity
- Invoice Date

With all these features, we can build the equation for the Monetary value (revenue): *Monetary = Active Customer Count × Order Count (per active customer) × Average Revenue per Order*.

```python
import pandas as pd

# convert the type of the InvoiceDate field from string to datetime
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
# create a YearMonth field, e.g. 201103 for March 2011
df['InvoiceYearMonth'] = df['InvoiceDate'].map(lambda date: 100 * date.year + date.month)
# calculate Monetary for each row
df['Monetary'] = df['UnitPrice'] * df['Quantity']
# create a new data frame with YearMonth and Monetary columns
monetary = df.groupby(['InvoiceYearMonth'])['Monetary'].sum().reset_index()
```
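The revenue decomposition (active customers × orders per active customer × average revenue per order) can be checked month by month with a small sketch. The toy transactions and the `InvoiceNo` column used to count orders are assumptions for illustration; the identity only holds exactly when Order Count is read as orders per active customer.

```python
import pandas as pd

# hypothetical toy transactions (one row per invoice line), mirroring the columns above
df = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A4", "B1", "B2"],
    "InvoiceDate": pd.to_datetime(["2011-03-02", "2011-03-02", "2011-03-05", "2011-03-20",
                                   "2011-03-28", "2011-04-03", "2011-04-15"]),
    "UnitPrice": [2.0, 5.0, 10.0, 3.0, 4.0, 6.0, 1.5],
    "Quantity": [3, 1, 2, 5, 2, 1, 4],
})
# same YearMonth key as above, computed with the vectorised .dt accessor
df["InvoiceYearMonth"] = df["InvoiceDate"].dt.year * 100 + df["InvoiceDate"].dt.month
df["Monetary"] = df["UnitPrice"] * df["Quantity"]

monthly = df.groupby("InvoiceYearMonth").agg(
    Revenue=("Monetary", "sum"),
    ActiveCustomers=("CustomerID", "nunique"),
    OrderCount=("InvoiceNo", "nunique"),
)
monthly["OrdersPerCustomer"] = monthly["OrderCount"] / monthly["ActiveCustomers"]
monthly["AvgRevenuePerOrder"] = monthly["Revenue"] / monthly["OrderCount"]
# the product of the three factors should reproduce the monthly revenue
monthly["Check"] = (monthly["ActiveCustomers"] * monthly["OrdersPerCustomer"]
                    * monthly["AvgRevenuePerOrder"])
print(monthly)
```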

Before we dive into the **RFM** score, we can do some analysis to learn more about customer behavior, such as Monthly Active Customers, Monthly Order Count, Average Revenue per Order, New Customer Ratio, Monthly Customer Retention Rate, etc. Interested readers may visit *here* to learn about such analysis. So, I will start with segmentation.

# Customer Segmentation

Let's assume some common segments:

- Low Value: customers who are less active than others, are not very frequent buyers/visitors, and generate very low, zero, or maybe negative revenue.
- Mid Value: customers who are fairly frequent and generate moderate revenue.
- High Value: customers with high revenue, high frequency, and low inactivity; the business always wants to retain these customers.

We shall calculate the **RFM** values and apply unsupervised ML to identify different clusters for each; for example, we apply *K-means* clustering to assign a *recency* score. The number of clusters is generally defined by the business and has to be supplied to the *K-means* algorithm. However, the *Elbow Method* of *K-means* helps us find the optimal number of clusters.

# Recency

To calculate **recency**, we need to find the most recent purchase date of each customer and see how many days they have been inactive. After getting the number of inactive days for each customer, we apply *K-means* clustering to assign customers a *recency* score.

Here it looks like we have 3 clusters. Based on business requirements, we can go with fewer or more clusters. Let us select 4 for this example.
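A minimal sketch of the recency step with scikit-learn's `KMeans`: compute days since each customer's last purchase, print the inertia for k = 1..6 for an elbow plot, then cluster into 4 groups and relabel the clusters so that the score is ordered by mean recency. The helper names (`recency_table`, `elbow_inertias`, `ordered_cluster_score`) and the toy purchase dates are hypothetical; measuring recency from the latest invoice date in the data is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def recency_table(tx):
    """Days since each customer's latest purchase, measured from the latest date in tx."""
    last = tx.groupby("CustomerID")["InvoiceDate"].max().reset_index()
    last["Recency"] = (tx["InvoiceDate"].max() - last["InvoiceDate"]).dt.days
    return last[["CustomerID", "Recency"]]


def elbow_inertias(values, max_k=6):
    """K-means inertia (within-cluster sum of squares) for k = 1..max_k, for an elbow plot."""
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    return [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
            for k in range(1, max_k + 1)]


def ordered_cluster_score(values, n_clusters, ascending):
    """Cluster a 1-D feature with K-means and relabel clusters 0..k-1 by their mean.

    ascending=False gives score 0 to the cluster with the largest mean, which suits
    recency (fewer inactive days = better); use ascending=True for frequency/monetary.
    """
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=42).fit_predict(X)
    means = pd.Series(X.ravel()).groupby(labels).mean().sort_values(ascending=ascending)
    remap = {int(old): new for new, old in enumerate(means.index)}
    return np.array([remap[int(label)] for label in labels])


# hypothetical purchases: 12 customers, last purchase 0..220 days before the latest invoice
ref = pd.Timestamp("2011-12-09")
days_ago = [0, 2, 3, 30, 32, 35, 90, 95, 100, 200, 210, 220]
tx = pd.DataFrame({
    "CustomerID": list(range(1, 13)) + [4],  # customer 4 also bought earlier
    "InvoiceDate": [ref - pd.Timedelta(days=d) for d in days_ago + [300]],
})

rec = recency_table(tx)
print(np.round(elbow_inertias(rec["Recency"]), 1))  # look for the "elbow"
rec["RecencyCluster"] = ordered_cluster_score(rec["Recency"], n_clusters=4, ascending=False)
print(rec)
```

Relabelling matters because K-means cluster ids are arbitrary; ordering them by the cluster mean turns them into a usable score.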

Likewise, we can compute the **Frequency** and *Monetary* scores, and finally the Overall Score.

We divide these clusters into High/Mid/Low value: an overall score of 0 to 2 means Low Value, 3 to 4 means Mid Value, and 5+ means High Value customers.
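A short sketch of the overall score and segment mapping. It assumes, since the text does not spell it out, that the Overall Score is the sum of the recency, frequency, and monetary cluster scores; the customer rows and the `segment` helper are hypothetical.

```python
import pandas as pd


def segment(overall_score):
    """Map an overall RFM score to the thresholds used in the text (0-2, 3-4, 5+)."""
    if overall_score <= 2:
        return "Low-Value"
    if overall_score <= 4:
        return "Mid-Value"
    return "High-Value"


# hypothetical per-customer cluster scores (0 = worst, 3 = best), e.g. from ordered K-means
rfm = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "RecencyCluster": [3, 1, 0, 2],
    "FrequencyCluster": [3, 1, 0, 1],
    "MonetaryCluster": [2, 0, 1, 1],
})
# assumption: overall score = sum of the three cluster scores
rfm["OverallScore"] = rfm[["RecencyCluster", "FrequencyCluster", "MonetaryCluster"]].sum(axis=1)
rfm["Segment"] = rfm["OverallScore"].apply(segment)
print(rfm)
```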

The descriptive statistics of the respective **RFM** values are shown below.

We see that even though the average recency is 90 days, the median is 49 days. The negative minimum Monetary value indicates returned items. The test statistic values and the distribution and QQ plots below confirm that the data set does not follow a normal distribution. Therefore, the use of a nonparametric framework for making predictions is justified.
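The article does not name the normality tests it used; as an assumption, this sketch applies SciPy's Shapiro-Wilk (`stats.shapiro`) and D'Agostino-Pearson (`stats.normaltest`) tests to a simulated right-skewed, recency-like sample. The exponential sample is hypothetical and only shows how a mean well above the median and tiny p-values point away from normality.

```python
import numpy as np
from scipy import stats

# hypothetical right-skewed "recency-like" sample (exponential, mean about 90 days)
rng = np.random.default_rng(42)
recency = rng.exponential(scale=90, size=500)

print("mean:", round(recency.mean(), 1), "median:", round(np.median(recency), 1))

# both tests use H0: the sample comes from a normal distribution
shapiro_stat, shapiro_p = stats.shapiro(recency)
k2_stat, k2_p = stats.normaltest(recency)
print("Shapiro-Wilk p =", shapiro_p, "| D'Agostino-Pearson p =", k2_p)
# stats.probplot(recency, dist="norm") gives the points of a normal QQ plot
```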

Evidence from the statistical tests implies that the data are non-normal and call for nonparametric methods. This justifies deploying advanced ML and deep learning algorithms for the predictive modeling exercise. However, I have not used a deep learning algorithm here.

We can start taking actions with this segmentation. The strategies are simple for all three classes:

- Improve retention of High Value customers
- Improve retention and increase frequency of Mid Value customers
- Increase frequency of Low Value customers

# Customer Lifetime Value (CLTV)

**CLTV** is quite simple here. First, we select a time window, anything from 3, 6, 12, or 24 months. We can then compute the *CLTV* for each customer in that specific time window with the equation *CLTV = Total Gross Revenue - Total Cost*. This equation is based on historical data and gives us the historical value. If we see some customers with a very high negative lifetime value historically, then we are probably too late to take action. Let's use an ML algorithm to predict it instead.
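A minimal sketch of the historical *CLTV* = Total Gross Revenue - Total Cost per customer within a time window. The `Revenue` and `Cost` columns and the toy rows are hypothetical (the transaction data used above has no cost field), and the half-open window `[start, end)` is an assumption.

```python
import pandas as pd

# hypothetical transactions; the "Cost" column is an assumption, not part of the dataset above
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3],
    "InvoiceDate": pd.to_datetime(["2011-06-05", "2011-09-10", "2011-07-01",
                                   "2011-12-20", "2011-08-15"]),
    "Revenue": [120.0, 80.0, 50.0, 300.0, 40.0],
    "Cost": [70.0, 45.0, 30.0, 150.0, 65.0],
})


def historical_cltv(tx, start, end):
    """CLTV per customer in [start, end) = total gross revenue - total cost."""
    window = tx[(tx["InvoiceDate"] >= start) & (tx["InvoiceDate"] < end)]
    totals = window.groupby("CustomerID")[["Revenue", "Cost"]].sum()
    return (totals["Revenue"] - totals["Cost"]).rename("CLTV")


cltv_6m = historical_cltv(tx, pd.Timestamp(2011, 6, 1), pd.Timestamp(2011, 12, 1))
print(cltv_6m)  # customer 3 ends up negative: cost exceeded revenue in the window
```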

# CLTV Prediction

So, let's follow these steps:

- Define an appropriate time frame for *CLTV* calculation
- Identify the features we are going to use to predict the future, and create them
- Calculate *CLTV* for training the ML model
- Build and run the ML model
- Check if the model is useful

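We have already obtained the **RFM** scores for each customer ID. To implement this correctly, let's split our dataset. I will take 3 months of data, calculate *RFM*, and use it to predict the next 6 months.

```python
import pandas as pd

# create 3-month (features) and 6-month (target) data frames;
# pd.Timestamp is used because comparing a datetime64 column with a
# datetime.date object is deprecated/unsupported in recent pandas versions
m3 = DF_uk[(DF_uk.InvoiceDate >= pd.Timestamp(2011, 3, 1))
           & (DF_uk.InvoiceDate < pd.Timestamp(2011, 6, 1))].reset_index(drop=True)

m6 = DF_uk[(DF_uk.InvoiceDate >= pd.Timestamp(2011, 6, 1))
           & (DF_uk.InvoiceDate < pd.Timestamp(2011, 12, 1))].reset_index(drop=True)
```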

Now, we repeat the same process of clustering, computing **RFM**, and overall scoring for each data frame, and finally merge the 3-month and 6-month data frames to see the correlations between *CLTV* and the feature set we have.

Here, by applying K-means clustering, we can identify the existing **CLTV** groups and build segments on top of them. Considering the business side of this analysis, we need to treat customers differently based on their predicted *CLTV*. For this example, we will apply clustering and have 3 segments (the number of segments really depends on your business dynamics and goals):

- Low *CLTV*
- Mid *CLTV*
- High *CLTV*

We are going to apply K-means clustering to decide the segments and observe their characteristics.
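A sketch of the merge and the 3-segment clustering, assuming (as the later code that drops an `m6_Monetary` column suggests) that 6-month revenue is the *CLTV* proxy. The toy frames are hypothetical; customers with no purchases in the 6-month window get a revenue of 0 via a left join, and the K-means labels are re-ranked by cluster centre so that 0 = Low, 1 = Mid, 2 = High *CLTV*.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# hypothetical 3-month features and 6-month revenue per customer
m3_features = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5, 6, 7],
    "OverallScore": [1, 2, 4, 5, 7, 8, 3],
})
m6_revenue = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5, 6],  # customer 7 did not buy in the 6-month window
    "m6_Monetary": [50.0, 80.0, 900.0, 1100.0, 8000.0, 8600.0],
})

# left join keeps every 3-month customer; missing 6-month revenue becomes 0
merged = m3_features.merge(m6_revenue, on="CustomerID", how="left")
merged["m6_Monetary"] = merged["m6_Monetary"].fillna(0)

km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(merged[["m6_Monetary"]])
order = np.argsort(km.cluster_centers_.ravel())  # cluster ids sorted by centre value
rank = np.empty_like(order)
rank[order] = np.arange(len(order))  # rank[cluster_id] = 0 (low) .. 2 (high)
merged["LTVCluster"] = rank[labels]
print(merged.groupby("LTVCluster")["m6_Monetary"].describe())
```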

Cluster 2 is the best, with an average **CLTV** of 8.2k, whereas cluster 0 is the worst, with 396. There are a few more steps before training the ML model:

- We need to do some feature engineering: convert categorical columns to numerical columns.
- We will check the correlation of the features against our label, the *CLTV* clusters.
- We will split our feature set and label (*CLTV*) into X and y. We use X to predict y.
- We will create training and test datasets. The training set will be used for building the ML model.

We will apply our model to the test set to see its real performance.

```python
import pandas as pd
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# convert categorical columns to numerical
DF_class = pd.get_dummies(DF_cluster)

# calculate and show correlations
corr_matrix = DF_class.corr()
corr_matrix['LTVCluster'].sort_values(ascending=False)

# create X and y: X is the feature set and y is the label (LTV cluster)
X = DF_class.drop(['LTVCluster', 'm6_Monetary'], axis=1)
y = DF_class['LTVCluster']

# split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=42)
```

We see that the 3-month Revenue, Frequency, and **RFM** scores will be helpful for our ML models. With the training and test sets, we can build our model.

# Machine Learning Algorithm comparison

Predictive models based on ML algorithms are, to some extent, black-box models; we can open them up with sensitivity and specificity analysis computed from the confusion matrix.

```python
import numpy as np

# confusion_matrix is assumed to be a pandas DataFrame of counts:
# rows = actual class, columns = predicted class
FP = confusion_matrix.sum(axis=0) - np.diag(confusion_matrix)
FN = confusion_matrix.sum(axis=1) - np.diag(confusion_matrix)
TP = np.diag(confusion_matrix)
TN = confusion_matrix.values.sum() - (FP + FN + TP)

TPR = TP/(TP+FN)  # Sensitivity, hit rate, recall, or true positive rate
TNR = TN/(TN+FP)  # Specificity or true negative rate
PPV = TP/(TP+FP)  # Precision or positive predictive value
NPV = TN/(TN+FN)  # Negative predictive value
FPR = FP/(FP+TN)  # Fall out or false positive rate
FNR = FN/(TP+FN)  # False negative rate
FDR = FP/(TP+FP)  # False discovery rate
ACC = (TP+TN)/(TP+FP+FN+TN)  # Accuracy per class (one-vs-rest)
```
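To see these formulas on concrete numbers, here is a self-contained sketch that computes the one-vs-rest rates from a small, hypothetical 3-class confusion matrix (rows = actual, columns = predicted). The function name `per_class_rates` is illustrative only.

```python
import numpy as np


def per_class_rates(cm):
    """One-vs-rest rates from a square confusion matrix (rows = actual, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    TP = np.diag(cm)
    FP = cm.sum(axis=0) - TP  # predicted as the class, but actually another class
    FN = cm.sum(axis=1) - TP  # actually the class, but predicted as another class
    TN = cm.sum() - (FP + FN + TP)
    return {
        "TPR": TP / (TP + FN),  # recall / sensitivity
        "TNR": TN / (TN + FP),  # specificity
        "PPV": TP / (TP + FP),  # precision
        "ACC": (TP + TN) / cm.sum(),  # per-class one-vs-rest accuracy
    }


# hypothetical 3-class confusion matrix
cm = [[9, 1, 0],
      [2, 3, 1],
      [0, 1, 4]]
rates = per_class_rates(cm)
for name, values in rates.items():
    print(name, np.round(values, 3))
```

For class 0, for example, 9 of the 10 actual members are found (recall 0.9), and 9 of the 11 customers predicted as class 0 really belong to it (precision ≈ 0.818).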

# XGB model

We have a multi-class classification model with 3 groups (clusters). Accuracy is 78% on the test set. Our true positives lie on the diagonal and are the largest numbers here. The false negatives are the sum of the other values along each row, and the false positives are the sum of the other values down each column. Precision and recall are acceptable for cluster 0. For cluster 0, which is Low **CLTV**, if the model says a customer belongs to cluster 0, there is an 85% chance that this is correct (precision). The classifier successfully identifies 90% of the actual cluster 0 customers (recall). We need to improve the model for the other clusters: the classifier detects barely 43% of Mid *CLTV* customers.

Let's experiment with changing the depth and using OneVsRestClassifier:
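A sketch of the depth and one-vs-rest experiment. To keep it runnable without extra dependencies, it uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost (with xgboost installed, `xgboost.XGBClassifier(max_depth=depth)` could be dropped in instead) and a synthetic 3-class dataset from `make_classification` in place of the real features; the depths tried are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# hypothetical 3-class data standing in for the 3-month features and LTVCluster label
X, y = make_classification(n_samples=600, n_features=6, n_informative=4, n_redundant=0,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

results = {}
for depth in (2, 4):
    # one multi-class boosted model: each boosting round fits one tree per class
    native = GradientBoostingClassifier(max_depth=depth, random_state=42).fit(X_train, y_train)
    # one binary boosted model per class ("this class vs the rest")
    ovr = OneVsRestClassifier(
        GradientBoostingClassifier(max_depth=depth, random_state=42)).fit(X_train, y_train)
    results[depth] = (accuracy_score(y_test, native.predict(X_test)),
                      accuracy_score(y_test, ovr.predict(X_test)))
    print(f"max_depth={depth}: multi-class acc={results[depth][0]:.3f}, "
          f"one-vs-rest acc={results[depth][1]:.3f}")
```

One-vs-rest gives each class its own binary model, which can help a class that the joint model keeps confusing with its neighbours.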

Some improvement can be seen here. However, there is still room for improvement, e.g.:

- Adding more features and improving feature engineering
- Trying ANN/DNN

# ROCAUC

By default, multi-class ROCAUC visualizations plot a curve for each class, in addition to the micro- and macro-average curves across all classes. This enables the user to inspect the tradeoff between sensitivity and specificity on a per-class basis.
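The numbers behind such a plot can be computed with scikit-learn: binarize the labels one-vs-rest, take one AUC per class, pool all (sample, class) decisions for the micro-average, and take the unweighted mean of per-class AUCs for the macro-average. The predicted probabilities below are hypothetical, and `multiclass_auc` is an illustrative helper, not part of any visualization library.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize


def multiclass_auc(y_true, y_score, classes):
    """Per-class one-vs-rest ROC AUC plus micro- and macro-averages."""
    Y = label_binarize(y_true, classes=classes)  # shape (n_samples, n_classes)
    y_score = np.asarray(y_score, dtype=float)
    per_class = [roc_auc_score(Y[:, k], y_score[:, k]) for k in range(len(classes))]
    # micro: pool every (sample, class) decision into a single binary problem
    micro = roc_auc_score(Y.ravel(), y_score.ravel())
    # macro: unweighted mean of the per-class AUCs
    macro = float(np.mean(per_class))
    return per_class, micro, macro


# hypothetical predicted class probabilities for 6 customers in clusters 0, 1, 2
y_true = [0, 0, 1, 1, 2, 2]
y_score = [[0.80, 0.10, 0.10],
           [0.60, 0.30, 0.10],
           [0.30, 0.60, 0.10],
           [0.20, 0.70, 0.10],
           [0.05, 0.65, 0.30],
           [0.10, 0.10, 0.80]]
per_class, micro, macro = multiclass_auc(y_true, y_score, classes=[0, 1, 2])
print("per class:", np.round(per_class, 3), "micro:", round(micro, 3), "macro:", round(macro, 3))
```

Here class 1 is the only imperfect one: a class-2 customer gets a class-1 score (0.65) higher than one true class-1 customer (0.60).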

# Class Prediction Error

Understanding prediction errors and determining how to fix them is critical to building effective predictive systems. If you are interested, I recommend reading this *article* to learn more about prediction errors.
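A class prediction error chart is essentially, for each actual class, a stacked bar of how its members were predicted. A minimal sketch of the underlying table with `pandas.crosstab`, using hypothetical actual and predicted *CLTV* clusters, plus the share of each actual class that was misclassified:

```python
import pandas as pd

# hypothetical actual vs predicted CLTV clusters
y_true = pd.Series([0, 0, 0, 0, 1, 1, 1, 2, 2, 2], name="actual")
y_pred = pd.Series([0, 0, 0, 1, 0, 1, 2, 2, 2, 1], name="predicted")

# rows = actual class, columns = predicted class; each row is one stacked bar
counts = pd.crosstab(y_true, y_pred)
correct = pd.Series([counts.loc[c, c] for c in counts.index], index=counts.index)
# share of each actual class that ended up in some other predicted class
error_rate = 1 - correct / counts.sum(axis=1)
print(counts)
print(error_rate.round(3))
```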

# Summary

In ML models, parameters are tuned/estimated from the data, while hyperparameters control how the algorithm learns from the data (without making any assumptions about the data, and downstream of the data generation). XGB is a tree-based algorithm and hence can be considered nonparametric. The tree depth used here is a hyperparameter of the algorithm: it is not derived from the data but is rather an input parameter.

**I can be reached *here*.**