Machine Learning & Statistical Modeling in CRM (All in One❗)(EN)

15 min read · Jun 12, 2022

The best CRM systems provide robust analytics combined with artificial intelligence and machine learning. AI is the future of CRM. It is the basis of customer decision support systems and seeks the best opportunity for the customer. Most importantly, it retains the customer.

Let's grab a ☕, we have work to do!

AI & ML Applications in CRM:

RFM
CLTV & Predictions
Churn Analysis

1. RFM:

RFM analysis is a marketing technique used to quantitatively rank and group customers based on the recency, frequency and monetary total of their recent transactions, in order to identify the best customers and run targeted marketing campaigns. Let's explain it with examples.
So what are we going to do with RFM?
Let's start by explaining what these three words mean.

R(Recency):

How recent was the customer’s last purchase? Customers who recently made a purchase will still have the product on their mind and are more likely to purchase or use the product again. Businesses often measure recency in days. But, depending on the product, they may measure it in years, weeks or even hours. NOTE: this recency is not the same as the recency used in CLTV; I explain the difference in the CLTV part.

F(Frequency):

How often did this customer make a purchase in a given period? Customers who have purchased once are often more likely to purchase again. Additionally, first-time customers may be good targets for follow-up advertising to convert them into more frequent customers.

M(Monetary):

Monetary value stems from how much the customer spends. A natural inclination is to put more emphasis on encouraging customers who spend the most money to continue to do so. While this can produce a better return on investment in marketing and customer service, it also runs the risk of alienating customers who have been consistent but may not spend as much with each transaction.

Why Is the Recency, Frequency, Monetary Value (RFM) Model Useful?

The recency, frequency, monetary value (RFM) model is based on three quantitative factors namely recency, frequency, and monetary value. Each customer is ranked in each of these categories, generally on a scale of 1 to 5 (the higher the number, the better the result). The higher the customer ranking, the more likely it is that they will do business again with a firm. Essentially, the RFM model corroborates the marketing adage that “80% of business comes from 20% of the customers.”

How RFM Works in Real-World Applications

Computing RFM for real-world application typically requires special analytical expertise or advanced math skills. And, like any model, RFM models can vary in complexity from simple to sophisticated. RFM segmentation begins by ranking customers in each of the three categories: recency score, frequency score and monetary score. Typically, a business rule decides how the scores are assigned. The most common approach is to scale each of the three columns (R, F, M) from 1 to 5.

Example of Business Rule of Hotel Recency

Once you have chosen your business rule and assigned the scaled scores, you need to combine them into a single string-typed column, giving values like 555 or 111. So we have the scores as strings; what is the next step? You apply a business rule to these strings, for example with regular expressions as in the map below, which matches on the recency and frequency digits. You don’t have to use these exact segments; create your own according to whatever your business rules require.

seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}
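
To make the scoring and mapping steps concrete, here is a minimal sketch using only the Python standard library. The `quintile_scores` and `rf_segment` helpers, the `unassigned` fallback label and the sample numbers are illustrative assumptions, not part of the original post; the segment map is the one shown above. Rank-based quintiles are only one possible business rule for assigning scores.

```python
import re

# Segment map from the post: each pattern matches the recency digit
# followed by the frequency digit.
seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}

def quintile_scores(values, higher_is_better=True):
    """Rank-based scores from 1 to 5 (5 = best). Ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = 1 + (rank * 5) // len(values)
    return scores

def rf_segment(r_score, f_score):
    """Return the segment whose pattern fully matches the R and F digits."""
    rf = f"{r_score}{f_score}"
    for pattern, name in seg_map.items():
        if re.fullmatch(pattern, rf):
            return name
    return "unassigned"  # hypothetical fallback, never hit for scores 1-5

# Hypothetical data: days since last purchase, purchase count, total spend.
recency_days = [3, 40, 200, 15, 90]
frequency = [12, 3, 1, 7, 2]
monetary = [900.0, 150.0, 20.0, 480.0, 60.0]

r = quintile_scores(recency_days, higher_is_better=False)  # fewer days is better
f = quintile_scores(frequency)
m = quintile_scores(monetary)

for i in range(len(r)):
    rfm_string = f"{r[i]}{f[i]}{m[i]}"  # e.g. "555"
    print(rfm_string, rf_segment(r[i], f[i]))
```

Note that the monetary score is part of the combined string but does not influence the segment here, because the map only looks at the first two digits.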

Medium posts on this topic usually come with full code walkthroughs. I am not reproducing those here, because they already exist and hardly differ from one another: nearly every post uses the Retail II dataset. What I want to do is learn the theory behind the real-world application thoroughly.

Finally, after creating the segments, we need to take actions that suit our sector. Besides customer-specific actions, such as personalized campaigns or discounts for new customers, certain analyses can be run on a per-segment basis. The main examples are cohort analysis, churn analysis, and customer lifetime value calculation and prediction. Speaking of customer segments, let’s move on to another analysis where we can both segment and predict.

2. Customer Lifetime Value (CLTV) & Predictive values

In marketing, customer lifetime value (CLV or often CLTV), lifetime customer value (LCV), or life-time value (LTV) is a prognostication of the net profit contributed to the whole future relationship with a customer. The prediction model can have varying levels of sophistication and accuracy, ranging from a crude heuristic to the use of complex predictive analytics techniques.
Customer lifetime value can also be defined as the monetary value of a customer relationship, based on the present value of the projected future cash flows from the customer relationship.[1] Customer lifetime value is an important concept in that it encourages firms to shift their focus from quarterly profits to the long-term health of their customer relationships. Customer lifetime value is an important metric because it represents an upper limit on spending to acquire new customers.[2] For this reason it is an important element in calculating payback of advertising spent in marketing mix modeling.

One of the first accounts of the term customer lifetime value is in the 1988 book Database Marketing, which includes detailed worked examples.[3] Early adopters of customer lifetime value models in the 1990s include Edge Consulting and BrandScience.

Purpose of CLTV:

The purpose of the customer lifetime value metric is to assess the financial value of each customer. Don Peppers and Martha Rogers are quoted as saying,

“some customers are more equal than others.”[4]

Customer lifetime value differs from customer profitability or CP (the difference between the revenues and the costs associated with the customer relationship during a specified period) in that CP measures the past and CLV looks forward. As such, CLV can be more useful in shaping managers’ decisions but is much more difficult to quantify. While quantifying CP is a matter of carefully reporting and summarizing the results of past activity, quantifying CLV involves forecasting future activity.

Where do we use CLTV?

One of the major uses of CLV is customer segmentation, which starts with the understanding that not all customers are equally important. A CLV-based segmentation model allows the company to predict the most profitable group of customers, understand those customers’ common characteristics, and focus more on them rather than on less profitable customers. CLV-based segmentation can be combined with a Share of Wallet (SOW) model to identify “high CLV but low SOW” customers with the assumption that the company’s profit could be maximized by investing marketing resources in those customers.

Customer Lifetime Value metrics are used mainly in relationship-focused businesses, especially those with customer contracts. Examples include banking and insurance services, telecommunications and most of the business-to-business sector. However, the CLV principles may be extended to transactions-focused categories such as consumer packaged goods by incorporating stochastic purchase models of individual or aggregate behavior.[5] In either case, retention has a decisive impact on CLV, since low retention rates result in Customer Lifetime Value barely increasing over time.

Factors that contribute to the Customer Lifetime Value metric:

The average lifespan of a customer: This is very hard to calculate for startups. For well-established companies with settled business processes and stable customers, it is an important input to our model. It is calculated by dividing the total time that customers stay active by the total number of customers.
The average frequency of purchases: How often does the customer make a purchase? If the business offers products purchased frequently (a coffee shop, for example), the frequency is likely higher than for an online monthly subscription-based service. The number of visits/purchases is the frequency.
The average purchase value: To understand how much a customer spends over time, it is important to find out how much they spend on average on a regular purchase. This value is found by dividing the customer’s spending in a certain period by the number of purchases made during that time.
The average value of a customer: This value determines how much an active customer will spend on average. If a business sees customers every day or every week, it can take the number of visits in that time period and multiply it by the average purchase value.
Churn Rate: The rate at which customers are lost.
Profit Margin: The share of revenue the company expects to keep as profit. It is a fixed value, determined by a business rule.

How do we calculate Customer Lifetime Value?

We have explained what the factors mean, so now let's jump into the formulas:

📌 CLTV = (Customer Value / Churn Rate) * Profit Margin
📌 Customer Value = Average Purchase Value * Purchase Frequency
- Average Purchase Value = Total Price (of coffee, etc.) / Total Transactions (the number of purchases made by that customer)
- Purchase Frequency = Total Transactions / Total Number of Customers
📌 Churn Rate = 1 - Repeat Rate
📌 Profit Margin = Total Price * 0.10 (the 0.10 rate is an example set by the business rule)
Formulas may vary according to business rules, but whichever variant you use, a customer with 5 purchases and a customer with 20 purchases must end up with different values.
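
As a worked example, the sketch below applies these formulas in the common form CLTV = (Customer Value / Churn Rate) * Profit Margin. The customer totals are made up, and the repeat rate is assumed here to be the share of customers with more than one transaction; that definition is not given in this post and may differ under your business rule.

```python
# Hypothetical per-customer totals: (number of transactions, total spend).
customers = {"a": (5, 500.0), "b": (1, 40.0), "c": (20, 1600.0), "d": (2, 90.0)}

MARGIN_RATE = 0.10  # example rate, set by the business rule

n_customers = len(customers)
# Assumption: repeat rate = share of customers with more than one transaction.
repeat_rate = sum(1 for n_tx, _ in customers.values() if n_tx > 1) / n_customers
churn_rate = 1 - repeat_rate  # must be > 0 for the division below

cltv = {}
for cid, (n_tx, total) in customers.items():
    avg_purchase_value = total / n_tx
    purchase_frequency = n_tx / n_customers
    customer_value = avg_purchase_value * purchase_frequency
    profit_margin = total * MARGIN_RATE
    cltv[cid] = customer_value / churn_rate * profit_margin

# Min-max scaling to the 0-1 range (assumes not all values are equal).
lo, hi = min(cltv.values()), max(cltv.values())
scaled = {cid: (v - lo) / (hi - lo) for cid, v in cltv.items()}

for cid in sorted(cltv, key=cltv.get, reverse=True):
    print(cid, round(cltv[cid], 2), round(scaled[cid], 4))
```

With these numbers, the customer with 20 purchases ends up far ahead of the one with 5, which is the kind of separation the formulas are supposed to produce.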

When we put the values into the formulas, we get a numerical result: the customer's lifetime value. In businesses we generally use this value after scaling it between 0 and 1.
Now we have results, but what do these numbers mean?
You can segment customers with them, as with RFM, and predict future KPIs such as revenue and frequency. For that, we need some statistical modeling!

Probabilistic Model:

This class of models tries to fit a probability distribution to the data and then use that information to estimate other parameters of the CLV equation (such as the number of future transactions, future monetary value, etc.).
There are various probabilistic models out there that can be used to predict future CLV. One important thing to note here is that not all the variables in the CLV equation can be predicted using a single model. Usually, Transaction variables (Purchase freq & Churn) and Monetary variables (Avg order value) are modeled separately. Below is the list of probabilistic models available for the same.

BG/NBD (Beta Geometric / Negative Binomial Distribution):

The Beta Geometric / Negative Binomial Distribution model is known as the BG-NBD model. It also sometimes comes up as “Buy Till You Die”. It gives us the conditional expected number of transactions in the next period. This model can answer questions such as:[4]

How many transactions will there be next week?
How many transactions will there be in the next 3 months?
Which customers will make the most purchases in the next 2 weeks?

To predict the expected number of transactions, the model describes 2 processes probabilistically:

Transaction Process (Buy)
Dropout Process (Till You Die)

Transaction Process (Buy)

This describes the purchase process.
While a customer is alive, the number of transactions they make in a given period follows a Poisson distribution governed by their transaction rate parameter.
While a customer is alive, they keep purchasing around their own transaction rate.
Transaction rates vary from customer to customer and follow a gamma distribution (r, α) across the customer base.[4]

Dropout Process (Till You Die)

This describes how customers stop purchasing.
Each customer has their own dropout probability p.
After a purchase, a customer drops out with probability p.
Dropout probabilities vary from customer to customer and follow a beta distribution (a, b) across the customer base.

BG-NBD Formula

The BG-NBD model gives a closed-form expression for the conditional expected number of transactions, E(Y(t) | x, tx, T, r, α, a, b).
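
For reference, the expression as published by Fader, Hardie and Lee (2005) can be written as follows. Here ₂F₁ is the Gaussian hypergeometric function, t is the length of the forecast period, and δ equals 1 when x > 0 and 0 otherwise; these three symbols come from the paper rather than from this post.

```latex
E\big(Y(t) \mid X = x,\ t_x,\ T,\ r,\ \alpha,\ a,\ b\big)
= \frac{\dfrac{a+b+x-1}{a-1}
        \left[\,1 - \left(\dfrac{\alpha+T}{\alpha+T+t}\right)^{r+x}
        {}_2F_1\!\left(r+x,\ b+x;\ a+b+x-1;\ \dfrac{t}{\alpha+T+t}\right)\right]}
       {1 + \delta_{x>0}\,\dfrac{a}{b+x-1}
        \left(\dfrac{\alpha+T}{\alpha+t_x}\right)^{r+x}}
```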

Explanation of the parameters:

E refers to the expected value.
| indicates that the expectation is conditional (the conditional expected number of transactions).
x refers to the frequency of each customer, i.e. the number of repeat purchases, for customers who purchased at least 2 times.
tx refers to the recency of each customer: the time from the first purchase date to the last purchase date (in weeks).
T refers to the customer's age: the time from the first purchase date to today's (analysis) date, in weeks. (Note: some business rules call this tenure.)
r, α come from the gamma distribution (buy process): the transaction rate of the customer base.
a, b come from the beta distribution (till-you-die process): the dropout rate of the customer base.
Y(t) refers to the number of transactions each customer makes in the next t periods; the formula gives its expected value.

❗Note: this formula contains time values (tx, T and t). They must all be expressed in the same unit, here weeks, so you need to convert your data accordingly.
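
To see how these parameters fit together, here is a minimal standard-library sketch that evaluates the published BG/NBD conditional expectation (Fader, Hardie and Lee 2005) for one customer. The function names, the series-based `hyp2f1` helper and the parameter values are illustrative assumptions; in practice r, α, a and b are estimated from your own data, usually with a dedicated library, and nothing here is fitted.

```python
def hyp2f1(a, b, c, z, tol=1e-12, max_terms=10_000):
    """Gaussian hypergeometric function via its power series (needs |z| < 1)."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def expected_purchases(t, x, t_x, T, r, alpha, a, b):
    """Conditional expected number of transactions in the next t weeks.

    x: repeat purchases, t_x: recency, T: customer age (all times in weeks).
    Assumes a != 1 so that the (a - 1) division is defined.
    """
    z = t / (alpha + T + t)
    hyp = hyp2f1(r + x, b + x, a + b + x - 1, z)
    numer = (a + b + x - 1) / (a - 1) * (
        1 - ((alpha + T) / (alpha + T + t)) ** (r + x) * hyp
    )
    denom = 1.0
    if x > 0:  # the delta_{x>0} term: dropout is only possible after a purchase
        denom += a / (b + x - 1) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return numer / denom

# Illustrative parameter values only, not fitted to any data in this post.
params = dict(r=0.243, alpha=4.414, a=0.793, b=2.426)

# Hypothetical customer: 2 repeat purchases, last one at week 30.43, age 38.86 weeks.
for weeks in (1, 4, 12):
    print(weeks, expected_purchases(weeks, x=2, t_x=30.43, T=38.86, **params))
```

The denominator is the reciprocal of the probability that the customer is still alive, so a customer whose last purchase is long ago relative to their age gets a smaller forecast.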

Gamma-Gamma Submodel

We use this model to predict how much average profit we can earn from each customer.

The monetary value of a customer’s transactions is randomly distributed around that customer’s average transaction value.
The average transaction value varies between customers, but it does not change over time for a given customer.
The average transaction value is gamma distributed across all customers.

The Gamma-Gamma submodel gives the expected average transaction value, E(M | p, q, γ, mx, x).
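
For reference, the conditional expectation of the Gamma-Gamma submodel in its commonly published form (Fader and Hardie) is shown below. The second line rewrites it as a weighted average of the population mean γp/(q−1) and the customer's observed average mx, which requires q > 1.

```latex
\begin{aligned}
E(M \mid p, q, \gamma, m_x, x)
  &= \frac{(\gamma + m_x x)\,p}{p x + q - 1} \\
  &= \left(\frac{q-1}{p x + q - 1}\right)\frac{\gamma p}{q-1}
   + \left(\frac{p x}{p x + q - 1}\right) m_x
\end{aligned}
```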

E refers to the expected value.
x refers to the frequency of each customer.
mx refers to the observed monetary value (average transaction value) of each customer.
M refers to the customer's average transaction value; its expected value is the expected average profit.
p, q, γ are the parameters of the gamma distributions.
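
A small sketch of how these symbols combine, assuming the commonly published Gamma-Gamma conditional expectation E(M | p, q, γ, mx, x) = (γ + mx·x)·p / (p·x + q − 1). The function name and the parameter values are hypothetical; real values of p, q and γ would be estimated from data.

```python
def expected_avg_value(x, m_x, p, q, gamma):
    """Expected average transaction value for one customer (requires q > 1).

    Written as a weighted average: the more transactions (x) we have seen,
    the more weight goes to the customer's own observed average m_x.
    """
    weight = p * x / (p * x + q - 1)
    population_mean = gamma * p / (q - 1)
    return (1 - weight) * population_mean + weight * m_x

# Hypothetical parameter values, chosen only to keep the arithmetic readable.
p, q, gamma = 6.0, 4.0, 15.0

for x, m_x in [(0, 0.0), (2, 50.0), (20, 50.0)]:
    print(x, m_x, expected_avg_value(x, m_x, p, q, gamma))
```

A customer with no observed transactions gets the population mean, and as x grows the estimate moves toward that customer's own average.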

Pareto/NBD Model

First proposed by Schmittlein et al. (1987), the Pareto/NBD has been highly successful as a tool for customer base analysis. The Pareto/NBD aims to model whether or not customers are alive and, if alive, how frequently they purchase. Customers purchase according to a Poisson process while alive. Customer lifetimes are distributed according to an exponential distribution. Purchasing rates and survival propensities vary across the population according to separate gamma distributions (Schmittlein et al. 1987). Since 1987, the Pareto/NBD has been extended by other researchers. The Pareto/NBD and other similar models can be used to solve a number of managerial problems including estimating the number of “active” customers, ranking customers based on probability of being “alive,” and predicting future transaction levels. The model has been shown to work well in a range of settings (Schmittlein and Peterson 1994). Fader, Hardie and Lee have validated the model’s forward-looking predictions for an online music retailer (2005a). Abe has applied the model in different purchasing settings from e-commerce to department stores to large-scale chains (2009). In addition to empirical validation, the model has been applied in several other areas. Hopmann and Thede used it to investigate churn forecasts in non-contractual settings (2005). Wübben and Wangenheim showed the model performs equivalently or better than common managerial heuristics (2008). Glady, Baesens and Croux extend the model to provide estimates of customer lifetime value in several settings (2009). In these varied settings the Pareto/NBD has formed the backbone of wide-ranging and successful customer base analysis.[3]

This section will briefly discuss the mathematical intuition behind the standard Pareto/NBD (RF) and the proposed Recency-Only (RO) model. As discussed, the standard Pareto/NBD describes two independent processes (Schmittlein et al. 1987). First, a customer’s lifetime τ is modeled as an exponential process with “death” rate µ.[5]

Given that the customer is alive until time τ, purchase frequency is modeled with a Poisson distribution with purchasing rate λ.

To capture heterogeneity across the population, it is assumed that µ is distributed according to a gamma distribution with shape parameter s and scale parameter β. Similarly, heterogeneity in λ is captured by a gamma distribution with shape parameter r and scale parameter α.

The Pareto/NBD likelihood function is written in terms of x, the number of observed purchases in the interval (0, T], and tx, the time of the most recent purchase (Fader and Hardie 2010). Estimating the Pareto/NBD therefore requires knowledge of x and tx (and T).[5]
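
For reference, the two building blocks and the resulting individual-level likelihood can be written as below. This is a sketch of the standard textbook form for a single customer with known λ and µ; the population-level likelihood is obtained by averaging this expression over the two gamma distributions, involves Gaussian hypergeometric functions, and is not reproduced here.

```latex
\begin{aligned}
f(\tau \mid \mu) &= \mu\, e^{-\mu \tau}, \qquad \tau > 0 \\[4pt]
P\big(X(t) = x \mid \lambda,\ \tau > t\big) &= \frac{(\lambda t)^{x} e^{-\lambda t}}{x!},
  \qquad x = 0, 1, 2, \ldots \\[4pt]
L(\lambda, \mu \mid x, t_x, T) &=
  \frac{\lambda^{x}\mu}{\lambda+\mu}\, e^{-(\lambda+\mu)\,t_x}
  + \frac{\lambda^{x+1}}{\lambda+\mu}\, e^{-(\lambda+\mu)\,T}
\end{aligned}
```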

3. Churn Analysis

Churn rate, when applied to a customer base, refers to the proportion of contractual customers or subscribers who leave a supplier during a given time period. It is a possible indicator of customer dissatisfaction, cheaper and/or better offers from the competition, more successful sales and/or marketing by the competition, or reasons having to do with the customer life cycle.

Customer base churn

Churn is closely related to the concept of average customer life time. For example, an annual churn rate of 25 percent implies an average customer life of four years. An annual churn rate of 33 percent implies an average customer life of three years. The churn rate can be minimized by creating barriers which discourage customers from changing suppliers (contractual binding periods, use of proprietary technology, value-added services, unique business models, etc.), or through retention activities such as loyalty programs. It is possible to overstate the churn rate, as when a consumer drops the service but then restarts it within the same year. Thus, a clear distinction needs to be made between “gross churn”, the total number of absolute disconnections, and “net churn”, the overall loss of subscribers or members. The difference between the two measures is the number of new subscribers or members that have joined during the same period. Suppliers may find that if they offer a loss-leader “introductory special”, it can lead to a higher churn rate and subscriber abuse, as some subscribers will sign on, let the service lapse, then sign on again to take continuous advantage of current specials.

When talking about subscribers or customers, sometimes the expression “survival rate” (❗NOTE: the same as the repeat rate) is used to mean 1 minus the churn rate. For example, for a group of subscribers, an annual churn rate of 25 percent is the same as an annual survival rate of 75 percent. Both imply a customer lifetime of four years. That is, a customer lifetime can be calculated as the inverse of that customer’s predicted churn rate. For a group or segment of customers, their customer life (or tenure) is the inverse of their aggregate churn rate. Gompertz distribution models of customer life times can therefore also predict a distribution of churn rates.
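
The arithmetic above is easy to check with a few lines of Python. The function names are illustrative; the only relationships used are survival rate = 1 - churn rate and average customer life = 1 / churn rate, as stated in the text.

```python
def survival_rate(churn_rate):
    """Share of customers retained over the period."""
    return 1 - churn_rate

def average_customer_life(churn_rate):
    """Average customer life, in the same period unit as the churn rate."""
    if not 0 < churn_rate <= 1:
        raise ValueError("churn_rate must be in (0, 1]")
    return 1 / churn_rate

for annual_churn in (0.25, 0.33):
    print(annual_churn, survival_rate(annual_churn), average_customer_life(annual_churn))
```

An annual churn rate of 0.25 gives exactly four years, while 0.33 gives slightly more than three, which is why the text speaks of roughly three years.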

For companies with a fast-growing customer base (e.g., digital media companies in a BCG-matrix problem child or star phase), confusion can arise between two statistical analyses: the percentage of the whole customer base that churns in a given year (what percentage of the base of subscribers in all of 2010 churned out?) versus a particular customer cohort’s churn rate (taking those customers who subscribed in a given month, say January 2010, how many had churned out by January 2011?). Examining churn for a fast-growing aggregated customer base will understate the true churn rate compared to the cohort-based approach to the calculation. The cohort-based approach will also allow you to calculate the survival rate and the average customer life, whereas the aggregate approach cannot calculate these two metrics.
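
A toy calculation can illustrate why the aggregate view understates churn when the base is growing. Everything here is an assumption made for the sketch: a constant 5% monthly churn probability, signups that double every quarter, and an aggregate rate defined as customers lost by year end divided by everyone who subscribed during the year.

```python
MONTHLY_CHURN = 0.05  # hypothetical constant monthly churn probability

# Hypothetical fast growth: monthly signups double every quarter.
signups = [100 * 2 ** (month // 3) for month in range(12)]

# Cohort view: share of one month's signups that is gone 12 months later.
cohort_churn = 1 - (1 - MONTHLY_CHURN) ** 12

# Aggregate view: customers who joined in month m have only had (12 - m)
# months in which to churn by the end of the year.
total_subscribers = sum(signups)
churned = sum(n * (1 - (1 - MONTHLY_CHURN) ** (12 - m))
              for m, n in enumerate(signups))
aggregate_churn = churned / total_subscribers

print("cohort churn:   ", round(cohort_churn, 3))
print("aggregate churn:", round(aggregate_churn, 3))
```

Because most of the subscribers in the aggregate joined late in the year and had little time to leave, the aggregate figure comes out well below the 12-month cohort figure even though every customer churns at the same monthly rate.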

Researchers at Deloitte have argued that social network analysis is a good tool to calculate churn.[2]

In recent years, using AI and machine-learning as a means to calculate customer churn has become increasingly common for large retailers and service providers.[3]

The phrase “rotational churn” is used to describe the phenomenon where a customer churns and immediately rejoins. This is common in prepaid mobile phone services, where existing customers may take up a new subscription from their current provider in order to avail of special offers only available to new customers.

In most circumstances churn is seen as indicating that customers are dissatisfied with a service. However, in some industries whose services deliver on a promise, such as health care services, weight loss services and online dating platforms, churn is considered a positive signal.[4]

Some researchers have disputed the simple assumption that just dissatisfaction would lead customers to churn, and called for a more nuanced approach.[5]

Why does churn analytics matter?

Prevent revenue loss
Lower customer acquisition costs
Reduce marketing and sales costs
Improve quality of customer service
Increase opportunity for up-sell and cross-sell

Here we go! Thank you for reading my 📧. If you see something wrong, please let me know via DM!

In this 📧 I explained Machine Learning & Statistical Modeling in CRM in theory only. See you in my next 📧. If you want to check out my CRM 📧 in Turkish, here is the link: 🔎
👋👋👋👋👋👋👋👋👋👋👋👋👋👋👋👋👋👋👋👋👋👋👋👋👋