Cohort Analysis in SQL: Retention Rate vs Churn Rate

The algorithm and interpretation are clearly explained!

Hafez Afghan
10 min readFeb 18, 2024
Image is created by Author

Hello! Good morning! Good day! Good evening! Whenever and wherever you are!

Have you ever wondered how to determine if your customers are buying from your store continuously?

But, you don’t know how?

This article would be the answer.

“Cohort analysis is quantitative research using a measure of the concept of cohort and studying its effect on some outcome variable(s). In this sense, a cohort consists of people who share a common experience during a specified period of time.” (Glenn, 2005)

In this case, cohort analysis is a tool to measure and study customers’ behavior in buying at a store. This analysis works by calculating the number of customers who made their first transaction in a specific period and by calculating the number of customers who made transactions in a sequence of periods after their first. This helps us determine how loyal customers are in re-shopping at our store. For example, look at the table below.

Example Cohort Analysis Table

As we can see, the table consists of several rows and columns. Rows represent the n-th period, and columns represent the n-th period after the month on the respective row. The c0 column consists of the number of customers who made their first transaction in each period, the c1 column consists of the number of customers who made transactions in 1 period after their first, and so on. For example, cell c1,2 contains the number of customers who made transactions in the third period, while they made their first transaction in the first period. By comparing c0 column values and other column values, we can determine how many customers re-shop in our store and do not do such thing.

However, we need a table so we don’t have to compare values between the columns. Therefore, we should use the retention rate.

What is The Retention Rate?

Retention rate is the percentage of customers who continue to buy the products sold over a certain period. In other words, the retention rate is the percentage of the cohort analysis.

However, the churn rate is the percentage of customers who no longer buy the product sold over a certain period.

Retention and churn rates can be calculated in the following.

Retention rate (%) = ((number of customers at the end of period - number of customers acquired during period) / number of customers at the start of period) * 100%

Churn rate (%) = ((number of customers at the start of period - (number of customers at the end of period - number of customers acquired during period)) / number of customers at the start of period) * 100%

Note that churn rate (%) = 1 - retention rate (%). By those formulas, we can apply the analysis in SQL.

How to apply the analysis in SQL?

The easiest way to find the retention rate is by utilizing cohort analysis. Cohort analysis works by measuring how many customers made transactions at the start of the period (the denominator) and how many such customers still made transactions in subsequent periods (the numerator). Thus, the algorithm is by setting the cohort table and calculating the rates. On this occasion, I use the data from my previous analysis, Customer Behavior and Shopping Habits.

1. Set the cohort table

a. Create a table of the months of the first transaction
Each customer is identified in which month they made their first transaction. The table can be created by the following code.

-- First Month Transaction
SELECT customer_id,
MIN(EXTRACT(MONTH FROM order_date)) AS first_month
FROM customer_transaction
GROUP BY customer_id;
Months of the customer’s first transaction

b. Create a table of months of transactions made
Each customer is identified in which months they made their transaction. The table is generated by the following code.

-- Transaction Month
SELECT DISTINCT customer_id,
EXTRACT(MONTH FROM order_date) AS month
FROM customer_transaction;
Months of the customer’s transactions

c. Create a join table from a and b
Table from a and b step is integrated with JOIN function. Additionally, add a new column that subtracts the month in table b by the month in table a for each of the same customers, named month_diff. That column represents the month gap between the month of the first transaction and the month of the particular transaction. The column is used to determine the cohort values. The table is made using the following code which utilizes the CTE (Common Table Expression).

-- Join Table
WITH
first_month_transaction AS (
SELECT customer_id,
MIN(EXTRACT(MONTH FROM order_date)) AS first_month
FROM customer_transaction
GROUP BY customer_id
),
transaction_month AS(
SELECT DISTINCT customer_id,
EXTRACT(MONTH FROM order_date) AS month
FROM customer_transaction
)
SELECT tm.customer_id, tm.month,
fmt.first_month,
tm.month - fmt.first_month AS month_diff
FROM transaction_month AS tm
JOIN first_month_transaction AS fmt
ON tm.customer_id = fmt.customer_id
Join Table

The join_table looks like the table above. As we can see, month_diff consists of integers 0 to 11. These values indicate the month gap.

d. Create the Cohort Table
Now, we need to count each unique month_diff (0 to 11). We can utilize the SUM and CASE clauses to run the logic - return 1 if month_diff is equal to the desired value, whereas return 0 if month_diff is not equal to the desired value - for each column. For example, to count month_diff equal to 0 (the column m0), we write SUM(CASE WHEN month_diff = 0 THEN 1 ELSE 0 END) AS m0 to gain the count of month_diff = 0. Then, we use GROUP BY first_month to segregate the count by each month of the first purchase.

-- Cohort Table
WITH
first_month_transaction AS (
SELECT customer_id,
MIN(EXTRACT(MONTH FROM order_date)) AS first_month
FROM customer_transaction
GROUP BY customer_id
),
transaction_month AS(
SELECT DISTINCT customer_id,
EXTRACT(MONTH FROM order_date) AS month
FROM customer_transaction
),
join_table AS
(
SELECT tm.customer_id, tm.month,
fmt.first_month,
tm.month - fmt.first_month AS month_diff
FROM transaction_month AS tm
JOIN first_month_transaction AS fmt
ON tm.customer_id = fmt.customer_id
)
SELECT first_month,
SUM(CASE WHEN month_diff = 0 THEN 1 ELSE 0 END) AS m0,
SUM(CASE WHEN month_diff = 1 THEN 1 ELSE 0 END) AS m1,
SUM(CASE WHEN month_diff = 2 THEN 1 ELSE 0 END) AS m2,
SUM(CASE WHEN month_diff = 3 THEN 1 ELSE 0 END) AS m3,
SUM(CASE WHEN month_diff = 4 THEN 1 ELSE 0 END) AS m4,
SUM(CASE WHEN month_diff = 5 THEN 1 ELSE 0 END) AS m5,
SUM(CASE WHEN month_diff = 6 THEN 1 ELSE 0 END) AS m6,
SUM(CASE WHEN month_diff = 7 THEN 1 ELSE 0 END) AS m7,
SUM(CASE WHEN month_diff = 8 THEN 1 ELSE 0 END) AS m8,
SUM(CASE WHEN month_diff = 9 THEN 1 ELSE 0 END) AS m9,
SUM(CASE WHEN month_diff = 10 THEN 1 ELSE 0 END) AS m10,
SUM(CASE WHEN month_diff = 11 THEN 1 ELSE 0 END) AS m11
FROM join_table
GROUP BY first_month
ORDER BY first_month
Cohort Table

2. Calculate the retention rate

Recall that the retention rate is the percentage of the cohort or we can write it as follows.

Retention rate (%) = mi/m0 × 100%, for i = 0,1,…,11.

Having the formula above, we create the retention table using the following code.

-- Retention Rate Table
WITH
first_month_transaction AS (
SELECT customer_id,
MIN(EXTRACT(MONTH FROM order_date)) AS first_month
FROM customer_transaction
GROUP BY customer_id
),
transaction_month AS(
SELECT DISTINCT customer_id,
EXTRACT(MONTH FROM order_date) AS month
FROM customer_transaction
),
join_table AS
(
SELECT tm.customer_id, tm.month,
fmt.first_month,
tm.month - fmt.first_month AS month_diff
FROM transaction_month AS tm
JOIN first_month_transaction AS fmt
ON tm.customer_id = fmt.customer_id
),
cohort_table AS
(
SELECT first_month,
SUM(CASE WHEN month_diff = 0 THEN 1 ELSE 0 END) AS m0,
SUM(CASE WHEN month_diff = 1 THEN 1 ELSE 0 END) AS m1,
SUM(CASE WHEN month_diff = 2 THEN 1 ELSE 0 END) AS m2,
SUM(CASE WHEN month_diff = 3 THEN 1 ELSE 0 END) AS m3,
SUM(CASE WHEN month_diff = 4 THEN 1 ELSE 0 END) AS m4,
SUM(CASE WHEN month_diff = 5 THEN 1 ELSE 0 END) AS m5,
SUM(CASE WHEN month_diff = 6 THEN 1 ELSE 0 END) AS m6,
SUM(CASE WHEN month_diff = 7 THEN 1 ELSE 0 END) AS m7,
SUM(CASE WHEN month_diff = 8 THEN 1 ELSE 0 END) AS m8,
SUM(CASE WHEN month_diff = 9 THEN 1 ELSE 0 END) AS m9,
SUM(CASE WHEN month_diff = 10 THEN 1 ELSE 0 END) AS m10,
SUM(CASE WHEN month_diff = 11 THEN 1 ELSE 0 END) AS m11
FROM join_table
GROUP BY first_month
ORDER BY first_month
)
SELECT first_month AS cohort_month,
ROUND((m0/CAST(m0 AS decimal))*100,0) AS m0,
ROUND((m1/CAST(m0 AS decimal))*100,0) AS m1,
ROUND((m2/CAST(m0 AS decimal))*100,0) AS m2,
ROUND((m3/CAST(m0 AS decimal))*100,0) AS m3,
ROUND((m4/CAST(m0 AS decimal))*100,0) AS m4,
ROUND((m5/CAST(m0 AS decimal))*100,0) AS m5,
ROUND((m6/CAST(m0 AS decimal))*100,0) AS m6,
ROUND((m7/CAST(m0 AS decimal))*100,0) AS m7,
ROUND((m8/CAST(m0 AS decimal))*100,0) AS m8,
ROUND((m9/CAST(m0 AS decimal))*100,0) AS m9,
ROUND((m10/CAST(m0 AS decimal))*100,0) AS m10,
ROUND((m11/CAST(m0 AS decimal))*100,0) AS m11
FROM cohort_table;
Retention Rate Table
Image is created by Author using Minitab Software

The higher the retention rate, the more loyal customers are in making transactions at the GenggamData Store. As an example of an explanation of the retention rate in January 2023, it is found that it is at an average of 47% and does not show any trend. It can be concluded that around 47% of customers who first made transactions in January 2023 returned to make transactions in the following months in 2023. In general, the values lie in a range of 20 to 60, indicating that 20% - 60% of customers made transactions again after their first transaction.

3. Calculate the churn rate

The churn rate is known by churn rate (%) = 1 - retention rate (%). Thus, we can run the code as follows.

-- Churn Rate Table
WITH
first_month_transaction AS (
SELECT customer_id,
MIN(EXTRACT(MONTH FROM order_date)) AS first_month
FROM customer_transaction
GROUP BY customer_id
),
transaction_month AS(
SELECT DISTINCT customer_id,
EXTRACT(MONTH FROM order_date) AS month
FROM customer_transaction
),
join_table AS
(
SELECT tm.customer_id, tm.month,
fmt.first_month,
tm.month - fmt.first_month AS month_diff
FROM transaction_month AS tm
JOIN first_month_transaction AS fmt
ON tm.customer_id = fmt.customer_id
),
cohort_table AS
(
SELECT first_month,
SUM(CASE WHEN month_diff = 0 THEN 1 ELSE 0 END) AS m0,
SUM(CASE WHEN month_diff = 1 THEN 1 ELSE 0 END) AS m1,
SUM(CASE WHEN month_diff = 2 THEN 1 ELSE 0 END) AS m2,
SUM(CASE WHEN month_diff = 3 THEN 1 ELSE 0 END) AS m3,
SUM(CASE WHEN month_diff = 4 THEN 1 ELSE 0 END) AS m4,
SUM(CASE WHEN month_diff = 5 THEN 1 ELSE 0 END) AS m5,
SUM(CASE WHEN month_diff = 6 THEN 1 ELSE 0 END) AS m6,
SUM(CASE WHEN month_diff = 7 THEN 1 ELSE 0 END) AS m7,
SUM(CASE WHEN month_diff = 8 THEN 1 ELSE 0 END) AS m8,
SUM(CASE WHEN month_diff = 9 THEN 1 ELSE 0 END) AS m9,
SUM(CASE WHEN month_diff = 10 THEN 1 ELSE 0 END) AS m10,
SUM(CASE WHEN month_diff = 11 THEN 1 ELSE 0 END) AS m11
FROM join_table
GROUP BY first_month
ORDER BY first_month
)
SELECT first_month AS cohort_month,
100 - ROUND((m0/CAST(m0 AS decimal))*100,0) AS m0,
100 - ROUND((m1/CAST(m0 AS decimal))*100,0) AS m1,
100 - ROUND((m2/CAST(m0 AS decimal))*100,0) AS m2,
100 - ROUND((m3/CAST(m0 AS decimal))*100,0) AS m3,
100 - ROUND((m4/CAST(m0 AS decimal))*100,0) AS m4,
100 - ROUND((m5/CAST(m0 AS decimal))*100,0) AS m5,
100 - ROUND((m6/CAST(m0 AS decimal))*100,0) AS m6,
100 - ROUND((m7/CAST(m0 AS decimal))*100,0) AS m7,
100 - ROUND((m8/CAST(m0 AS decimal))*100,0) AS m8,
100 - ROUND((m9/CAST(m0 AS decimal))*100,0) AS m9,
100 - ROUND((m10/CAST(m0 AS decimal))*100,0) AS m10,
100 - ROUND((m11/CAST(m0 AS decimal))*100,0) AS m11
FROM cohort_table;
Churn Rate Table
Image is created by Author using Minitab Software

The lower the churn rate, the more loyal customers are in making transactions on the GenggamData Store. As an example of an explanation of the churn rate in January 2023, it is found that the churn rate is at an average of 53% and does not show any trend. It can be concluded that about 53% of customers who first made transactions in January 2023 did not return to make transactions in the following months in 2023. In general, the values lie in a range of 40 to 80, indicating that 40% - 80% of customers did not make transactions again after their first transaction.

In summary, cohort analysis especially retention rate is a commonly used and powerful tool in gaining an understanding of customer behavior. Many companies use this analysis to evaluate their policies, for the boost of their revenue.

That’s all of my article. If you have interest in this article please give a like and follow for more information like this. It is my pleasure for you who want to drop a comment on this article. Thank you!

Reference

[1] Glenn, N. D. (2005). Cohort analysis (Vol. 5). Sage.

--

--

Hafez Afghan

A data enthusiast who specialize in conducting project in time series, machine learning and other analysis. LinkedIn : https://www.linkedin.com/in/hafez-afghan/