Retention Series

User cohort retention

The critical concept to learn about for any start-up

Mar 5, 2024

User retention is one of the most important concepts that every startup team should understand well. To unpack it, we need to switch from calendar-based thinking to cohort-based thinking.

What’s a cohort?

A cohort is a group of people who share one characteristic tied to a common point in time. In the SaaS space, that shared characteristic could be:

a period when users signed up for a product or
a period when users completed an action in a product (started paying for a product, started using a product feature, etc.)

Of course, users in the cohort could share some other characteristics as well: source, sex, age, geo, etc. However, the main cohort condition is a common-in-time action.

Why do we need to define a cohort?

As a rule, there are 2 reasons to define a cohort:

we are interested in assessing the effect over time
we want to control the effect by assessing it for users with the same lifetime

So, a cohort is a special case of a segment.

What’s the difference between cohort and segment?

Cohort: a group of users with a shared experience during a specific time. We track their behavior over time to understand them and predict future actions.

Segment: a group of users with shared characteristics, regardless of time. We compare them to identify patterns and understand their behavior.

In other words:

Each cohort is a segment, but not every segment is a cohort.

Now we are ready to return to our initial topic — user cohort retention.

Retention: a great example of using a cohort

The idea of calculating user cohort retention is very straightforward:

define a cohort of users
check how many of them returned in period 1, period 2, …, period X

A period could be a day, week, month, or even year.
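To make the two steps above concrete, here is a minimal Python sketch. It assumes a hypothetical input format (a dict of sign-up dates for the users of one cohort, plus a list of (user_id, activity_date) events) and a 7-day period. None of this comes from the original data, and the name cohort_counts is illustrative only.

```python
from collections import defaultdict
from datetime import date


def cohort_counts(signups, events, period_days=7):
    """Count distinct active users per period for a single cohort.

    signups: dict mapping user_id -> sign-up date (all users of one cohort)
    events:  iterable of (user_id, activity_date) pairs
    Returns a dict mapping period index -> number of distinct active users.
    """
    active = defaultdict(set)
    for user_id, day in events:
        if user_id not in signups:
            continue  # the user belongs to another cohort
        # Period index is counted from the user's own sign-up date.
        period = (day - signups[user_id]).days // period_days
        if period >= 0:
            active[period].add(user_id)
    return {p: len(users) for p, users in sorted(active.items())}


# Hypothetical cohort of three users who signed up in the same week.
signups = {"a": date(2024, 1, 1), "b": date(2024, 1, 2), "c": date(2024, 1, 3)}
events = [
    ("a", date(2024, 1, 1)), ("b", date(2024, 1, 2)), ("c", date(2024, 1, 3)),
    ("a", date(2024, 1, 9)), ("b", date(2024, 1, 10)),
    ("a", date(2024, 1, 16)), ("a", date(2024, 1, 17)),
]
print(cohort_counts(signups, events))  # {0: 3, 1: 2, 2: 1}
```

Counting periods from each user's own sign-up date is one possible design choice. Counting from the start of the cohort window is another, and it gives slightly different numbers near period boundaries.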

Let’s start with the following user cohort data:

As we can see, there are 3 series in the table above: the cohort time series (t), the actual user count series (N), and the survival rate series (S).

There are a few general observations that we can make from the table above:

The cohort’s number of users (N) tends toward zero over time.
The biggest drop-off happens in the first period.
The retention (S) trend is not linear.

As is often the case, the best way to understand a process is to model it.

Before we start digging deeper I would like to mention that we can assess user cohort retention in 2 different ways:

by assessing how many users are using the product at period t compared to the initial period t = 0 (let’s call this approach the Survival curve)
by assessing how many users are using the product at period t compared to the previous period t - 1 (let’s call this approach the Retention curve)

Survival curve vs Retention curve. See charts near to S or R.
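The two definitions can be sketched in a few lines of Python. The user counts below are hypothetical (they are not the figures from the table), and the function names are illustrative.

```python
def survival_curve(counts):
    """S_t = N_t / N_0: share of the original cohort still active at period t."""
    n0 = counts[0]
    return [n / n0 for n in counts]


def retention_curve(counts):
    """R_t = N_t / N_(t-1): share of the previous period's users still active.

    R_0 is undefined (there is no previous period), so None is returned for it.
    """
    return [None] + [counts[t] / counts[t - 1] for t in range(1, len(counts))]


# Hypothetical user counts N_0..N_4.
counts = [1000, 600, 480, 432, 400]

print(survival_curve(counts))   # S: 1.0, 0.6, 0.48, 0.432, 0.4
print(retention_curve(counts))  # R: None, 0.6, 0.8, 0.9, ~0.926
```

The two series carry the same information: S at period t is the product of R from period 1 to period t. The retention curve simply makes changes in the long tail easier to see.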

Learning from the long tail of a survival curve can be challenging. The figures there become small, and figuring out what processes are happening is very hard.

For example, have you noticed that in the table above, starting from period t = 7, the survival curve drop-off rate stabilizes at 1 - 0.91 = 0.09?

Retention Modeling

To learn about how user cohorts can behave I generated 3 common scenarios:

Good cohort (first-period drop-off < 30%, long-tail R = 0.99)
Bad start cohort (first-period drop-off > 60%, long-tail R = 0.95)
Bad long-tail cohort (first-period drop-off in a range [30%, 60%], long-tail R = 0.91)
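A minimal sketch of how such scenarios could be generated is shown below. The first-period and long-tail values are picked from the ranges listed above. The middle of the curve is an assumption: retention is ramped linearly from the first-period value to the long-tail value over 5 periods, purely as a placeholder. The cohort size of 1,000 and the name simulate_cohort are also hypothetical.

```python
def simulate_cohort(n0, first_retention, tail_retention, ramp_periods=5, periods=12):
    """Return user counts N_0..N_periods for a synthetic cohort.

    Retention starts at first_retention in period 1 and moves linearly to
    tail_retention by period ramp_periods (a placeholder assumption, needs
    ramp_periods >= 2), then stays constant in the long tail.
    """
    counts = [float(n0)]
    for t in range(1, periods + 1):
        if t >= ramp_periods:
            r = tail_retention
        else:
            frac = (t - 1) / (ramp_periods - 1)
            r = first_retention + frac * (tail_retention - first_retention)
        counts.append(counts[-1] * r)
    return counts


# (first-period retention, long-tail retention) per scenario.
scenarios = {
    "good": (0.75, 0.99),           # first-period drop-off 25%
    "bad_start": (0.35, 0.95),      # first-period drop-off 65%
    "bad_long_tail": (0.55, 0.91),  # first-period drop-off 45%
}

for name, (first, tail) in scenarios.items():
    counts = simulate_cohort(1000, first, tail)
    print(name, [round(n) for n in counts])
```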

These 3 cohorts might look like this:

Based on the chart above, we can assume that to model user cohort retention we need to model 3 separate problems:

What’s the first-period drop-off?
At what level, and when, does the cohort drop-off rate stabilize?
How to model the middle cohort line? (this will be addressed in the next post)

The factors that impact a user cohort depend on the cohort type: sign-up cohort, payers cohort, or specific product feature usage cohort. The remaining part of the post will be devoted to sign-up cohorts.

What’s the first-period drop-off?

As a rule, key factors that impact the first-period drop-offs are:

User Acquisition (UA) source/medium/campaign
Sign-up form
Activation process

Let’s talk a bit about each factor.

UA source/medium/campaign

We all know that some UA sources can work for one product and not work for another. The main reason for this is different user intentions. User intention is often equated with traffic quality. However, even users with high intentions could fail to get into a product.

The trickiest thing about modeling this is to distinguish the main reason for the high first-period drop-off:

whether it’s because of traffic quality
whether it’s because of the product quality

The best recommendation that I can give you here is to find a proxy that will help you figure out how the qualified users behave in the product in the first session.

If the percentage of qualified users is decreasing, then the issue is traffic quality, provided that other factors have not changed.

Sign-up form

The intriguing aspect of acquiring new users relates to how much energy these users need to invest to get into the product. As a rule, any growth specialist will tell you that you need to simplify the sign-up form. Is that good advice?

If users have high intentions to solve their problem, they will fill in any reasonable form. If users are not motivated, the sign-up form can work as a gatekeeper.

Let me share with you one story.

A B2B company had a sign-up form with 6 fields. To increase the conversion of the sign-up form, it was decided to simplify it and, as a result, only 2 fields remained in that sign-up form. Result?

The conversion rate increased up to 30% but at the same time user cohort retention dropped considerably. Reason?

Non-motivated users just moved one step further within the funnel. That’s the reason why shortening Time-To-Value does not always work.

What does it mean for you?

You need to have realistic expectations about your UA traffic sources and sign-up forms.

You can definitely improve UA targeting and experiment with the sign-up form, but that does not mean you can shrink the first-period drop-off to zero. There will always be some natural percentage of users who are not into your product.

Activation process

In the previous paragraphs, I stated that:

the quality of the cohort is predetermined
users with high intentions could fail to get into a product

Do these points contradict each other?

Actually, no. It means that while some users never had a chance of getting into the product, other users are genuinely struggling to get into it.

The activation process for the latter is a great tool that could help to keep these users. Figuring out which key events these users should go through is the main way to achieve a successful user activation and decrease first-period drop-off.

Special memo for paid User Acquisition

Also, I would like to mention that the first-period drop-off has a huge impact on paid UA.

Let’s assume that we acquired 3 cohorts with different first-period drop-off rates (but these cohorts behaved the same after the first period):

Acc_good has first-period drop-off = 1 - 0.75 = 0.25
Acc_avg has first-period drop-off = 1 - 0.50 = 0.50
Acc_bad has first-period drop-off = 1 - 0.35 = 0.65

Acquiring each cohort cost us $1,500.

Different first-period drop-off rate analysis.

Even if the difference between cohorts in absolute figures shrinks from period to period, the accumulated effect will be very different.

Let’s assume that at each period 10% of users will pay $5.

Then accumulated revenue per cohort could look like this (right table):

Accumulated revenue per cohort with different first-period drop-offs.
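The mechanics behind such a table can be sketched as follows. The 10% paying share, the $5 payment, and the $1,500 cost come from the text. The cohort size of 1,000, the constant 0.95 retention after the first period, and the choice to let the full cohort pay in period 0 are assumptions made for this sketch, so its output will not necessarily match the figures quoted in this post.

```python
def accumulated_revenue(n0, first_retention, later_retention, periods,
                        pay_share=0.10, price=5.0):
    """Return accumulated revenue after each period (a list of length `periods`).

    Period 0 uses the full cohort, period 1 applies first_retention,
    and every later period applies later_retention (all assumptions).
    """
    users = float(n0)
    total = 0.0
    acc = []
    for t in range(periods):
        if t == 1:
            users *= first_retention
        elif t > 1:
            users *= later_retention
        total += users * pay_share * price
        acc.append(total)
    return acc


def payback_period(acc, cost):
    """Number of periods needed for accumulated revenue to cover the cost."""
    for n_periods, value in enumerate(acc, start=1):
        if value >= cost:
            return n_periods
    return None  # not paid back within the modeled horizon


COST = 1500  # acquisition cost per cohort, from the text

for name, first in {"Acc_good": 0.75, "Acc_avg": 0.50, "Acc_bad": 0.35}.items():
    acc = accumulated_revenue(1000, first, 0.95, periods=12)
    print(name, round(acc[-1]), payback_period(acc, COST))
```

Even under these simplified assumptions, the cohort with the smallest first-period drop-off accumulates the most revenue and pays back first.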

The first thing that might jump out at you is that the accumulated revenue per cohort over 12 periods looks quite different:

Acc_good cohort has accumulated revenue = $3,234
Acc_avg cohort has accumulated revenue = $2,323
Acc_bad cohort has accumulated revenue = $1,776

What’s probably even more important is how quickly we can reinvest revenue from the cohort into the next round of User Acquisition.

Depending on the length of the payback period these user cohorts have very different growth leverages:

Acc_good cohort has the shortest payback period = 4 periods. This allows you to acquire 1.16 new cohorts by the end of the 12th period.
Acc_avg cohort has a payback period = 6 periods. This allows you to acquire only 55% of a new cohort by the end of the 12th period.
Acc_bad cohort has a payback period = 9 periods. This allows you to acquire just 18% of a new cohort by the end of the 12th period.
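The shares above appear to follow from simple arithmetic on the figures already quoted: the revenue left after recovering the $1,500 cost, divided by the cost of one more cohort. The sketch below assumes exactly that interpretation, and the function name extra_cohorts is illustrative.

```python
COST_PER_COHORT = 1500  # from the text

# Accumulated revenue over 12 periods, as quoted in the text.
revenue_12_periods = {"Acc_good": 3234, "Acc_avg": 2323, "Acc_bad": 1776}


def extra_cohorts(revenue, cost=COST_PER_COHORT):
    """Share of a new cohort that the surplus over the acquisition cost can buy."""
    return (revenue - cost) / cost


for name, revenue in revenue_12_periods.items():
    print(name, round(extra_cohorts(revenue), 2))
# Acc_good 1.16
# Acc_avg 0.55
# Acc_bad 0.18
```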

As you can see, even one cohort with a short payback period makes a big difference.

If all new cohorts have such a short payback period, the total difference over a few years might be huge. That’s why optimizing the first-period drop-off is so important for fast growth, although it is not the only thing that matters.

Now it’s time to talk about the long-tail drop-off rate.

At what level, and when, does the cohort drop-off rate stabilize?

As I mentioned earlier, a cohort can be divided into 3 stages. The last one is often the longest. It’s called the long-tail stage.

The long-tail stage starts when the cohort drop-off rate becomes stable.

Depending on user cohort quality, the time it takes for the drop-off rate to stabilize might vary. In my experience in e-commerce and SaaS, every cohort eventually reaches a stable drop-off rate after 5–6 periods.
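One simple way to find where the long tail starts is to look for the first run of periods in which period-over-period retention barely changes. The sketch below assumes a tolerance of 0.005 and a window of 3 periods. Both thresholds, the sample data, and the function name are hypothetical choices, not values from this post.

```python
def stabilization_period(counts, tol=0.005, window=3):
    """Find the first period from which retention stays within `tol`
    for `window` consecutive periods.

    counts: user counts N_0..N_T for one cohort.
    Returns (period, average retention in the window), or None if the
    retention never stabilizes within the data.
    """
    # retention[i] is R for period i + 1, that is N_(i+1) / N_i.
    retention = [counts[t] / counts[t - 1] for t in range(1, len(counts))]
    for i in range(len(retention) - window + 1):
        chunk = retention[i:i + window]
        if max(chunk) - min(chunk) <= tol:
            return i + 1, sum(chunk) / window
    return None


# Hypothetical cohort: retention climbs to 0.95 and then stays there.
rates = [0.60, 0.80, 0.90, 0.95, 0.95, 0.95, 0.95]
counts = [1000.0]
for r in rates:
    counts.append(counts[-1] * r)

period, tail_retention = stabilization_period(counts)
print(period, round(tail_retention, 2), round(1 - tail_retention, 2))  # 4 0.95 0.05
```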

To model a long-tail drop-off rate, I generated 3 common cases:

the long-tail drop-off rate is very close to 0 (e.g. the S_good user cohort has a long-tail drop-off rate of 0.01)
the long-tail drop-off rate is very high (e.g. the S_bad_long_tail user cohort has a long-tail drop-off rate of 0.09)
the long-tail drop-off rate is high (e.g. the S_bad_start user cohort has a long-tail drop-off rate of 0.05)

Even if you think that long-tail drop-off rates like 0.01, 0.05, or 0.09 are not so important (compared to large first-period drop-offs), let me show that this impression is very misleading.

I modeled 3 cohorts that reached the long-tail stage with 100 users but each of them has a different long-tail drop-off rate:

Drop-off rates comparison: 0.01 vs 0.05 vs 0.09

From the chart above we can learn that in just 12 periods these cohorts have very different numbers of retained users:

User cohort with a long-tail drop-off = 0.01 still has 89 users (89%)
User cohort with a long-tail drop-off = 0.05 still has 54 users (54%)
User cohort with a long-tail drop-off = 0.09 just has 32 users (32%)
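These figures follow from compounding a constant drop-off rate: after n periods, a cohort keeps (1 - drop-off) to the power n of its users. A minimal check in Python, with an illustrative function name:

```python
def remaining_users(start_users, drop_off, periods):
    """Users left after `periods` periods at a constant drop-off rate."""
    return start_users * (1 - drop_off) ** periods


for drop_off in (0.01, 0.05, 0.09):
    print(drop_off, round(remaining_users(100, drop_off, 12)))
# 0.01 89
# 0.05 54
# 0.09 32
```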

What’s probably even more important is that if you want to play the LTV game, lifetime will play a crucial role:

the cohort with a long-tail drop-off = 0.01 has a lifetime of 100 periods
the cohort with a long-tail drop-off = 0.05 has a lifetime of 45 periods
the cohort with a long-tail drop-off = 0.09 has a lifetime of 30 periods

The tricky part here is not to overestimate the long tail. There are 2 reasons for this:

for many young startups focusing on the payback period is the only reliable way to work with paid UA
if you want to play the LTV game, you need to apply the Discounted Cash Flow (DCF) approach to adjust revenue in future periods
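As a rough illustration of the second point, the sketch below discounts a stream of per-period revenue back to the present. The 2% per-period discount rate and the flat $100 revenue stream are hypothetical, and treating period 0 as undiscounted is an assumption of this sketch.

```python
def discounted_revenue(revenue_per_period, discount_rate):
    """Present value of revenue received at periods 0, 1, 2, ...

    Revenue at period t is divided by (1 + discount_rate) ** t,
    so period 0 is taken at face value.
    """
    return sum(r / (1 + discount_rate) ** t
               for t, r in enumerate(revenue_per_period))


# Hypothetical stream: $100 per period for 24 periods.
stream = [100.0] * 24
print(round(sum(stream), 2))                       # nominal total: 2400.0
print(round(discounted_revenue(stream, 0.02), 2))  # discounted total is lower
```

The further into the long tail the revenue arrives, the less it is worth today, which is one reason not to lean too heavily on distant periods.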

SUMMARY:

A user cohort is the most reliable way to assess what’s going on with acquired users in the product. It’s advisable to split the user cohort into 3 drop-off stages:

first-period drop-off
middle cohort line
long-tail drop-off

Each stage has its factors that should be addressed differently.

In the next few posts, we will talk about some mistakes that analysts make while building or analyzing user cohorts.

Written by Paul Levchuk