Retention Series

How to color and read user cohorts properly

Cohort data varies, but that does not mean you need to overreact to every fluctuation

Paul Levchuk
6 min read · Mar 21, 2024

In one of my previous posts, I showed you a user cohort chart with conditional formatting applied to help you understand how quickly user cohorts can diminish.

Today I will talk about how to draw and read user cohorts.

In general, user cohorts can be represented by:

  • absolute figures of users active in a specific period
  • percentage of users active in a specific period

Absolute figures with conditional formatting

Let’s generate user cohorts with absolute figures and add conditional formatting. Our user cohort chart could look like this:

User cohorts: absolute figures.

The initial idea of using conditional formatting to better understand user retention patterns sounds reasonable. But, as usual, the devil is in the details.

What are our first impressions from the chart above?

  • early user cohorts were worse
  • over time user cohorts became better

Is it true?

Frankly speaking, we can’t be sure, as different cohorts have different scales:

  • user cohort 3/2 has 1014 signed-up users
  • user cohort 3/11 has 1467 signed-up users (about 45% larger)

To check our gut feeling we need to switch from absolute figures to percentages.
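As a minimal sketch of that switch, the Python snippet below divides each period’s active-user count by the cohort’s size at t = 0. The cohort sizes 1014 and 1467 come from the chart above; the counts for later periods and the function name `to_retention_pct` are made up purely for illustration.

```python
def to_retention_pct(active_users):
    """Convert absolute active-user counts into % of the t = 0 cohort size."""
    cohort_size = active_users[0]
    if cohort_size <= 0:
        raise ValueError("cohort size at t = 0 must be positive")
    return [round(100 * n / cohort_size, 1) for n in active_users]

# t = 0 sizes are from the article; counts for t = 1 and t = 2 are hypothetical.
cohorts = {
    "3/2": [1014, 304, 203],
    "3/11": [1467, 396, 264],
}

retention = {name: to_retention_pct(counts) for name, counts in cohorts.items()}

# How much larger the 3/11 cohort is at sign-up than the 3/2 cohort.
size_ratio = cohorts["3/11"][0] / cohorts["3/2"][0]
```

With these illustrative numbers, the larger cohort has more active users in every period yet lower retention in every period, which is the kind of pattern that coloring absolute figures can hide.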

Percentage figures with conditional formatting

Let’s calculate user cohorts as percentages and add conditional formatting again. Our user cohort percentage chart could look like this:

User cohorts: percentage figures.

What are our first impressions now? Do we see that user cohorts become better over time?

Actually, it’s not obvious from the chart above.

Let’s take the two cohorts mentioned earlier (3/2 and 3/11) and visualize them.

User cohort 3/2

User cohort 3/2: absolute figures.
User cohort 3/2: percentage figures.

As we can see from the charts above, user cohort 3/2, which looked the worst in absolute figures, does not look the worst in percentages.

User cohort 3/11

User cohort 3/11: absolute figures.
User cohort 3/11: percentage figures.

Again, from the charts above, user cohort 3/11, which looked the best in absolute figures, does not look the best in percentages.

What should we learn from this?

It’s not recommended to use conditional formatting to color user cohorts in absolute figures, as it can be misleading.

OK, probably it’s still a good idea to use conditional formatting to color user cohorts in percentages.

Let’s return to the user cohorts chart with percentage figures and conditional formatting.

User cohorts: percentage figures.

Can we spot any issues with user cohort retention in the first period t = 1?

I chose period t = 1 because, as I showed in one of my previous posts, it is one of the most critical periods for future user cohort retention.

Returning to my question: using the chart above as is, I can’t find any issues with user retention.

Actually, I can’t find any issues because of the default conditional formatting applied to the chart:

  • too many colors around
  • not enough color diversity for the period of interest t = 1
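To make the second point concrete, here is a rough sketch that assumes the default formatting is a single linear color scale stretched between the smallest and largest value in the whole table (the exact rule depends on your tool, so treat this as an assumption). The retention numbers and the helper `scale_positions` are hypothetical.

```python
def scale_positions(values, lo, hi):
    """Map values to positions in [0, 1] on a linear color scale from lo to hi."""
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical retention percentages, not taken from the article's chart.
table = {
    "t=0": [100.0, 100.0, 100.0, 100.0],
    "t=1": [31.0, 27.0, 33.0, 29.0],
    "t=2": [21.0, 18.0, 22.0, 19.0],
}

# One scale for the whole table: from the global minimum to the global maximum.
all_values = [v for column in table.values() for v in column]
t1_positions = scale_positions(table["t=1"], min(all_values), max(all_values))

# Share of the whole color scale that the t = 1 column actually uses.
t1_spread = max(t1_positions) - min(t1_positions)
```

With these made-up numbers, the t = 1 column occupies only about 7% of the color range, so a 6-point difference in retention is rendered in nearly the same shade.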

Can we do better?

Absolute + Percentage figures with conditional formatting using statistics

Below are two recommendations that I would give you when you are working with user retention cohorts:

  1. take into account the user cohort context
  2. stop using conditional formatting based on raw values; use statistics instead

Let me unpack these recommendations.

User cohort retention is a composite metric: [# users at period t] / [# users at period t = 0].

If we want to spot user retention issues (decrease in user retention), we need to watch carefully for 3 cases:

  • [# users at period t] has decreased
  • [# users at period t = 0] has increased
  • [# users at period t] has decreased and [# users at period t = 0] has increased at the same time
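The three cases above can be sketched as a small helper that compares two cohorts and names which part of the ratio moved against retention. The function name `retention_drop_driver`, the labels it returns, and the numbers in the usage lines are all hypothetical.

```python
def retention_drop_driver(prev, curr):
    """Name which part of the ratio moved against retention between two cohorts.

    prev and curr are (users_at_t0, users_at_t) tuples with users_at_t0 > 0.
    """
    prev_t0, prev_t = prev
    curr_t0, curr_t = curr
    if curr_t / curr_t0 >= prev_t / prev_t0:
        return "no drop"
    numerator_down = curr_t < prev_t
    denominator_up = curr_t0 > prev_t0
    if numerator_down and denominator_up:
        return "both"
    # When retention drops, at least one of the two conditions must hold.
    if numerator_down:
        return "fewer users at period t"
    return "more users at period t = 0"

# Hypothetical cohorts: (sign-ups at t = 0, users active at t = 1).
baseline = (1000, 300)
case_numerator = retention_drop_driver(baseline, (1000, 250))
case_denominator = retention_drop_driver(baseline, (1400, 300))
case_both = retention_drop_driver(baseline, (1400, 280))
case_none = retention_drop_driver(baseline, (1000, 320))
```

The second case is the one to watch during acquisition scaling: the number of retained users is unchanged, yet retention falls because the cohort itself grew.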

Marketing constantly tries to scale user acquisition. Sometimes it works like a charm, sometimes not.

For us, it means that:

  1. [# users at period t = 0] will vary constantly, and we need to figure out when this variation is concerning.
  2. Scaling, as a rule, involves expanding ad targeting. This, in turn, increases the number of new users with low intent.

So, we need to know whether the marketing team is currently in User Acquisition (UA) scaling mode and, if so, how that affects user retention.

Below is an alternative user cohort chart with conditional formatting using statistics (I decided to keep old and new charts together to simplify comparison).

Focused conditional formatting using statistics.

How is the bottom chart colored?

  1. [# users at period t = 0] and [% user retention at period t = 1] use the same color scheme
  2. for each metric, we calculate AVERAGE() and STDEV.S() over the previous 7 days (excluding the day being analyzed) and use AVERAGE() + 2*STDEV.S() as the upper threshold and AVERAGE() - 2*STDEV.S() as the lower threshold
  3. if the cell value is above the upper threshold, we color the cell green; if it is below the lower threshold, we color it red; otherwise the cell stays uncolored
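The coloring rule above can be sketched in Python roughly as follows. `statistics.stdev` computes the sample standard deviation, which corresponds to STDEV.S(). The function name `flag_outliers`, its parameters, and the retention series are hypothetical and only illustrate the rule.

```python
from statistics import mean, stdev

def flag_outliers(values, window=7, k=2.0):
    """Flag each value against mean ± k * sample stdev of the preceding `window` values.

    Returns "green", "red", or None per value. The first `window` values
    get None because they lack a full lookback.
    """
    flags = []
    for i, value in enumerate(values):
        if i < window:
            flags.append(None)
            continue
        history = values[i - window:i]  # excludes the day being analyzed
        center = mean(history)
        spread = stdev(history)  # sample standard deviation, like STDEV.S()
        if value > center + k * spread:
            flags.append("green")
        elif value < center - k * spread:
            flags.append("red")
        else:
            flags.append(None)
    return flags

# Hypothetical t = 1 retention percentages for consecutive daily cohorts.
retention_t1 = [30.0, 31.0, 29.0, 30.0, 31.0, 29.0, 30.0, 30.5, 24.0, 30.0]
flags = flag_outliers(retention_t1)
```

In this made-up series only the cohort with 24.0% retention is flagged red. One side effect worth knowing: a flagged day stays in the lookback window of the following days and widens their thresholds, so later anomalies may be harder to catch.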

What’s the story we can learn after we apply such an approach?

  • Cohorts from 3/1 to 3/4 vary by user retention but we don’t need to react. It’s a natural variation.
  • Cohort 3/5 had some retention issues. It’s advisable to dig deeper into this.
  • Cohorts from 3/6 to 3/9 started growing a bit. User retention varies, but we don’t need to react.
  • Cohorts from 3/10 to 3/11 were scaled greatly. The number of users increased considerably and user retention dropped. That often happens with paid UA.
  • Since then, nothing interesting has happened from an analytical standpoint.

As a result, we uncovered the story in the data and identified 2 cases that needed our attention:

  • we need to check data for user cohort 3/5
  • we need to give the marketing team feedback about scaling user cohort 3/10. The marketing team made changes to paid ad targeting; by the next day user retention was OK, so no further urgent action is needed.

Is this approach universal and robust?

Unfortunately, it’s not. For example, a lookback window of 7 days is arbitrary.

However, two main goals were largely achieved:

  • distinguish signals from noise
  • decrease the number of false positive signals from default conditional formatting

SUMMARY:

All analytics vendors can prepare user cohorts and color them using conditional formatting.

Unfortunately, the way they do it hides the story inside your data.

In the next post, I will apply this approach to all cohort periods and we will see what we can find there.
