Engagement Series

User Engagement — D1 retention

How to think about Day 1 retention analysis

Paul Levchuk
4 min read · Nov 21, 2023

Every product manager knows that D1 retention is the first significant milestone for their product. Whether a product can achieve high D1 retention is the question that distinguishes the best products from the rest.

High-level approach

When you have a measure and you want to improve it, two of the most powerful options are:

  1. divide user flow into some meaningful stages, define metrics for each stage, and then multiply them to get the target metric
  2. segment users by a few key behavioral events to learn the sizes of segments and their target metric

The two approaches give you different perspectives on what’s going on in the product.

Today I’m going to show you the first approach.

D1 retention decomposition by user stages

Our target metric is D1 retention.

It means that we want to know how many users return on D1 (usually defined as the window from 24 to 48 hours after the user’s sign-up).

What meaningful stages could we divide the user flow into?

  • 1st session (it’s always much less than 24 hours)
  • % of users who started 2nd session within 24 hours after sign-up
  • % of users who returned on D1 after 2nd session
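As a rough illustration of the stages above, here is a minimal sketch that computes the two conversions for one sign-up cohort and multiplies them. The input shapes (`signups`, `sessions`), the function name `stage_metrics`, and the toy data are assumptions made for this example, not the author's actual pipeline. It also assumes the first entry in each user's session list is the 1st session.

```python
from datetime import datetime, timedelta

DAY = timedelta(hours=24)


def stage_metrics(signups, sessions):
    """Compute the two stage conversions and their product for one cohort.

    signups:  {user_id: sign-up datetime} (hypothetical input shape)
    sessions: {user_id: list of session start datetimes, 1st session included}
    """
    total = len(signups)
    started_second = 0
    second_then_d1 = 0
    for user, signup in signups.items():
        starts = sorted(sessions.get(user, []))
        # Stage 2: the 2nd session starts less than 24 hours after sign-up.
        has_second = len(starts) >= 2 and starts[1] < signup + DAY
        # Stage 3: any session starts in the 24-48 hour window after sign-up.
        returned_d1 = any(signup + DAY <= s < signup + 2 * DAY for s in starts)
        if has_second:
            started_second += 1
            second_then_d1 += returned_d1
    pct_second = started_second / total if total else 0.0
    pct_d1_after_second = second_then_d1 / started_second if started_second else 0.0
    return {
        "pct_second_session": pct_second,
        "pct_d1_after_second": pct_d1_after_second,
        # Assumption: the target is taken as the product of the two stages.
        "pct_d1": pct_second * pct_d1_after_second,
    }


# Toy cohort of four users who signed up at the same moment (made-up data).
t0 = datetime(2023, 5, 20, 10, 0)
signups = {u: t0 for u in "abcd"}
sessions = {
    "a": [t0, t0 + timedelta(hours=3), t0 + timedelta(hours=26)],
    "b": [t0, t0 + timedelta(hours=5)],
    "c": [t0],
    "d": [t0, t0 + timedelta(hours=30)],  # back on D1, but no 2nd session in the first 24 hours
}
metrics = stage_metrics(signups, sessions)
print(metrics)
```

Note that user `d` comes back on D1 without a 2nd session in the first 24 hours, so the product does not count them. Whether such users belong in the target metric is a definitional choice this sketch does not settle.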

Now we are ready to shape a table with metrics and start our analytical research.

D1 retention decomposition by user stages

What can we learn from the table above?

  • marketing started scaling User Acquisition on 05/20. Probably the quality of traffic started decreasing and that’s why D1 retention decreased.
  • [% users who started 2nd session] and [% users who returned on D1 after 2nd session] vary a lot. So far it’s not clear whether there is a relationship between those factors and [% users returned on D1].

Let’s sort the table by our target metric, [% users returned on D1], in descending order. This could help to uncover some visual patterns.

D1 composition: sorted by D1 decreasing.

Indeed, the picture is clearer now:

  • on the days when [% users returned on D1] is high, our decomposition factors are high as well
  • on the days when [% users returned on D1] is low, our decomposition factors are low as well

It means two things:

  1. decomposition is meaningful
  2. it can be used to explain the variability of our target variable
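The sorting step, plus a quick numeric check of the visual pattern, could be sketched as follows. The daily rows are made-up numbers, and the Pearson correlation is an addition for illustration: the article itself relies on visual inspection of the sorted table.

```python
from math import sqrt

# Made-up daily rows:
# (label, % started 2nd session, % returned on D1 after 2nd session, % returned on D1)
daily = [
    ("day_1", 0.40, 0.60, 0.25),
    ("day_2", 0.42, 0.62, 0.27),
    ("day_3", 0.30, 0.50, 0.15),
    ("day_4", 0.28, 0.45, 0.12),
    ("day_5", 0.35, 0.58, 0.20),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Sort by the target metric, highest D1 retention first.
ranked = sorted(daily, key=lambda row: row[3], reverse=True)
for row in ranked:
    print(row)

# How strongly does each factor move together with the target?
d1 = [row[3] for row in daily]
r_second = pearson([row[1] for row in daily], d1)
r_after = pearson([row[2] for row in daily], d1)
print(round(r_second, 3), round(r_after, 3))
```

A clearly positive coefficient for both factors would support the same reading as the sorted table, although with only a handful of days it is an indication rather than proof.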

Clusterization

Now let’s run a simple k-means clusterization based on our two factors and aggregate the metrics at the cluster level:

D1 clusters summary

From the table we can learn:

  • the best D1 retention is achieved when both conversions are high (Cluster4)
  • even with low conversion to 2nd session, it’s possible to improve D1 retention provided that we successfully engage users on 2nd session (Cluster3)
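The article does not say which tool was used for clustering, so the following is only a sketch: a plain-Python k-means (Lloyd's algorithm) with a deterministic farthest-point initialization, run on made-up daily values of the two factors. The function names, the toy points, and the choice of initialization are assumptions. The cluster labels it produces are arbitrary indices and do not correspond to the Cluster1 to Cluster4 numbering in the table.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def init_centroids(points, k):
    """Farthest-point start: each new centroid is the point farthest from those chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until labels stop changing."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


def summarize(points, labels):
    """Average both factors and their product (D1 retention) per cluster."""
    summary = {}
    for (second, after), label in zip(points, labels):
        s = summary.setdefault(label, {"days": 0, "second": 0.0, "d1_after": 0.0, "d1": 0.0})
        s["days"] += 1
        s["second"] += second
        s["d1_after"] += after
        s["d1"] += second * after
    for s in summary.values():
        for key in ("second", "d1_after", "d1"):
            s[key] /= s["days"]
    return summary


# Made-up days: (% started 2nd session, % returned on D1 after 2nd session)
points = [
    (0.20, 0.30), (0.21, 0.70), (0.50, 0.31), (0.51, 0.71),
    (0.22, 0.32), (0.23, 0.72), (0.52, 0.33), (0.53, 0.73),
]
labels, centroids = kmeans(points, k=4)
summary = summarize(points, labels)
for cluster, stats in sorted(summary.items()):
    print(cluster, stats)
```

Both factors are shares between 0 and 1, so the sketch skips feature scaling. With inputs on different scales, standardizing them first would be the usual precaution.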

In general, [% users who returned on D1 after 2nd session] is higher than [% users who started 2nd session].

Since [% users returned on D1] is the product of the two factors, we can conclude that the factor [returning on D1 after 2nd session] matters more than the factor [returning to 2nd session] alone.

In other words:

it’s more important to deliver the 1st piece of value to users, even if that happens during the 2nd session or later, than just to bring users back for a 2nd session.

Now it’s time to look into clusters.

Let’s build a scatter plot and examine the spread of data points within clusters. To make the data easier to read, I added 2 median lines and used [% users who returned on D1] as the point size.

D1 retention clusters

It seems that medians could be used as a rule of thumb to figure out which dates are going to be the best ones in terms of D1 retention.
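The median rule of thumb can be written down in a few lines. This is a sketch on made-up daily values. The field names, the quadrant labels, and the strict "greater than the median" comparison are assumptions for illustration.

```python
from statistics import median

# Made-up daily values of the two factors.
days = [
    {"day": "day_1", "second": 0.20, "d1_after": 0.30},
    {"day": "day_2", "second": 0.22, "d1_after": 0.70},
    {"day": "day_3", "second": 0.50, "d1_after": 0.31},
    {"day": "day_4", "second": 0.52, "d1_after": 0.73},
    {"day": "day_5", "second": 0.21, "d1_after": 0.32},
    {"day": "day_6", "second": 0.51, "d1_after": 0.71},
]

# The two median lines from the scatter plot.
med_second = median([d["second"] for d in days])
med_after = median([d["d1_after"] for d in days])


def quadrant(day):
    """Place a day in one of the four quadrants formed by the two medians."""
    high_second = day["second"] > med_second
    high_after = day["d1_after"] > med_after
    if high_second and high_after:
        return "both high"
    if high_second:
        return "high 2nd session only"
    if high_after:
        return "high D1 after 2nd session only"
    return "both low"


quadrants = {d["day"]: quadrant(d) for d in days}
# Days above both medians are the candidates for the best D1 retention.
strong_days = [day for day, q in quadrants.items() if q == "both high"]
print(strong_days)
```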

To make sure the clusterization makes sense in the long run, let’s build cohorts based on clusters and check D7 retention:

Check D7 retention for clusters

From the table we can learn:

  • cohorts from Cluster2 (both factors are low) die off much faster than other cohorts
  • cohorts from Cluster4 (both factors are high) perform better than cohorts from other clusters, but not by much
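A cluster-level D7 check like the one above could be computed as follows. The cohort rows and field names are made up, and the article does not define D7 precisely, so the `returned_d7` counts are simply taken as given. The sketch pools users within each cluster rather than averaging daily rates, so large sign-up days weigh more.

```python
from collections import defaultdict

# Made-up daily cohorts, each tagged with the cluster its sign-up day fell into.
cohorts = [
    {"cluster": 2, "new_users": 1000, "returned_d7": 50},
    {"cluster": 2, "new_users": 1000, "returned_d7": 70},
    {"cluster": 4, "new_users": 500, "returned_d7": 60},
    {"cluster": 4, "new_users": 1500, "returned_d7": 180},
]


def d7_by_cluster(rows):
    """Pooled D7 retention per cluster: total D7 returners / total new users."""
    users = defaultdict(int)
    returned = defaultdict(int)
    for row in rows:
        users[row["cluster"]] += row["new_users"]
        returned[row["cluster"]] += row["returned_d7"]
    return {cluster: returned[cluster] / users[cluster] for cluster in users if users[cluster]}


d7 = d7_by_cluster(cohorts)
for cluster, rate in sorted(d7.items()):
    print(f"Cluster{cluster}: D7 retention {rate:.1%}")
```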

The last point should bring us to the conclusion that:

the user staging approach is mostly useful for detecting bad User Acquisition scenarios.

Next time I will show another approach: segmenting users by a few key behavioral events.
