Diligence at Social Capital Part 5: Depth of Engagement and Quality of Revenue

Jonathan Hsu
Published in The Startup
11 min read · Nov 9, 2015


[Note from the author: See an update to the thinking presented in these articles in my more recent writing on A Quantitative Approach to Product Market Fit and the follow-up Unit Economics and the Pursuit of Scale Invariance. I no longer work at Social Capital; if you want to reach me, you can email me at jonathan@tribecap.co]

In the first two parts of this series we discussed growth accounting and applied it to both engagement and revenue. In the third and fourth parts we discussed lifetime value (LTV) in businesses that generate revenue and then applied the LTV framework to retention and other engagement related value. The frameworks covered so far give a good sense of the underlying mechanics of top-line growth. Today, we’ll cover a somewhat orthogonal question that we often ask in diligence, namely, “What is the distribution of engagement across the user-base?” and the related revenue question “How solid is the revenue stream?”

The Engagement Case

Consider the consumer app case. One area that isn’t covered by the previous approaches is the depth of engagement. Start with a monthly active user (MAU) growth accounting (as covered in Part 1 of this series) for some app that shows some solid month-on-month retention with reasonable churn. With the frameworks we’ve so far covered we don’t currently have a good way of separating out those MAU into highly engaged users vs. marginally engaged users.

To quantify depth of engagement in a consumer app, the most generic per-user metric would be “days active in the month”. Recall the discussion in the second part of this series where we introduced the concept of L28. A user of L28=10 was active 10 days of the last 28. If you sum up L28 across all users in a month (as measured on the last day of the month) you get the total sum of DAU across the month. Each user’s L28 is an example of value being created and thus you can do growth accounting on the total L28 from month-to-month. The previous post on non-revenue LTV is essentially a cumulative lifetime measure of L28. Let’s take a single 28-day MAU period (to avoid day-of-week effects) and look at an example L28 distribution:
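To make the definition concrete, here is a minimal sketch of computing L28 per user from a raw activity log. The `(user_id, day)` event format, the function name `l28_by_user` and the sample events are all assumptions made for illustration; they are not taken from any real dataset.

```python
from collections import defaultdict
from datetime import date, timedelta


def l28_by_user(events, t_end, window=28):
    """Return {user_id: number of distinct active days} for the
    `window` days ending on t_end (inclusive)."""
    t_start = t_end - timedelta(days=window - 1)
    active_days = defaultdict(set)
    for user_id, day in events:
        if t_start <= day <= t_end:
            active_days[user_id].add(day)
    return {user: len(days) for user, days in active_days.items()}


# Hypothetical activity log: one (user_id, calendar day) pair per session.
t_end = date(2015, 10, 28)
events = [
    ("a", date(2015, 10, 1)),
    ("a", date(2015, 10, 1)),   # second session on the same day counts once
    ("a", date(2015, 10, 15)),
    ("b", date(2015, 10, 28)),
    ("c", date(2015, 9, 30)),   # outside the window, so "c" is not in the MAU set
]

l28 = l28_by_user(events, t_end)
print(l28)
```

Only users with at least one active day in the window appear in the result, so the keys of `l28` are exactly the MAU set measured at `t_end`.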

[Figure: Sample L28 distribution]

For concreteness, let’s pretend that the app has 100k MAU. The L28 distribution is now telling us that fully 34k of them are active only one day in the month. It also says that a small chunk, 3k, are super engaged, using the app every single day of the 28-day month at L28=28. This distribution provides a natural segmentation of users into “not engaged” vs. “medium engaged” and “highly engaged” which can be used for other analyses. This view is good but it turns out to be easier to read as a cumulative distribution function (CDF). Here’s the CDF of the above.

The blue line shows the CDF of users. This tells us a few things. For instance, it says that 50% of MAU are L28<=2. It also says that the top quintile of users (the top 20%) are L28>=5. The top 10% most engaged users are L28>=15. If the app is really one that is supposed to be used once per week then it would be reasonable to say that a given user has achieved product-market fit if they are L28>3 (that is, L28>=4). In that case, 25% of the MAU have truly hit product-market fit and the other 75% are not engaged enough for us to really believe that the app is delivering the desired level of value.
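As a rough sketch of how such a user CDF can be read off an L28 histogram, the snippet below builds the cumulative fractions and then takes the share of users above a threshold. The histogram values are made up for illustration and are not the data behind the chart discussed above; `user_cdf` is a hypothetical helper name.

```python
from itertools import accumulate

# Hypothetical histogram: L28 value -> number of users (illustrative only).
hist = {1: 40, 2: 20, 3: 15, 4: 10, 7: 10, 28: 5}


def user_cdf(hist):
    """Map each observed L28 value k to the fraction of users with L28 <= k."""
    keys = sorted(hist)
    total = sum(hist.values())
    running = accumulate(hist[k] for k in keys)
    return {k: count / total for k, count in zip(keys, running)}


cdf = user_cdf(hist)

# Share of users active more than 3 days in the window (L28 > 3).
share_weekly = 1 - cdf[3]

for k, frac in cdf.items():
    print(f"L28 <= {k:2d}: {frac:.0%} of users")
print(f"L28 > 3: {share_weekly:.0%} of users")
```

Reading a percentile is then a lookup: the smallest `k` whose cumulative fraction reaches 50% is the median L28.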

The green line is a slightly different measure. This is the CDF of “active days”. To make sense of this, first note that if you add up the L28 variable across all MAU then you get the same number as if you had added up DAU across the whole 28 day period (this may not be obvious, feel free to work it out). This is basically the same as averaging DAU across the whole period.

∑ DAU(t) = 28*average_DAU = ∑ L28(u, t_end)

Where the first sum is over the 28 days ending on t_end and the second sum is over all users u in the MAU set measured exactly at t_end.
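If the identity is not obvious, one way to convince yourself is to think of a users-by-days grid of activity flags: summing DAU adds the grid up column by column, while summing L28 adds it up row by row. The sketch below checks this on randomly generated activity; the 20% daily activity probability and the 1,000 users are arbitrary assumptions.

```python
import random

random.seed(0)
DAYS = 28

# Hypothetical activity grid: active[u][t] is True if user u was active on day t.
active = [[random.random() < 0.2 for _ in range(DAYS)] for _ in range(1000)]

# Keep only users active at least once in the window: the MAU set at t_end.
mau = [row for row in active if any(row)]

dau = [sum(row[t] for row in mau) for t in range(DAYS)]  # DAU(t), one per day
l28 = [sum(row) for row in mau]                          # L28(u, t_end), one per user

sum_dau = sum(dau)
sum_l28 = sum(l28)
average_dau = sum_dau / DAYS

print(sum_dau, sum_l28, round(average_dau, 1))
```

Both sums count the same set of (user, day) cells, so they agree for any activity pattern, not just this random one.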

For concreteness, considering our 100k MAU example, this data would be showing a total of 500k summed DAU across the month, which would be an average of 500k/28, or roughly 18k DAU, through the month. The CDF of days is telling you how much of this summed DAU figure is made up of each L28 bucket. For instance, it says that 7% of active days are made up of the L28=1 users. While these L28=1 users made up 34% of users they only make up 7% of average DAU because they don’t contribute to DAU as often as higher L28 users. Recall that we decided to declare L28>=4 as users who have hit product-market fit. In this example, such users comprise 25% of users but are contributing ~72% of total DAU (and hence average DAU). People often talk of the Pareto principle which roughly states that 80% of the value comes from 20% of the participants. Such an 80/20 for the visitation aspect of our example product is 72/25 which is not too far off. There are fancier ways to encapsulate this notion that don’t demarcate the 80% line as special (e.g. the Gini index). However, I find such measures harder to interpret and tend to prefer some combination of “80/20” measurements.
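The arithmetic in this paragraph can be checked directly, and the same calculation generalizes to any threshold. In the sketch below, the first part uses only the figures quoted in the text (100k MAU, 500k summed DAU, 34k users at L28=1). The second part uses a made-up histogram, since the data behind the chart is not available, so its output will not match the 72/25 figure; `share_at_or_above` is a hypothetical helper.

```python
# Figures quoted in the text.
mau = 100_000
summed_dau = 500_000
l28_1_users = 34_000

average_dau = summed_dau / 28                    # a little under 18k
l28_1_user_share = l28_1_users / mau             # 0.34
l28_1_day_share = l28_1_users * 1 / summed_dau   # 0.068, i.e. about 7%

# Hypothetical histogram: L28 value -> number of users (illustrative only).
hist = {1: 40, 2: 20, 3: 15, 4: 10, 7: 10, 28: 5}


def share_at_or_above(hist, k):
    """Return (share of users, share of active days) for users with L28 >= k."""
    total_users = sum(hist.values())
    total_days = sum(l28 * n for l28, n in hist.items())
    users = sum(n for l28, n in hist.items() if l28 >= k)
    days = sum(l28 * n for l28, n in hist.items() if l28 >= k)
    return users / total_users, days / total_days


user_share, day_share = share_at_or_above(hist, 4)
print(f"average DAU ~ {average_dau:,.0f}")
print(f"L28=1: {l28_1_user_share:.0%} of users, {l28_1_day_share:.1%} of active days")
print(f"L28>=4: {user_share:.0%} of users, {day_share:.0%} of active days")
```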

One common measure of engagement is the ratio DAU/MAU. This is actually related to the L28 framing. If you interpret “DAU/MAU” as average_DAU/MAU(t_end) then this is really 1/28 times the “average number of days active in the last month”. Since the sum of DAU is the same as the sum of L28, this is just the mean of the above L28 distribution divided by 28. In our 100k MAU example the mean L28 is 500k/100k = 5 days, so DAU/MAU is 5/28, or about 18%. The proposal here is to replace a single measure of the distribution (the average) with a view of the entire distribution of contribution to DAU. In general, the mean provides a good description of a distribution when it is close to normal. In this case, the L28 distribution is not even close to normally distributed so the mean is not as indicative of a typical experience as, say, the median or other percentile measures.
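A small sketch of the relationship between DAU/MAU and the mean of the L28 distribution, using the 100k MAU and 500k summed DAU figures from the example. The second half uses a made-up, heavily skewed sample (not the article's data) to show how far the mean can sit from the median in a distribution of this shape.

```python
import math
import statistics

# Figures quoted in the text.
mau = 100_000
summed_dau = 500_000

mean_l28 = summed_dau / mau            # average days active per MAU
dau_over_mau = (summed_dau / 28) / mau  # average_DAU / MAU(t_end)

# The two quantities differ only by the factor of 28.
print(mean_l28, round(dau_over_mau, 3), math.isclose(dau_over_mau, mean_l28 / 28))

# Hypothetical skewed sample of per-user L28 values (illustrative only).
sample = [1] * 40 + [2] * 20 + [3] * 15 + [4] * 10 + [7] * 10 + [28] * 5
sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)
print(sample_mean, sample_median)
```

In the made-up sample a handful of daily users pull the mean well above the median, which is the reason a single average says little about the typical user here.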

The L28 approach is also useful because it gives you a sense of the relative value of different users. If we believe that a user who is L28=5 is 5x more “valuable” than a user who is L28=1 then the L28 distribution is really the distribution of “value” across the user-base. Whether an L28=5 user is actually 5x more valuable than an L28=1 user depends on your business. For instance, if your product is ads driven then the user is spending roughly 5x more time and is hence roughly 5x more valuable. If your definition of “active” means “spent money in a transaction” then this breakdown of value is clearly directly related to monetary value and the L28=5 user is clearly ~5x more valuable than the L28=1 user.

In terms of how we’d judge the above company at Social Capital, it would depend on the nature of the actual product. If it’s really a weekly use-case type of app (like, say, an app that sells tickets on the weekend) this would be pretty good. Fully 25% of users are using it every week. However, if it’s meant to be an ongoing content consumption app this is only so-so as 75% of users don’t use it once per week. This data isn’t useful without the product context. Conversely, having only product context is not very useful without objective metrics such as the ones described here.

From Depth of Engagement to Quality of Revenue

As in the discussion around growth accounting and LTV, we can take this L28 approach and abstract it to other forms of value that a customer generates. A customer may generate value by contributing DAU to your business. A customer may also generate value by spending money.

Let’s pretend that we have a B2B SaaS product that sells to companies and charges them $1 per seat per month and that may also offer some premium services on a per-customer basis. Each customer is a company so there is a distribution of how much each customer is paying. Shown below is the distribution of revenue (monthly recurring revenue aka MRR) per customer. This is analogous to the L28 distribution above.

This distribution is a bit hard to read. There are a bunch of customers spending less than ~$100 and clearly a few customers who are spending a lot (either through up-sells or more seats). As in the L28 case, this data is easier to view in a CDF:

This view makes it easier to read off what’s going on. The red line is the CDF of customers and says, for instance, 50% of customers are spending less than $20. It also says that the top 20% of customers are spending more than $63. The blue line is the CDF of total revenue and is the analogue of the CDF of total L28 from the engagement example. This says that 30% of revenue comes from customers spending less than $50. The top 20% of customers spending more than $63 are responsible for 66% of all revenue. So 80/20 for this company is 66/20 which is to say that the lower end customers make up more of the business than a typical Pareto principle would suggest.
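Both curves, and the “X/20” summary, can be computed from a plain list of per-customer MRR. The sketch below is one way to do it; the MRR values are invented for illustration and the helper names `revenue_cdfs` and `top_customer_share` are assumptions, not part of any existing tool.

```python
def revenue_cdfs(mrr):
    """For each customer in ascending MRR order, return a tuple of
    (MRR, cumulative share of customers, cumulative share of revenue)."""
    ranked = sorted(mrr)
    total = sum(ranked)
    points = []
    running = 0
    for i, amount in enumerate(ranked, start=1):
        running += amount
        points.append((amount, i / len(ranked), running / total))
    return points


def top_customer_share(mrr, top_frac=0.20):
    """Share of total revenue contributed by the top `top_frac` of customers."""
    ranked = sorted(mrr, reverse=True)
    n_top = max(1, round(len(ranked) * top_frac))
    return sum(ranked[:n_top]) / sum(ranked)


# Hypothetical monthly recurring revenue per customer, in dollars.
mrr = [5, 8, 10, 12, 15, 20, 30, 50, 150, 700]

points = revenue_cdfs(mrr)
share = top_customer_share(mrr)

for amount, cust_frac, rev_frac in points:
    print(f"${amount:>4}: {cust_frac:.0%} of customers, {rev_frac:.1%} of revenue")
print(f"top 20% of customers -> {share:.0%} of revenue")
```

With these made-up numbers the summary would be “85/20”, a more concentrated revenue base than the 66/20 example in the text.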

You could imagine a couple of extreme versions of these distributions. For instance, consider the monthly revenue from Spotify. In that case the vast majority comes from users at a single price point in which case the red line would be zero until that price point and then abruptly go up to nearly 100% at the price point. The CDF of revenue would be very similar to the CDF of users and this would be a case of “20/20”. At another extreme, consider something like FarmVille or any of the other old social/Facebook games. In those cases there are a bunch of users who spend a little and a very small number that spend a whole lot (“whales”). As such, the blue line would be way below the red line as the top few percent would make up a very large portion of the revenue. This would be a case of “99/20”. You should be able to convince yourself that the blue line here is always at or below the red line.
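The two extremes are easy to simulate. The sketch below uses two invented revenue lists, one where every customer pays the same price and one dominated by a few very large spenders, and also checks numerically that the cumulative revenue share never exceeds the cumulative customer share when customers are sorted by ascending spend. The data and helper names are assumptions made for illustration.

```python
def top_customer_share(spend, top_frac=0.20):
    """Share of total revenue contributed by the top `top_frac` of customers."""
    ranked = sorted(spend, reverse=True)
    n_top = max(1, round(len(ranked) * top_frac))
    return sum(ranked[:n_top]) / sum(ranked)


def revenue_cdf_never_above_customer_cdf(spend):
    """Walk customers in ascending spend order and check that the cumulative
    revenue share stays at or below the cumulative customer share."""
    ranked = sorted(spend)
    total = sum(ranked)
    running = 0
    for i, amount in enumerate(ranked, start=1):
        running += amount
        if running / total > i / len(ranked) + 1e-12:
            return False
    return True


# Hypothetical extremes (illustrative only).
subscription = [10] * 100        # everyone pays the same monthly price
whales = [1] * 95 + [1000] * 5   # many small spenders, a few very large ones

subscription_share = top_customer_share(subscription)
whale_share = top_customer_share(whales)

print(f"single price point: {subscription_share:.0%}/20")
print(f"whale-driven:       {whale_share:.0%}/20")
print(revenue_cdf_never_above_customer_cdf(subscription),
      revenue_cdf_never_above_customer_cdf(whales))
```

The ordering holds because the smallest-spending customers, by construction, spend no more than the average customer, so any bottom slice of customers contributes at most its proportional share of revenue.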

For a B2B example, consider Google Cloud Services. It was rumored that Snapchat spends something like $25–30m on Google Cloud Services each year and that they are by far the largest spender on the service. If you were to compute the CDF of Google Cloud customers alongside the CDF of revenue, most customers would be spending a reasonable amount each month, but the blue line would be way below the red line because the Snapchat outlier would sit at a very high value, single-handedly making up a large fraction of total revenue.

You might be wondering why I called this section “quality of revenue”. The idea is that, in the context of enterprise SaaS, customers who are spending a lot of money in a highly recurring fashion are generally higher quality customers because they are more likely to survive a downturn. The customers on the right end of the revenue distribution are the flagship customers who are getting a lot of value from the product and would likely continue being customers in a weaker macro environment. For enterprise startups it’s often the case that small customers with low contract values are other small VC backed startups. If/when funding dries up it will be these small customers that are at higher risk of going out of business or downsizing their spend and so that part of the revenue stream would be at risk. People often talk about “average contract value” (ACV) when talking about enterprise SaaS companies. For MRR driven businesses the above approach shows the entire distribution of contract values which gives us more nuanced information than just the average. We typically encapsulate such an analysis by stating that X% of the revenue stream is subject to a downturn in startup-driven spending vs. Y% that will survive because it’s coming from high quality customers.
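One simple way to produce the “X% at risk vs. Y% resilient” split is to pick an MRR threshold below which customers are treated as exposed to a downturn. Using contract size alone as a proxy for risk is an assumption of this sketch; in practice you would want to look at who the customers actually are. The MRR values and the $50 threshold are invented for illustration.

```python
import statistics


def revenue_at_risk(mrr, threshold):
    """Split total revenue into the share from customers paying less than
    `threshold` (treated here as at risk) and the share from the rest."""
    total = sum(mrr)
    at_risk = sum(amount for amount in mrr if amount < threshold)
    return at_risk / total, (total - at_risk) / total


# Hypothetical monthly recurring revenue per customer, in dollars.
mrr = [5, 8, 10, 12, 15, 20, 30, 50, 150, 700]

at_risk_share, resilient_share = revenue_at_risk(mrr, threshold=50)
average_contract = statistics.mean(mrr)
customers_below_average = sum(1 for amount in mrr if amount < average_contract)

print(f"{at_risk_share:.0%} of revenue at risk, {resilient_share:.0%} resilient")
print(f"average contract ${average_contract:.0f}; "
      f"{customers_below_average} of {len(mrr)} customers pay less than that")
```

In this made-up example the average contract is $100, yet eight of ten customers pay less than that, which illustrates why the average alone can give a misleading picture of the revenue base.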


Let’s review the three areas that we’ve discussed in this series:

  • Growth accounting provides a framework for understanding the underlying components that drive net growth by unpacking top-line new customers from resurrection and churn. The framework applies not just to growing users but to growing anything of value including subscription revenue, visitation, posting behavior, etc.
  • The textbook explanation of lifetime value (LTV) is not the most useful one for understanding early stage companies (or early stage features). Empirical N-week or N-month LTV is preferred. The idea of LTV can be generalized from revenue generation to other activities of value such as cumulative visitation, referrals, etc.
  • Aggregate measures of engagement such as DAU/MAU provide a limited lens for understanding depth of engagement. A better practice is to observe the full distribution of depth of engagement. The same is true of revenue generation, where ACV is a limited view compared with understanding the full distribution of contract values.

With these approaches we’ve given a pretty complete view of how we quantitatively think about product-market fit when conducting diligence on startups.

At Social Capital we also have an in-house growth team that spends most of its time embedded in portfolio companies helping them execute their growth, data and user acquisition strategies. The tools described in these posts are not just useful for a one-time determination of product-market fit but are even more powerful when integrated into the ongoing product development process. Hopefully you find them useful for understanding and driving your own business. In the future we hope to write more about our experience helping entrepreneurs grow their companies using these frameworks in specific contexts.

As always, feel free to comment or email me at jonathan@socialcapital.com if you have any questions.

Edit: For reference, here’s the full table of contents.

  1. Accounting for user growth
  2. Accounting for revenue growth
  3. Empirically observed cohort lifetime value (revenue)
  4. Empirically observed cohort lifetime value (engagement)
  5. Depth of engagement and quality of revenue
  6. Epilogue: The 8-Ball and GAAP for Startups

Jonathan Hsu
Co-Founder and General Partner at Tribe Capital, data scientist, jazz guitarist, physicist…