Having a VC Cohort Conversation: What VCs expect from Cohort Analysis

Once upon a time I founded a company. I now play a different role in the fairy tale as a VC. One of the things I wish I’d understood better as I was building my business was the importance of cohort analysis as a decision-making tool. It is now a key tool I use as a VC to judge the health of businesses I look at investing in. I don’t think I’m alone in this, as I now frequently give advice on this topic to early-stage founders. Herein lies my how-to guide.

Cohort analysis is the study of the behavior of groups over time, and it is a critical piece of understanding how your business is performing and how its unit economics are evolving. For fast-growing companies, high-level data like total revenue can hide whether the company has healthy user retention, because adding new customers or new markets masks what existing customers are doing. For example, let’s say you’re running an eCommerce company selling dresses. Total revenue might be increasing 100% year-over-year, but if this is all being driven by adding new customers, that’s not a sustainable model (eventually you will run out of new people to acquire, or it will get cost prohibitive to acquire new customers). In a healthy business you would expect to see revenue growth driven, in part, by repeat purchases.

Typically for eCommerce companies we do a cohort analysis grouping users by month of first purchase and tracking revenue over time. If you see that most customers who purchased 12 months ago are no longer shopping, it’s a red flag. In contrast, if 60% of users are retained in month 12 and are continuing to purchase dresses, the business is more interesting and it will be easier to continue to grow revenue over the longer term. As an example, the table tracking your net revenues might look something like this:

Table 1. Net Revenues by Month of First Purchase
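If your data lives in raw order records rather than a spreadsheet, a table like this can be assembled in a few lines. What follows is only a minimal sketch in plain Python, not the method behind the sample spreadsheet. The record layout (customer id, order date, net revenue), the function names, and the numbers are all made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order records: (customer_id, order_date, net_revenue).
# The values are invented purely to illustrate the grouping.
orders = [
    ("a", date(2015, 1, 5), 100.0),
    ("a", date(2015, 2, 10), 40.0),
    ("b", date(2015, 1, 20), 80.0),
    ("b", date(2015, 4, 2), 30.0),
    ("c", date(2015, 2, 14), 120.0),
    ("c", date(2015, 3, 1), 50.0),
]


def month_index(d):
    """Count months on a single scale so two dates can be subtracted."""
    return d.year * 12 + (d.month - 1)


def cohort_revenue(orders):
    """Return {(cohort_year, cohort_month): {month_number: net_revenue}}.

    month_number is 1 for the month of first purchase, 2 for the next, etc.
    """
    # Pass 1: find each customer's first purchase date (their cohort).
    first = {}
    for customer, order_date, _ in orders:
        if customer not in first or order_date < first[customer]:
            first[customer] = order_date

    # Pass 2: add each order's revenue to its cohort row and month column.
    table = defaultdict(lambda: defaultdict(float))
    for customer, order_date, revenue in orders:
        start = first[customer]
        month_number = month_index(order_date) - month_index(start) + 1
        table[(start.year, start.month)][month_number] += revenue
    return {cohort: dict(row) for cohort, row in table.items()}


revenue_table = cohort_revenue(orders)
print(revenue_table[(2015, 1)])  # {1: 180.0, 2: 40.0, 4: 30.0}
```

Each row is a cohort (month of first purchase) and each column is the number of months since that first purchase, which is the same layout as Table 1.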

This takes the guesswork and generalizations out of retention, because the retention rate is baked into each month’s actual net revenue number. You can see this when you divide each month’s net revenue by the number of customers in the cohort (that is, the number of customers who made their first purchase in month 1). The table might look something like this:

Table 2. Average Revenue per User by Cohort (month of first purchase)
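The division itself is simple, and a short sketch makes the mechanics explicit. The cohort labels, revenue figures, and cohort sizes below are hypothetical, as is the function name `arpu_by_cohort`.

```python
# Hypothetical net revenue per cohort, one entry per month since first purchase.
net_revenue = {
    "Jan '15": [10000.0, 4000.0, 3500.0],
    "Feb '15": [12000.0, 5400.0],
}

# Hypothetical cohort sizes: customers who made a first purchase that month.
cohort_size = {"Jan '15": 200, "Feb '15": 240}


def arpu_by_cohort(net_revenue, cohort_size):
    """Divide every cell in a cohort's row by that cohort's starting size."""
    return {
        cohort: [round(revenue / cohort_size[cohort], 2) for revenue in row]
        for cohort, row in net_revenue.items()
    }


arpu = arpu_by_cohort(net_revenue, cohort_size)
print(arpu["Jan '15"])  # [50.0, 20.0, 17.5]
```

Note that the denominator stays fixed at the original cohort size, even in later months when fewer of those customers are still buying. That is what bakes retention into the number.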

To make the chart less busy and the trends easier to see, here’s the same data rolled up by quarter:

Chart 1. ARPU by Quarter Cohort over Time
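There is more than one way to roll monthly cohorts up into quarters. One reasonable approach, sketched here as an assumption rather than as the method used in the sample spreadsheet, is to pool the revenue and the customer counts of the monthly cohorts in each quarter, aligned by months since first purchase. All names and numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical monthly cohorts: (year, month) -> net revenue by month number.
monthly_revenue = {
    (2015, 1): [10000.0, 4000.0, 3000.0],
    (2015, 2): [12000.0, 5000.0],
    (2015, 3): [8000.0],
}
cohort_size = {(2015, 1): 200, (2015, 2): 240, (2015, 3): 160}


def quarter_of(year, month):
    """Map a calendar month to its (year, quarter) pair."""
    return (year, (month - 1) // 3 + 1)


def quarterly_arpu(monthly_revenue, cohort_size):
    """Pool monthly cohorts into quarterly cohorts and compute ARPU.

    For each month number, only cohorts old enough to have data for that
    month contribute to both the revenue and the customer count.
    """
    revenue = defaultdict(lambda: defaultdict(float))
    customers = defaultdict(lambda: defaultdict(int))
    for (year, month), row in monthly_revenue.items():
        quarter = quarter_of(year, month)
        for month_number, amount in enumerate(row, start=1):
            revenue[quarter][month_number] += amount
            customers[quarter][month_number] += cohort_size[(year, month)]
    return {
        quarter: {
            n: round(revenue[quarter][n] / customers[quarter][n], 2)
            for n in sorted(revenue[quarter])
        }
        for quarter in revenue
    }


print(quarterly_arpu(monthly_revenue, cohort_size)[(2015, 1)])
# {1: 50.0, 2: 20.45, 3: 15.0}
```

The design choice to watch is the denominator: younger monthly cohorts have fewer months of history, so the later points on a quarterly line rest on fewer customers than the early ones.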

This is dummy data, but pretty typical of what you’d expect to see for an early-stage eCommerce company. Revenues generally drop off in month 2 but will hopefully find a steady state in the out-months. If we look at the blue Q1 ’15 line as an example, the fact that the line in month 14 is above $0 means that customers acquired in Q1 ’15 are still purchasing 14 months after their first purchase. The higher the line, the better the company is at getting customers to stick around and repeat purchase. You might also see the line curve back up in the out-months. That can happen if the retained users are power users who increase their spend by buying more of the same thing, or if the company expands into new categories and is able to capture more revenue per user.

Often, as a next step, we’ll convert the data from revenue to Lifetime Value (LTV), which we look at on a contribution margin basis. To convert revenue into lifetime value, multiply your net revenue per user by your contribution margin percentage. For example, let’s say your contribution margin is 35%. You would take row 1 from Table 2 (customers who made a first purchase in Jan ’15) and multiply each cell by 35%.
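As a quick illustration of that multiplication, here is a minimal sketch. The 35% margin comes from the example above, while the ARPU values for the Jan ’15 row are made up.

```python
CONTRIBUTION_MARGIN = 0.35  # the 35% from the example above

# Hypothetical ARPU for the Jan '15 cohort, one value per month since
# first purchase (invented numbers, not taken from Table 2).
arpu_jan_15 = [50.0, 20.0, 18.0, 16.0]

# Contribution dollars per user in each month: ARPU times contribution margin.
contribution_jan_15 = [round(x * CONTRIBUTION_MARGIN, 2) for x in arpu_jan_15]
print(contribution_jan_15)  # [17.5, 7.0, 6.3, 5.6]
```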

Next, create a new table that captures the cumulative contribution $ to get a view of historical LTV by period of time (month, quarter). I’ve focused on quarters to simplify the visual. To do this, add the contribution dollars in a given quarter to the contribution dollars from all the prior quarters. That table could look like this:

Table 3. Cumulative Contribution $ by Quarter Cohort over Time
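The cumulative step is a running total along each cohort’s row. A minimal sketch follows, with hypothetical cohort labels and per-period contribution figures.

```python
from itertools import accumulate

# Hypothetical contribution dollars per user, one value per period since
# first purchase (invented numbers, not taken from Table 3).
contribution = {
    "Q1 '15": [17.5, 7.0, 6.3, 5.6],
    "Q2 '15": [18.0, 8.0, 7.0],
}

# Running total across periods gives historical LTV to date for each cohort.
cumulative_ltv = {
    cohort: [round(total, 2) for total in accumulate(row)]
    for cohort, row in contribution.items()
}
print(cumulative_ltv["Q1 '15"])  # [17.5, 24.5, 30.8, 36.4]
```

Because each value adds to the ones before it, a cumulative line never goes down. What matters is how steeply it keeps rising.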

Here is what charting the table produces:

Chart 2. Cumulative LTV (on CM Basis) by Quarter Cohort over Time

Here you can see a nice positive trend: LTV by cohort is increasing over time (each line is moving up and to the right). Even though there’s incomplete data for Q1 ’16, you can see that the LTV trendline for Q1 ’16 compares favorably with (is higher than) the one for Q1 ’15. That means the company is improving how it monetizes customers over time. The lines also don’t plateau in month 14. The steeper the lines, the more customers are spending over time.

If you’ve made it this far, congrats! Here is your reward: see the sample spreadsheet used to build the tables and charts here.

Note that in this example we’ve broken out cohorts by time period (month, quarter). For companies that are growing through geographic expansion, another critical way we’ll cut the data is by geography. For example, for a company like Uber we would want to create cohorts based on city to understand the characteristics of each market over time (e.g., how more mature markets hold up, whether they are continuing to grow, and at what point they reach market saturation). Other interesting cohort types are acquisition channel, customer type, platform, and use case.

Final thought: I recently saw an interesting open source cohort tool on Product Hunt. While I have not yet tried it out myself, at first glance it looks interesting and may save you some time.