Power laws with pooling: a more realistic model of venture returns
TL;DR: In the last post I built some models of venture portfolios of different sizes based on the idea that venture outcomes are power-law distributed. The conclusion was that, other things being equal, bigger portfolios should do better, with around 150 investments being table stakes. Those models assumed each investment was independent, with exactly one investor. The pooled model presented here is more realistic; it dampens expected returns without changing the overall pattern that more is better. The interesting implications are that deal flow and brand are key for VC.
In the last post we simulated over 45M independent outcomes: 1,000 portfolios at each size from 5 to 300 investments, i.e. the sum of x*1000 for x from 5 to 300.
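As a quick check on that figure, the count can be reproduced in a couple of lines. This assumes one draw per investment and 1,000 portfolios at every size from 5 to 300 inclusive, as described above.

```python
# Portfolio sizes 5..300 inclusive, 1,000 portfolios per size,
# one independent draw per investment in each portfolio.
PORTFOLIOS_PER_SIZE = 1000
sizes = range(5, 301)

total_draws = sum(size * PORTFOLIOS_PER_SIZE for size in sizes)
print(total_draws)  # 45140000
```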
But in practice portfolios are not independent and the universe of investible businesses is much smaller. Our first model is more like a simulation of venture builders, where the fund creates its own independent businesses.
How about instead we create a pool of businesses that receive investment, and our portfolios sample from this pool? In other words, in the investing life of a fund, a finite number of fundable businesses will be created. During this period, every fund will be choosing from that finite set of businesses.
How big should that pool be? Well, according to Crunchbase, in 2016/2017 around 3,500 angel and seed funding rounds happened globally per quarter:
Let’s say there’s some under-counting, and that the number is growing, and round up to 5,000 rounds per quarter.
So in the 3–5 year investing life of a typical fund, there’s a universe of 60,000–100,000 companies that they theoretically could invest in (of course they wouldn’t see that many deals, but that doesn’t affect our model).
Let’s split the difference, and take a pool size of 80,000 companies. What I’m going to do next then is generate this pool from the same power law distribution as the previous post, and have each portfolio draw its investments at random from the pool.
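Here is a minimal sketch of that setup in Python with NumPy. It is not the code behind the graphs in this post. The tail exponent `ALPHA`, the choice of a Lomax (Pareto II) draw, and the helper names `make_pool` and `draw_portfolio` are all assumptions made for illustration.

```python
import numpy as np

POOL_SIZE = 80_000  # pool size chosen in the post
ALPHA = 1.9         # hypothetical tail exponent, not a fitted value

rng = np.random.default_rng(seed=0)


def make_pool(size, alpha, rng):
    """Return one return multiple per company in the pool.

    Assumption: multiples follow a Lomax (Pareto II) distribution,
    which starts at 0, so outcomes below 1x are possible.
    """
    return rng.pareto(alpha, size=size)


def draw_portfolio(pool, n, rng):
    """Pick n distinct companies from the shared pool, uniformly at random."""
    return rng.choice(pool, size=n, replace=False)


pool = make_pool(POOL_SIZE, ALPHA, rng)
portfolio = draw_portfolio(pool, 150, rng)

# With equal cheque sizes (an assumption), the portfolio multiple
# is just the mean of the individual multiples.
print(portfolio.mean())
```

The point of the pool is that every portfolio samples from the same finite set of outcomes, so two funds can end up holding the same company, and no portfolio can contain a return larger than the best company in the cohort.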
Here’s a histogram of 10,000 companies drawn at random from this pool, compared to the Correlation Ventures data, and the independent draws data from the original post:
Note this is only a gut-check since I don’t have access to the underlying data and so can’t fit the model, but it looks reasonable. I’ve also tried a variety of parameters for the power law distribution, and the overall trends are robust.
Let’s use the same method as before, sampling 1000 portfolios of each size between 5 and 300 investments, and looking at average statistics across each set of 1000. We’re interested in the triple rate (how often a random portfolio could be expected to triple the fund) and the failure rate (how often it could be expected to return <1x).
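The two statistics can be sketched as follows. The assumptions: a hypothetical tail exponent `ALPHA`, equal cheque sizes (so a portfolio's multiple is the mean of its companies' multiples), no fees, and only a handful of portfolio sizes rather than every size from 5 to 300. The function names are illustrative only.

```python
import numpy as np

ALPHA = 1.9           # hypothetical tail exponent
N_PORTFOLIOS = 1000   # portfolios sampled per size, as in the post

rng = np.random.default_rng(seed=1)
pool = rng.pareto(ALPHA, size=80_000)  # assumed Lomax-distributed multiples


def portfolio_multiples(pool, size, n_portfolios, rng):
    """Equal-weighted multiple for each of n_portfolios random portfolios."""
    # One row of company indices per portfolio, no repeats within a portfolio.
    idx = np.array([
        rng.choice(len(pool), size=size, replace=False)
        for _ in range(n_portfolios)
    ])
    return pool[idx].mean(axis=1)


def rates(multiples):
    """Return (triple rate, failure rate) for an array of portfolio multiples."""
    multiples = np.asarray(multiples, dtype=float)
    triple_rate = float(np.mean(multiples >= 3.0))   # returned 3x or more
    failure_rate = float(np.mean(multiples < 1.0))   # returned less than 1x
    return triple_rate, failure_rate


for size in (5, 50, 150, 300):
    m = portfolio_multiples(pool, size, N_PORTFOLIOS, rng)
    triple, failure = rates(m)
    print(f"size={size:3d} triple={triple:.3f} failure={failure:.3f}")
```

The numbers this prints depend entirely on the assumed exponent, so treat them as a way to explore the shape of the curves rather than as a reproduction of the graphs.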
Compared to the original independent draws model:
You can see that the pattern is very similar — it’s hard to tell the difference.
The differences between the models come out when you look at the mean performance:
This time the mean is behaving more as we would expect; it’s not affected by the occasional extreme outlier that we get in 45M+ independent draws. However, it still looks high compared (anecdotally) to historical returns. This may again just be the effect of outlier portfolios: the small number that do really, really well, as you’d expect with samples from an underlying power law. Again, median may give a better sense of ‘typical’ results:
This looks more in line with historical data (and again very similar to the unpooled model). Again, median performance continues to improve through this range.
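To see why the median is the more ‘typical’ number here, consider a toy set of five portfolio multiples with one outlier. The values are made up for illustration and are not taken from the simulation.

```python
import numpy as np


def summarize(multiples):
    """Return (mean, median) of a set of portfolio multiples."""
    multiples = np.asarray(multiples, dtype=float)
    return float(np.mean(multiples)), float(np.median(multiples))


# Four ordinary portfolios and one that caught the cohort's big winner.
toy = [0.5, 1.0, 1.5, 2.0, 45.0]
mean, median = summarize(toy)
print(mean, median)  # 10.0 1.5
```

One outlier drags the mean to 10x even though four of the five portfolios returned 2x or less; the median stays at 1.5x.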
The pooled model of venture returns is intuitively more realistic than independent draws from a power law distribution, but the underlying trends remain the same: at least within this range, more investments are better, and 100–150 seem to be ‘table stakes’ in the sense that after this, no portfolio in 1000 samples loses money.
Thinking about a cohort of ~80,000 investible companies within the investment lifetime of a fund leads to some interesting insights:
- Even at that scale, the biggest return within the cohort is a significant portion of the overall returns. In the pool that I generated in this post, the maximum return of any single investment is 18,660x, comparable to Jerry Neumann’s estimate of 10,000x for the return on Andy Bechtolsheim’s $100K cheque into Google, or the return on YC’s investment in Dropbox.
- What’s especially interesting is that single return is over 5% of the entire return from those 80,000 companies. This implies that in any given cohort, fund performance will be dominated by whether you get into that 1 investment :). Any fund that does will tend to outperform every other fund in that cohort.
- In fact 5% may be a significant underestimate. I simulated 1000 cohorts of 80,000 companies each; the average ratio of the single most successful company to the return of the entire pool was just over 17%.
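The cohort experiment in the last bullet can be sketched like this. The tail exponent is again a hypothetical value, so the average share printed will not necessarily match the 17% above, and the sketch runs 200 cohorts rather than 1000 to keep the runtime short.

```python
import numpy as np

ALPHA = 1.9           # hypothetical tail exponent
COHORT_SIZE = 80_000  # companies per cohort, as in the post
N_COHORTS = 200       # the post uses 1000; fewer here to keep the run short

rng = np.random.default_rng(seed=2)


def top_company_share(cohort):
    """Fraction of the cohort's total return that comes from its best company."""
    cohort = np.asarray(cohort, dtype=float)
    return float(cohort.max() / cohort.sum())


shares = np.array([
    top_company_share(rng.pareto(ALPHA, size=COHORT_SIZE))
    for _ in range(N_COHORTS)
])

print(f"mean share of top company:   {shares.mean():.3f}")
print(f"median share of top company: {np.median(shares):.3f}")
```

Comparing the mean and median of `shares` is a useful check: if a few cohorts contain an extreme winner, the mean share will sit well above the median.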
What would you conclude if you took this modelling seriously? Firstly, I think, deal flow is key: can you build a machine that can see and reliably process a high number of deals, while providing a great service to entrepreneurs?
Secondly, and related, brand is enormously important. Success breeds success. Brand improves deal flow, and helps a fund get into deals that it likes. So the early years of an investment company should try to build up the brand.
[If you want to read & experiment with the code that generated these graphs, you can copy it from Google Drive or grab it from GitHub. If you’d like to help me extend it or turn it into a more re-usable library, email me stevecrossan @ gmail or send me a CL :)]
Next post: follow on strategy, fund returners.