Demystifying Startup Equity with Data Science

What percentage of ownership do investors acquire at each financing stage?

Sebastian Quintero
Jun 10, 2019 · 6 min read

If you’re a startup founder, you may struggle to find a reasonable percentage of equity to give your investors in return for their capital. While this is still something you need to figure out on a case-by-case basis, here’s some data to help guide your judgment.


Earlier this year, my research lab at Radicle released a working paper and online model that make it easy for anyone to approximate an undisclosed startup valuation. The log-log model we published uses the amount of capital raised by the startup and the financing round’s stage classification to predict the valuation. It’s super simple and works reasonably well.
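To make the shape of a log-log model concrete, here is a minimal sketch for a single stage: regress log(valuation) on log(capital raised) with ordinary least squares. The function names and the data points are illustrative assumptions, not the published model or its coefficients, and the published model also uses the stage classification, which this sketch omits.

```python
import math

def fit_log_log(capital, valuation):
    """Ordinary least squares of log(valuation) on log(capital).

    Returns (intercept, slope) on the log scale. Illustrative only:
    this is not the published model and has no stage term.
    """
    xs = [math.log(c) for c in capital]
    ys = [math.log(v) for v in valuation]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_valuation(intercept, slope, capital):
    """Map a capital amount back to a valuation estimate in dollars."""
    return math.exp(intercept + slope * math.log(capital))

# Made-up points that follow valuation = 8 * capital ** 0.9 exactly,
# so the fit should recover those two numbers.
capital = [1e6, 5e6, 2e7, 1e8]
valuation = [8 * c ** 0.9 for c in capital]
intercept, slope = fit_log_log(capital, valuation)
estimate = predict_valuation(intercept, slope, 1e7)
```

Because the fit happens on the log scale, the slope reads as an elasticity: a 1% increase in capital raised goes with roughly a slope-percent increase in predicted valuation.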

Take this article from Fortune in April of 2019. Out of the 24 venture deals presented in that article, only one has a disclosed valuation — Outreach with a $114m Series E valued at $1.1b. If you input those values into our online model, you’ll get an estimated valuation of approximately $1.1b.

Our estimated valuation is right on the mark for this out-of-sample observation, which gives us confidence, beyond the model’s performance statistics, that it can help us roughly approximate undisclosed valuations in the real world. Of course, the model doesn’t always work so well. In the paper, I go into considerable detail on why being off by millions, and sometimes billions, of dollars is inevitable, depending on the stage classification, given the log-normal nature of startup valuation distributions.
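A quick way to see why dollar errors grow with stage: an error that is constant on the log scale is a constant multiplicative factor, so its dollar size scales with the valuation. The sketch below assumes a log-scale error of 0.5 and two round-number valuations; both are illustrative choices, not figures from the paper.

```python
import math

def dollar_error_range(valuation, log_error):
    """Dollar bounds when the estimate is off by +/- log_error in log space."""
    low = valuation * math.exp(-log_error)
    high = valuation * math.exp(log_error)
    return low, high

LOG_ERROR = 0.5  # assumed for illustration, not taken from the paper

for valuation in (5e6, 1e9):  # hypothetical early-stage and late-stage values
    low, high = dollar_error_range(valuation, LOG_ERROR)
    print(f"${valuation:,.0f}: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions, the same log-scale miss is a few million dollars at the early stage and hundreds of millions at the late stage.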

That said, we’ve received considerable positive feedback for the paper, and we’re happy it’s been so helpful to the community. This is the first in a series of follow-ups to release additional perspectives that are enabled by the data we collected. The rest of this work assumes some familiarity with the original paper.


We previously looked at the implied percent ownership acquired by investors at each financing stage as a way to assess the quality of our data for the startup valuation model. We presented these values in the paper as a summary statistic: for each round, we divided the amount of capital raised by the disclosed post-money valuation, then computed the median per stage classification. Since we had the original data lying around, I figured it would be helpful to the community if we made it publicly available and elaborated on it. As is standard, our data was obtained from sources believed to be reliable (Crunchbase) and is provided solely for educational purposes. We’re releasing this research with the aim of removing information asymmetries and enabling more constructive conversations about valuations between entrepreneurs and investors and, more generally, anyone who engages with startups, including employees.
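The calculation described above fits in a few lines. In this sketch, the Outreach figures come from the Fortune example earlier in the article; the other rows and the function names are hypothetical, included only so the grouping step has something to group.

```python
from collections import defaultdict
from statistics import median

def implied_ownership(capital_raised, post_money_valuation):
    """Fraction of the company acquired by investors in the round."""
    return capital_raised / post_money_valuation

def median_ownership_by_stage(rounds):
    """Median implied ownership for each stage classification."""
    by_stage = defaultdict(list)
    for stage, capital, post_money in rounds:
        by_stage[stage].append(implied_ownership(capital, post_money))
    return {stage: median(values) for stage, values in by_stage.items()}

rounds = [
    ("Series E", 114e6, 1.1e9),  # Outreach, as reported by Fortune
    ("Series A", 8e6, 40e6),     # hypothetical
    ("Series A", 10e6, 40e6),    # hypothetical
    ("Series A", 5e6, 50e6),     # hypothetical
]
medians = median_ownership_by_stage(rounds)
```

For Outreach, this gives roughly 10.4% of the company acquired in the Series E.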

The raw data for this analysis was pulled from Crunchbase in early June 2019 (updated from the original set) and was processed at Radicle’s Inference and Machine Learning Group. We removed a few erroneous observations and performed some general data cleanup and manipulation to produce the distributions in this article. In total, we have 8,639 observations: effectively all company financing events in Crunchbase that have a disclosed post-money valuation, amount of capital raised, and associated venture capital stage classification. One limitation of the data is its level of granularity. The shares sold at each stage are distributed across the investors in that round (if there is more than one), and we have no real way of measuring those drilled-down distributions. Many others in the community have provided resources for calculating and thinking about dilution as companies issue new stock, so we leave that topic out of this analysis. The following table outlines the summary statistics for the data.

[Table: summary statistics for the data]
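The filtering step that produces a dataset like this can be sketched as follows. The field names and the sanity check (capital raised cannot exceed the post-money valuation) are assumptions for illustration; the actual Crunchbase schema and the cleanup rules used for this article are not reproduced here.

```python
def is_usable(record):
    """Keep rounds with all three disclosed fields and a plausible ratio."""
    required = ("stage", "capital_raised", "post_money_valuation")
    if any(record.get(field) is None for field in required):
        return False
    capital = record["capital_raised"]
    post_money = record["post_money_valuation"]
    # Assumed sanity check: both positive, capital no larger than post-money.
    return 0 < capital <= post_money

records = [  # hypothetical rows
    {"stage": "Seed", "capital_raised": 1e6, "post_money_valuation": 5e6},
    {"stage": "Seed", "capital_raised": 2e6, "post_money_valuation": None},
    {"stage": None, "capital_raised": 3e6, "post_money_valuation": 15e6},
    {"stage": "Series A", "capital_raised": 9e6, "post_money_valuation": 3e6},
]
usable = [r for r in records if is_usable(r)]
```

Only the first hypothetical row survives: the others are missing a field or imply more than 100% ownership.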

In general, we find that the median percent of shares acquired by investors peaks at Series A and then trends down as company valuations increase. It’s commonly believed that venture capitalists insist on obtaining 20 percent ownership for Series A rounds, which is nice to see backed up by data, but the overall trend is a somewhat curious phenomenon, and it isn’t immediately obvious why it plays out this way. Part of it may be explained by dilution dynamics: VCs generally know how much of a company they need at the Series A stage to return their fund if the company exits. It’s also intuitive that investors buy increasingly smaller stakes in startups as they invest increasingly larger amounts of capital, but only because they’re doing so at such high valuations that those smaller stakes would still yield considerable returns if the company continues to grow and eventually has a successful liquidity event. But that doesn’t answer the question: why does it peak at Series A? Why not Seed, or Pre-Seed? Or Series B?
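To illustrate the dilution point, a stake bought at Series A shrinks with each later round in which new shares are issued, assuming the investor does not buy more. The round sizes below are hypothetical, and the sketch ignores option pools, pro rata rights, and other real-world terms.

```python
def diluted_stake(initial_stake, later_round_fractions):
    """Stake remaining after later rounds sell the given fractions
    of the company (post-money) to new investors."""
    stake = initial_stake
    for sold in later_round_fractions:
        stake *= 1 - sold
    return stake

# Hypothetical: 20% at Series A, then rounds selling 15%, 10%, and 10%.
final_stake = diluted_stake(0.20, [0.15, 0.10, 0.10])
```

Under these assumptions, the 20% stake ends up at about 13.8% by the time of an exit, which is why the entry stake has to be sized with later rounds in mind.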

[Figures: percent ownership acquired by investors at each stage, with kernel density estimates of the distributions at right]

Drawing the distributions with kernel density estimation (above, right) provides a slightly clearer picture of the story. For early-stage rounds, the distributions have similar means and variances, with the only notable difference between the classes being the bump in percent ownership acquired for Seed-stage companies at ~50%. The central tendency of the Series A distribution is pushed further out, and the distribution has a higher variance. The distributions then start to collapse back from Series B onwards.
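For readers unfamiliar with kernel density estimation, here is a minimal sketch: the density at a point is the average of Gaussian bumps centered on each observation. The bandwidth rule (a common normal-reference rule of thumb) and the sample values are assumptions for illustration; the plots in this article may have used different settings.

```python
import math
from statistics import stdev

def rule_of_thumb_bandwidth(samples):
    """Normal-reference rule of thumb: 1.06 * sd * n^(-1/5)."""
    return 1.06 * stdev(samples) * len(samples) ** (-1 / 5)

def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates the density estimate at x."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        total = sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )
        return total / norm

    return density

# Hypothetical ownership fractions for one stage.
ownership = [0.12, 0.15, 0.18, 0.20, 0.20, 0.22, 0.25, 0.30, 0.50]
density = gaussian_kde(ownership, rule_of_thumb_bandwidth(ownership))
```

A single high value, like the hypothetical 0.50 here, shows up as a small secondary bump in the estimated curve, which is the kind of feature the Seed distribution displays.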

Digging further into the data, we found that outliers are pervasive at the early stages, most notably at the Seed stage.

[Figures: outliers in percent ownership acquired, by stage]
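One common way to flag outliers like these is Tukey’s rule: anything more than 1.5 interquartile ranges outside the quartiles. This is an assumption about method, since the article does not say how outliers were identified, and the sample values are hypothetical.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical Seed-stage ownership fractions.
seed = [0.08, 0.10, 0.12, 0.15, 0.15, 0.18, 0.20, 0.22, 0.55, 0.80]
flagged = tukey_outliers(seed)
```

The fences depend on how the quartiles are computed, so different tools can flag slightly different points near the boundary.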

So what’s the takeaway?

We think it’s possible that some early-stage founders give up an outsized share of their company to investors at the negotiation table because they’ve struggled to raise capital, or otherwise don’t have the data to make a sound decision about how much of their company they should part with. Hypothetical but very real scenario: Cash is tight or nonexistent, credit cards are maxed out, payroll is due, and they may only have a few viable investors that are both interested and strategically positioned to fund their startup and add value to their company.

In addition, we know from previous research that the rate of failure to raise the following round changes by venture capital stage, which implies that it’s more difficult to raise capital at certain stages of the startup lifecycle. Progressing from Seed to Series A is the most difficult of the early-stage sequences, statistically speaking, and therefore it makes sense that it’s also the stage where investors tend to acquire a higher percentage of the company. From a probabilistic point of view, it’s intuitive that investors are in a position of relative advantage at the negotiation table, and entrepreneurs at a relative disadvantage. This dynamic then inverts toward the later stages of a startup’s lifecycle, when valuations increase sharply and investors are competing to get into the company’s capitalization table before it goes public or is acquired.

That said, the above argument is just conjecture, as we simply don’t have the data that would enable us to tease out the causality. Since we’re working on a range of topics here, and we can’t really justify any additional research expenditure on this specific topic, we’re happy to throw it out there for any graduate students who may be interested. It might be a good research topic for an empirical dissertation. Hint: To figure this out you would need to measure how difficult it was for firms to raise capital, at least via some proxy measure, and have the associated ownership statistics. Throw in some experimental design, a few controls, and boom, you have a dissertation. Of course, this data is not easy to come by, but that’s for you to figure out.

Additional Perspectives

Journal of Empirical Entrepreneurship

Data-driven insights on startups and venture capital from…
