Venture Capital 2.0 — the revolution of Machine Learning & Data-Driven VC

A deep-dive into the potential of Machine Learning in Venture Capital and insights from the funds currently paving the way for the industry

João Nunes
Published in Included VC
13 min read · Sep 21, 2022


In an increasingly competitive environment among Venture Capital (VC) firms, fund managers continuously search for ways to develop an edge over their peers. This realisation has led them to pursue more data-driven approaches and to start diving into the potential of data in investment processes.


  • The benefits of Machine Learning (ML) & Artificial Intelligence (AI) in the VC investment process are now very clear and irrefutable.
  • Gartner predicts that AI will be involved in 75% of venture capital investment decisions by 2025.
  • The use of data and ML models for deal sourcing & screening is already widespread in the VC industry, whereas the same cannot be said about other steps of the investment process.
  • The VC investment process will most likely never be fully automated. The best results will be achieved by leveraging data to inform investment decisions through an “augmented approach”.

🪧 Introduction to data in VC

In the past, most VCs relied almost entirely on network referrals and inbound deal flow to identify investment cases. However, in such a competitive environment, VCs are invariably competing to be the first to reach great founders. This has pushed them more and more towards outbound sourcing. As available databases have become more comprehensive and reliable, VCs have started to regularly analyse and track exciting startups that they find. Some firms have gone a step further and started to keep track of high-potential startups at a much earlier stage than they typically invest in. This is the case of UK-based Notion Capital, a Series A investor that bets on building a relationship with founders early on so that it can follow the team’s journey until they are ready to raise their Series A.

Companies in the US recognized as early as the 1980s that the success of companies on the stock exchange can be predicted using mathematical models. Today, algorithmic trading is standard for hedge funds. In the VC business, on the other hand, everything is private: start-ups, venture capitalists, and their investors.


While the use of AI in the start-up investment process is still in its infancy, data is already being used on a massive scale to track company development or to automatically pull new company data into databases. VCs have also started to analyse these databases regularly and at scale by building web crawlers that are constantly trying to find new companies that fit investors’ thesis and that might be good investments.

So far, this approach has proven to bring about a considerable increase in deal flow and does not seem to carry any material disadvantages or risks. According to global research firm Gartner, “AI will be involved in 75% of venture capital investment decisions by 2025”, up from less than 5% in 2021. Even though this is a rather optimistic prediction, we are finally at an inflection point where investors’ gut feeling is going to be transformed using AI.

“AI will be involved in 75% of venture capital investment decisions by 2025”

🤖 Applications of Machine Learning & Artificial Intelligence in VC

Several areas of the VC investment and post-investment processes can benefit from the use of ML & AI. The benefits are most evident in five areas:

  • Finding investment opportunities (deal sourcing).
  • Deal screening & due diligence (e.g. identifying early company growth signs, to determine the right timing of an investment).
  • Post-investment tracking of portfolio company growth.
  • Tracking portfolio company employee satisfaction.
  • Finding great investors or VCs that might have been overlooked.

Machine learning can help investors make more informed investment decisions and decrease the information asymmetry between founders and investors. Even though there are a myriad of potential applications of ML & AI in VC, most funds tend to only leverage these technologies from a deal sourcing perspective. This happens mostly because this is the area where it is easier to develop a data-driven approach. It is fairly quick and simple to develop web crawlers or CRM integration systems that allow a team to surface investment opportunities that fit the fund’s investment thesis. Deal sourcing is only the “low-hanging fruit”. Building ML models to help with portfolio support or the due diligence process is much harder but it is also where there is even more value to be added.


If we take a step back and think about venture 30 years ago, only a select few people really knew who was doing layoffs, who was hiring or who was starting a new company. Data therefore allows us to understand the market in a way that we could not on our own. It becomes possible, for example, to find the startup in Berlin that just went through a down round, which makes its employees more likely to leave. It is also possible to find senior staff members of a successful startup who have just hit their four-year vesting mark and are therefore more likely to leave, because they are no longer held back by unvested stock.
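As a rough illustration of the vesting signal described above, the sketch below flags people who have passed a four-year mark at their employer. The `Employee` record, its fields, and the sample data are hypothetical; a real system would pull tenure information from whatever data source the fund has access to.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    # Hypothetical record; the field names are assumptions for illustration.
    name: str
    company: str
    start_date: date


def passed_vesting_mark(employee: Employee, today: date, years: int = 4) -> bool:
    """Return True once the employee has been at the company for `years` years."""
    start = employee.start_date
    try:
        mark = start.replace(year=start.year + years)
    except ValueError:
        # Start date was 29 February and the target year is not a leap year.
        mark = start.replace(year=start.year + years, day=28)
    return today >= mark


staff = [
    Employee("A", "ExampleCo", date(2018, 9, 1)),
    Employee("B", "ExampleCo", date(2021, 3, 15)),
]
today = date(2022, 9, 21)

# Names of people who may be more open to a move, by this one signal alone.
flagged = [e.name for e in staff if passed_vesting_mark(e, today)]
```

On its own this is a weak signal; in practice it would be one input among many.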

I do not think, however, that we are even close to a world where it is possible to successfully automate the entire investment process and delegate investment decisions entirely to an ML model’s prediction. British statistician George E. P. Box explained it best when he said: “All models are wrong, but some are useful”.

“All models are wrong, but some are useful”

I could not agree more with this statement, and I believe that the solution lies instead in an “augmented approach”, as introduced by Earlybird Venture Capital Partner Andre Retterath, in which we combine the strengths of human and artificial intelligence.

⚙️ Machine Learning Models — The “highlights”

The first era of research attempting to use Machine Learning in Venture Capital focused solely on a binary definition of outcomes: success was defined by whether or not a company reached an IPO. Later studies also included alternative success scenarios such as acquisitions and follow-on rounds.


In 2021, a team of experts also set out to build an ML model that they called CapitalVX (“Capital Venture eXchange”). They modelled three exit classes (failure, acquisition, and IPO). The model was able to predict exits with an accuracy of around 90%.

“The model was able to predict exits with an accuracy of around 90%.”

Some studies went even further and actually started to study the performance of ML models and compare it to the performance of actual investors.


In 2020, Torben Antretter and a team of researchers from the University of St. Gallen (HSG) developed an algorithm and pitted it against 255 angel investors in a simulation, asking it to select the most promising investment opportunities among 623 deals from one of the largest European angel networks.

The results of this algorithm were then compared to the actual investment decisions taken by these angel investors. The algorithm achieved an average IRR (Internal Rate of Return) of 7.26%, whereas the actual business angels only achieved an average IRR of 2.56%. In other words, the algorithm performed almost 3 times better than human investors.

“…the algorithm performed almost 3 times better than human investors.”

Even more interesting, however, is what happens when we zoom in on the top-tier experienced investors in the group: they vastly outperformed the algorithm, achieving an average IRR of 22.75%.

“… top tier experienced investors…vastly outperformed the algorithm”
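For readers less familiar with IRR: it is the discount rate at which the net present value of a deal’s cash flows is zero. The sketch below computes it by bisection for a made-up deal and then reproduces the “almost 3 times” ratio from the two averages reported in the study. The cash flows and function names are hypothetical, and this is not the method used by the researchers.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value of yearly cash flows, the first one at time zero."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: list[float], low: float = -0.99, high: float = 10.0) -> float:
    """Find the rate where NPV is zero by bisection.

    Assumes one initial outflow followed by inflows, so that NPV
    falls as the rate rises and there is a single root in [low, high].
    """
    for _ in range(200):
        mid = (low + high) / 2
        if npv(mid, cash_flows) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Hypothetical deal: invest 100 today, receive 150 after five years.
example_irr = irr([-100, 0, 0, 0, 0, 150])

# Ratio of the two average IRRs quoted above (algorithm vs. all angels).
ratio = 7.26 / 2.56
print(f"example IRR: {example_irr:.2%}, algorithm vs. angels: {ratio:.2f}x")
```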


Finally, another really interesting example of a similar approach is the one taken by Andre Retterath. He also used Crunchbase data to train his model, but he complemented this dataset with Pitchbook and LinkedIn information.

After training the model, he fed it data on 10 anonymized European startups, captured right after they raised their Seed round in 2015/2016. Retterath then surveyed 111 VC investors, giving them the same information on the same 10 companies.

In reality, 5 of these startups had been successful and 5 had not. The algorithm’s results in predicting which of the 10 companies would be successful and the VC investors’ results were then compared. The algorithm was found to outperform the average VC by 29%.

“The algorithm was found to outperform the average VC by 29%”

For a full version of this academic research analysis on Machine Learning Models used in research papers, feel free to check out the more geeky/technical version of this section here: (Link)

🚀 Tendencies and Trends of Data-driven VC funds

Taking a look at the landscape of VCs utilizing data and ML models in their investment process it is interesting to note that there seems to be a generalized tendency towards building these systems from within as opposed to acquiring them.

Building in-house does tend to yield better results for VCs: they control every aspect of the system, can build it around their investment thesis, and can ensure it fits the team’s existing processes. It also comes with significant infrastructure costs. An engineering team needs to be hired just to develop and maintain the system, and multiple software infrastructure costs need to be considered as well.

This also has a polarizing effect on the VC industry, given that these resources are usually only available to bigger VC funds that have enough inflow of management fees to sustain such costs. VCs are also generally very afraid of false negatives (meaning companies that would have been great investments but were classified by an algorithm as potentially bad investments), which leads them to be very skeptical when it comes to trusting algorithms that suggest investment decisions.
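To make the false-negative concern concrete, here is a minimal sketch that measures how many truly good investments a model would have rejected. The outcome and prediction lists are invented for illustration.

```python
def false_negative_rate(actual: list[bool], predicted: list[bool]) -> float:
    """Share of truly good investments that the model rejected."""
    # Keep only the model's predictions for companies that were actually good.
    predictions_for_good = [p for a, p in zip(actual, predicted) if a]
    if not predictions_for_good:
        return 0.0
    missed = sum(1 for p in predictions_for_good if not p)
    return missed / len(predictions_for_good)


# Hypothetical data: True means "turned out to be a great investment"
# in `actual` and "model recommends investing" in `predicted`.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, True, False, False]

fnr = false_negative_rate(actual, predicted)
print(f"False negative rate: {fnr:.0%}")
```

In venture, where a handful of outliers drive most of a fund’s returns, even a low rate like this can mean missing the one company that mattered.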

📣 The VC Funds paving the way for Data-Driven VC

Currently, there is a myriad of VC funds that are already leveraging data-driven approaches. Some of the most interesting examples are: EQT Ventures, InReach Ventures, Nauta Capital and SignalFire.

EQT Ventures has developed its famous “Motherbrain”. Motherbrain rates investment opportunities on a scale of 1 to 340 which then serves as a prioritization metric guiding investment professionals on which companies to investigate first. The value of EQT’s Motherbrain does not really lie in the algorithm itself but rather in the fuzzy matching of more than 40 data sources. Of around 50 investments made by EQT Ventures from 2016 to July of 2020, 7 are in companies that were directly identified by Motherbrain.
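Fuzzy matching here means recognising that records from different sources refer to the same company even when the names are not written identically. How Motherbrain does this is not described in this article, so the following is only a minimal sketch of the general idea using Python’s standard `difflib`. The suffix list, the threshold, and the company names are all assumptions.

```python
from difflib import SequenceMatcher

# Assumed list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "ltd", "gmbh", "ab", "llc"}


def normalise(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop legal suffixes."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    words = [w for w in cleaned.split() if w not in LEGAL_SUFFIXES]
    return " ".join(words)


def similarity(a: str, b: str) -> float:
    """Similarity between two normalised names, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def match_records(source_a: list[str], source_b: list[str], threshold: float = 0.85) -> dict[str, str]:
    """Pair each name in source_a with its best match in source_b, if close enough."""
    matches = {}
    for name_a in source_a:
        best = max(source_b, key=lambda name_b: similarity(name_a, name_b))
        if similarity(name_a, best) >= threshold:
            matches[name_a] = best
    return matches


# Hypothetical names as they might appear in two different databases.
source_a = ["Acme Robotics Ltd.", "Northwind Health"]
source_b = ["ACME Robotics", "Northwind Healthcare GmbH", "Contoso Pay"]
matches = match_records(source_a, source_b)
```

Comparing every pair of names does not scale to dozens of large sources; a production system would also use other fields such as website domain or location to narrow down candidates.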

InReach Ventures is led by Roberto Bonanzinga (former partner at Balderton Capital). In conversation with Roberto last year, I learned that most of InReach Ventures’ staff are actually ML engineers, given the core role that the team gives its data processes. One of the team’s initiatives has been the use of NLP (Natural Language Processing) in pitch deck analysis.

Nauta Capital developed a prediction engine (to assess venture success probability), a deal flow engine, and a dynamic reserves planner (that calculates the optimal distribution of reserves for follow-on).

SignalFire describes itself as the first VC firm built from the ground up as a technology company. The firm has a structured 5-step investment process and has developed “Beacon Talent”, a proprietary AI recruiting engine that tracks talent worldwide, to help portfolio founders recruit. They have also developed a competitive intelligence system that tracks more than 2 million data sources and half a trillion data points.

During my research, I had the pleasure of speaking with some of the teams that have understood the potential of ML & AI in the VC investment process early on. I was able to understand how each of these teams is leveraging data to assist with investments.

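The insights presented come from conversations with Francesco Corea (Balderton Capital), Kamil Mieczakowski (Notion Capital) and Ludvig Wärnberg Gerdin (Earlybird Venture Capital).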

What is your firm’s approach to data?

  • Francesco Corea, Balderton Capital

We don’t really have a platform with a nickname like other funds. Our general approach is connecting a set of databases (Crunchbase, Pitchbook, Dealroom, Preqin, Specter, etc.) and, through different systems that we’ve built, every single week we have a list of carefully screened companies that are sent to the investment team. The investment team then reaches out to these companies.

Right now we are mostly focusing on sourcing and screening, and use it less for other parts of the investment process. The screening is based on criteria that we have selected from the last 3 years. Usually, this starts from filters like sector, geography, stage, etc., and gets augmented by specific criteria based on what we have seen in the past as success factors.
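To make the filter step concrete, here is a minimal sketch of thesis-based screening on sector, geography and stage. The thesis values, the `Company` fields, and the sample companies are all hypothetical and do not describe Balderton’s actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Company:
    # Hypothetical record; a real one would come from the connected databases.
    name: str
    sector: str
    country: str
    stage: str


# Assumed thesis: the allowed values for each filtered attribute.
THESIS = {
    "sector": {"fintech", "healthtech"},
    "country": {"UK", "DE", "FR"},
    "stage": {"seed", "series_a"},
}


def fits_thesis(company: Company, thesis: dict[str, set[str]] = THESIS) -> bool:
    """A company passes only if every filtered attribute is in the allowed set."""
    return all(getattr(company, attr) in allowed for attr, allowed in thesis.items())


candidates = [
    Company("Alpha Pay", "fintech", "UK", "seed"),
    Company("Beta Bio", "healthtech", "US", "series_a"),
    Company("Gamma Games", "gaming", "DE", "seed"),
    Company("Delta Care", "healthtech", "FR", "series_a"),
]

# The list that would be sent to the investment team this week.
weekly_list = [c.name for c in candidates if fits_thesis(c)]
```

The “specific criteria” learned from past successes would then be layered on top of these basic filters.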

  • Kamil Mieczakowski, Notion Capital

Notion is a Series A investor, so we have built a proprietary sourcing engine to track companies at earlier stages and build a relationship with them as early as Seed stage, for example, in the hope that by the time they raise their Series A they will already have a close relationship with Notion.

Our sourcing engine is called RISTA — in essence, it works like a database that tracks companies that fit Notion’s thesis. Each criterion (or column) is scored according to Notion’s investment convictions and then each company will get a score computed by summing up the scores for each column. Companies with the highest scores are surfaced first and classified as the top priority.

RISTA has yielded great results. For instance, we built a relationship at an early stage with one of the founder teams it flagged; when it came to raising their Series A, the company had 15 term sheets on the table and chose to partner with Notion.
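The column-scoring idea described for RISTA can be sketched in a few lines: score each criterion, sum the scores per company, and surface the highest totals first. The columns, point values, and companies below are invented for illustration and are not Notion’s actual criteria.

```python
# Hypothetical scoring rules: each column maps a raw value to points.
SCORING_RULES = {
    "sector": lambda v: 3 if v == "b2b_saas" else 0,
    "founders_previous_exits": lambda v: 2 * min(v, 2),
    "headcount_growth_pct": lambda v: 3 if v >= 50 else 1 if v >= 20 else 0,
}


def score_company(company: dict) -> int:
    """Sum the per-column scores for one company."""
    return sum(rule(company[column]) for column, rule in SCORING_RULES.items())


def rank(companies: list[dict]) -> list[tuple[str, int]]:
    """Return (name, total score) pairs, highest score first."""
    scored = [(c["name"], score_company(c)) for c in companies]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


companies = [
    {"name": "Alpha", "sector": "b2b_saas", "founders_previous_exits": 1, "headcount_growth_pct": 60},
    {"name": "Beta", "sector": "marketplace", "founders_previous_exits": 0, "headcount_growth_pct": 25},
    {"name": "Gamma", "sector": "b2b_saas", "founders_previous_exits": 3, "headcount_growth_pct": 10},
]

ranking = rank(companies)
for name, total in ranking:
    print(name, total)
```

A sum of hand-set scores keeps the ranking easy to explain to the investment team, which is part of its appeal compared with an opaque model.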

  • Ludvig Wärnberg Gerdin, Earlybird Venture Capital

The Earlybird Team has developed “Eagle Eye”, a data-driven sourcing and screening platform that follows an AI-enhanced approach in which deterministic criteria are compared with the prioritization proposals of machine learning models.

For us it has been mostly about solving a classification problem — identifying successful startups. Using Eagle Eye’s models, we find the best combination of features that lead to startup success. With this analysis, we can also see how much a particular feature contributed to a company’s success.

Seeing the investment process as a funnel, we expand the top of the funnel — the deal origination phase — by sourcing from a multitude of sources, then narrow the funnel significantly by screening for the most interesting deals using our models. That way we make sure that the investment team spends time on deals with the highest quality.
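To illustrate what “how much a particular feature contributed” can mean, the sketch below assumes a simple logistic model whose weights have already been learned. Eagle Eye’s actual models are not described in this article, so the model type, the feature names, and the weights are all assumptions.

```python
import math

# Hypothetical weights of an already-trained logistic model.
WEIGHTS = {"repeat_founder": 1.2, "monthly_growth": 4.0, "team_size_log": 0.3}
BIAS = -2.0


def contributions(features: dict[str, float]) -> dict[str, float]:
    """Per-feature contribution to the model's score (weight times value)."""
    return {name: WEIGHTS[name] * features[name] for name in WEIGHTS}


def success_probability(features: dict[str, float]) -> float:
    """Squash the summed contributions plus bias into a probability."""
    z = BIAS + sum(contributions(features).values())
    return 1 / (1 + math.exp(-z))


# Hypothetical startup: repeat founder, 15% monthly growth, log team size 2.0.
startup = {"repeat_founder": 1.0, "monthly_growth": 0.15, "team_size_log": 2.0}

parts = contributions(startup)
prob = success_probability(startup)
top_feature = max(parts, key=parts.get)
print(f"P(success) = {prob:.2f}, largest contribution: {top_feature}")
```

For non-linear models the same question is usually answered with dedicated attribution techniques rather than a simple weight-times-value product.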

🔮 Conclusion

The benefits of Machine Learning and Artificial Intelligence in Venture Capital are now more than evident. These findings will not only affect the venture ecosystem in the future; they are already being leveraged by firms that have seen concrete returns from them. Since deal flow is a rather quick and easy application of data-driven processes, the competitive advantage of funds that focus on it is expected to erode quickly, until these processes eventually become standard in the industry. It is therefore in the subsequent stages of the investment process that the real potential for differentiation will lie for the foreseeable future.

In the end, I do not believe in the possibility of a world where the investment process can be fully automated. Venture is not like commodities or currency exchange; it is a much more human and relationship-driven asset class. In fact, to me, VC bears very little resemblance to any other asset class, especially at the earliest stages such as Pre-Seed and Seed. It can even be called the most “human” of asset classes.

The future definitely lies in the aforementioned “augmented approach”, where VCs use data to inform decisions, but other factors will continue to be paramount in determining VC access to deals, such as network-driven factors and brand value & reputation.

Having started my career in venture at Draper Startup House, Lightshift Capital and later Techstars, my passion for startups and the venture ecosystem has only grown, and nothing gets me more excited than meeting inspiring founders and being able to help them grow 🚀

I’d love to hear your thoughts on the topics discussed in this article so please feel free to connect and reach out!

👉 You can find me on Twitter and LinkedIn and keep up to date with my latest insights on Medium

👉🏼 Follow Included VC on Twitter & LinkedIn

🙏 Acknowledgements

A special thank you note to everyone who contributed to the development of this article (the Included VC team, Included VC fellows, and all the VCs whose work and testimonies allowed me to develop extensive research on the topic). Namely, I would like to thank Francesco Corea, Andre Retterath, Kamil Mieczakowski, Nikita Thakrar, Anu Panesar, Ludvig Wärnberg Gerdin, Johan Torssell and Qiwei Han.


Ross, Greg, Sanjiv Das, Daniel Sciro, and Hussain Raza. 2021. “CapitalVX: A machine learning model for startup selection and exit prediction.” The Journal of Finance and Data Science 94–114.

Antretter, Torben, Ivo Blohm, Charlotta Siren, Dietmar Grichnik, Malin Malmstrom, and Joakim Wincent. 2020. “Do Algorithms Make Better — and Fairer — Investments Than Angel Investors?” Harvard Business Review.

Retterath, Andre. 2020. “Human versus computer: Who’s the better startup investor? Insights from an empirical benchmarking study.”



João Nunes
Included VC

VC at @PlayfairCapital | Enthusiast about all things venture. @Techstars @IncludedVC @DraperVentureNetwork
