Data-led investment: two fundamentally different approaches

Olivier Huez
Red River West
7 min read · Jun 13, 2023

The utilisation of data science for investment is becoming increasingly popular within the private equity investor community. As highlighted in Andre Retterath’s landscape study, certain funds are at the forefront thanks to their long-standing tech teams and the development of their own proprietary platforms, which grant them a competitive edge. Many others are also venturing into this realm, usually starting with simple insights from databases like Crunchbase. A few funds have even established rudimentary scrapers to gather additional data, only to discover that the process is more complex than initially anticipated, encountering challenges in data engineering along the way.

After a few months of playing with data, most give up, but a few plow on and aim to leverage data science to source startups (i.e., identify investment opportunities) and screen them (i.e., provide a first assessment of the company's potential, with a score).

As we started down this path a while back at Red River West, I thought I’d share some key learnings.

Clearly, it is crucial to identify suitable data sources and evaluate their quality to ensure an appealing signal-to-noise ratio (as I discussed in a previous article). However, if we take a broader perspective, there are essentially two fundamental approaches to leveraging data for Venture Capital.

I’d call the first one “AI first”:

The principle is to train an AI algorithm on the history of many startups or founders, so that the algorithm can then predict or assess the chances of future success of a given startup.

This requires gathering a dataset of founders and companies that is as large and reliable as possible, with many data points, features, context data, a view of the competition, etc., and, for each of them, an objective measure of the success the company achieved after a given investment round.

Since the power of AI lies in its ability to detect patterns, on paper this is an awesome approach: the algorithm could potentially identify features, metrics, or behaviours that a human “eye” wouldn’t. It also keeps the solution flexible: the algorithm can be retrained as new data becomes available, so that it stays state of the art. And it doesn’t require understanding each business inside out, since the algorithm takes care of selecting the most relevant indicators of future success.
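To make the principle concrete, here is a minimal sketch of the AI-first idea, assuming a tiny hypothetical dataset in which each company is described by a few numeric features observed at the time of a round and labelled 1 or 0 according to some success definition. It fits a plain logistic regression by gradient descent using only the Python standard library. The feature names, data, and hyperparameters are illustrative assumptions, not Red River West's model.

```python
import math
from dataclasses import dataclass


@dataclass
class CompanyRecord:
    # Hypothetical features observed at the time of the investment round,
    # e.g. [founder_had_prior_exit, normalised_headcount_growth]
    features: list[float]
    # Hypothetical objective success label measured years later (1 = success)
    success: int


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: list[float], bias: float, features: list[float]) -> float:
    """Estimated probability of success for one company."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic(records: list[CompanyRecord], epochs: int = 2000, lr: float = 0.1):
    """Fit a logistic regression with batch gradient descent on the log-loss."""
    n_features = len(records[0].features)
    weights = [0.0] * n_features
    bias = 0.0
    m = len(records)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for r in records:
            error = predict(weights, bias, r.features) - r.success
            for j, x in enumerate(r.features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Made-up history: in this toy data, a prior exit is what separates successes
history = [
    CompanyRecord([1.0, 0.8], 1),
    CompanyRecord([1.0, 0.2], 1),
    CompanyRecord([1.0, 0.5], 1),
    CompanyRecord([0.0, 0.9], 0),
    CompanyRecord([0.0, 0.1], 0),
    CompanyRecord([0.0, 0.4], 0),
]
weights, bias = train_logistic(history)
```

The learned `weights` show which features the model relies on; in this toy data the first weight ends up clearly positive. Retraining as new labelled companies arrive is simply another call to `train_logistic` on the extended history.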

There are naturally important limitations and challenges, starting with gathering the dataset: it’s very difficult to find meaningful and reliable “old” data:
What were Airbnb’s main KPIs at seed stage, what were the other companies in that space doing, and what was the state-of-the-art technology at the time?
When founders are well known, it’s possible to find data on their background before starting the company in their books, blogs, or speeches, but the storytelling sometimes doesn’t match reality…
What was the website’s traffic initially (or, later on, the app download numbers)?
What was their customer acquisition cost?
How did their headcount evolve?

A few experienced investors have kept very detailed records of the thousands of companies they analysed in the past: the pitch decks they were shared, their meeting notes, etc. They could potentially leverage this data (assuming they’re allowed to), but for everyone else it’s almost impossible to reconstruct such a dataset.

The other difficulty is measuring the success of the founders and companies in the dataset:
How do you define success? A large M&A deal or a successful IPO are rather obvious, but often, 8–10 years after foundation, it’s still too early to have a financial outcome for all investors: many companies stay private for a long time (SpaceX has neither IPO’ed nor been acquired, but its early investors are very happy!). On the other hand, an early-stage investor often has the opportunity to exit through a secondary sale long before an IPO, so their measure of success might be very different from the company’s own success.
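The choice of success definition can be made explicit in code. The sketch below assumes a hypothetical `Outcome` record and an arbitrary 3x multiple threshold, and contrasts a company-level label (only an IPO or M&A counts) with an investor-level label (a secondary sale also counts). Companies with no outcome yet are returned as `None`, i.e. unlabelled rather than failures, which is one way to avoid counting a SpaceX-like company as a miss.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    # Hypothetical encoding: "ipo", "m&a", "secondary", "shutdown", or None
    # when there has been no liquidity event yet
    event: Optional[str]
    # Hypothetical value at the event divided by the value at the round
    # (not needed for shutdowns, which always count as a failure)
    multiple: Optional[float] = None


def company_label(o: Outcome, threshold: float = 3.0) -> Optional[int]:
    """Company view: only an IPO or an M&A is a conclusive outcome."""
    if o.event == "shutdown":
        return 0
    if o.event in ("ipo", "m&a") and o.multiple is not None:
        return int(o.multiple >= threshold)
    return None  # still private, or only some investors have exited: unknown


def investor_label(o: Outcome, threshold: float = 3.0) -> Optional[int]:
    """Early-investor view: a secondary sale is also a real exit."""
    if o.event == "shutdown":
        return 0
    if o.event in ("ipo", "m&a", "secondary") and o.multiple is not None:
        return int(o.multiple >= threshold)
    return None  # no exit yet: leave out of the training set rather than call it a failure


still_private = Outcome(None)
early_secondary = Outcome("secondary", 4.0)
print(company_label(still_private), investor_label(still_private))      # None None
print(company_label(early_secondary), investor_label(early_secondary))  # None 1
```

The same company can therefore be a positive example, a negative one, or absent from the training set depending on whose success is being modelled.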

Even if it were possible to build a near-perfect dataset, the drivers of success at company level vary a lot over time: features that explained success in the past might not do so in the future. Technology evolves fast: what took hours or days 10 years ago can now be done in seconds thanks to generative AI, for example. Customers’ expectations and know-how also change very quickly. And the macroeconomic environment has a strong influence on the average round size, the pace of investment rounds, etc.

A key challenge is extracting the features that made a given company successful when it was created many years ago, and decorrelating them from the context of the time so that they can be applied today. In practice, this is, again, really hard to do.
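One common, partial way to decorrelate a feature from the context of its time (a standard technique, not necessarily what the author uses) is to express it relative to companies from the same vintage year, for example as a z-score within each cohort. The field names `vintage_year` and `seed_round_m` and the round sizes are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalise_by_cohort(rows: list[dict], feature: str, cohort_key: str = "vintage_year") -> list[float]:
    """Return each row's feature as a z-score within its cohort."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[cohort_key]].append(row[feature])
    stats = {k: (mean(v), pstdev(v)) for k, v in groups.items()}
    scores = []
    for row in rows:
        mu, sigma = stats[row[cohort_key]]
        # A cohort with one company (or identical values) carries no relative information
        scores.append(0.0 if sigma == 0 else (row[feature] - mu) / sigma)
    return scores


# Made-up seed round sizes (in millions): rounds were much larger in 2021
rows = [
    {"vintage_year": 2013, "seed_round_m": 1.0},
    {"vintage_year": 2013, "seed_round_m": 2.0},
    {"vintage_year": 2013, "seed_round_m": 3.0},
    {"vintage_year": 2021, "seed_round_m": 3.0},
    {"vintage_year": 2021, "seed_round_m": 6.0},
    {"vintage_year": 2021, "seed_round_m": 9.0},
]
z = normalise_by_cohort(rows, "seed_round_m")
```

The same raw 3.0 round is the largest of its 2013 cohort but the smallest of its 2021 cohort, and the z-scores reflect that. This only removes shifts in level and spread between eras; it does not capture changes in what actually drives success.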

One possible approach that leads to interesting results is to focus on the founders themselves. Many entrepreneurs have succeeded in extremely different environments, so what is it that they have that makes them successful? Is it possible to identify and measure these traits and find them in other founders?

The second approach is “KPI first”

This approach starts from the business perspective and assumes that what we are looking for are indirect markers of business growth. This seems easier than the previous method: there are many KPIs (Key Performance Indicators) correlated with a business’s success. If the number of employees is growing fast, business must be good! If the website’s traffic is skyrocketing, customers must be buying the product or service; if the company is granted patents, its technology must be good…
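Taken literally, these naive rules translate into something like the sketch below. The KPI names and the thresholds (30% headcount growth, 50% traffic growth, at least one granted patent) are hypothetical placeholders chosen for illustration.

```python
def growth(series: list[float]) -> float:
    """Relative change between the first and last observation."""
    return series[-1] / series[0] - 1


def naive_kpi_score(kpis: dict) -> int:
    """Count how many naive 'business must be good' rules fire (0 to 3)."""
    score = 0
    if growth(kpis["headcount"]) > 0.30:      # employees growing fast
        score += 1
    if growth(kpis["web_traffic"]) > 0.50:    # traffic skyrocketing
        score += 1
    if kpis["patents_granted"] > 0:           # technology must be good
        score += 1
    return score


print(naive_kpi_score({"headcount": [50, 80], "web_traffic": [10_000, 20_000], "patents_granted": 2}))  # 3
```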

Oh, but wait…
Is the number of employees growing just because they raised money and are spending it?
Do customers visit the site only because the company is spending a lot on marketing?
Do they visit the website and bounce without buying anything?
Is the increase in web traffic just a reflection of the sector’s seasonality?
Do patents actually translate into real defensible value?
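Some of these questions can be partly addressed with adjusted metrics. As one illustration, assuming monthly data: comparing a month with the same month a year earlier neutralises a recurring seasonal peak that a month-over-month comparison would mistake for growth, and weighting visits by the share that do not bounce focuses on engaged traffic. The function names and the example series are hypothetical.

```python
def month_over_month(series: list[float], t: int) -> float:
    """Growth of month t versus the previous month."""
    return series[t] / series[t - 1] - 1


def year_over_year(series: list[float], t: int) -> float:
    """Growth of month t versus the same month one year earlier."""
    if t < 12:
        raise ValueError("need at least 13 months of history")
    return series[t] / series[t - 12] - 1


def engaged_visits(visits: float, bounce_rate: float) -> float:
    """Visits that did not bounce straight off the site."""
    return visits * (1 - bounce_rate)


# Two years of made-up monthly traffic: flat at 1,000, with a seasonal December peak
traffic = [2000 if month % 12 == 11 else 1000 for month in range(24)]
december = 23
print(month_over_month(traffic, december))  # 1.0: looks like traffic doubled
print(year_over_year(traffic, december))    # 0.0: no underlying growth
```

Neither adjustment answers the questions about funding-driven hiring or paid acquisition; those need other data, such as round dates or marketing spend.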

Some KPIs make more sense for B2B businesses, whereas others are relevant to B2C models. And spending money on advertising is not necessarily a bad thing either…

And then, as the number of data points increases (we get hundreds of data points for each startup on our platform), some are more reliable than others, some more frequent than others… and it becomes complicated to combine them:
If the number of employees is up but app downloads are down, is that good or bad?
What is more important: the number of hires or the profile of these hires?
Is it better to have one powerful patent or many different ones?
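One simple way to combine signals of uneven quality, sketched here under assumptions rather than as the platform's actual method, is a weighted average in which each signal's weight is a reliability prior (set by the investment team's judgement) discounted by how stale the data point is, skipping missing signals. The `Signal` fields, the normalisation of values to the range -1 to 1, and the 90-day half-life are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signal:
    name: str
    value: Optional[float]  # normalised to [-1, 1]; None when the data point is missing
    reliability: float      # prior trust in this source, in [0, 1]
    age_days: float         # time since the data point was last refreshed


def combine(signals: list[Signal], half_life_days: float = 90.0) -> Optional[float]:
    """Reliability- and freshness-weighted average of the available signals."""
    weighted_sum = 0.0
    total_weight = 0.0
    for s in signals:
        if s.value is None:
            continue
        # Exponential decay: a signal loses half its weight every half_life_days
        weight = s.reliability * 0.5 ** (s.age_days / half_life_days)
        weighted_sum += weight * s.value
        total_weight += weight
    return None if total_weight == 0 else weighted_sum / total_weight


score = combine([
    Signal("headcount_growth", 0.8, reliability=0.9, age_days=10),
    Signal("app_downloads_trend", -0.6, reliability=0.5, age_days=100),
    Signal("patents", None, reliability=0.7, age_days=0),
])
```

Here the fresher, more trusted headcount signal outweighs the stale download trend, so the combined score stays positive (roughly 0.5). Whether that weighting is right for a given business model is exactly the judgement call the questions above raise.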

A business-first approach is much easier to start with, but as the volume of data increases, there are more and more difficulties to overcome.

Data engineering is crucial to make sure the data is clean and relevant.

The combination of different signals needs to be carefully thought through, leveraging the know-how of the team’s investors (this is where having entrepreneurs and operators on the team offers a huge advantage).
Finally, there is a significant risk of missing signals or patterns, either because they were not included in the “rules” defined on the platform, or because new business models emerge (e.g., distributed models on blockchain) that didn’t exist when the investors were running businesses or P&Ls within larger organisations…
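On the data engineering point, scraped time series often contain duplicates, missing values, and one-off glitches. The sketch below shows a few typical cleaning steps for a hypothetical list of `(date, headcount)` snapshots: drop missing or non-positive values, keep one value per day, and discard points that are more than 3x away from the median of their neighbours. The threshold and window size are arbitrary assumptions.

```python
from datetime import date
from statistics import median
from typing import Optional


def clean_headcount(snapshots: list[tuple[date, Optional[int]]], max_ratio: float = 3.0) -> list[tuple[date, int]]:
    """Deduplicate, sort, and drop implausible spikes or dips in headcount snapshots."""
    by_day: dict[date, int] = {}
    for day, count in snapshots:
        if count is None or count <= 0:
            continue              # missing or impossible value
        by_day[day] = count       # later snapshots for the same day overwrite earlier ones
    days = sorted(by_day)
    values = [by_day[d] for d in days]
    cleaned = []
    for i, day in enumerate(days):
        # Median of the point and up to two neighbours on each side
        local = median(values[max(0, i - 2): i + 3])
        if values[i] > max_ratio * local or values[i] < local / max_ratio:
            continue              # likely a scraping glitch rather than a real change
        cleaned.append((day, values[i]))
    return cleaned


raw = [
    (date(2023, 1, 1), 50),
    (date(2023, 2, 1), 52),
    (date(2023, 2, 1), 53),    # duplicate scrape of the same day
    (date(2023, 3, 1), 5000),  # glitch, e.g. a parent company's headcount picked up by mistake
    (date(2023, 4, 1), 55),
    (date(2023, 5, 1), 57),
    (date(2023, 6, 1), 0),     # failed scrape
]
cleaned = clean_headcount(raw)
```

A genuine jump, for instance after an acquisition, would also be filtered out by this rule, which is why such cleaning steps need review rather than blind application.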

So, shall we go left or right?

As we saw, an AI-first strategy may appear ideal in theory, but its implementation in practice proves to be challenging. This approach is likely more effective when directed towards the founders rather than the company itself, making it particularly relevant for early-stage investments where comprehensive information on the company’s business metrics is often limited.

Conversely, for later-stage investments, data is more readily available and richer, so a business-rules approach is more likely to yield interesting results at the beginning, but it causes headaches down the line!

At Red River West, we’re a late-stage investor: we tend to focus on Series B investments, and our partnership team includes entrepreneurs like Alfred and Antoine as well as operators (I ran a big P&L at Orange and was CFO of two different businesses). Yet we actually started with an AI-first approach, until we realised that the business-first approach worked best for us… That said, after a couple of years, we introduced some AI components on selected key features…

So the path to data-driven VC does have an important fork between these two approaches, but, like most things, they are not necessarily mutually exclusive, and after some time it’s possible to leverage (we hope) the best of both!

Happy to chat further and answer any questions!

Olivier

PS: All photos are personal edits made while playing with Adobe Firefly… and yes, I love hiking ;-)
