The biggest challenges for data-driven VCs

Abel Samot
Red River West
Published in
5 min read · Oct 12, 2022

Many investors in private equity are leveraging data, but most of the time this means either buying off-the-shelf CRM software or having an analyst crunch data with Excel pivot tables.

At Red River West, we decided to build our own data platform from scratch, with a dedicated database, a user-friendly front end, user management, powerful algorithms and more, all hosted in the cloud.

This article describes the challenges of such an ambition.

To learn more about data-driven VC, you can read my first article on the subject here 👉 https://medium.com/@abel_samot/2497a28bb3fa

Building a great data platform is very challenging in any industry. You have to deal with performance issues, design an architecture that scales, choose a type of database that fits your purpose, pick the right data sources, build powerful algorithms that don’t consume too many computing resources, and much more…

And even though hedge funds have been doing it for years, it’s far more complex in private equity. Hedge funds do it on listed companies, for which a lot of information is available. But when analyzing private companies, most of the time you don’t have access to any financial data, and you have to build your own dataset from multiple sources. And the smaller the companies you want to analyze, the more complex it becomes.

It’s one of the reasons why most VC firms don’t have a data-driven approach yet, and why so many of them don’t even think it could be useful. But with the right approach, it can not only be useful, it can completely change the performance of any VC fund.

That’s why in this article I will list some of the reasons why building a data platform for a venture capital firm can be challenging. I hope it will help some of you think about how to tackle these challenges as you build your own strategy, and save you from some serious headaches.

  1. Poor data quality in most sources. Most startup and founder data sources are inconsistent. You will never reach perfect coverage, a lot of the data is outdated and, above all, the amount of missing data is tremendous. Whether you use Crunchbase, Pitchbook, Dealroom or any other type of data source, you will find not only that a lot of startups are missing, but also that most of the data points these sources claim to provide are missing. You will also find a lot of mistakes and inconsistencies in those sources. That’s why it takes a lot of ingenuity and tricks to clean up your data and build powerful algorithms on top of it.
  2. You will have to navigate through the noise. Because the data quality of most of your sources is poor, it creates a lot of noise that can trigger false signals. In addition, each data source you add is likely to be less reliable, less complete, or harder to integrate than the previous ones. This lowers the “signal-to-noise” ratio, because the noise can grow faster than the signal detection. The “art” of algorithmic sourcing is to keep this ratio as high as possible by relying on reliable data, integrating it optimally, and developing algorithms that identify and clean out this noise. All of this requires real data-driven investing skills and deep technical expertise.
  3. Matching data sources. If you use multiple startup data sources such as Crunchbase and Pitchbook, it can be very complicated to match them and build a consistent dataset. Many different companies share the same name, and some companies appear under different names across sources. You could use the domain name, but even then you will find that the same company can have different domains across sources, which makes matching much harder.
  4. Backtesting your algorithms. The problem with VC data is that the cycles are very long: to know whether an investment decision was good, you might have to wait 10 years. So not only do you need a lot of historical data to build powerful algorithms that you can backtest, but even with these data points, because the economic environment keeps changing, predictions made from past data might be completely wrong today. Just because something succeeded 10 years ago doesn’t mean that something similar will succeed now. This makes it almost impossible to build algorithms that predict a company’s probability of success from the data you feed them, because so many external factors are at play. That’s one of the reasons why AI can’t replace humans in the startup analysis process. At Red River West, we believe AI and data can enhance VCs and give them superhuman skills, but never replace them.
  5. Creating a platform that is used every day. It might sound silly put like that, but we have met a lot of VC funds that built incredible tools that no one uses. It’s not enough to build a good platform and generate good insights: you have to build something people will want to use and will understand very easily. That means focusing on UX and UI and applying product management practices, including multiple user interviews, to build the best platform for your users. Be careful, though: just because users tell you something would be cool doesn’t mean they will use it.
  6. You will move forward almost blindly. Data-driven VC is a new industry. Although there is some good content available online, there is no playbook on how to apply it to your fund, which sources to choose, what type of team to build, and so on. It is up to you to figure all of this out, and even if some ideas look shiny on paper, you will definitely hit a lot of dead ends. That’s the game :)
  7. Scaling can become a huge challenge. It’s easy to build a first tool that helps you source in a small country with a single data source. But as soon as you add more data sources and more startups, your database will grow dramatically and your architecture will need to keep up. You will need to build a top-notch data architecture, and many of the tools you were using may quickly become outdated as your database scales. You will need real DevOps and advanced data engineering skills to tackle this!
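
To make the first challenge, poor data quality, a bit more concrete: a reasonable first step before cleaning anything is simply to measure how much is missing, field by field and source by source. Below is a minimal sketch, assuming each source’s records have already been loaded as Python dictionaries; the field names in the example (`founded`, `total_funding`) are hypothetical and not taken from any provider’s actual schema.

```python
from collections import Counter
from typing import Iterable, Mapping


def is_missing(value: object) -> bool:
    """Treat None, blank strings and empty containers as missing values."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def fill_rates(records: Iterable[Mapping[str, object]], fields: list[str]) -> dict[str, float]:
    """Return, for each field, the share of records where it is present and non-empty."""
    filled = Counter()
    total = 0
    for record in records:
        total += 1
        for field in fields:
            if not is_missing(record.get(field)):
                filled[field] += 1
    if total == 0:
        return {field: 0.0 for field in fields}
    return {field: filled[field] / total for field in fields}


# Hypothetical records from a single source.
records = [
    {"name": "Acme", "founded": 2019, "total_funding": None},
    {"name": "Beta", "founded": None, "total_funding": 1_500_000},
    {"name": "Gamma", "founded": 2021, "total_funding": ""},
]
# name is always filled, founded in 2 of 3 records, total_funding in 1 of 3.
print(fill_rates(records, ["name", "founded", "total_funding"]))
```

Running this per source shows which data points a source can actually be trusted for, which helps when deciding how to clean or weight each source.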
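
For the second challenge, navigating the noise, one simple heuristic (an illustration, not Red River West’s method) is cross-source corroboration: only keep a signal about a company if at least `min_sources` distinct sources report it. The sketch assumes signals have already been extracted as `(source, company_id, signal)` tuples; every name in it is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Observation = Tuple[str, str, str]  # (source, company_id, signal)


def corroborated_signals(
    observations: Iterable[Observation], min_sources: int = 2
) -> Set[Tuple[str, str]]:
    """Return the (company_id, signal) pairs reported by at least `min_sources` distinct sources."""
    sources_by_signal = defaultdict(set)
    for source, company_id, signal in observations:
        # Sources are stored in a set, so one source repeating itself counts once.
        sources_by_signal[(company_id, signal)].add(source)
    return {key for key, sources in sources_by_signal.items() if len(sources) >= min_sources}


observations = [
    ("source_a", "acme", "hiring_spike"),
    ("source_b", "acme", "hiring_spike"),
    ("source_a", "beta", "new_funding"),
    ("source_a", "beta", "new_funding"),  # duplicate from the same source
]
print(corroborated_signals(observations))  # {('acme', 'hiring_spike')}
```

The trade-off mirrors the signal-to-noise problem: raising `min_sources` removes more noise but also discards genuine signals that only one source captured, and it assumes the sources are truly independent rather than copying each other.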
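
For the third challenge, matching data sources, a common pattern is to try the cheapest reliable key first (a normalized domain) and fall back to fuzzy name matching only when a second attribute also agrees, since many companies share a name. The sketch below uses only the Python standard library (`urllib.parse.urlparse`, `difflib.SequenceMatcher`); the record fields (`name`, `website`, `country`), the legal-suffix list and the 0.9 threshold are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical list of legal-form tokens to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "sas", "sa", "gmbh", "bv", "limited", "corp"}


def normalize_domain(url: str) -> str:
    """Reduce a URL or bare domain to its host, e.g. 'https://www.Acme.io/about' -> 'acme.io'."""
    if not url:
        return ""
    url = url.strip().lower()
    if "://" not in url:
        url = "http://" + url  # urlparse only finds the host when a scheme is present
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces and drop common legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def is_same_company(a: dict, b: dict, name_threshold: float = 0.9) -> bool:
    """Heuristic match between two company records coming from different sources."""
    domain_a = normalize_domain(a.get("website") or "")
    domain_b = normalize_domain(b.get("website") or "")
    if domain_a and domain_a == domain_b:
        return True
    name_a = normalize_name(a.get("name") or "")
    name_b = normalize_name(b.get("name") or "")
    if not name_a or not name_b:
        return False
    # Names alone are ambiguous, so a second attribute must agree as well.
    name_score = SequenceMatcher(None, name_a, name_b).ratio()
    same_country = a.get("country") is not None and a.get("country") == b.get("country")
    return name_score >= name_threshold and same_country


a = {"name": "Acme, Inc.", "website": "https://www.acme.io/about", "country": "FR"}
b = {"name": "ACME SAS", "website": "acme.io", "country": "FR"}
c = {"name": "Acme", "website": "acme-robotics.com", "country": "US"}
print(is_same_company(a, b))  # True: same normalized domain
print(is_same_company(a, c))  # False: same name, but different domain and country
```

A production pipeline would also need to handle redirects, shared or parked domains, and subsidiaries, and to send borderline matches to manual review; the point here is only that neither names nor domains are sufficient keys on their own.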
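
For the fourth challenge, backtesting, one concrete technical trap is look-ahead bias: training on outcomes you could not have known at decision time. Below is a hedged sketch of a walk-forward split by founding cohort. It assumes a hypothetical `Company` record with a `success_year` field (the year some success outcome, such as a later funding round or an exit, was observed) and a fixed `horizon` in years; it contains no model, only the split you would evaluate a model on.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class Company:
    name: str
    founded_year: int
    success_year: Optional[int]  # hypothetical: year a success outcome was observed, None if never


def label(company: Company, horizon: int) -> bool:
    """A company counts as a success if the outcome happened within `horizon` years of founding."""
    return company.success_year is not None and company.success_year - company.founded_year <= horizon


def walk_forward_folds(
    companies: Iterable[Company], horizon: int, as_of_year: int
) -> Iterator[Tuple[int, List[Company], List[Company]]]:
    """Yield (test_year, train, test) folds by founding cohort without look-ahead.

    Only companies whose full horizon has elapsed by `as_of_year` can be labelled.
    When predicting the cohort founded in `test_year`, training may only use
    companies whose labels were already known in that year.
    """
    labelled = [c for c in companies if c.founded_year + horizon <= as_of_year]
    for test_year in sorted({c.founded_year for c in labelled}):
        train = [c for c in labelled if c.founded_year + horizon <= test_year]
        test = [c for c in labelled if c.founded_year == test_year]
        if train:  # skip early cohorts that have no usable history yet
            yield test_year, train, test


companies = [
    Company("A", 2010, 2011),
    Company("B", 2011, None),
    Company("C", 2012, 2013),
    Company("D", 2013, None),
    Company("E", 2019, None),  # horizon not elapsed by 2020, so excluded
]
for test_year, train, test in walk_forward_folds(companies, horizon=2, as_of_year=2020):
    print(test_year, [c.name for c in train], [c.name for c in test])
# 2012 ['A'] ['C']
# 2013 ['A', 'B'] ['D']
```

This prevents leakage but does nothing about the changing economic environment described in the fourth challenge: a longer `horizon` pushes every training cohort further into the past, which makes the backtest less representative of today.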

I could go on listing many other challenges, but I believe these are among the biggest you may face when building this type of platform.

But don’t let that discourage you: it’s worth it 100 times over if tackling these challenges lets you build a platform used across your teams to source, screen, and analyze startups 🚀
