How to Assess Startups Using Machine Learning: Part I — Introduction

Fabien Durand
PreSeries
4 min read · Sep 6, 2018
Can Machine Learning help?

The data geeks of the VC community finally have their newsletter. Subscribe here!

How to Assess Startups Using Machine Learning

Part I — Introduction

Part II — The GASP Open Source Framework

Part III — The GASP for Predictive Modelling

Where We Currently Are

At PreSeries we simplify the way startup investors source deals and assess them thanks to machine learning. We use data to help investors and analysts mitigate common biases and limitations inherent to scouting and due diligence in businesses with highly uncertain outcomes.

So far, we have built a complete IT infrastructure that consolidates data from public sources (such as CrunchBase, Twitter and USPTO) and generates machine learning models from it. Currently, our predictive models score and rank startups using 300+ variables, and we simulate scenarios (IPO, Acquisition, Closure) for more than 400k companies. But basing our probabilistic scenarios mainly on public information is very limiting. Public data only gets you so far; we need to go below the surface. Any machine learning practitioner will tell you that the quality of the data matters more than the model itself.
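To make "score and rank" concrete, here is a minimal sketch, assuming a model has already produced one raw score per scenario for each startup. The softmax normalization, the exit-probability ranking rule and the sample companies are illustrative assumptions, not a description of PreSeries' actual models.

```python
import math
from dataclasses import dataclass

# Scenario labels mirroring the three outcomes named above.
SCENARIOS = ("ipo", "acquisition", "closure")


@dataclass
class Startup:
    name: str
    raw_scores: dict[str, float]  # hypothetical: one unnormalized model score per scenario


def scenario_probabilities(raw_scores: dict[str, float]) -> dict[str, float]:
    """Convert raw per-scenario scores into probabilities that sum to 1 (softmax)."""
    # Subtracting the max score keeps math.exp from overflowing.
    top = max(raw_scores[s] for s in SCENARIOS)
    exps = {s: math.exp(raw_scores[s] - top) for s in SCENARIOS}
    total = sum(exps.values())
    return {s: v / total for s, v in exps.items()}


def exit_probability(startup: Startup) -> float:
    """Hypothetical ranking score: probability of a positive exit (IPO or acquisition)."""
    p = scenario_probabilities(startup.raw_scores)
    return p["ipo"] + p["acquisition"]


def rank(startups: list[Startup]) -> list[Startup]:
    """Order startups from most to least likely to reach a positive exit."""
    return sorted(startups, key=exit_probability, reverse=True)


if __name__ == "__main__":
    companies = [
        Startup("beta", {"ipo": -1.0, "acquisition": 0.1, "closure": 2.0}),
        Startup("alpha", {"ipo": 0.2, "acquisition": 1.5, "closure": 0.3}),
    ]
    for s in rank(companies):
        print(f"{s.name}: {exit_probability(s):.2f}")
```

Here "alpha", whose highest raw score is on acquisition, ranks ahead of "beta", whose highest raw score is on closure.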

Therefore, the next logical step is to incorporate private startup information (growth metrics, financials, team data, etc.), which is the staple of the venture capital world. No investment decision will ever be made without a thorough look at it.

Overview of PreSeries

Designing A Modern Approach for Startup Assessment

But why is it so difficult for investors to combine the goldmine that is public data with their own proprietary startup information? Well, there are two main aspects to consider:

  1. Building a sophisticated tech stack is not core to investors, even for those that have some engineering background. It is quite an engineering challenge, and spending hundreds of thousands of dollars hiring a team has been the most popular solution so far. We expect this to change.
  2. Investors work differently from one another: different data points, different investment criteria, different experiences and methods, …

Regarding the first point, we have spent the last few years working on it. We now have a machine learning infrastructure that already automates the processing of public startup data around the clock and is ready to integrate investors' data privately and anonymously. (Want to integrate another data source? You name it! Our system is flexible and scales quickly.)

The second point is the reason for this blog post. Because there is no standard industry practice in venture capital for assessing startups, we took it upon ourselves to design a framework that can be used and reused freely by anyone, anywhere. The objective of this framework is to offer a common set of variables from which features can easily be derived for machine learning. In other words, by standardizing the collection of startup data, we empower investors to put that data to use and uncover insights, rather than letting it sit in a dusty spreadsheet, never to be looked at again. We named our framework the GASP (Generally Accepted Startup Principles), a very obvious pun on the mother of accounting standards, the GAAP. We dive into the GASP in Part II of our series.
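As a rough illustration of what a common set of variables buys you, the sketch below assumes a hypothetical standardized record and derives the same engineered features from every record, whichever investor collected it. The field names and derived features are ours, chosen for illustration; they are not the GASP's actual variables, which are covered in Part II.

```python
from dataclasses import dataclass


@dataclass
class StartupRecord:
    # Hypothetical standardized fields, NOT the actual GASP variables.
    monthly_revenue: list[float]  # oldest to newest month, in USD
    monthly_burn: float           # net cash spent per month, in USD
    cash_in_bank: float           # in USD
    founders: int
    employees: int


def derive_features(r: StartupRecord) -> dict[str, float]:
    """Derive model-ready features from one standardized record."""
    first, last = r.monthly_revenue[0], r.monthly_revenue[-1]
    months = len(r.monthly_revenue) - 1
    # Compound monthly revenue growth over the observed window (0.0 when undefined).
    growth = (last / first) ** (1 / months) - 1 if first > 0 and months > 0 else 0.0
    # Months of runway at the current burn rate (infinite when not burning cash).
    runway = r.cash_in_bank / r.monthly_burn if r.monthly_burn > 0 else float("inf")
    return {
        "monthly_growth": growth,
        "runway_months": runway,
        "employees_per_founder": r.employees / r.founders if r.founders else 0.0,
    }


if __name__ == "__main__":
    record = StartupRecord([10_000.0, 11_000.0, 12_100.0], 20_000.0, 240_000.0, 2, 8)
    print(derive_features(record))
```

Because every record shares the same fields, the same `derive_features` call works on data from any investor, which is the point of standardizing collection.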

All of the above makes sense on paper. Using software to collect and process startup data is a great idea, but we need a reality check, right? For this exact reason, we just announced the launch of our first ever Online Startup Battle. First, we are using our GASP framework to anonymously collect data on past startups in order to build our training dataset. Second, we’ll invite all startups interested in fundraising to apply and let our machine learning models predict which ones are most likely to succeed and award the prize. We dive into the machine learning process in Part III, and the Online Startup Battle in Part IV of our series.
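Building a training set from past startups and then scoring new applicants is a standard supervised learning setup. The sketch below assumes each startup has already been reduced to a numeric feature vector (here, monthly growth and runway in years) and each past startup labeled 1 for success or 0 for failure. Plain logistic regression trained by batch gradient descent stands in for whatever models the battle actually uses, and all data points are made up.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Estimated probability of success for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X: list[list[float]], y: list[int],
                              lr: float = 0.5, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit weights and a bias by batch gradient descent on the average log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one past startup: predicted probability minus label.
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Made-up past startups: [monthly revenue growth, runway in years]; label 1 = success.
    past_X = [[0.15, 1.5], [0.10, 1.0], [0.12, 0.8], [0.02, 0.3], [0.00, 0.2], [0.01, 0.5]]
    past_y = [1, 1, 1, 0, 0, 0]
    w, b = train_logistic_regression(past_X, past_y)

    # Made-up applicants, ranked by predicted probability of success.
    applicants = {"applicant_a": [0.14, 1.2], "applicant_b": [0.01, 0.4]}
    ranking = sorted(applicants, key=lambda k: predict_proba(w, b, applicants[k]), reverse=True)
    print(ranking)
```

With these made-up numbers, applicant_a (higher growth and longer runway) comes out ahead of applicant_b. A real pipeline would also hold out part of the historical data to check how well the model generalizes before trusting its ranking.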

Dilbert

Let Us Know Your Thoughts

The GASP is free and always improving. Don’t miss the next iterations: join our Venture Technology Newsletter (we email once a month).

If you’re interested in using our platform to apply machine learning to startup data, please get in touch with us. We’re on Twitter too; we’d love to hear your story!
