There seems to be significant selection bias in the dataset, in that it only reviews data from companies that have already been selected by VCs, then determines which of that reduced set it expects will be most successful.
I think a more interesting model would be trained on the entire deal flow from the VCs polled, to find companies they didn’t invest in that are predicted to be, or have since proven to be, high performers.
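As a purely illustrative sketch of that approach (assuming a hypothetical deal_flow.csv with made-up feature, outcome, and funding-flag columns, none of which come from the actual dataset), you could fit on the full deal flow and then rank the companies the VCs declined:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical deal-flow table: one row per company a VC evaluated,
# a flag for whether they funded it, and an outcome proxy that is
# observable for BOTH funded and declined companies (say, a later
# round raised elsewhere). Every column name here is an assumption.
deals = pd.read_csv("deal_flow.csv")
features = ["team_experience", "prior_exits", "revenue_growth", "sector_heat"]

# Fit on the FULL deal flow, funded and declined alike, rather than
# only on the companies that made it past the VCs' filter.
model = LogisticRegression(max_iter=1000)
model.fit(deals[features], deals["outperformed"])

# Rank the declined companies by predicted success: these are the
# candidates the VCs passed on that the model flags as likely
# high performers.
declined = deals[deals["funded"] == 0].copy()
declined["p_success"] = model.predict_proba(declined[features])[:, 1]
print(declined.sort_values("p_success", ascending=False)
              .head(20)[["company", "p_success"]])
```

The key assumption baked in here is an outcome proxy that is observable for declined companies too; without one, there is nothing to train against on the unfunded side.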
Running the data with this implicit internal bias only reaffirms VC decision-making, while running it without that bias would potentially challenge it.
Personally, I find that challenging dogmatic assumptions is more productive, and creates more opportunity, than reaffirming them.
On the other hand, some form of counterfactual compensation would have to be modeled into the data, because for many startups being declined for funding is fatal: they might have performed superbly had they been funded, but of course it’s impossible to measure events that didn’t happen. So it’s a struggle to determine which companies would have demonstrated superior performance if the VCs had selected them.
This leads to a morass of questions about whether the companies were successful because they received funding, or received funding because they were going to be successful.
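For what it’s worth, one textbook technique for approximating that counterfactual is inverse propensity weighting. This isn’t something proposed above, just a sketch under strong assumptions, reusing the same hypothetical column names:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Sketch of inverse propensity weighting (IPW): one standard way to
# separate "funded because promising" from "succeeded because funded".
# All column names are hypothetical.
deals = pd.read_csv("deal_flow.csv")
features = ["team_experience", "prior_exits", "revenue_growth", "sector_heat"]

# Step 1: model the probability of being funded given pre-funding
# traits (the propensity score).
propensity = LogisticRegression(max_iter=1000)
propensity.fit(deals[features], deals["funded"])
p = propensity.predict_proba(deals[features])[:, 1]
p = np.clip(p, 0.01, 0.99)  # guard against extreme weights

# Step 2: weight each observed outcome by the inverse probability of
# the funding decision it actually received, rebalancing the funded
# and declined groups toward the same underlying population.
w = np.where(deals["funded"] == 1, 1 / p, 1 / (1 - p))

# Step 3: the weighted difference in mean outcomes estimates the
# average effect of funding itself.
funded = deals["funded"] == 1
y = deals["outperformed"]
ate = (np.average(y[funded], weights=w[funded])
       - np.average(y[~funded], weights=w[~funded]))
print(f"Estimated average effect of funding: {ate:.3f}")
```

The weighted difference only means anything if the features capture everything driving both the funding decision and the outcome, which is exactly the unverifiable assumption the chicken-and-egg question points at.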
Personally, I believe (without evidence, which is why it’s a belief) that VCs miss as many good deals as they invest in, or more, due to network-validation behaviors, existing personal biases, inherent blind spots in pattern matching (it’s hard to identify a new pattern you’ve never seen before, but easy to identify one you already know how to recognize), and overfitting or underfitting their mental models of what makes a startup attractive.
