During my semester in school, I was exposed to a myriad of methodologies and techniques from machine learning and econometrics that help with prediction and forecasting. Coupled with my interest in start-ups and venture capital, this sparked an idea: I could use the skills I have learnt in school to analyze data from the start-up ecosystem, which may give me a better understanding of what entails success for a start-up.
In a recent paper, “How do venture capitalists make decisions?”, Gompers, Gornall, Kaplan and Strebulaev (2019) conducted a detailed survey of many venture capital (VC) firms on how they decide to commit to an investment. Although the paper covers a VC’s decision making comprehensively (deal sourcing, investment selection, structuring of the term sheet, etc.), I will mainly focus on the investment portion and on the factors the firms believe contribute to success.
Important factors for investment selection
Through their extensive survey findings, the team has uncovered the key factors that VC firms scrutinize most heavily before deciding whether a start-up is worthy of an investment. VCs ranked the management team as the most important factor, followed by business-related factors such as the business model, product market fit or the industry of focus. Although there are other factors such as fit with the fund, valuation, ability to add value, etc., they pale in importance compared to the factors mentioned previously.
Further, the VCs split the importance between early-stage and late-stage start-ups. The management team trumps the other factors for early-stage start-ups, whereas business-related factors matter more for late-stage start-ups. Within the management team, ability and industry experience take precedence (with passion, entrepreneurial experience and teamwork following behind). Personally, I found this rather peculiar, as I often had the perception that a serial entrepreneur with huge passion for their craft would appeal most to investors. But to a certain extent, investors do know how to be pragmatic and choose practical factors when assessing the founders/management team.
It is worth noting that the life cycle of an investment, from deal sourcing to due diligence and closing the deal, takes a considerable amount of painstaking time, which carries an opportunity cost (time that could go to managing the portfolio, connecting the start-ups to strategic opportunities, etc.). On average, VC firms take 83 days to close a deal, spend 118 hours on due diligence during the deal period and check over 10 references when conducting their due diligence.
This sparked a question for me: can we find significant factors that contribute to a start-up’s success and act like leading indicators, so as to significantly cut the time spent on due diligence and free it up for more important things?
Introducing various types of regression to determine a start-up’s success
I obtained a dataset from Kaggle with around 47 variables and 230 observations (companies) related to start-up success. The dependent variable equals 1 if the start-up has either grown into a unicorn or IPO-ed, and 0 if it hasn’t done so or has failed. This is a potential starting point for obtaining some insights. I split the dataset into training and test sets of 180 and 50 observations respectively, and then ran various forms of regression/classification.
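The 180/50 split can be sketched in a few lines of Python. This is only an illustration and not the code used for the analysis: the helper name `split_train_test`, the seed and the placeholder company IDs are assumptions, and a random split is assumed since the text does not say how observations were assigned.

```python
import random

def split_train_test(rows, n_train, seed=0):
    """Shuffle the rows and cut them into a training set and a test set."""
    shuffled = list(rows)                   # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder IDs standing in for the 230 companies in the dataset
companies = list(range(230))
train, test = split_train_test(companies, n_train=180)
print(len(train), len(test))  # 180 50
```

Fixing the seed matters with a dataset this small, since a different split can move the test error noticeably.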
Firstly, a logit regression was fitted to estimate the probability that a start-up eventually succeeds, given the chosen variables/factors. The significant variables (keeping only those with a p-value of 0.05 or lower) and their association with a start-up’s success are as follows:
- If the start-up is developing a mobile application, there is a higher probability of success
- A larger senior team count is associated with a start-up’s success
- B2C-only models tend to fare worse than B2B or Both.
- Founders with a Masters/PhD degree, or with a niche set of domain skills, are not usually associated with success. On the other hand, having a Science-related degree coupled with Engineering-related skills is associated with start-up success
- Having little to no difficulty in obtaining a great work force, coupled with relatively easy crowdfunding, is associated with a start-up’s success
In summary, the logit model comes in with a 46% misclassification rate.
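To make the logit step concrete, here is a minimal sketch in Python with NumPy. It is not the code behind the results above: the data is synthetic, the function names `fit_logit` and `predict_proba` are made up for illustration, and the 0.5 probability cut-off for classifying a start-up as a success is an assumption. It fits the model by Newton-Raphson, computes two-sided Wald p-values, keeps predictors at the 0.05 level and measures the misclassification rate on held-out rows.

```python
import math
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(X, y, n_iter=25):
    """Maximum-likelihood logit fit via Newton-Raphson, with Wald p-values."""
    Xc = np.column_stack([np.ones(len(X)), X])        # prepend an intercept column
    beta = np.zeros(Xc.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xc @ beta)
        info = Xc.T @ (Xc * (p * (1 - p))[:, None])   # information matrix
        beta = beta + np.linalg.solve(info, Xc.T @ (y - p))
    p = sigmoid(Xc @ beta)
    info = Xc.T @ (Xc * (p * (1 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))        # standard errors
    z = beta / se
    # Two-sided p-values from the standard normal distribution
    p_values = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    return beta, p_values

def predict_proba(X, beta):
    Xc = np.column_stack([np.ones(len(X)), X])
    return sigmoid(Xc @ beta)

# Synthetic stand-in data: only the first column truly drives the outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(230, 3))
y = (rng.random(230) < sigmoid(0.3 + 1.5 * X[:, 0])).astype(float)

X_train, y_train = X[:180], y[:180]
X_test, y_test = X[180:], y[180:]

beta, p_values = fit_logit(X_train, y_train)
significant = np.where(p_values[1:] <= 0.05)[0]       # skip the intercept
predicted = (predict_proba(X_test, beta) > 0.5).astype(float)
misclassification_rate = float(np.mean(predicted != y_test))

print("significant predictor columns:", significant)
print("test misclassification rate:", misclassification_rate)
```

In practice a statistics package would report the same coefficients and p-values directly; the manual version is shown only to make each step visible.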
Regression trees are well suited to automatic variable interaction detection, letting one dive deeper into how the explanatory variables interact with one another. More often than not, they churn out better findings, coupled with a lower misclassification rate.
From the plotted tree in Figure 1, similar variables appear as in the logit model.
A good senior team count leads to a high success rate as mapped out by the regression tree, with the B2C model faring poorly as before. The additional insight the regression tree provides here is that the B2B models have deeper caveats, with certain variables that can be identified for success. Funding from a top angel VC, along with the founders having great marketing skills and data science skills, coupled with industry exposure, is associated with a start-up’s success. We can dive further into the analysis for the regression tree, but this will suffice for now.
In summary, the misclassification rate is slightly higher than the logit model’s, coming in at around 48%.
In comes random forest….
Random forest has an edge as it helps to decorrelate the trees, which provides better predictive ability. Technicalities aside, the top 10 variables by importance are shown in Figure 2.
Again, the senior team and the business model come in high on the priority scale. Variables similar to those seen in the regression tree start to re-appear, such as the average investment time, the founders’ array of skills and the investor count.
The misclassification rate comes in lower at 36%, signifying the enhanced capability of the random forest compared to the regression tree. Further, the recurrence of the same variables across the models suggests that they can indeed lead us to certain insights.
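The tree-versus-forest comparison can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the original analysis: the data is synthetic, the feature names are hypothetical stand-ins for the kinds of variables discussed above, and the tree depth, number of trees and `max_features="sqrt"` are arbitrary choices. Sampling a random subset of features at each split is what decorrelates the trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical feature names, used only to label the synthetic columns
feature_names = ["senior_team_count", "is_b2c", "investor_count", "noise"]
X = rng.normal(size=(230, 4))

# Synthetic outcome built from an interaction, then 15% of labels are flipped
y = (((X[:, 0] > 0) & (X[:, 1] < 0.5)) | (X[:, 2] > 1.2)).astype(int)
flip = rng.random(230) < 0.15
y = np.where(flip, 1 - y, y)

X_train, X_test = X[:180], X[180:]
y_train, y_test = y[:180], y[180:]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(
    n_estimators=500,
    max_features="sqrt",   # random feature subset per split decorrelates the trees
    random_state=0,
).fit(X_train, y_train)

def misclassification_rate(model, X, y):
    """Share of observations the model labels incorrectly."""
    return float(np.mean(model.predict(X) != y))

tree_error = misclassification_rate(tree, X_test, y_test)
forest_error = misclassification_rate(forest, X_test, y_test)

# Rank variables by the forest's impurity-based importance
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print("tree test error:", tree_error)
print("forest test error:", forest_error)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

The importances sum to 1, so they are best read as a relative ranking and not as effect sizes.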
*I did a preliminary check with boosting as well, but for some reason the misclassification rate was way higher, hence it is not included as part of the analysis. (This is possibly due to the very small sample size, along with poor statistical significance, as there is a wide variety of plausible factors that can determine a start-up’s success.)
By gathering the insights from the 3 models above, we can see that there are key predictors that help to evaluate a start-up’s success. Further, random forest comes in at a higher 64% prediction rate (that is, 100% minus its 36% misclassification rate), which could be improved further with a more robust dataset coupled with stronger predictors. Nonetheless, this is most probably the best that can be uncovered as of now due to multiple limitations, but I would like to thank Kaggle and the owner of this dataset for letting me use what I have learnt in school to conduct some analysis on a topic of my interest. (Btw, I can’t seem to remember where I downloaded this .csv file from, but it was quite recent and it’s meant to be for a competition, so the predictive value here should be rather accurate)
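As a rough check on how much weight the three error rates can bear, the sketch below computes an approximate 95% interval for each one. It assumes the rates were measured on the 50 test observations (the text does not state this explicitly) and uses the simple normal approximation for a proportion; the function name `error_rate_interval` is made up for illustration.

```python
import math

def error_rate_interval(error_rate, n_test, z=1.96):
    """Approximate 95% interval for an error rate measured on n_test observations."""
    se = math.sqrt(error_rate * (1 - error_rate) / n_test)   # binomial standard error
    return error_rate - z * se, error_rate + z * se

reported = {"logit": 0.46, "regression tree": 0.48, "random forest": 0.36}
intervals = {name: error_rate_interval(rate, n_test=50) for name, rate in reported.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: {reported[name]:.0%} (roughly {low:.0%} to {high:.0%})")
```

Under these assumptions the intervals overlap considerably, which fits the caveat about the small sample size: the ranking of the three models is suggestive but not conclusive.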
Moving towards a new data set and conducting Principal Components Analysis
With the analysis above done, I also have a dataset collated during my previous internship, covering start-ups in the Data Analytics/AI space that closed a fund raise exceeding $20 million over the past year. It comes with newer predictors such as
- Industry sector, group and code
- Amount raised, number of employees, year founded, location
- Type of first financing (via a VC/Angel/Accelerator etc)
- Last financing size
- Percentile in their growth rate, Twitter growth, social media growth, etc.
This dataset cannot be analyzed on par with the one above, since I do not have a dependent variable with which to run a logit regression/regression tree/random forest/boosting. I am therefore limited to unsupervised methods such as K-means clustering or principal components analysis. Due to how haphazard the predictors are by nature (and this was just using whatever I had), I decided to stick to principal components analysis, something I am more familiar with.
After cleaning the dataset and running the analysis, I found that the predictors can be reduced to 2 principal components, accounting for almost 100% of the variance. Further, I looked into the top loadings in absolute value.
We can see that financing-related predictors tend to load more heavily, with the growth rates of their socials, revenue, company size, etc. also being of importance. The number of active investors (including those that are prominent) is also a good gauge for possibly predicting start-up success.
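The PCA step can be sketched with NumPy as below. This is an illustration on synthetic data, not the internship dataset: the column names, scales and the helper `pca` are assumptions. It shows one thing that may be worth checking when two components explain nearly all the variance: if the predictors are left on their raw scales, a column measured in dollars can dominate the first component simply because its numbers are larger.

```python
import numpy as np

def pca(X, standardize):
    """Return explained-variance ratios and loadings (one component per row)."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                      # centre every column
    if standardize:
        X = X / X.std(axis=0, ddof=1)           # put columns on a common scale
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return explained, vt

# Synthetic, independent columns on very different scales (hypothetical names)
rng = np.random.default_rng(0)
names = ["last_financing_size", "employees", "twitter_growth_pct", "investor_count"]
X = np.column_stack([
    rng.normal(3e7, 1e7, size=100),   # dollars
    rng.normal(200, 80, size=100),    # headcount
    rng.normal(50, 25, size=100),     # percent
    rng.normal(8, 3, size=100),       # count
])

raw_explained, raw_loadings = pca(X, standardize=False)
std_explained, std_loadings = pca(X, standardize=True)

print("raw, first 2 components:", raw_explained[:2].sum())
print("standardized, first 2 components:", std_explained[:2].sum())

# Rank predictors by absolute loading on the first standardized component
order = np.argsort(-np.abs(std_loadings[0]))
for i in order:
    print(f"{names[i]}: {std_loadings[0, i]:+.3f}")
```

If the real predictors were already standardized, a near-100% share for two components would instead point to strongly correlated predictors, which is a different and more informative finding.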
However, it should be acknowledged that these indicators may not be entirely useful for evaluating a start-up’s success, as this is only a preliminary check on how the indicators can be interpreted. Further, the dataset only involves 1 specific industry, so there can be other variations when looking into other industries.
After running through some iterations using what I have learnt in school this semester along with some economic intuition, it is still hard to conclude anything definitive about the indicators/attributes that lead to a start-up’s success. Nonetheless, we can still find useful insights: a strong senior team with a wide array of skills (Marketing and Engineering), along with good growth in revenue/size/social media etc., can be factors to look into when conducting due diligence in the future. Further, some indicators are similar to what VCs tend to look at (supported by the published paper), where the partners tend to look for skilled founders and are inclined to hop on an investment if the lead investor is a prominent one (like Sequoia, A16z, etc.). I found it interesting that a B2C business model tends not to be so successful, which is peculiar as I would assume a B2C model has a larger market size. However, it may also be the case that the B2C space is more competitive.
Moving forward, when looking into start-ups, I may want to pay closer attention to the factors mentioned above, in order to confirm whether they are actually relevant predictors of a start-up’s success. Also, it would be ideal to create a new and more robust database in order to conduct deeper and more significant analysis, which could possibly predict a start-up’s potential and cut the time needed for due diligence.
I had a good time writing this paper and discovering these insights for myself, and I hope that it will do the same for whoever reads this eventually!