With the advent of rich startup databases such as CrunchBase and AngelList, it has never been easier to gather venture capital funding data. Add to that the powerful APIs of social networks such as Twitter and LinkedIn, and the amount of data accessible to the average user quickly becomes overwhelming. How do we make sense of it? Companies like Mattermark and DataFox have done tremendous work distilling insights from databases like these. Here I attempt to complement their efforts with some data-driven lessons.
Below are five broad lessons I drew from CrunchBase, which I find myself using more and more for general funding information. I tried to keep them of interest to the general public, so some may seem obvious to people working in tech or venture. In total, I gathered data on more than 220,000 companies and narrowed the universe to US-based companies and funding rounds from 2010 to date. The cleaned data set contains about 25,000 venture funding rounds (more on the process at the end).
5 LESSONS FROM CRUNCHBASE
2014 is off to a strong start. While funding levels hovered around $40-50Bn per year in the four prior years, venture capital deployment has picked up considerably this year. In Q1 alone, $17Bn was invested, 75% more than in the same quarter a year earlier and 37% of the 2013 total. This coincides with the strong IPO market of recent months, in which several major tech companies, such as Box and Varonis, have raised capital on public markets or are planning to.
Keep in mind that these figures include both primary and secondary financings, so part of the proceeds may have gone to compensate founders or early investors rather than being fresh cash injected into the business.
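As a quick consistency check, the two percentages can be inverted to recover the baselines they imply. This is back-of-envelope arithmetic on the rounded figures quoted above, so the results are approximate; the implied 2013 total of roughly $46Bn sits inside the $40-50Bn range.

```python
# Figures quoted above, in billions of USD (already rounded in the text)
q1_2014 = 17.0
growth_vs_q1_2013 = 0.75  # "75% more than in the same quarter a year earlier"
share_of_2013 = 0.37      # "37% of the 2013 total"

# Invert each percentage to recover the baseline it implies
implied_q1_2013 = q1_2014 / (1 + growth_vs_q1_2013)
implied_total_2013 = q1_2014 / share_of_2013

print(f"Implied Q1 2013: ${implied_q1_2013:.1f}Bn")        # -> Implied Q1 2013: $9.7Bn
print(f"Implied 2013 total: ${implied_total_2013:.1f}Bn")  # -> Implied 2013 total: $45.9Bn
```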
The average size of venture rounds is on the rise. A clear trend at every stage of venture fundraising is the growth in average check size. Anecdotally, articles have claimed that seed rounds are the new Series A, or that the Series A is the new Series B; here is some data to back that up. So far in 2014, the average Series A round was $6.9M (6% higher than in 2013), the average Series B $14.7M (up 20%), the average Series C $27.3M (up 31%) and the average Series D $50M (more than double the 2013 average). The sharp increase in late-stage round sizes is well documented and points to the deep pipeline of tech companies that are delaying IPO plans and choosing to stay private longer.
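A year-over-year comparison like this boils down to grouping rounds by (year, series), averaging the amounts, and taking the ratio to the prior year. The sketch below assumes each round is a dict with hypothetical `year`, `series` and `amount_usd` fields (not CrunchBase's actual schema), and the toy records are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def average_round_size(rounds):
    """Average amount raised per (year, series) pair."""
    buckets = defaultdict(list)
    for r in rounds:
        buckets[(r["year"], r["series"])].append(r["amount_usd"])
    return {key: mean(amounts) for key, amounts in buckets.items()}

def yoy_change(averages, series, year):
    """Fractional change in the average round size versus the prior year."""
    return averages[(year, series)] / averages[(year - 1, series)] - 1

# Hypothetical toy records, not CrunchBase data
rounds = [
    {"year": 2013, "series": "A", "amount_usd": 6.0e6},
    {"year": 2013, "series": "A", "amount_usd": 7.0e6},
    {"year": 2014, "series": "A", "amount_usd": 6.9e6},
]
averages = average_round_size(rounds)
print(f"Series A, 2014 vs 2013: {yoy_change(averages, 'A', 2014):+.0%}")  # -> +6%
```

Averages are sensitive to a handful of very large rounds, especially at later stages, so computing the median alongside the mean is a cheap sanity check.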
12-18 months is a good rule of thumb for the timeline to follow-on funding. At each stage of the fundraising process, the average wait until the next round was more than a year. While these are announcement dates rather than actual closing dates, they suggest founders have reasonable breathing room between fundraising pushes. This should come as a welcome relief from the “always be raising” mantra and let entrepreneurs focus on the day-to-day operations of their companies.
As a guideline for how much to raise in subsequent rounds, the averages point to a Series B of about 3X the Series A, a Series C of about 2.2X the Series B, and a Series D of about 1.9X the Series C. Of course, the amount raised will be driven primarily by cash-flow and growth needs (and the availability of capital!), but these ratios should help in projecting how paid-in capital evolves over the long term.
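The time-to-next-round figure can be estimated by sorting each company's rounds by announcement date and averaging the gaps between consecutive rounds. This is a minimal sketch assuming hypothetical `company`, `series` and `announced` fields; gaps are keyed by the earlier round's series, and days are converted to months using an average month length of 30.44 days.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def average_gap_months(rounds):
    """Mean months between consecutive announced rounds, keyed by the earlier series."""
    by_company = defaultdict(list)
    for r in rounds:
        by_company[r["company"]].append(r)
    gaps = defaultdict(list)
    for company_rounds in by_company.values():
        company_rounds.sort(key=lambda r: r["announced"])
        # Pair each round with the one that follows it
        for earlier, later in zip(company_rounds, company_rounds[1:]):
            days = (later["announced"] - earlier["announced"]).days
            gaps[earlier["series"]].append(days / 30.44)
    return {series: mean(values) for series, values in gaps.items()}

# Hypothetical toy records, not CrunchBase data
rounds = [
    {"company": "acme", "series": "A", "announced": date(2011, 3, 1)},
    {"company": "acme", "series": "B", "announced": date(2012, 6, 1)},
    {"company": "zeta", "series": "A", "announced": date(2012, 1, 15)},
    {"company": "zeta", "series": "B", "announced": date(2013, 1, 15)},
]
print(average_gap_months(rounds))
```

Since announcement dates can lag actual closings, these gaps are an approximation of the real time between raises.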
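To turn these step-up ratios into a rough long-term projection, each round can be sized as a multiple of the previous one and the amounts summed into cumulative paid-in capital. The $5M starting Series A below is a hypothetical input, and the output is only as good as the average multiples it reuses.

```python
# Step-up multiples quoted above: each round as a multiple of the previous one
STEP_UPS = [("Series B", 3.0), ("Series C", 2.2), ("Series D", 1.9)]

def project_rounds(series_a_usd):
    """Project round sizes and cumulative paid-in capital from a Series A amount."""
    projection = [("Series A", series_a_usd, series_a_usd)]
    size, cumulative = series_a_usd, series_a_usd
    for name, multiple in STEP_UPS:
        size *= multiple
        cumulative += size
        projection.append((name, size, cumulative))
    return projection

# Hypothetical $5M Series A
for name, size, cumulative in project_rounds(5e6):
    print(f"{name}: ${size / 1e6:.1f}M raised, ${cumulative / 1e6:.1f}M cumulative")
```

With these inputs the projection reaches roughly $62.7M for the Series D and about $115.7M of cumulative paid-in capital.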
Capital flows to more diverse locations than you might think. As much as Silicon Valley is the nexus of the tech industry, venture capital is flowing to many parts of the US, beyond New York and Boston to up-and-coming entrepreneurship hubs such as Boulder and Austin. Of the $21.3B raised so far this year, 44% went to California, 11% to New York State, 10% to Massachusetts, 5% to Texas and 30% to the rest of the US. As capital becomes more mobile, I expect areas outside the major US tech hubs to grow in importance.
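A geographic breakdown like this amounts to summing dollars per state, folding everything outside a chosen set of states into an "Other" bucket, and dividing by the grand total. The sketch assumes hypothetical `state` and `amount_usd` fields; the toy amounts are made up to mirror the percentages above.

```python
from collections import defaultdict

def share_by_state(rounds, top_states):
    """Fraction of total dollars raised in each listed state; the rest goes to 'Other'."""
    totals = defaultdict(float)
    for r in rounds:
        key = r["state"] if r["state"] in top_states else "Other"
        totals[key] += r["amount_usd"]
    grand_total = sum(totals.values())
    return {state: amount / grand_total for state, amount in totals.items()}

# Hypothetical toy amounts (in $M), not CrunchBase data
rounds = [
    {"state": "CA", "amount_usd": 44.0},
    {"state": "NY", "amount_usd": 11.0},
    {"state": "MA", "amount_usd": 10.0},
    {"state": "TX", "amount_usd": 5.0},
    {"state": "WA", "amount_usd": 20.0},
    {"state": "CO", "amount_usd": 10.0},
]
shares = share_by_state(rounds, {"CA", "NY", "MA", "TX"})
for state, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{state}: {share:.0%}")
```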
Enterprise software, hardware, analytics, security and education have been the darlings of the VC industry in recent years. The sectors attracting capital have shifted over time. Cleantech has yet to recover from an investment bubble in supply-side technologies, and ad tech is also on a downward path. Consumer mobile, social and gaming have likewise seen investment slow. On the strong side, companies that sell to enterprises have received significant funding that has grown year over year, and 2014 seems to be continuing that trend. These startups build products in areas such as vertical SaaS, security and analytics. Hardware investment is also on the rise, and it would be interesting to analyze to what extent crowdfunding and rising consumer demand are driving this trend. Furthermore, education technology and fintech startups are addressing two huge verticals that have yet to undergo the kind of technological change many Fortune 500 companies have experienced, and they are consequently attracting lots of investor interest. Last but not least, biotech remains a major area of investment in the US, and it would be interesting to see how closely it tracks the overall tech cycle.
Next? There is still plenty of room for deeper analysis of CrunchBase alone, and the insights would only get more powerful once triangulated with other databases. Next on my list is to look at international funding patterns and see what lessons we can draw from them.
Finally, here is a broad overview of the process and software packages I used, in case others are interested in running similar studies.
Data pull. I used Python 2.7 for the data pull; the urllib2, simplejson and re libraries proved particularly helpful for opening URLs, decoding JSON objects and handling regular expressions, respectively.
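For readers on Python 3, urllib2 has been folded into urllib.request and the standard json module covers what simplejson provides. The sketch below shows the same three pieces working together; no CrunchBase endpoint is given here, so any URL passed to `fetch_json` is up to you, and `slug_from_url` is a hypothetical helper illustrating the kind of small clean-up job re is handy for.

```python
import json
import re
import urllib.request

def decode_json(raw_bytes):
    """Decode a UTF-8 JSON payload into Python objects."""
    return json.loads(raw_bytes.decode("utf-8"))

def fetch_json(url, timeout=30):
    """Download a URL and decode its JSON body (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return decode_json(response.read())

def slug_from_url(url):
    """Return the last lowercase path segment of a URL, e.g. a company permalink."""
    match = re.search(r"/([a-z0-9-]+)/?$", url)
    return match.group(1) if match else None
```

Keeping decoding separate from downloading makes it easy to re-parse responses cached on disk without hitting the API again.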
Data source. With Python I connected to the CrunchBase API, which has very good documentation and, as of now, no API call limits. Keep in mind that with the recent release of the long-awaited CrunchBase 2.0, the old API will be discontinued, as the team is building an entire graph database from scratch. Documentation on the new functionality should be published in the coming months. Pulling the entire data set took me 3 days of CPU time with a single-threaded, non-parallelized program; I'd be curious to hear suggestions on how to reduce this.
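One common way to shorten a crawl like this, assuming most of its time is spent waiting on network responses rather than computing, is to issue requests from a pool of threads. This is a sketch of that idea using the standard library's ThreadPoolExecutor; `fetch` stands for any function taking a URL (for instance a JSON-fetching helper), and `max_workers=8` is an arbitrary default worth tuning, and worth lowering if the API starts enforcing call limits.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch, max_workers=8):
    """Run `fetch` on every URL using a pool of threads.

    Results come back in the same order as `urls`. Threads help here on the
    assumption that each call mostly waits on the network, not the CPU.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))

# Usage with a stand-in fetch function (no network needed)
print(fetch_all(["a", "b", "c"], str.upper))
```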
Data exploration. Before doing any analysis, I like to explore the data's structure and find areas where value might lie, and a database with easy queries is the best environment for that. Given the intuitive JSON format of the CrunchBase data, CouchDB seemed the obvious choice.
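CouchDB is driven over HTTP: a `PUT` to `/{db}/{docid}` stores a JSON document under that ID. The sketch below only builds such a request with the standard library; the server address (CouchDB's default port 5984), the `crunchbase` database name and the document contents are hypothetical. Updating an existing document would additionally require its current `_rev`.

```python
import json
import urllib.parse
import urllib.request

def couchdb_put_request(base_url, db, doc_id, doc):
    """Build (but do not send) a request that stores one JSON document in CouchDB."""
    url = f"{base_url}/{db}/{urllib.parse.quote(doc_id, safe='')}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = couchdb_put_request("http://localhost:5984", "crunchbase", "acme", {"name": "Acme"})
# urllib.request.urlopen(req) would send it to a running CouchDB server
```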
Data wrangling. To prepare the raw data for research, I used good old R (64-bit, version 2.15). Call me a creature of habit, but I have yet to find a more efficient tool for cleaning data reliably and performing sanity checks. Under UNIX it is also surprisingly fast with large data sets, despite being a high-level functional language. There are signs that this 20-year-old status quo might soon change; Trifacta, for example, is working on an innovative, UX-friendly solution for preparing data for analysis.
Data insights. For basic charting I used Excel, and for more complex yet seamless drag-and-drop charting I used Tableau 8.1 (64-bit). It is very powerful for rich visualizations, provided the data is thoroughly cleaned beforehand. If you’re a student or academic, take advantage of their free one-year trial.
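The cleaning itself was done in R; for consistency with the other snippets, here is a rough Python equivalent of the filter described at the top (US-based companies, rounds from 2010 on) plus two simple sanity checks. The field names and the `"USA"` country code are hypothetical, and the duplicate check rests on the assumption that the same round can appear more than once in a raw pull.

```python
from datetime import date

def clean_rounds(rounds, start=date(2010, 1, 1)):
    """Keep US rounds announced on or after `start` with a positive amount, de-duplicated."""
    seen = set()
    cleaned = []
    for r in rounds:
        if r.get("country") != "USA":
            continue
        if r.get("announced") is None or r["announced"] < start:
            continue
        amount = r.get("amount_usd")
        if amount is None or amount <= 0:  # sanity check: drop missing or non-positive amounts
            continue
        key = (r["company"], r["series"], r["announced"])
        if key in seen:  # assumed: a round may be listed more than once in the raw pull
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned

# Hypothetical toy records, not CrunchBase data
sample = [
    {"company": "acme", "series": "A", "country": "USA", "announced": date(2012, 5, 1), "amount_usd": 5e6},
    {"company": "acme", "series": "A", "country": "USA", "announced": date(2012, 5, 1), "amount_usd": 5e6},
    {"company": "zeta", "series": "B", "country": "GBR", "announced": date(2013, 2, 1), "amount_usd": 9e6},
    {"company": "kilo", "series": "A", "country": "USA", "announced": date(2008, 7, 1), "amount_usd": 3e6},
]
print(len(clean_rounds(sample)))  # -> 1
```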