EDA on US VC Cap Raises YTD (I)

EDA on Venture Capital Data from Crunchbase (Part I)

Jerry Efremides
Analytics Vidhya
4 min read · Nov 16, 2020


This is the first in a series of posts illustrating the EDA process in Python using a Jupyter notebook.

EDA, or exploratory data analysis, is “an approach to analyzing datasets to summarize their main characteristics,” according to Wikipedia.

Investing in private companies on the cutting edge of technology can be very exciting. For better or for worse, however, the general public mostly cannot invest in the private markets and must wait for companies to list publicly. With that in mind, we have picked venture capital as the use case for this EDA series.

The following YTD data was retrieved from Crunchbase (Pro) and is current as of October 15th, 2020.

Please be advised that the data used here is intended to illustrate a workflow rather than to give a fully accurate view of capital raising in the venture space. At times we omit null values or other values that have not been thoroughly vetted. We do intend to vet these values further and dig deeper into this analysis, and we will post updates on an ongoing basis.

For information on how Crunchbase aggregates their data visit:
https://news.crunchbase.com/methodology/.

SpaceX, which completed two private capital raises this year, launches a rocket. Image courtesy of https://unsplash.com/s/photos/rocket-launch

Analysis

We start with the usual library imports (pandas is a third-party library, not part of Python's standard library) and then import the dataset.
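The original notebook cells are not reproduced in this post, so here is a minimal sketch of the import step. It assumes the export is a CSV. The column names `organization_name`, `money_raised`, and the sample rows are hypothetical stand-ins; `funding_type`, `money_raised_currency`, `announce_date`, `organization_location`, and `organization_industries` are the fields named in this post.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Crunchbase Pro export. With a real file you would
# pass its path instead, e.g. pd.read_csv("crunchbase_export.csv") (file name assumed).
SAMPLE_CSV = """organization_name,funding_type,money_raised,money_raised_currency,announce_date,organization_location,organization_industries
Company A,Series A,10000000,USD,2020-03-02,"San Francisco, California, United States, North America",Software
Company B,Seed,2000000,USD,2020-05-14,"Austin, Texas, United States, North America",Fintech
Company C,Series B,15000000,EUR,2020-06-30,"Berlin, Berlin, Germany, Europe",Health Care
"""

# Quoted fields keep the comma-separated location string in a single column.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
```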

A quick look with .shape reveals the number of rows and columns, and .head shows the first few rows along with every feature we imported. Adding .T transposes the result so that columns become rows and rows become columns, which makes a long list of features easier to read.
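A minimal sketch of that first look, using a tiny made-up frame in place of the real data:

```python
import pandas as pd

# Tiny made-up frame standing in for the imported Crunchbase data.
df = pd.DataFrame({
    "organization_name": ["Company A", "Company B"],
    "funding_type": ["Series A", "Seed"],
    "money_raised_currency": ["USD", "EUR"],
})

print(df.shape)        # (number of rows, number of columns)
preview = df.head().T  # transposed: one row per feature, one column per record
print(preview)
```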

Having decided to include only capital raises done in USD, we filter the dataframe to the rows where money_raised_currency == ‘USD’. Filtering with a boolean condition like this is done with .loc (or plain bracket indexing); .iloc selects rows by integer position and does not accept a boolean Series.
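A minimal sketch of the currency filter, assuming a `money_raised` amount column alongside `money_raised_currency` (sample values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "organization_name": ["Company A", "Company B", "Company C"],
    "money_raised": [10_000_000, 2_000_000, 15_000_000],
    "money_raised_currency": ["USD", "USD", "EUR"],
})

# Boolean mask: True where the round was raised in US dollars.
is_usd = df["money_raised_currency"] == "USD"

# .loc accepts a boolean Series; .iloc would raise an error here because it
# expects integer positions (or a plain boolean array, not a Series).
df = df.loc[is_usd]
```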

To drop columns that are not needed for this particular EDA, we use .drop and pass the list of features we want to remove to its columns parameter.

Listing our features with df.columns first makes it easy to copy and paste the names of the features we want to drop.
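A minimal sketch of listing and dropping columns. The two columns dropped here (`transaction_name_url`, `organization_website`) are hypothetical examples, not necessarily the ones removed in the original notebook:

```python
import pandas as pd

df = pd.DataFrame({
    "organization_name": ["Company A"],
    "funding_type": ["Series A"],
    "money_raised": [10_000_000],
    "transaction_name_url": ["https://example.com"],  # hypothetical unneeded column
    "organization_website": ["https://example.org"],  # hypothetical unneeded column
})

# Print the feature names so they can be copied into the drop list.
print(df.columns.tolist())

cols_to_drop = ["transaction_name_url", "organization_website"]
df = df.drop(columns=cols_to_drop)
```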

Cleaning the organization_location field will be a challenge, because it packs City, State, Country, and Continent into a single field.

As previously mentioned, we drop null values for the purposes of this EDA, keeping only the rows where both organization_location and organization_industries are non-null, which we check with .notna.
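A minimal sketch of the null filter on made-up rows. `df.dropna(subset=["organization_location", "organization_industries"])` would give the same result:

```python
import pandas as pd

df = pd.DataFrame({
    "organization_name": ["Company A", "Company B", "Company C"],
    "organization_location": [
        "San Francisco, California, United States, North America",
        None,
        "Austin, Texas, United States, North America",
    ],
    "organization_industries": ["Software", "Fintech", None],
})

# Keep only rows where both fields are present.
has_both = df["organization_location"].notna() & df["organization_industries"].notna()
df = df.loc[has_both]
```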

We then keep only the rows whose organization_location contains “United States” and split that field with .str.split, using “,” as the delimiter. The new columns are named in the order the pieces appear after the split, so the number of column names must match the number of fields the split produces.
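A minimal sketch of the location split, assuming every location string follows the “City, State, Country, Continent” pattern (rows are made up). Capping the split at four pieces with `n=3` and stripping the space after each comma are choices made here, not necessarily the original notebook's:

```python
import pandas as pd

df = pd.DataFrame({
    "organization_name": ["Company A", "Company B", "Company C"],
    "organization_location": [
        "San Francisco, California, United States, North America",
        "Berlin, Berlin, Germany, Europe",
        "Austin, Texas, United States, North America",
    ],
})

# Keep only US-domiciled companies (plain substring match, not a regex).
us = df.loc[df["organization_location"].str.contains("United States", regex=False)].copy()

# Split on "," into at most four pieces; expand=True returns one column per piece.
parts = us["organization_location"].str.split(",", n=3, expand=True)

# The number of names must equal the number of split columns, or pandas raises ValueError.
parts.columns = ["city", "state", "country", "continent"]

# Remove the space that follows each comma.
parts = parts.apply(lambda col: col.str.strip())

us = us.join(parts)
```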

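We engineer a “month” column using .dt.month on the announce_date field so that we can inspect totals by month. The .dt accessor only works on datetime columns, so announce_date must be parsed as a datetime (for example with pd.to_datetime) first.

Using .head(3), we check three sample rows from the dataframe.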
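A minimal sketch of the month feature on made-up dates, assuming announce_date arrives as text and has to be parsed:

```python
import pandas as pd

df = pd.DataFrame({
    "organization_name": ["Company A", "Company B", "Company C", "Company D"],
    "announce_date": ["2020-01-15", "2020-03-02", "2020-03-20", "2020-07-09"],
})

# .dt only works on datetime64 columns, so parse the strings first.
df["announce_date"] = pd.to_datetime(df["announce_date"])
df["month"] = df["announce_date"].dt.month  # 1 = January ... 12 = December

print(df.head(3))
```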

We now have a dataset of US-domiciled companies for which both the amount raised and the industry were available. We will seek to answer the following questions:

  • In which state were the most VC deals done?
  • How much total capital was raised in each state?
  • How does capital raised break down by funding stage?
  • What are some of the company highlights?

We use the .groupby and .sum methods to aggregate the money raised by funding_type, and .count to show the number of raises. With .sort_values(ascending = False), we sort the results from largest to smallest. We notice that the funding series (stage) of many capital raises is unknown, which is one of the challenges of working with venture capital data rather than public company data.
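A minimal sketch of the funding-type breakdown, assuming a numeric `money_raised` column; the funding types and amounts below are made up. Computing the sum and count in a single .agg call is a choice made here:

```python
import pandas as pd

df = pd.DataFrame({
    "funding_type": ["Series A", "Seed", "Series A", "Series Unknown", "Seed"],
    "money_raised": [10_000_000, 2_000_000, 20_000_000, 5_000_000, 1_000_000],
})

by_type = (
    df.groupby("funding_type")["money_raised"]
      .agg(["sum", "count"])                 # total raised and number of raises per type
      .sort_values("sum", ascending=False)   # largest total first
)
print(by_type)
```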

It’s no secret that technology investing is hot, so it may come as no surprise that the most venture capital raises were done by companies domiciled in California, which accounts for $55 billion in funding. California is the home of Silicon Valley and the birthplace of many of the newest cutting-edge technology companies.
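A minimal sketch of the per-state tally behind the first two questions, assuming the `state` column produced by the location split and a numeric `money_raised` column (values are made up, not the Crunchbase figures):

```python
import pandas as pd

us = pd.DataFrame({
    "state": ["California", "Texas", "California", "New York", "California"],
    "money_raised": [10_000_000, 2_000_000, 20_000_000, 7_000_000, 1_000_000],
})

by_state = (
    us.groupby("state")["money_raised"]
      .agg(deals="count", total_raised="sum")  # named aggregations
      .sort_values("deals", ascending=False)   # most deals first
)
print(by_state.head())
```

Sorting by `total_raised` instead answers the second question.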

Highlights from this year's private capital raising include SpaceX’s two raises and Robinhood’s $1.3 billion raised to date. Don’t be surprised to see SpaceX raise even more money after yesterday’s successful launch.
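A minimal sketch of how such highlights can be surfaced: the largest totals per company, plus companies with more than one raise. The `organization_name` and `money_raised` columns and the rows are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "organization_name": ["Company A", "Company B", "Company A", "Company C"],
    "money_raised": [500_000_000, 40_000_000, 700_000_000, 90_000_000],
})

# Total raised per company, keeping the two largest totals.
per_company = df.groupby("organization_name")["money_raised"].sum()
top = per_company.nlargest(2)

# Companies with more than one raise in the period.
repeat_raisers = df["organization_name"].value_counts().loc[lambda s: s > 1]
```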

This concludes Part I of our EDA on venture capital raised year to date (as of October 15, 2020).

Stay tuned for Part II, where we will use global data to drill further into specific industries and explore topics such as merging dataframes, formatting currency values, and creating various visualizations.
