5 Minute EDA: Unicorn Startups of the World (visualizations)

Aya Spencer

Published in

5 Minute EDA

3 min read · Feb 21, 2022

Exploratory data analysis of unicorn startups in the world

Source & Method

CB Insights publishes a list of all the startup unicorns around the world. I scraped this list as my base data and ran the analysis in Python, with visualizations in Plotly.

Prepare Data

Let’s import the base data:

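from collections import Counter
import pandas as pd                  # read_html and DataFrame handling
import plotly.express as px          # bar graph and tree map
import plotly.graph_objects as go    # scatter plot traces
url = "https://www.cbinsights.com/research-unicorn-companies"
df = pd.read_html(url)[0]  # first HTML table on the page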

Now let’s check out the first ten rows to see what data we are working with.

df.head(10)

I check for the datatype of my columns and notice that the Valuation ($B) column is an object instead of a float:

df.info()

This is going to be a problem when I run my summary statistics, because I cannot compute numeric summaries on a non-numeric field. So I’ll have to strip the “$” (and any thousands separators) from the Valuation ($B) column and change the datatype from object to float.

df['Valuation ($B)'] = df['Valuation ($B)'].str.replace(',', '')
df['Valuation ($B)'] = df['Valuation ($B)'].str.replace('$', '', regex=False)  # '$' is a regex anchor, so match it literally
df['Valuation ($B)'] = df['Valuation ($B)'].astype(float, errors='raise')

Perfect.
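If you want to try the cleaning step without hitting the live page, here is a minimal sketch on a few made-up values. The sample strings (including the "n/a" entry) are my own assumptions, not rows from the CB Insights table, and I swap in pd.to_numeric with errors="coerce" as a more forgiving alternative to astype.

```python
import pandas as pd

# Hypothetical sample values in the same "$<number>" style as the scraped column
sample = pd.DataFrame({"Valuation ($B)": ["$140", "$100.3", "$1,000", "n/a"]})

cleaned = (
    sample["Valuation ($B)"]
    .str.replace(",", "", regex=False)  # drop thousands separators
    .str.replace("$", "", regex=False)  # match the dollar sign literally
)

# errors="coerce" turns anything that still isn't numeric into NaN instead of raising
sample["Valuation ($B)"] = pd.to_numeric(cleaned, errors="coerce")

print(sample.dtypes)
print(sample)
```

With errors="coerce", unexpected strings become NaN rather than stopping the run, so it is worth checking how many values ended up missing before computing any summaries.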

EDA

To keep my EDA simple and short, I’ll answer the question below using three different visualizations: a bar graph, a tree map, and a scatter plot.

Which countries have unicorns with the highest total aggregated valuation?

Let’s get started!

Bar graph

fig = px.bar(df, x='Country', y='Valuation ($B)')
fig.show()

If I want to update this graph in the order of descending valuations, I can do this:

fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})

You can see that the US, China, the UK, India, and Germany are the top 5 countries with the highest total aggregated valuations.
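To double-check the ranking numerically rather than by eye, the totals behind the bars can be computed with a groupby. This is a minimal sketch on made-up rows (the companies and valuations are placeholders, not the real data); on the real table you would run the same chain on df.

```python
import pandas as pd

# Hypothetical rows standing in for the cleaned CB Insights table
sample = pd.DataFrame({
    "Company": ["Company A", "Company B", "Company C", "Company D", "Company E"],
    "Country": ["United States", "China", "United States", "India", "China"],
    "Valuation ($B)": [100.0, 140.0, 46.0, 21.0, 10.0],
})

# Sum valuations per country, then rank from largest to smallest
totals = (
    sample.groupby("Country")["Valuation ($B)"]
    .sum()
    .sort_values(ascending=False)
)

print(totals.head(5))
```

The resulting Series is ordered the same way as the bars when categoryorder is set to 'total descending', so the two should agree.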

Tree Map

fig = px.treemap(df, path=[px.Constant('Valuation ($B)'),'Country'], values='Valuation ($B)',
                   hover_data=['Country'])
fig.show()

Tree maps are useful when you want a quick snapshot of how each country’s total valuation compares with the others, without focusing on the exact values.

Scatter Plot

If I want to add a date element to this visualization and track the valuations by the date each company joined the unicorn list (the Date Joined column), I can use a scatter plot:

fig = go.Figure()
all_location = list(df['Country'].unique())
for location in all_location:
    fig.add_trace(
        go.Scatter(
            x = df['Date Joined'][df['Country']==location],
            y = df['Valuation ($B)'][df['Country']==location],
            name = location,
            mode = 'lines+markers',
            visible = True))
fig.show()
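One thing that may be worth doing before plotting: with mode='lines+markers', the points are connected in row order, so the lines can zigzag if Date Joined is still a string or the rows are unsorted. Here is a minimal sketch of parsing and sorting the dates first. The sample rows and the month/day/year format are assumptions on my part, so check them against the real column.

```python
import pandas as pd

# Hypothetical rows; the m/d/yyyy date format is an assumption about the source table
sample = pd.DataFrame({
    "Country": ["India", "China", "India", "China"],
    "Date Joined": ["7/25/2017", "4/7/2017", "1/2/2014", "12/5/2012"],
    "Valuation ($B)": [21.0, 140.0, 5.0, 10.0],
})

# Parse the strings into real timestamps so the x-axis is a proper time axis
sample["Date Joined"] = pd.to_datetime(sample["Date Joined"], format="%m/%d/%Y")

# Sort within each country so each trace is drawn in chronological order
sample = sample.sort_values(["Country", "Date Joined"]).reset_index(drop=True)

print(sample)
```

Running the same two steps on df before the loop above should give each country a line that moves left to right in time.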

Adding some bells and whistles (such as a filter button) can make the plot easier to explore.
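A filter of that kind can be sketched with Plotly's updatemenus layout option, where each dropdown button toggles the visible flag of the traces. The helper name build_country_buttons is my own (hypothetical), and it assumes one trace per country, added in the same order as the list you pass in.

```python
def build_country_buttons(countries):
    """Return Plotly-style dropdown button dicts: "All" plus one per country."""
    # The first button makes every trace visible again
    buttons = [{
        "label": "All",
        "method": "update",
        "args": [{"visible": [True] * len(countries)}],
    }]
    for i, country in enumerate(countries):
        # Show only the trace at position i, hide the rest
        visible = [j == i for j in range(len(countries))]
        buttons.append({
            "label": country,
            "method": "update",
            "args": [{"visible": visible}],
        })
    return buttons


buttons = build_country_buttons(["United States", "China", "India"])
print(buttons[1])
```

With the scatter figure from the previous section, passing the result to fig.update_layout(updatemenus=[dict(buttons=build_country_buttons(all_location), direction="down")]) should attach the dropdown, since the traces there were added in the order of all_location.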

There are so many ways to convey a message through data, and the right type of visualization strategy can be the difference between a captivating story and a boring one. Thanks for reading!

This is part of my 5-minute EDA series, where I run quick exploratory data analysis on an interesting dataset.