BEA Infrastructure Investment

Published in Analytics Vidhya · 4 min read · Aug 15, 2021


This notebook is about measuring infrastructure in the Bureau of Economic Analysis National Economic Accounts.

Infrastructure provides critical support for economic activity and therefore contributes significantly to our living standards. This notebook analyzes trends in infrastructure investment over the years and decades, using the infrastructure measurements in the U.S. National Economic Accounts (NEAs).

For this notebook, I will use the chain_investment.csv dataset. I think chained (inflation-adjusted) dollars give better insight into how significant the investment in a given year or decade was compared with today's infrastructure investment.

Why is it important?

This study is challenging, but it is valuable for understanding the nature of the infrastructure around us and how it behaves as a connected network of networks.

A glance at the data

Following the Bureau of Economic Analysis paper, there are three main categories of infrastructure: basic, social and digital.

  • Basic infrastructure: Mainly determined by trends in transportation and power. Water, sewer, and conservation and development (dams, levees, sea walls, and related assets) make up a relatively small share of basic infrastructure.
  • Social infrastructure: Determined by trends in health, education, and public safety. For social infrastructure, the share of privately owned net stock grew over time.
  • Digital infrastructure: Communications, software, and so on.

First 6 rows in the chain_investment dataset
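
A first look like this can be reproduced with a few lines of code. The sketch below uses Python's standard library rather than the notebook's own code, and an inline toy sample instead of the real file. The values are invented for illustration, and the `year` and `gross_inv_chain` column names are assumptions (only `meta_cat` and `category` are named in this post).

```python
import csv
import io
from itertools import islice

# Toy stand-in for chain_investment.csv: values are invented, NOT BEA figures.
# Assumption: the file has `year` and `gross_inv_chain` columns in addition
# to the `meta_cat` and `category` columns mentioned in the post.
SAMPLE_CSV = """\
meta_cat,category,year,gross_inv_chain
Total infrastructure,Basic,2000,100
Total infrastructure,Social,2000,50
Total infrastructure,Digital,2000,20
Total infrastructure,Basic,2001,110
Total infrastructure,Social,2001,45
Total infrastructure,Digital,2001,30
Basic,Power,2000,40
Basic,Power,2001,42
"""


def head(source, n=6):
    """Return the first n data rows of a CSV file object as dictionaries."""
    return list(islice(csv.DictReader(source), n))


first_rows = head(io.StringIO(SAMPLE_CSV))
for row in first_rows:
    print(row)
```

With the real file, the same helper could be called inside `with open("chain_investment.csv", newline="") as f:` as `head(f)`, provided the column layout matches the assumption above.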

R Tidy Tuesday goal

With this project I want to try out new chart types that can surface useful insights from the data. To do that, I will use:

  • Animated area and line charts for plotting the main categories of investment.
  • A Sankey diagram for charting the infrastructure investment breakdown.

Infrastructure Investment in the main categories

The area chart shows the total invested per year in dollars, with each area coloured by category. One highlight worth mentioning: since the 2000s, digital investment has increased while social investment has slowed down.

Total $ in infrastructure investment (animated area chart)
Total $ in infrastructure investment (comparison between area and line chart)
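
The data behind an area or line chart like these is one total per year for each main category. Here is a minimal sketch of that aggregation in Python, not the notebook's own code. The sample rows are invented, the `year` and `gross_inv_chain` column names are assumptions, and it also assumes the main categories are labelled exactly Basic, Social, and Digital under a Total infrastructure root.

```python
import csv
import io
from collections import defaultdict

# Toy rows: values are invented for illustration, NOT BEA figures.
# Assumption: `year` and `gross_inv_chain` are the year and amount columns.
SAMPLE_CSV = """\
meta_cat,category,year,gross_inv_chain
Total infrastructure,Basic,2000,100
Total infrastructure,Social,2000,50
Total infrastructure,Digital,2000,20
Total infrastructure,Basic,2001,110
Total infrastructure,Social,2001,45
Total infrastructure,Digital,2001,30
Basic,Power,2000,40
Basic,Power,2001,42
"""

MAIN_CATEGORIES = ("Basic", "Social", "Digital")


def totals_by_year(rows, root="Total infrastructure", categories=MAIN_CATEGORIES):
    """Sum the investment per year for each main category under the root."""
    totals = defaultdict(lambda: dict.fromkeys(categories, 0.0))
    for row in rows:
        # Keep only the top-level rows so subcategories are not double counted.
        if row["meta_cat"] == root and row["category"] in categories:
            year = int(row["year"])
            totals[year][row["category"]] += float(row["gross_inv_chain"])
    return dict(sorted(totals.items()))


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
series = totals_by_year(rows)
for year, by_category in series.items():
    print(year, by_category)
```

Each entry of `series` is one slice of the stacked area chart, and reading a single category across the years gives the corresponding line.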

Infrastructure Investment Breakdown (sankey)

We already know how the money is invested in the three main categories (basic, social and digital) so let’s break it down in order to see how much money is invested in each subcategory.

In order to do that, Sankey charts are a viable option.

Following the Google Charts definition, a sankey diagram is a visualization used to depict a flow from one set of values to another. The things being connected are called nodes and the connections are called links. Sankeys are best used when you want to show a many-to-many mapping between two domains.

Splitting the dataset into different levels is necessary for generating the Sankey chart (spoiler alert: there are three levels of investment). To do that:

  • Level 1: We start with the links from meta_cat to category for the rows where category is one of the three main categories: Basic, Social or Digital. That's because Total infrastructure will be our root node.
  • From Level 2 to Level 3: We filter meta_cat by the categories found at the previous level to get the new nodes, and repeat the same step for the next level.

Infrastructure Investment Breakdown (sankey)
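
The level-by-level split described above can be sketched as follows. This is a Python illustration under stated assumptions, not the notebook's own code. The rows and subcategory labels are invented, `gross_inv_chain` is an assumed column name, and `build_links` is a hypothetical helper that sums every row it is given (in practice you would probably filter to one year or period first).

```python
import csv
import io
from collections import defaultdict

# Toy rows: labels below the three main categories and all values are
# invented for illustration, NOT taken from the BEA data.
SAMPLE_CSV = """\
meta_cat,category,year,gross_inv_chain
Total infrastructure,Basic,2017,100
Total infrastructure,Social,2017,60
Total infrastructure,Digital,2017,40
Basic,Transportation,2017,70
Basic,Power,2017,30
Social,Health,2017,35
Social,Education,2017,25
Transportation,Highways,2017,50
Transportation,Air,2017,20
"""


def build_links(rows, root="Total infrastructure", max_levels=3):
    """Return Sankey links as (source, target, value, level) tuples."""
    # Total flow for every meta_cat -> category pair.
    flows = defaultdict(float)
    for row in rows:
        flows[(row["meta_cat"], row["category"])] += float(row["gross_inv_chain"])

    links = []
    parents = {root}  # level 1 starts from the root node
    for level in range(1, max_levels + 1):
        level_links = [
            (src, dst, value, level)
            for (src, dst), value in flows.items()
            if src in parents and src != dst  # skip self-links
        ]
        if not level_links:
            break
        links.extend(level_links)
        # The targets of this level are the sources of the next one.
        parents = {dst for _, dst, _, _ in level_links}
    return links


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
links = build_links(rows)
for link in links:
    print(link)
```

Dropping the level from each tuple leaves (source, target, value) triples, which is the from/to/weight shape that Sankey tools such as Google Charts take as input.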

The code is included in:

And that’s all from this Tidy Tuesday. Thanks for reading! Any feedback would be very welcome and appreciated.

Contact info



🎯 Senior Data Scientist at Bravo Studio | 🎮 Ex-FRVR Game Data Scientist | 🤖 MSc in AI & Computer Science