Image Credit: photographer Martin Reisch

Exploratory Data Analysis of Green Building Market Share using Pandas & Matplotlib

Clair Marie McDade
Jun 27, 2020


This exploratory data analysis begins with the question, “What is the market share of green building rating systems for commercial buildings in the US?” This post walks through the process of exploratory data analysis (EDA) using Python, pandas, and Matplotlib to combine data from multiple sources with commonly available figures found through Google. EDA is the process of viewing and understanding data when you are not yet sure what you want to do with it. pandas is a popular Python library with a variety of functions for viewing and manipulating dataframes, and Matplotlib provides plotting functions. Import these libraries using the following code:
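The original snippet is not shown here, so the following is a minimal sketch of the imports, assuming the conventional aliases `pd` and `plt`:

```python
# Conventional aliases used throughout most pandas/Matplotlib tutorials.
import pandas as pd
import matplotlib.pyplot as plt
```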

The first step is finding the data. Green building data was obtained from two sources. Information on LEED buildings was found via Statista, and information on Energy Star buildings was found in the most recent Energy Information Administration CBECS. Both sources provided Excel files, so the pandas function pd.read_excel() was used to import them. The following code shows the import and a few steps to clean up the data.
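As a rough sketch of that import step: the file name below is hypothetical, and the `tidy` helper is an illustrative assumption about what "a few steps to clean up" might involve, not the cleanup actually used for this post.

```python
from pathlib import Path

import pandas as pd


def tidy(df):
    """Drop fully empty rows and columns, and strip whitespace from column names."""
    df = df.dropna(how="all").dropna(axis=1, how="all")
    df.columns = [str(col).strip() for col in df.columns]
    return df.reset_index(drop=True)


# Hypothetical file name; substitute the workbook you downloaded.
path = Path("energy_star_cbecs.xlsx")
if path.exists():
    energy_star = tidy(pd.read_excel(path))
    print(energy_star.head())
```

Spreadsheets published for human readers often carry blank spacer rows and padded headers, which is why the helper removes those before anything else.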

In addition to the Excel files, a few other statistics were found online, such as the total number of commercial buildings in the US: 5.6 million (source: 2012 US Energy Information Administration Commercial Buildings Energy Consumption Survey, CBECS).

After looking at the pandas dataframe, the number of buildings seemed to be more interesting than the floorspace, because a wide range of values could be seen. This same pattern was seen in the LEED certifications and the Energy Star certifications. However, a quick look at the data shows that the categories of buildings listed vary from one source to another. Here is a snapshot of the building types included in the Energy Star Data.

Here is the Python code for the chart above:
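The original chart code is not reproduced here. A minimal sketch of a horizontal bar chart by building type might look like the following; the counts are placeholders for illustration only, not the real Energy Star figures, and the column name is an assumption.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder counts for illustration only, NOT the real Energy Star data.
energy_star = pd.DataFrame(
    {"Number of buildings": [10, 20, 30]},
    index=["Retail", "Supermarket/Grocery", "Bank/Financial Institution"],
)

fig, ax = plt.subplots(figsize=(8, 4))
# Sort so the longest bar ends up at the top of the chart.
energy_star["Number of buildings"].sort_values().plot.barh(ax=ax)
ax.set_xlabel("Number of buildings")
ax.set_title("Energy Star certified buildings by type")
fig.tight_layout()
```

In a notebook, the figure renders automatically; in a script, call `fig.savefig(...)` to write it to disk.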

Comparing these bar charts is hard because the categories are not the same. They need to be mapped into a new set of categories that both sources have in common. This can be done either by reformatting the Excel file or by combining rows using pandas. Here is an example of the data cleaning process using the pandas functions .drop(), .rename(), and .astype(), plus the .iloc[] indexer to target specific cells within the dataframe:
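A small sketch of those four operations on a stand-in dataframe follows. The rows, column names, and the "Office/Professional" label are all made up for illustration and do not come from the LEED or Energy Star files.

```python
import pandas as pd

# Placeholder rows standing in for a raw Excel import; the values are made up.
raw = pd.DataFrame(
    {
        "Unnamed: 0": ["All buildings", "Retail", "Office"],
        "Number": ["60", "10", "50"],
        "Notes": ["total", "", "see footnote"],
    }
)

clean = (
    raw.drop(columns=["Notes"])  # remove a column that is not needed
    .rename(columns={"Unnamed: 0": "Building type", "Number": "Buildings"})
    .astype({"Buildings": int})  # counts arrive as text, so convert them
)

# Drop the "All buildings" total row so it is not counted twice later.
clean = clean.drop(index=0).reset_index(drop=True)

# .iloc uses integer positions: row 1, column 0 is the second building type.
clean.iloc[1, 0] = "Office/Professional"  # hypothetical relabel
```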

Next, categories like Retail, Supermarket/Grocery, and Bank/Financial Institution were combined to find the sum for a new retail category.
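One way to combine rows like this is to map each source category to a shared label and sum within each label. The sketch below assumes that approach; the counts are placeholders, and "Office" is included only to show that unmapped categories pass through unchanged.

```python
import pandas as pd

# Placeholder counts for illustration only, NOT the real data.
energy_star = pd.Series(
    {
        "Retail": 10,
        "Supermarket/Grocery": 20,
        "Bank/Financial Institution": 30,
        "Office": 40,
    },
    name="Buildings",
)

# Source category -> shared category. Anything not listed keeps its own name.
category_map = {
    "Retail": "Retail",
    "Supermarket/Grocery": "Retail",
    "Bank/Financial Institution": "Retail",
}

labels = energy_star.index.map(lambda name: category_map.get(name, name))
combined = energy_star.groupby(labels).sum()
```

Keeping the mapping in a dictionary makes it easy to apply the same shared categories to the second data source.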

After quite a bit of work reorganizing the data, the combined categories were gathered into a dataframe containing only 8 shared categories.

The cleaned categories were then plotted with Matplotlib, showing the number of buildings in each green building rating system.

Here is the code:
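The original plotting code is not reproduced here. A grouped bar chart of this kind could be sketched as follows; the three categories and all counts are placeholders for illustration (the real dataframe had 8 shared categories), so the shape of the chart should not be read as a result.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder counts for illustration only, NOT the real LEED/Energy Star data.
by_category = pd.DataFrame(
    {"LEED": [30, 10, 40], "Energy Star": [20, 15, 25]},
    index=["Retail", "K-12 Education", "Office"],
)

fig, ax = plt.subplots(figsize=(8, 4))
# One group of bars per building category, one bar per rating system.
by_category.plot.bar(ax=ax, rot=45)
ax.set_ylabel("Number of buildings")
ax.set_title("Certified buildings by category and rating system")
fig.tight_layout()
```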

From this graph it’s clear that LEED is the market leader (pun intended) over Energy Star for all of the building types except K-12 education. But what is the market share of all green building rating systems? Here’s a quick rundown of the different rating systems out there, and how many buildings have been certified by each system in the US:

Energy Star — 16,960 buildings

LEED — 69,233 buildings according to the Statista data above, totaled. (Note that this number differs from the figure shown on Google and should be studied further.)

Living Building Challenge — 380 buildings certified.

BREEAM — 550,000 buildings certified worldwide. Data was not available for US buildings only, so BREEAM was later omitted from the analysis.

Enterprise Green Communities — This certification is used for neighborhood developments. Data was not available for commercial buildings. Enterprise counts only the number of homes, such as the number of units in a multifamily building. Since the other data counts each multifamily building as one building, including it would skew the comparison, so it was left out.

Green Globes — 1,603 buildings per the Green Building Initiative.

New Buildings Institute — There are 580 official net zero buildings according to the New Buildings Institute as of 2019. Zero Energy Buildings were not counted because these projects may have other certifications, which would result in double-counting. For example, a LEED building can obtain net zero status.

Passive House — While the certification can be used for commercial buildings, there was no data available.

This leaves us with LEED, Energy Star, Green Globes, Living Building Challenge, and BREEAM. First, a dataframe was made from a dictionary.
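A minimal sketch of that step, using the counts quoted in the list above (the column names are assumptions, and the BREEAM figure is the worldwide total):

```python
import pandas as pd

# Counts as quoted in this post; BREEAM is a worldwide figure, the rest are US only.
counts = {
    "Rating system": [
        "LEED",
        "Energy Star",
        "Green Globes",
        "Living Building Challenge",
        "BREEAM",
    ],
    "Certified buildings": [69_233, 16_960, 1_603, 380, 550_000],
}

rating_systems = pd.DataFrame(counts).set_index("Rating system")
print(rating_systems)
```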

Then, a graph was plotted. The number of BREEAM-certified buildings was far higher than any of the others. A closer look showed that BREEAM counts buildings worldwide, while the other rating systems count only buildings in the US.

Created with this code:
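The original code is not reproduced here. One possible sketch of such a chart, built from the counts quoted in this post, is shown below; the column names and styling are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Counts as quoted in this post; BREEAM is worldwide, the rest are US only.
rating_systems = pd.DataFrame(
    {"Certified buildings": [69_233, 16_960, 1_603, 380, 550_000]},
    index=["LEED", "Energy Star", "Green Globes", "Living Building Challenge", "BREEAM"],
)

fig, ax = plt.subplots(figsize=(8, 4))
ordered = rating_systems["Certified buildings"].sort_values(ascending=False)
ordered.plot.bar(ax=ax, rot=30)
ax.set_ylabel("Certified buildings")
ax.set_title("Certified buildings by rating system")
fig.tight_layout()
```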

This is a very interesting finding, because few architects in the US use BREEAM. LEED has the spotlight in the US architecture and construction industry, but next to BREEAM’s worldwide total, it is far behind. Unsurprisingly, Living Building Challenge, the most rigorous system, has the smallest market share. Next, BREEAM was removed and this graph was plotted to show the market share of US green building rating systems only.
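Dropping the worldwide BREEAM row and plotting the remaining shares could be sketched as follows, again using the counts quoted in this post. The pie chart is an assumption about how the original graph was drawn.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

rating_systems = pd.DataFrame(
    {"Certified buildings": [69_233, 16_960, 1_603, 380, 550_000]},
    index=["LEED", "Energy Star", "Green Globes", "Living Building Challenge", "BREEAM"],
)

# Keep only the systems whose counts cover US buildings.
us_only = rating_systems.drop(index="BREEAM")["Certified buildings"]
shares = us_only / us_only.sum()  # fraction of all US green-rated buildings

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(us_only, labels=us_only.index, autopct="%1.1f%%")
ax.set_title("Share of US green-rated buildings by rating system")
fig.tight_layout()
```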

Finally, what is the market share of green-rated buildings among all commercial buildings in the US? With a total of 5.6 million commercial buildings (source: 2012 US Energy Information Administration Commercial Buildings Energy Consumption Survey, CBECS), green-rated buildings take a 34% slice of the pie.

Here is the code:
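The original code is not reproduced here, so the sketch below is a check rather than a reproduction. It uses only the counts quoted in this post: the four US-only systems sum to 88,176 buildings, which is roughly 1.6% of 5.6 million. That differs from the 34% figure quoted above, so the original calculation presumably used different inputs, and the gap is worth investigating.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# US-only certified building counts as quoted in this post.
us_counts = {
    "LEED": 69_233,
    "Energy Star": 16_960,
    "Green Globes": 1_603,
    "Living Building Challenge": 380,
}
total_commercial = 5_600_000  # 2012 CBECS total quoted in this post

green_total = sum(us_counts.values())
green_share = green_total / total_commercial
print(f"Green-rated: {green_total:,} buildings ({green_share:.1%})")

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(
    [green_total, total_commercial - green_total],
    labels=["Green-rated", "Not rated"],
    autopct="%1.1f%%",
)
ax.set_title("Green-rated share of US commercial buildings")
fig.tight_layout()
```

Note that this simple sum also assumes no building holds more than one certification, which would otherwise be double-counted.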

The result of this preliminary analysis shows a strong movement in the industry towards green building practices. However, in my career as an architect, I’ve seen many projects that were excellent candidates for certification go without it, so 34% is very surprising! Further analysis could be done to verify the accuracy of the results.

Image credit: 贝莉儿 DANIST@danist07


Clair Marie McDade

Founder of Archneura, Registered Architect, and creator of the Building Quality Index, an application of data science for commercial real estate.