How to: Open Source Energy Data and US Maps in Python

Michael Street
Published in Black Men Code
Jun 21, 2016
One hundred commercial and industrial buildings in the US and their energy use intensity.

Python’s plotting interfaces have been solid tools in the data science and engineering communities for some time now, but with the advent of more flexible browser integrations some of the most used libraries are starting to show their age.

Nevertheless, there’s an interesting open source data resource of buildings and their energy use hosted by EnerNOC, which is a good way to show off some of the features (and frailties) of matplotlib. I’ve hosted this project on GitHub; feel free to clone the repository (including the data) to follow along.

Setup and Data

Much of development boils down to getting your environment set up properly. When it comes to Python, I find that Anaconda is one of the better environment managers I’ve come across, especially on the Windows platform.

With that in mind there are a number of dependencies needed to get this project off on the right foot:
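A quick way to confirm the environment is ready is to check that each library can be found. The sketch below assumes the dependencies are the libraries named in this article (pandas, matplotlib and Basemap) plus NumPy; the `is_installed` helper is a hypothetical name used only for illustration.

```python
import importlib.util

# Assumed dependency list: pandas, matplotlib and Basemap are named in this
# article; NumPy is an assumption (pandas and matplotlib both build on it).
REQUIRED = ["numpy", "pandas", "matplotlib", "mpl_toolkits.basemap"]


def is_installed(module_name):
    """Return True if Python can locate the named module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. mpl_toolkits) is itself missing.
        return False


missing = [name for name in REQUIRED if not is_installed(name)]
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies found.")
```

Anything reported as missing can then be installed into the active environment before moving on.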

Pre-processing

There are two high-level goals to hit in order to reproduce the map above: (1) color each building based on its industry and (2) calculate the energy use intensity of each building. An often overlooked feature in matplotlib is the colormap object, which can easily generate an iterable set of RGB color tuples. In the snippet below I’ve taken a subset of the data into a pandas DataFrame and now create a Python dictionary (a key-value mapping, much like a JSON object) that links an industry type to a color. I’ll use this dictionary later for the plotting:
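As a minimal sketch of the industry-to-color mapping, assuming a small stand-in DataFrame: the column names, the rows and the choice of the `viridis` colormap are illustrative assumptions, not values taken from the EnerNOC files or the project repository.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical subset of the site metadata; column names and rows are
# illustrative assumptions only.
sites = pd.DataFrame({
    "SITE_ID": [1, 2, 3, 4, 5],
    "INDUSTRY": ["Education", "Light Industrial", "Education",
                 "Commercial Property", "Food Sales & Storage"],
})

industries = sorted(sites["INDUSTRY"].unique())

# Calling a colormap with values in [0, 1] returns one RGBA row per value.
cmap = plt.get_cmap("viridis")
rgba_rows = cmap(np.linspace(0.0, 1.0, len(industries)))

# Keep only the RGB part and link each industry name to one color.
industry_colors = {
    name: tuple(float(channel) for channel in rgba[:3])
    for name, rgba in zip(industries, rgba_rows)
}

for name, rgb in industry_colors.items():
    print(name, rgb)
```

Sampling the colormap at evenly spaced points keeps the colors as far apart as the colormap allows, which helps when only a handful of categories need to be told apart.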

Finally, I need to calculate the energy use intensity (EUI) in kBTU/sf for each building. The energy data is stored at five-minute intervals in kilowatt-hours, so I’ll convert the units before plotting. Since I want to plot the EUI values as the size of each building’s circle, I’ll also need to scale the value before passing it to the plotting methods:
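The unit conversion itself is short. The sketch below assumes EUI is the summed energy over whatever period the readings cover, converted at roughly 3.412 kBTU per kWh and divided by floor area; the function name and the sample numbers are hypothetical.

```python
KBTU_PER_KWH = 3.412  # 1 kWh is approximately 3.412 kBTU


def energy_use_intensity(kwh_readings, floor_area_sqft):
    """Sum interval readings (kWh) and return kBTU per square foot."""
    if floor_area_sqft <= 0:
        raise ValueError("floor area must be positive")
    total_kwh = float(sum(kwh_readings))
    return total_kwh * KBTU_PER_KWH / floor_area_sqft


# Hypothetical building: twelve five-minute readings of 2.5 kWh each
# (one hour of data) and 1,000 square feet of floor area.
readings = [2.5] * 12
eui = energy_use_intensity(readings, floor_area_sqft=1000.0)
print(f"EUI: {eui:.5f} kBTU/sf")
```

With a full year of readings per site, the same function would give an annual EUI, which is the figure usually quoted for buildings.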

I set four break points for the scaling at 5, 10, 20 and 40 pt with linear interpolation for sizes in between these. You’ll see that the energy data is stored separately from the floor area data and I’m reading each of the unique sites’ data.
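One way to express this kind of piecewise-linear scaling is NumPy's `np.interp`. In the sketch below the four marker sizes come from the text, while the EUI thresholds they are paired with are hypothetical placeholders, since the actual cut-offs are not stated here.

```python
import numpy as np

SIZE_BREAKS_PT = [5.0, 10.0, 20.0, 40.0]  # marker sizes named in the article
EUI_BREAKS = [0.0, 50.0, 100.0, 200.0]    # hypothetical thresholds, kBTU/sf


def marker_size(eui):
    """Map an EUI value to a marker size by linear interpolation.

    Values below the first or above the last threshold are clamped to the
    smallest or largest marker size, which is how np.interp behaves.
    """
    return float(np.interp(eui, EUI_BREAKS, SIZE_BREAKS_PT))


for value in (0.0, 25.0, 150.0, 500.0):
    print(value, marker_size(value))
```

Clamping at the top end keeps one unusually energy-hungry building from swamping the rest of the map.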

Plotting

The last step now is to plot an underlying map using the Basemap package and to overlay the locations of the buildings with their points colored by industry and points sized by EUI. Basemap provides a number of convenient methods for displaying a world map once some initial parameters are defined:
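A rough sketch of the map setup follows. The projection and corner coordinates are assumptions chosen to frame the contiguous United States, not the values used in the original project, and `make_us_map` is a hypothetical helper name.

```python
# Assumed map parameters: a Lambert conformal conic projection whose corners
# roughly bound the contiguous United States.
US_MAP_PARAMS = dict(
    projection="lcc",
    llcrnrlon=-119, llcrnrlat=22,   # lower-left corner (lon, lat)
    urcrnrlon=-64, urcrnrlat=49,    # upper-right corner (lon, lat)
    lat_1=33, lat_2=45, lon_0=-95,  # standard parallels and central meridian
    resolution="l",                 # low-resolution boundary data
)


def make_us_map(ax=None):
    """Create a Basemap of the US and draw its boundary layers."""
    # Imported lazily so the parameters can be inspected without Basemap.
    from mpl_toolkits.basemap import Basemap

    m = Basemap(ax=ax, **US_MAP_PARAMS)
    m.drawcoastlines(linewidth=0.5)
    m.drawcountries(linewidth=0.5)
    m.drawstates(linewidth=0.25)
    return m
```

Calling `m = make_us_map()` with Basemap installed draws the outlines onto the current axes and returns the map object for the overlay steps.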

Next, I’ll add in the buildings one-by-one to the map.
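The loop might look something like the sketch below. A Basemap instance is callable and converts longitude and latitude into map coordinates, so it would be passed as `project`; here an identity function stands in so the example runs without Basemap. The building records, field names and colors are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt


def plot_buildings(ax, project, buildings, industry_colors):
    """Draw one circle per building, colored by industry and sized by EUI."""
    for b in buildings:
        x, y = project(b["lon"], b["lat"])  # lon/lat -> plot coordinates
        ax.plot(x, y, marker="o", linestyle="None",
                markersize=b["size_pt"],
                color=industry_colors[b["industry"]],
                alpha=0.7)


# Hypothetical records; with real data these would come from the site table.
buildings = [
    {"lon": -122.4, "lat": 37.8, "industry": "Education", "size_pt": 10.0},
    {"lon": -87.6, "lat": 41.9, "industry": "Light Industrial", "size_pt": 20.0},
]
industry_colors = {"Education": "tab:blue", "Light Industrial": "tab:orange"}

fig, ax = plt.subplots()
# Identity projection as a stand-in for a Basemap instance.
plot_buildings(ax, lambda lon, lat: (lon, lat), buildings, industry_colors)
fig.savefig("buildings.png")
```

Plotting each point separately is slower than a single scatter call, but it makes it easy to give every building its own color and size.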

Because every building is drawn with its own plot call, the legend cannot simply be built from the plotted points. Instead, I trick the legend into displaying only four lines by directly creating instances of the low-level Line2D object:
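A minimal sketch of the proxy-artist approach, assuming the four legend entries correspond to four industries; the industry names and colors below are placeholders rather than the ones used in the original map.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Hypothetical industry-to-color mapping, for illustration only.
industry_colors = {
    "Commercial Property": "tab:blue",
    "Education": "tab:orange",
    "Food Sales & Storage": "tab:green",
    "Light Industrial": "tab:red",
}

fig, ax = plt.subplots()

# Each Line2D has no data; it exists only to give the legend a marker
# and a label to show.
handles = [
    Line2D([], [], marker="o", linestyle="None", markersize=8,
           color=color, label=name)
    for name, color in industry_colors.items()
]
legend = ax.legend(handles=handles, loc="lower left", numpoints=1,
                   frameon=False)
fig.savefig("legend.png")
```

Because the handles are passed in explicitly, the legend ignores whatever has been drawn on the axes and shows exactly one entry per handle.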

Conclusions

And there you have it: a relatively clean map of the U.S. with a glimpse into energy-intensive industries and their locations. I find static graphs less and less compelling in the age of rapid web development, and it can be tempting to bypass a number of the great packages Python has to offer. That said, packages like matplotlib are also starting to show their age, especially with regard to the overall intuitiveness of the API.

What kinds of visualizations would you make with the data? Leave a comment below and let’s see.
