Machine Learning + Energy Efficiency + Maps: Need I say more? (Part I)

David Anderson
Building Data Needs Love Too
7 min read · Jan 9, 2020

This is the first in a four-part series exploring ways that machine learning can help drive insights from public data on building energy use.

All my non-engineers out there: raise your hand if you know what ASHRAE is. I have a friend who thought it was a medical term for someone with ashy knees and knuckles (where’s a lotion emoji when you need one!), so if you didn’t go down that route, you’re already winning.

ASHRAE stands for the American Society of Heating, Refrigerating and Air-Conditioning Engineers, and it is the professional body that sets many of the standards related to building energy efficiency. A great data scientist friend of mine (who runs an awesome real estate data science blog) and I recently participated in a data science competition that ASHRAE has held before but decided to hold again. The goal of the competition: develop better algorithms for forecasting a building’s energy use. Any energy-efficiency retrofit underwriting makes an assumption about what your building’s energy use would have been without the retrofit (i.e., the baseline) in order to calculate your potential savings. If the baseline model says your building would have used 1,000 MWh and it actually uses 800 MWh after the retrofit, that 200 MWh gap is your savings, so getting this forecast relatively accurate is, well, kind of a big deal.

It was my first Kaggle data science competition, so my coding results quickly fizzled once the seasoned engineering professors and data scientists competing from around the world flexed their data science muscles. Still, one of the best features of the competition was all the great data it provided.

Take, for example, this (summarized) excerpt from one article highlighting which types of machine-learning models tended to perform best in the prior competition:

Neural networks again provide the most accurate model of a building’s energy use. However, the accuracy of the model depends on the assumptions the contestants made about the training data sets and how skilled they were in configuring their model. Surprisingly, cleverly assembled statistical models appear to be as accurate, or in some cases, more accurate than some of the neural network submissions.

What’s amazing about these findings is that they aren’t actually new: that excerpt is from an article written back in 1998 summarizing the second ASHRAE competition! In general, machine learning and data science applications have been around for many decades. What hasn’t been around until recently is the abundantly affordable and accessible computing power needed to perform them.

Beyond highlighting the fact that machine learning has more gray hairs than people think, another simple way to make data science more accessible is to apply a close cousin of the K.I.S.S. principle (a.k.a. “Keep It Simple, Stupid!”): the K.I.R.S. principle, “Keep It Relevant, Stupid!” Most people love trees and random forests, but unless they’re the kind that a) you planted, b) give shade, or c) are magical and full of fairies, most people don’t have an innate desire to understand how they work.

For real estate asset managers, the K.I.R.S. question here is simple: “Can you create a better mousetrap for identifying markets with strong value-add acquisition opportunities?”

The default screening approach for most asset managers asking this question is to whip out a massive Excel “market ranking” sheet: 30 columns of demographic data for all of the major metropolitan areas in the U.S., rolled up into a standardized sum of scores across those 30 metrics. The next step is either a) to debate the rankings and qualitatively adjust them using professional judgment, or b) to adjust the weightings used to calculate that sum so that the areas your team already invests in show well.
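To make that concrete, here’s a minimal pandas sketch of the workflow just described. The file name, metric columns, and equal weights are all hypothetical placeholders, not a real data source:

```python
import pandas as pd

# Hypothetical table: one row per metro, one column per demographic
# metric (population growth, job growth, rent growth, and so on).
metros = pd.read_csv("metro_metrics.csv", index_col="metro")

# Standardize each metric (z-score) so different units are comparable.
z = (metros - metros.mean()) / metros.std()

# Equal weights by default; in practice these are the knobs that get
# tuned until the markets a team already invests in "show well."
weights = pd.Series(1 / len(z.columns), index=z.columns)

metros["rank_score"] = z.mul(weights).sum(axis=1)
print(metros["rank_score"].sort_values(ascending=False).head(10))
```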

For all the hate Excel gets, if this is the standard for market ranking, then Excel is a more-than-sufficient tool. However, if you want to actually perform repeatable analytics at the asset level to guide your strategy…well, that might be a bridge too far for Excel.

So, are there any tools in the data science toolkit that can help acquisition and asset managers do their jobs better? It’d be pretty silly of me to be writing this article if I didn’t think so! First, though, we need a bit of help from some publicly available (and therefore free) data specifically related to energy efficiency that has come to the fore in the past few years.

Since 2015, the City of Atlanta has required public and private buildings as small as 25,000 SF to report their energy and water use annually. This is a great source of asset-level data, as it includes detailed owner-submitted info on building characteristics (e.g., age, square footage, Energy Star score). Like most public data, when used in the right way it allows for an unprecedented level of detail and transparency on the built environment in Atlanta.

I find that maps are some of the best ways to communicate insights to technical and non-technical audiences alike, so I decided to use some of this data to see if it could help us answer the K.I.R.S. question posed earlier: Can data science help asset managers better identify value-add acquisition opportunities?

Like any data source (particularly a public one), it has its shortcomings. To name a few:

  • Of the 2,380 buildings with Atlanta building IDs subject to the ordinance, 419 have downloadable 2018 energy data. Of those 419, only 198 (8% of the total) have Energy Star score data.
  • Although the Atlanta building efficiency website has lots of visualization tools, downloading the full raw data file can be tricky.
  • Once downloaded, there’s cleanup involved before you can reproduce the same type of visuals on your own (the sketch after this list gives a flavor).
  • For all its useful charts, there is one critical visualization that is noticeably missing: a map!
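To give a flavor of that cleanup, here’s a minimal sketch that filters the download to buildings with usable 2018 data and checks Energy Star coverage. The file and column names are assumptions; match them to the actual headers in the export:

```python
import pandas as pd

# Hypothetical file and column names -- substitute the actual headers
# from the Atlanta building efficiency export.
df = pd.read_csv("atlanta_2018_energy_data.csv")

total = len(df)
has_energy = df["site_eui_2018"].notna()
has_score = df["energy_star_score"].notna()

print(f"{total} buildings subject to the ordinance")
print(f"{has_energy.sum()} with downloadable 2018 energy data")
print(f"{has_score.sum()} ({has_score.sum() / total:.0%}) with an Energy Star score")

# Keep only rows with usable energy data for the mapping steps below.
clean = df[has_energy].copy()
clean.to_csv("atlanta_2018_energy_clean.csv", index=False)
```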

While these are real shortcomings, they aren’t deal killers by any means, and they’re well worth living with given the breadth of the data. As far as maps go, while it is technically possible to map the addresses in Excel, most solutions that let you process 2,380 addresses involve a) having some Visual Basic coding ability; b) getting clean latitude/longitude data for each address; and c) getting what’s called an “API key” from a mapping provider like Google Maps. If you are going through all that trouble, you’re already in the realm of a programmer!

Using Python makes this process easier, though. While you still need someone with basic Python skills onboard or on retainer, you don’t need a whole analytics team or a subscription provider to at least make it this far. To make it easier to get started, here’s a link to a Python notebook I wrote up that cleans up the raw data, geocodes the addresses, and exports the latitudes and longitudes for use in other mapping applications.
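If you’d rather see the gist before opening the notebook, here’s a rough sketch of those same steps. It assumes the geopy library, the hypothetical cleaned file from the earlier sketch, and an “address” column; the linked notebook remains the authoritative version:

```python
import time

import pandas as pd
from geopy.geocoders import Nominatim

# Hypothetical file from the cleanup sketch above.
clean = pd.read_csv("atlanta_2018_energy_clean.csv")
clean["full_address"] = clean["address"].str.strip() + ", Atlanta, GA"

# Nominatim is free but limited to roughly one request per second.
geolocator = Nominatim(user_agent="atlanta-energy-mapping")

def geocode(address):
    """Return (lat, lon) for an address, or (None, None) on a miss."""
    location = geolocator.geocode(address)
    if location is None:
        return None, None
    return location.latitude, location.longitude

coords = []
for address in clean["full_address"]:
    coords.append(geocode(address))
    time.sleep(1)  # stay under the rate limit

clean["latitude"] = [c[0] for c in coords]
clean["longitude"] = [c[1] for c in coords]

# Export for use in other mapping applications (e.g., Kepler.gl).
clean.to_csv("atlanta_2018_energy_geocoded.csv", index=False)
```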

Using these tools, I was able to create the 3D map (using Kepler.gl) that I couldn’t get directly from the Atlanta efficiency site. The maps below are shown in dark mode to provide a bit more contrast.
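(Side note: Kepler.gl also ships as a Jupyter widget, so a minimal sketch like the one below, assuming the geocoded CSV from the previous step, builds the same kind of map without leaving the notebook. The 3D extrusion and dark base map are toggled in the widget’s side panel.)

```python
import pandas as pd
from keplergl import KeplerGl

# Geocoded file from the previous sketch.
df = pd.read_csv("atlanta_2018_energy_geocoded.csv")

# Kepler.gl auto-detects the latitude/longitude columns and renders
# an interactive map inside the notebook cell.
building_map = KeplerGl(height=600)
building_map.add_data(data=df, name="atlanta_buildings")
building_map  # last expression in a cell displays the widget
```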

3D Map of 2018 Building Energy Efficiency Database for the City of Atlanta (perspective is from the southwest looking to the northeast)

Here’s the same map, but only showing buildings that have 2018 Energy Star scoring data:

Where did all the dots go? It’s pretty clear that although buildings subject to the ordinance are widely dispersed across the metro, most of the ones with actual Energy Star score data are concentrated in the Downtown and Midtown cores.

Looking at the data by property type, the lighter map below shows that office (dark green dots), retail (light green dots) and municipal buildings (purple dots) are a large part of the database.

3D Map of 2018 Building Energy Efficiency Database for the City of Atlanta (Grouped by Property Type)

However, when you look again at only those buildings with actual Energy Star score data, it becomes apparent that offices and hotels are the main ones with actual data. See the map below of just those buildings with Energy Star Scores:

In other words, if you want to gain insights about non-infill, non-office properties from this data, you’ll need to drill a bit further and see if there are other methods you can use to extrapolate some findings.

Maps are cool and offer some of the best “bang for the buck” in terms of insights, but we need a lot more than just a map to see how this data can be useful to investors. In the next part of this series, I’ll do just that and take a look at “unsupervised learning models,” which is just a fancy way of saying tools that a) group buildings based on how similar they are to one another across what seem like unrelated characteristics and/or b) get rid of variables in your models that don’t contribute much explanatory value.
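As a teaser, here’s roughly what those two ideas look like in code: a minimal sketch assuming scikit-learn and a few hypothetical columns from the geocoded table. Part II will do this properly:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical column names from the geocoded building table.
df = pd.read_csv("atlanta_2018_energy_geocoded.csv")
features = df[["year_built", "square_footage", "site_eui_2018"]].dropna()

# Put all characteristics on the same scale before comparing them.
X = StandardScaler().fit_transform(features)

# (a) Group buildings by similarity across characteristics.
features["cluster"] = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# (b) Check how much variance a few components explain -- a hint at
# which variables add little explanatory value.
pca = PCA().fit(X)
print(pca.explained_variance_ratio_)
```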

In the spirit of K.I.R.S., these methods may help asset managers answer at least two questions:

  • Is there an 80–20 rule in terms of characteristics I can use for identifying buildings with poor energy efficiency profiles and, therefore, high value-add potential through energy-efficient upgrades?
  • Can I use this rule to potentially identify energy inefficient buildings that don’t have the same degree of publicly available data?

In the meantime, here’s to happy mapping and to making the old new again!


Founder of BuildPayer. Long-time real estate professional with a passion for sustainability. Brooklyn-born and raised, but couldn’t even win a tickle fight.