What Does The Data-Driven Utility Look Like?

Colin Davy
Published in State of Analytics · May 9, 2016

I moved into the utility space after several years in the field as an energy engineer. The most rewarding part of having been in both worlds is trying to do everything I was doing before, except for 5 million customers at once. Ten years ago, that would have been unthinkable, but thanks to some timely investments in data capabilities, a single analyst can get more information than they might know what to do with.

It has been great to see utilities take such a strong interest in building out their data warehousing capabilities over the last couple of years. The level of commitment to doing the smart stuff has never been higher: collect and store all of the smart meter data, de-silo the databases, and put cloud services and analytics on top of all of it. By now, most places have a good idea of what they should be doing, but the biggest hurdle I've seen to committing to these upgrades has been a shortage of concrete use cases for them. I understand the hesitation: new data warehouses aren't cheap, and if department heads are going to request funding to pay for these upgrades, they need to have some expectation of what the ROI will be.

Utility analytics hasn't yet produced a "killer app" that promises anything like a 3–5 year simple payback, mostly because so many of the analytics services are still in pilots and have yet to prove themselves at larger scale. And the services that do offer concrete savings at that level are usually implemented by third-party providers, which not only charge licensing fees but also keep the institutional knowledge to themselves. Shelling out for a data warehouse so you can pay even more money for high-risk analytics services isn't exactly a great sales pitch.

Fortunately, there are enough utilities committed to staying on the cutting edge that have taken the plunge and modernized their data capabilities, confident that the promised benefits will eventually materialize and the investment will be justified. I have been fortunate enough to work at one such utility, and as a data scientist with an energy engineering background, I’m now only limited by my imagination. I have everything I could possibly want to know about over 5 million customers in a single location: multiple years of sub-hourly energy usage, demographic information, historical weather, and electric rates. I can design tools that leverage every ounce of all that information to solve all kinds of problems. After having been through a couple of these projects from start to finish, I have a much clearer understanding of the value of great data warehousing and governance. Here are the biggest takeaways I’ve seen firsthand:

The analytics doesn’t have to be rocket science.

There are two basic decisions about how to use the data: who will analyze it (in-house vs. outsourced) and how they will analyze it (state-of-the-art analytics vs. basic metrics). Most of the hype is around state-of-the-art analytics, and since most utilities don't have an in-house data science team to build and maintain their own analytics, they have to outsource most of the state-of-the-art work, which leads to the high-risk barriers outlined above. As someone who loves doing the state-of-the-art stuff, I get why most people spring for the latest and greatest in machine learning, but in practical terms, a good analyst with basic metrics can deliver a return just as good as, if not better than, the most sophisticated algorithm available.

One of the more satisfying projects I completed was a tool to help the utility identify leads for which of their customers might be stealing natural gas. The end result was a Tableau dashboard that used everything I had access to: historical bills, daily gas usage, demographic information, and weather data. The dashboard combined metrics from all of those sources to help the user pick out which customers had the highest chances of stealing gas before resources were dispatched to investigate. Sure, I was happy that the dashboard worked well, but what made me really happy was that it was powered almost entirely by a single SQL query. No machine learning required, just one query doing a lot of aggregation and filtering under the hood of the dashboard (a rough sketch of what such a query might look like is at the end of this section), all possible because the utility's data was located in a single warehouse. That approach has a host of other benefits too:

· Limiting the analytics to query-only makes it much easier for the utility to own and maintain the tool.

· Adding custom features to the dashboard based on the utility's domain-specific knowledge kept them engaged and invested in the project's outcome, since their expertise shaped the tool directly.

· Having all of the data in a single location cut the development time of the tool by at least half, allowing rapid iterations and changes.

And if the utility decides they want to take it to the next level and sprinkle in some more advanced analytics? That's not mutually exclusive with this infrastructure; it's additive. Having a starter dashboard built from the basics that gets the low-hanging fruit really demonstrates the value of great data management and whets the appetite for those next-level projects.
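To make the earlier point concrete, here is a rough sketch of the kind of query that could power a lead-finding dashboard like that one. Everything here is an illustrative assumption rather than the actual production query: the table and column names (daily_gas_usage, daily_weather, premise_info), the weather-normalization heuristic, and the thresholds are all placeholders. The real dashboard also folded in billing history, which is omitted here to keep the sketch short. The point is just that one pass of aggregation and filtering over centrally stored data can produce a ranked lead list.

```python
# A hedged sketch, not the utility's actual query: hypothetical schema,
# illustrative thresholds, run against whatever warehouse engine is in use.
import pandas as pd
import sqlalchemy

LEADS_QUERY = """
WITH usage_by_period AS (
    SELECT
        u.customer_id,
        AVG(CASE WHEN u.read_date >= :recent_start THEN u.therms END)               AS recent_avg_therms,
        AVG(CASE WHEN u.read_date <  :recent_start THEN u.therms END)               AS baseline_avg_therms,
        AVG(CASE WHEN u.read_date >= :recent_start THEN w.heating_degree_days END)  AS recent_hdd,
        AVG(CASE WHEN u.read_date <  :recent_start THEN w.heating_degree_days END)  AS baseline_hdd
    FROM daily_gas_usage u
    JOIN daily_weather w
      ON w.weather_station_id = u.weather_station_id
     AND w.read_date = u.read_date
    GROUP BY u.customer_id
)
SELECT
    p.customer_id,
    p.premise_type,
    up.baseline_avg_therms,
    up.recent_avg_therms,
    up.recent_avg_therms / NULLIF(up.baseline_avg_therms, 0) AS usage_ratio,
    up.recent_hdd        / NULLIF(up.baseline_hdd, 0)        AS weather_ratio
FROM usage_by_period up
JOIN premise_info p ON p.customer_id = up.customer_id
WHERE up.baseline_avg_therms > :min_baseline_therms          -- ignore tiny accounts
  -- Flag usage that dropped far more than the weather alone would explain;
  -- the 0.5 factor is an arbitrary illustrative threshold.
  AND up.recent_avg_therms / NULLIF(up.baseline_avg_therms, 0)
      < 0.5 * (up.recent_hdd / NULLIF(up.baseline_hdd, 0))
ORDER BY usage_ratio ASC
"""

def fetch_theft_leads(engine: sqlalchemy.engine.Engine) -> pd.DataFrame:
    """Return customers whose usage drop is not explained by the weather."""
    params = {"recent_start": "2016-01-01", "min_baseline_therms": 1.0}
    return pd.read_sql(sqlalchemy.text(LEADS_QUERY), engine, params=params)
```

A query-only tool like this is exactly what makes the ownership story easy: the dashboard just reads the result set, and anyone at the utility who knows SQL can tune the filters.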

Answers to your questions come much faster.

Utilities are good at understanding the long-term problems they face and know the questions they need to ask, but they're often limited in which questions they get to answer by how much time and money each one will take. If they want a data-driven answer to something like the effect of solar growth on system load, or the average load profiles of different customer types, that usually means turning the question into an entire project, and a lot of the time on those projects is spent just finding the data, understanding it, and figuring out how to process it. With a great data warehouse, those questions can often be answered in a matter of days or hours instead of the weeks it normally takes.

I did a one-off study determining 24-hour load profiles for the utility's small business customers; they were interested in whether usage patterns lined up with how they classified each type of business. Normally, getting the interval data alone would be a multi-week process, to say nothing of aggregating it and running it through a clustering process to get the desired result. I was able to get them an answer in two days thanks to the work they had put into their data infrastructure. Processing and aggregating over 10 million rows of data is not a problem when it's centrally stored and has good data management principles behind it.
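For the curious, here is a minimal sketch of what that clustering step can look like once the interval data is out of the warehouse. The column names (customer_id, read_hour, kwh, business_type) and the choice of k-means with five clusters are my own illustrative assumptions, not the actual setup of the study.

```python
# A minimal sketch of load-profile clustering, assuming hourly interval reads
# are already in a DataFrame with columns: customer_id, read_hour (0-23), kwh.
# Column names and the number of clusters are hypothetical placeholders.
import pandas as pd
from sklearn.cluster import KMeans

def cluster_load_profiles(reads: pd.DataFrame, n_clusters: int = 5) -> pd.DataFrame:
    # Average each customer's usage by hour of day to get a 24-value profile.
    profiles = (
        reads.groupby(["customer_id", "read_hour"])["kwh"]
             .mean()
             .unstack("read_hour")      # one row per customer, 24 columns
             .fillna(0.0)
    )

    # Normalize each profile by its daily total so clusters reflect the shape
    # of the day, not the absolute size of the business.
    shapes = profiles.div(profiles.sum(axis=1), axis=0).fillna(0.0)

    # Cluster the shapes; the number of clusters is a judgment call to tune.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(shapes)

    result = shapes.copy()
    result["cluster"] = labels
    return result

# Cross-tabulating the cluster labels against the utility's business_type
# classification then shows whether, say, restaurants and offices really do
# fall into distinct usage shapes.
```

With the data already clean and in one place, a sketch like this is most of the work; the two days went into pulling the reads, running the clustering, and packaging the results, not hunting down the data.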

When you get to check off a lot of boxes on a utility's long list of what-ifs, the relief is palpable. Their long-term planning gets a lot more clarity, since they can pick and choose what to investigate in greater detail, and they can go to their public utility commissions with much clearer answers and support their rate cases with concrete evidence.

Good data is the best way to cultivate a culture of data science inside a utility.

For every data scientist who is happy to jump in at a utility and start developing models, there are probably three more employees already at the utility with an interest in learning it themselves. Getting started in data science is always tough (tutorials will only take you so far), and the best way is to start with data sets you already know. Utility employees have lots of domain-level expertise, and that's valuable when picking a data set to get started with. (Mine was sports.) Once a utility can give its employees great data sets that measure and record subject areas they already know something about, it's a natural progression to start asking questions of the data. Before you know it, they're learning how to build their own models and collaborating with each other. The interest among utility employees in getting more out of data has always been there; it's just a matter of getting them the right data.

The developments in data-driven solutions for utilities have been amazing to watch even in just the last five years, and it will only get better from here. The limiting factor will probably be utilities' commitment to investing in their data infrastructure, which is tough to make without a clear vision of the return on that investment. Hopefully, as more concrete examples and case studies come out of the places that have made the investment, that clear vision will be easier to come by.


Colin is a consulting data scientist in San Francisco, a two-time winner of the Sloan Sports Analytics Conference Hackathon, and a Jeopardy champion.