Shutting Down California — The Billion Dollar Prediction Problem

Published in Geek Culture · 10 min read · Sep 30, 2022


In 2018 the Camp Fire left 85 people dead and caused untold destruction. Victims’ groups would later estimate the liability of Pacific Gas and Electric (PG&E) at $54 billion. It was a terrible tragedy coming only a year after the Wine Country Fires, in which 22 had perished.

The Camp Fire, which began in 2018 on Camp Creek Road in Butte County, CA

Poor maintenance was the culprit. Attention would focus on a transmission line that had previously been damaged in December 2012. Flames were reported under the same wires in 2018, around 15 minutes after damage to the line was detected. The smoking gun was, in this case, more of a smoking steel hook. Its job had been to hold up the high voltage transmission lines.

Public Safety Power Shutoffs

The exact cause of future fires in California is unknowable. But as the Bob Dylan song reminds us, at least part of the answer is blowing in the wind. Trees are too close to power lines, and when they come in contact, tragedy can ensue.

In theory, trimming trees, burying lines, and replacing fatigued parts could greatly reduce the probability of catastrophe. But this has been known for a decade or more (slow progress was reported by the San Francisco Chronicle in 2019). In the meantime, Californians live with a necessary evil called Public Safety Power Shutoffs (PSPS).

PG&E Potential Outage Map

A PSPS is explained on the PG&E information page thus:

High winds can bring tree branches and debris into contact with energized lines, damage equipment and ignite a wildfire. As a result, we may need to turn off power during severe weather to help prevent wildfires. This is called a Public Safety Power Shutoff.

According to a survey by Gabrielle Wong-Parodi (paper), these voluntary restrictions of power have public support. Intended to save lives and protect property, they are not without their own risks, however. In El Dorado County one resident, Robert Mardis, could not switch to his backup oxygen system. His death was reported by Fox News.

The California Public Utilities Commission (CPUC) has allowed shutoffs since 2012. The program is now open to all suppliers and the cost and frequency of these disruptions are mounting. There was only one restriction of electricity supply in 2013. In 2019 there were 19. In 2020, a total of 20 such events.

Rules are in place to give customers various types of warnings, culminating in urgent SMS text messages and phone calls which may arrive an hour before the power is turned off. Power is typically restored a day or two later, though in the worst case an outage lasted a week.

The suggestive symbology of the PG&E timeline for Public Safety Power Shutoffs (PSPS)

The Devil’s Wind

Things are only going to get worse. Patrick Murphy considers the primary factors driving increased fire risk in a recent article for PSE, a policy think tank. These include gradual increases in reported maximum temperatures and also a clearly discernible trend in drought severity over the last century.

But equally important are the so-called Diablo winds — hot, dry gusts from the northeast hitting the Bay Area and beyond. As air passes over the coastal ranges and falls, the pressure rises and it is heated by up to 20 degrees Fahrenheit.

When these winds reach speeds of 40 miles per hour they create perilous conditions, not only increasing the chance of ignition but also hampering subsequent defense. As Murphy notes, Diablo winds show an alarming trend even in the last few decades alone (the y-axis is hours per year).

Reported Diablo Winds by Year. Units are Hours per Year.

The California Distribution Function

Murphy also reports the cumulative distribution function of outage durations by year and provider. For example, in 2020:

The cumulative distribution function for Californian public safety power shutoff duration (hours)

There is a calculator called the Interruption Cost Estimate (ICE), developed by Lawrence Berkeley National Laboratory and Nexant, that provides an estimate of the economic cost of any interruption (you can key in the number of commercial and non-commercial users impacted). Those numbers are justified in a meta-analysis updated in 2015 by Sullivan, Schellenberg, and Blundell (paper).

Those authors provide the following calculus, and also more detailed breakdowns conditioned on the season and time of day, and the impacted industry. Their numbers are based on surveys of over 3,000 residential and SME consumers, and more in-depth interviews with 100 larger business customers.

Sullivan, Schellenberg, and Blundell interruption cost estimates by kWh

The authors find that the cost is roughly linear in the duration of the interruption. And the numbers add up. In 2019 alone the economic cost of turning out the lights might have been in the range of $2 to $3 billion.
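To make the "roughly linear" point concrete, here is a back-of-envelope sketch. It assumes a fixed cost per customer-hour for each customer class, which is a simplification of the more detailed breakdowns in the meta-analysis. The rates and customer counts are made-up placeholders, not figures from the ICE calculator or from the paper.

```python
from dataclasses import dataclass


@dataclass
class CustomerClass:
    name: str
    customers: int        # number of customers without power (placeholder)
    cost_per_hour: float  # assumed dollars per customer per hour (placeholder)


def interruption_cost(classes, hours):
    """Total cost, assuming cost is linear in the outage duration."""
    return sum(c.customers * c.cost_per_hour * hours for c in classes)


# Placeholder inputs, for illustration only.
example_classes = [
    CustomerClass("residential", customers=100_000, cost_per_hour=2.0),
    CustomerClass("small business", customers=5_000, cost_per_hour=100.0),
]

cost_24h = interruption_cost(example_classes, hours=24)
cost_48h = interruption_cost(example_classes, hours=48)
print(f"24 hour outage: ${cost_24h:,.0f}")
print(f"48 hour outage: ${cost_48h:,.0f}")
```

Under this assumption, doubling the duration doubles the cost, which is why long shutoffs across many customers add up so quickly.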

Community Distribution Functions

Evidently, a crucial real-time decision must be made balancing both sides of the ledger: should the power be cut off, or not? Due to the complexity of the decision, the most highly stylized utilitarian quadrature that springs to mind may be inadequate.

But it stands to reason that any approach will benefit from more accurate short-horizon distributional estimates of wind speeds — not to mention other measured quantities that inform wind or the chance of fire directly. An example of a one-hour ahead cumulative distribution function for wind speed is shown below. The live version can be viewed here.

Example of a one-hour ahead wind-speed cumulative distribution function (CDF). It is the result of fierce competition between prediction algorithms whose authors are free to use whatever exogenous data they can find. Perhaps these should be called “Community Distribution Functions”, “Competitive Distribution Functions”, or “Collective Distribution Functions”.

What you see isn’t terribly easy to beat. Before claiming the contrary you should see if you can improve it by running your own algorithm and seeing where you end up on the leaderboard. (A few lines of Python may suffice to put your favorite approach in the fray; see the docs.)

Even if you do succeed in winning more points than you lose in this game, remember that your own model alone will not beat every data scientist in the world, and certainly won’t forever. Even if you are somehow “the best”, the combination of your input with others will be the superior probabilistic estimate, and that’s what can be drawn out of these CDFs.

Those wind-speed predictions were initiated some time ago when I decided to spend a couple of hours writing a script to send numbers published by NOAA to the high-velocity prediction market that creates these CDFs (that, in full disclosure, I wrote and maintain).

There is nothing mysterious about those CDFs (let’s call them Californian Distribution Functions when they apply to Diablo wind speeds). The corresponding density functions are simply the aggregation of all guesses of a future measured quantity: all the numbers that were sent into the API by all participating algorithms at least one hour ago.
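As a rough illustration of that aggregation, the sketch below pools guesses from two algorithms into a single empirical CDF and reads off the chance that wind speed exceeds a threshold. The guesses are invented, and equal weighting of every guess is an assumption. The live system may weight or smooth submissions differently.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function x -> fraction of pooled samples that are <= x."""
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def exceedance_probability(samples, threshold):
    """Estimated probability that the measured value exceeds the threshold."""
    return 1.0 - empirical_cdf(samples)(threshold)


# Hypothetical one-hour-ahead wind speed guesses (mph) from two algorithms.
algo_a = [12.0, 14.5, 18.0, 22.0, 31.0]
algo_b = [10.0, 16.0, 25.0, 33.5, 36.0]
pooled = algo_a + algo_b

cdf = empirical_cdf(pooled)
p_over_30 = exceedance_probability(pooled, 30.0)
print(f"P(wind speed > 30 mph) is roughly {p_over_30:.2f}")
```

The same pooled samples answer any threshold question, which is the practical appeal of publishing a full distribution rather than a point forecast.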

By the way, one isn’t restricted to land-based measurements and the example above is actually bobbing in the Gulf of Maine — the buoy has its own webpage here. It is trivial to publish a wind speed or direction number from California, or the height of water somewhere on Earth, or the number of visitors to your restaurant, or any instrumented process whatsoever.

The target for predictions could alternatively be a function of multiple public and private numbers — anything you care to devise. One could send to the general purpose prediction API the difference between a reported wind speed and a previously made prediction from an in-house data science team, say a team using advanced meteorological models. By publishing model errors the “continuous space lottery” (it can be called many things) can instantly serve as a model review function, but also improve the model itself.

What Needs to Be Done? (Not Much)

These days we are accustomed to hearing endless buzzwords, and we receive a torrent of advice about the alleged best practices for deploying predictive analytics in the field — usually a very expensive project with many stages and people. There is data gathering, cleaning, offline model estimation, detection of model drift, sundry side-pipelines for anomaly detection, and so on.

Don’t bother.

The mechanics of the open-source microprediction information market make a mockery of this cost model sooner or later. “California” — to refer to interested parties in both private and public roles — doesn’t need to do any of that. Estimating the chance that wind speed will exceed 30 miles per hour at a named measurement station requires only the following steps.

  1. Start publishing relevant quantities (wind speeds)

Why it works, sooner or later

That’s right, there is only one step, and it is potentially accomplished in minutes. If there is a Step 2, it might constitute publicizing the existence of the live streams, and encouraging civic-minded statisticians and machine learning enthusiasts to enter the fray and run prediction algorithms.
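Here is a minimal sketch of what that one step might look like. The sensor reading and the publishing call are passed in as plain functions, because both are stand-ins: `fake_sensor`, `fake_publish`, and the stream name are hypothetical. In practice the publish function would wrap whichever prediction API client you use.

```python
import time


def publish_once(read_value, publish, stream_name):
    """Read one measurement and send it to the named stream."""
    value = float(read_value())
    publish(stream_name, value)
    return value


def publish_loop(read_value, publish, stream_name,
                 interval_seconds=900, max_iterations=None, sleep=time.sleep):
    """Publish at a regular interval; runs forever if max_iterations is None."""
    count = 0
    while max_iterations is None or count < max_iterations:
        publish_once(read_value, publish, stream_name)
        count += 1
        # Skip the final sleep once the requested number of values is sent.
        if max_iterations is None or count < max_iterations:
            sleep(interval_seconds)
    return count


# Stand-ins so the sketch runs without a sensor or a network connection.
sent = []


def fake_sensor():
    return 27.5  # hypothetical wind speed in mph


def fake_publish(name, value):
    sent.append((name, value))


num_published = publish_loop(
    fake_sensor, fake_publish, "hypothetical_diablo_wind_speed.json",
    interval_seconds=0, max_iterations=3,
)
print(sent)
```

Swapping the two stand-ins for a real instrument and a real client is the whole job.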

Even if they don’t, there are hundreds of time-series algorithms already watching via API. They will sooner or later discover the new Californian streams (especially if prize money is offered, Step 1.5), and start predicting whatever is sent. Improvement in the quality of distributional prediction of wind speed could be three lines of Python away.

Three lines of Python that make the traditional “data science project” seem odious.

With such a small barrier to entry, there’s little to prevent state-of-the-art prediction algorithms from making their way to the streams with just a little help from data scientists or engineers. In particular, and to make a more technical comment, any one of a hundred algorithms currently benchmarked offline and assigned Elo ratings can be inserted into a live algorithm merely by subclassing StreamSkater, as explained here, then changing one line of code, and running it.

You can visualize all these algorithms as little pellet guns firing probability atoms at the distribution of a future event. If the “Q” measure represents the current market distribution and some very clever person knows the truth “P”, then it can be shown that when they fire “P” at the distribution they will be rewarded over time by an amount that is proportional to the Kullback-Leibler distance between “P” and “Q”.

Points gained are, on average, proportional to KL-distance between true (P) and market (Q) probability if one is somehow able to identify P and then submit it to the live market on a regular basis.
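The flavor of that result can be seen in the classical proportional-betting analogy, sketched below on a discretized outcome. The assumption is that a bet on bin i pays out at the market-implied odds 1/Q_i. This is a textbook stand-in and not the actual reward calculation used by the live market. Under it, the expected log growth from betting the true P equals the Kullback-Leibler divergence between P and Q, and betting Q itself earns nothing on average. The probabilities are invented.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for discrete distributions on the same bins."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)


def expected_log_growth(true_p, bet_fractions, market_q):
    """Expected log wealth growth when bin i pays 1 / market_q[i] per unit staked."""
    growth = 0.0
    for p_i, b_i, q_i in zip(true_p, bet_fractions, market_q):
        if p_i > 0:
            # Wealth multiplier if bin i occurs is (fraction staked) * (odds).
            growth += p_i * math.log(b_i / q_i)
    return growth


# Hypothetical true distribution P and market distribution Q over four bins.
p_true = [0.1, 0.2, 0.4, 0.3]
q_market = [0.25, 0.25, 0.25, 0.25]

growth_betting_truth = expected_log_growth(p_true, p_true, q_market)
growth_betting_market = expected_log_growth(p_true, q_market, q_market)
print(f"KL(P || Q):            {kl_divergence(p_true, q_market):.4f}")
print(f"growth when betting P: {growth_betting_truth:.4f}")
print(f"growth when betting Q: {growth_betting_market:.4f}")
```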

Naturally, that isn’t the only way to win. Algorithms can also try to fill in the missing mass, if they perceive some, as they can in any market. They supply more Monte Carlo samples where P > Q, thus helping the latter converge to the former.
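A toy version of "filling in the missing mass" might look like the following. It assumes the outcome has been split into bins, that the algorithm holds its own belief P per bin, and that it may submit a fixed number of samples. All of these are simplifications, and the numbers are invented. A real entrant would also need to account for how rewards are actually computed.

```python
def missing_mass_allocation(p, q, num_samples):
    """Allocate samples across bins in proportion to max(p - q, 0)."""
    gaps = [max(p_i - q_i, 0.0) for p_i, q_i in zip(p, q)]
    total = sum(gaps)
    if total == 0:
        return [0] * len(gaps)  # no bin looks underweighted, so submit nothing
    raw = [num_samples * g / total for g in gaps]
    counts = [int(r) for r in raw]
    # Give leftover samples to the bins with the largest fractional remainders.
    leftover = num_samples - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical belief P and market Q over four wind-speed bins.
p_belief = [0.1, 0.2, 0.4, 0.3]
q_market = [0.25, 0.25, 0.25, 0.25]

counts = missing_mass_allocation(p_belief, q_market, num_samples=100)
print(counts)
```

Here the samples land only in the two bins the market appears to underweight, nudging Q toward P.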

Could Utility Prediction Ignite a Prediction Utility?

In my book Microprediction: Building An Open AI Network, I define a microprediction oracle as follows:

I also make the argument that APIs such as this one might meet this definition within a cost factor of two. That (somewhat loose) logic aside, the case that California could benefit from this particular oracle can also rest on the vast literature related to markets. The oracle is a very lightweight market mechanism.

The more interesting question is whether “deep competition” can emerge, defined as multiple layers of competitive prediction initiated when one competing algorithm uses the oracle for a sub-task. Furthermore, it is likely that in the future this need not be the only way to achieve a fanout of tasks akin to the supply chain for a complex consumer good. Nothing prevents others from creating their own competitive prediction oracles.

Something as important as California wind speed prediction might be exactly what is needed to spur this development — and with it the benefits of specialization, exogenous data search, and recurrent use. To see the end state of this more advanced possibility, you need to imagine a network of “micromanagers” of models that combine the outputs of their peers in serving an upstream master and compensate them accordingly. They will have plenty of economical statistics, and statistical economics, to think about.

“Deep” real-time statistical contests and some theoretical issues they throw up

My book considers the cost of “AI” through this type of lens. The application to “public safety probabilities” in California is very clean. But more private prediction inside secretive firms is also possible thanks to advances in things like secure multiparty computation. Eventually, it is prophesied by yours truly, real-time operational challenges such as those faced by California will be addressed in a truly collective way. They will plug into a “prediction web”.

However, for this to happen, organizations and businesses need to see through the veneer of “AI” — probably the least useful phrase invented in recent history — and recognize that most things going by this description are either “bundled microprediction” or a poor use of data-hungry methods.

By unbundling rapidly repeated quantitative tasks from other business logic we can drive cost toward zero, due to the benefits of sharing in a real-time feature space used by all, and due to the negligible cost of algorithms reusing themselves.

In summary, the challenges faced by Californian utilities might be the perfect catalyst for exactly the type of prediction utility the entire economy needs.