Self-healing weather prediction

Published in Cloudrun, Sep 17, 2019 (4 min read)

Cloudrun serves a diverse set of customers, from academic researchers and meteorology consultants to energy forecasters and competitive sailors. While some users are technologically savvy, they’re typically not expert modelers and don’t know all the necessary ingredients of a successful weather simulation. As the core product of Cloudrun is custom and on-demand numerical weather prediction, it’s essential that it just works, no matter the configuration the user chooses.

This is much harder than it seems. Models are complex: the most widely used weather model (WRF) is about a million lines of code and can be set up in millions of possible configurations. A major challenge in operational weather prediction is that models occasionally become numerically unstable. This happens for a number of reasons, like bad data in initial or boundary conditions, strongly forced flow over steep topography, or the model being configured with too long a time step. If you’re wondering what a time step is, think of a weather model as a clock that ticks in discrete increments: the simulation is computed one step at a time, marching forward in time. The longer the time step, the fewer steps need to be computed and the sooner the forecast finishes.
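To make the time step trade-off concrete, here is a minimal Python sketch of the arithmetic. The function names and the example numbers (a 24-hour forecast, 6-second and 3-second steps) are illustrative assumptions, not Cloudrun settings. The second function computes the Courant number, a standard rule-of-thumb quantity in numerical modeling that the article itself does not mention: explicit schemes generally need it to stay below a limit of order one, which is why a long time step on a fine grid invites trouble.

```python
import math


def steps_needed(forecast_hours: float, dt_seconds: float) -> int:
    """Number of time steps required to cover the whole forecast."""
    return math.ceil(forecast_hours * 3600 / dt_seconds)


def courant_number(speed_ms: float, dt_seconds: float, dx_meters: float) -> float:
    """Fraction of a grid cell that the flow crosses in one time step."""
    return speed_ms * dt_seconds / dx_meters


# Halving the time step doubles the number of steps to compute.
print(steps_needed(24, 6.0))
print(steps_needed(24, 3.0))

# Hypothetical case: 50 m/s flow on a 1-km grid with a 6-second step.
print(courant_number(50.0, 6.0, 1000.0))
```

A shorter step is safer but proportionally more expensive, so there is a real incentive to run with the longest step the model can tolerate.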

Numerical instabilities cause the model to crash, leaving the forecast incomplete or making it fail altogether. Resolving such issues requires an expert weather modeler and can easily take hours of debugging. Here’s an example of how a numerical instability manifests itself in the model fields, in the final step before the model blows up:

Numerical instability in a weather model configured at a 1-km resolution, occurring over the southwest slopes of Mt. Fuji in Japan. Topography is shown in contours and vertical velocity in color.

In this example, a weather model configured at a 1-km resolution blows up shortly after initialization, in an area over the southwest slopes of Mount Fuji in Japan. The instability shows up as extreme values (> 50 m/s) of vertical velocity in the lowest layer of the model. Not only is this unrealistic for a 1-km resolution model, it also indicates that the forecast is in trouble: a numerical instability like this grows explosively, with no chance of going back. What caused it here? The custom forecast region was configured with steep terrain (Mt. Fuji) near the model boundary. As a result, the winds from the global model, which are imposed at the boundary, were dynamically imbalanced in the presence of a steep and tall mountain that isn’t resolved in global models.
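One simple way to flag a symptom like this is a sanity check on the vertical velocity field. The sketch below is only an illustration: the article does not describe how Cloudrun actually detects instabilities, the function name is hypothetical, and the 50 m/s limit is borrowed from the example above rather than from any model setting.

```python
import math

W_LIMIT_MS = 50.0  # assumed limit, taken from the "> 50 m/s" example


def looks_unstable(w_lowest_layer, limit=W_LIMIT_MS):
    """Return True if any vertical velocity value is non-finite or extreme.

    w_lowest_layer is a 2-D grid (list of rows) of vertical velocity in m/s.
    """
    for row in w_lowest_layer:
        for w in row:
            if not math.isfinite(w) or abs(w) > limit:
                return True
    return False


# Hypothetical 2x2 patches of the lowest model layer.
calm = [[0.1, -2.0], [3.5, 0.0]]
blowing_up = [[0.1, -2.0], [62.0, 0.0]]
print(looks_unstable(calm), looks_unstable(blowing_up))
```

A check like this only catches the instability once it is already visible in the fields, which is why acting before the blow-up matters.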

In research meteorology, such issues are indicative of improper model configuration and are handled manually and ad hoc. In an operational setting, numerical instabilities can be mitigated by configuring the model over a fixed region and carefully tuning model parameters to ensure that the model remains stable. For Cloudrun, this is not an option: our system must ensure that your model is properly configured and runs all the way through without hiccups, regardless of the model region or resolution. While Cloudrun automatically resolves many potential issues beforehand, we also needed to prevent runaway numerical instabilities at run time, before they occur.

Region boundaries that cross steep mountains can easily make a weather model blow up unless the model is carefully configured.

To address this problem, we implemented a system to intelligently and automatically reduce the model time step, just enough so that any transient instability becomes merely a speed bump, rather than a road block. As soon as the model is “in the clear”, it will recover its time step so it can complete the forecast as quickly as possible.
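The idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Cloudrun's implementation: the shrink and grow factors, the function names, and the stand-in stability test (a window between 100 s and 200 s where only steps of 2 s or shorter are stable) are all hypothetical.

```python
def next_time_step(dt, stable, dt_max, dt_min, shrink=0.5, grow=1.1):
    """Shrink the step after a failed attempt, otherwise grow it back."""
    if not stable:
        return max(dt * shrink, dt_min)
    return min(dt * grow, dt_max)


def run_forecast(total_seconds, dt_max, is_stable_at, dt_min=0.1):
    """Advance a toy model clock; return a list of (start_time, dt) per step."""
    t, dt = 0.0, dt_max
    history = []
    while t < total_seconds:
        if is_stable_at(t, dt):
            history.append((t, dt))  # accept the step
            t += dt
            dt = next_time_step(dt, True, dt_max, dt_min)
        elif dt <= dt_min:
            raise RuntimeError("unstable even at the minimum time step")
        else:
            dt = next_time_step(dt, False, dt_max, dt_min)  # retry, smaller
    return history


def toy_stability(t, dt):
    """Hypothetical rough patch: between 100 s and 200 s, dt must be <= 2 s."""
    return not (100.0 <= t < 200.0) or dt <= 2.0


history = run_forecast(600.0, dt_max=6.0, is_stable_at=toy_stability)
print(min(dt for _, dt in history), history[-1][1])
```

In this toy run the step drops inside the rough patch and climbs back to its maximum afterwards, the same "speed bump" behavior described above. A real model would replace `toy_stability` with diagnostics computed from the model state.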

Here’s what it looks like in action:

Example of Cloudrun’s time step size over the course of one forecast day, in a model that would otherwise crash due to numerical instability.

This is what we call self-healing weather prediction: in scenarios where a regular model would blow up due to numerical instability, Cloudrun catches this early and heals itself! We have automated away what used to be a dreaded nuisance for graduate students (myself included) and operational meteorologists, making custom forecasting even more accessible to non-experts. We believe that this and Cloudrun’s other automated features are game changers for businesses that need operational weather prediction that is reliable, timely, and that they own.

Are you curious to learn more about how weather models work inside and out? Write to us at hello@cloudrun.co! We respond to all emails and would love to hear from you.

Written by Milan Curcic