Consider the Source: Climate Metrics and Their Underlying Models

Projecting future risk—especially for extreme events—demands careful application of the right global climate models

Meghan Purdy
Jupiter Intelligence
Oct 27, 2021


As a product manager, I try to stay abreast of the metrics and techniques used by my fellow public and private “climate translators”: credible experts who interpret climate projections; understand their assumptions, strengths, and weaknesses; and may derive metrics from them. Sometimes I’m impressed with a new way to think about, say, wildfire prevalence…and sometimes I’m worried that what’s being put out there hasn’t been generated with sufficient scientific rigor.

In previous posts I’ve discussed a few places where climate analysis can go astray, such as a reliance on old models, inappropriate metrics, or vague scores. Today, I’m diving into the very first step: how should users evaluate a climate translator’s selection of the source climate models that underlie their metrics? What are the pitfalls that could harm metric quality? Unfortunately, quality varies widely, but, assuming a data provider is open about their methods — and they should be — we can learn to identify both red flags and the signs of sound analysis.

General Principles for Model Selection

No matter the metric, climate data users should look for metrics that were generated according to a few key principles.

  • Using models from several modeling groups increases the likelihood that the results represent the broader scientific consensus, as opposed to relying on the view of a single group. And including data derived from independent sources helps maximize the information content that feeds into any quantitative analysis.
  • Prior to use, each individual model should be assessed for its skill in simulating the peril of interest in the region of interest. All models have strengths and weaknesses; these can be sussed out by surveying academic papers on the topic and by comparing a model’s simulations of historical climate to the actual prior conditions to identify bias.
  • Uncertainty should be characterized where possible. This may be a qualitative description of how different climate projections are produced, or a quantitative analysis based on the underlying data samples.

Following these principles is a good step towards producing a quality metric about the future, particularly if that metric describes a chronic peril. Chronic perils are those that occur frequently, such as drought, thunderstorms, and average temperatures. Because they are common, they are well represented in a future projection. It’s still necessary to examine multiple years of data, such as the targeted projection year ±5 years — this helps tease out the underlying climate change signal from the effects of shorter-term climate oscillations, such as the El Niño–Southern Oscillation (ENSO), that also affect climate projections — but the statistics are relatively manageable.
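To make the windowing idea concrete, here is a minimal Python sketch that pools a hypothetical annual-mean temperature series over the target year ±5 years. The numbers are synthetic stand-ins; a real workflow would read actual GCM output rather than generate values.

```python
import numpy as np

# Hypothetical annual-mean temperatures (deg C) from one model projection:
# a gentle warming trend plus year-to-year noise standing in for ENSO-like
# variability. In practice these values would come from GCM output files.
rng = np.random.default_rng(0)
years = np.arange(2030, 2051)
annual_mean_temp = 18.0 + 0.03 * (years - 2030) + rng.normal(0.0, 0.4, years.size)

target_year = 2040
window = 5  # pool the target year +/- 5 years

mask = np.abs(years - target_year) <= window
chronic_estimate = annual_mean_temp[mask].mean()
print(f"Estimated mean temperature around {target_year}: {chronic_estimate:.2f} deg C")
```

Averaging over the window smooths out the shorter-term variability while staying close enough to the target year that the long-term trend isn’t washed out.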

Chronic perils do have their pitfalls, but, as we’ll see, they present a more straightforward challenge to climate translators than acute perils.

Model Selection for Acute Perils

Acute perils occur at the “tails” of a probability distribution: the extreme winds, floods, and rain events that can cause severe damage. By definition, acute events are rarely seen in climate models. For example, a 100-year flood only has a 1-in-100 (1%) chance of appearing in a climate projection of a particular year at a particular place. The World Meteorological Organization (WMO, 1989, 2007) suggests that at least 30 years of data be used to fit a distribution that can characterize acute metrics; for climate data, that means 30 simulated years.
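A quick back-of-the-envelope calculation shows why a single projected year isn’t enough. Assuming independent years, the chance of sampling at least one 100-year event in n simulated years is 1 - (1 - 1/100)^n; the short sketch below tabulates a few values.

```python
# Chance of sampling at least one 100-year event in n independent simulated years.
for n in (1, 10, 30, 100):
    p = 1 - (1 - 1 / 100) ** n
    print(f"{n:>3} simulated years: {p:.1%} chance of seeing a 100-year event")
```

Even 30 simulated years gives only about a one-in-four chance of containing a single 100-year event, which is why a fitted distribution, rather than direct counting, is needed to characterize the tail.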

So: how can climate translators find 30+ data samples from simulations that are relevant to, say, Miami’s wind speeds in 2040? I’ve seen a few techniques:

  • (Valid) Use data from a range of years surrounding the target year. As mentioned earlier, this helps isolate the climate signal from sources of interannual variability like ENSO. However, getting to 30 samples by using the central year ±15 years is too wide a range: because the climate is changing, metrics for the target year can be skewed by results from distant (warmer) future years, particularly when the focus is on those extreme events. This technique can only get you so far.
  • (Valid) Use models with an ensemble of projections. Ensembles are generated by running the same climate model with slight differences in its initial conditions, such as local temperature and pressure. Each ensemble member follows the same pattern for carbon dioxide concentration and radiative forcing — this is determined by the climate scenario — but the initial perturbations allow for different weather to evolve. Ensembles were rarely found in the CMIP5 generation of models, but they are more prevalent in CMIP6 and in standalone experiments such as CESM-LENS (Community Earth System Model Large Ensemble). It’s statistically sound to draw samples from a climate model’s ensemble members to fit a distribution and thus “fill in” the tail of the curve (see the sketch after this list).
  • (Invalid) Pull data from multiple climate models and treat them as an ensemble of a single system. Doing so implies that each sample is drawn from the same distribution, but that is not a statistically sound assumption. Climate models can be vastly different: they can have different spatial resolutions, predict or ignore different aspects of the earth system, have different model physics, and differ in other ways. In short, each model has its own climate (and no model’s climate is the same as the real climate). Statistical methods that assume identically distributed data — an assumption underlying much of statistical theory — are invalidated, and we cannot expect trustworthy results from this technique.
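As an illustration of the ensemble technique, here is a minimal sketch that pools synthetic annual-maximum wind speeds from three ensemble members over eleven years and fits a Generalized Extreme Value (GEV) distribution with scipy. The GEV block-maxima fit is one common choice for this kind of analysis, not necessarily the exact method any particular provider uses, and the data here are invented.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(42)

# Hypothetical annual-maximum wind speeds (m/s) for one GCM: 3 ensemble members
# x 11 simulated years around the target year = 33 samples. In practice these
# would be extracted from the model's ensemble output, not simulated here.
n_members, n_years = 3, 11
annual_max_wind = rng.gumbel(loc=30.0, scale=4.0, size=(n_members, n_years))

# Because every member shares the same model physics and scenario, the samples
# can reasonably be pooled and treated as draws from one distribution.
pooled = annual_max_wind.ravel()

# Fit a GEV distribution to the pooled block maxima.
shape, loc, scale = genextreme.fit(pooled)

# 100-year return level: the wind speed exceeded with 1% probability in a year.
return_level_100yr = genextreme.isf(0.01, shape, loc=loc, scale=scale)
print(f"This GCM's 100-year wind speed: {return_level_100yr:.1f} m/s")
```

The same fitted distribution can be inverted at other return periods (50-year, 500-year) to characterize more of the tail for that model.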

At Jupiter, we use a combination of the first two techniques: 11 simulated years, and only those global climate models (GCMs) with at least three ensemble members. For each GCM, the 33+ years of simulated data that this process generates are enough to define a distribution and calculate that GCM’s view of an extreme peril, which can then be combined with the perspectives of other GCMs. We believe this strikes the right balance: require too many ensemble members and the pool of acceptable GCMs becomes too small; require too few and you have to sample from a larger set of years, potentially skewing your results. While these requirements limit the GCMs we can include for our extreme perils, they still leave us with plenty of models to work with and result in a more responsible understanding of tail risk.
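A sketch of how that kind of screening rule might look in code, with invented model names and ensemble counts; the final combination step here is a plain equal-weight average, which is a simplification for illustration rather than Jupiter’s actual methodology.

```python
# Hypothetical catalog of candidate GCMs and their ensemble sizes (illustrative only).
candidate_gcms = {"GCM-A": 5, "GCM-B": 3, "GCM-C": 1, "GCM-D": 10}

N_YEARS = 11      # simulated years drawn around the target year
MIN_MEMBERS = 3   # minimum ensemble members required per GCM

selected = {name: m for name, m in candidate_gcms.items() if m >= MIN_MEMBERS}
# Each selected GCM contributes at least 3 x 11 = 33 samples to its own fit;
# GCM-C is excluded because 1 member x 11 years falls short of ~30 samples.

# Hypothetical per-GCM 100-year return levels (m/s), each estimated as in the
# previous sketch, combined here with a simple equal-weight average.
per_gcm_return_levels = {"GCM-A": 42.1, "GCM-B": 45.7, "GCM-D": 39.8}
multi_model_view = sum(per_gcm_return_levels.values()) / len(per_gcm_return_levels)

print(f"Selected GCMs: {sorted(selected)}")
print(f"Multi-model 100-year wind speed: {multi_model_view:.1f} m/s")
```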

Implications for Climate Data Users

Several months back, I wrote about the importance of commencing any physical climate risk analysis with the “right” data, stressing the value of data appropriateness: selecting the right metric for the job. For example, if you want to understand the impact of future hail risk to your solar farm, you should find a way to measure future hail risk directly, instead of using a proxy like atmospheric instability.

Occasionally a climate translator will sacrifice metric appropriateness because their methods don’t allow them to produce high-quality metrics describing the tail of the curve. For example, they’ll model “extreme wind events” by calculating the maximum wind gust at the 1-year return period: the hardest the wind will blow during a year. While such a metric may not require a complex model ensemble approach, it isn’t appropriate for understanding perils like tropical cyclones or European winter storms, which don’t strike the same location every year. Instead, it is critical to look further out on the tail — to the 50-year, 100-year, and rarer return periods. And that can only be done responsibly with techniques like those described here.

Worse still, an unqualified climate translator won’t employ these techniques, but they will produce an extreme tail metric anyway. And they might get away with it — for now. These are tricky concepts to grasp, and it’s easier for users to assume that the data is credible than to dig in and possibly discover it to be highly suspect. But as climate data moves into the mainstream and users become more comfortable with these concepts, they will be demanding more from their data providers. And we should be ready to step up.

Meghan Purdy is a Senior Product Manager at Jupiter Intelligence. Learn more about Jupiter at jupiterintel.com.
