It shouldn’t take hundreds of years to estimate climate sensitivity (according to the models)

Summary:

1) Climate models with different sensitivities reproduce the historical temperature record just as well (or badly)

2) One interpretation could be that the historical temperature record cannot be used to estimate or constrain climate sensitivity. This would imply that either the historical record is too short, or the forcing it involves is too small.

3) This interpretation would be wrong: the time period and forcing involved in the historical record are long/big enough for the climate models’ sensitivity to emerge clearly. In other words, climate models should not reproduce the same temperature changes across the historical record; high-sensitivity models should show more warming than low-sensitivity ones.

4) What climate models actually suggest is that true long-term climate sensitivity will be very similar to the sensitivity one can infer from the historical record.

5) Okay, so then why don’t climate models actually diverge when reproducing the historical record? Because the high-sensitivity models have less forcing and / or more heat going into the ocean. If the same forcing and ocean heat uptake levels were applied to all climate models then the divergence between models of different sensitivities would be obvious.

Can climate models be used to estimate real-world climate sensitivity?

When attempting to estimate how sensitive the Earth’s climate is to an increase in greenhouse gas concentrations, researchers often turn to climate models. These offer a wide range of estimates on how much the Earth might warm as a result of a doubling of CO2 concentrations — roughly from 2ºC to 5ºC. So, in order to narrow (i.e. constrain) these numbers, many researchers try to focus on some aspect of the world’s climate and rank models by how well they emulate it. The reasoning is that, if for instance high-sensitivity models do a better job of mimicking real-world cloud behaviour, then these same high-sensitivity models may be better at representing other aspects of the climate system — including climate sensitivity.

This method is technically known as the emergent constraints approach. While there is in principle nothing wrong with trying it, a person who is new to the topic may be wondering why researchers don’t simply look at the most obvious constraint: temperature itself. The point of estimating climate sensitivity is to know how much the atmosphere will warm for a given increase in greenhouse gas concentrations (or radiative forcing to be more precise). In fewer words, we want to know future warming; doesn’t it make sense to look at how well the models have performed representing past warming?

The answer is that it makes sense but is not feasible, because models of widely varying sensitivities tend to reproduce the same temperature increases since the start of the observational record (∼1850). See this open-access paper and go to figure 3.

Models grouped into high- and low-sensitivity categories produce very similar amounts of warming until the present. Furthermore, the CMIP5 modeling groups knew the exam’s questions beforehand, so to speak, up to 2005; temperatures modelled before that year are not a forecast but a hindcast. And it’s around 2005 that a divergence starts to appear between high- and low-sensitivity models.

In short, you cannot use the historical temperature record to know which model is more accurate. Does that mean we don’t have enough data yet, or does it mean something else?

Man-made radiative forcing is already big enough for models to diverge according to their sensitivity

Look at the previous figure. The left panel shows temperature changes by the year 2100 under RCP 2.6, a scenario in which radiative forcing by the end of the century is 2.6 w/m2 above the baseline… and though I cannot find the exact definition of this baseline anywhere, I know it’s the figure around 1750 or 1850. As I’ll explain below, the exact baseline doesn’t matter much.

Clearly, 2.6w/m2 is enough for models to diverge; the temperature projections of low- and high-sensitivity models are separated by more than 0.5ºC. So how long will it take us to get to the RCP2.6 scenario? It turns out we’re already there. From the recent Lewis & Curry paper (hereinafter LC18), figure 2:

(The actual numbers can be found in the above link. Download the zip and open the AR5_Forc.new.csv file)

Total anthropogenic forcing as of 2016 was 2.82w/m2. LC18 use 1750 as a baseline, but using 1850 (if that’s the baseline the RCP scenarios use) would only reduce this figure by 0.1w/m2. In any case, current man-made radiative forcing is about as high as by the end of the RCP 2.6 scenario, or even higher.

Now, to be fair there is some divergence in current (2018) modelled temperatures under the RCP2.6 scenario, but there was virtually none by the end of the hindcast period, 2005. And by then, forcing was already 2.2w/m2 (again from an 1850 baseline). The point is, in simulations in which a forcing of 2 or 2.5 w/m2 is applied by the end of the 21st century, the models diverge; in reproducing historical temperature changes with similar forcing levels, the models don’t diverge. Could that be because the historical record, while having a big enough forcing, is too brief for the models’ sensitivities to reveal themselves?

(Some readers will also be wondering: maybe LC18’s estimate of real-world forcing is higher than the forcing applied in models’ historical simulations? There is some evidence that indeed that’s the case. But if models have a smaller forcing than LC18, they should show a divergence in the historical simulations).

The historical record is more than long enough for the transient sensitivity of climate models to be estimated

First, some definitions. Transient sensitivity is technically called transient climate response (TCR). The ‘colloquial’ description of TCR is the amount of warming that has happened by the time CO2 concentrations have doubled. Because CO2 concentrations in the real world haven’t yet doubled, and we haven’t reached the equivalent forcing level even when including other greenhouse gases, observational studies have to use some approximation or extrapolation. For example, imagine that between 1950 and 2010 temperatures increased by 1ºC. Imagine, for the sake of illustration, that forcing increased between these two years by 2w/m2. That would mean warming so far is 0.5ºC/w/m2. Since the forcing associated with a doubling of CO2 concentrations is about 3.7w/m2, extrapolating you’d get that the TCR is 3.7w/m2 * 0.5ºC/w/m2 = 1.85ºC.
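The extrapolation in that example can be written as a tiny function. This is only a sketch of the arithmetic above; the function and variable names are mine, and the inputs are the illustrative numbers from the paragraph, not real observations.

```python
F_2XCO2 = 3.7  # forcing from a doubling of CO2, in w/m2 (value quoted in the text)


def tcr_from_observations(delta_t, delta_f, f_2x=F_2XCO2):
    """Scale the observed warming per unit of forcing up to a CO2 doubling."""
    warming_per_forcing = delta_t / delta_f  # ºC per w/m2
    return warming_per_forcing * f_2x


# Illustrative inputs from the text: 1ºC of warming for 2 w/m2 of forcing.
tcr = tcr_from_observations(delta_t=1.0, delta_f=2.0)
print(f"TCR = {tcr:.2f} ºC")  # TCR = 1.85 ºC
```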

(Actually, observational studies don’t pick a single year, because yearly temperature and forcing can vary drastically due to El Niño, volcanoes, etc. Instead they look at the difference between two period averages, say 1950–60 and 2000–2010. Sometimes they use regression over the whole of the time period covered).

The definition of TCR in the context of climate models is a bit more formal. First, for matters of consistency it’s usually estimated by increasing CO2 concentrations 1% each year, thus doubling concentrations by year 70. Models also have ‘internal variability’, so to get a better idea of the warming caused by this doubling of CO2, what scientists actually calculate is the average temperature over years 60–80 of the simulation.
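To see why year 70 is the doubling point, and what the averaging window does, here is a minimal sketch. The helper name and the synthetic warming ramp are hypothetical, and I am assuming the averaging window is the 20 years from year 61 to year 80; a modelling group may define the window slightly differently.

```python
import math

GROWTH = 1.01  # CO2 concentration grows 1% per year, compounding

# Exact doubling time under compound growth, and the ratio reached at year 70.
doubling_years = math.log(2) / math.log(GROWTH)
ratio_at_70 = GROWTH ** 70
print(f"doubling after {doubling_years:.1f} years; ratio at year 70 = {ratio_at_70:.3f}")


def tcr_from_run(annual_anomalies):
    """Mean temperature anomaly over years 61-80 of a 1%-a-year run.

    annual_anomalies[0] is year 1, so years 61-80 are indices 60 to 79.
    The exact window is an assumption made for this sketch.
    """
    window = annual_anomalies[60:80]
    return sum(window) / len(window)


# Synthetic example only: a made-up linear ramp of 0.02ºC per year.
synthetic_run = [0.02 * year for year in range(1, 141)]
print(f"TCR of the synthetic run = {tcr_from_run(synthetic_run):.2f} ºC")
```

Averaging over the window rather than reading off year 70 alone is what damps the model's internal variability.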

While this process may sound very artificial, there is evidence that climate models have virtually the same TCR whether driven by the 1%-a-year simulation or by the forcings of a historical simulation (see LC18, supplementary information, section S3). In any case, the main point is that estimating TCR in climate models takes about 70 years. Not hundreds.

(Wait. Did you assume the forcings applied in historical simulations by climate models are known? Actually, no: for a lot of models they are not known, which is why in order to estimate their sensitivity we have to resort to other kinds of simulations).

More definitions of climate sensitivity

The acronym ECS has been used for different things over time so let’s back up a bit. First, the basic definition of equilibrium climate sensitivity is the eventual warming caused by a doubling of CO2.

Let’s say you double CO2 concentrations over 70 years. By the time concentrations have doubled, you measure temperature and thus calculate TCR. Supposing temperatures have increased 1.5ºC, then that’s the TCR.

But the planet will keep warming even if CO2 concentrations remain constant from that point on, because the climate system will be out of balance (i.e. the Earth will be taking in more energy than it’s releasing; that’s why the ocean is gaining heat). A body regains energy equilibrium by increasing its release of heat, which happens when it gets hotter. Roughly speaking, a rise in ocean temperatures does not increase the energy release of the Earth because the ocean does not radiate to space; an increase in energy release requires a rise in air temperatures.

So if the Earth is in an energy imbalance, air temperature will keep rising until said imbalance reaches approximately zero, i.e. until the climate is in equilibrium: neither gaining nor losing heat, on net. Let me emphasize: approximately zero. Everybody knows that the climate is never in complete equilibrium, and we couldn’t measure that even if it were, so it’s an irrelevant point. What matters is, how long does it take for the energy imbalance to get down to, let’s say, 0.1w/m2?

If climate model simulations are right, thousands of years! That’s a long time to wait — even in computer simulations. As a result, climate models are almost never run to equilibrium — it takes up too much computer power and time. Instead, the models’ equilibrium sensitivity has to be extrapolated from shorter simulations.

Here comes a complication. Continuing the previous example, let’s suppose upon doubling CO2 the planet has warmed by 1.5ºC, but there’s still an imbalance of 0.74w/m2. You can conceptualize it like this: out of the 3.7w/m2 of forcing that a doubling of CO2 involves, 2.96w/m2 have warmed the atmosphere while 0.74w/m2 has warmed the ocean. Thus, there is a ‘remaining’ forcing of 0.74w/m2 that has not yet acted to increase air temperature. Yes, physically that’s butchering the details, but I just want to get the concept across. What does the ‘remaining’ 0.74w/m2 mean, in terms of future atmospheric temperatures?

The standard formula used in many observational studies assumes that this remaining 0.74w/m2 will raise temperatures with the same efficacy as the previous 2.96w/m2. Following simple extrapolation, that means that if TCR = 1.5ºC, then ECS = 1.5 * (3.7 / 2.96) = 1.875ºC. Put another way, the ECS-to-TCR ratio would be 3.7 / 2.96 = 1.25.
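As a sketch, the constant-efficacy extrapolation looks like this in code. The function name is mine and the inputs are the example numbers from the text; the result is the extrapolated sensitivity, which only equals the true long-term value if the constant-efficacy assumption holds.

```python
F_2XCO2 = 3.7  # forcing from a doubling of CO2, in w/m2 (value quoted in the text)


def extrapolated_ecs(tcr, imbalance, f_2x=F_2XCO2):
    """Extrapolate equilibrium warming, assuming constant efficacy.

    The forcing still tied up in the energy imbalance is assumed to warm
    the air exactly as effectively as the forcing already realised.
    """
    realised_forcing = f_2x - imbalance  # 3.7 - 0.74 = 2.96 in the example
    return tcr * f_2x / realised_forcing


ecs = extrapolated_ecs(tcr=1.5, imbalance=0.74)
print(f"extrapolated ECS = {ecs:.3f} ºC")  # extrapolated ECS = 1.875 ºC
print(f"ratio to TCR = {ecs / 1.5:.2f}")   # ratio to TCR = 1.25
```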

Now, in climate models, that’s not exactly right. Usually, forcing applied at a later point in time raises temperatures more than if applied at an earlier point; in other words, their sensitivity increases over time. This is NOT to say simply that future temperatures will be greater than past temperatures; that will also happen if sensitivity is constant (or even declining!) over time. Rather, what it means is that future temperatures will be higher than if you simply extrapolated from past temperatures and forcings.

(The inverse of climate sensitivity is the feedback parameter λ. If you read a paper and it mentions ‘declining feedback parameter’ or something similar, what it actually means is increasing sensitivity).

This raises a problem: if papers based on observations about historical temperatures and forcings get an ECS result that depends on an assumption, and climate models’ behaviour doesn’t follow that assumption, then maybe the difference in ECS between observations and models is due to different definitions, not a mistake of the models. This was the point raised by Kyle Armour in this paper (henceforth I’ll call it A17).

A17 came up with a method to calculate the equilibrium climate sensitivity that climate models ‘would’ get if one used the same assumption as papers based on observations. A17 called this measure ECS_infer, referring to sensitivity as ‘inferred’ from the historical record; other studies termed it ICS, referring to sensitivity as calculated over the ‘industrial’ era. Previously, to distinguish true long-term sensitivity (ECS) from the results of observational studies and comparable measures of model sensitivity, sometimes the term ‘effective’ climate sensitivity was used, being equivalent to both ICS and ECS_infer. Finally, LC18 uses ECS_hist to refer to the same concept. I find ECS_hist more intuitive than the other denominations so I’ll stick with it.

LC18 took A17’s measure of ECS_hist, and added two more methods of measuring the same thing. Their three measurements are very similar (correlations among them are between 0.95 and 0.99). The ‘main’ ECS_hist result in LC18 is the average of the three methods.

Going back to the previous example, if the forcing caused by a doubling of CO2 concentrations is 3.7w/m2 and by the time CO2 concentrations have doubled there is still an imbalance of 0.74w/m2, then 1.25 is not the ECS-to-TCR ratio. Rather, 1.25 is the ECS_hist-to-TCR ratio. And the ECS-to-TCR ratio will be unknown, though climate models suggest it is higher.

Now, that’s a lot of mumbo jumbo. Surely at this point you’re wondering where I am going with all this discussion?

If you know a model’s TCR, you mostly know its ECS

In the previous section we saw that climate models have both an ECS (the warming that will take place over thousands of simulation years) and an ECS_hist (an estimate of how much they would warm if their climate sensitivity remained constant over time, as observational studies of climate sensitivity assume). LC18 provide, apart from the three measures of model ECS_hist and their average, one measure of model ECS; this data is available for 31 climate models. The actual numbers are in the ECStoICS.csv file; to make the following two plots I added the TCR values, taken from their table S2.

As you can see, TCR tells you pretty much all you need to calculate a model’s ECS_hist. For brevity I only post the plot showing TCR vs the mean of the ECS_hist estimates, but in all three cases the correlation (r) is above 0.9.

There is the aforementioned caveat, that maybe ECS_hist is not a good measure because it differs from ECS. We can skip the ECS-to-ECS_hist step by directly comparing TCR with ECS. The relationship is not as strong, but even there the correlation is 0.74. Which is to say: more than half of the variance (0.74² = 0.55) in model ECS is explained by their TCR. This is remarkable: remember that ECS is designed to estimate temperature changes over thousands of years, whereas TCR looks at temperature changes over 70 years (and through a different method).

Differences between ECS and ECS_hist will likely be small

So, if the models are right, then true long-term climate sensitivity (ECS) will be higher than what can be estimated from the historical record (ECS_hist). In LC18 the mean difference between both measures is 12% (median 9%). Put another way, the ECS-to-ECS_hist ratio is above 1.

LC18 suggest one reason to be skeptical of the difference between both measures is that models with a high ECS_hist also tend to have a bigger increase when going from ECS_hist to ECS; in other words, if real-world ECS_hist is low, then the real-world difference between ECS_hist and ECS might be small as well. But this association is surprisingly weak: correlation (r) between ECS_hist and the ECS-to-ECS_hist ratio is 0.13, and statistically it’s nowhere near significance (p-value = 0.47).

Now, if ECS is indeed higher than (or simply different from) ECS_hist, why could that be? Going back to the example of a doubling in CO2 concentrations, imagine that, by the time concentrations have doubled, the climate is already almost in equilibrium, with an imbalance of 0.2w/m2. In such a case, TCR will be very similar to ECS_hist… and, since only 0.2w/m2 of forcing can still warm the Earth, there is little possibility for deviations between the purely extrapolated value (ECS_hist) and the actual sensitivity (ECS). It doesn’t matter much if the remaining 0.2w/m2 has a different efficacy than the previous 3.5w/m2, because the effect on long-term temperatures will be tiny anyway.

By contrast, imagine that by the time CO2 concentrations have doubled there is still an imbalance of 3.5w/m2. You could visualize this as: out of 3.7w/m2 of CO2 forcing, only 0.2w/m2 has actually warmed the atmosphere; the other 3.5w/m2 have gone into the ocean and so haven’t affected air temperatures yet. In such a scenario, ECS_hist would indeed be a very poor measure, because there’s so much extrapolation! You’d be using the effects of 0.2w/m2 to predict what would happen with the next 3.5w/m2.

In short: my thesis is that the closer TCR and ECS_hist are, the more reliable ECS_hist will be, in terms of being close to ECS.

In more words: if the ECS_hist-to-TCR ratio is low, then the ECS-to-ECS_hist ratio will also be low. If the energy imbalance is small as a proportion of forcing, then long-term climate sensitivity (ECS) will be very similar to the sensitivity that can be inferred from the historical record (ECS_hist).
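Here is a toy calculation of that thesis, using the two imbalances from the examples above (0.2 and 3.5 w/m2). The efficacy of 1.5 for the remaining forcing is a purely hypothetical number I picked for illustration, and the function name is mine; it is the same ‘butchered’ conceptual picture as before, not a physical model.

```python
F_2XCO2 = 3.7  # forcing from a doubling of CO2, in w/m2 (value quoted in the text)


def ecs_to_ecs_hist_ratio(imbalance, efficacy, f_2x=F_2XCO2):
    """Ratio of 'true' ECS to the extrapolated ECS_hist in a toy picture.

    efficacy is how strongly the forcing still stored as imbalance warms
    the air, relative to the forcing already realised (1.0 = same).
    """
    realised = f_2x - imbalance
    return (realised + efficacy * imbalance) / f_2x


HYPOTHETICAL_EFFICACY = 1.5  # assumption, chosen only for illustration

small = ecs_to_ecs_hist_ratio(imbalance=0.2, efficacy=HYPOTHETICAL_EFFICACY)
large = ecs_to_ecs_hist_ratio(imbalance=3.5, efficacy=HYPOTHETICAL_EFFICACY)
print(f"imbalance 0.2 w/m2: ECS / ECS_hist = {small:.2f}")  # 1.03
print(f"imbalance 3.5 w/m2: ECS / ECS_hist = {large:.2f}")  # 1.47
```

With the same efficacy in both cases, the extrapolation is off by about 3% when the imbalance is small and by almost 50% when it is large.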

Here I have plotted, from LC18, each model’s ECS_hist-to-TCR ratio and its corresponding ECS-to-ECS_hist ratio. The relationship is quite strong (r = 0.42, p-value = 0.019). I calculated three more ECS_hist-to-TCR ratios, one with each of the independent ECS_hist measures, and in all cases the correlation with the ECS-to-ECS_hist ratio is about 0.4.
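If you want to sanity-check p-values like the ones quoted in this section, the standard t-test for a Pearson correlation can be sketched with nothing but the standard library. I am assuming both correlations were computed over all 31 models and that the p-values are two-sided; the function names are mine, and the integral is done numerically with Simpson's rule rather than with a statistics package.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def pearson_p_value(r, n, steps=2000):
    """Two-sided p-value for a Pearson correlation r over n data points."""
    df = n - 2
    t_stat = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the integral of the density from 0 to t_stat
    # (steps must be even).
    h = t_stat / steps
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability of |t| below t_stat
    return 1 - central_mass


N_MODELS = 31  # number of models with data, as stated earlier in the article

p_strong = pearson_p_value(0.42, N_MODELS)
p_weak = pearson_p_value(0.13, N_MODELS)
print(f"r = 0.42: p = {p_strong:.3f}")
print(f"r = 0.13: p = {p_weak:.2f}")
```

The results should land close to the quoted figures; small differences are expected because the r values in the text are rounded to two decimals.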

The question is: where on that plot would the real world be? Of course we cannot know the real world’s ECS-to-ECS_hist ratio, but according to LC18 its ECS_hist-to-TCR ratio is 1.25, below that of any climate model. This suggests that, if indeed the real-world ECS is higher than the ECS_hist, i.e. higher than the result obtained from observations of the historical record, the difference is likely to be minimal.

The reason climate models don’t diverge in the historical record: more sensitive models have less forcing and more heat going into the ocean

This is the part of the article I know the least about, so I’m mostly just going to point you to this paper by Stephen Schwartz and others. They look at 24 climate models and report their climate sensitivity and a combined measure of forcing and ocean heat uptake. Why combined?

Remember that the puzzle described at the beginning of the article was, why do climate models with differing sensitivities reproduce similar temperature changes over time? This could happen for two reasons. One is that high-sensitivity models have less forcing. The other is that high-sensitivity models have more heat going into the ocean, i.e. more forcing which has not yet warmed the atmosphere.

In the chart below, the global energy imbalance is called the ‘heating rate’ (roughly equivalent to ocean heat uptake), and denoted by N. Forcing is F. Thus, the amount of forcing that has affected the atmosphere is F minus N. And as you can see, models with high sensitivity (in the upper part of the chart) also have a smaller level of F minus N (they are on the left side).
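A minimal energy-balance sketch shows how this can work out. The two ‘models’ and all their numbers are hypothetical, invented so that the arithmetic is easy to follow; they are not taken from Schwartz’s paper or from any real model. I also assume a constant feedback parameter λ, i.e. the forcing of a CO2 doubling divided by the sensitivity.

```python
F_2XCO2 = 3.7  # forcing from a doubling of CO2, in w/m2 (value quoted in the text)


def historical_warming(sensitivity, forcing, heating_rate, f_2x=F_2XCO2):
    """Warming implied by a simple energy balance: (F - N) / lambda."""
    feedback = f_2x / sensitivity  # lambda, in w/m2 per ºC
    return (forcing - heating_rate) / feedback


# Hypothetical low-sensitivity model: more forcing, less heat into the ocean.
low = historical_warming(sensitivity=2.0, forcing=2.45, heating_rate=0.6)

# Hypothetical high-sensitivity model: less forcing, more heat into the ocean.
high = historical_warming(sensitivity=4.0, forcing=1.825, heating_rate=0.9)

print(f"low-sensitivity model:  {low:.2f} ºC")   # 1.00 ºC
print(f"high-sensitivity model: {high:.2f} ºC")  # 1.00 ºC
```

One model is twice as sensitive as the other, yet both warm by the same amount, because F minus N is half as large in the more sensitive one.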

Data on climate models’ historical forcing is hard to find, so I wouldn’t say Schwartz’s paper is definitive; I haven’t seen this kind of chart reproduced elsewhere. But so far it seems the best explanation.

PS: throughout the article I discuss how forcing is ‘applied’ to models. This is technically wrong, because radiative forcing is not prescribed; you cannot simply input a different forcing quantity and see how the model reacts. Rather, what is prescribed is the concentration of greenhouse gases and aerosols, and the physics for their interaction with each other, the clouds, sunlight, etc. So forcing levels ‘emerge’ from the models’ physics. That’s more accurate, but hard to use in a normal sentence.