We often want to do climate model analysis with statistics and machine learning, but accessing climate model data can be a barrier. This could be because you don’t have access to a global climate model or the data may be too large for your needs or running a global climate model might take too long/cost too much/require hardware you don’t have. Wouldn’t it be great if there was a basic climate model you could run from your own laptop, with your choice of model set-up, parameters and resolution?
This is exactly what the Climate Modelling and Diagnostics Toolkit (CliMT) is designed to do. Being a Python based climate model, it may be useful to data scientists who want to test out machine learning algorithms. Currently, algorithms can be tested on very simple toy models or on large expensive data from complex models that require expensive hardware. CliMT lies between these extremes and allows users to easily add in components to build on the complexity of the model.
What is CliMT?
CliMT is a Python based library which provides a modular and intuitive approach to writing numerical models of the climate system. CliMT provides state-of-the art components and an easy-to-use interface to allow writing research quality models without the hassle of modifying Fortran code.
Like the Met Office’s Unified Model, the majority of climate models are written in Fortran or C. This is great for performance but it is difficult for researchers to edit and change model configurations. The key benefit of CliMT is that the user can more easily build up complex models, change configurations and swap in various components.
CliMT uses Fortran code within its components, maintaining the same structure and code as traditional climate models, with high performance. However, the user interacts with CliMT components through Python, which is commonly used for climate model data analysis, allowing the modelling and analysis to be integrated. This opens up more doors for analysis methods that combine statistics or machine learning with the physics. An example of this is probabilistic Bayesian inference which combines a known model with observations to infer latent variables, which can then be used in other aspects of the model.
How does it work?
Models can be built up from basic components to create more and more complex models, without the need for the user to rewrite code at each stage. To do this, CliMT uses Sympl, A System for Modelling Planets, which is designed for writing modular Earth system and planetary models, making them easily readable and understandable.
- Initialise components
- Initialise state
- Step forward in time
1D Radiation Model Example
I’ll outline how to build a CliMT model, starting with a simple radiation model of a 1D column of air. You can follow this on a Jupyter notebook here (notebook 03).
1. Initialise components
Components represent the physical processes. Examples that are already built into CliMT include convection, radiation, surface, dynamical core, ice processes. See here for more examples.
Components in CliMT return:
- Tendencies: the local rate of change of a quantity with time at a given point in space.
- Diagnostics: variables not directly tracked by the model itself, that are instead derived from other variables.
For our simple 1D model, we will provide only 1 component: radiation. We can use the built-in radiation scheme
The output of a component are tendency and diagnostic dictionaries. We can check the properties using
diagnostic_properties, which tells us the names, dimensions and units of the returned dictionaries.
2. Initialise state
We need to define the grid the state is built on. We will start with a 1D grid, an atmospheric column of air. We can get a default
grid from CliMT, which gives us a dictionary of dimensions.
Next we create the
state itself using
get_default_state. This creates another dictionary of quantities required for the grid and component set-up (the radiation scheme) that we have specified.
The list of quantities in the state is similar to the grid, but we have additional quantities that are inputs or outputs of our radiation scheme. These are:
surface_temperature. The call to
climt.get_default_state sets up the default initial conditions to be realistic for the system, but we can edit these if required. Below, we plot the initial profile of the air temperature against the vertical coordinate, air pressure.
Now we will take a look at what the radiation component does. The tendencies and diagnostics are calculated by making a call to
radiation with the argument
state. The tendencies are instantaneous, but we can use them to calculate the new model state, after a short period of time. First, we have to define a timestep using Python’s
Now let’s run the radiation scheme, step forward in time by 20 minutes and plot the air temperature profile before and after.
We can see that the radiaton scheme has changed air temperature from its initial state and a temperature gradient has developed — its hotter at the surface and cooler at higher altitudes. Then we could update the current state using the tendencies provided and repeat this in a loop.
We can see the temperature difference between the surface and the top of the atmosphere increases with time. Eventually this will reach an equilibrium state.
3D Global Climate Model Example
In this example, I’ll show how you can build a simple climate model in CliMT with a 3D longitude-latitude-height grid and with more components of the climate system. It involves the same processes as before, with (1) initialising components, (2) initialising the state and (3) stepping forward in time. This time I’ll introduce more options that might be useful.
1. Initialise components
Following a set-up similar to the one described in Monteiro et al, 2018, we will create 3 components:
- longwave radiation
- simple physics component for the boundary layer,
You can put in arguments to these components, but for now let’s keep these at the default values (chosen to be sensible for the Earth’s system).
At this point, we can also use a
Wrapper to modify the behaviour of components, if necessary. These are used to change the inputs or outputs of a component to make it apparently work in a different way. Below, we use the
UpdateFrequencyWrapper on the radiation scheme to provide a longer timestep for this component. We also use the
TimeDifferencingWrapper which provides the output of the components as time tendencies, using a finite diference scheme. This is required on the
SimplePhysics component because we will be using a spectral dynamical core, which doesn’t work well with components that work in grid space.
2. Initialise state
This time we will build a 3D grid. Then we can collect the components into a time stepper, which will automatically calculate the updated state from the tendencies. This is also particularly useful when we have multiple components, as they are all called from one function. Examples for timesteppers include common numerical integrators such as
Here, we will use the
climt.GFSDynamicalCore which integrates the dynamical core from the Global Forecast System (GFS) model built by the National Centers for Environmental Prediction (NCEP). This is written in Fortran but we won’t need to interact with it directly.
Here we can see there are quite a few extra quantities in the state, required by 3 components in our model. The default values for most variables are zero. Instead, we can use pre-defined initial conditions from
climt such as the Dynamical Core Model Intercomparison Project (DCMIP) initial conditions. This should give us a realistic initial state that will speed up the time for the model to converge to the final state.
We set up these initial conditions and plot a zonal mean contour plot, where the variable is averaged in the longitudinal direction and plotted on a latitude vs height axis. I’ve plotted some important meteorological variables: the eastward wind, air temperature and relative vorticity. I’ve also plotted a surface map of the eastward winds and temperature. We can see that the initial conditions already include a warmer temperature near the surface and at the equator and two jets of faster wind speeds in the tropics — this is as expected.
3. Step Forward
Now we will put these into a loop to step forward in time. We can call the timestepper,
dycore on the state for our given timestep. This returns the diagnostics and the new state.
Now we can track how the state evolves with time! See the Jupyter notebook here. On my laptop, it takes around 5 minutes to run around 1 month in model time, which is great for getting quick results! We hope to use this with Bayesian inference in the following section.
Can we do Bayesian Inference on a CliMT Model?
CliMT appears to be a useful tool for data scientists who want to get climate data quickly. I’ve put this to the test by carrying out Bayesian inference using Pyro, in the same procedure used in this blog post. You can follow this in a Jupyter notebook here.
I’ve used a similar model to the 3D one described above with radiation, convection and a boundary layer. In CliMT, you can easily change the parameters of the system to run the model in different configurations. I’ve been editing the rotation rate of the Earth to see the effect on the eastward wind speed down at the surface.
The true rotation rate is once every 24 hours, or 7.29 × 10^-5 s. The Pyro model now follows the same set up as the previous blog posts:
- Define a prior on the rotation rate (I have used a Normal prior located at 7 with a standard deviation of 2, working in units of × 10^-5 s)
- Carry out some deterministic computations. In this case, the deterministic step includes setting up a CliMT state and stepping forward in time, to obtain the eastward wind speed.
- Make an observation. I’ve set it up so that we observe the wind speed at one latitude (56 degrees, around the location of the UK). The observation is made with some small measurement error, as before.
I want to see if we can infer the rotation rate of the Earth based on data from the CliMT model for the true rotation rate of the Earth. As before, we condition on the the observed data in just one line:
and then we set up the guide for stochastic variational inference (SVI):
For simplicity, I’ve stuck with normal distributions, although this should extend to other distributions. We want to infer 2 parameters to describe the rotation rate:
guide_loc , the location of the mean, and
guide_scale , the standard deviation of the distribution. Then we run the SVI in the same style loop as previous posts.
Unfortunately, this gets expensive very quickly. Each call to the model takes around 4 minutes to run on my laptop which means running the SVI for only 50 iterations takes around 3 hours. This isn’t long enough to know whether its heading in the right direction. Here’s a plot of how the losses and the two parameters look for 50 iterations:
The mean parameter looks like its hovering around 7, but we can’t be sure without running this for much longer! You might have also noticed I started the SVI with the mean parameter at 6.5, quite close to its true value 7.3. If we really had no idea of a good starting point for the SVI, we could be waiting a while for it to converge.
This is just one piece of evidence that faster running models could be useful in the data science and machine learning community. This could be done by speeding up CliMT models with access to more powerful hardware or integrating Dask with CliMT. It also might be useful to introduce surrogate modelling such as Gaussian process emulators, which could be used to emulate CliMT models. I’m hoping to see more of this in the future to give us access to quicker climate model data.