Kriging the French temperatures
A tutorial on simple kriging, from open-source data to temperature map visualisation
Have you ever wondered how temperature maps on your daily TV weather report are produced? They seem obvious to us, since we have been so used to watching them daily. They are easy to read: each pixel of the map indicates the temperature level by a colour (usually from blue to red), and lets you know how cold or warm your day will be tomorrow at your place.
One pixel of the map represents an area of, say, 10km x 10km. In order to create a map, your first intuition might be that a weather station has been installed every 10 kilometres northward and eastward in your country, to collect all the temperatures and then draw one pixel per measured value.
Yet, you will certainly not stumble on a weather station every 10 kilometres on your way. You are probably even far away from one of them if you live in the countryside! Think about this: for example Mainland France area is 544000 km². If you need one station every 10km x 10km, it actually means that a network of 5440 stations must be deployed, perfectly meshing the whole country. But this is not what you will find in real life: the network geometry is neither regularly structured nor so dense.
So how to create a temperature map if the stations do not regularly cover the whole country? The technique to do so is named interpolation in a general way. The global idea is the following: one pixel indicates
- either an exact measured value if a station exists at its corresponding geographic coordinate,
- or an interpolated value otherwise, meaning that the value is “guessed”, based on what the nearby stations measure.
How to interpolate temperatures?
Over an area as large as a country (e.g. Mainland France), there are two important facts to keep in mind.
- Imagine that you live in Paris. You know how warm it is right here right now. However you cannot guess exactly how warm it is 650km away at Marseille, based on simple intuition. The same goes for Bordeaux, Toulouse, etc. As such, we can say that temperatures become random on large distances…
- … Now, if you are still in Paris, you could have an accurate guess of how warm it is a few kilometres away (e.g. Versailles or Roissy), because you know that the temperature is almost the same there. In fact, we can reasonably say that the temperature field is highly correlated on short distances. And the physics dictates that it is a continuous function of space anyway.
Explaining physical models with these two characteristics (randomness and spatial correlation) is possible via appropriate probabilistic models. The most used of them is called Gaussian process.
If there is no correlation, the process is simply a Gaussian noise with independent samples. But if the process is correlated by the neighbouring samples, then the process becomes smooth!
It is precisely this smoothness that allows us to go for interpolation. Indeed, suppose that this field is measured at a 20 points only. Is it possible to interpolate the field between these points?
The answer is yes! As you can see above, the interpolation from these 20 points helps to provides a good guess of the true field. The interpolation method used here is called kriging [1, 2, 3]. Note that there is a trade off between the number of known samples and the interpolation accuracy. In areas where measured points are few, the interpolation has more uncertainty, and is likely to be less accurate. However, there is zero uncertainty at points where values are known.
The purpose of this article
Here I present straightforward guidelines for producing your own temperature map by kriging, based on open-source data. In this article, I prefer to stay away from the maths, since many resources already do the job very well [1, 2, 3, 4]. My intention is not to introduce the theory of Gaussian processes and kriging, but to give you a practical tutorial to use these tools from data pre-processing to visualisation.
The full Python Notebook to reproduce the results is available on my GitHub page.
Disclaimer: I am NOT a meteorologist, and this tutorial is nothing more than a playground for signal processing enthusiasts! Although Météo France mentions the use of kriging in their process for temperature, humidity, wind and pressure, I have no idea about all the misleading approximations that I am doing here to produce professional-level maps.
The tutorial
In this tutorial, we will produce one temperature map on the Mainland France, with open-source data provided by Météo France. We will focus on the freely available resource: the SYNOP database, updated every 3 hours.
I encourage you to download and follow the Python Notebook at the same time.
A. Get the data
To begin with, you need two download 2 csv files:
- the list of the stations with their geographical coordinates and their unique identifier (ID),
- and one file which contains temperature time series of all stations. Each time serie contains one point every 3h, on a period of one month. Here, the ‘numer_sta’ index corresponds to the ‘ID’ in the previous table.
First, the stations from the table are filtered to preserve the ones into the Mainland. Let’s visualise their positions on a map with folium.
We also anticipate the bounds in which we wish to draw the map. Let’s choose for the moment the grey rectangle and keep a little extra margin with respect to the land.
With the help of unique identifiers, it is possible to fetch the data points of one specific station. For example, ‘Toulouse-Blagnac’ is identified by the number 7630.
I chose the archive from January 2020, but you may choose any month you want. Be careful: some archived files contain NaNs, probably because stations or communication links are sometimes out of order.
From now on, we must also choose a day & hour to focus on, because the map describes the space fluctuations at one specific time only. The processed data below corresponds to the date of 2020/01/01 at 3am.
B. Choose a relevant Gaussian process model
The crucial step before kriging is to make a good choice for the probabilistic model, because the whole kriging interpolation relies on it. Remember the two characteristics of Gaussian processes: randomness and spatial correlation.
Signal randomness is something that you cannot capture by nature. However, you can predict by how much the signal is spatially correlated according to the data. If you visualise the distance between the two points vs the temperature difference at these two points, you obtain a graph called variogram.
What I called the temperature “difference” actually corresponds to the squared difference up to constant in Figure 7 — this remains a detail if you want to grasp the general idea. The most important here is that you can fetch the level of spatial correlation from this point cloud, by a regression with an analytical function of the variogram.
Different analytical variogram functions exist, and the appropriate choice of one of them requires some experience… I have made the “default” choice of a Gaussian kernel [3, slide 22] to describe the spatial correlation.
C. Interpolate the temperatures by simple kriging
Once the Gaussian process model has been chosen, we have all the elements to do kriging. Basically, the inputs that you need are:
- the measured values at the sampling points,
- the geometric coordinates of the sampling points,
- the geometric coordinates of the target points to interpolate,
- the “calibrated” probabilistic model, with the spatial correlation obtained by data in the step B.
From that, kriging gives two outputs:
- the estimated values at the target points,
- the estimated uncertainty (named variance in the estimation theory) at the target points.
In the Notebook the code to do simple kriging is given, and again I encourage you to read [1, chapter 6] if you want to understand how the estimator is derived. If you are curious, and want to try different kriging methods, have a look at PyKrige!
D. Draw on a map
To easily draw the map with Python I use cartopy and matplotlib. Cartopy automatically draws the countries, boundaries and oceans. More importantly, it creates the appropriate projection in a matplotlib figure, in order to draw a map referenced by latitude and longitude coordinates. Thus, there is no need to make any cumbersome conversion between cartesian and geographic coordinate systems.
Below are the raw results of the expectation (left) and the standard deviation (right), which is the square root of the variance.
You can see that the map looks fine within the Mainland. However, as soon as the interpolated points are far from the sampling points, the obtained values quickly become nonsense, and the uncertainty level soars.
This is normal: kriging is good at interpolating inside the sampled region, but is very bad outside of it. Remember that a Gaussian process is above all… a random process! As long as an interpolated point remain well surrounded by nearby sampling points, the spatial correlation helps. However, it is naturally impossible to make estimates too far away from every sampling points.
For the final touch, I clip the map portion that goes beyond France, since kriged values are not relevant in that region. To do so you need the polygon delimiting the border of country, and find all the map points located into this polygon. Finally, I use contours rather than a continuous colormap to observe the temperature rounded to the nearest degree.
References
[1] Andreas Lichtenstern. Kriging methods in spatial statistics (2014).
[2] Rodolphe Le Riche. Introduction to Kriging (2014).
[3] Rodolphe Le Riche & Nicolas Durrande. An overview of kriging for researchers (2019).
[4] Yuge Shi. Gaussian processes, not quite for dummies (2019).