TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Kriging the French temperatures

8 min readDec 13, 2021

--

A typical temperature map of the US — Image by National Weather Service.

Have you ever wondered how temperature maps on your daily TV weather report are produced? They seem obvious to us, since we have been so used to watching them daily. They are easy to read: each pixel of the map indicates the temperature level by a colour (usually from blue to red), and lets you know how cold or warm your day will be tomorrow at your place.

One pixel of the map represents an area of, say, 10km x 10km. In order to create a map, your first intuition might be that a weather station has been installed every 10 kilometres northward and eastward in your country, to collect all the temperatures and then draw one pixel per measured value.

Yet, you will certainly not stumble on a weather station every 10 kilometres on your way. You are probably even far away from one of them if you live in the countryside! Think about this: for example Mainland France area is 544000 km². If you need one station every 10km x 10km, it actually means that a network of 5440 stations must be deployed, perfectly meshing the whole country. But this is not what you will find in real life: the network geometry is neither regularly structured nor so dense.

So how to create a temperature map if the stations do not regularly cover the whole country? The technique to do so is named interpolation in a general way. The global idea is the following: one pixel indicates

  • either an exact measured value if a station exists at its corresponding geographic coordinate,
  • or an interpolated value otherwise, meaning that the value is “guessed”, based on what the nearby stations measure.

How to interpolate temperatures?

Over an area as large as a country (e.g. Mainland France), there are two important facts to keep in mind.

  • Imagine that you live in Paris. You know how warm it is right here right now. However you cannot guess exactly how warm it is 650km away at Marseille, based on simple intuition. The same goes for Bordeaux, Toulouse, etc. As such, we can say that temperatures become random on large distances
  • … Now, if you are still in Paris, you could have an accurate guess of how warm it is a few kilometres away (e.g. Versailles or Roissy), because you know that the temperature is almost the same there. In fact, we can reasonably say that the temperature field is highly correlated on short distances. And the physics dictates that it is a continuous function of space anyway.

Explaining physical models with these two characteristics (randomness and spatial correlation) is possible via appropriate probabilistic models. The most used of them is called Gaussian process.

Figure 1 — Comparing two Gaussian processes: uncorrelated (left) and locally correlated (right). Image by author.

If there is no correlation, the process is simply a Gaussian noise with independent samples. But if the process is correlated by the neighbouring samples, then the process becomes smooth!

It is precisely this smoothness that allows us to go for interpolation. Indeed, suppose that this field is measured at a 20 points only. Is it possible to interpolate the field between these points?

Figure 2 — Interpolation on the line by simple kriging. Image by author.

The answer is yes! As you can see above, the interpolation from these 20 points helps to provides a good guess of the true field. The interpolation method used here is called kriging [1, 2, 3]. Note that there is a trade off between the number of known samples and the interpolation accuracy. In areas where measured points are few, the interpolation has more uncertainty, and is likely to be less accurate. However, there is zero uncertainty at points where values are known.

The purpose of this article

Here I present straightforward guidelines for producing your own temperature map by kriging, based on open-source data. In this article, I prefer to stay away from the maths, since many resources already do the job very well [1, 2, 3, 4]. My intention is not to introduce the theory of Gaussian processes and kriging, but to give you a practical tutorial to use these tools from data pre-processing to visualisation.

The full Python Notebook to reproduce the results is available on my GitHub page.

Disclaimer: I am NOT a meteorologist, and this tutorial is nothing more than a playground for signal processing enthusiasts! Although Météo France mentions the use of kriging in their process for temperature, humidity, wind and pressure, I have no idea about all the misleading approximations that I am doing here to produce professional-level maps.

The tutorial

In this tutorial, we will produce one temperature map on the Mainland France, with open-source data provided by Météo France. We will focus on the freely available resource: the SYNOP database, updated every 3 hours.

I encourage you to download and follow the Python Notebook at the same time.

A. Get the data

To begin with, you need two download 2 csv files:

Figure 3 — The csv table of SYNOP stations. Image by author.
  • and one file which contains temperature time series of all stations. Each time serie contains one point every 3h, on a period of one month. Here, the ‘numer_sta’ index corresponds to the ‘ID’ in the previous table.
Figure 4 — One csv file of temperatures over 1 month, for each station and every 3 hours. Image by author.

First, the stations from the table are filtered to preserve the ones into the Mainland. Let’s visualise their positions on a map with folium.

Figure 5 — Preserved SYNOP stations. The temperature will be interpolated within the grey rectangle. Image by author.

We also anticipate the bounds in which we wish to draw the map. Let’s choose for the moment the grey rectangle and keep a little extra margin with respect to the land.

With the help of unique identifiers, it is possible to fetch the data points of one specific station. For example, ‘Toulouse-Blagnac’ is identified by the number 7630.

Figure 6 — Temperature time serie at Toulouse-Blagnac in January 2020. Image by author.

I chose the archive from January 2020, but you may choose any month you want. Be careful: some archived files contain NaNs, probably because stations or communication links are sometimes out of order.

From now on, we must also choose a day & hour to focus on, because the map describes the space fluctuations at one specific time only. The processed data below corresponds to the date of 2020/01/01 at 3am.

B. Choose a relevant Gaussian process model

The crucial step before kriging is to make a good choice for the probabilistic model, because the whole kriging interpolation relies on it. Remember the two characteristics of Gaussian processes: randomness and spatial correlation.

Signal randomness is something that you cannot capture by nature. However, you can predict by how much the signal is spatially correlated according to the data. If you visualise the distance between the two points vs the temperature difference at these two points, you obtain a graph called variogram.

Figure 7 — Variogram on 2020/01/01 at 3am. Image by author.

What I called the temperature “difference” actually corresponds to the squared difference up to constant in Figure 7 — this remains a detail if you want to grasp the general idea. The most important here is that you can fetch the level of spatial correlation from this point cloud, by a regression with an analytical function of the variogram.

Figure 8 — Variogram regression. Image by author.

Different analytical variogram functions exist, and the appropriate choice of one of them requires some experience… I have made the “default” choice of a Gaussian kernel [3, slide 22] to describe the spatial correlation.

C. Interpolate the temperatures by simple kriging

Once the Gaussian process model has been chosen, we have all the elements to do kriging. Basically, the inputs that you need are:

  • the measured values at the sampling points,
  • the geometric coordinates of the sampling points,
  • the geometric coordinates of the target points to interpolate,
  • the “calibrated” probabilistic model, with the spatial correlation obtained by data in the step B.

From that, kriging gives two outputs:

  • the estimated values at the target points,
  • the estimated uncertainty (named variance in the estimation theory) at the target points.

In the Notebook the code to do simple kriging is given, and again I encourage you to read [1, chapter 6] if you want to understand how the estimator is derived. If you are curious, and want to try different kriging methods, have a look at PyKrige!

D. Draw on a map

To easily draw the map with Python I use cartopy and matplotlib. Cartopy automatically draws the countries, boundaries and oceans. More importantly, it creates the appropriate projection in a matplotlib figure, in order to draw a map referenced by latitude and longitude coordinates. Thus, there is no need to make any cumbersome conversion between cartesian and geographic coordinate systems.

Below are the raw results of the expectation (left) and the standard deviation (right), which is the square root of the variance.

Figure 9 — The raw result: interpolation by simple kriging (left) and standard deviation uncertainty level (right). Image by author.

You can see that the map looks fine within the Mainland. However, as soon as the interpolated points are far from the sampling points, the obtained values quickly become nonsense, and the uncertainty level soars.

This is normal: kriging is good at interpolating inside the sampled region, but is very bad outside of it. Remember that a Gaussian process is above all… a random process! As long as an interpolated point remain well surrounded by nearby sampling points, the spatial correlation helps. However, it is naturally impossible to make estimates too far away from every sampling points.

For the final touch, I clip the map portion that goes beyond France, since kriged values are not relevant in that region. To do so you need the polygon delimiting the border of country, and find all the map points located into this polygon. Finally, I use contours rather than a continuous colormap to observe the temperature rounded to the nearest degree.

Figure 10 — The final result: interpolation by simple kriging (left) and standard deviation uncertainty level (right). Image by author.

References

[1] Andreas Lichtenstern. Kriging methods in spatial statistics (2014).
[2] Rodolphe Le Riche. Introduction to Kriging (2014).
[3] Rodolphe Le Riche & Nicolas Durrande. An overview of kriging for researchers (2019).
[4] Yuge Shi. Gaussian processes, not quite for dummies (2019).

--

--

TDS Archive
TDS Archive

Published in TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Charles Vanwynsberghe
Charles Vanwynsberghe

Written by Charles Vanwynsberghe

Senior Researcher in 6G | Signal Processing Scientist | https://www.vanwynsberghe.eu/