Spatial Reasoning in Random Field Models: Part II

Rachel Prudden
Met Office Informatics Lab
May 21, 2021

In Part I of this series, we discussed the mathematics underpinning Gaussian process (GP) and Gaussian random field (GRF) models, how they can incorporate observations of either points or spatial averages, and how these two kinds of observation have differing effects on the model output.

As we’ll see, things get even more interesting when we combine both types of observation.

Why might we want to do this? Let’s look at a motivating situation. Suppose you have an autonomous vehicle recording a stream of observations relating to the rainfall at its current location. These tell you a lot about the situation along the vehicle’s past trajectory, but not much about what it will encounter further on its journey. On the other hand, radar observations will give you some information for the entire journey, but with a resolution that’s much lower. The question then is, can we gain any further insight by combining these two sources of information so that the vehicle can improve its awareness of the conditions it is likely to encounter?

In fact, something more interesting is true. At least in theory, there is information we can glean from the combination of these two types of observation that we couldn’t get from either individually. What’s more, it emerges from the mathematics describing the situation in a beautiful and natural way.

Illustrative example

To illustrate the idea, in this post I will use a toy model combining a GRF model with synthetically generated data and perfect observations. This will let us get into the interesting mechanics of the problem without getting hung up on thorny practicalities like noisy measurements and the non-Gaussianity of rainfall. Of course, these are important practicalities that would need to be considered in a real application.

First, we’ll generate some synthetic data. Here’s a sample from a Gaussian random field*, generated using the tools from the previous post:
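As a rough sketch of how such a sample could be drawn: the code below assumes a zero-mean field with a squared-exponential covariance on a small regular grid. The kernel, lengthscale, grid size and the names `rbf_kernel` and `sample_grf` are illustrative assumptions, not necessarily what was used for the figures in this post.

```python
import numpy as np

def rbf_kernel(points_a, points_b, lengthscale=4.0):
    """Squared-exponential covariance between two sets of 2D points."""
    sq_dist = ((points_a[:, None, :] - points_b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def sample_grf(n=16, lengthscale=4.0, seed=0):
    """Draw one sample of a zero-mean Gaussian random field on an n x n grid."""
    rng = np.random.default_rng(seed)
    ys, xs = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    points = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    cov = rbf_kernel(points, points, lengthscale)
    # A small jitter keeps the Cholesky factorisation numerically stable.
    chol = np.linalg.cholesky(cov + 1e-6 * np.eye(n * n))
    # Multiplying white noise by the Cholesky factor gives the desired covariance.
    return (chol @ rng.standard_normal(n * n)).reshape(n, n)

field = sample_grf()
```

Sampling through a dense Cholesky factor is only practical for small grids; it is used here because it keeps the link to the covariance matrix explicit.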

Here’s the same data, but with reduced resolution:
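Reducing the resolution here means replacing each grid box by the average of the cells it contains. A minimal sketch, assuming square boxes that divide the grid evenly (the function name `block_average` and the box size are hypothetical):

```python
import numpy as np

def block_average(field, block=4):
    """Average a 2D field over non-overlapping block x block grid boxes."""
    n_rows, n_cols = field.shape
    # Split each axis into (box index, position within box), then average
    # over the two within-box axes.
    boxes = field.reshape(n_rows // block, block, n_cols // block, block)
    return boxes.mean(axis=(1, 3))

field = np.arange(64, dtype=float).reshape(8, 8)  # stand-in for a GRF sample
coarse = block_average(field, block=4)
```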

We can use the techniques from the previous post to obtain a posterior distribution over the high-resolution field, conditioned on these low-resolution averages (assuming for a moment that we already have a good estimate of the prior lengthscale). If we sample from this distribution, we get something like this:
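One way to sketch this step is to write the averaging as a matrix acting on the flattened field, then correct a prior draw so that its box averages match the observed ones. This is a standard way of sampling from a Gaussian conditioned on noise-free linear observations, and may differ from the approach used in Part I. The grid size, lengthscale and helper names below are assumptions.

```python
import numpy as np

def grid_points(n):
    ys, xs = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)

def rbf_kernel(points, lengthscale):
    """Squared-exponential covariance, with a small jitter on the diagonal."""
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2) + 1e-6 * np.eye(len(points))

def averaging_operator(n, block):
    """Matrix mapping a flattened n x n field to its flattened box averages."""
    m = n // block
    op = np.zeros((m * m, n * n))
    for row in range(n):
        for col in range(n):
            op[(row // block) * m + col // block, row * n + col] = 1.0 / block**2
    return op

n, block, lengthscale = 12, 4, 3.0            # assumed toy settings
rng = np.random.default_rng(1)
K = rbf_kernel(grid_points(n), lengthscale)   # prior covariance
A = averaging_operator(n, block)
chol = np.linalg.cholesky(K)

truth = chol @ rng.standard_normal(n * n)     # synthetic "true" field
y = A @ truth                                 # perfect low-resolution observations

# Gain matrix K A^T (A K A^T)^{-1}, computed with a solve rather than an inverse.
gain = np.linalg.solve(A @ K @ A.T, A @ K).T

# Conditional sample: shift a fresh prior draw so that its averages equal y.
prior_draw = chol @ rng.standard_normal(n * n)
posterior_sample = (prior_draw + gain @ (y - A @ prior_draw)).reshape(n, n)
```

By construction, every sample produced this way reproduces the observed box averages, while the detail within each box varies from sample to sample.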

Now, what happens if we also have a point observation? To make the difference stand out, we’ll use an observation that’s quite a bit higher than its surroundings. This will let us see how the observations propagate through the mathematics, and affect the final result.

Below, I’ve shown a sample from a model incorporating a single point observation in addition to the low resolution averages, compared against a sample from our original model using low resolution averages only. The point observation has a value of 2, and its location is indicated by the red arrow. It’s clear that the point observation has pulled up the value of the field, not only at that point but in the surrounding area.
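In matrix terms, adding the point observation just means appending one more row to the observation operator: a row that picks out the observed cell. The sketch below illustrates this under assumed toy settings. The point's location is hypothetical; only its value of 2 comes from the text.

```python
import numpy as np

n, block, lengthscale = 12, 4, 3.0            # assumed toy settings
m = n // block

# Grid coordinates and a squared-exponential prior covariance (with jitter).
rows, cols = np.divmod(np.arange(n * n), n)
sq_dist = (rows[:, None] - rows[None, :]) ** 2 + (cols[:, None] - cols[None, :]) ** 2
K = np.exp(-0.5 * sq_dist / lengthscale**2) + 1e-6 * np.eye(n * n)

# One row per grid box: each row averages the cells inside that box.
box_index = (rows // block) * m + cols // block
A = (box_index[None, :] == np.arange(m * m)[:, None]) / block**2

# One extra row that simply selects the observed cell.
point_cell = 1 * n + 1                        # hypothetical location: row 1, column 1
e = np.zeros((1, n * n))
e[0, point_cell] = 1.0

def posterior_mean(H, y):
    """Mean of a zero-mean field with covariance K, given noise-free H f = y."""
    return K @ H.T @ np.linalg.solve(H @ K @ H.T, y)

rng = np.random.default_rng(2)
y_avg = A @ (np.linalg.cholesky(K) @ rng.standard_normal(n * n))  # synthetic averages

mean_avg_only = posterior_mean(A, y_avg)
H = np.vstack([A, e])                         # averages and point observation together
mean_with_point = posterior_mean(H, np.append(y_avg, 2.0))
```

The same function handles both cases; the only difference is which rows are present in the observation operator.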

Analysing the distribution

To get a deeper understanding of what’s going on, we’ll need to go beyond looking at samples and visualise the distribution as a whole. We can do this by looking at the mean field and the variance at each point.

Shown below are the mean fields for the models with and without the point observation, and on the right the difference between the two.

Now we can see there’s a bit more going on than was immediately apparent. The point observation hasn’t just pulled up the surrounding values; it has also pushed down the values in the rest of the grid box.

The reason this happens is the interplay between the two kinds of observation. We’ve observed a high value at the marked point, meaning that the values in the surrounding neighbourhood are also likely to be high. But we’ve also observed a low-ish average value for the grid box as a whole. To compensate for the high values in one corner, the values in the rest of the box must be lower. This effect even spills over into neighbouring grid boxes; the point observation induces a higher value in one corner, and lower values elsewhere.
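The compensation can be checked in a one-dimensional analogue, which is small enough to print. Both posterior means must reproduce the observed box averages, so their difference has to average to zero over every box: a rise near the point forces a fall somewhere else in the same box. The box averages and settings below are made up for illustration.

```python
import numpy as np

n, block, lengthscale = 16, 8, 2.0            # two grid boxes of eight cells each
x = np.arange(n, dtype=float)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / lengthscale**2) + 1e-6 * np.eye(n)

# Box-averaging rows, plus one row selecting the observed cell.
A = np.kron(np.eye(n // block), np.full((1, block), 1.0 / block))
point_cell = 1                                # hypothetical location near one edge
H = np.vstack([A, np.eye(n)[point_cell]])

def posterior_mean(H, y):
    """Mean of a zero-mean field with covariance K, given noise-free H f = y."""
    return K @ H.T @ np.linalg.solve(H @ K @ H.T, y)

y_avg = np.array([0.3, -0.2])                 # made-up box averages
diff = posterior_mean(H, np.append(y_avg, 2.0)) - posterior_mean(A, y_avg)

print(np.round(diff, 2))   # change in the mean, cell by cell
print(A @ diff)            # change in each box average: zero up to rounding
```

In this sketch the difference is positive around the observed cell and negative further along the same box, mirroring the pattern in the two-dimensional difference field.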

What’s nice about this is that we didn’t have to build this reasoning into our model. The logic arises from the standard linear algebra calculations we used to condition the model on our observations. If you remember back to the previous post, the pattern of negative values in the posterior covariance matrix for the spatial model was another manifestation of the same effect.
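For reference, the conditioning calculation can be written compactly. The notation here is an assumption and may differ from Part I: $K$ is the prior covariance of the zero-mean field on the fine grid, $A$ is the matrix that averages over grid boxes, $e_i^\top$ selects the observed point, $\bar{y}$ holds the observed averages, $y_i$ is the point value, and all observations are taken to be noise-free.

```latex
H = \begin{pmatrix} A \\ e_i^\top \end{pmatrix}, \qquad
y = \begin{pmatrix} \bar{y} \\ y_i \end{pmatrix}, \qquad
\mu_{\mathrm{post}} = K H^\top \left( H K H^\top \right)^{-1} y, \qquad
\Sigma_{\mathrm{post}} = K - K H^\top \left( H K H^\top \right)^{-1} H K
```

Since $H \mu_{\mathrm{post}} = y$, the posterior mean honours the box averages and the point value simultaneously, which is what forces the rest of the box downwards when one corner is pulled up.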

Now let’s have a look at the variance of the model distribution at each point. The first thing to jump out is the checked pattern present in both models. This isn’t an artefact, but reflects the information available from the spatial averages: these are more informative near the centres of the grid boxes, less so on the boundaries between them, and least of all at the corners where four of them meet. That said, this pattern does somewhat obscure the influence of the point observation.

The difference field on the right tells a clearer story. The biggest reduction in variance is in the area surrounding the point observation, as expected. What’s more interesting is the reduction in variance at some distance from this point, most noticeably in the opposite corner of the grid box. Comparing this pattern to the mean field difference above, the reduction in variance lines up closely with changes, positive or negative, in the mean field.
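The variance fields can be sketched in the same way, by taking the diagonal of the posterior covariance with and without the extra row for the point observation. The grid size, lengthscale and point location are again assumptions.

```python
import numpy as np

n, block, lengthscale = 12, 4, 3.0            # assumed toy settings
m = n // block
rows, cols = np.divmod(np.arange(n * n), n)
sq_dist = (rows[:, None] - rows[None, :]) ** 2 + (cols[:, None] - cols[None, :]) ** 2
K = np.exp(-0.5 * sq_dist / lengthscale**2) + 1e-6 * np.eye(n * n)

box_index = (rows // block) * m + cols // block
A = (box_index[None, :] == np.arange(m * m)[:, None]) / block**2
point_cell = 1 * n + 1                        # hypothetical location: row 1, column 1
H = np.vstack([A, np.eye(n * n)[point_cell]])

def posterior_variance(H):
    """Diagonal of K - K H^T (H K H^T)^{-1} H K, reshaped to the grid."""
    KHt = K @ H.T
    explained = np.einsum("ij,ji->i", KHt, np.linalg.solve(H @ K @ H.T, KHt.T))
    return (np.diag(K) - explained).reshape(n, n)

var_avg_only = posterior_variance(A)
var_with_point = posterior_variance(H)
var_reduction = var_avg_only - var_with_point  # non-negative up to rounding
```

Note that the observed values never enter this calculation: for a Gaussian model the posterior variance depends only on where and how we observe, not on what we see. The variance reduction is therefore a property of the observation layout itself.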

Wrapping up

Taking a step back, what does all this tell us? By combining two types of observation, a point value and a spatial average, we have gained information; this is shown by a reduction in variance. The neat part is that some of this information gain happens at some distance from the point observation, such as in the opposite corner of the grid box. This is information beyond what we’d already get from just the spatial observations. It is an emergent effect, in that neither the spatial average nor the point observation alone gives us this extra information about that corner. It’s only when we combine the two that we learn something new.

That’s all for this two-part series on spatial reasoning in random fields!

Rachel Prudden
Met Office Informatics Lab

Rachel is a researcher in the Informatics Lab. Her current focus is on probabilistic super-resolution of weather models at convective scales.