What have you done?

Niall Robinson
Met Office Informatics Lab
Nov 29, 2019

Recording how we use data to come to decisions is a perennial challenge in science. At the Met Office, we have very complex data pipelines which can go from raw observations such as satellite data, through physical models such as our weather forecast running on specialist high-performance computers, to bespoke analyses by weather experts, before consultation with stakeholders leads to a subjective decision.

Moreover, we have to mix our data with other people’s data and systems both up- and down-stream of our weather forecast. All this means that it can be hard to effectively record the journey of information from start to finish.

The Informatics Lab is currently working hard researching ways to tackle this problem. This is the first in a series of blog posts about what we’re up to, and will focus on the motivation.

Recording the journey of information is fundamental to understanding what we know. As such, if we can crack it, it would have wide-ranging (and often overlapping) implications. For instance…

Provenance

How can we trust knowledge if we don’t know how much we can trust what that knowledge is built on? What if we find a bug in some satellite post-processing routines? Do we know we can still trust the decision to ground flights because of a weather warning? Can we prove it?

In addition, data is not simply “good” or “bad”: there is a grey area in the middle. Data that is good enough for some applications may not be good enough for others.
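To make the satellite-bug question concrete, here is a minimal sketch of what answering it could look like if every product recorded what it was derived from. The pipeline, the item names and the `affected_by` helper are all hypothetical, invented for illustration; this is not a description of any Met Office system.

```python
from collections import deque

# Hypothetical provenance records: each item lists what it was derived from.
derived_from = {
    "satellite_postprocessing": ["raw_satellite_observations"],
    "weather_forecast": ["satellite_postprocessing", "surface_observations"],
    "expert_analysis": ["weather_forecast"],
    "weather_warning": ["expert_analysis"],
    "decision_to_ground_flights": ["weather_warning"],
    "climate_summary": ["surface_observations"],
}


def affected_by(source, derived_from):
    """Return every item that depends, directly or indirectly, on `source`."""
    # Invert the records: for each input, which items consume it?
    consumers = {}
    for item, inputs in derived_from.items():
        for parent in inputs:
            consumers.setdefault(parent, []).append(item)

    # Breadth-first walk downstream from the suspect item.
    affected, queue = set(), deque([source])
    while queue:
        for child in consumers.get(queue.popleft(), []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


impacted = affected_by("satellite_postprocessing", derived_from)
print(sorted(impacted))
```

In this toy pipeline the decision to ground flights shows up as affected, while the climate summary, which never touched the satellite routine, does not. The hard part in practice is capturing records like `derived_from` reliably in the first place.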

Accountability

If the wrong decision has been made, who or what is responsible for it? This is partly about identifying problems so they can be fixed. However, there is also the potential for the attribution of legal culpability for the consequences of bad advice.

Apportionability

Who has used my data for what? Generating data often costs lots of money, and that funding needs to be justified with evidence. We need to quantify how much our data is used and how important the impact of that use is. In isolation a weather forecast is useless. However, it allows people to make useful decisions, for example to ground a flight and save lives, so the UK funds it as a national capability. As data pipelines get more complex, it’s getting harder to directly quantify the value of the weather forecast. How much more money do we estimate financial traders will make if we upgrade the data compression algorithms on our weather satellite?

Reproducibility

Science is fundamentally built upon the concept of reproducibility. However, there has been a noted “replication crisis” across science, including computational science. The emerging field of data science has its own version of the replication crisis, with people all too often not able to reproduce results because they can’t access comparable data, systems, or algorithms. Earth science works with chaotic systems. As Lorenz, the godfather of chaos theory, observed, you need to be very sure you are running exactly the same weather forecast in exactly the same way in order to get exactly the same answers.
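Lorenz’s point can be seen in a few lines of code. The sketch below is a toy, not a weather model: it integrates the classic three-variable Lorenz system with a simple forward-Euler scheme. The step size, the run length and the size of the perturbation are our own choices for illustration.

```python
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system by one forward-Euler step."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dt * dx, y + dt * dy, z + dt * dz)


def run(initial_state, n_steps=5000):
    """Integrate from `initial_state` and return the final state."""
    state = initial_state
    for _ in range(n_steps):
        state = lorenz_step(state)
    return state


def distance(a, b):
    """Euclidean distance between two states."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


reference = run((1.0, 1.0, 1.0))
repeat = run((1.0, 1.0, 1.0))              # exactly the same inputs
perturbed = run((1.0 + 1e-10, 1.0, 1.0))   # one input nudged very slightly

print("same inputs:    ", distance(reference, repeat))
print("perturbed input:", distance(reference, perturbed))
```

Repeating the run with identical inputs in the same environment gives an identical answer, whereas the tiny nudge grows until the two runs bear no resemblance to each other. A different compiler, library version or processor can play the role of that nudge, which is why recording exactly how something was run matters so much.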

Operationalisation

If data processing chains are robustly reproducible, they can be deployed as operational data pipelines which are run regularly to generate standardised products (for instance, a weather forecast).

Collaboration

If we have recorded precisely what’s happened, can other people adapt it? For instance, could a scientist repeat a published experiment with a new dataset? Similarly, can people collaborate by sharing datasets, effectively stitching together different provenance chains?

As you can see, this covers a bunch of overlapping concepts: provenance; data pipelines; workflows; task graphs; version history.

So — what to do? Over the next couple of months we’ll be following up with posts talking about what we’re doing, thinking and making.

Niall Robinson
Met Office Informatics Lab

I'm the Deputy Head of the Met Office Informatics Lab and a Senior Lecturer at the Global Systems Institute, Exeter Uni. Trying to make data useful.