Probabilistic programming for event detection

Rachel Prudden
Met Office Informatics Lab
Dec 7, 2020 · 7 min read

Imagine you are a scientist, studying a particular kind of event: perhaps a volcanic eruption or a lightning strike. You have some data in the form of a time series, which you are confident you can use to detect the event you care about. You can even describe with some confidence the characteristic signature by which the event will reveal itself in the data. Problem solved… right?

Unfortunately, real time series are often noisy and difficult to interpret. Even when we know what to look for, separating the signal from the noise can be a challenge. And even if we can manage it by eye, reliably automating the process may be another matter.

In this post, I’ll explain why probabilistic programming can be a powerful tool in this scenario, and show a very simple example of how it can be applied.

Probabilistic programming

We have blogged about probabilistic programming in the past, in a series on inferring parameters of physical models. The first post gives an introduction to many of the basic concepts we will use here.

For our purposes, the most important point is what we might call the fundamental principle of probabilistic programming:

In order to interpret data, we describe the data generating process using code.

Once we have described the data generating process, inference can be carried out by standard methods. Probabilistic programming languages (PPLs) provide a lot of helpful scaffolding for this part of the process. As scientists, our job is to describe the data generating process as accurately as possible, in a form the PPL can understand.

This approach is quite different to other techniques we might use for event detection, such as supervised learning. Instead of training a model on many examples, we can simply write down a model of how our data has been generated. It is especially useful for handling noisy time series data; as long as we can write down a generating process for the noise, disentangling it from the signal essentially comes for free.

If you’re a statistician, these ideas are probably familiar to you. Indeed, probabilistic programming can be seen as a framework for carrying out statistical inference. However, PPLs do have some convenient properties:

  1. Data models are described using code. This gives a lot of flexibility in their definition, and connects more naturally with how scientists often think about the processes they are studying.
  2. The data modelling is decoupled from the inference. As a result, domain specialists can define data models based on their expertise without needing to worry about developing inference algorithms.

As in previous posts, the PPL we’ll be using is Pyro, a toolkit built on top of the PyTorch framework in Python.

Data generation

We’ll start by generating some synthetic data for the events we are looking for, representing an event as a “1” and a non-event as a “0”.
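Here’s a minimal sketch of what this step could look like; the series length, event rate, and seed are illustrative choices of mine, not values from the original post:

    import torch

    torch.manual_seed(0)

    n = 200            # length of the time series (assumed value)
    event_rate = 0.03  # expected fraction of timesteps with an event (assumed value)

    # A sparse binary series: 1 where an event occurs, 0 elsewhere
    events = (torch.rand(n) < event_rate).float()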

These events are just instantaneous blips. In reality, we’re often interested in events which extend over a longer period of time. We can model this by convolving our event time series with a filter describing the time evolution.

In this case, we’ll just use a simple filter with an initial peak and linear decay:
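A sketch of this step is below; the filter length of 10 is an arbitrary choice, and the convolve helper is my reconstruction rather than the original code:

    import torch.nn.functional as F

    filter_len = 10
    # Initial peak followed by a linear decay back to zero
    response = torch.linspace(1.0, 0.0, filter_len)

    def convolve(x, kernel):
        # Causal convolution: each event triggers the response going forward.
        # conv1d computes cross-correlation, so we flip the kernel, and
        # left-pad so the output has the same length as the input.
        pad = len(kernel) - 1
        x = F.pad(x.view(1, 1, -1), (pad, 0))
        return F.conv1d(x, kernel.flip(0).view(1, 1, -1)).view(-1)

    signal = convolve(events, response)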

This gives us a clean observed time series like so:

Data science would be a lot easier if all observations were as clean as this. To make our detection task more challenging, we’ll add some noise.

In fact, we won’t add just any noise: we’ll add noise which is quite difficult to distinguish from the signal we’re trying to detect. We can do this by introducing some temporal correlation into our noise, using a Matérn kernel.
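One way to do this is to build a Matérn 3/2 covariance by hand and sample the noise from the corresponding multivariate normal. In the sketch below (which reuses n and signal from above), the lengthscale of 3 matches the value mentioned later in the post, while the amplitude and jitter are assumptions of mine:

    def matern32_cov(n, lengthscale, amplitude=0.3):
        # Matérn 3/2 kernel: k(r) = a^2 (1 + sqrt(3) r) exp(-sqrt(3) r),
        # with r = |t - t'| / lengthscale
        t = torch.arange(n, dtype=torch.float)
        r = (t.view(-1, 1) - t.view(1, -1)).abs() / lengthscale
        cov = amplitude ** 2 * (1 + 3 ** 0.5 * r) * torch.exp(-(3 ** 0.5) * r)
        return cov + 1e-4 * torch.eye(n)  # jitter for numerical stability

    noise_cov = matern32_cov(n, lengthscale=3.0)
    noise = torch.distributions.MultivariateNormal(torch.zeros(n), noise_cov).sample()

    # Our synthetic observations: event signal plus correlated noise
    y_obs = signal + noise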

Combining the signal and noise gives us our synthetic observed time series. Hopefully you’ll agree that it’s not easy to distinguish the event signals from random fluctuations.

Defining our Pyro model

Defining a model in Pyro means describing the process we believe generated our data, in code. Since we’re working with synthetic data, this looks a lot like our original data generation code. The main difference is that we’ve replaced our event series with a Normal distribution passed through a sigmoid function, and that we declare our variables using pyro.sample and pyro.deterministic.

Why the sigmoid function? Working with discrete events in Pyro turns out to be a little tricky, so we’re going to cheat! Instead of using a discrete model, we’re going to approximate one by passing a continuous variable through a sigmoid activation. That way everything is continuous, which will make our lives easier when we get to the model inference.

It’s also worth noting that we centre our distribution for p slightly below zero. Roughly, this encodes our expectation that events will be infrequent; the centre value could perhaps be treated as a hyperparameter.
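Putting this together, the model might look something like the sketch below. The prior mean of -2 (the “slightly below zero” centre) is an illustrative value, and the convolve and matern32_cov helpers are carried over from the earlier sketches:

    import pyro
    import pyro.distributions as dist

    def model():
        # Latent pre-sigmoid series, centred below zero so events are a priori rare
        p = pyro.sample("p", dist.Normal(-2.0 * torch.ones(n), torch.ones(n)).to_event(1))
        # Squash through a sigmoid: a continuous stand-in for a 0/1 event series
        events = pyro.deterministic("events", torch.sigmoid(p))
        # Convolve with the known response filter to get the clean signal
        signal = convolve(events, response)
        # Observations are the signal plus Matérn-correlated noise
        pyro.sample("obs", dist.MultivariateNormal(signal, covariance_matrix=noise_cov))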

Inference

For our inference step, we’re going to use stochastic variational inference (SVI). This means we’ll be fitting the parameters of a simple approximating distribution, rather than performing full Bayesian inference. To use SVI, we’ll first need a guide function to define the parameters we want to estimate and how they relate to our model. We’ll use a parameter p0 to describe our event time series before it passes through the sigmoid activation.
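A guide along these lines would do the job. Using a Delta distribution makes this effectively a point (MAP-style) estimate of p, which is one simple choice consistent with estimating a single parameter p0:

    def guide():
        # p0: our estimate of the event series before the sigmoid is applied
        p0 = pyro.param("p0", -2.0 * torch.ones(n))
        pyro.sample("p", dist.Delta(p0).to_event(1))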

Before running the inference, we condition our model on our synthetic observations.
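In Pyro this is a one-liner, assuming the observation site is named “obs” as in the model sketch above:

    conditioned_model = pyro.condition(model, data={"obs": y_obs})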

Now we run our SVI. If you’re familiar with PyTorch, you may notice that this looks a lot like regular gradient descent.
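A typical training loop looks something like this; the learning rate and number of steps are illustrative choices rather than tuned values:

    from pyro.infer import SVI, Trace_ELBO
    from pyro.optim import Adam

    pyro.clear_param_store()
    svi = SVI(conditioned_model, guide, Adam({"lr": 0.05}), loss=Trace_ELBO())

    for step in range(2000):
        loss = svi.step()  # one gradient update on the ELBO
        if step % 200 == 0:
            print(f"step {step}: loss = {loss:.1f}")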

Now that the inference is done, we can see what our model has learned. Here is our inferred parameter p0:

To get an estimate for our event time series, we need to pass p0 through a sigmoid activation, just as in our model.
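In code, this final step is just (reusing the names from the sketches above):

    p0 = pyro.param("p0").detach()
    event_estimate = torch.sigmoid(p0)  # same squashing as in the model

Here’s what we get: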

Success! We’ve recovered the times when events took place. Not bad, considering the mess we started with:

Mis-specified models

These results are encouraging, but perhaps a little optimistic. After all, we’ve assumed perfect knowledge of the response signal to an event, as well as the distribution of the noise. If we don’t have this knowledge, would our approach be robust?

We can get an idea by replacing our Pyro model with one that is deliberately mis-specified. Hopefully, if we only change the model a little, the results will still be useful.

With this in mind, let’s try replacing our noise model with one that has a longer lengthscale. We’ll make it twice as long, so the lengthscale is 6 instead of 3.
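In terms of the earlier sketches, this is a one-line change to the covariance matrix the model uses:

    # Deliberately mis-specified: inference assumes lengthscale 6,
    # while the data were generated with lengthscale 3
    noise_cov_misspec = matern32_cov(n, lengthscale=6.0)

Rebuilding the model with this covariance and re-running the SVI, here’s what we get: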

Not bad! Using a slightly mis-specified noise model does not seem to have hurt our results.

Just as a sanity check, we can also try using an extremely mis-specified model. We’ll turn the lengthscale parameter for our noise model all the way up to 60, which is 20x longer than our original value of 3. Here’s what happens:

Not surprisingly, it’s a bit of a mess. Although the events have been detected, there is no longer a clear distinction between the events and the noise. The lesson here is that we can get away with using a mis-specified model, but only up to a point!

Discussion

This post has shown how the probabilistic programming language Pyro can be used for simple event detection in time series. All that is required is a model of the data generating process, expressed in code, including the event response and a noise model.


Rachel Prudden
Met Office Informatics Lab

Rachel is a researcher in the Informatics Lab. Her current focus is on probabilistic super-resolution of weather models at convective scales.