Time series trend detection with Bayesian methods


Motivation

Lackluster fitness performance by the author
  • There are metrics that tend to change in a stepwise fashion: they jump when you release new features or change business processes, and then stay at the new value until the next change.
  • You want large trend changes to be detected and reported, in case they are unexpected and negatively impact the business
  • You don’t want minor changes to be reported, as investigating them costs more than any benefit
  • Individual measurements are normally distributed around a trend
  • The trend may change only on Mondays (a synthetic example of such data is sketched after this list)
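To make this concrete, here is a minimal sketch of data that fits these assumptions (all numbers are invented for illustration; the real data would come from your metrics store):

import numpy as np

rng = np.random.default_rng(42)
# Four weekly trend levels: a jump after the first week, then roughly flat
weekly_trends = [10, 15, 16, 16]
# Seven daily measurements per week, normally distributed around the trend
data = np.concatenate(
    [rng.normal(loc=trend, scale=2, size=7) for trend in weekly_trends])
# data now contains 28 measurements, one per day, starting on a Monday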

Classical probability model

# data is an array with 28 measurements
import scipy.stats

scipy.stats.ttest_1samp(a=data[7:14], popmean=10)
Ttest_1sampResult(..., pvalue=0.0028212592801565937)
  • Classical statistics is only applicable in specific cases, and considerable knowledge and care are necessary to apply it correctly. Even above, I should have first applied another test to check that the data is really normally distributed (see the sketch after this list), and if that test fails, I have nowhere to go.
  • We have no parameters to control this model. Even if we have prior information about how likely a trend change is, we can’t encode it.
  • If we have multiple related metrics, there is no way to encode their relation
  • We either keep the old trend, or completely forget the past and use only the new week’s data; there is no middle ground
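For completeness, the normality check mentioned above could look like this (a sketch using scipy's Shapiro-Wilk test; the 0.05 threshold is just the usual convention):

import scipy.stats

# Test whether the second week's measurements look normally distributed
statistic, pvalue = scipy.stats.shapiro(data[7:14])
if pvalue < 0.05:
    print("Normality is rejected; the t-test above is not trustworthy")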

Bayesian model

  • It uses the computer to brute-force an answer; you specify which data is randomly generated from which family of distributions, and the computer tells you what parameters of those distributions would make sense.
  • If you have prior assumptions about the parameters of a distribution, you can specify them too
  • Parameters, in turn, can be hierarchically connected to other random variables
  • In the end, you get not just a single number, but a distribution of likely values. Or, if you still need a single number, it can be the most likely value, not necessarily one of two options.
import pymc as pm  # with older PyMC3: import pymc3 as pm

with pm.Model() as model:
    # Week 0 has a clear trend
    trend_0 = 10

    # On each Monday, there is a normally-distributed trend change
    # sigma controls how likely a trend change is
    changes = pm.Normal("changes", mu=0, sigma=10, shape=3)

    # Week 1 data is normally distributed around its trend
    # sigma here controls how much variance we have inside a week
    trend_1 = trend_0 + changes[0]
    actual_1 = pm.Normal("actual 1",
                         mu=trend_1, sigma=10, observed=data[7:14])

    # Week 2 is similar
    trend_2 = trend_1 + changes[1]
    actual_2 = pm.Normal("actual 2",
                         mu=trend_2, sigma=10, observed=data[14:21])

    trend_3 = trend_2 + changes[2]
    actual_3 = pm.Normal("actual 3",
                         mu=trend_3, sigma=10, observed=data[21:28])

    # Compute the most credible values for the trend changes
    map_estimate = pm.find_MAP()

$ {'changes': array([ 5.41521218, 1.32171058, -0.51981033])}
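find_MAP returns only a point estimate. To get the full distribution of likely values mentioned earlier, we can sample the posterior instead (a sketch; it assumes a recent PyMC where pm.sample returns an ArviZ InferenceData object):

import arviz as az

with model:
    # Draw posterior samples for all model parameters
    idata = pm.sample(2000, tune=1000)

# Posterior means and credible intervals for the three weekly changes
print(az.summary(idata, var_names=["changes"]))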

Regularized linear regression

Using the Laplace distribution
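The only change to the model is the prior on the trend changes. The b parameter below is chosen so that the Laplace prior has the same variance as the normal prior we used before: a Laplace distribution with scale b has variance 2b², so b = 10/sqrt(2) matches sigma = 10. A quick check with scipy:

import math
import scipy.stats

b = 10 / math.sqrt(2)
print(scipy.stats.laplace(scale=b).var())  # 100.0, the same variance...
print(scipy.stats.norm(scale=10).var())    # ...as a normal with sigma of 10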

with pm.Model() as model:
    ...
    # The b parameter of 10/math.sqrt(2) results in the same variance
    # as the normal distribution with sigma of 10 that we tried
    # initially; only the shape is different
    changes = pm.Laplace("changes", mu=0,
                         b=10/math.sqrt(2), shape=3)
    ...

$ {'changes': array([5.69020714e+00, 1.87704650e-04, 7.52844191e-09])}
  • There’s a large trend change on the second week that we can report
  • The changes on the third and fourth weeks are infinitesimal, so we can reasonably ignore them
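In practice, “report the large changes, ignore the tiny ones” can start as a simple threshold on the estimated changes (a hypothetical rule; the threshold value and the map_estimate name from the earlier snippet are illustrative):

threshold = 1.0  # minimal change worth investigating; tune for your metric
for week, change in enumerate(map_estimate["changes"], start=1):
    if abs(change) > threshold:
        print(f"Week {week}: trend changed by {change:+.2f}, worth a look")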

Conclusion

  • Instead of intuitions about distribution shapes, we’d look at our data and systematically build a data generation process
  • We’d model seasonality, assume the trend itself is linear in time, and employ covariates (a rough sketch of such a model follows the links below).
  • Finally, we’d keep track of detected trend changes, whether they are confirmed by users or not, and adjust our model sensitivity.
  • The complete code can be found in a companion notebook.
  • The ideas behind Prophet can be found in the original paper.
  • Using PyMC to find the location of a single changepoint in time series data is discussed in many posts, for example by Chad Scherrer.
  • Using PyMC to model seasonality is discussed in PyMC Examples.
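As a rough illustration of the directions above, a richer model with a linear trend and weekly seasonality could be sketched in PyMC like this (variable names and priors are invented; covariates would enter the mean the same way):

import numpy as np
import pymc as pm

t = np.arange(len(data))   # day index
day_of_week = t % 7        # weekly seasonality index

with pm.Model() as richer_model:
    intercept = pm.Normal("intercept", mu=10, sigma=10)
    slope = pm.Normal("slope", mu=0, sigma=1)              # linear trend in time
    season = pm.Normal("season", mu=0, sigma=5, shape=7)   # day-of-week effect
    mu = intercept + slope * t + season[day_of_week]
    pm.Normal("actual", mu=mu, sigma=10, observed=data)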
