Studentized Residuals for Time Series Anomaly Detection

Andris Piebalgs
Cognite
Mar 24, 2022

Introduction

Anomaly detection refers to a useful set of techniques in Data Science that help spot outliers in a data set. At Cognite, anomaly detection is one of the many weapons in our data science arsenal, and we find it especially useful when dealing with time series data. The time series data that we encounter comes from sensor measurements (such as pressures and temperatures), which are typically noisy and contain many anomalous points due to issues such as equipment malfunction and transient phenomena. Anomaly detection helps to remove these point outliers in order to refine the signal in the time series data. In this blog post, we will take you through the implementation of a simple but highly effective algorithm for detecting point outliers. The technique was developed in an academic paper by one of our Data Scientists, which you can find more details about here.

Time Series Anomaly Detection Algorithm

The diagram below illustrates a typical example of time series data observed in the day-to-day operation of a measurement sensor. The orange line denotes the underlying signal, while the blue peaks represent the point anomalies that can occur due to spikes in the measurement reading. The aim of our anomaly detection tool in this case is simply to refine the signal by removing those anomalous points.

Plots of the raw and clean time series in a toy data set

We define a point anomaly as any point that is radically different from its expected value. The algorithm that we are showcasing in this post can identify these anomalies through the use of polynomial regression and Studentized deleted residuals. The first step is to define a polynomial curve that provides an estimate for the underlying signal of the data set.
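Written out, with coefficients β₀ through β_N:

$$\hat{y}(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_N x^N$$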

Polynomial equation of degree N

To fit this curve to the data, the coefficients (up to degree N) have to be determined by minimising a loss function. Typically, this loss function is the sum of squared ordinary residuals, where an ordinary residual is the difference between the actual value and its prediction.
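In symbols, with y_i the observed value and ŷ_i its prediction:

$$e_i = y_i - \hat{y}_i$$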

Ordinary residual of point i

There are, however, inherent limitations in using ordinary residuals to identify outliers. The presence of anomalies can skew the regression coefficients to the point where outliers aren't flagged. This limitation can be addressed by re-fitting the polynomial regression once per observation on the same data, each time with the data point under evaluation removed. The deleted residual is then the difference between the observation and the prediction of the re-fitted model.
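In symbols, with ŷ_(i) denoting the prediction for point i from the model fitted without it:

$$d_i = y_i - \hat{y}_{(i)}$$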

Deleted residuals of point i

The above approach suggests that, for each data point, the regression model would have to be re-fit to determine its corresponding deleted residual. However, there is a mathematical trick that determines the deleted residuals and standardizes them (i.e. divides each residual by its estimated standard deviation) while calculating the regression fit only once on the entire data set. The resulting residuals are known as Studentized deleted residuals and can be calculated as follows:
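$$t_i = e_i \sqrt{\frac{n - p - 1}{\mathrm{SSE}\,(1 - h_{ii}) - e_i^2}}$$

Here n is the number of observations, p is the number of regression parameters, and h_ii and SSE are defined in the next step.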

Studentized deleted residuals

The mathematical trick is to use the diagonal of the hat matrix, h_ii, to adjust the SSE (sum of squared errors) for each observation i. Writing X for the design matrix of the regression (for a polynomial fit, a Vandermonde matrix of the time values), the hat matrix is calculated as:
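$$H = X (X^\top X)^{-1} X^\top$$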

Hat Matrix Expression

The Studentized deleted residuals can then be used to find anomalous points by looking for exceptionally large deviations. These residuals follow a t distribution with n - p - 1 degrees of freedom, so a suitable threshold can be established by calculating the Bonferroni critical value (BC), defined as:
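$$\mathrm{BC} = t\left(1 - \frac{\alpha}{2n};\ n - p - 1\right)$$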

Bonferroni Critical Value

Here, t(q; ν) denotes the q-quantile of the t distribution with ν degrees of freedom, and α refers to the significance level (typically set to 0.05). Dividing α by the number of observations is the Bonferroni correction: it keeps the overall false-positive rate under control when every point in the series is tested. Any point whose Studentized deleted residual exceeds the critical value in absolute terms is then flagged as an anomaly:
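$$|t_i| > \mathrm{BC}$$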

Threshold for anomaly detection

This threshold can then be used to identify and remove any point anomalies in our data set. Additionally, a correction factor can be applied to the BC value to achieve better results (in the paper, a value of 1/6 was found to give the best performance).

Implementation in Python

To generate the toy data set, we use a baseline polynomial curve with Gaussian noise added to it. We then add 20 random points to this data, which we consider to be our anomalies.
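A minimal sketch of this setup (the polynomial coefficients, noise level, and spike magnitudes below are illustrative assumptions, not the exact values from the original experiment):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hourly timestamps and a smooth polynomial baseline
index = pd.date_range("2022-01-01", periods=500, freq="h")
x = np.arange(len(index), dtype=float)
baseline = 1e-6 * x**3 - 5e-4 * x**2 + 0.1 * x + 10.0

# Add Gaussian noise to the baseline signal
y = baseline + rng.normal(scale=1.0, size=len(x))

# Inject 20 random point anomalies as large spikes
anomaly_idx = rng.choice(len(x), size=20, replace=False)
y[anomaly_idx] += rng.normal(loc=15.0, scale=5.0, size=20)

ts = pd.Series(y, index=index)
```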

Polynomial regression on this data set can be performed with NumPy after converting the datetime index to a list of integers (in this case, the time elapsed in milliseconds since 1970-01-01).
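A sketch of this step, assuming the ts series from above and an illustrative polynomial degree of 3:

```python
# Convert the datetime index to milliseconds elapsed since 1970-01-01
t_ms = ts.index.astype("int64") // 10**6  # nanoseconds -> milliseconds
t_ms = np.asarray(t_ms, dtype=float)

# Centre and scale the time values: raw epoch milliseconds are huge and
# would make the polynomial fit numerically ill-conditioned. This affine
# rescaling does not change the fitted values or the hat matrix.
t_c = (t_ms - t_ms.mean()) / t_ms.std()

# Fit a polynomial of degree N and evaluate it on the data
N = 3
coeffs = np.polyfit(t_c, ts.values, deg=N)
y_hat = np.polyval(coeffs, t_c)

# Ordinary residuals
e = ts.values - y_hat
```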

Calculation of the hat matrix can be performed as follows:
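One way to do this with NumPy, reusing t_c and N from the previous step:

```python
# Design matrix of the polynomial regression (Vandermonde matrix)
X = np.vander(t_c, N + 1)

# Hat matrix: H = X (X^T X)^{-1} X^T
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Only the diagonal elements (the leverages h_ii) are needed
h = np.diag(H)
```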

Calculation of the Studentized deleted residuals and their corresponding p-values according to the t distribution can be performed as follows:
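Following the formula above, with scipy.stats providing the t distribution:

```python
from scipy import stats

n = len(ts)
p = N + 1  # number of fitted parameters, including the intercept

SSE = np.sum(e**2)

# Studentized deleted residuals, computed from a single full-data fit
t_res = e * np.sqrt((n - p - 1) / (SSE * (1 - h) - e**2))

# Two-sided p-values from the t distribution with n - p - 1 degrees of freedom
p_values = 2 * stats.t.sf(np.abs(t_res), df=n - p - 1)
```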

Finally, we filter out the anomalies by thresholding the Studentized deleted residuals at the Bonferroni critical value and plot the results.
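A sketch of this final step, thresholding at the raw Bonferroni critical value (applying the paper's 1/6 correction factor to bc is left as a tuning step):

```python
import matplotlib.pyplot as plt

alpha = 0.05

# Bonferroni critical value: t(1 - alpha / (2n); n - p - 1)
bc = stats.t.ppf(1 - alpha / (2 * n), df=n - p - 1)

# Flag points whose Studentized deleted residual exceeds the critical value
is_anomaly = np.abs(t_res) > bc

# Plot the raw series with the detected anomalies highlighted
plt.figure(figsize=(12, 4))
plt.plot(ts.index, ts.values, label="raw signal")
plt.scatter(ts.index[is_anomaly], ts.values[is_anomaly],
            color="red", zorder=3, label="detected anomalies")
plt.legend()
plt.show()
```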

Results

Right, enough theory. Let's show some results! We will generate some synthetic data and use the typical classification metrics of precision and recall to determine how well our model works. Using the above method on our synthetic data set, we get a 95% recall rate and an 86% precision rate. This means that we missed only one of our 20 anomalies and had 3 false positives.

Anomaly Detector Algorithm Results on Toy Data Set

So far so good, but how would this algorithm perform on actual, real-life data? To test this, we can use Open Industrial Data, a playground data set that is available to the public (more details can be found here). The amount of data in there can be truly overwhelming, but let's pick a single sensor that is quite important for the industrial process. In this example, we will take a pressure transmitter that measures the surge pressure for the 1st Stage compressor (the external ID of the tag is pi:160696) and examine the last 50 days of hourly values.

Anomaly Detector Results on Open Industrial Data

A quick visual inspection shows that the anomalies have indeed been successfully removed and that the signal has been refined for further analysis.

Conclusion and Looking Ahead

Hopefully, this blog post gives you a small insight into how we at Cognite approach some of our Data Science tasks. If you would like to try out this algorithm, or any others that you might find interesting, we encourage you to sign up for Open Industrial Data and start playing with real-life data!

Happy coding!

Nomenclature
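(symbols as used in this post)

n: number of observations in the data set
p: number of regression parameters (polynomial degree N plus the intercept)
e_i: ordinary residual of point i
d_i: deleted residual of point i
t_i: Studentized deleted residual of point i
H, h_ii: hat matrix and its i-th diagonal element (the leverage of point i)
SSE: sum of squared errors of the regression fit
α: significance level
BC: Bonferroni critical value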
