RAPID Fractional Differencing to Minimize Memory Loss While Making a Time Series Stationary

Ritchie Ng · Published in RAPIDS AI · Oct 3, 2019

Summary

Whether we fit machine learning or deep learning models to time series data, we typically require stationarity for stable out-of-sample predictive accuracy.

Typically, we attempt to achieve some form of stationarity by transforming our time series with common methods such as integer differencing. However, integer differencing removes more memory than necessary to achieve stationarity. An alternative, fractional differencing, allows us to achieve stationarity while preserving far more memory than integer differencing. And while existing CPU-based implementations are inefficient for running fractional differencing on many large-scale time series, our GPU-based implementation enables rapid fractional differencing.

This post walks you through our (ensemblecap.ai and National University of Singapore) intuitive GPU implementation of fractional differencing (GFD), built on RAPIDS cuDF, which achieves a more than 100x speed-up over a CPU implementation.

The following link leads to the notebook to reproduce all the results (plots, tables, and code) in this post: https://github.com/ritchieng/fractional_differencing_gpu/blob/master/notebooks/gpu_fractional_differencing.ipynb

Necessity of Fractional Differencing

Before we get into why fractional differencing is critical, let’s plot a simple time series: the S&P 500, pulled from the Fed’s database. Even without running any tests, we can clearly see that this is a non-stationary time series.

Traditionally, we might difference our time series by one day (daily returns) or more to reach some form of stationarity, verified with tests such as the Augmented Dickey–Fuller (ADF) test, the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test, the Phillips–Perron test, and others. The following plot shows a stationary time series after 1-day differencing.
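
As a rough illustration (not the notebook’s code), the sketch below runs the ADF test from statsmodels on a synthetic random-walk price series, standing in for the S&P 500 closes, and on its 1-day difference:

# Sketch: ADF test on a random-walk "price" series vs. its 1-day difference.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, 2500).cumsum())  # stand-in for S&P 500 closes

def adf_pvalue(series: pd.Series) -> float:
    """p-value of the Augmented Dickey-Fuller test (small => reject unit root)."""
    return adfuller(series.dropna())[1]

print("ADF p-value, raw prices:      ", adf_pvalue(prices))         # large => non-stationary
print("ADF p-value, 1-day difference:", adf_pvalue(prices.diff()))  # near 0 => stationary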

However, these forms of integer differencing cause more information to be lost than necessary. Fractional differencing allows us to achieve stationarity while preserving far more memory than integer differencing.

Fractional differencing was originally introduced by J. R. M. Hosking in his 1981 paper “Fractional Differencing”; subsequent work by others concentrated on fast and efficient implementations of fractional differencing for continuous stochastic processes. More recently, Marcos Lopez de Prado introduced fractional differencing to financial time series using a fixed-window method instead of the expanding-window method.

Fractional Differencing

In the following plot, you can see that for integer differencing, specifically when d is 1, the weights are 1 for lag 0, -1 for lag 1, and 0 for all longer lags. Intuitively, this means that to calculate the integer-differenced value at lag 0, we simply multiply the time series’ raw value at each lag by the weight associated with that lag and sum the products.

To sum it up in a simple equation: integer-differenced value at lag 0 = 1 * raw value at lag 0 + (-1) * raw value at lag 1

For fractional differencing, the weights from lag 1 onward are simply smaller in magnitude, so the differenced value encapsulates more information from the past when we difference against those raw values. Essentially, you can observe that the cumulative weights decay more slowly for smaller values of d. The key is to find a value of d between 0 and 1.0 that yields a stationary time series while minimizing information loss.

Fractional Differencing Example

Assume the raw time series has values 100, 99, 98, 97, 96, 95, 94 from lag k = 0 to k = 6.

For d = 0 (no differencing)

differenced_value = 100 * 1 + 99 * 0 + 98 * 0 + 97 * 0 + 96 * 0 + 95 * 0 + 94 * 0 = 100

For d = 0.4 (fractional differencing)

differenced_value = 100 * 1 + 99 * -0.400000 + 98 * -0.120000 + 97 * -0.064000 + 96 * -0.041600 + 95 * -0.029952 + 94 * -0.022963 ≈ 33.434

For d = 1 (integer differencing)

differenced_value = 100 * 1 + 99 * -1 + 98 * 0 + 97 * 0 + 96 * 0 + 95 * 0 + 94 * 0 = 1
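
The weights in these examples follow the standard recursion w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k. The following minimal sketch (not the notebook’s implementation) reproduces the three cases above:

# Sketch: reproduce the worked example using the weight recursion
#   w_0 = 1,  w_k = -w_{k-1} * (d - k + 1) / k
import numpy as np

def frac_diff_weights(d: float, num_weights: int) -> np.ndarray:
    """Weights of (1 - B)^d for lags 0 .. num_weights - 1."""
    w = [1.0]
    for k in range(1, num_weights):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

raw = np.array([100, 99, 98, 97, 96, 95, 94], dtype=float)  # values at lags 0..6

for d in (0.0, 0.4, 1.0):
    w = frac_diff_weights(d, len(raw))
    print(f"d = {d}: weights = {np.round(w, 6)}, differenced value = {w @ raw:.4f}")
# d = 0.0 -> 100.0, d = 0.4 -> ~33.43, d = 1.0 -> 1.0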

Fixed Window Fractional Differencing

For computational efficiency, we want to stop calculating weights once they become too small (below a small, predetermined threshold). Imagine a time series with 1,000,000 data points: we do not want to compute weights out to a 1,000,000-lag window for our latest data point! Instead, we simply calculate weights until they become too small, which is what the weight function in the notebook does. This is called fixed-window fractional differencing.
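
A minimal sketch of such a weight function (the full implementation is in the notebook) that stops as soon as a weight’s magnitude falls below the floor:

# Sketch: generate fractional differencing weights until |w_k| drops below a floor,
# which bounds the window length regardless of how long the time series is.
import numpy as np

def frac_diff_weights_floored(d: float, floor: float = 5e-5) -> np.ndarray:
    """Weights of (1 - B)^d, truncated once |w_k| < floor (fixed window)."""
    weights, k = [1.0], 1
    while True:
        w = -weights[-1] * (d - k + 1) / k
        if abs(w) < floor:
            break
        weights.append(w)
        k += 1
    return np.array(weights)

w = frac_diff_weights_floored(d=0.5, floor=5e-5)
print(f"window length: {len(w)}, smallest kept weight: {w[-1]:.2e}")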

In the following plot, we show the weights for d = 0.5 with a minimum weight value (floor) of 5e-5 in the fixed-window fractional differencing function. You can see that weights smaller than 5e-5 are dropped.

When we run the fixed-window fractional differencing function on our S&P 500 time series, we arrive at a stationary time series.

Running an Augmented Dickey–Fuller (ADF) test gives the following results. You can see that we obtain a stationary time series with fractional differencing, without resorting to integer differencing.
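
For reference, here is a CPU-side sketch (again, not the notebook’s exact code) that applies the fixed-window weights with a rolling dot product and then runs the ADF test; the random-walk series is a stand-in for the real S&P 500 data:

# Sketch: fixed-window fractional differencing on the CPU, followed by an ADF check.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def frac_diff_weights_floored(d, floor=5e-5):
    weights, k = [1.0], 1
    while True:
        w = -weights[-1] * (d - k + 1) / k
        if abs(w) < floor:
            break
        weights.append(w)
        k += 1
    return np.array(weights)

def frac_diff_fixed_window(series: pd.Series, d: float, floor: float = 5e-5) -> pd.Series:
    """Dot each trailing window of the series with the (reversed) weight vector."""
    w = frac_diff_weights_floored(d, floor)[::-1]  # oldest lag first
    values = series.to_numpy(dtype=float)
    out = np.full(len(values), np.nan)
    for i in range(len(w) - 1, len(values)):
        out[i] = values[i - len(w) + 1 : i + 1] @ w
    return pd.Series(out, index=series.index)

prices = pd.Series(100 + np.random.default_rng(0).normal(0, 1, 5000).cumsum())
fd = frac_diff_fixed_window(prices, d=0.5).dropna()
print("ADF p-value after fractional differencing:", adfuller(fd)[1])
# Raise d toward 1 until the test rejects the unit root; smaller d preserves more memory.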

GPU Fixed Window Fractional Differencing

In the notebook, we provide both CPU and GPU implementations of fixed-window fractional differencing, and you can see a 100–400x speed-up over the CPU implementation for 10M–100M data points.

The GPU implementation is done through RAPIDS cuDF.

This notebook illustrates how easily you can port CPU-based numpy and/or pandas code to the GPU through cuDF. For more advanced users, NVIDIA recently released a prototype comparing our method with one of theirs that leverages Numba and gQuant and delivers a further 100x speed-up; a blog post on this is coming shortly. Their implementation is slightly more complicated, so you should start with this introductory notebook before diving into the more advanced code.
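
To make the parallelization concrete, here is a simplified sketch that uses a raw Numba CUDA kernel instead of the cuDF code path in the notebook: one thread per output element, each computing the dot product of its trailing fixed window with the weights (frac_diff_kernel and frac_diff_gpu are hypothetical names):

# Sketch (not the notebook's code): one CUDA thread per output element.
import numpy as np
from numba import cuda

@cuda.jit
def frac_diff_kernel(values, weights, out):
    i = cuda.grid(1)                       # global thread index = output position
    k = weights.shape[0]                   # fixed window length
    if i >= values.shape[0] or i < k - 1:  # skip out-of-range / incomplete windows
        return
    acc = 0.0
    for j in range(k):                     # weights[j] multiplies the value at lag j
        acc += weights[j] * values[i - j]
    out[i] = acc

def frac_diff_gpu(values: np.ndarray, weights: np.ndarray) -> np.ndarray:
    d_values = cuda.to_device(values.astype(np.float64))
    d_weights = cuda.to_device(weights.astype(np.float64))
    d_out = cuda.to_device(np.full(len(values), np.nan))
    threads = 256
    blocks = (len(values) + threads - 1) // threads
    frac_diff_kernel[blocks, threads](d_values, d_weights, d_out)
    return d_out.copy_to_host()

Because the floored weight vector is short, each thread only reads a small trailing window, which keeps the kernel simple; the notebook expresses the same windowed dot product while keeping the data in cuDF on the GPU.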

Summary

In this post, we covered fractional differencing, in particular the fixed-window variant, which lets us achieve stationarity in a time series while minimizing information loss. On time series with millions of data points, fractional differencing can be computationally expensive, hindering rapid experimentation and real-time deployment. We showed that, through RAPIDS cuDF and Numba, we can achieve 100x to 10,000x speed-ups, with links to implementations ranging from easy (100x speed-up) to more complex (10,000x speed-up).

To learn more about fractional differencing and the derivation of the equations, you can refer to Hosking’s original paper or Lopez de Prado’s book.

Feel free to raise an issue on the official GitHub repository to suggest improvements or report bugs, or simply join the conversation with a comment below!
