In this first post we are going to look at ways to test for mean reversion in time series using the Python programming language, which will give us the basic toolbox to deal with cointegration in future posts.
Testing for mean reversion
Mathematically, a continuous mean-reverting time series can be represented by an Ornstein-Uhlenbeck stochastic differential equation of the following form:
$$dx_t = \theta(\mu - x_t)\,dt + \sigma\,dW_t$$
where θ is the rate of reversion to the mean, μ is the mean value of the process, σ is the volatility of the process (the scale of its random fluctuations) and, finally, W_t is a Wiener process.
The given equation implies that the change of the time series in the next period is proportional to the difference between the mean and the current value, with the addition of Gaussian noise.
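To make this behaviour concrete, here is a minimal sketch that simulates such a process with a simple Euler-Maruyama discretisation. The function name `simulate_ou`, the parameter values and the unit time step are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_ou(theta, mu, sigma, x0, n_steps, dt=1.0, seed=None):
    """Simulate one Ornstein-Uhlenbeck path with the Euler-Maruyama scheme."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        # Increment of the Wiener process over a step of length dt
        dw = rng.normal(0.0, np.sqrt(dt))
        # Pull towards the mean plus Gaussian noise
        x[t + 1] = x[t] + theta * (mu - x[t]) * dt + sigma * dw
    return x


# Illustrative parameters: the path starts at 0 and is pulled towards mu = 5
path = simulate_ou(theta=0.1, mu=5.0, sigma=0.2, x0=0.0, n_steps=2000, seed=42)
print(path[:5])
print(path[500:].mean())
```

After the initial transient the simulated values should hover around μ, which is the behaviour the tests in the rest of this post try to detect.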
Augmented Dickey-Fuller Test
The Augmented Dickey-Fuller (ADF) test provides a quick check and confirmatory evidence of whether your time series is stationary or non-stationary. Keep in mind that it is a statistical test, and as such it can only tell us how strongly the data argue against a null hypothesis; it cannot prove that hypothesis true. So the result must be interpreted to be meaningful.
The ADF test is based on the simple observation that, in a mean-reverting series, if the value level is higher than the mean, the next move will tend to be downward, while if the value is lower than the mean, the next move will tend to be upward.
We can describe the value changes with the following linear model:
$$\Delta y(t) = \lambda\, y(t-1) + \mu + \beta t + \alpha_1 \Delta y(t-1) + \dots + \alpha_k \Delta y(t-k) + \epsilon_t$$
where Δy(t) ≡ y(t) - y(t-1), Δy(t-1) ≡ y(t-1) - y(t-2) and so on.
The function of the ADF test is to test if λ=0. If the hypothesis λ=0 can be rejected, that means the next move Δy(t) depends on the current level y(t-1) and so the series is not a random walk. The test statistic is the regression coefficient λ divided by the standard error of the regression fit:
$$\text{test statistic} = \frac{\lambda}{SE(\lambda)}$$
Dickey and Fuller have already calculated the distribution of this test statistic, which allows us to decide whether to reject the hypothesis at any chosen critical level. Since we expect λ to be negative for a mean-reverting series, the test statistic has to be more negative than the critical value for the hypothesis to be rejected.
In this exercise we will simply interpret the result using the p-value from the test. A p-value below a specified threshold (we are going to use 5%) suggests we reject the null hypothesis, so the series is treated as stationary; a p-value above the threshold means we fail to reject the null hypothesis, so the series is treated as non-stationary.
A naive implementation of the ADF test in Python is shown here:
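As a rough sketch of what such a naive implementation might look like, the ADF regression can be fitted by ordinary least squares and the t-ratio of λ read off directly. The function name `naive_adf`, the choice of a constant term without a time trend, and the use of NumPy are assumptions made for this example. The value -2.86 is the approximate large-sample 5% Dickey-Fuller critical value for a regression with a constant.

```python
import numpy as np


def naive_adf(y, lags=1):
    """Return (test statistic, lambda) for an ADF regression with a constant."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                       # dy[i] = y[i+1] - y[i]
    n = len(dy) - lags                    # usable observations
    target = dy[lags:]                    # delta y(t)
    level = y[lags:-1]                    # y(t-1)
    columns = [level, np.ones(n)]         # lambda term and constant mu
    for i in range(1, lags + 1):
        columns.append(dy[lags - i:len(dy) - i])   # delta y(t-i)
    X = np.column_stack(columns)
    beta, _, _, _ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X) # covariance of the OLS estimates
    lam = beta[0]
    return lam / np.sqrt(cov[0, 0]), lam


rng = np.random.default_rng(0)
noise = rng.normal(size=1000)

random_walk = np.cumsum(noise)            # non-stationary by construction
mean_reverting = np.empty(1000)           # AR(1) with coefficient 0.5
mean_reverting[0] = noise[0]
for t in range(1, 1000):
    mean_reverting[t] = 0.5 * mean_reverting[t - 1] + noise[t]

CRITICAL_5PCT = -2.86                     # approximate, constant-only case
stat_rw, lam_rw = naive_adf(random_walk)
stat_mr, lam_mr = naive_adf(mean_reverting)
print("random walk:    ", stat_rw, stat_rw < CRITICAL_5PCT)
print("mean reverting: ", stat_mr, stat_mr < CRITICAL_5PCT)
```

This sketch returns only the statistic, not a p-value, because the Dickey-Fuller distribution is non-standard and its p-values have to come from tabulated or approximated values.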
To verify our implementation we can compare its output with the result of the function adfuller, included in the Python module statsmodels from the SciPy ecosystem, when used with the same input.
Hurst Exponent and Variance Ratio Test
It is also possible to determine the stationary nature of the time series by measuring its speed of diffusion, because a stationary time series should diffuse from its initial value more slowly than a geometric random walk.
We can model the speed by the variance:
$$\mathrm{Var}(\tau) = \langle |z(t+\tau) - z(t)|^2 \rangle$$
where z is the log values, τ is an arbitrary time lag, and ⟨…⟩ is an average over all t's. For a geometric random walk, we know that
$$\langle |z(t+\tau) - z(t)|^2 \rangle \sim \tau$$
This relationship turns into an equality with some proportionality constant for a large value of τ, but it may deviate for a small τ. If the time series is mean reverting, however, this relationship won't hold.
To quantify the deviation we introduce the Hurst exponent H:
$$\langle |z(t+\tau) - z(t)|^2 \rangle \sim \tau^{2H}$$
For a time series with geometric random walk behaviour, H=0.5; for a mean-reverting series, H<0.5; and, finally, for a trending series, H>0.5. H is also an indicator of the degree of mean reversion or trendiness: as H decreases towards 0, the series is more mean reverting, and as it increases towards 1, it is more trending.
This is a simple implementation I made of the Hurst Exponent described above. Unfortunately, I couldn’t find any suitable implementation in any SciPy or standard library to compare it to:
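One way such an estimator could look is sketched below: it measures the mean squared difference of the series at a range of lags and fits a straight line in log-log space, whose slope estimates 2H. The function name `hurst_exponent`, the default lag range and the synthetic test series are assumptions made for this example.

```python
import numpy as np


def hurst_exponent(z, max_lag=100):
    """Estimate H from the scaling of <|z(t+tau) - z(t)|^2> with tau."""
    z = np.asarray(z, dtype=float)        # z is expected to hold log values
    lags = np.arange(2, max_lag)
    # Mean squared difference of the series at each lag
    variances = np.array(
        [np.mean((z[lag:] - z[:-lag]) ** 2) for lag in lags]
    )
    # Slope of log(variance) against log(lag) estimates 2H
    slope, _ = np.polyfit(np.log(lags), np.log(variances), 1)
    return slope / 2.0


rng = np.random.default_rng(1)
noise = rng.normal(size=20000)

random_walk = np.cumsum(noise)            # expected H close to 0.5
mean_reverting = np.empty(20000)          # AR(1), expected H well below 0.5
mean_reverting[0] = noise[0]
for t in range(1, 20000):
    mean_reverting[t] = 0.5 * mean_reverting[t - 1] + noise[t]

h_rw = hurst_exponent(random_walk)
h_mr = hurst_exponent(mean_reverting)
print("random walk H:   ", h_rw)
print("mean reverting H:", h_mr)
```

The estimate depends on the lag range, so on real data it is worth checking how stable H is for a few values of `max_lag`.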
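Because of the finite sample size, we need to know the statistical significance of an estimated value of H to be sure we can reject the null hypothesis that H is 0.5. To verify this, we are going to use the Variance Ratio Test, which can be expressed as
$$\frac{\mathrm{Var}(z(t) - z(t-\tau))}{\tau\,\mathrm{Var}(z(t) - z(t-1))}$$
For a random walk this ratio should be close to 1.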
We can express this ratio in the following Python code:
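A minimal sketch of the ratio itself, assuming NumPy and a hypothetical function name `variance_ratio`, could be written as follows. It computes only the ratio, not the significance level of a full variance ratio test.

```python
import numpy as np


def variance_ratio(z, lag):
    """Var(z(t) - z(t-lag)) divided by lag * Var(z(t) - z(t-1))."""
    z = np.asarray(z, dtype=float)        # z is expected to hold log values
    var_lag = np.var(z[lag:] - z[:-lag], ddof=1)
    var_one = np.var(z[1:] - z[:-1], ddof=1)
    return var_lag / (lag * var_one)


rng = np.random.default_rng(2)
noise = rng.normal(size=20000)

random_walk = np.cumsum(noise)            # ratio expected close to 1
mean_reverting = np.empty(20000)          # AR(1), ratio expected below 1
mean_reverting[0] = noise[0]
for t in range(1, 20000):
    mean_reverting[t] = 0.5 * mean_reverting[t - 1] + noise[t]

vr_rw = variance_ratio(random_walk, lag=10)
vr_mr = variance_ratio(mean_reverting, lag=10)
print("random walk ratio:   ", vr_rw)
print("mean reverting ratio:", vr_mr)
```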
Half-Life of Mean Reversion
Calculating the half-life of a mean-reverting time series is very useful because it gives us a measure of how long the series takes to revert to its mean.
This measure is a way to interpret the λ coefficient in the equation we have already seen:
$$\Delta y(t) = \lambda\, y(t-1) + \mu + \beta t + \alpha_1 \Delta y(t-1) + \dots + \alpha_k \Delta y(t-k) + \epsilon_t$$
For this new interpretation we have to transform this discrete time series into a differential form, ignoring the drift βt and the lagged differences Δy(t-1) … Δy(t-k), which gives the following Ornstein-Uhlenbeck formula for the mean-reverting process:
$$dy(t) = (\lambda\, y(t-1) + \mu)\,dt + d\epsilon$$
This form allows us to get an analytic solution for the expected value of y(t):
$$E(y(t)) = y_0 \exp(\lambda t) - \frac{\mu}{\lambda}\,(1 - \exp(\lambda t))$$
Since λ is negative for a mean-reverting process, the expected value of y(t) decays exponentially to the value -μ/λ, with the half-life of decay being -log(2)/λ.
Finally, to calculate the half-life of mean reversion we can use this simple implementation in Python:
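A minimal sketch of such a calculation, assuming NumPy and a hypothetical function name `half_life`, regresses Δy(t) on y(t-1) with a constant and converts the estimated λ into a half-life. The synthetic AR(1) series with coefficient 0.9 is an assumption for illustration; it corresponds to λ of about -0.1.

```python
import numpy as np


def half_life(y):
    """Half-life of mean reversion, -log(2)/lambda, from an OLS fit."""
    y = np.asarray(y, dtype=float)
    lagged = y[:-1]                       # y(t-1)
    delta = np.diff(y)                    # delta y(t) = y(t) - y(t-1)
    X = np.column_stack([lagged, np.ones(len(lagged))])
    (lam, intercept), *_ = np.linalg.lstsq(X, delta, rcond=None)
    # lam must be negative for the half-life to be meaningful
    return -np.log(2) / lam


rng = np.random.default_rng(3)
noise = rng.normal(size=5000)
series = np.empty(5000)
series[0] = noise[0]
for t in range(1, 5000):
    series[t] = 0.9 * series[t - 1] + noise[t]

hl = half_life(series)
print("half-life in periods:", hl)
```

The result is expressed in the sampling period of the input, so daily data gives a half-life in days. A positive λ produces a negative value, which signals that the series is not mean reverting.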
All this theory and these practical examples in Python will help us to understand and master the concept of cointegration, which will be thoroughly discussed in a future post. Stay tuned!