Quant Post 3.1: A guided path into Mean Reversion

This article aims to tackle some of the foundations of mean reversion using stock market data.

Antonio Constandinou
Jul 25, 2018

Purpose

In a previous post, I built a stock data warehouse. In this post, I will analyze various mean reversion concepts using our stock data between the dates of ‘2004-12-30’ and ‘2010-12-30’. This date range covers about half of our sample data.

Let’s jump right into it!

We will briefly cover the following topics:

  • Time series models
  • Augmented Dickey Fuller test
  • Hurst Exponent
  • Half-Life

In part 3.2 of this blog series, I will dive into the concept of cointegration vs. correlation and establish equity pairs to be used in a pairs trading model.

Assuming things go well in post 3.2, in post 3.3, we will back test the equity pairs established to verify the viability of our pairs trading model.

GitHub Repository

For access to the code base that I will be discussing in this blog post, here is a link to the GitHub repository.

*Disclaimers*

The research presented here is meant to be strictly educational and is not meant to be used for any trading or investing ideas. Any information you use is completely at your own risk.

Time Series Models

In this research series, we use time series models to analyze mean reversion in stock data. Time series models rely on the temporal ordering of market data sampled at equal intervals. Since our stock data is daily, our time interval is one day. Because the series is ordered in time, the statistical tests that follow are computed on the path the market has already taken.

Main Goals

The two essential goals of time series models are to identify the nature of the phenomenon we are looking for, and to forecast it going forward.

Mean Reversion Theory

In the context of mean reversion, our goal will be to test statistically whether the behavior of our market data differs from that of a random walk.

A random walk is a time series where the next directional movement is completely independent of past movements. The term having ‘no memory’ is often applied to the concept of a random walk.

For mean reversion to exist, the series must follow a path where the change in value over the next time period is proportional to the difference between its mean and its current value.

So what continuous time series model can we use to describe mean reversion? The answer is a time series that behaves like an Ornstein-Uhlenbeck (OU) process.

Ornstein-Uhlenbeck

In an Ornstein-Uhlenbeck process, the change of the price series over the next continuous time period is proportional to the difference between the mean price and the current price, with the addition of Gaussian noise.
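
To make this concrete, here is a minimal sketch that simulates a discretized OU process next to a plain random walk. It is not part of the code base in the repository; the function names and the parameters (theta for the speed of reversion, mu for the long-run mean, sigma for the noise scale) are my own illustrative assumptions.

```python
import numpy as np


def simulate_ou(n_steps, theta, mu, sigma, x0, dt=1.0, seed=0):
    """Euler discretization of dx = theta * (mu - x) * dt + sigma * dW.

    theta, mu and sigma are illustrative parameters, not fitted values.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = x0
    for t in range(1, n_steps):
        noise = sigma * np.sqrt(dt) * rng.standard_normal()
        # The pull toward mu is proportional to the current distance from mu.
        x[t] = x[t - 1] + theta * (mu - x[t - 1]) * dt + noise
    return x


def simulate_random_walk(n_steps, sigma, x0, seed=0):
    """Random walk: each step is independent of the path so far."""
    rng = np.random.default_rng(seed)
    steps = sigma * rng.standard_normal(n_steps - 1)
    return np.concatenate(([x0], x0 + np.cumsum(steps)))


ou = simulate_ou(n_steps=1500, theta=0.05, mu=100.0, sigma=1.0, x0=100.0)
rw = simulate_random_walk(n_steps=1500, sigma=1.0, x0=100.0)

print(f"OU path:     mean={ou.mean():.2f}, std={ou.std():.2f}")
print(f"Random walk: mean={rw.mean():.2f}, std={rw.std():.2f}")
```

The OU path should stay in a band around mu, while the random walk is free to drift away from its starting value.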

Augmented Dickey Fuller Test (ADF)

Before jumping into the ADF test, it is worth mentioning Python’s extensive time series library, statsmodels.tsa. To compute the ADF test we will use its adfuller function (statsmodels.tsa.stattools.adfuller).

The ADF test assumes that the time series stems from an autoregressive process of order p, written AR(p). It is typically satisfactory to set p = 1, but keep in mind that this does introduce another parameter into our ADF test.

ADF Null Hypothesis

  • Calculate a test statistic using the adfuller function.
  • Our null hypothesis is that the series has a unit root (the coefficient on the lagged price level is 0), which would indicate that our time series is non-stationary, as a random walk is.
  • We can reject the null hypothesis if our test statistic is smaller (more negative) than the critical values supplied by adfuller. We are provided with three critical values: 1%, 5% and 10%.
  • In our use case, we will have two sets of stocks that we can consider as statistically significant.

Rejecting the Null Hypothesis

Group 1: we reject the null hypothesis when the test statistic is less than our 1% critical value (p-value < 0.01)

Group 2: we reject the null hypothesis when the test statistic is less than our 5% critical value (p-value < 0.05)

Sample Size

Jumping into the command line, I first want to see how many stocks have a start date of ‘2004-12-30’ in our database. There are 423 such stocks, and they will be used in the tests to follow.

Figure: total number of stocks used in our statistical tests.

Python specific parameters + output

  • The Python script is titled aug_dickey_fuller_test.py.
  • Our date range for the ADF test starts on ‘2004-12-30’ and runs until ‘2010-12-30’, which covers around half of the available data.
  • Output text file mr_stocks_adf_1.txt will include all stocks whose test statistic passes the 1% critical value.
  • Output text file mr_stocks_adf_5.txt will include all stocks whose test statistic passes the 5% critical value.

Results

Of our 423 stocks tested:

  • 1 has passed the 1% critical value. That stock is ticker ‘SYMC’.
  • 8 stocks passed the 5% critical level for this time period. Stock tickers are CNC, WFC, GPN, JPM, EBAY, BIIB, IRM, QCOM.

Let’s now use the Hurst Exponent in the next section to re-verify our data set for mean reverting stocks.

Hurst Exponent (HE)

The output of the Hurst exponent will be a value between 0 and 1.

  • a value > 0.5 indicates a trending time series. The greater the value above 0.5 the more trending it is.
  • a value = 0.5 indicates a random walk.
  • a value < 0.5 indicates a mean reverting time series. The closer the value gets to 0 the more mean reverting it is.

HE Measure of Mean Reversion

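The Hurst exponent measures mean reversion by examining whether the series diffuses more slowly than a Geometric Brownian Motion would. Concretely, the variance of the differences between values τ periods apart grows roughly like τ^(2H). For a random walk H = 0.5, so slower growth (H < 0.5) points to mean reversion.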

Python specific parameters + output

Here is a link to the HE python script.

  • lags of 2 to 100 will be used
  • any stock whose HE is < 0.5 will be collected into an array and written to a text file, he_stock_list_2010_12_30.txt
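
As a rough illustration of the calculation, here is a minimal sketch of a Hurst exponent estimate. It may differ from the linked script: the function name, the inclusive lag range of 2 to 100, and the synthetic test series are my own assumptions. It fits the slope of log(standard deviation of lagged differences) against log(lag), and that slope is the estimate of H.

```python
import numpy as np


def hurst_exponent(series, min_lag=2, max_lag=100):
    """Estimate the Hurst exponent from the scaling of lagged differences.

    The standard deviation of (x[t + lag] - x[t]) scales roughly as lag**H,
    so H is the slope of a log-log fit. Lag bounds are assumed inclusive.
    """
    series = np.asarray(series, dtype=float)
    lags = np.arange(min_lag, max_lag + 1)
    tau = [np.std(series[lag:] - series[:-lag]) for lag in lags]
    slope, _intercept = np.polyfit(np.log(lags), np.log(tau), 1)
    return slope


# Synthetic series (assumption for illustration, not our stock data).
rng = np.random.default_rng(1)
white_noise = rng.standard_normal(20000)   # strongly mean reverting
random_walk = np.cumsum(white_noise)       # no memory

h_white_noise = hurst_exponent(white_noise)
h_random_walk = hurst_exponent(random_walk)

print(f"white noise HE: {h_white_noise:.3f}")
print(f"random walk HE: {h_random_walk:.3f}")
```

The random walk should land near 0.5 and the white noise near 0, which matches the interpretation of the HE values listed above.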

Results

For all 423 stocks:

  • average HE: 0.4827
  • median HE: 0.4857
  • standard deviation: 0.0518

For all mean reverting stocks (HE < 0.5), totaling 265:

  • average HE: 0.4517
  • median HE: 0.4591
  • standard deviation: 0.0362

Let’s further analyze this subset of stocks by measuring their half-life.

Half-Life

For our 265 stocks identified as mean reverting in our HE test, let’s examine statistics on their computed half-life.

Half-Life

The half-life measures how long it would take a time series to revert halfway back from its initial deviation to the mean.

One could use this as a selection criterion and only trade a time series whose half-life is no more than some time period. For example, we can consider a time series for mean reversion only if its half-life is ≤ 50 days. Secondly, the half-life can be used as a time limit after entering a given trade.
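
One common way to estimate the half-life, sketched below, is to regress the one-period change of the series on its lagged level. The slope λ of that regression is the discrete counterpart of the OU reversion speed, and the half-life is −ln(2) / λ. This is a sketch under that assumption and may not match the linked script line for line; the function name and the synthetic AR(1) series are hypothetical.

```python
import numpy as np


def half_life(series):
    """Half-life of mean reversion from the regression dy[t] = lam * y[t-1] + c.

    Returns infinity when lam >= 0, i.e. when the fit shows no pull
    back toward the mean.
    """
    series = np.asarray(series, dtype=float)
    lagged = series[:-1]
    delta = np.diff(series)
    lam, _intercept = np.polyfit(lagged, delta, 1)
    if lam >= 0:
        return np.inf
    return -np.log(2) / lam


# Synthetic AR(1) series around 50 with coefficient 0.9 (illustrative only).
# Its theoretical lam is 0.9 - 1 = -0.1, so the half-life is about 6.9 periods.
rng = np.random.default_rng(7)
n, phi, mean = 20000, 0.9, 50.0
y = np.empty(n)
y[0] = mean
for t in range(1, n):
    y[t] = mean + phi * (y[t - 1] - mean) + rng.standard_normal()

hl = half_life(y)
print(f"estimated half-life: {hl:.2f} periods")
print("passes 50-day filter:", hl <= 50)
```

With daily data, the result is expressed in trading days, so the ≤ 50 filter above becomes a simple comparison on the returned value.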

Python specific parameters + output

Here is a link to the half-life python script.

  • load all tickers from our HE output file that are considered mean reverting (i.e., HE < 0.5)
  • compute their half-life
  • write all tickers with a half-life ≤ 50 to a text file, halfL_passed_tickers.txt
  • write all tickers with a half-life > 50 to a text file, halfL_failed_tickers.txt

Results

  • only 1 stock had a half-life ≤ 50, stock ticker WFC
  • average HL: 313.3
  • median HL: 187.2
  • standard deviation: 431.49

Conclusion

It is clear that depending on the test implemented, one can derive very different mean reverting characteristics from stock data.

  • Only 9 stocks passed our ADF critical value tests (1% and 5% combined), whereas 265 stocks were evaluated as mean reverting using HE.
  • The half-life of the HE stocks was on average ~313 days. This poses limitations in considering single stocks for a mean reversion trading model.

Next Steps

We’re now ready to move on and develop a pairs trading model.

In our next post (3.2), we will use the concept of cointegration alongside our stock database to isolate pairs of stocks suitable for pairs trading.

Antonio Constandinou

A finance professional who is passionate about programming, big data and quantitative research.