Using A Pairs Trading Statistical Arbitrage Approach on Digital Assets

Moses Dada CQF
Digital Alpha Research
11 min read · Mar 1, 2020

Introduction

In this article, we take it a step further and examine a simple pairs trading statistical arbitrage strategy using the most liquid cryptocurrency pairs.

Statistical arbitrage is primarily based on trading pairs whose spread mean-reverts. We will therefore use the Augmented Dickey-Fuller test, the Hurst exponent and other cointegration approaches to evaluate crypto pairs.

The Importance of Cointegration

A cointegration test is used to establish whether two or more non-stationary time series share a long-run equilibrium relationship, which is a stronger condition than mere correlation.

Before the 1980s, many economists used linear regressions on non-stationary time series data, which Nobel laureate Clive Granger and Paul Newbold showed to be a dangerous approach that could produce spurious correlations (we'll discuss this in more detail later). Hence, stationarity is essential.

Stationarity means that the statistical properties of a time series do not change over time. Stationarity is vital because many useful analytical tools, statistical tests and models rely on this property. The main features of a stationary time series are that its mean and variance are constant over time.

Finding Cointegrated Pairs

To effectively find cointegrated pairs, we will perform a specific statistical test on these crypto assets. However, we need to do this carefully to prevent multiple comparison bias. In our case, we will examine BTC and ETH.

A good selection process is to run a cointegration test across the candidate crypto pairs, gather the results into a test-score matrix and a p-value matrix, and select any pairs for which the p-value is less than 0.05.

Testing Pairs with the Engle and Granger Procedure

Cointegration reduces the observed prices to one common factor, the spread et.

We obtain the fitted residual and ADF-test for:

I) Stationarity

II) The Cointegrating Vector

III) The Equilibrium Level

We insert the residual into the error-correction equation in order to confirm the statistical significance of the coefficients.

In particular, we must confirm the significance of the (1 - ⍺) coefficient.

Trading Signals and Design

Now we design the signals. Assuming the spread is normally distributed, the trade design is to enter when the spread reaches the outer bounds

and exit when et reverts to about the mean level μe.

The Z-score is calculated as

z = (x - μ) / σ

where:

μ is the mean of the population.

σ is the standard deviation of the population.

Cointegrated prices have a mean-reverting spread, and our signal generation will occur when the spread goes significantly above or below μe. For example, signals are generated from et crossing μe + σeq or crossing μe - σeq.

Therefore, the signal generation and positions for assets A and B are
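
A minimal sketch of such signal logic, assuming entry at the ±1σ bounds and exit when the spread reverts to the mean (the function name and exact state rules are illustrative):

```python
import numpy as np

def pair_positions(spread, entry=1.0, exit_=0.0):
    """Map a spread series to pair positions:
    +1 = long the spread (long A, short B),
    -1 = short the spread (short A, long B)."""
    spread = np.asarray(spread, dtype=float)
    z = (spread - spread.mean()) / spread.std()
    state = 0
    positions = []
    for zi in z:
        if state == 0:
            if zi < -entry:
                state = 1    # spread below lower bound: buy the spread
            elif zi > entry:
                state = -1   # spread above upper bound: sell the spread
        elif state == 1 and zi >= exit_:
            state = 0        # reverted to the mean: close the long
        elif state == -1 and zi <= -exit_:
            state = 0        # reverted to the mean: close the short
        positions.append(state)
    return np.array(positions)
```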

Half-life and Hedge Ratio

Denoted as t¹⁄₂ , in simple terms, the half-life is the time required for a quantity to reduce to half of its initial value.

In pairs trading terms, this is the expected time for the spread to close half of its deviation from the mean. Incorporating a half-life therefore gives us a useful decision measure.

To compute the hedge ratio, we run an Ordinary Least Squares (OLS) regression on the closing prices. The hedge ratio is utilised to generate the spread between the two prices.

Ornstein-Uhlenbeck (OU) Process simulated

Now we look into a bit of stochastic calculus and shed light on a standard stochastic differential equation. We consider this OU process because it generates a mean-reversion process for the spread.

Ornstein-Uhlenbeck Process

Θ is the speed of reversion to the equilibrium level μe, Xt is the spread process, and Wt is the Wiener process

Fitting to Ornstein-Uhlenbeck Process, we obtain the solution to the stochastic differential equation

The solution of the SDE comprises two terms: a reversion term and an autoregressive term.

After we run the regression, we can generate the signal bounds.
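
As an illustration with arbitrary parameters, a simple Euler-Maruyama simulation of dXt = Θ(μe - Xt)dt + σ dWt might look like:

```python
import numpy as np

def simulate_ou(theta, mu, sigma, x0, dt, n, seed=None):
    """Euler-Maruyama discretisation of dX = theta*(mu - X)*dt + sigma*dW."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        dw = rng.normal(0.0, np.sqrt(dt))  # Wiener increment over dt
        x[t] = x[t - 1] + theta * (mu - x[t - 1]) * dt + sigma * dw
    return x
```

Simulated paths like this are useful for sanity-checking the bounds before applying them to real spreads.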

Augmented-Dickey-Fuller Test

The Augmented Dickey-Fuller (ADF) test is one of the most widely used tests for stationarity. Below is the general set-up for the ADF test:

To obtain the optimal lag k, we compute the Akaike information criterion (AIC) or Bayesian information criterion (BIC).

For this project, we will use a unit-root test with lag 1.

In this case, we would also need to include a linear trend in the ADF formulation. Since when we trade the pair, we do not want to amend the quantities every day, the ADF formulation we use from now on is:

The ADF statistic, used in the test, is a negative number. The more negative it is, the stronger the rejection of the hypothesis that there is a unit root, at a given level of confidence.

Hurst Exponent

We implement the Hurst exponent to see whether the time series is mean-reverting, trending, or a geometric Brownian motion. With this knowledge, we can optimise and assign different weights depending on the market condition.

To compute the Hurst exponent, we can use the variance of a log price series to assess the rate of diffusive behaviour for an arbitrary time lag τ.

We then modify the equation to include an exponent value “2H”, which gives us the Hurst Exponent value H:

Hurst Exponent

A time series can then be characterised in the following fashion:

  • H < 0.5 — The time series is mean reverting
  • H = 0.5 — The time series is a Geometric Brownian Motion
  • H > 0.5 — The time series is trending

Performance Indicators

Our evaluation of the strategy will be based on the following performance measures: Sharpe ratio, rolling Sharpe ratio and drawdown.

The Backtest

With the theories we have mentioned, let’s run a test, compute results and discuss some analysis.

We will observe BTC and ETH from 2016–01–01 to 2019–01–01 and compute the cointegration.

from statsmodels.tsa.stattools import coint

S1 = df['BTC-GBP']
S2 = df['ETH-GBP']
score, pvalue, _ = coint(S1, S2)
pvalue
0.0006501400341705963

As expected, BTC vs. ETH is highly significant.

We normalize the spread by converting it to a Z-score, which gives us a better statistical understanding of the signal.

Now we can associate probabilities with the signals. A Z-score of 1 means the spread is one standard deviation above the mean, which corresponds to a cumulative probability of 0.8413 (about 84%); a Z-score of 2 corresponds to 0.9772 (about 98%).
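
These cumulative probabilities can be verified with the standard normal CDF, expressed here through the error function:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Probability that a standard normal variable lies below z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

normal_cdf(1.0)  # ≈ 0.8413
normal_cdf(2.0)  # ≈ 0.9772
```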

Given the nature of the spread, it is better to test over a smaller time frame, as the mean reversion occurs over roughly two years (2017–09 to 2019–01).

So we look at the period 2019–09 to 2019–12.

Given this new timeframe, we can observe mean reversion over a shorter time horizon.

Our Simple Strategy

  • Go “Long” the spread whenever the z-score is below -1.0
  • Go “Short” the spread when the z-score is above 1.0
  • Exit positions when the z-score approaches zero
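
A minimal sketch of these three rules applied to the price ratio (the 0.25 exit band and the use of the fixed in-sample mean and standard deviation are simplifying assumptions):

```python
import numpy as np
import pandas as pd

def zscore_strategy(s1, s2, entry=1.0, exit_band=0.25):
    """Trade the ratio s1/s2: long the spread below -entry, short above
    +entry, flat once the z-score comes back near zero."""
    ratio = s1 / s2
    z = (ratio - ratio.mean()) / ratio.std()
    state = 0.0
    positions = []
    for zt in z:
        if state == 0.0:
            if zt < -entry:
                state = 1.0        # go long the spread
            elif zt > entry:
                state = -1.0       # go short the spread
        elif abs(zt) < exit_band:
            state = 0.0            # z-score near zero: exit
        positions.append(state)
    pos = pd.Series(positions, index=ratio.index)
    # mark-to-market P&L: yesterday's position times today's ratio change
    pnl = pos.shift(1).fillna(0.0) * ratio.diff().fillna(0.0)
    return pos, pnl.cumsum()
```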

Examining the price ratio of a trading pair is a traditional way to handle pairs trading. Part of why this works as a signal rests on our assumptions about how prices move: prices are typically assumed to be log-normally distributed, which implies that by taking the log of the price ratio we are taking a linear combination of the log prices, and hence of the returns associated with them.

In the graph above, we have calculated the ratio over the observed time and plotted the mean ratio with a black line.

import matplotlib.pyplot as plt

ratio = S1 / S2
ratio.plot()
plt.axhline(ratio.mean(), color='black')
plt.legend(['Price Ratio']);

Implementation

Out of Sample Test

We have constructed our spread appropriately and shown how we will go about making trades. Therefore, we can conduct some out of sample testing.

A critical step of this strategy is testing whether the pairs are cointegrated. However, we established this on information from a prior time period.

A thorough analysis involves implementing this model in an out of sample framework to confirm that the principles of our model are still valid going forward.

Since we initially built the model on the ‘2018–3–1’ to ‘2018–6–1’ period, let’s see if this cointegrated relationship holds for ‘2018–6–1’ to ‘2018–9–1’. Historical results do not guarantee future results, so this is a sanity check to see whether the work we have done still holds.

S1 = df['BTC-GBP']
S2 = df['ETH-GBP']
score, pvalue, _ = coint(S1, S2)
pvalue
0.7582819159502596

Unfortunately, since our p-value is above the cutoff of 0.05, we conclude that our model is no longer valid due to the lack of cointegration between our chosen cryptocurrency pairs. If we tried to deploy this model without the underlying assumptions holding, we would have no reason to believe that this trading framework is valid. If we go ahead with the model, we expose ourselves to possible spuriously correlated results.

In statistics, a spurious relationship or spurious correlation is a mathematical relationship in which two or more events or variables are associated but not causally related, due to either coincidence or the presence of a certain third, unseen factor.

A Pairs trading overview on the Quantopian platform

The Quantopian platform currently only supports US equities for backtesting. One cannot import pricing data for bitcoin or use such data in the backtester. This constraint is noted in their documentation.

Therefore, let’s use cointegrated equity pairs.


The ten equity assets we’ll analyze are Microsoft Corporation (MSFT), Apple Inc. (AAPL), Amazon.com Inc. (AMZN), Facebook Inc. Class A (FB), Berkshire Hathaway Inc. Class B (BRK.B), Johnson & Johnson (JNJ), JPMorgan Chase & Co. (JPM), Alphabet Inc. Class C (GOOG), Exxon Mobil Corporation (XOM) and Visa Inc. Class A (V).

The cointegration analysis covers the period from 1 January 2018 to 1 January 2019.

We observe the tail of the price data for the ten assets below,

We create a matrix heatmap to show p-values less than 0.05 of the cointegration test between each pair of the stocks.

The greener the square, the lower the p-value of the cointegration test.

Pairs with p-values below 0.05 are ‘MSFT’ vs. ‘V’ and ‘BRK_B’ vs. ‘JNJ’:

‘MSFT’ vs. ‘V’ = 0.00472187
‘BRK_B’ vs. ‘JNJ’ = 0.00390914

Now we further observe the P-values and take ‘BRK_B’ vs. ‘JNJ’ as it is more statistically significant.

Applying the ADF gives the following output:

(-4.1990415091790121,
 0.00066136612059421152,
 0,
 250,
 {'1%': -3.456780859712,
  '5%': -2.8731715065600003,
  '10%': -2.5729685440000001},
 1167.971040124334)

From this, we can see that the test statistic of -4.199 is larger in absolute terms than the 10% critical value of -2.572, the 5% critical value of -2.873, and the 1% critical value of -3.456. This means we can reject the null hypothesis that there is a unit root in the spread time series at the 10%, 5% and 1% levels of significance; the spread is therefore stationary and mean reverting.

The p-value of 0.00066 means that we can reject the null hypothesis at significance levels down to about 0.066%. That is very strong statistical significance.
To further explain the spread, we can compute a function to calculate the Hurst exponent of the spread series.
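
A sketch of such a function, following the variance-of-lagged-differences method described earlier (the lag range is an arbitrary choice):

```python
import numpy as np

def hurst(ts, max_lag=100):
    """Estimate the Hurst exponent: the standard deviation of lagged
    differences scales as lag**H, so H is the slope of log(std)
    against log(lag)."""
    ts = np.asarray(ts, dtype=float)
    lags = np.arange(2, max_lag)
    tau = [np.std(ts[lag:] - ts[:-lag]) for lag in lags]
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]
```

A random walk should come out near H = 0.5, while a strongly mean-reverting series should come out near zero.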

Implementation Part 2

We assume Z = 1 and run the backtest over the period 2017–2019. We will also not allow multiple orders.

We’ll use SYK vs. AMZN as the pair we trade, as they are also cointegrated at the 5% level, with a p-value of 0.0394.

We obtain the following backtest results:

Total Return: 96.64%
Sharpe Ratio (6-month rolling): 1.42
Drawdown: -31.98%

Below we have statistics such as the daily and annual returns, the distribution of monthly returns, and the monthly returns themselves. For example, in 2018 we had a best month of 28% and a worst month of -9.7%.

Above, we have our results from the previous simple strategy where we go long or short, at a Z-score of 1 and -1, respectively.

We have a painful drawdown, an impressive Sharpe ratio of 1.42, a beta of 0.99, which implies the strategy is almost perfectly correlated with the market (terrible for a strategy meant to be market-neutral), and good returns that beat the SPY by 60%.

Now we choose different values for the Z-score in increments of 0.5, with the addition of 0.25 as a starting value. We also test whether it is better to allow more than one open position in the same spread direction.

From the table below, we can observe that Z-scores of 1 and 2 have the highest total return. We also find that the higher the Z-score, the lower the beta. In most cases, having more than one position open in the same direction increases the drawdown but also increases the total return. The Z-score of 2 has the best drawdown and Sharpe ratio metrics overall.

Conclusion

The main aim of this study was to test for cointegration via the Engle and Granger procedure and related tests such as the ADF, to create strategies, and to evaluate the performance metrics for various Z-scores.

Further research could include adding stop-loss orders and trailing profits based on the Average True Range (ATR) on one or both legs of the pair, which could improve profitability, reduce losses and, consequently, reduce the maximum drawdown.

Creating a basket of cointegrated pairs to trade with low betas will make the equity curve more consistent, and it will minimize risk due to its diversification qualities.

We can deploy cross-validation methods such as walk-forward optimization. In effect, this can reduce overfitting and survivorship bias. The out-of-sample data therefore plays a crucial role in determining the validity and reliability of the system, and gives a realistic estimate of how the system should perform in real markets.

Regarding trade design, long waiting times affect the cost of carry and increase the risk of a regime change in the cointegrating relationship. A different approach is to reduce the position by half once the holding time exceeds the half-life estimate.

There are also other ways to approach mean reversion, for example, modelling the spread with a Vector Error Correction Model (VECM).

For more information on how Digital Alpha Research can help you, visit https://www.digitalalpharesearch.com/ to get in touch.
