Exploring Statistical Arbitrage in Cryptocurrency

Ze Chen
Published in Analytics Vidhya
5 min read · Jun 7, 2020

Introduction

Statistical arbitrage is one of the most common strategies in the world of quantitative finance. So, I decided to embark on a project last summer to learn about this strategy and eventually apply it in the cryptocurrency market.

The crypto market is highly correlated and highly volatile, which is great for a quantitative strategy that bets on convergence between two correlated securities. But through fixing bugs in my backtest and learning more about the fundamentals of statistical arbitrage, I have come to the conclusion that applying this strategy profitably to the cryptocurrency market is not an easy feat. Transaction fees are high at most major crypto exchanges, and the shallow depth of the order book makes it impossible to place a sizeable order without causing too much slippage. Next, I will delve into the trading logic, the findings, and the future plans.

Trading Logic

Using online resources[1], I started learning what pair trading consists of. I found a few articles online that gave me a model for how to start.

Filtering Process (finding pairs to trade)

  1. Due to the lack of regulations in the crypto space, I first narrow down the exchanges I shall operate on, to ensure that the order book depth is legitimate.
  2. I then select the coins that have a volume of at least $1 million each day, to ensure liquidity[2].
  3. Filter potential pairs through

     (1) Correlation, using Pearson correlation
     (2) Cointegration, using the Engle-Granger two-step cointegration test
     (3) Stationarity, using the ADF test
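
As a rough illustration of the liquidity and correlation steps above, here is a minimal Python sketch. The helper names `liquid_coins` and `correlated_pairs`, the 0.9 correlation threshold, and the choice to correlate log prices are all assumptions made for illustration; the data is synthetic, and the original project may have done this differently.

```python
import itertools

import numpy as np
import pandas as pd


def liquid_coins(daily_usd_volume: pd.DataFrame, min_volume: float = 1_000_000) -> list:
    """Keep coins whose USD volume is at least min_volume on every day."""
    return [c for c in daily_usd_volume.columns
            if daily_usd_volume[c].min() >= min_volume]


def correlated_pairs(prices: pd.DataFrame, threshold: float = 0.9) -> list:
    """Return coin pairs whose log prices have Pearson correlation >= threshold."""
    corr = np.log(prices).corr(method="pearson")
    return [(a, b) for a, b in itertools.combinations(prices.columns, 2)
            if corr.loc[a, b] >= threshold]


# Synthetic data, for demonstration only: B tracks A closely, C is thinly traded.
rng = np.random.default_rng(0)
log_a = np.cumsum(rng.normal(0, 0.02, 300))
prices = pd.DataFrame({
    "A": 100 * np.exp(log_a),
    "B": 50 * np.exp(log_a + rng.normal(0, 0.005, 300)),
    "C": 10 * np.exp(np.cumsum(rng.normal(0, 0.02, 300))),
})
volume = pd.DataFrame({"A": [5e6] * 300, "B": [2e6] * 300, "C": [4e5] * 300})

coins = liquid_coins(volume)              # C is dropped by the volume filter
pairs = correlated_pairs(prices[coins])   # candidate pairs for the later tests
print(coins, pairs)
```

Pairs that survive this step would then go on to the cointegration and stationarity tests.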

Trading Steps

  1. Calculate the hedge ratio of each pair using Kalman Filter Regression.
  2. Calculate the spread using the equation

spread = y - hedge ratio * x

  3. Calculate the z-score using

(current spread - average of spread over past n days) / std of spread over n days

Here, n is the lookback period. We calculate it using the half-life of the spread's mean reversion. For more info on this, check out the links in the footnotes.

  4. Define arbitrary entry and exit z-scores.

When the z-score crosses above the upper entry z-score, go SHORT the spread and close the position when the z-score returns to the exit z-score. When the z-score crosses below the lower entry z-score, go LONG the spread and close the position when the z-score returns to the exit z-score.
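
The trading steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code used for the backtests: the function names, the Kalman noise settings `delta` and `obs_var`, the thresholds `entry_z=2.0` and `exit_z=0.0`, the clipping of the lookback to between 5 and 100 bars, and the synthetic price data are all assumptions.

```python
import numpy as np
import pandas as pd


def kalman_hedge_ratio(x, y, delta=1e-4, obs_var=1.0):
    """Track y_t = beta_t * x_t + alpha_t with a random-walk state [beta, alpha].

    delta and obs_var are illustrative noise settings (assumptions).
    """
    theta = np.zeros(2)                    # state estimate [beta, alpha]
    P = np.eye(2)                          # state covariance
    Q = delta / (1 - delta) * np.eye(2)    # process noise of the random walk
    betas = np.empty(len(x))
    for t in range(len(x)):
        H = np.array([x[t], 1.0])          # observation vector
        P = P + Q                          # predict step
        S = H @ P @ H + obs_var            # innovation variance
        K = P @ H / S                      # Kalman gain
        theta = theta + K * (y[t] - H @ theta)
        P = P - np.outer(K, H) @ P
        betas[t] = theta[0]
    return betas


def half_life(spread):
    """Half-life from the fit  s_t - s_{t-1} = a + lam * s_{t-1}."""
    lam = np.polyfit(spread[:-1], np.diff(spread), 1)[0]
    return -np.log(2) / lam if lam < 0 else np.inf   # inf: no mean reversion


def rolling_zscore(spread, lookback):
    """(spread - rolling mean) / rolling std over the lookback window."""
    s = pd.Series(spread)
    return ((s - s.rolling(lookback).mean()) / s.rolling(lookback).std()).to_numpy()


def positions(z, entry_z=2.0, exit_z=0.0):
    """-1 = short the spread, +1 = long the spread, 0 = flat."""
    pos, current = np.zeros(len(z)), 0
    for t, zt in enumerate(z):
        if current == 0 and zt > entry_z:
            current = -1                   # spread rich: short y, long x
        elif current == 0 and zt < -entry_z:
            current = 1                    # spread cheap: long y, short x
        elif current == -1 and zt < exit_z:
            current = 0                    # short closed on return to exit level
        elif current == 1 and zt > -exit_z:
            current = 0                    # long closed on return to exit level
        pos[t] = current
    return pos


# Synthetic pair, for demonstration only: y = 2 * x + 1 plus mean-reverting noise.
rng = np.random.default_rng(1)
n = 500
x = 10 + np.cumsum(rng.normal(0, 0.1, n))
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.9 * noise[t - 1] + rng.normal(0, 0.2)
y = 2.0 * x + 1.0 + noise

beta = kalman_hedge_ratio(x, y)
spread = y - beta * x
lookback = int(round(np.clip(half_life(spread), 5, 100)))
z = rolling_zscore(spread, lookback)
pos = positions(z)
```

Note that `beta[t]` here already uses the observation at time t. A backtest would typically lag the hedge ratio and the position by one bar to avoid look-ahead bias.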

Implementation

For the data, I decided to pull all the XYZ/USD trading pairs from Bitfinex. Bitfinex, according to a Bitwise presentation made to the SEC, is one of the few exchanges with real volume. Due to the lack of regulation in the crypto industry, wash trading and other illegal activities are rampant on lesser-known exchanges. Therefore, selecting the right exchanges is a critical step. I selected USD as the quote currency because other pairs, such as trading pairs quoted in BTC, collectively have much lower trading volume.

After testing for correlation, I ran the pairs through a cointegration test. For this step, I used the Engle-Granger two-step cointegration test. Because this method produces a different result depending on which series is chosen as the dependent variable, I ran the test on every pair in both orderings and then selected the pairs with a cointegration p-value of < 0.01 with either coin as the dependent variable.

Running the cointegration test on all candle sizes yields the following cointegrated pairs:

[('tLTCUSD', 'tIOTUSD'), ('tEOSUSD', 'tNEOUSD'), ('tEOSUSD', 'tTRXUSD'), ('tIOTUSD', 'tNEOUSD'), ('tNEOUSD', 'tOMGUSD'), ('tEOSUSD', 'tIOTUSD'), ('tIOTUSD', 'tOMGUSD'), ('tLTCUSD', 'tEOSUSD'), ('tLTCUSD', 'tNEOUSD'), ('tOMGUSD', 'tTRXUSD'), ('tDSHUSD', 'tXMRUSD')]

Then I further refined the results by running the ADF test on the spread of each pair to check whether that spread is stationary.

For the resultant graphs, please see Appendix A. In it, you will find the graphs of the prices, the hedge ratio, the spread, and the z-score for each cointegrated pair.

Interestingly, the ADF test shows the same pattern as the cointegration test: the more data points used (that is, the shorter the candlestick interval, for example going from 1D to 6h candlesticks), the more pairs pass the ADF test for stationarity. I have compiled the results in Appendix B. Please see Appendix C for a comparison of the graphs using different candlestick intervals.

Backtest

The results without commission fees are all fairly promising, but once commission fees are added, the Sharpe ratios all drop to negative values. Here is one such example:

Figure 1. price comparison; hedge ratio; spread vs. z-score; return vs. drawdown of OMGUSD/TRXUSD

As one can judge from the spread graph (the third panel of Figure 1), this is a stationary spread. Because of the overwhelming transaction fees (I used 0.2% for the backtests, though the actual fee may be lower depending on the exchange), none of my in-sample backtests produced useful results. I tried varying the entry z-score, the exit z-score, and the lookback period for calculating the z-score. However, this did not help, and all runs resulted in graphs similar to the one above. Sometimes the test would produce a pair with a positive Sharpe ratio, but that ratio would rarely exceed 1.
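
To make the fee drag concrete, here is a small sketch under simplifying assumptions: the 0.2% fee is charged on every fill, each pair trade has two legs, positions are held in units of the spread, and the risk-free rate is zero. The helper names `net_returns` and `sharpe` are hypothetical and this is not the backtest code behind the figures.

```python
import numpy as np


def net_returns(pos, spread_ret, fee=0.002, legs=2):
    """Per-period strategy return after fees.

    pos[t] is the position (-1, 0, +1) chosen at the end of period t;
    spread_ret[t] is the return of one unit of the spread over period t.
    """
    pos = np.asarray(pos, dtype=float)
    spread_ret = np.asarray(spread_ret, dtype=float)
    held = np.concatenate(([0.0], pos[:-1]))                 # position carried into period t
    traded = np.abs(np.diff(np.concatenate(([0.0], pos))))   # size of each position change
    return held * spread_ret - fee * legs * traded


def sharpe(returns, periods_per_year=365):
    """Annualized Sharpe ratio (crypto trades every day, so 365 daily periods)."""
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)


# One round trip: enter, earn 1% on the spread, exit.
net = net_returns(pos=[1, 1, 0], spread_ret=[0.0, 0.01, 0.0])
print(net, net.sum())
```

Under these assumptions, a single round trip costs four fills at 0.2% each, or 0.8% of notional, so a 1% gross move on the spread leaves only about 0.2% net. Any convergence smaller than 0.8% loses money, which is consistent with the Sharpe ratios turning negative once fees are included.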

Conclusion

Although the whole crypto market seems to move in tandem, a statistical arbitrage strategy that takes advantage of that fact isn’t easily devised. Assuming a high transaction fee environment, a statistical arbitrage strategy using Kalman Filter does not perform well in the crypto market. Further research is needed for this particular strategy to be successful.

It is important to analyze the nature of a market and narrow down the strategies accordingly. A high-frequency strategy may be unfit for a market with high trading commissions and low volume. Further optimization of the Kalman Filter parameters could be done, but a much deeper mathematical understanding of the subject will be needed to determine which other state-space models could be used for this use case. Going forward, three directions will be tried:

  • Combine this strategy with other techniques, such as using machine learning, to add another layer of filter to the trading logic.
  • Apply this strategy to traditional markets, such as the futures market and the stock market.
  • Examine other state-space models to try out as statistical arbitrage strategies.

[1] Inspirations: "Kalman Filter Techniques And Statistical Arbitrage In China’s Futures Market In Python" and "High Frequency and Dynamic Pairs Trading Based on Statistical Arbitrage Using a Two-Stage Correlation and Cointegration Approach"

[2] $1 million is actually still far too low to operate a profitable pair trading strategy, but the crypto market is too small for anything higher, so I settled on $1 million.
