Intro to Statistical Arbitrage in Crypto — Pairs Trading [Part8]

DΞΛNDRΞΞ
Published in Coinmonks
18 min read · May 20, 2019

Check out my latest app — TradingGYM. It’s a trading simulator that helps you practice trading with a much faster feedback/learning loop and minimal lookahead bias. Once you open the app, you will see a chart with a random asset at a random point in time. Make trades and fast-forward time to see how it played out.

This is Part 8 of a multi-part series:

  • Part 1: Basic strategies, introduction, setup and testing vs June-July market.
  • Part 2: Advanced strategies and where to find them, testing vs June-July market.
  • Part 3: Basic and Advanced strategies testing vs August market.
  • Part 4: Neural Network strategies description and backtests against September market.
  • Part 5: Neural Network strategies backtests against October market.
  • Part 6: Did Neural Network strategies predict November 14th price drop?
  • Part 7: Crypto Trading 2018 in Review: 17 Advanced + 15 Neural Net strategies tested
  • Part 8: Intro to Statistical Arbitrage in Crypto — Pairs Trading
  • [NEW] Part 9: Crypto Trading 2019 Half Year Review: 17 Advanced + 15 Neural Net strategies tested

In this part, I start to explore more advanced strategies that are rooted in math and statistics. When you look for actual quant strategies, one of the first things that comes up is pairs trading, a subset of statistical arbitrage. Some might argue that this is not a real quant strategy, but it’s still a step up from indicator-based ones.

Over the years, pairs trading has seen a steady decline in results (like most strategies once they go public); some sources claim that pairs trading had decent results until the mid-2000s. But since crypto is a new and less mature market, some of those rules don’t apply here, at least not yet. So the question of this post is: can we still find inefficiencies in the market that can be exploited with pairs trading?

There are quite a few pairs trading posts already, so how is this one different? First of all, most of them focus heavily on the math side, explaining Cointegration/Hurst/ADF/etc. in depth, and that’s OK, but I really want to see the actual results of backtests. Also, a lot of posts simply skip the real-life limitations and requirements of actually executing this LIVE, for example: which coins can you actually short?

Why the need for more advanced strategies, you might ask? When you’ve run tons of backtests like I have, you start to notice the seeming randomness and lack of positive results of indicator-based strategies. For a while you keep thinking “I just need to find the magical indicator X with magical params a & b”, but then you realize there is none.

Scroll to the bottom (“Result Overview”) if you know all about math/stats of pairs trading and just want to see the results/charts.

What is Pairs Trading?

In pairs trading we make the assumption (based on past data and statistical tests) that the prices of two assets are connected in such a way that their ratio hovers around some mean, diverging up and down by some amount, therefore forming a mean-reverting relationship that can be exploited. From time to time this divergence (also called the spread) becomes large enough to present a trading opportunity: you bet that the price ratio will return to the mean. You do that by going short on the over-performing asset and going long on the under-performing asset.

Here is a typical pairs trading strategy in a nutshell:

  • Take universe of assets.
  • Run math tests for each asset vs each other asset, to find ones that are most “connected”.
  • Draw them in a grid.
  • Take the most “connected” ones, inspect prices/spread visually.
  • Trade your selected list of pairs (or just a single one).

Finding Pairs to trade / Math tests

How do you find “connected” pairs? There are multiple tests, with tons of posts about each, so I’ll just give a brief explanation, show you the results and point out some things to look out for.

  • Cointegration (from the statsmodels lib): tells whether two time series move together in such a way that the spread between them is mean reverting. This is the core of pairs trading and the most commonly used test. The rest are supplementary.
  • ADF (from the statsmodels lib): tests a single time series (in the case of pairs, the ratio) for stationarity, which means that the mean and variance of the series are constant over time. In simpler terms, the series keeps reverting to its mean, and traders like mean-reverting stuff. From a math/stats perspective, stationary time series have many useful statistical properties. If you want to learn more about this topic, I strongly suggest you check out these articles.
  • Hurst exponent: tests whether the time series is trending (above 0.5), random (around 0.5, as with Geometric Brownian Motion) or mean reverting (below 0.5). I found multiple implementations and ran them all, but the results were always very close to 0.5, which means random.
  • Half-life: if the series is mean reverting, how long does it take to revert halfway back to its mean? This is very important, because you might not want to wait 5 years for your trade to play out. The function sounds great and I found multiple implementations, but the differences between their results were just too large and didn’t match anything in the charts, so I decided not to use it.

Understanding Cointegration and p_values

It’s important to understand the output of the cointegration test. It starts with a null hypothesis that there is no cointegration. It returns multiple values, and one of them is p_value. There are many common misconceptions about p_value and what it means, so it can be very misleading. First, it’s important to understand what p_value is NOT:

  • Lower p_value does NOT mean stronger cointegration than higher p_value
  • Lower p_value does NOT mean higher chance of cointegration than higher p_value

The official Wikipedia definition is very confusing, I think this one is easier to understand:

p-value = probability of observing the result given that the null hypothesis is true, not the reverse, as is often the case with misinterpretations.

So, how do we interpret p_value in context of coint test? The official definition from statsmodels lib:

The Null hypothesis is that there is no cointegration, the alternative hypothesis is that there is cointegrating relationship. If the pvalue is small, below a critical size, then we can reject the hypothesis that there is no cointegrating relationship.

How small do we need p_value to be? The lower, the better. 0.05 is commonly used as the cutoff for significance, sometimes even 0.01, but it depends on how many tests you are running: the more tests, the lower the cutoff should be, because each extra test is another chance of a false positive. Still, since the output is a probability, nothing is guaranteed, so we must inspect the results visually one by one.
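A quick way to see why the number of tests matters is to count how many pairs would pass the cutoff by chance alone. The sketch below uses the 136 pairs tested later in this post and a Bonferroni correction, which is one standard adjustment and my own addition here, not something used in the backtests.

```python
def bonferroni_cutoff(alpha, n_tests):
    """Per-test p-value cutoff under a Bonferroni correction."""
    return alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Average number of tests that pass the cutoff if no pair is truly cointegrated."""
    return alpha * n_tests


n_pairs = 136  # number of coin pairs tested later in the post

for alpha in (0.05, 0.01):
    false_pos = expected_false_positives(alpha, n_pairs)
    cutoff = bonferroni_cutoff(alpha, n_pairs)
    print(f"alpha={alpha}: ~{false_pos:.1f} chance hits expected, "
          f"Bonferroni cutoff {cutoff:.6f}")
```

With a 0.05 cutoff, roughly 7 of the 136 pairs would be expected to look cointegrated even if none of them were, so a low p_value on its own is weak evidence.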

If you want to dive deeper into p_values and statistical significance, I suggest these articles. If you want a less math-heavy explanation, I found this to be a great analogy with “innocent until proven guilty”.

Running the tests

Here are the results (Cointegration/Hurst/ADF) for period from May 1st 2018 to Jan 1st 2019.

I’m using the same TOP Coins from previous parts (17 in total).

A few things to notice here. Most of the cointegration shows up among the smaller altcoins. The big coins show almost none. ZRX seems weirdly cointegrated with everything, which doesn’t make much sense statistically.

With Hurst it was difficult to get conclusive results. First of all, all 3 versions of the algorithm gave different results. Second, the lag parameter also changed the picture. I ended up using lag=150, because that gave the widest range of results (0.4–0.9), but I didn’t analyze them too much.
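For reference, here is a minimal sketch of one common way to estimate the Hurst exponent. It is not necessarily any of the 3 versions mentioned above. It relies on the standard deviation of lagged differences growing roughly like lag ** H, and the `max_lag` parameter name is my own. The two input series are synthetic.

```python
import math
import random
import statistics


def hurst_exponent(series, max_lag=100):
    """Estimate H as the slope of log(std of lagged differences) vs log(lag)."""
    log_lags, log_stds = [], []
    for lag in range(2, max_lag):
        diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
        log_lags.append(math.log(lag))
        log_stds.append(math.log(statistics.pstdev(diffs)))
    # Ordinary least-squares slope of log_stds on log_lags.
    mean_x = statistics.fmean(log_lags)
    mean_y = statistics.fmean(log_stds)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_lags, log_stds))
    den = sum((x - mean_x) ** 2 for x in log_lags)
    return num / den


random.seed(0)
# White noise snaps back to its mean immediately (strongly mean reverting).
noise = [random.gauss(0, 1) for _ in range(3000)]
# A random walk is the running sum of that noise.
walk, level = [], 0.0
for step in noise:
    level += step
    walk.append(level)

h_walk = hurst_exponent(walk)
h_noise = hurst_exponent(noise)
print(f"random walk: H = {h_walk:.2f}")
print(f"white noise: H = {h_noise:.2f}")
```

The estimate depends on the range of lags used in the fit, which is consistent with the lag parameter changing the picture above.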

ADF shows similarities with cointegration, as it should.

From now on I’ll focus on the cointegration results, since cointegration is the core of pairs trading. I just wanted to show the others as an example.

Let’s take a look at some of the pairs and their ratios and see if the results make sense. Let’s start with pairs where p_value is in the 0.1 to 0.2 range: BTC-ZRX / ETH-DASH / XLM-ZEC.

Now let’s try something where p_value < 0.01 : ETH-OMG / EOS-LTC / QTUM-OMG.

Now let’s take something where p_value > 0.8 : BTC-ETH / ETH-XRP / XLM-QTUM

As we can see, the test results are inconclusive. You can’t just take all your data, throw it at the algorithm, take the lowest p_value and say: I’ll trade this because the test says this is the most cointegrated pair there is.

Side note: pair might be perfectly mean reverting, but the amount it reverts could be too small to profit, when commissions are included.

Now that we have cointegration results, what’s next? We need to solve some technical difficulties to make pair backtesting possible. First — we need a platform to execute.

Platforms

Since pairs trading requires some relatively advanced features, not all platforms support it. The two main features needed are:

  • Shorting — to bet on spread reverting to mean, you long one and short the other coin. If you can’t short, you can’t trade pairs.
  • Multi asset — for obvious reasons.

Turns out, this is quite rare.

As you might know, I’ve previously written a few posts about Gekko backtesting specifically, but Gekko lacks both of these features, so we need something else.

So I went on a little platform searching.

QuantConnect / LEAN

Written in C#. Very complex, with tons of code, which makes the source much harder to customize. But the code is very clean. Optimized mostly for more traditional assets; crypto is an afterthought. Since it’s C#, it runs best on Windows. I was able to get it running on Ubuntu with Mono, but it was a struggle, came with a performance penalty and has no UI outside Windows. It also supports Python strats, but being a multi-language platform makes debugging harder. Great documentation, but a bit fragmented. Good amount of free/public strategies. Very custom data format, so it’s hard to force it to use data from a DB. Supports multi-assets / universe selection, which is huge. Supports shorting.

Backtrader

Has great documentation with some nice witty jokes and very clean code. One of the most feature-rich platforms. One of the simplest to use and quickest to get started with, which is huge. Has relatively good charting, although not interactive. There is a Bokeh plugin, but I haven’t tried it. Has LIVE trading, but crypto requires an external engine; I haven’t tried it, but it seems legit, judging from this post. Supports multi-assets. Supports shorting. The only problem: it’s slow.

Catalyst

Based on Zipline, repurposed for Crypto. Way too slow. Almost no free/public strats that I’ve found. Supports shorting, at least on paper. Found the features to be quite lacking with this one.

Backtesting.py

I’ve rarely seen this mentioned anywhere, but I found it and it’s great for me. Best out-of-the-box charting, super extensible if you are a coder. Super fast. Very short and clean code that does a few basic things and does them well. Doesn’t support shorting, but since it’s so minimalistic, that is quite easy to develop yourself. If you want to develop your own platform, this can serve as a good starting point.

Side note: none of these platforms comes anywhere close to Gekko in the number of publicly available strategies.

There are a ton of other platforms, but most of them aren’t actively developed. Here is the list of lists, if you are interested. I haven’t tried most of them personally.

Every platform lacked something very important for me, but in the end I chose backtrader, mainly because of its ease of use and the ability to get started quickly.

Strategy Implementation

I tried to use the default pairs-trading.py strategy as starting point, but it didn’t work for me out of the box, so I had to make some adjustments.

EDIT (May 21st): I previously had two fixes here that involved making changes in backtrader source, but u/mementix (the author of backtrader) pointed out to me that there is a better way of doing things in the platform — by extending instead of changing source.

First, there was a problem in the ols.py file: the statsmodels API has changed since the strategy was created. Instead of changing the source, you should copy class OLS_Slope_InterceptN and class OLS_TransformationN as external indicators and fix this line:

p1 = sm.add_constant(p1, prepend=self.p.prepend_constant, has_constant='add')

Next, there was an issue with comminfo.py. getsize() tries to int-parse all order sizes, but in crypto that’s not realistic, because you won’t be buying round numbers of BTC each time, if ever. Again, instead of changing the source, you should create your own commission scheme. Here’s the one I used:

import backtrader as bt

class CommInfo_Crypto(bt.CommInfoBase):
    params = (
        ('stocklike', True),
        ('commtype', bt.CommInfoBase.COMM_PERC),
        ('percabs', True),
    )

    def getsize(self, price, cash):
        # Return a fractional size instead of rounding it to an integer
        return self.p.leverage * (cash / price)

After the fixes, I played with the strategy and felt like some things could be simplified. To sum up, what I did was:

  • OLS Spread calculation was very slow, so I had to use my own simplified version (check SpreadZScore in github), which is basically spread = self.data0 / self.data1. While technically this is not how the spread is usually calculated, for demonstration purposes I felt the results it provided were close enough.
  • Order Size calculations. I simply split the cash between assets and short/long:
self.order_target_percent(data=self.data0, target=0.5)
self.order_target_percent(data=self.data1, target=-0.5)

While not technically correct (you are borrowing funds when shorting + you need margin account + you need to pay rolling interest for borrowing), I think it’s OK for purposes of this post.
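Outside of backtrader, the idea can be sketched in a few lines of plain Python: take the price ratio, compute its rolling ZScore, and flip the 50/50 long/short split when the ZScore crosses a threshold. This is a simplified illustration, not the SpreadZScore code from the repo; the function names and the prices are made up.

```python
import statistics


def rolling_zscore(values, period):
    """Z-score of each value against its trailing window of `period` values."""
    scores = []
    for i in range(len(values)):
        if i + 1 < period:
            scores.append(None)  # not enough history yet
            continue
        window = values[i + 1 - period : i + 1]
        mean = statistics.fmean(window)
        std = statistics.pstdev(window)
        scores.append((values[i] - mean) / std if std > 0 else 0.0)
    return scores


def target_weights(zscore, threshold, current):
    """Return (coin0 weight, coin1 weight); keep `current` while inside the band."""
    if zscore is None:
        return current
    if zscore > threshold:   # ratio is high: coin0 looks expensive vs coin1
        return (-0.5, 0.5)   # short coin0, long coin1
    if zscore < -threshold:  # ratio is low: coin0 looks cheap vs coin1
        return (0.5, -0.5)   # long coin0, short coin1
    return current


# Made-up prices with one upward and one downward jump in coin0.
coin0 = [10.0, 10.1, 10.0, 10.2, 10.1, 11.5, 10.3, 10.1, 8.5, 10.0]
coin1 = [5.0, 5.0, 5.1, 5.0, 5.1, 5.0, 5.1, 5.0, 5.1, 5.0]
ratio = [a / b for a, b in zip(coin0, coin1)]

weights = (0.0, 0.0)
history = []
for z in rolling_zscore(ratio, period=5):
    weights = target_weights(z, threshold=1.5, current=weights)
    history.append(weights)

print(history)
```

The position stays flat until the first jump in the ratio, goes short coin0 / long coin1 on the upward jump, and reverses on the downward one.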

You can see the full code in my repo here.

Which exchanges support shorting?

To be able to short coin X, somebody has to be willing to lend you the coin (which implies an interest rate that is often forgotten), so that you can sell it now, then buy it back and return it later when it’s hopefully cheaper.

As you can imagine, this comes with some additional technical complexity and risk, for both the exchange and you. That means being able to short is not something to be taken for granted: not a lot of exchanges support shorting, and those that do support it only for a few selected coins. I did some research and found that most of the coins in my tests are supported, but:

  • None of the exchanges support all the coins; most of them support only a few (~5) coins, mostly the big ones.
  • Coins get delisted from margin trading (shorting) all the time because of low volume etc., so the supported list changes all the time.

I found 2 nice compiled lists, a bit old (mid 2018), but still a good overview:

If you want to dig in yourself and find the latest supported coins, I have compiled a list with exchange info pages that describe what is supported:

Note: I’m not 100% sure that all of those support Perpetual Contracts (instead of Futures). To be able to execute pairs trading, you must be able to exit at any time, instead of at a specific fixed time in the future, as with Futures Contracts. If you are thinking of executing this LIVE, you must know what types of contracts are supported for your chosen pair and on which exchanges.

Testing Setup

17 coins give us 17 × 16 / 2 = 136 unique pairs in total.

  • 40 of those have coint p_value < 0.05
  • 20 of those have coint p_value < 0.01
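The pair count is easy to double-check by enumerating the pairs. The coin names below are placeholders, since only the number of coins matters here.

```python
from itertools import combinations
from math import comb

# Placeholder tickers (hypothetical names); the post uses 17 top coins.
coins = [f"COIN{i:02d}" for i in range(1, 18)]

# Unordered pairs: (A, B) and (B, A) count as the same pair.
pairs = list(combinations(coins, 2))

print(len(pairs))   # 136
print(comb(17, 2))  # 136
```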

Instead of backtesting only pairs with low coint p_value, I’ll test them all and later show you the difference between profits/losses of higher and lower p_values.

To try to simulate real life and avoid look ahead bias (as much as possible):

  • I’ll be running coint test from May 1st (2018) to Jan 1st (2019) (already done in “Running the tests” section)
  • But backtests will be run from Jan 1st (2019) to Apr 1st (2019) (3 months)

There are 2 main variables involved in the strategy:

  • spread period — period over which to calculate ZScore of spread (in my case — ratio between coins)
  • threshold — when ZScore crosses this level (upper/lower), initiate signal to reverse current position

I won’t try to choose single value for each beforehand, but create a grid of possible values and run them all, to see which combinations give the best results:

  • spread period = [ 5d, 7d, 10d, 15d ]
  • threshold = [ 0.5, 0.7, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
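As a rough sketch of how large this grid is, the parameter combinations can be enumerated with itertools. The variable names are my own, and the pair count of 136 comes from the section above.

```python
from itertools import product

spread_periods_days = [5, 7, 10, 15]
thresholds = [0.5, 0.7, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
n_pairs = 136

# Every (spread period, threshold) combination to backtest for each pair.
grid = list(product(spread_periods_days, thresholds))

print(len(grid))            # 36 parameter combinations
print(len(grid) * n_pairs)  # 4896 backtests per commission level
```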

Result Overview

First, here are the overall mean percent profits, grouped by COIN1 and COIN2 (basically by pair). All periods / thresholds are included and profits are averaged.

Looks good. While on average we are negative, keep in mind this is all coins (whatever the coint p_value) and all params (periods/thresholds). Let’s split it up and see which periods/thresholds look better. Here are results grouped by spread period + threshold.

Looks like shorter period (5/7d) and smaller thresholds (0.5/0.7/1.0) bring more profits.

Now let’s go a step further and bring cointegration into the game. Keep in mind (as I described in the beginning) that lower p_value != more cointegration. But still, let’s filter by p_value and see what effect it has on profits. I will now create 2 versions of previous chart — one with only pairs that have p_value < 0.05 and another with p_value < 0.01. Scroll back to “Running the tests” section to see previously calculated p_values between all pairs.

Excellent, this is what we wanted and expected - lower coint p_value => more profits. Let’s merge the charts for better overview.

As a final test, let’s see the correlation between coint p_value and profits on scatter plot:

Can’t say there is much correlation; the ratio of profits to losses looks very similar across different p_values. I can see only a slight edge as the coint p_value goes lower. Let’s zoom in on p_value < 0.01.

Still not much. Let’s take a specific period+threshold pair from the above chart just to double-check our results. 5d + 3.5 seems like a good candidate, because there’s a big difference between p_values.

Ok, makes sense now — difference is visible. Let’s move forward.

Result analysis (BackTrader charts)

Now let’s view some individual results.

Let’s take some good performing pair like ADA-QTUM (avg +40%).

Unfortunately, most of the profits come from a huge anomalous spike on Mar 15. In real life, there is a good chance you would NOT have caught that in a trade; that’s why it’s so important to always inspect results visually. If we scroll back to the overall chart, we can see that a lot of the best results come when QTUM is involved. Now we know why.

Let’s try something good where QTUM is not involved: XRP-ZEC (avg +18%)

Now BTC-XRP (avg +16%)

Up next: NEO-DASH (avg +28%)

The numbers look quite nice, but you should notice one suspicious thing here. Which brings us to our next problem and our next section.

The Most Important Question

I’ve been asked a lot of questions in Discord and they usually sound like:

  • Which NN lib is the best? Convnet is 4 years old, should I switch to Tensorflow?
  • What params did you use for NN optimization? I’m sure they were all wrong, NN strats rock, you just didn’t find the right ones.
  • How much train data did you use? You need to retest NN strats with X years of training data, then it will totally profit.

There is one very basic and much more important question to be asked when you are viewing someone else’s results and that is — what commission are you using? As I’ll show, it changes everything.

In all results previously, I used 0.1% commission. While possible on paper (e.g. Binance has 0.1%, Bitmex has 0.075%), real life brings some hidden costs (like deposit/withdrawal fees, slippage, etc.) which bring the real cost of trading up. That doesn’t sound like much, but when you actually calculate it, there is a meaningful difference. So let’s increase it and see what happens. I’ll retest with 0.2% and 0.3% commission and compare that to the 0.1% results.

New Commissions

Let’s start with the overall results, grouped by coin. Since there are 136 (pairs) x 3 (commission) = 408 results, I’ll just split them in 3 rows for better readability.

This doesn’t look that bad. On average we see a small drop, maybe 1–5%. No dramatic changes. But let’s try another angle — grouped by spread period + threshold:

Quite a huge difference on the left side: a winning strategy turning into a losing one! Why only on the left side? Results are sorted by spread period first and threshold second, and smaller values force more trades, therefore more cash is spent on commissions.
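A back-of-the-envelope calculation shows how quickly this adds up. With the 50/50 split used here, reversing the position means flipping each leg from +0.5 to -0.5 of the portfolio, so each reversal trades about 2x the portfolio value in total. The reversal counts below are made-up illustrations, and the sketch ignores compounding and slippage.

```python
def reversal_cost(commission, leg_weight=0.5, legs=2):
    """Commission for one position reversal, as a fraction of portfolio value.

    Flipping a leg from +w to -w trades 2 * w of notional.
    """
    return legs * 2 * leg_weight * commission


def total_drag(n_reversals, commission):
    """Approximate total commission over a backtest (no compounding)."""
    return n_reversals * reversal_cost(commission)


# Hypothetical counts: a high threshold trades rarely, a low one trades often.
for n_reversals in (5, 50):
    for commission in (0.001, 0.002, 0.003):
        drag = total_drag(n_reversals, commission)
        print(f"{n_reversals:>2} reversals at {commission:.1%}: {drag:.1%} of portfolio")
```

Under these assumptions, 50 reversals cost about 10% of the portfolio at 0.1% commission and about 30% at 0.3%, while 5 reversals cost 1% and 3%.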

Let’s try filtering only pairs where p_value < 0.01 and see if that brightens up our day:

Yes, the average looks way better. But there is one very important thing to notice here. Remember the conclusion from the previous section, where I wrote:

Looks like shorter period (5/7d) and smaller thresholds (0.5/0.7/1.0) bring more profits.

The tables have turned a bit. While 5/7d spread period still looks good, 0.5/0.7/1.0 threshold is clearly not the way to go when more realistic commission is chosen.

Conclusion

So, after all these charts and statistics, can we filter out some reasonable defaults and see a summary of how the actual results might have looked?

I think these are reasonable values to filter by:

  • p_value < 0.01
  • spread_period in [5d, 7d]
  • threshold in [3.0, 3.5, 4.0]
  • commission in [0.2%, 0.3%]

Here are the final results, with each commission in a separate chart. They look pretty equal, because it’s the larger thresholds that don’t produce that many trades.

Is the strategy overall usable? Right now, this looks ok, but for me there needs to be more research before putting this LIVE:

  • Add rolling interest fees. They differ for each exchange and also for each coin, but they could have quite a large effect on profits. For example, Kraken charges 0.01% per 4 hours.
  • Test more/longer periods (here we have only 3 months of data).
  • Test advanced Gekko strategies from previous parts on the same time period to have something to compare to.
  • The Jan 1st to Apr 1st period was relatively neutral. Since we had a huge spike in recent weeks, it would be nice to add that and see what happens with pairs trading in that kind of market.
  • Try more advanced pairs implementations, like OLS / Kalman / Copula.
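To get a feel for the first item, here is a rough estimate of what the rolling interest from the Kraken example would add up to over the 90-day backtest period. It assumes the short leg is half the portfolio and stays open the whole time, uses simple (non-compounded) interest, and ignores any opening fees. The function name is my own.

```python
def borrow_cost(rate_per_period, hours_per_period, days_held, short_weight=0.5):
    """Simple-interest borrowing cost as a fraction of portfolio value."""
    periods = days_held * 24 / hours_per_period
    return periods * rate_per_period * short_weight


# 0.01% per 4 hours, held for the 90 days from Jan 1st to Apr 1st.
on_borrowed = borrow_cost(0.0001, 4, 90, short_weight=1.0)
on_portfolio = borrow_cost(0.0001, 4, 90)

print(f"cost on the borrowed amount: {on_borrowed:.2%}")
print(f"cost on the whole portfolio: {on_portfolio:.2%}")
```

That works out to about 5.4% of the borrowed amount, or about 2.7% of the portfolio, which is in the same range as the commission differences tested above.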

I’ll leave it at that. As always, make sure you run your own tests if you are thinking about running this LIVE. Good luck!

Want to discuss my results or have questions? Find me on Discord — deandree#7313.
