# Intro to Statistical Arbitrage in Crypto — Pairs Trading [Part8]

May 20 · 17 min read

This is Part 8 of a multi-part series.

In this part, I start to explore more advanced strategies that are rooted in math and statistics. When you look for actual quant strategies, one of the first things that comes up is pairs trading, a subset of statistical arbitrage. Some might argue that this is not a real quant strategy, but it’s still a step up compared to indicator-based ones.

Over the years, pairs trading has seen a steady decline in results (like most strategies once they go public). Some sources claim that pairs had OK results until the mid-2000s. But since Crypto is a new and less mature market, some rules don’t apply here, at least not yet. So the question of this post is: could we still see some inefficiencies in the market that can be exploited with pairs trading?

There are quite a few pairs trading posts already, so how is this one different? Well, first of all, most of them focus heavily on the math side, explaining Cointegration/Hurst/ADF/etc. in depth, and that’s OK, but I really want to see the actual results of backtests. Also, a lot of posts simply skip some real-life limitations/requirements for actually executing this LIVE. For example: which coins can you actually short?

Why go for more advanced strategies, you might ask? When you’ve run tons of backtests like I have, you start to notice the seeming randomness and lack of positive results of indicator-based strategies. For a while you keep thinking “I just need to find the magical indicator X with magical params a & b”, but then you realize there is none.

Scroll to the bottom (“Result Overview”) if you know all about math/stats of pairs trading and just want to see the results/charts.

In pairs trading we make the assumption (based on past data and math tests) that the prices of two assets are connected in such a way that their ratio hovers around some mean, diverging up and down by some amount, and therefore forms a mean-reverting relationship that can be exploited. From time to time this divergence (also called the spread) becomes large enough to present a trading opportunity: you bet that the price ratio will return to the mean. You do that by going short the over-performing asset and going long the under-performing asset.
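The idea can be sketched in a few lines of plain Python. This is only an illustration, not the strategy code used later in this post: the spread is assumed to be the log price ratio, the signal is assumed to be a z-score against the whole window, and `zscore_signal` and `entry_threshold` are hypothetical names.

```python
import math
import statistics


def zscore_signal(prices_a, prices_b, entry_threshold=2.0):
    """Return (z, action) for the latest bar of a pair of price series."""
    # Spread is assumed to be the log price ratio; other definitions exist.
    spread = [math.log(a / b) for a, b in zip(prices_a, prices_b)]
    mean = statistics.fmean(spread)
    std = statistics.pstdev(spread)
    if std == 0:
        return 0.0, "flat"  # no divergence at all, nothing to trade
    z = (spread[-1] - mean) / std
    if z > entry_threshold:
        return z, "short A / long B"  # A over-performed relative to B
    if z < -entry_threshold:
        return z, "long A / short B"  # A under-performed relative to B
    return z, "flat"


# Toy data: A suddenly jumps while B stays put
prices_a = [100.0] * 20 + [130.0]
prices_b = [100.0] * 21
z, action = zscore_signal(prices_a, prices_b)
print(round(z, 2), action)
```

A real implementation would compute the mean and standard deviation over a rolling window and only from past bars, so the signal does not peek into the future.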

Here is a typical pairs trading strategy in a nutshell:

# Finding Pairs to trade / Math tests

How do you find “connected” pairs? There are multiple tests with tons of posts about them, so I’ll just give a brief explanation, show you the results, and point out some things to look out for.

## Understanding Cointegration and p_values

It’s important to understand the output of the cointegration test. It starts with a null hypothesis that there is no cointegration. It returns multiple values, and one of them is the p_value. There are many common misconceptions about the p_value and what it means, so it can be very misleading. First, it’s important to understand what the p_value is NOT:

The official Wikipedia definition is very confusing. I think this one is easier to understand:

p-value = probability of observing a result at least as extreme as the one you got, given that the null hypothesis is true. It is not the reverse (the probability that the null hypothesis is true given the result), which is the most common misinterpretation.

So, how do we interpret the p_value in the context of the coint test? The official definition from the statsmodels lib:

The Null hypothesis is that there is no cointegration, the alternative hypothesis is that there is cointegrating relationship. If the pvalue is small, below a critical size, then we can reject the hypothesis that there is no cointegrating relationship.

How small do we need the p_value to be? The lower, the better. 0.05 is commonly used as the cutoff for significance, sometimes even 0.01, but it depends on how many test cases you are running: the more cases, the lower it should be, because each extra test is another chance of a false positive. Still, since the output is a probability, nothing is guaranteed, so we must inspect the results visually one by one.
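To make “the more cases, the lower it should be” concrete, here is a small sketch for the 136 pair tests run later in this post. The Bonferroni correction used here is one common (and conservative) way to tighten the cutoff; it is my illustration and not something the tests in this post apply. The function names are hypothetical.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of 'significant' results if no pair were truly cointegrated."""
    return alpha * n_tests


def bonferroni_cutoff(alpha, n_tests):
    """Per-test cutoff that keeps the overall false-positive rate near alpha."""
    return alpha / n_tests


n_tests = 136  # one cointegration test per pair
for alpha in (0.05, 0.01):
    print(
        f"alpha={alpha}: "
        f"~{expected_false_positives(alpha, n_tests):.1f} false positives expected, "
        f"Bonferroni cutoff={bonferroni_cutoff(alpha, n_tests):.6f}"
    )
```

In other words, at a 0.05 cutoff you would expect several pairs to look cointegrated by pure chance, which is one more reason to inspect the charts by hand.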

If you want to dive deeper into p_values and statistical significance, I suggest these articles. If you want a less math-heavy explanation, I found this to be a great analogy with “innocent until proven guilty”.

## Running the tests

Here are the results (Cointegration/Hurst/ADF) for the period from May 1st 2018 to Jan 1st 2019.

I’m using the same TOP Coins from previous parts (17 in total).

A few things to notice here. Most of the cointegration shows up among the smaller altcoins. The big coins show almost none. ZRX seems weirdly cointegrated with everything, which doesn’t make much sense statistically.

With Hurst it was difficult to get conclusive results. First of all, all 3 versions of the algorithm gave different results. Second of all, the `lag` parameter also changed the picture. I ended up using `lag=150`, because that gave the widest range of results (`0.4–0.9`), but I didn’t analyze them too much.
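For reference, here is a minimal sketch of one common way to estimate the Hurst exponent: measure how the standard deviation of lagged differences scales with the lag and take the slope on a log-log scale. This is not necessarily any of the 3 versions I tried, and `max_lag` is a parameter of this sketch that may not correspond exactly to the `lag` mentioned above.

```python
import math
import random
import statistics


def hurst_exponent(series, max_lag=20):
    """Estimate Hurst from the scaling of lagged differences (H ~ 0.5 for a random walk)."""
    log_lags, log_tau = [], []
    for lag in range(2, max_lag):
        diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
        tau = statistics.pstdev(diffs)  # spread of the lagged differences
        log_lags.append(math.log(lag))
        log_tau.append(math.log(tau))
    # Least-squares slope of log(tau) against log(lag)
    mean_x = statistics.fmean(log_lags)
    mean_y = statistics.fmean(log_tau)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_lags, log_tau))
    den = sum((x - mean_x) ** 2 for x in log_lags)
    return num / den


random.seed(42)
random_walk = [0.0]
for _ in range(5000):
    random_walk.append(random_walk[-1] + random.gauss(0, 1))
noise = [random.gauss(0, 1) for _ in range(5000)]  # strongly mean reverting

h_walk = hurst_exponent(random_walk)
h_noise = hurst_exponent(noise)
print(f"random walk: {h_walk:.2f}, white noise: {h_noise:.2f}")
```

Changing `max_lag` changes which time scales enter the regression, which is one reason the estimate moves around so much on real price data.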

ADF shows similarities with cointegration, as it should.

From now on I’ll focus on the cointegration results, since cointegration is the core of pairs trading. I just wanted to show the others as an example.

Let’s take a look at some of the pairs and their ratios and see if the results make sense. Let’s start with something where `p_value` is in the 0.1 to 0.2 range: BTC-ZRX / ETH-DASH / XLM-ZEC.

Now let’s try something where `p_value < 0.01`: ETH-OMG / EOS-LTC / QTUM-OMG.

Now let’s take something where `p_value > 0.8`: BTC-ETH / ETH-XRP / XLM-QTUM.

As we can see, the test results are inconclusive. You can’t just take all your data, throw it at the algorithm, take the lowest p_value and say: I’ll trade this because the test says this is the most cointegrated pair there is.

Side note: a pair might be perfectly mean reverting, but the amount by which it reverts could be too small to profit from once commissions are included.

Now that we have cointegration results, what’s next? We need to solve some technical difficulties to make pair backtesting possible. First, we need a platform to run the backtests on.

# Platforms

Since pairs trading requires some relatively advanced features, not all platforms support it. The two main features needed are:

Turns out, this is quite rare.

As you might know, I’ve previously written a few posts about Gekko backtesting specifically, but Gekko lacks both features needed for shorting, so we need something else.

So I went on a little platform search.

## QuantConnect / LEAN

Written in C#. Very complex, with tons of code, which makes the source much harder to customize, although the code is very clean. Optimized mostly for more traditional assets; Crypto is an afterthought. Since it’s C#, it runs best on Windows. I was able to get it running on Ubuntu with Mono, but it was a struggle, came with a performance penalty, and there is no UI for non-Windows systems. It also supports Python strats, but being a multi-language platform makes debugging harder. Great documentation, but a bit fragmented. A good number of free/public strategies. Very custom data format, so it’s hard to make it use data from a DB. Supports multi-assets / universe selection, which is huge. Supports shorting.

## Backtrader

Has great documentation with some nice witty jokes and very clean code. One of the most feature-rich platforms. One of the simplest to use and quickest to get started with, which is huge. Has relatively good charting, although not interactive. There is a Bokeh plugin, but I haven’t tried it. Has LIVE trading, but Crypto requires an external engine; I haven’t tried it, but it seems legit, judging from this post. Supports multi-assets. Supports shorting. Only problem: it’s slow.

## Catalyst

Based on Zipline, repurposed for Crypto. Way too slow. Almost no free/public strats that I’ve found. Supports shorting, at least on paper. Found the features to be quite lacking with this one.

## Backtesting.py

I’ve rarely seen this mentioned anywhere, but I found it and it’s great for me. Best out-of-the-box charting, super extensible if you are a coder. Super fast. Very short and clean code; it does a few basic things and does them well. Doesn’t support shorting, but since it is so minimalistic, that’s quite easy to develop yourself. If you want to develop your own platform, this can serve as a good starting point.

Side note: none of these platforms has anywhere close to the number of publicly available strategies that Gekko has.

There are a ton of other platforms, but most of them aren’t actively developed. Here is the list of lists, if you are interested. I haven’t tried most of them personally.

Every platform lacked something very important for me, but in the end I chose backtrader, mainly because of its ease of use and the ability to get started quickly.

# Strategy Implementation

I tried to use the default pairs-trading.py strategy as a starting point, but it didn’t work for me out of the box, so I had to make some adjustments.

EDIT (May 21st): I previously had two fixes here that involved making changes in backtrader source, but u/mementix (the author of backtrader) pointed out to me that there is a better way of doing things in the platform — by extending instead of changing source.

First, there was a problem in the ols.py file: the statsmodels lib has changed its API since the strategy was created. Instead of changing the source, you should copy `class OLS_Slope_InterceptN` and `class OLS_TransformationN` into your own code as external indicators and fix this line:

`p1 = sm.add_constant(p1, prepend=self.p.prepend_constant, has_constant='add')`

Next, there was an issue with comminfo.py. `getsize()` casts all order sizes to `int`, but in Crypto that’s not realistic because you won’t be buying round numbers of BTC each time, if ever. Again, instead of changing the source, you should create your own commission scheme. Here’s the one I used:

```
import backtrader as bt


class CommInfo_Crypto(bt.CommInfoBase):
    params = (
        ('stocklike', True),
        ('commtype', bt.CommInfoBase.COMM_PERC),
        ('percabs', True),
    )

    def getsize(self, price, cash):
        # Allow fractional order sizes instead of truncating them to int
        return self.p.leverage * (cash / price)
```

After the fixes, I played with the strategy and felt like some things could be simplified. To sum up, what I did was:

```
# Long the first asset with 50% of portfolio value, short the second with 50%
self.order_target_percent(data=self.data0, target=0.5)
self.order_target_percent(data=self.data1, target=-0.5)
```

While not technically correct (you are borrowing funds when shorting + you need margin account + you need to pay rolling interest for borrowing), I think it’s OK for purposes of this post.

You can see the full code in my repo here.

# Which exchanges support shorting?

To be able to short coin X, somebody has to be willing to lend you the coin (which implies interest rate that is often forgotten), so you can sell it now + buy and give back later when it’s hopefully cheaper.
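To show how the often-forgotten interest eats into a short, here is a rough sketch of the net result of a single short trade. All numbers are made up for illustration, interest is assumed to be simple daily interest on the borrowed notional, and `short_pnl` is a hypothetical helper; real exchanges each have their own fee and interest rules.

```python
def short_pnl(entry_price, exit_price, size, daily_borrow_rate, days_held,
              commission=0.001):
    """Net P&L of a short: sell at entry, buy back at exit, minus costs."""
    gross = (entry_price - exit_price) * size
    # Simple (non-compounded) interest on the borrowed notional
    interest = entry_price * size * daily_borrow_rate * days_held
    # Commission is paid on both the opening sell and the closing buy
    fees = (entry_price + exit_price) * size * commission
    return gross - interest - fees


# Short 1 coin at 100, buy back at 90 after 10 days, 0.1% daily interest
print(round(short_pnl(100, 90, 1, 0.001, 10), 2))
```

The longer the pair takes to revert, the more of the gross profit goes to the lender.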

As you can imagine, this comes with some additional technical complexity and risk, both for the exchange and for you. This means that being able to short is not something to be taken for granted: not a lot of exchanges support shorting, and those that do support it only for a few selected coins. I did some research and found that most of the coins in my tests are supported, but:

Coins get delisted from margin trading (shorting) all the time because of low volume etc., so the supported list keeps changing.

I found 2 nice compiled lists, a bit old (mid 2018), but still a good overview:

If you want to dig in yourself and find the latest supported coins, I have compiled a list with exchange info pages that describe what is supported:

Note: I’m not 100% sure that all of those support Perpetual Contracts (instead of Futures). To be able to execute pairs trading, you must be able to exit at any time, instead of at a specific fixed time in the future, like with Futures Contracts. If you are thinking of executing this LIVE, you must know what types of contracts your chosen pair supports and on which exchanges.

# Testing Setup

17 coins give us C(17, 2) = 17 × 16 / 2 = 136 unique pairs in total.
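The pair count is easy to double-check with `itertools.combinations`. The coin names below are placeholders, not the actual list of coins used in the tests.

```python
from itertools import combinations
from math import comb

# Placeholder tickers standing in for the 17 coins
coins = [f"COIN{i}" for i in range(1, 18)]

# Every unordered pair once: (A, B) is the same pair as (B, A)
pairs = list(combinations(coins, 2))

print(len(pairs))    # 136
print(comb(17, 2))   # same number from the formula n * (n - 1) / 2
```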

Instead of backtesting only pairs with low coint p_value, I’ll test them all and later show you the difference between profits/losses of higher and lower p_values.

To try to simulate real life and avoid look ahead bias (as much as possible):

There are 2 main variables involved in the strategy:

I won’t try to choose a single value for each beforehand, but will instead create a grid of possible values and run them all, to see which combinations give the best results:
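Building such a grid is a one-liner with `itertools.product`. The values below are illustrative assumptions only: some of them (5d/7d periods, thresholds like 0.5/0.7/1.0 and 3.5) appear in the results in this post, the rest are made up to show the mechanics.

```python
from itertools import product

# Illustrative values, not the exact grid used for the results
spread_periods_days = [5, 7, 14, 30]
thresholds = [0.5, 0.7, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]

# Every (period, threshold) combination becomes one backtest per pair
grid = list(product(spread_periods_days, thresholds))

n_pairs = 136
total_backtests = n_pairs * len(grid)
print(len(grid), total_backtests)
```

The number of backtests grows multiplicatively with every extra parameter value, which is why backtest speed matters when choosing a platform.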

# Result Overview

First, here are the overall mean percent profits, grouped by `COIN1` and `COIN2` (basically by pair). All periods / thresholds are included and profits are averaged.

Looks good. While on average we are negative, keep in mind this is all coins (whatever the coint p_value) and all params (periods/thresholds). Let’s split it up and see which periods/thresholds look better. Here are results grouped by `spread period + threshold`.

Looks like shorter period (5/7d) and smaller thresholds (0.5/0.7/1.0) bring more profits.

Now let’s go a step further and bring cointegration into the game. Keep in mind (as I described in the beginning) that lower p_value != more cointegration. But still, let’s filter by p_value and see what effect it has on profits. I will now create 2 versions of previous chart — one with only pairs that have `p_value < 0.05` and another with `p_value < 0.01`. Scroll back to “Running the tests” section to see previously calculated p_values between all pairs.

Excellent, this is what we wanted and expected - lower coint p_value => more profits. Let’s merge the charts for better overview.

As a final test, let’s see the correlation between coint p_value and profits on scatter plot:

Can’t say there is much correlation; the ratio of profits vs losses looks very similar across different p_values. I can see only a slight edge as the coint p_value goes lower. Let’s zoom in on `p_value < 0.01`.

Still not much. Let’s take a specific period+threshold pair from the above chart just to double-check our results. `5d + 3.5` seems like a good candidate, because there’s a big difference between p_values.

Ok, makes sense now — difference is visible. Let’s move forward.

Now let’s view some individual results.

Let’s take some good performing pair like `ADA-QTUM` (avg +40%).

Unfortunately, most of the profits come from a huge anomalous spike on Mar 15. In real life, there is a good chance you would NOT have caught that in a trade, which is why it’s so important to always inspect results visually. If we scroll back to the overall chart, we can see that a lot of the best results come when `QTUM` is involved. Now we know why.

Let’s try something good where `QTUM` is not involved: `XRP-ZEC` (avg +18%)

Now `BTC-XRP` (avg +16%)

Up next: `NEO-DASH` (avg +28%)

The numbers look quite nice, but you should notice one suspicious thing here. Which brings us to our next problem and our next section.

# The Most Important Question

I’ve been asked a lot of questions in Discord and they usually sound like:

There is one very basic and much more important question to be asked when you are viewing someone else’s results and that is — what commission are you using? As I’ll show, it changes everything.

In all previous results, I used a 0.1% commission. While possible on paper (e.g. Binance has 0.1%, Bitmex has 0.075%), real life brings some hidden costs (like deposit/withdrawal fees, slippage, etc.) which push the real cost of trading up. That doesn’t sound like much, but when you actually calculate it, there is a meaningful difference. So let’s increase it and see what happens. I’ll retest with 0.2% and 0.3% commission and compare that to the 0.1% results.
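Here is a back-of-the-envelope sketch of that calculation. It assumes each leg is 50% of the portfolio (as in the `order_target_percent` calls above), ignores compounding and slippage, and uses a made-up number of round trips; `round_trip_cost` and `commission_drag` are hypothetical helpers.

```python
def round_trip_cost(commission, leg_weight=0.5, legs=2):
    """Fraction of the portfolio paid in commission for one pair trade.

    Each leg is traded twice (entry and exit), and each trade is
    charged commission on leg_weight of the portfolio.
    """
    return 2 * legs * leg_weight * commission


def commission_drag(commission, n_round_trips):
    """Approximate total commission drag, ignoring compounding."""
    return n_round_trips * round_trip_cost(commission)


n_round_trips = 50  # made-up trade count for one backtest period
for commission in (0.001, 0.002, 0.003):
    drag = commission_drag(commission, n_round_trips)
    print(f"{commission:.1%} commission -> ~{drag:.0%} of portfolio paid in fees")
```

Under these assumptions the drag scales linearly with both the commission and the number of trades, so a strategy that trades often feels the difference between 0.1% and 0.3% much more than one that trades rarely.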

## New Commissions

Let’s start with the overall results, grouped by coin. Since there are 136 (pairs) x 3 (commissions) = 408 results, I’ll just split them into 3 rows for better readability.

This doesn’t look that bad. On average we see a small drop, maybe 1–5%. No dramatic changes. But let’s try another angle — grouped by spread period + threshold:

Quite a huge difference on the left side: a winning strategy turning into a losing one! Why only on the left side? Results are sorted by spread period first, threshold second, and smaller values force more trades, therefore more cash is spent on commissions.

Let’s try filtering only pairs where `p_value < 0.01` and see if that brightens up our day:

Yes, the average looks way better. But there is one very important thing to notice here. Remember the conclusion from the previous section, where I wrote:

Looks like shorter period (5/7d) and smaller thresholds (0.5/0.7/1.0) bring more profits.

The tables have turned a bit. While the `5/7d spread period` still looks good, the `0.5/0.7/1.0 threshold` is clearly not the way to go when a more realistic commission is chosen.

# Conclusion

So, after all these charts and statistics, can we filter out some reasonable defaults and see a summary of how the actual results might have looked?

I think these are reasonable values to filter by:

Here are the final results, with each commission in a separate chart. They look pretty equal, because the longer periods/thresholds don’t produce that many trades.

Is the strategy overall usable? Right now, this looks ok, but for me there needs to be more research before putting this LIVE:

I’ll leave it at that. As always, make sure you run your own tests if you are thinking about running this LIVE. Good luck!

Want to discuss my results or have questions? Find me on Discord — deandree#7313.
