Build a BitCoin(tegration) Backtester

Learn the statistical technique of Cointegration and build your own crypto backtester to create and test a quantitative trading strategy.

This tutorial is in 2 parts — (you can run the backtester as a separate standalone module) :

  1. Learn the Statistical technique of Cointegration.
  2. Build a Bitcoin Backtesting engine using Python to analyze the performance of a Cointegration based trading strategy.

Just want the code? click here.

What are we building

We are going to build a python based event-driven backtester that pulls 2 crypto securities Bitcoin (BTC)and Bitcoin Cash (BCH) from an API, passes it through a trading strategy that uses the mean reverting cointegration spread between the 2 securities and generates buy/sell signals when the spread hits ± 1 stdev. We then send these signals to the Portfolio class which handles the logic of the backtester. One time stamp will be pulled and processed at a time, allowing us to see what would have happened tick-by-tick. Finally we print the results to console (or jupyter notebook) and print out the PnL (profit and loss).



  • what is time series
  • model assumptions
  • why does this happen
  • what is stationarity
  • orders of integration
  • cointegration

Time Series

To understand Cointegration we first need to look at time series.

In cross sectional (non time series) regression models, if we are trying to predict some output value ‘Y’ we would have (one or more) corresponding input feature values ‘X’, we would learn some mapping between X and Y using something like least squares regression on a training set, then assess the performance on a test set. All very straight forward.

Regular (cross sectional) multiple regression model

In the case of time series, instead of using exogenous features ‘X’ we can use lags of the target output ‘Y’ in the form:

Basic AR (autoregressive) time-series model

In simple terms, our predictor for today's observation is yesterdays observation. This is known as an Autoregressive model or AR model.

Model Assumptions

Most statistical models and techniques make the (often unrealistic) assumption of the input data being iid (independent and identically distributed); each data point being independent and all drawn from the same probability distribution.

For regression models we make the following assumptions:

  • IndependencePr [ rank ⁡ ( X ) = p ] = 1. The input features (X) are statistically independent from each other. A full rank feature matrix.
  • Linearity — y=b0 +b1x1…btXt. The relationship between dependent and independent variables is linear.
  • Homoscedasticity — E[ εi² | X ] = σ². The variance of the errors is constant.
  • Normality — ε ∣ X ∼ N ( 0 , σ²I n ). The error terms are normally distributed.
  • No Autocorrelation — E[ εiεj | X ] = 0 for ij. The error terms are uncorrelated.
  • Strict Exogeneity  E ⁡ [ ε ∣ …Xt-1, Xt, Xt+1… ] = 0. This assumes the errors are mean zero E[ε] = 0, and the errors are uncorrelated with the input features E[X.ε] = 0. Crucially this means each error term must be uncorrelated with every value of X past present and future.
  • Weak Exogeneity (Optional — we’ll come back to this) — E ⁡ [ ε ∣ …Xt-1, Xt] = 0. Similar to ‘strict’ form except expectation only applies to current and past values (not future values of X).

Failure to meet these any or all of these assumptions, can cause our models, whether inference or prediction, to be inefficient, inaccurate, incorrectly significant or harder to interpret than necessary. See here for more information.

However, when we try to extend these regression assumptions from the cross-sectional domain to time-series, the two assumptions that are hardest to meet are the last two: Lack of Autocorrelation and Strict Exogeneity . In particular if these conditions do not hold then our regression coefficient estimate is biased as is the variance of the coefficient estimate.

Why does this happen?

The Law of Large Numbers (LLN) states that if our variables are iid (and have a finite expected value), then the sample means, variances and covariances will tend towards the true, population moments.

Sample moments ==> Population moments

When we have this nice asymptotic property of sample moments tending towards population moments, our regression estimators are efficient and unbiased, our standard errors and confidence intervals are trustworthy.

However, if we modify the above formula to make xi a function of time, xt, then the sample mean no longer converges to the population mean, it diverges to infinity!

The mean diverges to infinity!

This example also extends to the variance and covariance diverging too. It should be obvious now that if we were to run regression or any statistical analysis on this time trend data, that our results would be unreliable. Furthermore, in some situations, as we increase the sample size, this can make our model even worse!

This is an example of non stationary data.

A more subtle example shows a time series that we call ‘cyclo-stationary’:

y(t)=μ + Acos(2πt) + e

in this example if we calculate the full sample mean, y/n does indeed converge to the population mean μ. However if we choose a fixed time window as our sample, say t` (tee prime), then we converge to a different mean: μ+Acos(2πt′). Note, this is still a constant mean, but the two means clearly are not equal:

μ != μ+Acos(2πt′)

This is an example of a time series that is (cyclo) stationary but not ergodic.

Only if we have both a stationary and ergodic time series, can we loosen our assumption of Strict Exogeneity to Weak Exogeneity. This allows us to meet the model assumptions and have a reliable model.

What is stationarity?

So we’ve learnt something about stationarity, but what is a stationarity?

Definition: If we take a time series {Xt} (or any sequence of random variables) and define the joint distribution of a consecutive sub sequence as Fx(Xt1…Xtn), then define a 2nd joint distribution from a 2nd sequence Fx(Xt1+ 𝜏…Xtn+ 𝜏) then for all 𝜏,t,n, if Fx== Fx, we have a stationary series in {Xt}.

If 2 random vectors have same distribution they are Stationary.
In words, a stationary process is one whose joint probability distribution doesn't change with a shift in time.

The above definition implies Strict Stationarity. If instead of the whole distribution being the same, we just have the mean and covariance consistent throughout the time series, then we have Weak Stationarity.

Note: iid is a stronger assumption than stationarity because stationarity makes no assumption about the data being independent, just that they are identically distributed.

All iid sequences are stationary but the reverse does not hold true.


The penultimate step before we get to Cointegration, is the concept of Integration denoted as I(i). Lets define a simple time series where the regressors are the error terms, defined as Y-Ŷ = ε, the true value minus the predicted value. The bj terms are the ‘weights’, denoting how much each error term influences Yt.

Moving Average (MA) series

Given this moving average series, if the following condition holds, then we call the series I(0). This conditions states that the autocorrelation (influence of the error terms on Yt’s) decays such that the variance of bk doesn't blow up to infinity. The mathy term is ‘square summable’.

Weights decay

Note: I(0) is necessary for stationarity but not sufficient, So all stationary series are I(0), but not all I(0) are stationary.

If we cumulatively sum an I(0) series we get an I(1) series. The following python code generates an I(0) series sampling from a standard normal, then cumulatively sums those values to get an I(1). Note how the I(1) looks remarkably like a stock price chart! We could reverse the I(1) by taking the 1st difference of the series, by taking each price minus the previous price:

x = pd.Series(index=range(1000))
#generate samples from standard normal
for i  in range(1000):
x[i] = (np.random.normal(0,1))
x_i_zero = x
#cumulatively sum the I(0) series to make it I(1)
x_i_one = np.cumsum(x)
plt.plot(x_i_zero, label = 'I(0)')
plt.plot(x_i_one, label = 'I(1)')
I(0) cumulatively summed = I(1)


At a high level, if a linear combination of two or more non-stationary time series is stationary, then the entire set of time series is considered cointegrated.

Given a set of time series (or any sequence of RV’s) {X1, X2, …, Xk}, if all series are I(1) as is usually the case with financial data, then if some linear combination of them evaluates to an I(0) series, we call the set of time series Cointegrated.

Formally, we are building a linear model (which we will see later can be done with regression) where the X’s are individually I(1) and therefore non stationary, that gives us a new singular time series Y, that is I(0) and stationary.


For example, if X1, X2, and X3 are all I(1), and the linear combination of 5X1+3X2+0X3=5X1+3X2 is I(0). Then in this case the time series set (X1, X2, X3) are cointegrated.

So how does this help us build a trading strategy? Well,

if we can find 2 or more time series that are cointegrated, then that cointegrated time series, by definition would be I(0 ) and mean reverting. So we could generate signals whenever the series moved far away from its mean on the expectation that it will move back to the mean over time.

Building a BackTester

Lets build a BitCointegration BackTester! (Now it makes sense right?)


  • hypothesis testing
  • the strategy
  • building the backtester
  • backtesting pitfalls

Our hypothesis

Before we begin our analysis and building of the backtester, we need to start with a hypothesis as to why two or more securities might be cointegrated.

This is an important starting point. If we simply scanned every tradable instrument over all time periods, then we would undoubtedly find a pair of instruments that showed cointegration. This is the curse of multiple comparison bias. Put simply, if you look at enough data, you will eventually find a result which matches your desired outcome, regardless of its statistical significance. Its crucial to understand this before we start and I have an entire research piece on this topic here.

For our research, we will be using Bitcoin (BTC) and Bitcoin Cash (BCH). The base economic rationale for this is simple: BCH is a fork of BTC, therefore our hypothesis is that the 2 instruments *might* be cointegrated. We keep it as simple as that and then test this hypothesis.

Note: my post today is not about bitcoin or blockchain or the relative merits of one instrument over the other. There are plenty of other forums for that!. My aim is to teach the concept of cointegration and how to test for it statistically and how to build a backtester from scratch along with the many pitfalls. To get a quick overview of the difference between BTC and BCH read here.

The strategy

Assuming that the analysis that we are about to do, finds that BTC and BCH are cointegrated, then based on what we have learned so far, we know that a linear combination of BTC and BCH, if cointegrated, will be stationary and therefore mean reverting. We will use this property to build a trading strategy, specifically as we’ll see in the next section, we will short the spread between BTC and b*BCH (b is a weight, that we will calculate) when it rises above 1 standard deviation (upper blue line) and long the spread if it moves below the lower blue line. Crucially we will take profit and close out the positition when the spread hits the mean (read line). Note, we dont have any stop loss implemented in this strategy, but it could easily be added.

buy/sell signals generated at ±1std

When we go long the spread, this means we buy BTC and sell b*BCH. When we generate a short signal we sell BTC and buy b*BCH. Because this type of pairs trading strategy can get quite complicated we will restrict each buy/sell amount to $1000 each time we execute a trade.

Example: if we generate a short signal, we would sell $1000 of BTC and buy $1000 of b*BCH

Th eagle eyed reader will notice that we are actually not buying $1000 of BCH but $1000 times some multiplier ‘b’ times BCH. Otherwise we would simply be trading the raw spread which is not cointegrated! We will account for this when we code the Portfolio class.

Here’s an overview of the Classes and methods we’ll be building:

  • DataPuller — to pull, align and organize the data from API
  • Portfolio — handles the logic of the cointegration pairs trade
  • Strategy — runs event-driven backtester with ‘Portfolio’ as base class

Lets get started with the usual imports:

#standard imports
import requests
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline
#nice trick to make plots full width
plt.rcParams['figure.figsize'] = [15,5]

We are going to use the CryptoCompare API to get the price data:

#fetch daily OHLC prices for BTC, BCH
btc = requests.get("").json()['Data']
bch = requests.get("").json()['Data']

Next we put the data into a Pandas Dataframe and change the time column to a proper Pandas DateTime object and use the rename function to change the duplicate columns of ‘close’ to unique names, ‘btc’ and ‘bch’. We also select our starting date as 2017–12–12 (more on this later).

#put into dataframe
btc_df = pd.DataFrame(btc)
bch_df = pd.DataFrame(bch)
#use pandas datetime feature to convert timestamp into a datatime object with units = seconds
btc_df['time'] = pd.to_datetime(btc_df['time'], unit='s')
bch_df['time'] = pd.to_datetime(bch_df['time'], unit='s')
#use the newly created datetime object as index
btc_df.set_index('time', inplace=True)
bch_df.set_index('time', inplace=True)
#rename 'close' for each instrument so they have unique names
btc_df.rename({'close':'btc'}, axis=1, inplace=True)
bch_df.rename({'close':'bch'}, axis=1, inplace=True)
#select our desired stating data
btc_df = btc_df.loc['2017-12-12':]
bch_df = bch_df.loc['2017-12-12':]

So here’s one of our cryptocurrency dataframes:


For our purpose we just want the closing prices of both BTC and BCH, so we will use the concat function in pandas to merge just the closing price columns:

#we'll work with just the closing pries for this project, so concatenate the 2 columns together.
df = pd.concat([btc_df['btc'], bch_df['bch']],axis=1)
#we'll also add the raw spread as a column
#calculate the spread between the 2 prices, for reference only.
#We will be trading the 'cointegration spread' instead.
df['spread'] = df['btc'] - df['bch']

So that's our dataframe sorted, now we need to test the 2 time series, BTC and BCH to see if they are cointegrated. To do this we will import the adfuller and coint modules from statsmodels and select a training sample from both of our cryptocurrencies. We choose a 5 month window from the beginning of 2018. We also create a ‘spread’ series showing the difference between BTC and BCH just for reference.

The function coint basically fits a regression model, like we have already discussed and tests the null hypothesis that there is no cointegration, meaning we want to see a small p-value, so we can reject the null.

adfuller tests for a ‘unit root’ which would indicate the series is non stationary. Again we want a small p-value.

adfuller implements the Augmented Dickey Fuller test for stationarity.

coint implements the Engle Granger 2-Step method for cointegration testing.

#test for cointegration
from statsmodels.tsa.stattools import coint, adfuller
import statsmodels.api as sm
#select a training sample
btc_train, bch_train = df['btc'].loc['2017-12-12':'2018-4-30'], df['bch'].loc['2017-12-12':'2018-4-30']
spread_train = btc_train - bch_train

Lets throw our training set into the coint function and what we’re looking for is a p-value (see here for more on p-values) below a 5% significance level:

This will imply BTC and BCH are cointegrated over the training period:
#return p value
#coint returns 3 values t stat, p-value and critical value
#in python we can unpack all three on one line
t,p,crit = coint(btc_train,bch_train)
#test for significance
if p <0.05:
print('NOT Cointegrated')

Great! Bitcoin and Bitcoin Cash appear to be cointegrated. Well if they’re cointegrated the spread between them must also be stationary right?

#use adf to test for stationarity
pval_spread = adfuller(spread_train)[1]
if pval_spread <0.05:
print(pval_spread,'Data is Stationary!')
print(pval_spread, 'Data is NOT Stationary!')
#note the spread itself is Not stationary as it assumes a 'Beta' value of 1
#so we need to construct a linear model to find the optimal Beta value...
Oops! The spread’s not stationary

So whats happened here? remember we defined cointegration as being a linear combination of the the time series that are stationary, not simply the raw 1-to-1 spread. But how do we find this linear combination? Well one technique we already know for finding a linear combination is linear regression!

If we have 2 time series X1,X2 then if we can define a linear model as:

which is therefore cointegrated and if we rearrange the algebra we can get:

In other words, if there is a linear combination of X1 and X2, that gives a spread which is I(0), then by definition the spread is stationary and mean reverting.

So what we need to do is build a simple linear model between BTC and BCH and use the slope coefficient ‘beta’ from that equation to build our stationary spread series defined as ‘z’:

#build linear model to find beta that gives I(0) combination of pair
X = sm.add_constant(bch_train)
result = sm.OLS(btc_train,X).fit()
#result.params returns the intercept (const) and slope of the model. #We can ignorethe intercept and use 'b' to build our cointegrated #series!?

#define new stationary spread as 'z'
#'b' value gives the parameter of our linear model
b = result.params['bch']
#simply define our new cointegrated series as z = btc - b*bch
z = btc_train - b*bch_train
intercept and slope coefficients of linear model

Now if we run the augmented dickey fuller test on this new linear combination of BTC and b*BCH:

#run adf again, this time on linear combination 'Z'
z_pval = adfuller(z)[1]
if z_pval<0.01:
print(z_pval,"Huzzah!, it's Stationary")
print(z_pval,":Not stationary")
Huzzah it’s stationary!

It’s stationary! This means we have found a linear combination of BTC and BCH which is stationary (at least over the training period).

Lets think about what the series ‘z’ actually is and how we can construct a trading strategy from it. ‘z’ represents the difference (spread) between 1 unit of BTC and 3.99 units of BCH, which we have shown to be stationary. Without going too deep into the inner workings of ADF (Augmented Dickey Fuller) it checks for a ‘unit root’ which is a fancy way of saying the moments (mean, variance etc) depend on time ‘t’ and are therefore non-stationary. We want to reject the null hypothesis that a unit root exists.

Lets produce some plots to show visually what we have done. The following code plots 3 charts

  1. The raw spread between BTC and BCH (not cointegrated), we use this as a reference.
  2. THIS IS IMPORTANT — we plot the full time series of BTC — b*BCH (training + test set) ASSUMING that stationarity hold not just for the training set but ALSO the test set (we will see later the consequences of this).
  3. Shows a plot of the daily returns, we don't actually use this series but we include it as a potential alternative predictive feature that we might want to test.

This final point (3) highlights the fact that I have arbitrarily selected ± 1 standard deviation on the cointegrated spread, as our trading strategy. We could just as easily use daily returns breaching 1.2345 stdev as our signal, or anything else!

Marked in green is the end of training set|begining of test set. We will see later why our assumption of stationarity in the training set holding true for the test set, is a bad idea!

#calculate cointegrated series 'full_z' for the whole (train + test) dataset
spread = df['spread']
full_z = df['btc'] - b*df['bch']
#lets plot the raw spread, the stationary spread and for reference the 'spread daily percent change' or 'returns'
#the green vertical line shows the end of the training set period.
fig,ax = plt.subplots(3,1,sharex=True)
#stationary series 'z' plotted with 1 standard deviation horizontal bars shown
#note standard dev bars are arbitrary and could be anything
ax[1].set_title("Linear model 'z'")
#plot inverse so its same as 'Spread'
full_z_mu = full_z.mean()
ax[1].axhline(full_z_mu+full_z.std(),ls ='--')
ax[1].axhline(full_z_mu-full_z.std(),ls ='--')
#spread pct change  / returns with 1 standard deviation horizontal bars shown

spread_pct = spread.pct_change(1)
ax[2].set_title('Spread daily % change')
#mark end of training sample in green
for i in range(3):
spread v ‘stationary spread’ v daily returns

By looking at the raw spread between BTC and BCH compared to the stationary spread, we can get a visual confirmation that the linear modeled spread ‘z’ appears to be reasonably bounded between ±1 std (blue lines) and centered around a constant mean (red line). This property seem to hold beyond the training set period (green line) too, but more on this later. The daily returns shown in the lower of the 3 plots, is just for reference and to show how volatility seems to correlate with major changes in the spread.

The Backtester

And finally we get to the actually backtester!

The first component we will build is the Data_Puller class.

we begin with the __init__() function by defining some variables we will use throughout the backtester; ticker1, ticker2 for holding the crypto pairs and a pandas dataframe named df3 to store the final results.

we define 2 functions (actually they’re methods because they are within a class):

get_data()- to pull and merge the data from the API

fetch_data()- to return the final dataframe so we can pass it to the next component.

#Data_Puller fetches crypto data, cleans then passes to container df3
#Class to store data for any pairs, crypto or otherwise
class Data_Puller:
def __init__(self,ticker1,ticker2,freq,periods):
self.ticker1 = ticker1
self.ticker2 = ticker2
self.freq = freq
self.periods = periods
self.df3 = pd.DataFrame()

#method to pull, munge, store crypto pairs data
def get_data(self):
#replace this in final merge
b = 3.995977
_data1 = requests.get(f"{self.freq}?fsym={self.ticker1}&tsym=USD&limit={self.periods}").json()['Data']
_data2 = requests.get(f"{self.freq}?fsym={self.ticker2}&tsym=USD&limit={self.periods}").json()['Data']
df1 = pd.DataFrame(_data1)
df1_close = df1['close']
df2 = pd.DataFrame(_data2)
df2_close = df2['close']

df1['time'] = pd.to_datetime(df1['time'],unit='s')
df1.set_index(df1['time'], inplace = True)

df2['time'] = pd.to_datetime(df2['time'],unit='s')
df2.set_index(df2['time'], inplace = True)
df1 = df1.drop(['high','low','open','volumefrom','volumeto','time'] ,axis=1)
df2 = df2.drop(['high','low','open','volumefrom','volumeto'] ,axis=1)
df1.rename(columns={'close': 'BTC'}, inplace=True)
df2.rename(columns={'close': 'BCH'}, inplace=True)
self.df3 = pd.concat([df1,df2],axis=1)
#self.df3['spread'] = self.df3[self.ticker1] - self.df3[self.ticker2]
#self.df3['spread_pct_change'] = self.df3['spread'].pct_change()
#add cointegration model X1 - X2 = should be stationary
self.df3['full_z_coint'] = self.df3['BTC'] - b*self.df3['BCH']
self.df3['b_x_bch'] = b*self.df3['BCH']

#prints df to check data

#returns the final dataframe, with 1st element dropped as its nan for spread_pct_change
def fetch_df(self):
return self.df3.loc['2017-12-12':]

To show the output of this class, lets instantiate the class and pass in the arguments (‘BTC’,’BCH’,’day’,500) , day is the frequency and 500 is the number of days.

to display all the results we can use pd.set_option(‘display.max_rows’, 400) to show the first 400 entries.

x = Data_Puller('BTC','BCH','day',500)
#pd.set_option('display.max_rows', 400)
#instantiate Data_Puller class then fetch_data
q = x.fetch_df()

At this stage its a bit ugly, but is has all the data we want. The full_z_coint column shows the spread between BTC and b*BCH (its the same as the ‘z’ variable we defined earlier in the post), this is effectively the ‘instrument’ we are going to trade as its stationary and mean reverting. Of course to trade this ‘spread’ we actually need to take a long position and a short position in BTC or b*BCH.

The variable b*BCH shows the value of BCH multiplied by the learned parameter ‘b’ which is approx 3.99, as we derived earlier.

So that’s Data_Puller. Next up is the Portfolio class. This is the most complex component as it does all of the heavy lifting in terms of trading logic and execution.

Lets walk through it.

The __init__() function defines a dataframe called _port() and we pass in the column names;

ts=time stamp, signal= buy sell or hold (this logic will be built in the Strategy class next), action=indicates what action we took given the signal(bought,already bought, closed out etc), sold/bought value=dollar value of trade, U_pnl=the unrealized profit/loss showing the running pnl, R_pnl=realized profit/loss once we have actually closed out a position.

Next we initialize a few variables that will track what position we currently hold, running pnl etc.

Next is the close_out() function. This is important as it will be used throughout the backtester logic to close any open position when the trigger is met. We define the close out trigger event as being when the price hits or crosses the mean value.

What follows is the logic to handle each of the possible trade signals, which will be generated in the Strategy class; Hold, Long, Short. In each of these situations we do the following:

  • check current_pos to see if we already have a position
  • calculate new sell/buy units by taking our $1000 initial value and adjusting to today's price and quantity
  • if we have a position and the new price has crossed the mean value ‘close out’ threshold, then we run the close_out() function
  • update current position to reflect any changes
  • print out any actions to console
  • pass the new current values down to the end of the class to be added to our _port portfolio dataframe
#Portfolio class handles trade logic
class Portfolio:
def __init__(self):
#self.orders = pd.DataFrame(columns=['TS','Order','tick1','tick2'])
self._port = pd.DataFrame(columns=['ts','signal','action','sold_value','bought_value','U_pnl','R_pnl'])
# self.current_budget = 1000000
self.signal = None
self.prev = None
#bought / sold
self.current_pos= "empty"
#self.pnl = pd.DataFrame(columns = ['pnl'])
self.bought_sold_price = 0
self.stamp = 0
#self.sold_value = 0
#self.bot_value = 0
self.sell_units = 0
self.buy_units = 0
self.value_2 = 0
self.value_1 = 0
self.rpl = 0

def close_out(self):
self.rpl += (1000 - self.value_2) + (self.value_1 - 1000)
self.current_pos ='empty'
print("close out position")

def position(self,ts,tick1,tick2,price,tot_trade_amount=2000):
print('current pos:',self.current_pos)
print("bought / sold price: ",self.bought_sold_price)
print('this is prev:', self.prev)
print('this is the signal:',self.signal)
single_trade_amount = tot_trade_amount/2
action = None

#logic for Hold signal

if self.signal =="Hold":

if self.current_pos =='sold':
print("sold tick")
self.value_2 = self.sell_units * tick2
self.value_1 = self.buy_units * tick1
elif self.current_pos == 'bought':
self.value_2 = self.sell_units * tick1
self.value_1 = self.buy_units * tick2
print("Hold neither bought nor sold")
self.value_2 = 0
self.value_1 = 0

print("hold 1")
print("caputured by Hold")

if self.current_pos == 'bought' and price >
print("hold 2")
action = "Closed out Long"
#self.current_pos ='empty'

elif self.current_pos =='sold' and price <
print("hold 3")
action = "Closed out Short"
#self.current_pos ='empty'
print("hold 4")
print("""take no action -> Hold""")
action = "Held"

#logic for Short signal

elif self.signal =='Short':

print("caputrd by Short")
sell_units = single_trade_amount/tick2
buy_units = single_trade_amount/tick1

if self.signal == 'Short' and self.signal != self.prev:
print("short 1")
if self.current_pos == 'bought':
self.value_2 = self.sell_units * tick2
self.value_1 = self.buy_units * tick1
elif self.current_pos == 'empty':

print("short 2")

print("Went short: sold",sell_units,"units of BTC","at a price of",tick2, "and bought",buy_units,"of b*BCH at a price of",tick1)
#self.sold_value = sell_units*tick2
#self.bot_value = buy_units*tick1
self.sell_units = sell_units
self.buy_units = buy_units
self.value_2 = self.sell_units * tick2
self.value_1 = self.buy_units * tick1
self.bought_sold_price = tick2 - tick1
self.current_pos = 'sold'
action = "Went Short!"
print("short 5")
print("current pos must be already sold - check!")
action = "Already Short!"
self.value_2 = self.sell_units * tick2
self.value_1 = self.buy_units * tick1
print("short 6")
print("prev signal must be Short - check!")
action = "Already Short!"
self.value_2 = self.sell_units * tick2
self.value_1 = self.buy_units * tick1

#logic for Long signal

elif self.signal =='Long':

print("captured by Long")
sell_units = single_trade_amount/tick1
buy_units = single_trade_amount/tick2

if self.signal == 'Long' and self.signal != self.prev:
print("long 1")
if self.current_pos == 'sold':
self.value_2 = self.sell_units * tick1
self.value_1 = self.buy_units * tick2
action = "short => close out"
elif self.current_pos == "empty":

print("long 2")

print("Went Long: sold",sell_units,"units of b*BCH","at a price of",tick1, "and bought",buy_units,"of BTC at a price of",tick2)
#self.sold_value = sell_units*tick1
#self.bot_value = buy_units*tick2
self.sell_units = sell_units
self.buy_units = buy_units
self.value_2 = self.sell_units * tick1
self.value_1 = self.buy_units * tick2
self.bought_sold_price = tick2 - tick1
self.current_pos = 'bought'
action = "Went Long!"
print("should be 1000", single_trade_amount)
print("tot trade amount", tot_trade_amount)
print("long 5")
print("current pos must be already long - check")
action = "Already Long!"
self.value_2 = self.sell_units * tick1
self.value_1 = self.buy_units * tick2
print("long 6")
print("prev signal must be long - check!")
action = "Already Long!"
self.value_2 = self.sell_units * tick1
self.value_1 = self.buy_units * tick2
print("not captured 1")
print("not captured by buy sell or hold need to fix!")

#print("tick1: ", tick1, "tick2: ", tick2)
#calculate unrealized pnl and finally update _port()
        urpl = (1000 - self.value_2) + (self.value_1 - 1000)
self._port.loc[len(self._port)] = [ts,self.signal,action,self.value_2,self.value_1,urpl,self.rpl]
self.prev = self.signal

Lets remind ourselves of the ‘spread’ that we are trading. Any time the price moves above the top blue line, we short the spread by selling BTC and buying b*BCH. If the price goes below the lower blue line we do the opposite; buy BCT and sell b*BCH. When the spread value crosses the mean (red line) we close out any position we may have.

Remember there is one thing missing from this trading strategy-stop loss!

This code generates a plot to show the ‘spread’ that we are trading. I’ve added the index number of the trades for reference:

#shows the spread we are trading with mean (red line) and +- 1std (blue lines)
_mu = np.mean(q.full_z_coint)
#plot every 5th index for debugging and reference
for i ,txt in enumerate([x for x in range(len(q))]):
if i%5==0:


The final component is the Strategy class. This executes the event driven backtester by pulling one row of data at a time from the Data_Puller class, passing it through the backtester logic handled by the Portfolio class via the position() method. This is the core of an event driven backtester. Rather that simply vectorizing the whole time series and applying the trading rules to all points at the same time, we simulate what would happen with our strategy tick-by-tick. This gives us a much more realistic simulation and an expandable framework whereby we could add in functionality to account for transaction costs, slippage, liquidity, microstructure events etc, into our backtest.

#create strategy to perform on any pair.
class Strategy(Portfolio):

def __init__(self):
#use Super to get Portfolio attrs
#price_feed = Data_Puller().fetch_df()
self.sdev = np.std(q.full_z_coint) = np.mean(q.full_z_coint)

#go long / short if +- 1 std, sell when hit mean
def strat(self):

while q.empty==False:

#pop .loc and drop it...

btc,bch,ts,z_coint,b_x_bch = q.iloc[0]

#compare to plus / minus 1 stdev -> generate signal
if z_coint > + self.sdev:
#self.orders.loc[len(self.orders)] = [ts,'Short',btc,bch]
self.signal = 'Short'

elif z_coint < - self.sdev:
#self.orders.loc[len(self.orders)] = [ts,'Long',btc,bch]
self.signal = 'Long'

#self.orders.loc[len(self.orders)] = [ts,'Hold',btc,bch]
self.signal = 'Hold'



#function to return tick by tick printout and R_pnl chart
def get_portfolio(self):
pd.set_option('display.max_rows', 400)
return self._port.head(360)
#return self._port

Lets run the backtester and see what happens. First we instantiate the Strategy class and run the strat() method. This will start printing out a real time tick by tick display of various bits of information to show what the backtesting engine is doing:

p = Strategy()

backtester prints out tick-by-tick events
sample of print out as backtester runs

Here’s a selection of what’s included in the print out:


previous signal (for debugging)

captured by shows which logic is triggered (short/long/hold) again for debugging,

short1, short2 references which part of the ‘short’ logic is activated

a string printout of what we’ve bought and sold

the two floating point numbers represent the ‘sell and buy units’ that is the amount of BTC and BCH that we buy given that each new position is always $1000 long and $1000 short.

finally we print out a plot showing the realized pnl and a pandas dataframe showing all the tick-by-tick events that have happened:

plot of Realized pnl in $
final printout of all events in backtester

We made a profit! But remember what we said earlier, the way this algo is structured means that there are no stop losses, only take profit signals (assuming that stationarity holds).

Backtesting pitfalls

The purpose of a backtester, is that it is a historical simulation of how a strategy would have performed.

Backtesting is one of the most misunderstood concepts in finance. In financial literature it is done badly, with many authors committing structural and statistical errors in their backtester. Below is a list of the “7 sins of quantitative investing” by Luo et al [2014].

  • Survivorship bias — using an investment universe that doesnt include companies that went bankrupt / delisted. The S&P500 today is different that 10yrs ago.
  • Look-ahead bias — using data that was not available at time of the simulation.
  • Storytelling — justifying the results after the event or simply selecting the data that fits your predetermined ‘story’.
  • Data snooping — incorporating test data in training data.
  • Transaction costs — simple backtesters don’t account for slippage, costs, fees etc.
  • Outliers — using extreme results with a low probability of ever occurring again.
  • Shorting — related to transaction costs, the cost of selling short is unknown unless you actually made the trade.
Glancing through this list you may notice something…We have committed most of these sins!

Not only did we fall prey to some of these pitfalls we also fell for many more! As an example, remember at the beginning of our analysis we arbitrarily split the data into training | test sets ? Well, look what happens when we shift our training set window forward by jut 2 weeks…its no longer cointegrated!

#shift training set window forward by 2 weeks
shifted_train_btc, shifted_train_bch = df['btc'].loc['2017-12-26':'2018-5-13'], df['bch'].loc['2017-12-26':'2018-5-13']
t,p,crit = coint(shifted_train_btc,shifted_train_bch)
#test for significance
if p <0.05:
print('NOT Cointegrated')

As another example, our original linear model based on the training set we know to be stationary, but what about the test set? after all, this is what counts when running a backtester:

The test set is not stationary!

#run adf again, this time on Test set
z_pval = adfuller(z)[1]
if z_pval<0.01:
print(z_pval,"Huzzah!, it's Stationary")
print(z_pval,":Not stationary")
plt.axhline(z.mean() + z.std(),color='b')
plt.axhline(z.mean() - z.std(),color='b')
test set model is NOT stationary!
test set model is NOT cointegrated!

Clearly we have committed a number of mistakes when it comes to building a statistically robust backtester. The main issues in our case, are based around the statistical properties of time-series data. In particular, that one window of data can have a particular distribution, but another (close by) window can be completely different!

There is a labrinthine rabbit hole we could go down here with backtesting pitfalls, but this post is long enough already.

Next steps

We have learnt the statistical technique of cointegration along with stationarity and time series. We then learned how to test for it using python statsmodels functions coint and adfuller.

We constructed a hypothesis for why BTC and BCH might be cointegrated, test for it and build a non-trivial, event driven backtester using python.

Finally we looked at what we did wrong and the potential pitfalls when conduction backtesting.

  • Now you should run the backtester using either the jupyter notebook or directly in the terminal using the .py file which can both be found here.
  • For further exploration, try using some of the other crypto pairs available on the cryptocompare API to see if they are cointegrated.
  • You could extend the logic of the Portfolio class to account for transaction costs.
  • You could try a rolling window when testing for stationarity, to ensure the whole series is stationary, not just a select window.
  • Currently the logic executes a new buy/sell trade simultaneously to when the signal was generated, you may want to try generating a buy/sell signal at close of play on one day, then executing the trade the next day ( this isn’t necessary for crypto which is 24/7 but equities have a fixed trading day).
  • I’d love to hear your thoughts on any of the topics discussed!
  • Read my other blog posts on statistical testing…
  • if you are interested in Deep Learning and want to learn how backpropagation works, check out this tutorial…
  • Follow me for more on quant finance, deep learning and more!
  • Say hi on twitter at