Conquer High Frequency Backtesting: A Quantitative Journey, Part V

Crypto Chassis
Open Crypto Trading Initiative
Feb 11, 2021
Photo by Jamie Hagan on Unsplash

In one of our previous posts, “Hammer Test Your Backtesting”, we discussed a technique called stratified sampling in depth and demonstrated it with a concrete example built on the backtrader framework. Leaving the territory of medium-to-low frequency trading, today we’ll climb the mountains of high frequency trading (HFT) strategies, focusing on one particular type of HFT: market making. The backtrader framework was designed for backtesting medium-to-low frequency strategies on traditional OHLC datasets; backtesting HFT market making is a much harder problem (see https://quant.stackexchange.com/questions/38781/backtesting-market-making-strategy-or-microstructure-strategy). Because traders (i.e. you) care more about practical, manageable, and deployable implementations than about math, theory, and PhD theses, the objective of this post is to show that building a minimalistic HFT market-making backtesting framework from scratch is feasible with limited effort. The idea presented here isn’t tied to any particular software or data provider (including us).

The demonstration market-making strategy used in this post is a simple one, sometimes called pure market making in the crypto community. At the beginning of every period (e.g. 5 seconds), place a buy order a certain percentage below the mid price and a sell order the same percentage above the mid price, both with the exchange’s minimum order size (to minimize our disruption of the market); then wait until the end of the period and cancel all pending orders. The goal is to backtest this strategy.

The substantial challenge in backtesting a market-making strategy is deciding what should happen to our orders after they are “submitted” to the replayed environment, i.e. whether these orders should be filled (and perhaps when). This problem is largely absent when backtesting a medium-to-low frequency strategy on OHLC datasets.

Here is our analysis. Market makers depend on market takers to fill their orders: a taker initiates a trade when it finds a maker’s price satisfactory, and these trades are recorded as historical trades. We can therefore use trade data to “fill” market-maker orders in the replayed environment. More specifically, for a given period, collect all trades initiated by a seller (i.e. the seller is the taker, so the buyer is the maker); if the lowest of their prices is ≤ our market-making buy order’s price, fill our buy order. Similarly, collect all trades initiated by a buyer (i.e. the buyer is the taker, so the seller is the maker); if the highest of their prices is ≥ our market-making sell order’s price, fill our sell order.
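The fill rule above can be expressed as a small pure function. This is a minimal sketch, not the authors’ code: `simulate_fills` and the `(price, is_buyer_maker)` tuple layout are hypothetical names chosen for illustration, and, like the rule in the text, it counts a trade exactly at our price as a fill and ignores queue position and partial fills.

```python
from typing import Iterable, Tuple

# Hypothetical layout: (price, is_buyer_maker). is_buyer_maker == 1 means the
# seller was the taker (seller-initiated trade); 0 means the buyer was the taker.
Trade = Tuple[float, int]


def simulate_fills(trades: Iterable[Trade], buy_price: float,
                   sell_price: float) -> Tuple[bool, bool]:
    """Return (buy_filled, sell_filled) for one period under the rule above."""
    trades = list(trades)
    # Seller-initiated trades are the ones that could have hit our resting bid.
    seller_taker_prices = [p for p, buyer_maker in trades if buyer_maker == 1]
    # Buyer-initiated trades are the ones that could have lifted our resting ask.
    buyer_taker_prices = [p for p, buyer_maker in trades if buyer_maker == 0]
    # An empty list means no trades of that direction, hence no fill.
    buy_filled = bool(seller_taker_prices) and min(seller_taker_prices) <= buy_price
    sell_filled = bool(buyer_taker_prices) and max(buyer_taker_prices) >= sell_price
    return buy_filled, sell_filled


# Made-up trades in one period, quoted around a mid price of 100.0.
trades = [(100.02, 0), (99.97, 1), (100.01, 0), (99.99, 1)]
print(simulate_fills(trades, buy_price=99.98, sell_price=100.02))  # (True, True)
print(simulate_fills(trades, buy_price=99.95, sell_price=100.05))  # (False, False)
```

Keeping the rule in a pure function like this makes it easy to unit-test in isolation before wiring it into a replay loop.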
The historical trade datasets used for this must be crystal clear about whether each trade was initiated by a buyer or a seller (see https://medium.com/open-crypto-market-data-initiative/leveraging-trade-directions-a-quantitative-journey-part-iv-cf96221b799e).
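Data sources may label a trade’s direction by the taker’s side or by the maker’s side, and mixing the two up silently swaps every fill decision. The sketch below is a hypothetical helper (`is_buyer_maker` and its `side_refers_to` parameter are illustrative names, not part of any provider’s API) that makes the convention explicit when converting such a label into the `is_buyer_maker` flag used in this post; which convention a given dataset uses has to come from its documentation.

```python
def is_buyer_maker(side: str, side_refers_to: str) -> int:
    """Hypothetical helper: map a 'buy'/'sell' label to an is_buyer_maker flag.

    Returns 1 if the buyer was the maker (i.e. the seller initiated the trade),
    0 if the buyer was the taker. side_refers_to says whose side the label
    describes: 'taker' or 'maker'.
    """
    side = side.lower()
    if side not in ('buy', 'sell'):
        raise ValueError(f'unknown side: {side!r}')
    if side_refers_to == 'taker':
        # A buying taker means the seller was the maker, and vice versa.
        return 0 if side == 'buy' else 1
    if side_refers_to == 'maker':
        # A buying maker is exactly the buyer-is-maker case.
        return 1 if side == 'buy' else 0
    raise ValueError(f'unknown side_refers_to: {side_refers_to!r}')


print(is_buyer_maker('sell', 'taker'))  # 1
print(is_buyer_maker('sell', 'maker'))  # 0
```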

With the above theory in mind, let’s prepare the required datasets using Python (the code snippets below are minimal and reproducible, though there is certainly room for improvement). To determine our market-making order prices, we need a pandas dataframe holding market mid prices. The following code snippet shows how to build it using https://github.com/crypto-chassis/cryptochassis-api-docs#market-depth:

import gzip
import pathlib

import pandas as pd
import requests

# Download one day of top-of-book snapshots (depth=1) for BTC-USD on Coinbase,
# caching the decompressed CSV locally so that reruns skip the download.
local_csv_name_1 = 'order_book_snapshot.csv'
local_csv_1 = pathlib.Path(local_csv_name_1)
if not local_csv_1.is_file():
    r = requests.get('https://api.cryptochassis.com/v1/market-depth/coinbase/btc-usd?depth=1&startTime=2021-01-21')
    file_url = r.json()['urls'][0]['url']
    csv_string = gzip.decompress(requests.get(file_url).content).decode('utf-8')
    with open(local_csv_1, 'w') as f:
        f.write(csv_string)
df_1 = pd.read_csv(local_csv_name_1, index_col='time_seconds')
# Each book level is stored as "price_size"; split it into two float columns.
df_1[['bid_price', 'bid_size']] = df_1['bid_price_bid_size'].str.split(pat='_', expand=True).astype(float)
df_1[['ask_price', 'ask_size']] = df_1['ask_price_ask_size'].str.split(pat='_', expand=True).astype(float)
df_1['mid_price'] = (df_1['ask_price'] + df_1['bid_price']) / 2
df_1 = df_1[['mid_price']]
print(df_1)

Output:

              mid_price
time_seconds
1611187200 35493.970
1611187201 35491.085
1611187202 35493.695
1611187203 35502.295
1611187204 35512.420
... ...
1611273595 30802.220
1611273596 30803.285
1611273597 30823.285
1611273598 30832.510
1611273599 30852.095

Self-explanatory. You can see that we’ve deliberately chosen a bad day (2021-01-21) for Bitcoin: its price dropped from ~$35.5K to ~$30.8K in a single day.
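During the replay we need, for each period start, the most recent mid price at or before that second. The replay code in this post does this with a boolean mask over `df_1`; pandas’ `Series.asof` performs the same lookup with a binary search on a sorted index, which avoids rescanning the whole day once per period. A minimal sketch with made-up prices (the series `mid` and the helper `mid_price_at` are hypothetical):

```python
import pandas as pd

# Made-up mid prices with a gap: no snapshot at t = 102 or t = 103.
mid = pd.Series([100.0, 100.5, 101.0],
                index=pd.Index([100, 101, 104], name='time_seconds'),
                name='mid_price')


def mid_price_at(series: pd.Series, t: int) -> float:
    """Latest mid price at or before t (boolean-mask version, as in the replay)."""
    return series[series.index <= t].iloc[-1]


for t in (101, 103, 104):
    # Series.asof returns the last non-NaN value at or before t
    # (the index must be sorted in ascending order).
    assert mid_price_at(mid, t) == mid.asof(t)
    print(t, mid.asof(t))
```

One difference worth knowing: for a `t` earlier than the first snapshot, `asof` returns NaN, while the boolean-mask version raises an `IndexError` on the empty selection.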

To simulate whether our market-making orders should be filled, we also need a second pandas dataframe holding raw trade information. The following code snippet shows how to build it using https://github.com/crypto-chassis/cryptochassis-api-docs#trade:

# Download the same day's raw trades, again caching the CSV locally.
local_csv_name_2 = 'trade.csv'
local_csv_2 = pathlib.Path(local_csv_name_2)
if not local_csv_2.is_file():
    r = requests.get('https://api.cryptochassis.com/v1/trade/coinbase/btc-usd?startTime=2021-01-21')
    file_url = r.json()['urls'][0]['url']
    csv_string = gzip.decompress(requests.get(file_url).content).decode('utf-8')
    with open(local_csv_2, 'w') as f:
        f.write(csv_string)
df_2 = pd.read_csv(local_csv_name_2, index_col='time_seconds')
# Keep only what the fill simulation needs: the price and the trade direction.
df_2 = df_2[['price', 'is_buyer_maker']]
print(df_2)

Output:

                 price  is_buyer_maker
time_seconds
1611187200 35498.85 0
1611187200 35498.83 0
1611187200 35498.31 0
1611187200 35494.23 0
1611187201 35493.70 0
... ... ...
1611273599 30854.66 0
1611273599 30854.66 0
1611273599 30854.66 0
1611273599 30854.61 0
1611273599 30855.95 0

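Notice that we need to know not only the price of each trade but also whether the buyer in that trade was the maker or the taker: is_buyer_maker = 1 means the buyer was the maker (the seller initiated the trade), and is_buyer_maker = 0 means the buyer was the taker (the buyer initiated the trade).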
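The replay in this post filters `df_2` once per period to find the two extremes that drive the fill rule. As an alternative sketch (not the authors’ code), the same per-period statistics can be computed for the whole day in one pass with `groupby`. The small `trades` frame below is made-up data with the same columns as `df_2`, and `period_start` is a hypothetical helper column.

```python
import numpy as np
import pandas as pd

period_seconds = 30
start = 1611187200  # 2021-01-21 00:00:00 UTC, the start of the replayed day

# Made-up trades with the same columns as df_2.
trades = pd.DataFrame(
    {'price': [35498.85, 35490.00, 35501.00, 35480.00, 35510.00],
     'is_buyer_maker': [0, 1, 0, 1, 0]},
    index=pd.Index([start, start + 5, start + 29, start + 31, start + 45],
                   name='time_seconds'))

# Label each trade with the start time of the 30-second period it falls in.
period_start = start + (trades.index - start) // period_seconds * period_seconds
trades = trades.assign(period_start=np.asarray(period_start))

# Lowest seller-initiated price and highest buyer-initiated price per period.
lowest_seller_taker = (trades[trades.is_buyer_maker == 1]
                       .groupby('period_start')['price'].min())
highest_buyer_taker = (trades[trades.is_buyer_maker == 0]
                       .groupby('period_start')['price'].max())
per_period = pd.DataFrame({'lowest_seller_taker': lowest_seller_taker,
                           'highest_buyer_taker': highest_buyer_taker})
print(per_period)
```

A period with no trades of one direction simply gets NaN in that column after alignment, which plays the same role as the NaN check in the per-period loop.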

At this point, we are ready to replay the historical market and simulate what our demonstration strategy can achieve. At the beginning of the day, we hold 0.1 BTC plus an equivalent amount of USD valued at that moment’s mid price: our initial 50/50 inventory. We then walk through the day in periods of 30 seconds. At the beginning of each period, we look up the mid price of BTC/USD and place a buy order 0.01% (about 3 to 4 dollars) below the mid price for 0.001 BTC (i.e. 1% of our initial BTC capital), and a sell order the same percentage above the mid price for the same quantity. Then we look at all the historical trades that happened during this period. If the lowest price of buyer-is-maker trades is ≤ our buy order’s price, we fill our buy order, which in this simulated environment means adjusting our BTC and USD balances accordingly. Similarly, if the highest price of buyer-is-taker trades is ≥ our sell order’s price, we fill our sell order. At the end of the period, we “cancel” any “unfilled” orders, which in this simulated environment means no balance change. Then we move on to the next period. Here is the code snippet:

import dateutil.parser
import numpy as np

# Replay parameters. The 'Z' suffix makes the parsed time UTC; a naive date
# string would be interpreted in the machine's local time zone by .timestamp().
start_time_iso = '2021-01-21T00:00:00Z'
period_seconds = 30
initial_start_time_seconds = int(dateutil.parser.parse(start_time_iso).timestamp())
end_time_seconds = initial_start_time_seconds + 24 * 3600
spread_percentage = 0.02  # full spread in percent: 0.01% on each side of the mid
our_order_quantity = 0.001

# Initial 50/50 inventory: 0.1 BTC plus the same value in USD at the first mid price.
initial_btc_balance = our_order_quantity * 100
initial_btc_usd_price = df_1.iloc[0]['mid_price']
initial_usd_balance = initial_btc_balance * initial_btc_usd_price
balance = {
    'time_seconds': [initial_start_time_seconds],
    'btc': [initial_btc_balance],
    'usd': [initial_usd_balance],
    'total': [initial_btc_balance * initial_btc_usd_price + initial_usd_balance],
}

start_time_seconds = initial_start_time_seconds
while start_time_seconds < end_time_seconds:
    print(f"replaying {pd.Timestamp(start_time_seconds, unit='s')} (unix timestamp {start_time_seconds})")
    # Latest mid price at or before the start of the period.
    mid_price = df_1[df_1.index <= start_time_seconds].iloc[-1]['mid_price']
    our_buy_order_price = mid_price * (1 - spread_percentage / 100 / 2)
    our_sell_order_price = mid_price * (1 + spread_percentage / 100 / 2)
    # All trades in [start_time_seconds, start_time_seconds + period_seconds).
    df_trade_price = df_2[(df_2.index >= start_time_seconds)
                          & (df_2.index < start_time_seconds + period_seconds)]
    # Seller-initiated trades (buyer is maker) can fill our buy order;
    # buyer-initiated trades (buyer is taker) can fill our sell order.
    trade_price_buyer_is_maker_lowest = df_trade_price[df_trade_price.is_buyer_maker == 1]['price'].min()
    trade_price_buyer_is_taker_highest = df_trade_price[df_trade_price.is_buyer_maker == 0]['price'].max()
    balance_btc_change = 0
    balance_usd_change = 0
    # min()/max() of an empty selection is NaN: no such trades in this period.
    if not np.isnan(trade_price_buyer_is_maker_lowest) and trade_price_buyer_is_maker_lowest <= our_buy_order_price:
        balance_btc_change += our_order_quantity
        balance_usd_change -= our_buy_order_price * our_order_quantity
        print(f'filled our buy order at price {our_buy_order_price}')
    if not np.isnan(trade_price_buyer_is_taker_highest) and trade_price_buyer_is_taker_highest >= our_sell_order_price:
        balance_btc_change -= our_order_quantity
        balance_usd_change += our_sell_order_price * our_order_quantity
        print(f'filled our sell order at price {our_sell_order_price}')
    # Unfilled orders are "canceled", i.e. no balance change. Move to the next period.
    start_time_seconds += period_seconds
    btc_balance = balance['btc'][-1] + balance_btc_change
    usd_balance = balance['usd'][-1] + balance_usd_change
    balance['time_seconds'].append(start_time_seconds)
    balance['btc'].append(btc_balance)
    balance['usd'].append(usd_balance)
    # Total value in USD, with BTC marked at this period's mid price.
    balance['total'].append(btc_balance * mid_price + usd_balance)
print(pd.DataFrame(balance))

Output:

      time_seconds    btc          usd        total
0 1611187200 0.100 3549.397000 7098.794000
1 1611187230 0.099 3584.894519 7098.797549
2 1611187260 0.098 3620.421357 7101.703287
3 1611187290 0.097 3655.993869 7106.182504
4 1611187320 0.097 3656.000990 7109.841675
... ... ... ... ...
2876 1611273480 0.122 2833.211335 6592.858495
2877 1611273510 0.121 2863.985842 6587.328872
2878 1611273540 0.121 2863.992021 6601.932491
2879 1611273570 0.121 2863.998170 6584.236340
2880 1611273600 0.121 2864.004334 6593.022264

You can see that we started with an inventory of 0.1 BTC and 3549 USD and ended with 0.121 BTC and 2864 USD. In other words, after a whole day’s work we increased our inventory of BTC, the asset that was losing value, and decreased our inventory of USD, the asset that held its value: a classic example of adverse selection. We’ll provide a deeper analysis of this topic in our next post.
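As a cross-check of the accounting, the first period can be recomputed by hand from numbers that appear in the outputs above. This sketch only repeats the replay loop’s arithmetic for one period; it fetches no data, and the first mid price and the row 0 balances are copied from the printed output.

```python
our_order_quantity = 0.001
spread_percentage = 0.02
mid_price = 35493.970        # mid price at 1611187200, the first row of df_1
btc, usd = 0.100, 3549.397   # row 0 of the balance table

# Order prices 0.01% below and above the mid (about 3.55 dollars away here).
buy_price = mid_price * (1 - spread_percentage / 100 / 2)
sell_price = mid_price * (1 + spread_percentage / 100 / 2)
print(round(buy_price, 6), round(sell_price, 6))  # 35490.420603 35497.519397

# In the first period the BTC balance fell by 0.001, i.e. only the sell filled.
btc -= our_order_quantity
usd += sell_price * our_order_quantity
# As in the replay loop, BTC is marked at the mid price from the period start.
total = btc * mid_price + usd
print(round(btc, 3), round(usd, 6), round(total, 6))  # 0.099 3584.894519 7098.797549

# Every fill moves the BTC balance by exactly 0.001, so ending the day at
# 0.121 BTC implies 21 more buy fills than sell fills over the day.
net_buy_fills = round((0.121 - 0.100) / our_order_quantity)
print(net_buy_fills)  # 21
```

The recomputed values match row 1 of the balance table, and the net fill count shows how lopsided the day was toward buying a falling asset.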

To visualize the performance of this strategy over time, let’s plot the time series of the BTC balance, the USD balance, and the total balance (with BTC converted to USD at the market mid price at each time point).

Again, the plot shows a classic example of what happens to pure market making when the market itself is trending (as opposed to moving sideways): adverse selection. This is sad. But is it possible to circumvent it? Stay tuned for our next post.

Follow us on Github at https://github.com/crypto-chassis. 🚙

Disclaimer: This is an educational post rather than investment/financial advice.
