In this article we consider Level I order book data of the most traded cryptocurrency product, the BitMEX XBTUSD contract. Specifically, the goal is to investigate how order flow imbalance influences price change. The piece has the following structure:
- Level I Trades and Quotes (TAQ) Data
- Order Flow Imbalance (OFI) and Trade Flow Imbalance (TFI)
1. Level I Trades and Quotes (TAQ) Data
Standard limit order book (LOB) data usually contains a collection of bid prices, bid sizes, ask prices and ask sizes. Level I data consists of the best bid and ask levels only, thus providing the price at which one can sell (best bid), the price at which one can buy (best ask) and the corresponding volumes over time:
As you can see, the granularity of the data is very fine: each timestamp has nanosecond precision. Each row represents a single change at the top of the order book, otherwise known as a tick. In other words, a row is recorded whenever a limit order, a cancellation or a market order changes the state of the Level I data. In the trades data, each row represents a distinct market order:
Note that in both sets of data, the “Volume” column represents the number of contracts as opposed to the number of Bitcoins. Such data can be acquired in multiple ways. The easiest way is to connect to an exchange data API via the WebSocket protocol and write the data to a convenient storage medium. The Cryptofeed Python library provides a great interface for connecting to a number of exchanges, including BitMEX, and managing live order book data.
The data at our disposal starts on 1 October 2017 and ends on 23 October 2017. The quotes and trades data contain 81.3 million and 38.9 million data points respectively. The original data is contained in flat CSV files and is later partitioned by trading day and stored in kdb+, an in-memory high-frequency database.
2. Order Flow Imbalance and Trade Flow Imbalance
Order flow imbalance (OFI) is a quantification of supply and demand inequalities in a LOB during a given time frame (Cont et al., 2014). OFI rests on the fact that any event that changes the state of a LOB can be classified as either an event that changes the demand or an event that changes the supply currently present in the LOB. Namely:
- Increase in demand in a LOB is signified by an arrival of a limit bid order.
- Decrease in demand in a LOB is signified by either an arrival of market sell order or full or partial cancellation of a limit bid order.
- Increase in supply in a LOB is signified by an arrival of limit ask order.
- Decrease in supply in a LOB is signified by either an arrival of market buy order or full or partial cancellation of a limit ask order.
We would like to define a quantity that will reflect these supply and demand changes. Formally, we can define it as:
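For reference, the event contribution defined by Cont et al. (2014) can be written as below. The notation is an assumption made here for readability: $P^B_n$ and $q^B_n$ are the best bid price and size after the n-th event, $P^A_n$ and $q^A_n$ are the best ask price and size, and $\mathbb{1}\{\cdot\}$ is the indicator function.

```latex
e_n = \mathbb{1}\{P^B_n \ge P^B_{n-1}\}\, q^B_n
    - \mathbb{1}\{P^B_n \le P^B_{n-1}\}\, q^B_{n-1}
    - \mathbb{1}\{P^A_n \le P^A_{n-1}\}\, q^A_n
    + \mathbb{1}\{P^A_n \ge P^A_{n-1}\}\, q^A_{n-1}
```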
Quantity e represents the change in supply and demand between successive order book states, and it is conditioned on bid and ask prices to match the classification given earlier. To provide some intuition for the mechanics of e, write $q^B_n$ and $q^A_n$ for the best bid and ask sizes after the n-th event, and suppose the bid volume increases by some volume v, signifying an increase in demand via a limit bid order placement. Since neither the best bid nor the best ask price changed, e takes on the value $e_n = q^B_n - q^B_{n-1} - q^A_n + q^A_{n-1}$. By construction of Level I quotes data, only one event can occur between observations, which means that the previous ask volume equals the current ask volume, so the two ask terms cancel out and $e_n = q^B_n - q^B_{n-1}$. This implies that $e_n = v$, the size of the new limit order added to the bid queue. In summary, e measures the supply and demand impact of the n-th order event.
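The limit-bid example can be checked numerically with a small sketch. It assumes the event contribution of Cont et al. (2014); the `Quote` container, the function name `event_impact` and the prices and sizes are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Quote(NamedTuple):
    """One Level I snapshot (field names are illustrative)."""
    bid_price: float
    bid_size: float
    ask_price: float
    ask_size: float


def event_impact(prev: Quote, curr: Quote) -> float:
    """Contribution e of a single Level I update, following Cont et al. (2014)."""
    e = 0.0
    if curr.bid_price >= prev.bid_price:
        e += curr.bid_size   # demand present at the (new) best bid
    if curr.bid_price <= prev.bid_price:
        e -= prev.bid_size   # demand that was present before the update
    if curr.ask_price <= prev.ask_price:
        e -= curr.ask_size   # supply present at the (new) best ask
    if curr.ask_price >= prev.ask_price:
        e += prev.ask_size   # supply that was present before the update
    return e


# A limit bid order of v = 250 contracts joins the best bid; prices are unchanged.
before = Quote(bid_price=4300.0, bid_size=1000, ask_price=4300.1, ask_size=2500)
after = Quote(bid_price=4300.0, bid_size=1250, ask_price=4300.1, ask_size=2500)

impact = event_impact(before, after)
print(impact)  # 250.0
```

The ask terms cancel because the ask side did not move, so the result equals the size of the added order.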
Order flow imbalance is an aggregation of impacts e over a number of events that take place during time frame t:
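Following Cont et al. (2014), and assuming $e_n$ denotes the impact of the n-th event and $[t_{k-1}, t_k]$ the k-th sampling interval (both symbols are notation introduced here), the aggregation can be written as:

```latex
OFI_k = \sum_{n = N(t_{k-1}) + 1}^{N(t_k)} e_n
```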
where N(t) is the number of events occurring at Level I during time frame [0,t]. OFI can be seen as an accumulator of supply and demand changes over a given time frame. The response variable is the contemporaneous mid-price change in number of ticks over the same time frame as OFI:
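One way to write this response variable is given below. The symbols are assumptions introduced here: $MP_k$ is the mid-price at the end of the k-th interval and $\delta$ is the tick size.

```latex
\Delta MP_k = \frac{MP_k - MP_{k-1}}{\delta}
```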
Here, MP is the arithmetic midpoint of the best bid and ask prices. We normalise the resulting mid-price change by the tick size, which is $0.10 in our case. The corresponding Python code for creating OFI and price change variables is:
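A minimal sketch of this step is given below. It is not the original implementation: it assumes a pandas DataFrame of Level I quotes with a DatetimeIndex and the hypothetical column names `bid_price`, `bid_size`, `ask_price` and `ask_size`, and it uses the event contribution of Cont et al. (2014). The sample rows are made up.

```python
import numpy as np
import pandas as pd

TICK_SIZE = 0.1  # XBTUSD tick size in USD, as stated in the text


def ofi_and_price_change(quotes: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Aggregate event impacts and mid-price changes over intervals of length `freq`."""
    prev = quotes.shift(1)  # previous order book state; NaN for the first row
    # Comparisons against NaN are False, so the first row contributes e = 0.
    e = (
        np.where(quotes["bid_price"] >= prev["bid_price"], quotes["bid_size"], 0.0)
        - np.where(quotes["bid_price"] <= prev["bid_price"], prev["bid_size"], 0.0)
        - np.where(quotes["ask_price"] <= prev["ask_price"], quotes["ask_size"], 0.0)
        + np.where(quotes["ask_price"] >= prev["ask_price"], prev["ask_size"], 0.0)
    )
    events = pd.DataFrame(
        {"e": e, "mid": (quotes["bid_price"] + quotes["ask_price"]) / 2},
        index=quotes.index,
    )
    ofi = events["e"].resample(freq).sum()
    # Last mid-price seen in each interval; empty intervals carry the previous value.
    last_mid = events["mid"].resample(freq).last().ffill()
    d_mp_ticks = last_mid.diff() / TICK_SIZE
    return pd.DataFrame({"ofi": ofi, "d_mp_ticks": d_mp_ticks}).dropna()


# Made-up quotes: a bid placement, an ask-side reduction, then a cleared ask level.
idx = pd.to_datetime([
    "2017-10-01 00:00:10", "2017-10-01 00:00:40",
    "2017-10-01 00:01:05", "2017-10-01 00:01:30",
])
quotes = pd.DataFrame(
    {
        "bid_price": [4300.0, 4300.0, 4300.0, 4300.0],
        "bid_size": [1000, 1250, 1250, 1250],
        "ask_price": [4300.1, 4300.1, 4300.1, 4300.2],
        "ask_size": [2500, 2500, 2000, 900],
    },
    index=idx,
)
result = ofi_and_price_change(quotes, "1min")
print(result)
```

The first interval is dropped because it has no preceding mid-price to difference against.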
The model is fitted by ordinary least squares (OLS). The chosen time intervals k are 1 second, 10 seconds, 1 minute, 5 minutes, 10 minutes and 1 hour. The function for fitting and visualising the model is defined as follows:
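The fitting step could look roughly like the sketch below. It assumes the linear form used by Cont et al. (2014), a mid-price change equal to an intercept plus a slope times OFI plus noise, and fits it with NumPy. The function name `fit_price_impact` is hypothetical, the data is synthetic (generated with a made-up slope), and plotting is left out.

```python
import numpy as np


def fit_price_impact(flow, price_change):
    """OLS fit of price_change = alpha + beta * flow; returns alpha, beta and R^2."""
    x = np.asarray(flow, dtype=float)
    y = np.asarray(price_change, dtype=float)
    design = np.column_stack([np.ones_like(x), x])  # intercept column plus regressor
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return {"alpha": float(coef[0]), "beta": float(coef[1]), "r2": 1.0 - ss_res / ss_tot}


# Synthetic illustration only: these are not the article's data or results.
rng = np.random.default_rng(0)
ofi = rng.normal(0.0, 10_000.0, size=500)
d_mp = 6.5e-5 * ofi + rng.normal(0.0, 0.5, size=500)

fit = fit_price_impact(ofi, d_mp)
print(fit)
```

The same function can be reused for any flow variable sampled on the same grid as the price change.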
We also consider trade flow imbalance (TFI). Trade events are a subset of the order book events considered in OFI. Intuitively, trade flow imbalance may not have as much explanatory power as OFI, because the components of the latter are a superset of the components of the former. However, placing and cancelling an order costs a trader virtually nothing, whereas executing a trade incurs a commission and the bid-ask spread. Trade flow imbalance over time interval t is defined as:
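One plausible way to write this, in the spirit of the trade imbalance of Cont et al. (2014), is given below. The symbol $v_n$ (size of the n-th trade), the interval endpoints $t_{k-1}$ and $t_k$, and the +1/-1 sign convention are assumptions introduced here.

```latex
TFI_k = \sum_{n = N(t_{k-1}) + 1}^{N(t_k)} I_n \, v_n,
\qquad
I_n =
\begin{cases}
+1 & \text{if the } n\text{-th trade is a market buy} \\
-1 & \text{if the } n\text{-th trade is a market sell}
\end{cases}
```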
Here I is an indicator that distinguishes market buy orders from market sell orders and signs their volumes accordingly, and N(t) is the number of events occurring at Level I during [0,t].
This study will investigate to what extent trade flow events impact price in cryptocurrency markets by means of the following linear regression model:
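Assuming the same linear form as the OFI model, the regression can be written as below. The coefficient names $\alpha_T$ and $\beta_T$ and the noise term $\eta_k$ are notation introduced here.

```latex
\Delta MP_k = \alpha_T + \beta_T \, TFI_k + \eta_k
```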
Once again, the model is to be fit via OLS regression. Python code for fitting and visualising the fit is:
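A rough sketch of the TFI construction and fit is shown below. It assumes a trades DataFrame with a DatetimeIndex and the hypothetical columns `side` (taking the values "Buy" and "Sell") and `volume`. The trades and the mid-price changes are made up, and plotting is omitted.

```python
import numpy as np
import pandas as pd


def trade_flow_imbalance(trades: pd.DataFrame, freq: str) -> pd.Series:
    """Sum of signed trade volumes per interval: buys count +, sells count -."""
    sign = trades["side"].map({"Buy": 1.0, "Sell": -1.0})
    return (sign * trades["volume"]).resample(freq).sum()


idx = pd.to_datetime([
    "2017-10-01 00:00:05", "2017-10-01 00:00:30",
    "2017-10-01 00:01:10", "2017-10-01 00:01:50",
    "2017-10-01 00:02:20",
])
trades = pd.DataFrame(
    {"side": ["Buy", "Sell", "Sell", "Buy", "Buy"],
     "volume": [500, 200, 1000, 100, 700]},
    index=idx,
)
tfi = trade_flow_imbalance(trades, "1min")

# Made-up mid-price changes (in ticks) on the same one-minute grid.
d_mp = pd.Series([1.0, -3.5, 2.0], index=tfi.index)

# Simple OLS line; for one regressor, R^2 equals the squared correlation.
beta, alpha = np.polyfit(tfi.to_numpy(), d_mp.to_numpy(), 1)
r2 = np.corrcoef(tfi.to_numpy(), d_mp.to_numpy())[0, 1] ** 2
print(tfi.tolist(), beta, alpha, r2)
```

In practice the price change series would come from the quotes data, sampled on the same grid as TFI.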
Order Flow Imbalance
The 1-second OFI model demonstrates the point mentioned earlier: low update arrival rates require sampling over a longer time frame to observe a substantial price change or order flow imbalance. The scatter plot below shows a “sliding cross” formation, in which little activity develops and most points lie close to the axes of the graph. Correspondingly, the linear relationship between OFI and price change is poor at this sampling window: the R² of the 1-second OFI linear model fit is 7.1%. All results are presented in the figure at the end of the section.
When k is set to 10 seconds, the linear model fits much better, with R² = 40.5%. The linear relationship starts to resemble the one Cont et al. (2014) observe. The 1-minute time frame provides an even clearer demonstration of the linear relationship between price change and order flow imbalance.
The interpretation of the 1-minute model is very intuitive: for 10,000 units of net order flow, the expected mid-price change is 0.65 ticks (using the parameters found in the results table). Note that the price impact coefficient does not differentiate between types of order book events, and hence generalises across cancellation, placement and trade volume flows.
Trade Flow Imbalance
The intervals used for calculating TFI and the contemporaneous price change are the same as those used in OFI modelling: 1 second, 10 seconds, 1 minute, 5 minutes, 10 minutes and 1 hour.
The first model regresses the contemporaneous mid-price change on TFI, sampled over 1-second intervals. The model produces a coefficient of determination of 12.8%, which is higher than the R² achieved for OFI over the same sampling period. At the 10-second sampling interval the R² of the TFI model is 37.3%, which is lower than that of its 10-second OFI counterpart, whose R² is 40.5%.
For sampling periods longer than 10 seconds, however, TFI is a consistently better estimator of the price change than OFI. At the 1-hour sampling grid R² is 75.2%, which is 20% higher than the order flow imbalance R² for the same time grid. The scatter plot below shows that the fit is more linear than that of OFI.
These results run counter to the findings of Cont et al. (2014), who find that for all 50 U.S. stocks chosen for their analysis, order flow imbalance explains contemporaneous price change better than trade flow imbalance. The initial hypothesis that aggregate order flow imbalance has stronger explanatory power than trade flow imbalance is rejected based on these results. Results in Table 2 confirm that the estimates of the beta coefficients are statistically significant for all sampling periods k. Note that the p-values are close to zero for all TFI and OFI coefficients; rather than being reported directly, they are subtracted from 100%, yielding the probabilities of the coefficients not being obtained by chance. t-statistics are also included for every estimated coefficient.
Order flow imbalance provides a good approximation of the realised mid-price change, and there are a few potential reasons why OFI does not provide a better fit. First of all, it helps to understand under which circumstances OFI provides an inferior estimate of contemporaneous price change or, more crudely, under which conditions data points end up in the second and fourth quadrants of scatter plots such as the ones presented in the previous section. Let’s assume that at time t there is volume Vb present at the best bid and volume Va present at the best ask, such that Va > Vb. At time t+1 a cancellation order arrives on the ask side, cancelling an amount qc < Va and thus, ceteris paribus, registering a positive effect on the current order flow calculation while leaving the mid-price unchanged. At time t+2 a sell market order of quantity qm arrives, such that qm > Vb and qm < qc. This market order moves the mid-price down, but because qm < qc, the current OFI value is still positive. The resulting data point will end up in the second quadrant of the scatter plot. Thus, it is the unevenness of volume across the LOB price levels that degrades the estimation of price change by OFI.
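The scenario can be traced with a short sketch. It assumes the event contribution of Cont et al. (2014), and all numbers are hypothetical: Vb = 300, Va = 2000, qc = 800, qm = 500, and a next bid level holding 1500 contracts. Under that definition the market sell subtracts the bid size that disappears (Vb) rather than qm, and since Vb < qm < qc the accumulated OFI stays positive while the mid-price falls.

```python
TICK_SIZE = 0.1  # XBTUSD tick size in USD, as stated in the text


def event_impact(prev, curr):
    """e for one Level I update; states are (bid_price, bid_size, ask_price, ask_size)."""
    pb0, qb0, pa0, qa0 = prev
    pb1, qb1, pa1, qa1 = curr
    return (
        (qb1 if pb1 >= pb0 else 0)
        - (qb0 if pb1 <= pb0 else 0)
        - (qa1 if pa1 <= pa0 else 0)
        + (qa0 if pa1 >= pa0 else 0)
    )


states = [
    (4300.0, 300, 4300.1, 2000),   # time t: Vb = 300, Va = 2000
    (4300.0, 300, 4300.1, 1200),   # t+1: qc = 800 contracts cancelled at the ask
    (4299.9, 1300, 4300.1, 1200),  # t+2: market sell qm = 500 clears the best bid
]

ofi = sum(event_impact(a, b) for a, b in zip(states, states[1:]))
mid_first = (states[0][0] + states[0][2]) / 2
mid_last = (states[-1][0] + states[-1][2]) / 2
d_mp_ticks = (mid_last - mid_first) / TICK_SIZE

print(ofi, round(d_mp_ticks, 2))  # 500 -0.5
```

A positive OFI is paired with a negative mid-price change, which is exactly the kind of point that weakens the linear fit.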
The goodness of fit is a function of two main factors: (a) depth D at all price levels and (b) more realistically, dispersion of D, since all real-life markets will have non-constant D. If LOB price levels have a very “volatile” D, the effects of order flow won’t even out as well as if D is not so dispersed. Concluding from statistics and empirical evidence, cryptocurrency prices are impacted by order flow in a much less deterministic fashion than established markets due to lower compliance with the stylised model of LOB that this study assumes.
The results also show that the impact of trade flow imbalance on prices is stronger than that of order flow imbalance. The explanatory power of TFI depends on the same depth parameter D and its dispersion across price levels. Circumstances under which trade flow will not be a good estimator of price change are, therefore, similar to circumstances under which order flow will not be a good estimator of price change.
The aggregate order flow already includes trades, so why does the trade flow on its own explain price movements better? The argument comes down to the fact that while aggregate order flow includes more information, in the realm of cryptocurrency market microstructure as well as macrostructure, such information may be of little value, due to noise. There are a few possible reasons that may help explain this phenomenon, both macrostructural as well as microstructural.
Unlike U.S. equities, which are subject to multiple anti-spoofing policies including the Dodd-Frank Wall Street Reform and Consumer Protection Act (spoofing is the practice of posting and cancelling limit orders in quick succession in order to disguise the intent of executing an order), cryptocurrency markets have no equivalent regulation. This may explain why order flow carries relatively less information than trade flow in cryptocurrency markets. Traders who submit and quickly cancel orders to fake the intent of buying or selling are not legally constrained from doing so. Therefore, market agents are more inclined to post low-information orders of any magnitude into the LOB if that benefits their agenda. For example, a market maker sitting on a large inventory could choose to spoof in the direction that would benefit the value of their net inventory. This leads to ephemeral liquidity, i.e. orders that are not intended to be executed and therefore do not contribute to net price change. On the other hand, to execute a market order a trader pays a commission as well as the bid-ask spread, thus signalling a high-information intent that, as the results show, has a significant impact on price.
D and its variance across price levels are the main factors that drive the explanatory power of both OFI and TFI. The results also show that TFI has better overall explanatory power than OFI, even though the component events of the latter are a superset of the component events of the former. This phenomenon is largely attributable to two things that are both, though indirectly, functions of parameter D. First of all, the bid-ask spread may contribute to the low explanatory power of OFI. The average spread of the XBTUSD contract is 2.87 ticks, with a standard deviation of 11 ticks, which is large and dispersed compared to American equities, where large cap stocks rarely have average spreads larger than one tick. When the spread is large, the mid-price can be manipulated at little or no cost by posting and cancelling limit orders at the best bid and ask, whereas if the spread is almost always one tick, there is no costless way of manipulating the price in the same way. In such circumstances OFI is more likely to have poor explanatory power. Cont et al. (2014) report that the CME Group stock, which has an average spread of 103 ticks (the biggest of the group of selected stocks), also has the worst OFI R² of 35% among the stocks used in the study. Contrary to our results, however, CME’s TFI has worse explanatory power than its OFI counterpart, which may be attributed to its below-average quotes / trade ratio of 27.14. XBTUSD, on the other hand, has a quotes / trade ratio of 2.08, which means that there is an average of only two quotes per trade! That suggests a very high propensity to trade in cryptocurrency markets, much higher than in U.S. equities. This propensity may imply a lack of market makers that are able to provide liquidity and hence stabilise the depth across the order book. Such conditions may well justify the generous market maker rebates that BitMEX pays to liquidity-providing traders.
Cont, R., Kukanov, A. & Stoikov, S. (2014), ‘The price impact of order book events’, Journal of Financial Econometrics 12(1), 47–88.