The Sunk Costs of Market Data

Ekin Tuna
Published in ChainSlayer
May 29, 2020

As a company that works with market data, we like to keep our finger on the industry pulse. Over the past six months we have talked to more than 50 crypto hedge funds that focus on quantitative trading strategies and automated trade execution.
In these discussions we’ve obtained amazing insights about what excites and worries them, and what they see the future holding for cryptocurrency trading. This post is the first in a series unpacking these insights.

In our interviews with crypto traders, one observation stood out when we asked how they access market data. Most of them told us that they access it through API integrations that they have built themselves. Here is what the trading and market data infrastructure typically looks like for the median crypto trading firm.

Usually the founder(s) of the firm started a few years back. They built the first integrations to exchanges to collect data that was used to backtest their trading strategies. After finding a profitable strategy they built the trading execution infrastructure. Some of them integrated an existing trading logic framework, and some built the entire execution engine themselves.

As time passed and the trading strategies proved to be profitable they expanded to new trading pairs and markets. Considering the already large number of crypto exchanges, and the incredible churn that they have, new trading opportunities kept on popping up. At the same time they hired an engineering and analyst team to support the expanding scope of pairs and markets. Today on average these quant funds are small, consisting of 10–50 employees.

Because the crypto markets change rapidly, opportunities keep coming and going. The lifetime of a single strategy is not always long, and some strategies might only work for a couple of weeks. In addition, many crypto exchanges are new to the business, and the quality and availability of their data keep changing. Sudden changes in data formats, unannounced downtime and outright incorrect data make integrating with the exchanges and maintaining those connections problematic. And if a connection is not working, the fund cannot execute trades that would be profitable according to its trading strategy.

The impact of this is significant. We found that employees in data science and engineering roles at quant funds spend on average 30% of their time either fixing broken connections to exchanges or maintaining the stored data (often because of gaps in the data caused by API downtime). This is especially frustrating because the trading opportunities might only exist for a short time. In these situations they face a choice: keep maintaining the data collection for a strategy that is no longer working, or risk missing the opportunity when it pops up again later.
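To make the "gaps in data" problem concrete, here is a minimal sketch of how stored candle data could be checked for missing intervals. It is only an illustration, not a description of any fund's actual tooling: the function name `find_gaps` is hypothetical, and it assumes candle timestamps are integers aligned to a fixed interval (for example, 60 seconds).

```python
from typing import List, Tuple


def find_gaps(timestamps: List[int], interval: int) -> List[Tuple[int, int]]:
    """Return (first_missing, last_missing) timestamp ranges.

    Assumes timestamps are aligned to `interval` and expressed in the
    same unit (e.g. seconds). Duplicates and ordering are handled here.
    """
    gaps = []
    ordered = sorted(set(timestamps))
    # Compare each timestamp with its successor; a jump larger than one
    # interval means one or more candles were never stored.
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > interval:
            gaps.append((prev + interval, curr - interval))
    return gaps


# One-minute candles with the 180 s and 240 s candles missing.
stored = [0, 60, 120, 300, 360]
print(find_gaps(stored, 60))  # [(180, 240)]
```

Each reported range would then have to be backfilled from the exchange, if the exchange still serves that history at all, which is part of why this maintenance work eats so much time.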

The financial impact is also large. Of a quant fund’s total budget, 15% is allocated solely to obtaining and storing market data. This is because, aside from the cloud or local infrastructure costs, the time it takes to build a robust and automated data collection system is significant. Often it takes a full-time senior engineer a year.

So here is why we called this post “the sunk costs of market data”. The quant funds do not redistribute the data they collect, so each fund’s investment serves only that fund. Once we take that into account, we begin to understand the scope of the problem. Tens of companies globally are building these highly redundant, engineering-heavy data stores. And in each of these companies there are normally 1–3 highly skilled engineers working full time only on ensuring that the data connection and storage work properly. The interesting part is that the data required by all of the funds is fundamentally the same: order books, candles and trades.
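The redundancy is easy to see in code. Every fund ends up writing some version of the sketch below, which maps each exchange's payload onto one common record. Everything here is an assumption made for illustration: the `Trade` record, the two adapter functions and both raw payload formats are hypothetical and do not correspond to any real exchange's API.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Trade:
    """Hypothetical normalized trade record shared by all exchanges."""
    exchange: str
    symbol: str        # normalized as "BASE/QUOTE"
    price: Decimal
    amount: Decimal
    timestamp_ms: int  # milliseconds since the Unix epoch


def from_exchange_a(raw: dict) -> Trade:
    # Hypothetical format: string numbers, timestamp in seconds.
    return Trade(
        exchange="exchange_a",
        symbol=raw["pair"].replace("-", "/"),
        price=Decimal(raw["p"]),
        amount=Decimal(raw["q"]),
        timestamp_ms=int(raw["ts"]) * 1000,
    )


def from_exchange_b(raw: dict) -> Trade:
    # Hypothetical format: float numbers, timestamp in milliseconds.
    # Floats go through str() to avoid binary representation noise.
    return Trade(
        exchange="exchange_b",
        symbol=raw["symbol"].replace("_", "/"),
        price=Decimal(str(raw["price"])),
        amount=Decimal(str(raw["size"])),
        timestamp_ms=int(raw["time_ms"]),
    )


a = from_exchange_a({"pair": "BTC-USD", "p": "9500.5", "q": "0.25", "ts": 1590710400})
b = from_exchange_b({"symbol": "BTC_USD", "price": 9500.5, "size": 0.25, "time_ms": 1590710400000})
print(a.symbol == b.symbol, a.price == b.price, a.timestamp_ms == b.timestamp_ms)
```

Once the records are normalized, nothing about them is specific to the fund that collected them, which is exactly why maintaining one adapter per exchange in every fund is duplicated effort.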

So where does this leave us? When we asked the funds why they keep doing things this way, we got the same type of answer: they don’t have a choice! The current data providers do not offer the kind of on-demand, high-quality data infrastructure that would allow them to forgo their own data collection infrastructure and move to a more efficient and cost-effective solution. We, of course, are working on solutions to fix that!

If you want to learn more about how to improve your market data collection, or want to share your ideas, contact us at ekin@chainslayer.io or subscribe to our newsletter. We’d love to hear from you!
