Calculate OHLCV From Trading History: Doing The Right Thing, The Right Way.

Bitcoin and other cryptocurrencies
Bitcoin and other cryptocurrencies

Previously we published several articles on discussing important pitfalls that people might encounter when collecting tick-level order book data. Since then we’ve realized that in the crypto world there are lots of traders whose successful strategies are based on traditional indicators and therefore they only need data aggregated on a larger time scale (e.g. 1 minute or larger). Therefore we spent the past several months building, testing, and refining a new part of our data pipeline focusing on OHLCV data that are calculated from the trading history. There are two things that we want to share with the readers here: one is how easy it could be to get things wrong in these “simple” OHLCV calculations (even some exchanges have admitted to us that their OHLCV data aren’t exactly correct: see below), the other is how we have come up with an API endpoint design such that the user could obtain the entire historical OHLCV data using just one call (instead of gazillions of paginated calls).

First, it was surprisingly easy for an engineer, or even an entire organization, to get the OHLCV aggregation done incorrectly. Through a somewhat tedious process, you could download the raw trades from an exchange for the period of time that you are interested in, then download the OHLCV data for the same period of time, then import the raw trades into a SQL database or a Python program and calculate OHLCV values from the raw trades, and then compare with what have been downloaded directly from the exchange. We found that there are several exchanges having systematic inconsistencies between the calculated OHLCV and the downloaded OHLCV. Some exchanges are quite open to the issues reported by us, while some others aren’t so responsive. One exchange that we’d like to highlight and praise about their transparency is Kraken:

The OHLCV data available via the OHLC endpoint is calculated independently at each source (each API server, Kraken Pro, Cryptowatch, etc.), hence there are sometimes differences in the resulting OHLCV data due to timing inconsistencies, networking delays, or similar causes. The usual manifestation of this is discrepancies between our API data and our Kraken Pro/Cryptowatch trading platform, but it is also possible for multiple calls to the OHLC endpoint to return somewhat different OHLCV results…The OHLC endpoint is provided merely as a convenience (so that less technical clients can avoid making the above calculations), but the Trades endpoint provides our official market data, hence the OHLCV calculated from the trading history would be considered our official OHLCV.

Unfortunately there are some exchanges apparently having similar issues by inspecting their data but weren’t responsive to our inquiries: from the data we could clearly see that for some time intervals their calculation of OHLCV was based on partial time intervals: i.e. they missed some raw trades near the end of the time interval! This could happen if their OHLCV calculation was performed too early. We avoid such mistakes by not calculating the OHLCV values until there is one raw trade recorded in the next time interval which guarantees that the current time interval is absolutely “closed” (when the current time interval’s end time is reached, the interval may not be “closed” because there could late events due to numerous reasons).

Second, when we started to research on what was available by other data providers on the market for OHLCV data we found that all of them require the user to make lots of timestamp paginated requests in order to get historical data for a given exchange, a given instrument, and a given granularity: this is especially true for 1 minute granularity. Let’s make a simple calculation: for a given exchange, a given instrument, and 1 minute granularity, there will be a total of 60 * 24 * 365 * 5 ~ 2.6 million rows of data needed to be returned. This could very well fit into a gzip-compressed csv file. Therefore, we’ve come up with an API response design such that if your requested time range for OHLCV data is more than one month from the current moment, the response will include a link to a flat file that contains the needed historical data, but if your requested time range for OHLCV data is quite recent, the response won’t include such a link and the needed data will be represented by a normal json array. By surveying, interviewing, and beta-testing with some real algorithmic traders, we found that this approach allows them to quickly download historical OHLCV data, quickly experiment with different exchanges, instruments, and time intervals (by the way we provide pretty much all time intervals that you would ever need from as short as 1 minute to as long as 1 day: 1m, 3m, 5m, 15m, 30m, 1h, 2h, 3h, 4h, 6h, 8h, 12h, 1d), and quickly validate/invalidate/screen ideas.

All in all, you might be surprised to realize that the task of obtaining correct and accurate OHLCV data is easy said than done. The good news is that we’ve performed all the hard work for you and are more than glad to present you the now-official OHLCV endpoint:



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store