The role of data in algorithmic trading

Samur Araujo
Algologic.ai
3 min read · Dec 13, 2018

Data is essential in any business and algorithmic trading is no different. The driving power behind any trading algorithm is data; not just any data, but good data. In this short article we share our experiences dealing with historical data from crypto exchanges and the challenges that we have faced.

There are three basic types of data used in algorithmic trading:

Tick-by-tick data: this data contains every trade executed on an exchange. Usually, each record contains the timestamp of the trade, its price, and its volume. For example, for BTCUSD, a line of tick-by-tick data would be:

timestamp,volume,price
1530403200796,0.05,6391.08

At algologic.ai, we add a few extra fields to the data to make it more human readable:

id,timestamp,datetime,volume,price
54164393,1530403200796,2018-07-01 00:00:00.796000,0.05,6391.08
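
As an illustration of how the readable datetime column can be derived from the raw timestamp, here is a minimal sketch. It assumes the timestamp is a Unix epoch in milliseconds expressed in UTC, which matches the sample row above. The function name enrich_tick is hypothetical, and the id column is left out because it cannot be derived from the raw line.

```python
import csv
import io
from datetime import datetime, timezone


def enrich_tick(row):
    """Add a human-readable UTC datetime to a raw tick row (hypothetical helper)."""
    ts_ms = int(row["timestamp"])
    # Split with integer arithmetic so the milliseconds are not lost to float rounding.
    seconds, millis = divmod(ts_ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=millis * 1000
    )
    return {
        "timestamp": ts_ms,
        "datetime": dt.strftime("%Y-%m-%d %H:%M:%S.%f"),
        "volume": float(row["volume"]),
        "price": float(row["price"]),
    }


raw = "timestamp,volume,price\n1530403200796,0.05,6391.08\n"
ticks = [enrich_tick(r) for r in csv.DictReader(io.StringIO(raw))]
print(ticks[0]["datetime"])  # 2018-07-01 00:00:00.796000
```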

Candles (OHLCV) data: candles are aggregations of the tick-by-tick data. Usually, you see candles of 1 min, 5 min, 15 min, 1 hour, 3 hours, 1 day, etc. Candle data contains the open, high, low and close prices and the volume of the trades in a specific aggregation interval (e.g. 1 minute). This is why they are also referred to as OHLCV data. Below is an example of the format of the algologic.ai crypto candles dataset.

datetime,timestamp,open,high,low,close,pos_volume,neg_volume,volume
2018-01-01 00:00:00,1514764800000,3.5117,3.512,3.5117,3.512,235.00,-81,153
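
To make the aggregation concrete, here is a minimal sketch of how ticks could be rolled up into candles. It is not our production pipeline: the function name ticks_to_candles and the sample ticks are illustrative assumptions, timestamps are assumed to be in milliseconds, and the pos_volume and neg_volume columns are left out because their exact definition is not covered in this article.

```python
def ticks_to_candles(ticks, interval_ms=60_000):
    """Aggregate (timestamp_ms, volume, price) ticks into OHLCV candles."""
    candles = {}
    # Process ticks in time order so open/close reflect the first/last trade.
    for ts, volume, price in sorted(ticks, key=lambda t: t[0]):
        bucket = ts - ts % interval_ms  # start of the interval this tick falls in
        candle = candles.get(bucket)
        if candle is None:
            candles[bucket] = {
                "timestamp": bucket,
                "open": price,
                "high": price,
                "low": price,
                "close": price,
                "volume": volume,
            }
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price
            candle["volume"] += volume
    return [candles[k] for k in sorted(candles)]


# Illustrative ticks: three in the first minute, one in the next.
sample_ticks = [
    (1530403200796, 0.05, 6391.08),
    (1530403215000, 0.10, 6392.50),
    (1530403259000, 0.02, 6390.00),
    (1530403260100, 0.30, 6391.00),
]
for candle in ticks_to_candles(sample_ticks):
    print(candle)
```

Note that an interval with no trades produces no candle in this sketch, so downstream code still has to decide how to handle empty intervals.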

Order book data: this data contains the bids and asks that make up an exchange's order book. Usually, each entry contains four data points: bid price, bid volume, ask price and ask volume. An order book snapshot is a list of all bid and ask prices and volumes at a given timestamp. Order book datasets are usually very large and are typically used only by professional algorithmic traders.
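
One possible in-memory representation of a snapshot is sketched below. The class name OrderBookSnapshot, its fields and the sample prices are assumptions made for illustration; real exchange feeds each have their own layout.

```python
from dataclasses import dataclass
from typing import List, Tuple

Level = Tuple[float, float]  # (price, volume)


@dataclass
class OrderBookSnapshot:
    """Hypothetical container for all bids and asks at one timestamp."""
    timestamp: int  # milliseconds since the Unix epoch (assumed)
    bids: List[Level]
    asks: List[Level]

    def best_bid(self) -> Level:
        # Highest price a buyer is willing to pay.
        return max(self.bids, key=lambda level: level[0])

    def best_ask(self) -> Level:
        # Lowest price a seller is willing to accept.
        return min(self.asks, key=lambda level: level[0])

    def spread(self) -> float:
        return self.best_ask()[0] - self.best_bid()[0]


# Illustrative values only.
book = OrderBookSnapshot(
    timestamp=1530403200796,
    bids=[(6391.00, 0.40), (6390.50, 1.20)],
    asks=[(6391.50, 0.25), (6392.00, 0.80)],
)
print(book.best_bid(), book.best_ask(), book.spread())
```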

The first challenge you encounter when dealing with any of these datasets is collecting them. Not all crypto exchanges provide a means to collect data. The process of collecting historical data is time-consuming and error-prone, and it requires a lot of engineering to do properly. Furthermore, different crypto exchanges provide different data formats, so once you have collected the data, you need to unify or normalize it into a single format. At algologic.ai, we have solved this problem and we are glad to provide our unified data for download at our webshop.
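
To give a feel for what normalization involves, here is a minimal sketch that maps two made-up exchange payloads onto the timestamp, volume, price layout shown earlier. The exchange names, field names (T, p, q) and payload shapes are purely hypothetical and do not describe any real exchange API.

```python
def from_exchange_a(msg):
    """Hypothetical exchange A: dict with a millisecond timestamp and string numbers."""
    return {
        "timestamp": int(msg["T"]),
        "volume": float(msg["q"]),
        "price": float(msg["p"]),
    }


def from_exchange_b(row):
    """Hypothetical exchange B: [seconds, price, amount] list."""
    seconds, price, amount = row
    return {
        "timestamp": int(seconds) * 1000,  # convert seconds to milliseconds
        "volume": float(amount),
        "price": float(price),
    }


# One adapter per source keeps exchange-specific quirks out of the rest of the code.
NORMALIZERS = {"exchange_a": from_exchange_a, "exchange_b": from_exchange_b}


def normalize(source, payload):
    """Convert a raw trade from a known source into the unified format."""
    return NORMALIZERS[source](payload)


a = normalize("exchange_a", {"T": 1530403200796, "p": "6391.08", "q": "0.05"})
b = normalize("exchange_b", [1530403200, 6391.08, 0.05])
print(a)
print(b)
```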

Another challenge is data quality. Quite often, connections to exchange APIs drop or fail. This can leave missing data points in your dataset, which is a major cause of poor algorithm performance. Handling data connectivity is a challenge in itself and should not be underestimated.
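
A simple check for missing data points in candle data is sketched below. It assumes fixed-interval candles with millisecond timestamps; the function name find_gaps and the sample timestamps are illustrative. Keep in mind that a gap may also mean that no trades happened in that interval, so a reported gap is a prompt to investigate, not proof of a dropped connection.

```python
def find_gaps(timestamps, interval_ms=60_000):
    """Return (first_missing, last_missing, count) for each hole in a candle series."""
    gaps = []
    ordered = sorted(timestamps)
    for prev, cur in zip(ordered, ordered[1:]):
        missing = (cur - prev) // interval_ms - 1
        if missing > 0:
            gaps.append((prev + interval_ms, cur - interval_ms, missing))
    return gaps


# Illustrative 1-minute candle timestamps with two candles missing.
candle_timestamps = [1514764800000, 1514764860000, 1514765040000]
print(find_gaps(candle_timestamps))
```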

Once you have collected your data and verified its quality, you are ready to backtest your algorithm. Backtesting is the process of running your algorithm on historical data before deploying it in the market. It is a practical way to check that your code is correct and to estimate whether the algorithm could be profitable, which can save you time and money. Working with unified data can also simplify your backtest by cutting the time spent on coding and debugging.
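
To show the shape of a backtest, here is a deliberately tiny sketch that replays a list of close prices through a moving-average crossover rule. The strategy, the parameter values and the price series are all assumptions chosen for illustration, not a recommendation; fees, slippage and order sizing are ignored.

```python
def sma(values, window):
    """Simple moving average of the last `window` values."""
    return sum(values[-window:]) / window


def backtest(closes, fast=2, slow=3, cash=1000.0):
    """Replay close prices through a long-only crossover rule; return final equity."""
    position = 0.0  # units of the asset currently held
    for i in range(slow, len(closes)):
        history = closes[:i]  # only bars before bar i, to avoid look-ahead bias
        price = closes[i]
        fast_ma = sma(history, fast)
        slow_ma = sma(history, slow)
        if fast_ma > slow_ma and position == 0.0:
            position = cash / price  # buy with all available cash
            cash = 0.0
        elif fast_ma < slow_ma and position > 0.0:
            cash = position * price  # sell the whole position
            position = 0.0
    # Value any open position at the last close.
    return cash + position * closes[-1]


# Illustrative close prices only.
closes = [10, 10, 10, 11, 12, 13, 12, 11, 10]
print(backtest(closes))
```

On this toy series the rule buys late and sells late and ends below its starting cash, which is exactly the kind of result you want to discover on historical data and not in the live market.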

Fortunately, you can buy historical data online and avoid the headache of collecting and processing it. If you can afford to buy data, doing so frees up time to focus on building your algorithm, which is what actually matters.

In this article, we have briefly described the challenges you may encounter in collecting data for your algorithm. Data is essential for algorithmic trading and you should know that there is a trade-off between buying it and collecting it yourself.

We are glad to announce that we provide our data at our webshop. Our data is clean, unified and high quality. Using our data will save you a huge amount of time on coding and data processing. Please visit our shop at algologic.ai.

Samur Araujo
Algologic.ai

CTO at algologic.ai — building a team of data engineers and algorithm traders