Step Two — Downloading and Processing Binance Quote Data

Artem Stepanenko
4 min read · Jan 14, 2024


In our previous article, we built our own library for candlestick pattern recognition, tailored to the continuous trading environment of the Binance futures market. We now turn to a step that every algorithmic trading model depends on: acquiring and preprocessing trading data. This article is written for those following our project, but it also works as a standalone guide for anyone who wants to get trading data from Binance.

Binance has established an advanced infrastructure for developers in algorithmic trading by providing a comprehensive and multifunctional API. This extensive API includes multiple endpoints, enabling developers to interface with various layers of the exchange’s data architecture. It offers access to market data, account, and order management, as well as tools for conducting technical analysis.

To retrieve historical datasets for Binance futures, we need the https://fapi.binance.com/fapi/v1/klines endpoint. It gives access to the full candlestick (kline) history of a futures pair through the following request parameters: the trading pair symbol (symbol), the required timeframe (interval), the start (startTime) and end (endTime) of the period being extracted, and the number of candles requested (limit).
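As a minimal sketch of how these parameters fit together, the snippet below only assembles the request URL and sends nothing. The helper name `build_klines_url` and the sample values are illustrative assumptions; the one detail taken from the API itself is that startTime and endTime are Unix timestamps in milliseconds.

```python
from urllib.parse import urlencode

KLINES_URL = "https://fapi.binance.com/fapi/v1/klines"


def build_klines_url(symbol, interval, start_ms, end_ms, limit):
    """Return the full URL for one klines request (hypothetical helper)."""
    params = {
        "symbol": symbol,        # trading pair, e.g. "BTCUSDT"
        "interval": interval,    # timeframe, e.g. "1h"
        "startTime": start_ms,   # start of the period, Unix time in ms
        "endTime": end_ms,       # end of the period, Unix time in ms
        "limit": limit,          # number of candles requested
    }
    return f"{KLINES_URL}?{urlencode(params)}"


# Sample values chosen for illustration only.
url = build_klines_url("BTCUSDT", "1h", 1704067200000, 1704153600000, 500)
print(url)
```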

What to consider when using this endpoint:

  • List of trading pairs to download (symbol): To avoid overfitting the trading system, we should not narrow the list of pairs artificially. Ideally, the analysis covers all actively traded pairs. It still makes sense to filter out some pairs beforehand: stablecoin pairs (for example, USDCUSDT), which show near-zero volatility because of their peg to the US dollar; and pairs with zero or near-zero trading volume and a thin order book, which cause liquidity problems when the strategy is traded in practice.
  • List of analyzed timeframes (interval): It is worth dropping the two extremes: the longest timeframes (e.g., ‘1M’, monthly candles), which yield too few candles for statistically valid conclusions; and the ultra-short ones (e.g., ‘1m’, minute candles), whose price moves are often too small to cover even the trading fee.
  • Start and end date of the extracted data (startTime and endTime): It makes sense to cover as long a period as possible, but Binance futures are a young market, operating since September 2019, and most pairs were listed later still. The available date range therefore differs from pair to pair, which should be taken into account when comparing results.
  • Number of candlesticks requested (limit): The futures klines endpoint returns at most 1500 candles per request (for hourly candles, that is about 62 days of round-the-clock trading), so it makes sense to use a loop that issues as many requests per pair as needed to cover the required period.
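The request loop from the last point can be sketched as follows. This is an illustration under stated assumptions, not the code used in the project: `fetch_all_klines` and `make_http_fetcher` are hypothetical names, the per-request cap is passed in as `limit` rather than hard-coded, each returned row is assumed to start with the candle's open time in milliseconds, and retries and rate-limit handling are left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

KLINES_URL = "https://fapi.binance.com/fapi/v1/klines"


def make_http_fetcher(symbol, interval):
    """Build a fetch_page function that calls the klines endpoint."""
    def fetch_page(start_ms, end_ms, limit):
        query = urlencode({
            "symbol": symbol, "interval": interval,
            "startTime": start_ms, "endTime": end_ms, "limit": limit,
        })
        with urlopen(f"{KLINES_URL}?{query}", timeout=10) as response:
            return json.load(response)  # a list of rows, one per candle
    return fetch_page


def fetch_all_klines(fetch_page, start_ms, end_ms, limit):
    """Request page after page until the whole period is covered."""
    rows = []
    cursor = start_ms
    while cursor <= end_ms:
        page = fetch_page(cursor, end_ms, limit)
        if not page:
            break                   # no data left in the requested range
        rows.extend(page)
        cursor = page[-1][0] + 1    # resume just after the last open time
        if len(page) < limit:
            break                   # a short page means the range is done
    return rows
```

A call might then look like `fetch_all_klines(make_http_fetcher("BTCUSDT", "1h"), start_ms, end_ms, limit)`. Passing the page fetcher in as an argument keeps the loop testable without network access.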

What we retain — processing data after downloading

Each candle returned by the endpoint carries more fields than candlestick analysis needs. We keep six: the candlestick’s time, its opening and closing prices, the highest and lowest prices, and the trading volume. After downloading, drop the remaining fields, label the kept ones consistently with your coding schema (e.g., ‘time’, ‘open’, ‘high’, ‘low’, ‘close’, ‘volume’), and convert the date format for easier use.
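A minimal sketch of this trimming step is shown below. It assumes that each raw row is a list whose first six elements are the open time, open, high, low, close, and volume, with prices and volume delivered as strings; the helper name `to_candle` and the sample values are made up for illustration, and the time is left as raw milliseconds here.

```python
FIELDS = ("time", "open", "high", "low", "close", "volume")


def to_candle(row):
    """Keep the six fields we need from one raw row and label them."""
    candle = {"time": int(row[0])}          # open time, still in milliseconds
    for name, value in zip(FIELDS[1:], row[1:6]):
        candle[name] = float(value)         # prices and volume arrive as strings
    return candle


# A made-up row: six useful fields followed by extra ones we discard.
raw_row = [1704067200000, "100.0", "110.0", "95.0", "105.0", "12.5",
           1704070799999, "1300.0", 42, "6.0", "630.0", "0"]
candle = to_candle(raw_row)
print(candle)
```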

Date Format

Binance reports candle times as Unix timestamps in milliseconds, so for convenience we can convert them into a human-readable format right away. A Unix timestamp is itself independent of any time zone, but many conversion routines apply the local time zone of your machine by default, and this is a potential problem when designing a trading system: the back-test would be based on time-shifted data, while trading bots are tied to UTC quotes. Therefore, when converting the date format, it is worth converting explicitly to UTC.
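One way to pin the conversion to UTC, assuming the candle time arrives as a millisecond Unix timestamp, is to pass the time zone explicitly instead of relying on the machine's local setting. The function name `ms_to_utc_string` and the output format are illustrative choices.

```python
from datetime import datetime, timezone


def ms_to_utc_string(timestamp_ms):
    """Convert a millisecond Unix timestamp into a UTC date string."""
    # tz=timezone.utc prevents the local time zone from being applied.
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


print(ms_to_utc_string(1704067200000))  # 2024-01-01 00:00:00
```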

The result should look something like this:

Sample of Downloaded Dataset

Saving Data

For data storage, the CSV format is quite suitable: we place the data for each trading pair in a separate file and group the files into folders named after the timeframe. This keeps data processing fast and makes the files easy to reuse and combine.
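The folder layout described above could be sketched like this, with one folder per timeframe and one file per pair. The function name `save_candles`, the root folder, and the exact file naming are assumptions made for illustration; candles are expected as dictionaries with the six field names used earlier.

```python
import csv
from pathlib import Path

FIELDS = ["time", "open", "high", "low", "close", "volume"]


def save_candles(candles, root, interval, symbol):
    """Write candles to <root>/<interval>/<symbol>.csv and return the path."""
    folder = Path(root) / interval              # one folder per timeframe
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{symbol}.csv"             # one file per trading pair
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(candles)
    return path
```

Calling `save_candles(candles, "data", "1h", "BTCUSDT")` would then produce `data/1h/BTCUSDT.csv`.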

Now, in addition to Japanese candlestick patterns, we also have a complete archive of Binance futures trading data, and we can proceed to design our first trading system.

The full code for fetching Binance futures data is provided below.

Fetch Binance Futures Data Function

We’ve come one step closer to launching our trading robot. However, several tasks still need to be addressed before it can run in production.

Stay tuned as we tackle these challenges and move towards launching our trading robot.

Thank you for reading! Your comments and feedback are greatly appreciated.
