Deep Reinforcement Learning for Crypto Trading

Part 1: Data preparation

Alex K · Published in Coinmonks · May 17, 2024

Disclaimer: The information provided herein does not constitute financial advice. All content is presented solely for educational purposes.

Introduction

This is the first part of my blog post series on reinforcement learning for crypto trading.

This article explains the data collection and data normalization process.

As I mentioned in Part 0: Introduction, my main goal is to connect with potential employers or investors and ultimately become a professional quant.

Resources

I do not publish datasets due to privacy concerns; however, I open-source the code to collect them via the Santiment and Binance/Bybit APIs.

dataset.ipynb

To scrape the dataset, you should create an account on the Santiment platform and acquire an API key. Binance and Bybit market data are accessible through their APIs without account registration.

Data used

Data is the fuel for machine learning algorithms. Therefore, obtaining high-quality historical and real-time data is a top priority. The vast majority of blog posts and research papers on this topic make use of only a few technical indicators. I believe this is enough for a publication but certainly not enough for real trading. My strategy uses four main sources of data:

  1. Candlesticks (OHLCV) and technical indicators derived from them using the Pandas TA library.
  2. On-chain data from providers such as Santiment or Glassnode. This blog series focuses on Santiment, which offers a free two-week trial of its PRO plan and the most comprehensive suite of metrics for ERC-20 tokens. Glassnode, on the other hand, offers the best suite of on-chain metrics for BTC and ETH on the market; however, it targets institutional clients. (A minimal fetch sketch follows this list.)
  3. Data from crypto exchanges, such as Funding rate, Open interest, Open value, etc. You can fetch it directly from the exchange APIs, but Santiment also collects it for a few centralized exchanges.
  4. Social sentiment metrics from social networks, also provided by Santiment.
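
As a quick illustration of how Santiment data can be pulled programmatically, here is a minimal sketch using the sanpy client; the metric and slug shown are illustrative examples, and the actual scraper in the repository may query the raw GraphQL API instead.

import san

# API key from the Santiment platform (needed for most PRO metrics)
san.ApiConfig.api_key = "YOUR_SANTIMENT_API_KEY"

# Fetch a single metric as a pandas DataFrame; "daily_active_addresses" and the
# "fantom" slug are illustrative, not the full metric list used in this project
daa = san.get(
    "daily_active_addresses/fantom",
    from_date="2020-12-01",
    to_date="2020-12-31",
    interval="1d",
)
print(daa.head())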

Other types of data I have experimented with but have not included in the public part of my project:

  • limit order book data
  • proprietary price prediction indicators based on deep learning

Traders use indicators from different time frames to make more informed decisions, and so does my strategy, which uses various metrics at 1-hour and 1-day timeframes. Bots operate in 1-hour discrete time steps, which means a reinforcement learning agent decides to open/close a long/short position or do nothing every 1 hour.

This blog series focuses on trading Fantom (FTMUSDT trading pair). I’ve chosen this coin because it has strong fundamentals and is one of the most volatile coins among the Top 100 by market cap; I need volatility for my strategy. Santiment provides a wide range of metrics for Fantom (more than 600 since FTM is an ERC-20 token).

Altcoins usually follow BTC and ETH prices. That’s why I decided to include exchange, on-chain, and sentiment data for the two largest coins by market cap in my dataset. Some on-chain data for Tether (USDT) can also be included in the dataset because Tether is used as collateral. Stablecoins are essential to the crypto space; their flows can provide some insight.

Dataset structure

Datasets used for training and validation have more than 30,000 timesteps on average.

The final dataset consists of two files:

  • the traded coin's price series, stored in price_outfile.npy
  • the feature matrix, stored in metrics_outfile.npy as a 2D array whose rows correspond to timesteps and columns to features (see the loading sketch below)
a chunk of the dataset; image by author
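
As a minimal sketch of how the two files fit together (assuming the price file holds one value per timestep, which is how I read the description above):

import numpy as np

prices = np.load("price_outfile.npy")     # traded coin price, one value per timestep (assumed shape)
metrics = np.load("metrics_outfile.npy")  # 2D array: rows = timesteps, columns = features

# Both files cover the same timesteps, e.g. (30000,) and (30000, 181)
print(prices.shape, metrics.shape)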

The Jupyter notebook used to generate the dataset is dataset.ipynb. This notebook uses Python files to scrape data from the corresponding platforms.

All files accept command-line arguments to define the start and end dates, timeframes, and coins to scrape. The "historical" mode creates a dataset for training, while the "live" mode scrapes the last week of data (168 hours, which is the agent's observation state) every hour and stores the files to disk.

import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Run the Santiment data scraper with specified parameters.')
    parser.add_argument('--coins', required=True, help='Comma-separated list of coin names, e.g., "BTC,ETH,USDT,FTM,BAT"')
    parser.add_argument('--resolutions', required=True, help='Comma-separated list of resolutions, e.g., "1h,24h"')
    parser.add_argument('--start_time', required=False, help='Start datetime in ISO format, e.g., "2020-07-01T00:00:00"; not used for live data')
    parser.add_argument('--end_time', required=False, help='End datetime in ISO format, e.g., "2024-04-11T00:00:00"; not used for live data')
    parser.add_argument('--endpoint_file_paths', required=True, help='Path of endpoints_file_path_santiment.json')
    parser.add_argument('--save_folder', required=True, help='Folder path to save the scraped data, e.g., "./data/test/santiment/historical"')
    parser.add_argument('--mode', required=True, help='Mode to scrape data', choices=["historical", "live"])

    args = parser.parse_args()
    main(args)  # main() is defined earlier in the scraper script
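
For instance, a historical scrape might be launched with a command along these lines (the script name and the path to the endpoints file are illustrative; check the repository for the actual file names):

python scrape_santiment.py --coins "BTC,ETH,USDT,FTM" --resolutions "1h,24h" --start_time "2020-07-01T00:00:00" --end_time "2024-04-11T00:00:00" --endpoint_file_paths ./endpoints_file_path_santiment.json --save_folder ./data/test/santiment/historical --mode historical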

You can select the TA indicators to be calculated by editing the .csv files under ./data/endpoints/binance, and the Santiment metrics to be collected by editing those under ./data/endpoints/santiment.

list of TA indicators for FTM at 1 hour and 1 day timeframes; image by author
list of Santiment metrics for FTM at 1 hour timeframe; image by author
list of Santiment metrics for FTM at 24 hours timeframe; image by author

During data collection, the scrapers use open_time as the Pandas DataFrame index for all .csv files. The timesteps in the final metrics_outfile.npy, however, are close_time, so that the latest available values of all metrics are aligned at every timestep. All timestamps use the UTC+0 time zone.
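
A rough sketch of that alignment for 1-hour candles (toy data, assuming a UTC open_time index):

import pandas as pd

# Toy 1h candles indexed by open_time (UTC); values are illustrative
candles = pd.DataFrame(
    {"close": [0.215, 0.218]},
    index=pd.to_datetime(["2022-03-27 10:00:00", "2022-03-27 11:00:00"], utc=True),
)
candles.index.name = "open_time"

# Re-index by close_time so each row only uses information from fully closed candles
candles.index = candles.index + pd.Timedelta(hours=1)
candles.index.name = "close_time"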

Technical analysis data preparation

I collect the following fields from the Binance or Bybit API: 'open', 'high', 'low', 'close', 'volume', and 'open_time' at '1h' and '1d' time intervals.
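
A minimal sketch of such a request against the public Binance spot klines endpoint (the actual scraper in the repository may use a client library and paginate over longer date ranges):

import pandas as pd
import requests

# Public Binance spot klines endpoint; market data requires no API key
url = "https://api.binance.com/api/v3/klines"
params = {"symbol": "FTMUSDT", "interval": "1h", "limit": 5}
rows = requests.get(url, params=params, timeout=10).json()

# Each returned row is a list; the first six fields are open_time, open, high, low, close, volume
df = pd.DataFrame([r[:6] for r in rows],
                  columns=["open_time", "open", "high", "low", "close", "volume"])
df["open_time"] = pd.to_datetime(df["open_time"], unit="ms", utc=True)
df = df.set_index("open_time").astype(float)
print(df)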

Candlesticks are then used to calculate technical indicators with the Pandas TA library. The Pandas TA strategy method utilizes multiprocessing for bulk indicator processing. When visualized, the resulting indicators closely match TradingView charts.
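
Below is a minimal sketch of the bulk-indicator pattern with Pandas TA; the indicator list here is illustrative, while the full lists used in the project live in the .csv files mentioned above.

import numpy as np
import pandas as pd
import pandas_ta as ta

# Synthetic 1h OHLCV data just to keep the sketch self-contained
idx = pd.date_range("2021-01-01", periods=200, freq="1h", tz="UTC")
close = pd.Series(np.random.default_rng(0).normal(0, 0.01, len(idx)).cumsum() + 1.0, index=idx)
df = pd.DataFrame({"open": close, "high": close * 1.01, "low": close * 0.99,
                   "close": close, "volume": 1000.0})

# A named Strategy bundles indicators; df.ta.strategy() appends them as columns
# and can parallelize bulk runs across CPU cores
example_strategy = ta.Strategy(
    name="ftm_1h_example",
    ta=[{"kind": "atr", "length": 14},
        {"kind": "rsi", "length": 14},
        {"kind": "ema", "length": 50}],
)
df.ta.cores = 0  # keep this tiny example single-process
df.ta.strategy(example_strategy)
print(df.columns.tolist())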

Average True Range TradingView chart; image by author
Average True Range PandasTA indicator; image by author

On-chain and sentiment data preparation

Things are more complicated with on-chain data. I collect Santiment data for FTM as well as for BTC: BTC serves as an index for the whole crypto market, and a negative BTC sentiment spike on Twitter (for example, "China bans crypto" headlines) could significantly impact the FTM price. This approach would also work for ETH or any other top-100 coin for which Santiment provides all the necessary data.

Santiment data comes at two main minimum time intervals, "5m" and "24h", which a user can aggregate on. I aggregate the "5m" metrics into a "1h" time frame to match the trading strategy's time step. The only exception is the funding_rate metric, which is calculated every "8h" on some exchanges. I use the raw "24h" metrics without further aggregation.
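
A minimal sketch of that aggregation with pandas; the choice of aggregation function per metric is an assumption here (sums for count/volume-type metrics, last or mean for level-type metrics):

import numpy as np
import pandas as pd

# Toy 5-minute metric series (e.g., a transaction count); values are synthetic
idx = pd.date_range("2022-03-27 00:00", periods=36, freq="5min", tz="UTC")
metric_5m = pd.Series(np.random.default_rng(1).integers(0, 10, len(idx)), index=idx)

# Aggregate to the 1h strategy timestep
metric_1h_sum = metric_5m.resample("1h").sum()    # count/volume-type metrics
metric_1h_last = metric_5m.resample("1h").last()  # level-type metrics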

There is a problem with some Santiment on-chain and sentiment metrics: a time lag between the moment data appears on the blockchain or social network and the moment it is stored in the database. This is not the case with the almost real-time price data from centralized crypto exchanges. Moreover, on-chain data can mutate over time, and some on-chain metrics suffer from look-ahead bias. Detailed explanation here.

Santiment doesn't provide point-in-time data, so my solution is to introduce a latency shift: data in the training dataset is shifted 1 hour later to match the latency of data used for live trading. For example, if a datapoint is stored in the Santiment database under timestamp 2022-03-27 10:00:00, in my dataset it appears under timestamp 2022-03-27 11:00:00, which ensures the stored value is immutable and free of look-ahead bias. This is not a perfect solution, but it works for my strategy because it is not high-frequency.
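
In pandas terms, the latency shift boils down to moving every timestamp one hour forward before the metric is joined with the price data; a toy sketch of the example above:

import pandas as pd

# Toy Santiment series indexed by the provider's own timestamps (UTC)
onchain = pd.Series(
    [1.23, 1.31],
    index=pd.to_datetime(["2022-03-27 10:00:00", "2022-03-27 11:00:00"], utc=True),
)

# Latency shift: a value stored under 10:00 only becomes usable at 11:00,
# so every timestamp is shifted 1 hour later in the training dataset
onchain_shifted = onchain.copy()
onchain_shifted.index = onchain_shifted.index + pd.Timedelta(hours=1)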

Glassnode provides a more comprehensive set of metrics for BTC and ETH analysis. Still, since the published strategy focuses on altcoins, I decided to stick to Santiment, at least for the public part of my project. You also need to become a Glassnode institutional client to get access to their API.

More thoughts on my dataset

Why did I use only a few TA indicators rather than the more than 600 metrics, aggregated at different time intervals, that the Santiment PRO subscription provides for FTM? The final dataset presented in this blog already contains 181 features, so one reason is to avoid the curse of dimensionality. It also takes considerably longer to train a neural network in a reinforcement learning setting than in classical supervised learning; with an increased number of features, it can take days to train an agent on a single GPU.

I learned a lot about all kinds of metrics and chose the most informative ones. I also trained agents on larger datasets and then applied techniques such as IntegratedGradients from the Captum library to see how each feature contributed to the trained neural network's output. I filtered out the features with little impact.
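
The attribution step follows the standard Captum pattern; the sketch below uses a tiny stand-in network instead of the actual transformer agent, and the 168 x 181 observation shape and the number of action logits are assumptions based on the lookback window and feature count described in this post.

import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Tiny stand-in for the policy network: flatten the 168-step x 181-feature
# observation window into a few action logits (4 is illustrative; the real
# backbone is a transformer)
model = nn.Sequential(nn.Flatten(), nn.Linear(168 * 181, 4))
model.eval()

obs = torch.randn(1, 168, 181)      # one observation window
baseline = torch.zeros_like(obs)    # zero baseline is a common default

ig = IntegratedGradients(model)
attributions = ig.attribute(obs, baselines=baseline, target=0)

# Mean absolute attribution per feature column gives a rough importance ranking
importance = attributions.abs().mean(dim=(0, 1))
print(importance.shape)  # torch.Size([181])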

Interestingly, metrics such as MVRV at different timeframes and funding rates had the most significant influence on the agents' decisions, whereas social metrics were almost ignored by the trained agents. Agents trading BTC or ETH could learn profitable policies using technical indicators alone, which is not the case for agents trading altcoins such as FTM. I achieved the best results with datasets mixing technical and on-chain metrics.

The final dataset still has 181 features, which is a lot, so feel free to reduce that number, for example by removing some BTC metrics. An extreme option is to train an agent using just the FTM price. I also added columns for the weekday and hour of the day to create time encodings for the transformer neural network used as the agent's backbone.
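
The calendar columns themselves are straightforward; a short sketch, assuming a UTC close_time index:

import pandas as pd

idx = pd.date_range("2021-01-01", periods=6, freq="1h", tz="UTC")
features = pd.DataFrame(index=idx)

# Calendar columns used as time encodings for the transformer backbone
features["weekday"] = idx.dayofweek  # 0 = Monday ... 6 = Sunday
features["hour"] = idx.hour          # 0 ... 23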

The dataset starts at the end of 2020 to capture the bull market in 2021, the bear market in 2022, the sideways moves in 2023, and the new bull cycle in 2024. Ideally, the training and validation datasets contain periods with both uptrends and downtrends to evaluate how well the bot performs in each market regime. I implemented a feature to define several training and validation intervals, rather than validating only on the last few months of data. For example, here we have 5 training intervals, each 4,000 timesteps (hours) long, and each followed by a validation interval 1,000 timesteps long.

Training and validation subsets, FTM price
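
A hypothetical helper that produces such alternating intervals could look like this (this is not the repository's implementation, just the idea):

# Split a timestep range into alternating train/validation intervals,
# e.g. 4000 training steps followed by 1000 validation steps, repeated
def interval_split(n_timesteps, train_len=4000, val_len=1000):
    train_intervals, val_intervals = [], []
    start = 0
    while start + train_len + val_len <= n_timesteps:
        train_intervals.append((start, start + train_len))
        val_intervals.append((start + train_len, start + train_len + val_len))
        start += train_len + val_len
    return train_intervals, val_intervals

train_iv, val_iv = interval_split(25_000)
print(train_iv)  # [(0, 4000), (5000, 9000), (10000, 14000), (15000, 19000), (20000, 24000)]
print(val_iv)    # [(4000, 5000), (9000, 10000), (14000, 15000), (19000, 20000), (24000, 25000)]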

The test dataset is de facto paper trading on the Bybit testnet. Since the neural network tries to approximate the training dataset, a larger dataset is not automatically better: the crypto market can change significantly over a "short" time, and a policy learned from old data may no longer be valid for the latest data.

I use the RobustScaler from Sklearn to normalize the data. As stated in the docs, it scales features using statistics that are robust to outliers. Our dataset has many outliers, especially in the on-chain and social data. The scaler removes the median and scales the data according to the quantile range. In my experiments, training was more stable and converged to better results when the scaler was fitted not on the whole training dataset globally but locally on the lookback window preceding the current timestep. The scaler fitted on the lookback window is then applied to the current observation state during training and live trading on the exchange. Link to the Scaler class.
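
The linked Scaler class contains the actual implementation; the core idea can be sketched as follows, where the window length and array shapes are assumptions based on the 168-hour observation window mentioned earlier:

import numpy as np
from sklearn.preprocessing import RobustScaler

def scale_observation(metrics, t, lookback=168):
    # Fit on the lookback window preceding timestep t only (no global fit),
    # then transform that same window to build the observation state
    window = metrics[t - lookback : t]
    scaler = RobustScaler()  # removes the median, scales by the quantile range
    return scaler.fit_transform(window)

metrics = np.random.default_rng(2).normal(size=(30_000, 181))  # stand-in dataset
obs = scale_observation(metrics, t=10_000)
print(obs.shape)  # (168, 181)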

Conclusion

In this first part of the blog series, we explored the data preparation process. We discussed the various data sources, including exchange, on-chain, and social sentiment data. We learned how to collect and pre-process this data to create a suitable dataset.

The key takeaways from this part are:

  • It’s essential to use a variety of data sources beyond just technical indicators for effective trading. BTC metrics can serve as leading indicators for FTM.
  • On-chain and social sentiment data introduce complexities due to time lags and look-ahead bias. We addressed this by introducing a latency shift in the training data.
  • The curse of dimensionality can negatively impact the training process. We carefully selected informative features and used techniques to reduce the final number of features in the dataset.

We also explored the structure of the final dataset for the deep learning model training and inference.

This ends Part 1: Data preparation. See you in the next part, Part 2: Trading strategy.

If you are interested in cooperation, feel free to contact me.

Contacts

My email: alex.kaplenko@sane-ai.dev

My LinkedIn: https://www.linkedin.com/in/alex-sane-ai/

GitHub: https://github.com/xkaple00/deep-reinforcement-learning-for-crypto-trading

Link to support Ukraine: https://war.ukraine.ua/donate/

