Deep Reinforcement Learning for Crypto Trading

Part 5: Live Trading

Alex K
Published in Coinmonks · 6 min read · May 17, 2024


photo by Traxer on Unsplash

Disclaimer: The information provided herein does not constitute financial advice. All content is presented solely for educational purposes.

Introduction

This is the fifth and final part of my blog series on reinforcement learning for crypto trading.

This blog post explains my approach to live trading on Bybit testnet.

As I mentioned in Part 0: Introduction, my main goal is to connect with potential employers or investors and ultimately become a professional quant.

Prerequisites

Choosing a cryptocurrency exchange

Once we are satisfied with the backtesting results on the validation dataset, the next step is to move to a real exchange. Most crypto exchanges offer so-called testnets, which let you test a strategy without risking real money; the network that trades real money is called the mainnet. I researched which exchange would best suit my needs, and the winner is Bybit: its testnet UI, API, and Last price are almost identical to those of the Bybit mainnet, and it charges lower fees for perpetual futures trading than most crypto exchanges.

Exchanges settle trades using the Last price, not the Mark or Index price. This is where the Binance testnet falls short: its Last price is a mess, so realized_pnl and other account parameters calculated during trading are unreliable if they are fetched via the Binance API rather than computed explicitly in the environment.

FTM last price on Binance testnet; image by author

The Bybit testnet is not perfect either: it has large, unrealistic spikes in the BTC and ETH Last prices, which can skew trading results.

BTC last price on Bybit testnet; image by author

However, there is no such problem with altcoins: for coins such as FTM, the Bybit mainnet and testnet price charts are almost identical (except for volume). The live trading environment only needs the Last price so that the exchange calculates all account parameters correctly; candlesticks and volume are scraped from the Bybit mainnet even though trades are executed on the testnet.

FTM price on Bybit mainnet (left) and testnet (right); image by author

Another convenient feature is that switching from the Bybit testnet to the mainnet only requires changing the api_key, api_secret, and testnet flag; no other code changes are needed to start trading immediately on the mainnet:

from pybit.unified_trading import HTTP

session = HTTP(testnet=False, api_key="...", api_secret="...")  # testnet=True with testnet API keys routes the same code to the testnet
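
As a quick sanity check that the session works, you can also query the Last price (the price the exchange settles against) through pybit’s v5 market endpoints. A minimal sketch continuing from the session above; FTMUSDT is just an example symbol:

# Read the Last price the exchange settles against; FTMUSDT is an example symbol.
ticker = session.get_tickers(category="linear", symbol="FTMUSDT")
last_price = float(ticker["result"]["list"][0]["lastPrice"])
print(f"FTMUSDT last price: {last_price}")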

If I were trading BTC or ETH, I would change my stack: Glassnode instead of Santiment as the on-chain data provider, and Deribit instead of Bybit as the exchange, because the Deribit testnet does not suffer from the price spikes described above and Glassnode offers a more comprehensive set of metrics for BTC and ETH.

Since the Last price deviates slightly between the mainnet and the testnet, trading results on the two networks also differ slightly. Because my strategy is not high-frequency, the skew is negligible: up to 0.1% per two-week episode.

The environment I coded for training and backtesting is also not one-to-one identical to the real exchange. I compared agent returns from online data (real-time trading on the Bybit testnet) against offline datasets collected over the same period and evaluated with the backtesting framework discussed in Part 4: Backtesting. Returns may deviate by up to 0.5% per episode in either direction, since the learning environment does not capture all the processes occurring on the exchange under the hood.
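
As a rough illustration of how such a gap can be measured (the file names and the assumption that returns are stored as fractions are mine, not part of the published code):

import numpy as np

# Hypothetical file names: per-episode returns logged by the live bot on the
# Bybit testnet and by the offline backtest over the same time period.
online_returns = np.load("returns_online.npy")
offline_returns = np.load("returns_offline.npy")

# Per-episode deviation in percentage points, assuming returns are stored
# as fractions (0.01 = 1%); in practice it stays within roughly +/- 0.5%.
deviation_pct = (online_returns - offline_returns) * 100.0
print(deviation_pct.min(), deviation_pct.max())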

No universal strategy fits all market regimes, so I typically deploy multiple bots with different policies. Each policy is a checkpoint saved from a different training run or after a different number of training epochs. This diversifies sources of profit across market conditions and reduces risk: even if one bot fails, the others can keep the average return positive.

Bybit allows up to 20 sub-accounts under your main account. You can deploy one bot per sub-account; each bot reads the metrics_outfile.npy file stored on disk every hour and calls the API with its own sub-account credentials. I deploy my bots on the Google Cloud Platform.
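
A rough sketch of what one such bot process could look like (run_policy_step and the sub-account key names are placeholders, not the published code):

import time

import numpy as np
from pybit.unified_trading import HTTP

def run_policy_step(session, features):
    # Placeholder for the agent: load a checkpoint, map the latest
    # observations to an action, and place the corresponding order.
    pass

# One process per sub-account, each with its own API key pair.
session = HTTP(testnet=True, api_key="SUBACCOUNT_1_KEY", api_secret="SUBACCOUNT_1_SECRET")

while True:
    features = np.load("metrics_outfile.npy")  # hourly features written by the data pipeline
    run_policy_step(session, features)
    time.sleep(3600)  # one decision per hour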

The following commands start collecting live data (the last week of data counted back from the current timestamp) and store it on disk every hour:

python scraper_binance.py --coins "BTC,ETH,FTM" --resolutions "1h,1d" --endpoint_file_paths "./data/endpoints_file_path_binance.json" --save_folder "./data/test/binance/live" --mode "live"
python scraper_bybit.py --coins "BTC,ETH,FTM" --resolutions "1h,1d" --endpoint_file_paths "./data/endpoints_file_path_bybit.json" --save_folder "./data/test/bybit/live" --mode "live"
python scraper_santiment.py --coins "BTC,ETH,FTM" --resolutions "1h,24h" --endpoint_file_paths "./data/endpoints_file_path_santiment.json" --save_folder "./data/test/santiment/live" --mode "live"

Results

I do not publish the code for live trading. If you are interested in it or would like to see more trading results, please contact me directly. In short, training_env.py needs to be modified to interact with the Bybit API, and the data preprocessing pipeline has to work with live online data rather than offline historical datasets.
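
To give a flavor of what that modification involves, here is a rough, simplified sketch. It is not the withheld live-trading code; the class name, action encoding, and position size are assumptions. The idea is that simulated order execution is replaced by real Bybit API calls, and account state is read back from the exchange instead of being computed locally:

from pybit.unified_trading import HTTP

class LiveTradingEnv:
    # Simplified sketch, not the published training_env.py.
    def __init__(self, api_key, api_secret, symbol="FTMUSDT", testnet=True):
        self.session = HTTP(testnet=testnet, api_key=api_key, api_secret=api_secret)
        self.symbol = symbol

    def step(self, action):
        # Assumed action encoding: 0 = hold, positive = long, negative = short.
        if action != 0:
            self.session.place_order(
                category="linear",
                symbol=self.symbol,
                side="Buy" if action > 0 else "Sell",
                orderType="Market",
                qty="100",  # placeholder position size
            )
        # Let the exchange do the accounting (fees, funding, realized PnL)
        # instead of simulating it inside the environment.
        balance = self.session.get_wallet_balance(accountType="UNIFIED")
        position = self.session.get_positions(category="linear", symbol=self.symbol)
        return balance, position

Observation and reward construction from the live data pipeline is omitted in this sketch.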

A screen recording during bot live trading on the Bybit exchange:

live trading demo; video by author

I’m also posting realized_pnl reports for three different bots, exported from Bybit; a short example of fetching the same data via the API follows the reports.

Bot 1:

bot 1 realized_pnl; image by author

Link to Google sheet.

Bot 2:

bot 2 realized_pnl; image by author

Link to Google sheet.

Bot 3:

bot 3 realized_pnl; image by author

Link to Google sheet.
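
For reference, the same realized-PnL history can also be pulled programmatically through Bybit’s v5 closed-PnL endpoint instead of exporting it from the UI. A minimal sketch using pybit; FTMUSDT and the record limit are example values:

from pybit.unified_trading import HTTP

# Read-only session for the account (or sub-account) whose report you want.
session = HTTP(testnet=True, api_key="...", api_secret="...")

# Fetch closed-position PnL records for one symbol.
records = session.get_closed_pnl(category="linear", symbol="FTMUSDT", limit=50)
for r in records["result"]["list"]:
    print(r["updatedTime"], r["symbol"], r["closedPnl"])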

Conclusion

This blog series has provided a comprehensive overview of Deep Reinforcement Learning for Crypto Trading, culminating in live trading on the Bybit testnet. The published results demonstrate the potential of the approach.

Here are the key takeaways:

  • Bybit testnet offers a realistic environment for testing crypto trading strategies due to its close resemblance to the mainnet.
  • The slight deviations between Bybit’s mainnet and testnet, along with limitations of the training environment, necessitate adjustments for real-world trading.
  • Diversifying trading bots with different policies helps mitigate risk and capture profits in various market conditions.

The published strategy has known problems, some of which I resolved during further development; feel free to point out any issues you see with this approach. I have since created more complex trading strategies with more advanced neural network architectures that are more profitable and reliable than the version posted here. The published code is a good starting point for further development.

I’m investing small amounts of my own money in the bots, but more testing and development are needed before increasing the assets under management.

If you see potential and want to invest in further development, or if you want to offer me a job as a Quant/ML Engineer, I’d be happy to answer your questions. I have more ideas to implement but need more computing resources and larger datasets: I trained all models on my PC with two Nvidia RTX 2080 Ti GPUs, and I can’t afford datasets from data providers targeting institutional clients.

This concludes Part 5: Live Trading and the whole blog series on Deep Reinforcement Learning for Crypto Trading. I hope you enjoyed reading it as much as I enjoyed writing and coding it. I believe the information provided will be helpful on your crypto journey.

Please contact me if you want to see more results or explore collaboration opportunities.

Contacts

My email: alex.kaplenko@sane-ai.dev

My LinkedIn: https://www.linkedin.com/in/alex-sane-ai/

GitHub: https://github.com/xkaple00/deep-reinforcement-learning-for-crypto-trading

Link to support Ukraine: https://war.ukraine.ua/donate/

