Deep Reinforcement Learning for Crypto Trading

Part 0: Introduction

Alex K · Published in Coinmonks · May 17, 2024


Disclaimer: The information provided herein does not constitute financial advice. All content is presented solely for educational purposes.

Introduction

This is the introduction to a blog series about my reinforcement learning journey in crypto. The current chapter provides a high-level overview of the building blocks of my working solution; a detailed description of each block follows in separate chapters.

Resources

https://github.com/xkaple00/deep-reinforcement-learning-for-crypto-trading

About myself

I’m a senior machine learning engineer who started my career in 2016, and I have an inventor’s mindset. Along the way, I’ve authored four patents (three US, one European) in the fields of computer vision and signal processing using deep learning and reinforcement learning.

Here is a list of my patents:

  1. https://patents.google.com/patent/US11901155B2/en?oq=11901155
  2. https://patents.google.com/patent/US11815479B2/en?oq=11815479
  3. https://patents.google.com/patent/US11703468B2/en?oq=11703468
  4. https://patents.google.com/patent/EP4024039B1/en?oq=20220208508

I became fascinated with the crypto space in 2021 and started trading crypto manually. After a few months, my gains were equal to my losses (not a bad start for a newbie in crypto). Manual trading, however, was very time-consuming and emotionally exhausting, as you might imagine. Then I realized that, given my professional background, I could automate trading.

Motivation

I am open-sourcing a big chunk of my code and sharing some impactful ideas from my project. I publish the basic version of my solution: runnable code that other quant traders can adapt. The advanced version of my solution includes several additional essential aspects that allow the AI bots to achieve a substantially better risk-reward ratio than the basic version. I will include links to the accompanying files at the beginning of each chapter; it’s best to review the code while reading to follow my ideas. Variables from the code are shown in italics in the blog articles.

There are several reasons to open-source something that most crypto trading companies try to hide and protect in every possible way:

  1. Accelerate research in reinforcement learning for trading, and provide the community, and individuals just like me, with tools they can use to build exciting new projects. I’ve used a lot of open-source projects in my career, and now I want to give something back to maintain a balance of giving and receiving.
  2. Crypto trading has become my true passion, and I would like to switch careers entirely and become a professional quant. I’m looking for collaboration opportunities with companies and professional teams; it’s always better to work with like-minded people, share ideas, and learn from each other.
  3. My algorithm and trading strategy are imperfect, and I would like constructive feedback from more experienced quant traders.

Problem statement

Market-neutral strategies such as HFT (high-frequency trading) and arbitrage seem to be the most lucrative because, done correctly, they can provide a stable source of income regardless of up-and-down market swings. However, I was aware of the enormous competition in the HFT space: as an individual, I had zero chance of succeeding with my available hardware and resources. So I decided to focus on mid-term trading and create bots that mimic human traders, trading perpetual futures contracts to go long during uptrends and short during downtrends.

As you might have guessed, the most logical first step is to predict future market trends. Knowing what the price of a coin will be in 1 hour or 1 day is a crucial prerequisite for success. I spent time predicting trends using libraries such as NeuralForecast and PyTorch Forecasting, as well as proprietary AI models (a minimal forecasting sketch appears at the end of this section). But as it turned out, no trend indicator is 100% precise, and it’s very challenging to predict the future in an environment as complex and volatile as the crypto space. Moreover, a lot of other questions arise:

  1. When to close a position at a loss to prevent even larger losses if the trend forecast turns out to be wrong.
  2. When to close a position at a profit if the trend forecast is correct but the price may still go higher.
  3. Most importantly, what portion of the funds to allocate to long and short positions, how to size each position, and how to manage the account overall.

Even if you have a reliable forecasting indicator, an entire trading strategy must be built around it. This is where reinforcement learning comes to the rescue: deep reinforcement learning algorithms have recently beaten professional Go and Dota 2 players, so why couldn’t they beat professional traders?
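
For context, here is a minimal sketch of the kind of trend-forecasting experiment mentioned above, using the open-source NeuralForecast library. The data file, the NBEATS model choice, and the 24-hour horizon are illustrative assumptions, not my production setup:

```python
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATS

# NeuralForecast expects long-format data with unique_id, ds, y columns.
df = pd.read_csv("btc_hourly.csv")  # hypothetical file of hourly candles
df = df.rename(columns={"timestamp": "ds", "close": "y"})
df["ds"] = pd.to_datetime(df["ds"])
df["unique_id"] = "BTCUSDT"

# Forecast the next 24 hours from a week of hourly history.
nf = NeuralForecast(models=[NBEATS(h=24, input_size=24 * 7)], freq="H")
nf.fit(df=df)
forecast = nf.predict()  # DataFrame with 24 hourly price predictions
print(forecast.head())
```

Even a well-tuned version of such a model only outputs a point forecast; it answers none of the three questions above by itself.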

Reinforcement learning

Since crypto markets are heavily manipulated, and the few metrics I can access do not capture all the intricacies of the market well enough to predict future prices, reinforcement learning is a natural choice. Even with limited prediction ability, a reinforcement learning agent can react appropriately to market changes rather than trying to predict exact future prices. The concept works similarly to the Relative Strength Index (RSI): an RSI value below 30 does not precisely predict the price 1 hour or 1 day ahead, but it signals that, statistically, there is a high probability that a long position opened now will be profitable, so the agent can act accordingly and then wait for new observation states. The agent’s learned strategy can close the position at a profit when the RSI spikes above 70, or at a loss if the RSI drops below 20.
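
To make the analogy concrete, here is a minimal sketch of such a fixed-threshold RSI policy. The thresholds are the ones from the paragraph above; this hand-written rule set is exactly what a learned RL policy replaces:

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Wilder's RSI computed from a series of close prices."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / period, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / period, adjust=False).mean()
    return 100 - 100 / (1 + gain / loss)

def rule_based_action(rsi_now: float, in_position: bool) -> str:
    """Fixed-threshold policy mirroring the RSI analogy in the text."""
    if not in_position and rsi_now < 30:
        return "open_long"        # statistically favorable entry
    if in_position and rsi_now > 70:
        return "close_at_profit"  # overbought: take profit
    if in_position and rsi_now < 20:
        return "close_at_loss"    # deep oversold: cut the loss
    return "hold"
```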

Trading fits perfectly into the reinforcement learning paradigm with a partially observable environment.

the agent takes actions in an environment; an interpreter converts them into a reward and a representation of the state, which are fed back into the agent; source: Wikipedia

The environment is the crypto market itself. There are two types of environments:

  1. Training environment (also used for backtesting).
  2. Live trading environment.

In the training environment, the agent explores the outcomes of opening and closing positions in sequence over historical data. After training is done, the bot is deployed to the live trading environment.

The agent is a trading bot that sends API requests in the correct order to maximize rewards. The agent’s actions in the environment are orders to open or close positions.

A crypto exchange is the interpreter: it gives the agent a positive reward for profitable actions and a negative reward for actions leading to losses. The state (observation space) is essentially the state of your exchange account, which contains unrealized profit and loss, available cash, and so on, plus a set of technical, on-chain, and social indicators (a detailed overview is in Part 1: Data preparation).

observation and action space inside exchange UI; image by author
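
To make this mapping concrete, below is a minimal Gymnasium-style skeleton of such a training environment. The observation layout, the three-action set, and the one-unit long-only position are simplifying assumptions for illustration; the actual environment described in later parts is far richer:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class CryptoTradingEnv(gym.Env):
    """Steps through historical prices; actions open and close a long position."""

    def __init__(self, prices: np.ndarray, initial_balance: float = 1000.0):
        super().__init__()
        self.prices = prices.astype(np.float32)
        self.initial_balance = initial_balance
        # 0 = hold, 1 = open long, 2 = close position
        self.action_space = spaces.Discrete(3)
        # Toy observation: [current price, unrealized PnL, position-open flag]
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.entry_price = None
        return self._obs(), {}

    def step(self, action: int):
        price = self.prices[self.t]
        reward = 0.0
        if action == 1 and self.entry_price is None:
            self.entry_price = price  # open a one-unit long
        elif action == 2 and self.entry_price is not None:
            # Reward only on closing: realized PnL normalized by starting capital.
            reward = float(price - self.entry_price) / self.initial_balance
            self.entry_price = None
        self.t += 1
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        price = self.prices[self.t]
        unrealized = 0.0 if self.entry_price is None else price - self.entry_price
        in_position = float(self.entry_price is not None)
        return np.array([price, unrealized, in_position], dtype=np.float32)
```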

The simplest possible reward equals the normalized amount of money the bot gains or loses when closing a position (realized profit and loss).

realized PnL; image by author
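
The exact formula from the figure is not reproduced in this text version; one plausible reading, assuming the realized PnL is normalized by the initial account balance, is:

```python
def reward(realized_pnl: float, initial_balance: float) -> float:
    # Assumed normalization: realized PnL as a fraction of starting capital,
    # making rewards comparable across accounts of different sizes.
    return realized_pnl / initial_balance
```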

Results

To make further reading more exciting, I’m sharing the trading results of two bots among the entire fleet of my bots on Bybit.

Bot 1:

bot 1 realized PnL; image by author

Link to Google sheet with trading data exported from Bybit.

Bot 2:

bot 2 realized PnL; image by author

Link to Google sheet with trading data exported from Bybit.

Conclusion

This blog post introduced the concept of using deep reinforcement learning for crypto trading. We explored the challenges of traditional forecasting methods and how reinforcement learning can address them. The environment, agent, and reward system were explained, laying the groundwork for future parts of the series, where we’ll dive into data preparation, training, and live trading.

This ends Part 0: Introduction. See you in Part 1: Data preparation.

If you are interested in cooperation, feel free to contact me.

Contacts

My email: alex.kaplenko@sane-ai.dev

My LinkedIn: https://www.linkedin.com/in/alex-sane-ai/

GitHub: https://github.com/xkaple00/deep-reinforcement-learning-for-crypto-trading

Link to support Ukraine: https://war.ukraine.ua/donate/

