Applied RL: Customized Deep Reinforcement Learning for Algorithmic Trading

Akhilesh Gogikar
4 min read · Jun 6, 2022


TLDR: I show how to design a multi-stock portfolio optimization/trading environment with OpenAI’s Gym, and then how to write a custom policy and neural network architecture for the learning agent.

To the uninitiated, reinforcement learning is a field of AI that mimics the learning process of the animal mind: a policy that leads to rewards/treats is a good policy to live by. The idea is that the algorithm trains an agent which operates within an environment, taking actions that the environment rewards, and updates the agent’s policy over time to achieve higher rewards.

AlphaGo utilized a policy network trained on expert game records, a value network that estimated the expected outcome of a given board position, and a Monte Carlo tree search to select the best move.

Reinforcement learning has been part of AI research for more than three decades, but the high point came in 2016, when AlphaGo defeated the world champion at Go using neural networks to learn the agent’s policy. This was the start of a flurry of research into deep reinforcement learning, the specialization that uses deep neural networks as the policy learner in reinforcement learning.

In reinforcement learning, an agent takes actions in a constantly changing environment and receives rewards for them. In deep RL, the agent’s policy is represented by a neural network.
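To make this concrete, here is a minimal sketch of that loop using Gym’s classic CartPole environment and the pre-0.26 Gym API; the random action is just a stand-in for the output of a policy network.

```python
import gym

# A toy version of the agent-environment loop: observe the state,
# pick an action, receive a reward and the next state.
env = gym.make("CartPole-v1")
state = env.reset()
total_reward = 0.0

for step in range(200):
    # A random action stands in for the output of a learned policy network.
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    total_reward += reward
    if done:
        state = env.reset()

env.close()
print(f"Total reward collected: {total_reward}")
```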

Deep reinforcement learning has been a research focus for major AI labs and tech giants alike in recent years. I personally believe that advancements in Deep Reinforcement Learning, in conjunction with other technologies, will play a crucial role in developing Artificial General Intelligence (AGI). RL has many promising applications in the coming years, such as:

  1. Self-driving cars
  2. Anomalous activity detection in cyber security
  3. Information recommendation
  4. Making energy management in grids and systems more efficient

One of the most crucial functions AI/AGI must perform for humans is automating the process of creating wealth, which would free us up for better humanitarian pursuits. Applying Deep Reinforcement Learning to algorithmic trading is one of the many small automations AI can help us with in the future.

RL algorithms are classified by whether a model of the environment is learned, given, or not used at all, and by how the deep neural network is trained: either by optimizing the value of state-action pairs or by optimizing the policy function directly.

Many good libraries and toolkits exist in the open-source domain for algorithmic trading with reinforcement learning. Yet I found that with many of them it is not easy to deploy a model to market. I spent months struggling with open-source libraries that either offer minimal scope for customization or require in-depth analysis of the library and an excruciating amount of code rewriting.

FinRL and TensorTrade are two up-and-coming libraries that provide out-of-the-box implementations of Reinforcement Learning for Algorithmic Trading, but both of them are still very much in active development.

I realized it is much better to design your own custom Deep Reinforcement Learning experiment, both as a learning exercise and as a way to achieve your own trading goals.

In the subsequent parts of this series, I show how I prepared my dataset of (Open, High, Low, Close, Volume) data points at 5-minute intervals for 30 stocks plus an index fund, for a total of 31 assets. I then implement a custom environment for algorithmic trading in Python using OpenAI’s Gym and explain the mathematical logic and variables embedded within it.
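To give a flavour of what such an environment could look like, here is a simplified sketch. The class name, the flattened OHLCV-window observation, and the one-bar log-return reward are illustrative assumptions of mine, not the exact design covered later in the series.

```python
import gym
import numpy as np
from gym import spaces


class MultiStockTradingEnv(gym.Env):
    """Illustrative multi-asset trading environment (not the series' final design).

    Observations: a window of OHLCV features per asset plus current portfolio weights.
    Actions: target portfolio weights for each asset (cash holds the remainder).
    """

    def __init__(self, ohlcv, window=12, initial_cash=1_000_000):
        super().__init__()
        self.ohlcv = ohlcv                       # shape: (timesteps, n_assets, 5)
        self.n_assets = ohlcv.shape[1]
        self.window = window
        self.initial_cash = initial_cash

        self.action_space = spaces.Box(low=0.0, high=1.0,
                                       shape=(self.n_assets,), dtype=np.float32)
        obs_dim = self.window * self.n_assets * 5 + self.n_assets
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(obs_dim,), dtype=np.float32)

    def reset(self):
        self.t = self.window
        self.portfolio_value = self.initial_cash
        self.weights = np.zeros(self.n_assets, dtype=np.float32)
        return self._observation()

    def step(self, action):
        # Turn the raw action into valid portfolio weights summing to at most 1.
        weights = np.clip(action, 0.0, 1.0)
        weights = weights / max(float(weights.sum()), 1.0)

        # Reward: one-bar log return of the portfolio (Close assumed at index 3).
        close_now = self.ohlcv[self.t, :, 3]
        close_next = self.ohlcv[self.t + 1, :, 3]
        asset_returns = close_next / close_now - 1.0
        portfolio_return = float(np.dot(weights, asset_returns))
        self.portfolio_value *= 1.0 + portfolio_return
        reward = float(np.log1p(portfolio_return))

        self.weights = weights.astype(np.float32)
        self.t += 1
        done = self.t >= len(self.ohlcv) - 1
        return self._observation(), reward, done, {"value": self.portfolio_value}

    def _observation(self):
        window = self.ohlcv[self.t - self.window:self.t].astype(np.float32)
        return np.concatenate([window.ravel(), self.weights])
```

A real environment would also account for transaction costs, slippage, and position limits; this sketch only captures the Gym interface and the basic reward logic.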

Rewards earned by the final customized neural network in backtesting against data it had never seen.

I showcase how to design a custom policy to use with the Proximal Policy Optimization (PPO) algorithm to perform the reinforcement learning. Finally, I will also showcase how to use recurrent layers, geometric layers, and other complexities within the neural network.
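As a preview, the sketch below shows one common way to plug a custom recurrent feature extractor into PPO, assuming Stable-Baselines3 as the RL library. The class name, layer sizes, and the way the portfolio weights are appended after the LSTM are my illustrative assumptions, not the final architecture from the series.

```python
import torch
import torch.nn as nn
from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class RecurrentMarketExtractor(BaseFeaturesExtractor):
    """Illustrative feature extractor: an LSTM over the OHLCV window,
    with the current portfolio weights concatenated afterwards."""

    def __init__(self, observation_space, window=12, n_assets=31, features_dim=128):
        super().__init__(observation_space, features_dim)
        self.window = window
        self.step_dim = n_assets * 5            # OHLCV features per asset per time step
        self.lstm = nn.LSTM(self.step_dim, 64, batch_first=True)
        self.head = nn.Linear(64 + n_assets, features_dim)

    def forward(self, observations):
        batch = observations.shape[0]
        seq_len = self.window * self.step_dim
        # Reshape the flattened window back into a (batch, window, features) sequence.
        seq = observations[:, :seq_len].view(batch, self.window, self.step_dim)
        weights = observations[:, seq_len:]     # current portfolio weights
        _, (hidden, _) = self.lstm(seq)
        return torch.relu(self.head(torch.cat([hidden[-1], weights], dim=1)))


# Plug the extractor into PPO's default actor-critic policy.
policy_kwargs = dict(
    features_extractor_class=RecurrentMarketExtractor,
    features_extractor_kwargs=dict(window=12, n_assets=31, features_dim=128),
)

# Example wiring with the environment sketch above (random data for illustration):
# ohlcv = np.random.rand(1000, 31, 5) + 1.0
# env = MultiStockTradingEnv(ohlcv)
# model = PPO("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
# model.learn(total_timesteps=100_000)
```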

Thanks for reading!

Disclaimer: these articles are not to be construed as investment advice. Most algorithmic trading systems lose money when deployed to production. This article series is for educational purposes only.

You can find the code and dataset used for these experiments in my GitHub repo. I would appreciate it if you could show some love and leave a star on the repo.

Please leave a comment; I would love to hear your feedback on my work!

Read on to the next article in the series, linked below, for the data preprocessing for the RL experiments.

