Applied RL: Advanced deep learning customization of neural networks for RL-based algo trading

Akhilesh Gogikar
Jun 6, 2022


Previously, we had developed a custom policy network by defining a new neural network architecture with new linear layers.

But we can do so much more to customize the neural network architecture. Since neural networks are universal function approximators, almost any architecture that is stable and free of errors can give good results. The usual goal of ML engineers is to find the most efficient and accurate architecture, but for the purpose of this learning exercise, let's not be bogged down by such utilitarian goals.

In this article, we will develop a neural network with the following additions:

  1. Add a recurrent unit to learn embeddings along the temporal dimension
  2. Add an attention block to combine the embeddings into an attention vector
  3. Add a graph attention layer to build a graph embedding from the attention vector
  4. Add a capsule layer on the fusion of the graph attention embedding with the attention vector

To get started, we build the custom neural network by inheriting from PyTorch's nn.Module.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GATv2Conv   # graph attention layers
from capsule_layer import CapsuleLinear    # CapsuleLinear from the capsule-layer package

class CapsGATattentionGRU(nn.Module):
    def __init__(self, input_dim, time_step, hidden_dim, use_gru=True):
        super(CapsGATattentionGRU, self).__init__()

        # Edge index of a fully connected graph, used as the adjacency
        # structure for the geometric (graph attention) layers
        outer_edge = np.ones(shape=(2, input_dim ** 2))
        count = 0
        for i in range(input_dim):
            for j in range(input_dim):
                outer_edge[0][count] = i
                outer_edge[1][count] = j
                count += 1

        # basic parameters
        self.dim = hidden_dim
        self.input_dim = input_dim
        self.time_step = time_step
        self.outer_edge = outer_edge
        self.batch = 1
        self.inner_edge = torch.tensor(outer_edge, dtype=torch.int64).to('cuda:0')
        self.use_gru = use_gru

        # hidden layers
        if self.use_gru:
            self.temporal_encoder = nn.GRU(input_dim * hidden_dim, input_dim * hidden_dim,
                                           num_layers=2, bidirectional=False)
        self.inner_gat0 = GATv2Conv(hidden_dim, hidden_dim)
        self.inner_gat1 = GATv2Conv(hidden_dim, hidden_dim)
        self.attention = AttentionBlock(12, hidden_dim)  # AttentionBlock is defined further below
        self.caps_module = CapsuleLinear(out_capsules=self.input_dim,
                                         in_length=2 * hidden_dim, out_length=hidden_dim,
                                         in_capsules=None, routing_type='dynamic',
                                         num_iterations=3)
        self.fusion = nn.Linear(hidden_dim, input_dim)

So we have defined some key layers in this initialization of the neural network:

  1. The temporal encoder layer is an instance of nn.GRU from PyTorch. It is the first block applied in the forward pass:

def forward(self, inputs):
    inputs = torch.nan_to_num(inputs, nan=0.0, posinf=1.0)
    # inputs shape: torch.Size([1, 18600])
    if self.use_gru:
        embedding, _ = self.temporal_encoder(
            inputs.view(-1, self.time_step, self.input_dim * self.dim))
        embedding = torch.relu(embedding)
        # embedding shape: torch.Size([1, 12, 1550])
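A quick note on nn.GRU's input convention: unless batch_first=True is passed, PyTorch reads the input as (seq_len, batch, features), so the (1, 12, 1550) view above is interpreted with dimension 0 as the sequence axis. A minimal sketch, with sizes chosen only to mirror the shape comments above:

gru = nn.GRU(input_size=1550, hidden_size=1550, num_layers=2, bidirectional=False)
x = torch.randn(1, 12, 1550)   # read as (seq_len=1, batch=12, features=1550)
out, h_n = gru(x)              # out: (1, 12, 1550), h_n: (2, 12, 1550)

gru_bf = nn.GRU(1550, 1550, num_layers=2, batch_first=True)
out_bf, _ = gru_bf(x)          # now read as (batch=1, seq_len=12, features=1550)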

2. The attention layer is an instance of the custom AttentionBlock defined below:

class AttentionBlock(nn.Module):
    def __init__(self, time_step, dim):
        super(AttentionBlock, self).__init__()
        self.time_step = time_step
        self.attention_matrix = nn.Linear(time_step, time_step)

    def forward(self, inputs):
        inputs_t = torch.transpose(inputs, 2, 1)  # (batch_size, input_dim, time_step)
        attention_weight = self.attention_matrix(inputs_t)
        attention_probs = F.softmax(attention_weight, dim=-1)
        attention_probs = torch.transpose(attention_probs, 2, 1)
        attention_vec = torch.mul(attention_probs, inputs)
        attention_vec = torch.sum(attention_vec, dim=1)
        return attention_vec, attention_probs

The attention layer is applied along the temporal dimension of the embeddings:

# embedding shape: torch.Size([1, 12, 1550])
att_vector, _ = self.attention(embedding)
# attention vector shape: torch.Size([1, 1550])
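In isolation, the block can be exercised like this (the dummy tensor and its sizes are illustrative, mirroring the shape comments above; note that the dim argument is accepted but not used for sizing inside the block):

# Learns a (time_step x time_step) weighting, softmaxes it over time,
# and returns a weighted sum over the temporal dimension.
block = AttentionBlock(time_step=12, dim=50)
dummy = torch.randn(1, 12, 1550)   # (batch, time_step, features)
vec, probs = block(dummy)
print(vec.shape)                   # torch.Size([1, 1550])
print(probs.shape)                 # torch.Size([1, 12, 1550])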

3. inner_gat0 and inner_gat1 are instances of graph attention layers (GATv2Conv) from PyTorch Geometric, which are applied in a residual fashion:

# x: per-node features of shape (num_nodes, hidden_dim)
inner_graph_embedding = torch.relu(self.inner_gat0(x, self.inner_edge))
inner_graph_embedding0 = torch.relu(
    self.inner_gat1(inner_graph_embedding, self.inner_edge.view(2, -1)))
inner_graph_embedding = torch.add(inner_graph_embedding, inner_graph_embedding0)
inner_graph_embedding = inner_graph_embedding.view(-1, self.input_dim, self.dim)
# inner_graph_embedding shape: torch.Size([1, 31, 50])
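For reference, GATv2Conv takes node features of shape (num_nodes, in_channels) plus an edge_index tensor of shape (2, num_edges). A minimal sketch that builds the same fully connected edge_index as the nested loop in __init__ (sizes are illustrative):

import torch
from torch_geometric.nn import GATv2Conv

num_nodes, hidden_dim = 31, 50
src = torch.arange(num_nodes).repeat_interleave(num_nodes)  # 0,0,...,1,1,...
dst = torch.arange(num_nodes).repeat(num_nodes)             # 0,1,...,30,0,1,...
edge_index = torch.stack([src, dst], dim=0)                 # shape (2, num_nodes**2)

gat = GATv2Conv(hidden_dim, hidden_dim)
node_features = torch.randn(num_nodes, hidden_dim)
out = gat(node_features, edge_index)
print(out.shape)                                            # torch.Size([31, 50])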

4. Finally, we add a capsule network layer, which applies something like a Hough transform, on the fusion of the attention vector and the graph embedding:

# fusion
att_vector = att_vector.view(-1, self.input_dim, self.dim)  # (1, 31, 50)
fusion_vec = torch.cat((att_vector, inner_graph_embedding), dim=-1)
# fusion_vec shape: torch.Size([1, 31, 100])
caps_out, _ = self.caps_module(fusion_vec)
# caps_out shape: torch.Size([1, 31, 50])
out_vec = torch.tanh(self.fusion(torch.relu(caps_out)))
# out_vec shape: torch.Size([1, 31, 31])
return out_vec
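Since the forward pass has been shown piecemeal above, here it is gathered in one place as a sketch. The step that turns the attention vector into per-node features for the GAT layers (the x variable) and the use_gru=False fallback are my assumptions; they are not shown explicitly in the snippets above.

def forward(self, inputs):
    # replace NaNs/Infs coming from the environment's observations
    inputs = torch.nan_to_num(inputs, nan=0.0, posinf=1.0)

    # 1. temporal encoding with the GRU
    if self.use_gru:
        embedding, _ = self.temporal_encoder(
            inputs.view(-1, self.time_step, self.input_dim * self.dim))
        embedding = torch.relu(embedding)              # (1, 12, 1550)
    else:
        # assumed fallback: use the raw reshaped inputs
        embedding = inputs.view(-1, self.time_step, self.input_dim * self.dim)

    # 2. attention over the temporal dimension
    att_vector, _ = self.attention(embedding)          # (1, 1550)

    # 3. residual graph attention over per-node features
    #    (assumption: one hidden_dim feature vector per asset/node)
    x = att_vector.view(-1, self.dim)                  # (31, 50)
    inner_graph_embedding = torch.relu(self.inner_gat0(x, self.inner_edge))
    inner_graph_embedding0 = torch.relu(
        self.inner_gat1(inner_graph_embedding, self.inner_edge.view(2, -1)))
    inner_graph_embedding = torch.add(inner_graph_embedding, inner_graph_embedding0)
    inner_graph_embedding = inner_graph_embedding.view(-1, self.input_dim, self.dim)

    # 4. fusion + capsule layer
    att_vector = att_vector.view(-1, self.input_dim, self.dim)
    fusion_vec = torch.cat((att_vector, inner_graph_embedding), dim=-1)   # (1, 31, 100)
    caps_out, _ = self.caps_module(fusion_vec)                            # (1, 31, 50)
    out_vec = torch.tanh(self.fusion(torch.relu(caps_out)))               # (1, 31, 31)
    return out_vec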

Let’s see what we can do with this concoction of deep learning techniques when amalgamated to form a neural network.

We update the policy and value networks in our new custom policy. Note that we had to make some changes to the linear layers and add a transpose to make sure the output vector is (num_stocks,) shaped.

# Policy network
self.policy_net = nn.Sequential(
    CapsGATattentionGRU(last_layer_dim_pi, timesteps, feature_dim),
    nn.Linear(last_layer_dim_pi, 1), Transpose(), nn.Tanh())

# Value network
self.value_net = nn.Sequential(
    CapsGATattentionGRU(last_layer_dim_vf, timesteps, feature_dim),
    nn.Linear(last_layer_dim_vf, 1), Transpose(), nn.Tanh())
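Note that Transpose is not a module that ships with PyTorch, so it has to be a small custom helper from the repository; a minimal version consistent with how it is used here could look like this (the actual implementation may differ):

class Transpose(nn.Module):
    # Swap the last two dimensions, e.g. turning the (..., num_stocks, 1)
    # output of the preceding linear layer into (..., 1, num_stocks).
    def forward(self, x):
        return torch.transpose(x, -2, -1)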

Then we run the model learning process with this policy. Due to the added bells and whistles, training requires substantially more time.

model = PPO(GATActorCriticPolicy, env, verbose=2,
            tensorboard_log='tb_logs', batch_size=16)
model.learn(total_timesteps=1000)

When we run inference on the test data, we see improved returns; in fact, the results are consistently positive.

And that's it, folks: we have explored building a custom RL environment for trading and trained custom RL policies using the StableBaselines3 library.

Hope you liked my work, but it is important to note that it has been done purely for educational purposes, and none of it should be taken as profitable investment advice. In my experience, all but a handful of trading strategies, even the best in backtesting, lead to losses when deployed in production.

Please check out my GitHub repository for the code. I would appreciate some love with a few claps, and a star on the GitHub repo as well.

Please show some love and leave a comment if you have feedback. Many thanks if you have read through all the articles so far. Cheers!
