Applied RL: Advanced deep learning customization of neural networks for RL-based algo trading

Akhilesh Gogikar
Jun 6, 2022


Previously, we had developed a custom policy network by defining a new neural network architecture with new linear layers.

But we can do so much more to customize the neural network architecture. Since neural networks are universal function approximators, almost any architecture that is stable and free of errors can give good results. The usual goal of ML engineers is to find the most efficient and accurate architecture, but for the purpose of this learning exercise, let's not be bogged down by such utilitarian goals.

In this article, we will develop a neural network with the following additions:

  1. Add a recurrent unit to learn embeddings along the temporal dimension
  2. Add an attention block to combine the embeddings into an attention vector
  3. Add a graph attention layer to build a graph embedding from the attention vector
  4. Add a capsule layer on the fusion of the graph attention embedding with the attention vector

To get started, we build the custom neural network by inheriting from PyTorch's nn.Module.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GATv2Conv   # graph attention layers
from capsule_layer import CapsuleLinear    # CapsuleLinear from the capsule-layer package

class CapsGATattentionGRU(nn.Module):
    def __init__(self, input_dim, time_step, hidden_dim, use_gru=True):
        super(CapsGATattentionGRU, self).__init__()

        # Edge index of a fully connected graph, used as the adjacency
        # structure for the geometric (graph attention) layers
        outer_edge = np.ones(shape=(2, input_dim ** 2))
        count = 0
        for i in range(input_dim):
            for j in range(input_dim):
                outer_edge[0][count] = i
                outer_edge[1][count] = j
                count += 1

        # basic parameters
        self.dim = hidden_dim
        self.input_dim = input_dim
        self.time_step = time_step
        self.outer_edge = outer_edge
        self.batch = 1
        self.inner_edge = torch.tensor(outer_edge, dtype=torch.int64).to('cuda:0')
        self.use_gru = use_gru

        # hidden layers
        if self.use_gru:
            self.temporal_encoder = nn.GRU(input_dim * hidden_dim, input_dim * hidden_dim,
                                           num_layers=2, bidirectional=False)
        self.inner_gat0 = GATv2Conv(hidden_dim, hidden_dim)
        self.inner_gat1 = GATv2Conv(hidden_dim, hidden_dim)
        self.attention = AttentionBlock(12, hidden_dim)  # AttentionBlock is defined further below
        self.caps_module = CapsuleLinear(out_capsules=self.input_dim,
                                         in_length=2 * hidden_dim, out_length=hidden_dim,
                                         in_capsules=None, routing_type='dynamic',
                                         num_iterations=3)
        self.fusion = nn.Linear(hidden_dim, input_dim)

So we have defined some key layers in this initialization of the neural network:

  1. The temporal encoder layer is an instance of nn.GRU from PyTorch. It is the first block applied in the forward pass:

def forward(self, inputs):
    inputs = torch.nan_to_num(inputs, nan=0.0, posinf=1.0)
    # inputs shape: torch.Size([1, 18600])
    if self.use_gru:
        embedding, _ = self.temporal_encoder(
            inputs.view(-1, self.time_step, self.input_dim * self.dim))
        embedding = torch.relu(embedding)
        # embedding shape: torch.Size([1, 12, 1550])
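A quick note on nn.GRU's input convention: unless batch_first=True is passed, PyTorch reads the input as (seq_len, batch, features), so the (1, 12, 1550) view above is interpreted with dimension 0 as the sequence axis. A minimal sketch, with sizes chosen only to mirror the shape comments above:

gru = nn.GRU(input_size=1550, hidden_size=1550, num_layers=2, bidirectional=False)
x = torch.randn(1, 12, 1550)   # read as (seq_len=1, batch=12, features=1550)
out, h_n = gru(x)              # out: (1, 12, 1550), h_n: (2, 12, 1550)

gru_bf = nn.GRU(1550, 1550, num_layers=2, batch_first=True)
out_bf, _ = gru_bf(x)          # now read as (batch=1, seq_len=12, features=1550)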

2. The attention layer is an instance of the custom AttentionBlock defined below:

class AttentionBlock(nn.Module):
    def __init__(self, time_step, dim):
        super(AttentionBlock, self).__init__()
        self.time_step = time_step
        self.attention_matrix = nn.Linear(time_step, time_step)

    def forward(self, inputs):
        inputs_t = torch.transpose(inputs, 2, 1)  # (batch_size, input_dim, time_step)
        attention_weight = self.attention_matrix(inputs_t)
        attention_probs = F.softmax(attention_weight, dim=-1)
        attention_probs = torch.transpose(attention_probs, 2, 1)
        attention_vec = torch.mul(attention_probs, inputs)
        attention_vec = torch.sum(attention_vec, dim=1)
        return attention_vec, attention_probs

The attention layer is applied along the temporal dimension of the embeddings:

# embedding shape: torch.Size([1, 12, 1550])
att_vector, _ = self.attention(embedding)
# attention vector shape: torch.Size([1, 1550])
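In isolation, the block can be exercised like this (the dummy tensor and its sizes are illustrative, mirroring the shape comments above; note that the dim argument is accepted but not used for sizing inside the block):

# Learns a (time_step x time_step) weighting, softmaxes it over time,
# and returns a weighted sum over the temporal dimension.
block = AttentionBlock(time_step=12, dim=50)
dummy = torch.randn(1, 12, 1550)   # (batch, time_step, features)
vec, probs = block(dummy)
print(vec.shape)                   # torch.Size([1, 1550])
print(probs.shape)                 # torch.Size([1, 12, 1550])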

3. inner_gat0 and inner_gat1 are instances of graph attention layers (GATv2Conv) from PyTorch Geometric, which are applied in a residual fashion:

# x: per-node features of shape (num_nodes, hidden_dim)
inner_graph_embedding = torch.relu(self.inner_gat0(x, self.inner_edge))
inner_graph_embedding0 = torch.relu(
    self.inner_gat1(inner_graph_embedding, self.inner_edge.view(2, -1)))
inner_graph_embedding = torch.add(inner_graph_embedding, inner_graph_embedding0)
inner_graph_embedding = inner_graph_embedding.view(-1, self.input_dim, self.dim)
# inner_graph_embedding shape: torch.Size([1, 31, 50])
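For reference, GATv2Conv takes node features of shape (num_nodes, in_channels) plus an edge_index tensor of shape (2, num_edges). A minimal sketch that builds the same fully connected edge_index as the nested loop in __init__ (sizes are illustrative):

import torch
from torch_geometric.nn import GATv2Conv

num_nodes, hidden_dim = 31, 50
src = torch.arange(num_nodes).repeat_interleave(num_nodes)  # 0,0,...,1,1,...
dst = torch.arange(num_nodes).repeat(num_nodes)             # 0,1,...,30,0,1,...
edge_index = torch.stack([src, dst], dim=0)                 # shape (2, num_nodes**2)

gat = GATv2Conv(hidden_dim, hidden_dim)
node_features = torch.randn(num_nodes, hidden_dim)
out = gat(node_features, edge_index)
print(out.shape)                                            # torch.Size([31, 50])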

4. Finally, we add a capsule network layer, which applies something like a Hough transform, on the fusion of the attention vector and the graph embedding:

# fusion
att_vector = att_vector.view(-1, self.input_dim, self.dim)  # (1, 31, 50)
fusion_vec = torch.cat((att_vector, inner_graph_embedding), dim=-1)
# fusion_vec shape: torch.Size([1, 31, 100])
caps_out, _ = self.caps_module(fusion_vec)
# caps_out shape: torch.Size([1, 31, 50])
out_vec = torch.tanh(self.fusion(torch.relu(caps_out)))
# out_vec shape: torch.Size([1, 31, 31])
return out_vec
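Since the forward pass has been shown piecemeal above, here it is gathered in one place as a sketch. The step that turns the attention vector into per-node features for the GAT layers (the x variable) and the use_gru=False fallback are my assumptions; they are not shown explicitly in the snippets above.

def forward(self, inputs):
    # replace NaNs/Infs coming from the environment's observations
    inputs = torch.nan_to_num(inputs, nan=0.0, posinf=1.0)

    # 1. temporal encoding with the GRU
    if self.use_gru:
        embedding, _ = self.temporal_encoder(
            inputs.view(-1, self.time_step, self.input_dim * self.dim))
        embedding = torch.relu(embedding)              # (1, 12, 1550)
    else:
        # assumed fallback: use the raw reshaped inputs
        embedding = inputs.view(-1, self.time_step, self.input_dim * self.dim)

    # 2. attention over the temporal dimension
    att_vector, _ = self.attention(embedding)          # (1, 1550)

    # 3. residual graph attention over per-node features
    #    (assumption: one hidden_dim feature vector per asset/node)
    x = att_vector.view(-1, self.dim)                  # (31, 50)
    inner_graph_embedding = torch.relu(self.inner_gat0(x, self.inner_edge))
    inner_graph_embedding0 = torch.relu(
        self.inner_gat1(inner_graph_embedding, self.inner_edge.view(2, -1)))
    inner_graph_embedding = torch.add(inner_graph_embedding, inner_graph_embedding0)
    inner_graph_embedding = inner_graph_embedding.view(-1, self.input_dim, self.dim)

    # 4. fusion + capsule layer
    att_vector = att_vector.view(-1, self.input_dim, self.dim)
    fusion_vec = torch.cat((att_vector, inner_graph_embedding), dim=-1)   # (1, 31, 100)
    caps_out, _ = self.caps_module(fusion_vec)                            # (1, 31, 50)
    out_vec = torch.tanh(self.fusion(torch.relu(caps_out)))               # (1, 31, 31)
    return out_vec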

Let’s see what we can do with this concoction of deep learning techniques when amalgamated to form a neural network.

We update the policy and value networks in our new custom policy. Note that we had to make some changes to the linear layers and add a transpose to make sure the output vector is (num_stocks,) shaped.

# Policy network
self.policy_net = nn.Sequential(
    CapsGATattentionGRU(last_layer_dim_pi, timesteps, feature_dim),
    nn.Linear(last_layer_dim_pi, 1), Transpose(), nn.Tanh())

# Value network
self.value_net = nn.Sequential(
    CapsGATattentionGRU(last_layer_dim_vf, timesteps, feature_dim),
    nn.Linear(last_layer_dim_vf, 1), Transpose(), nn.Tanh())
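Note that Transpose is not a module that ships with PyTorch, so it has to be a small custom helper from the repository; a minimal version consistent with how it is used here could look like this (the actual implementation may differ):

class Transpose(nn.Module):
    # Swap the last two dimensions, e.g. turning the (..., num_stocks, 1)
    # output of the preceding linear layer into (..., 1, num_stocks).
    def forward(self, x):
        return torch.transpose(x, -2, -1)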

Then we run the model learning process with this policy. Due to the added bells and whistles, training requires substantially more time.

model = PPO(GATActorCriticPolicy, env, verbose=2,
            tensorboard_log='tb_logs', batch_size=16)
model.learn(total_timesteps=1000)

When we run inference on the test data, we see improved returns; in fact, the results are consistently positive.

And that's it, folks: we have explored building a custom RL environment for trading and trained custom RL policies using the StableBaselines3 library.

Hope you liked my work, but it is important to note that it has been done purely for educational purposes, and none of it should be taken as profitable investment advice. In my experience, all but a handful of trading strategies, even the best in backtesting, lead to losses when deployed in production.

Please check out my GitHub repository for the code. I would appreciate some love with a few claps, and a star on the GitHub repo as well.

Please show some love and leave a comment if you have feedback. Many thanks if you have read through all the articles so far. Cheers!
