Deep Q Stock Trading: Object Oriented R Code

Alexey Burnakov
Published in Analytics Vidhya
Apr 3, 2020

Recollection of what was done

In the previous parts of this series we ran an experiment testing the hypothesis that a reinforcement learning framework can successfully learn to trade simulated and real stock data. If you haven't read them yet, start there: First part.

To push towards faster convergence we modified several things: adding recurrence to the neural network, shaping the rewards, and generating different presentations of the rewards. The experiment showed that potential-based reward shaping worked best of all. Second part.

Back then I posted the code of the neural network to help you start your own projects. This time I am posting the code of the whole experiment, which I wrote in my language of choice, R, enriched with R6 classes to make it easier to dive in. Even if your daily coding is done in Python, Java or C, you will likely find the OOP style of R6 quite convenient. I hope you will enjoy this high-level view. Check out my code repository, clone it and run it!

Experiment logic

Once you have installed the packages as advised, feel free to run main.R from the command line, the R console, or the RStudio IDE. If you want full control over the options, RStudio is your first choice as a convenient and cozy editor.
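
For instance, from an interactive R session (console or RStudio) a run is just a matter of sourcing the entry script, assuming your working directory already points at the cloned repository:

source('main.R')  # runs the experiment script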

setwd('C:/R_study/reinforcement/rl_classes') # set your working directory

## Classes
source('NN_class.R')
source('DATA_class.R')
source('RB_class.R')
source('TRAIN_class.R')

In R you have to set the working directory to tell the program where your files and directories are located, and you should do this first so that sourcing the other scripts goes smoothly.

Take the next steps in the order you see in the script: create the data object, then the replay buffer object, the neural network object and, finally, the train object. The objects rely on properties and methods created in the earlier steps; for example, the replay buffer object takes a slice of the data field from the data object.

Dat <- Data$new()

We create an object of class Data and call a method on it to get the time series. You have a few options: synthetic noise, a synthetic signal, and real stock data from Yahoo. By default the synthetic signal will be your time series, serving as a toy problem.

Dat$synthetic_signal( 
stepsize = 0.1
, noise_sd = 0.0
, noise_sd2 = 0.0
, n = 20000 )

You can create a very simple (sine) signal, or opt for a more complex one by increasing the _sd standard deviation values.
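
For example, a noisier and therefore harder version of the same toy problem can be generated with the very same call; the 0.05 values below are purely illustrative:

Dat$synthetic_signal( 
stepsize = 0.1
, noise_sd = 0.05   # non-zero noise makes the signal harder to learn
, noise_sd2 = 0.05
, n = 20000 )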

Dat$make_features(max_lag_power = 6)

In order to put the task environment state into a measurable space, we create input features implemented as time series differences of varying order.
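
For intuition only, here is a minimal sketch of what such features could look like. The real logic lives in Dat$make_features; the power-of-two lag scheme and the helper below are my own assumption, not a copy of the class code:

# Hypothetical illustration: differences of the series at lags 2^0 ... 2^max_lag_power
make_lag_features <- function(price, max_lag_power = 6) {
  lags <- 2 ^ (0:max_lag_power)
  feats <- sapply(lags, function(l) c(rep(NA_real_, l), diff(price, lag = l)))
  colnames(feats) <- paste0('diff_lag_', lags)
  feats
}

head(make_lag_features(sin(seq(0, 20, by = 0.1))), 10)  # inspect the first rows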

Nn <- NN$new(lstm_seq_length = 8L)

Nn$compile_nn( 
loss = 'mse'
, metrics = 'mse'
, optimizer = 'adam'
)
Nn2 <- Nn$clone()

The neural network structure is not modifiable from outside the class object, except for the number of time series timesteps it will use when creating the inputs for the LSTM layer.

Because of the double Q-learning logic, we need two neural networks that are chosen at random (you can view this code inside the Train methods).
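
As a rough illustration of that double Q-learning idea (not the repository's Train code; q_a and q_b are hypothetical helpers returning the Q-value vector for a state): one network selects the greedy action, the other evaluates it, and the roles are swapped at random.

# Illustrative double Q-learning target, not the actual Train method
double_q_target <- function(q_a, q_b, next_state, reward, discount_factor = 0.99) {
  if (runif(1) < 0.5) {
    a_star <- which.max(q_a(next_state))                # network A picks the action
    reward + discount_factor * q_b(next_state)[a_star]  # network B evaluates it
  } else {
    a_star <- which.max(q_b(next_state))                # network B picks the action
    reward + discount_factor * q_a(next_state)[a_star]  # network A evaluates it
  }
}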

Rb <- RB$new( 
buffer_size = 512
, priority_alpha = 0.1
)
Rb$init_rb()

The replay buffer is a data.table object that stores past trajectories and serves as the source of data for tuning our NN model. Prioritized sampling requires an additional parameter, priority_alpha, which defines the sharpness of the probability distribution over the buffer rows during sampling.
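
For intuition, the role of priority_alpha can be sketched like this (an illustrative snippet, not the RB class code): with alpha equal to zero every row is equally likely, while larger values concentrate sampling on high-priority rows.

# Illustrative prioritized sampling over replay buffer rows
sample_prioritized <- function(priorities, batch_size, priority_alpha = 0.1) {
  probs <- priorities ^ priority_alpha  # sharpen or flatten the distribution
  probs <- probs / sum(probs)
  sample(seq_along(priorities), size = batch_size, replace = TRUE, prob = probs)
}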

Log <- Logs$new()

All relevant information about the neural networks and the agent's behavior ends up inside the log class object. You will see how it is used, and you may want to plot it in your own style.

Tr <- Train$new()

Tr$run( 
test_mode = F
, batch_size = 64
, discount_factor = 0.99
, learn_rate = 0.001
, max_iter = 5000
, min_trans_cost = 0
, print_returns_every = 100
, magic_const = 1
)

Here we go, training has started. I advise you to read up on the Q-learning parameters so that you are fluent in this part. One interesting parameter is print_returns_every, which controls how often an intermediate training report pops up during the process.

You can leave a long training run on its own, but check the screen from time to time to get an idea of how well it is going.

When the loop stops you will also get a summary of how your agent's performance has evolved. If you wish, run the training multiple times; each time the neural networks will continue evolving from the state they had at the end of the previous run. When you are satisfied with the result, call the $save() method on the Nn object to store the model.
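
For example, after a satisfying run the final step is just the save call; the exact arguments $save() expects (a file path, for instance) are defined in NN_class.R, so check there first:

Nn$save()  # persist the trained model; see NN_class.R for the exact signature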

Reward shaping and the reward presentations have not been added to the script yet.

With this I am going to finish the article, hoping the scripts will land smoothly in your technology stack and give you not just some coding time but an educational window as well.

Take your time reading the methods inside the classes to modify the NN architecture, feed in your own data, or change the experiment logic.

Good luck with Reinforcement Learning!
