A Q-learning model for an autonomous trading system

Karen Poblete
Mindboard
Apr 27, 2018

Objective: Implement a system capable of finding good trading policies for Futures without being given an explicit model of the market in advance. The information available includes measurements of current market depth and price, executed transactions, and options data.

Background and solution: Futures trading has been gaining popularity in recent years, since Futures can be an effective tool for minimizing risk while trading a variety of financial instruments. Trading Futures also brings several advantages over trading Stocks, such as lower operating costs and higher liquidity, which can translate into faster earnings.

Q-learning is a model-free reinforcement learning technique first presented in [1] as a way of learning from delayed rewards, inspired by animal behavior studies. The Q-learning algorithm is a stochastic method for optimizing policies: at each step the agent is expected to take the best possible action in the current state of the system. Those actions must depend only on the current state, and should aim to maximize profit while taking future rewards into account. Nowadays there are several applications of Q-learning models, such as playing Atari games [2] and trading stocks [3].
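
As a rough sketch, the classic tabular form of the algorithm from [1] fits in a few lines of Python. The number of states and actions, the reward, and the hyperparameter values below are illustrative assumptions, not the actual trading implementation.

```python
import numpy as np

# Illustrative tabular Q-learning update (Watkins, 1989).
# n_states, n_actions and the hyperparameters are placeholders.
n_states, n_actions = 100, 3          # e.g. actions: hold, buy, sell
alpha, gamma, epsilon = 0.1, 0.95, 0.1

Q = np.zeros((n_states, n_actions))

def choose_action(state):
    # Epsilon-greedy: explore occasionally, otherwise act greedily.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```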

It is well known that the stock market, as well as the options and futures markets, is hard to represent in a computational model. Many factors affect price changes, which makes prices hard to predict. In the absence of a deterministic way to predict the price of a commodity, it is still possible to represent the price with a stochastic model. Since price fluctuations look much like random changes when the available data is limited, a stochastic model built on Markov Processes can help optimize profits.
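
One simple way to make this stochastic view concrete is to bucket recent price changes into a small set of discrete states and treat the transitions between them as approximately Markovian. The thresholds and the toy price series below are arbitrary assumptions, chosen only to illustrate the idea.

```python
import numpy as np

def price_change_state(prices, thresholds=(-0.002, 0.002)):
    """Map the latest relative price change to a discrete state:
    0 = down, 1 = flat, 2 = up. Thresholds are illustrative."""
    change = (prices[-1] - prices[-2]) / prices[-2]
    return int(np.digitize(change, thresholds))

# Count empirical transition frequencies between states,
# which is the Markov-chain view of the price series.
prices = np.array([100.0, 100.1, 100.1, 99.8, 99.9, 100.3])
states = [price_change_state(prices[: i + 1]) for i in range(1, len(prices))]
transitions = np.zeros((3, 3))
for s, s_next in zip(states[:-1], states[1:]):
    transitions[s, s_next] += 1
```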

It is important to mention that the best tool for creating a profitable trading model is information. The more information, and understanding of it and of the targeted market, the better a model can perform. Nevertheless, agents closely tied to the targeted markets have an advantage over other traders, since not all market information is in the public domain. Despite this limitation, systems based on Q-learning models have proven to be a good foundation for trading systems, within their own limits.

Q-learning models have been studied in depth since they were first published, and various improvements have been added. In the early version of the algorithm, a Q-matrix was used to store the value of each action at each discrete state of the system. This was later replaced by neural networks, which brought the advantage of handling continuous and previously unseen states.
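
To give a feel for this function-approximation variant, here is a minimal Q-network sketch in PyTorch. The layer sizes, the length of the market-feature vector, and the three-action space are assumptions for illustration; a full deep Q-network as in [2] adds experience replay and a target network on top of this idea.

```python
import torch
import torch.nn as nn

# Minimal Q-network sketch: maps a continuous market-state vector
# (e.g. prices, depth, position) to one Q-value per action.
class QNetwork(nn.Module):
    def __init__(self, state_dim=8, n_actions=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork()
state = torch.randn(1, 8)             # placeholder market features
q_values = q_net(state)               # one estimate per action
action = int(torch.argmax(q_values))  # greedy action
```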

Moreover, the number of states can become an important problem. With a large number of states, it is hard to visit all of them during training, which requires more training data. In my next post I will talk in depth about the different problems I have faced while implementing a Futures trader using Q-learning. For an introduction to Q-learning and its algorithm, see [4].

[1] Watkins, C. J. C. H. (1989). Learning from delayed rewards (Doctoral dissertation, King’s College, Cambridge).

[2] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

[3] Lee, J. W., Hong, E., & Park, J. (2004, October). A Q-learning based approach to design of intelligent stock trading agents. In Engineering Management Conference, 2004. Proceedings. 2004 IEEE International (Vol. 3, pp. 1289–1292). IEEE.

[4] Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine learning, 8(3–4), 279–292.
