Enhancing Trading Performance through Deep Q Network Hyper-parameter Tuning

Reza Karbasi
4 min read · Feb 4, 2023


Trading in the stock market can be a complex process, but with the advancement of AI and machine learning, new levels of sophistication can be brought to it. In this blog post, we will explore how to enhance a classic trading algorithm using deep reinforcement learning. By connecting Python and MQL5 through the use of sockets, we will design a trading algorithm and implement a deep Q network agent that will continuously learn and optimize the algorithm’s hyper-parameters. This combination will lead to a trading system that can adapt to changing market conditions and improve its performance over time.

The use of sockets allows for real-time communication between the two systems, and the ability to visualize the results of the trading algorithm and the deep Q network agent in real-time provides valuable insights. This approach represents a major step forward in the development of advanced trading systems and holds great promise for the future of finance and AI.

Table of Contents:

  1. Introduction to Python-MQL5 Connection
  2. Classic Trading Algorithm in MQL5
  3. Reinforcement Learning Agent in Python
  4. Tips to Get Better Results

Introduction to Python-MQL5 Connection

In this section, we’ll guide you through the process of connecting Python and MQL5 using sockets. We will be following the instructions from a specific link, and we’ll point out which methods have been found to be effective and which have not. The project’s GitHub repository contains two folders: one for the Python code that implements the socket server, and another for the MQL5 files that include a library and a code that connects to the socket.

It’s worth mentioning that the code is claimed to be compatible with both MQL4 and MQL5, but the author has only tested it on MQL5.
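To give a rough picture of the Python side, a minimal socket server might look like the sketch below. This is only an illustration, not the repository's actual code: the host, port, and reply format are assumptions and must match whatever the MQL5 client expects.

```python
import socket

HOST, PORT = "127.0.0.1", 9090   # assumed address/port; must match the MQL5 client

def serve():
    # Open a TCP socket that the MQL5 expert advisor can connect to.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((HOST, PORT))
        server.listen(1)
        conn, addr = server.accept()
        print(f"MQL5 client connected from {addr}")
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:
                    break              # client disconnected
                print("received:", data.decode("utf-8"))
                conn.sendall(b"20")    # reply with the chosen n (placeholder)

if __name__ == "__main__":
    serve()
```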

Classic Trading Algorithm in MQL5

The classic trading algorithm is based on the highest and lowest prices of candles. It calculates the highest and lowest prices over the last n candles. If the new price crosses above the highest, a buy order is executed; if it crosses below the lowest, a sell order is executed. When an order is executed, the stop-loss is set at the midpoint of the highest and lowest prices. The number of candles used to calculate the stop-loss and limit prices (n) is determined by the RL agent. For example, if the candles are like the ones shown below, a buy order would be executed on the 6th of January. The code for this algorithm can be found here.

Figure: if the candles look like this, a buy order opens on the 6th of January
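To make the rule concrete, here is a rough Python sketch of the breakout logic described above. The actual implementation is in MQL5; the function name and the `high`/`low` fields here are assumptions for illustration only.

```python
def check_breakout(candles, n, price):
    """Classic breakout rule: candles is a list of dicts with 'high'/'low'
    (most recent last), n is the lookback chosen by the RL agent,
    price is the newest price."""
    highest = max(c["high"] for c in candles[-n:])
    lowest = min(c["low"] for c in candles[-n:])
    stop_loss = (highest + lowest) / 2          # stop-loss at the midpoint

    if price > highest:
        return {"order": "buy", "entry": price, "stop_loss": stop_loss}
    if price < lowest:
        return {"order": "sell", "entry": price, "stop_loss": stop_loss}
    return None                                  # no breakout, no order
```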

The MQL5 code is responsible for sending the last-step state, the new-step state, the applied action, and the received reward to the Python environment. The state consists of the stop-loss of the contract, the entry price, the close price, and the highest and lowest values of the past 5, 10, 20, 40, 80, and 120 candles (dimension = 15). The action is the chosen value of n (10, 20, or 40). The reward is given when the contract is stopped out and equals the profit from the contract.

At each interaction between Python and MQL5, the aforementioned four pieces of information are sent from MQL5 to Python, while only the value of n is sent from Python to MQL5.
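One plausible way to handle this exchange on the Python side is sketched below. The delimiter and field order are assumptions for illustration, not the repository's actual message format; only the shape of the data (two 15-value states, one action, one reward) follows the description above.

```python
# State: 15 features = stop-loss, entry price, close price,
# plus highest/lowest of the last 5, 10, 20, 40, 80 and 120 candles.
def parse_message(message):
    """Split one MQL5 message into (last_state, new_state, action, reward)."""
    parts = message.split(";")                              # assumed delimiter
    last_state = [float(x) for x in parts[0].split(",")]    # 15 values
    new_state = [float(x) for x in parts[1].split(",")]     # 15 values
    action = int(parts[2])                                  # the n that was applied
    reward = float(parts[3])                                # profit if stopped, else 0
    return last_state, new_state, action, reward

def build_reply(n):
    """Python -> MQL5: only the chosen n (10, 20, or 40) is sent back."""
    return str(n).encode("utf-8")
```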

As mentioned before, all codes are here.

Reinforcement Learning Agent in Python

The Reinforcement Learning (RL) agent is straightforward to implement and will create a Deep Q Network using 15 features. The Q-Network has 15 inputs, which are explained in the preceding section, and 3 outputs, which correspond to the 3 actions the agent can take in the world (setting n to 10, 20, or 40).
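As a sketch of what such a network might look like (using PyTorch purely for illustration; the repository may use a different framework, and the hidden-layer size is an assumption):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """MLP mapping the 15-dimensional state to Q-values for the 3 actions."""
    def __init__(self, state_dim=15, n_actions=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),   # Q-values for n = 10, 20, 40
        )

    def forward(self, state):
        return self.net(state)
```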

The following picture shows the overall idea of the code:

Figure: structure of the RL code

The “socket_handling” file employs multithreading to handle communication with the MQL5 side. It populates the “input_list”, which is emptied whenever the run code consumes it. The “model” and “epsilon” (epsilon-greedy) values are set in the “run.py” file.
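A minimal sketch of that producer/consumer pattern is shown below; the helper names (`on_message`, `drain_inputs`) are hypothetical and only illustrate how a socket thread and the run loop can share “input_list” safely.

```python
import threading

input_list = []                   # filled by the socket thread, emptied by run.py
input_lock = threading.Lock()     # protects the shared list across threads

def on_message(message):
    """Called from the socket thread for every message received from MQL5."""
    with input_lock:
        input_list.append(message)

def drain_inputs():
    """Called from the run loop: take everything received so far, then clear."""
    with input_lock:
        pending = list(input_list)
        input_list.clear()
    return pending
```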

The “reinforcement_learning” file contains three classes: “DataPrepare”, “NeuralNetwork”, and “ReinforcementLearningAgent”. The network is a multi-layer perceptron (MLP) with four layers. The “ReinforcementLearningAgent” uses the neural network to generate Q-values, and actions are selected with an epsilon-greedy policy whose epsilon decays over time.
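A minimal sketch of epsilon-greedy selection with decay, assuming the PyTorch-style Q-network shown earlier; the decay rate and epsilon floor are illustrative assumptions:

```python
import random

import torch

ACTIONS = [10, 20, 40]   # the three values of n the agent can choose

def select_action(q_network, state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    with torch.no_grad():
        q_values = q_network(torch.tensor(state, dtype=torch.float32))
    return int(torch.argmax(q_values).item())

def decay_epsilon(epsilon, decay=0.995, min_epsilon=0.05):
    """Multiplicative decay applied after each interaction."""
    return max(min_epsilon, epsilon * decay)
```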

Tips to Get Better Results

Here are some suggestions for improving the results of your RL agent:

  • Visualize the rewards and the loss function of the neural network over time to better understand the training process.
  • Be mindful of the calculation of Q-values. It is important to keep these values within a reasonable range (e.g., between -10 and 10), as large values can make the network’s learning more difficult.
  • Consider using the RMSprop optimizer, as it has been shown to be effective in some articles (it is often, though not always, a good choice); see the sketch after this list.
  • If you are experiencing difficulties, try adjusting the learning rate, the memory size of the Q-learning algorithm, and the epsilon decay schedule. Experimentation is key!
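To make a few of these tips concrete, the sketch below combines reward clipping, the RMSprop optimizer, and simple loss logging for later plotting. All hyper-parameter values are illustrative assumptions, and `QNetwork` refers to the network sketched earlier, not to the repository's class.

```python
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

q_net = QNetwork()                                             # 15 inputs, 3 outputs
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=1e-4)   # tip: try RMSprop
loss_fn = nn.SmoothL1Loss()
loss_history, reward_history = [], []

def train_step(state, action, reward, next_state, gamma=0.99):
    """One Q-learning update on a single transition."""
    reward = max(-10.0, min(10.0, reward))        # tip: keep targets in a sane range
    state_t = torch.tensor(state, dtype=torch.float32)
    next_t = torch.tensor(next_state, dtype=torch.float32)

    q_value = q_net(state_t)[action]
    with torch.no_grad():
        target = reward + gamma * q_net(next_t).max()

    loss = loss_fn(q_value, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    loss_history.append(loss.item())              # tip: visualize loss over time
    reward_history.append(reward)
    return loss.item()

def plot_training():
    """Quick look at how the loss and the (clipped) rewards evolve."""
    plt.plot(loss_history, label="loss")
    plt.plot(reward_history, label="reward")
    plt.legend()
    plt.show()
```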

As a final note, in many tasks I have seen the same pattern in the loss curve: the loss decreases nicely for a while and then suddenly diverges to large values. When this happens, go back to your visualizations and adjust the relevant parameters.
