Building a trade prediction model for a trader bot

Sciforce
Nov 10, 2023

A World Trade Organization (WTO) report highlights the role of digital transformation in changing how the world trades. Digital tools help traders succeed in a world where doing business is becoming more complicated.

  • Adaptive Market Predictions: Machine Learning helps traders stay ahead by quickly showing what’s available and at what price, which is important when global tensions can shake up the markets suddenly.
  • Smart Asset Management: Deep Learning makes sure trading assets are used in the best way possible. This helps everyone grow economically, especially when a big wealth gap exists.
  • Resilient Data Handling: AI models are great at working with huge amounts of information and can keep up even when climate change makes the markets go up and down.
  • Transparent and Secure Trading: In uncertain economic times, it’s important to have automated systems that keep trading clear and safe, handling many channels and currencies without problems.
  • Insightful Competitive Analysis: In an age where being eco-friendly is key, tools that use a lot of data help businesses watch what their competitors do, learn from it, and make more money in a way that’s good for the planet.

Using these new technologies, international trade is getting better and safer and becoming more tuned in to today’s big challenges. This paves the way for a more robust, accurate, and environmentally conscious business world.

Finding an AI solution to identify trader behavioral patterns

The client wanted us to create an actionable AI-backed tool that would analyze traders’ behavior on the market with high operational efficiency and prediction accuracy. From the functional perspective, the tool was expected to perform the following tasks:

· Differentiation between several types of traders, including hedgers, arbitrageurs and speculators;

· Recognition and processing of trader patterns, based on historical data analysis, i.e. previous trading activities;

· Building profiles for each trading pattern to accurately predict the trader’s next moves on the market;

· Simulation of traders’ behavior to provide insights on alternative steps a trader can take in ever-changing environments, such as in the case of market fluctuations;

· Pinpointing the exact price at which a trader opens and closes the trade (Entry Type: IN/OUT) or increases the trade size.

Our solution

For this project, we developed a solid, trainable ML-driven model to track the actions of bot traders. We chose two families of ML models for the prediction task:

  • An LSTM-based neural network, which belongs to the family of recurrent neural networks, and
  • An XGBoost ensemble of decision trees trained with the gradient boosting algorithm.

Both models are trained to predict whether a trader will make a specific transaction in the next time interval, based on their transaction history. The models therefore solve a three-class classification problem: Idle (no transaction), Buy, and Sell. We noticed that the hardest part was differentiating Idle from Buy/Sell. In other words, once the model correctly detected an upcoming transaction, it could easily predict the transaction type (see the confusion matrix below).

Confusion matrix for predictions of Quiet Moon strategy transactions.
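To make the Idle versus Buy/Sell observation concrete, here is a minimal sketch of how such a confusion matrix can be split into two questions: was a transaction detected at all, and was its type right. The function names and the sample labels are ours and purely illustrative; they are not project data.

```python
from collections import Counter

LABELS = ["Idle", "Buy", "Sell"]

def confusion_matrix(actual, predicted, labels=LABELS):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def detection_accuracy(actual, predicted):
    """Accuracy of the binary question: is there a transaction at all?"""
    hits = sum((a == "Idle") == (p == "Idle") for a, p in zip(actual, predicted))
    return hits / len(actual)

def type_accuracy(actual, predicted):
    """Buy vs Sell accuracy, restricted to correctly detected transactions."""
    pairs = [(a, p) for a, p in zip(actual, predicted)
             if a != "Idle" and p != "Idle"]
    if not pairs:
        return float("nan")
    return sum(a == p for a, p in pairs) / len(pairs)

# Made-up labels for illustration only.
actual    = ["Idle", "Idle", "Buy", "Sell", "Idle", "Buy", "Sell", "Idle"]
predicted = ["Idle", "Buy",  "Buy", "Sell", "Idle", "Idle", "Sell", "Idle"]

matrix = confusion_matrix(actual, predicted)
detect = detection_accuracy(actual, predicted)
kind = type_accuracy(actual, predicted)
```

In this toy sample every error is a confusion between Idle and a transaction, while the Buy/Sell type is always right once a transaction is detected, which mirrors the pattern described above.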

Both XGB and LSTM models provide three types of outputs:

· Action prediction for the next time interval: idle, buy, or sell (i.e. time model).

· Confidence score for a prediction, e.g. 80% confident that the transaction would be Buy, 19% that it would be Idle, 1% that it would be Sell.

· Likelihood of the transaction for the next time interval, given a certain change in price (i.e. price model).
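The first two outputs can be illustrated with a short sketch. We assume here that a model produces one raw score per class and that a softmax turns these scores into confidence values; the article does not state how the scores were normalized, so treat this as one common option rather than the project's implementation. The scores in the example are invented.

```python
import math

CLASSES = ("Idle", "Buy", "Sell")

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(scores):
    """Return the most likely action and the confidence for every class."""
    confidence = dict(zip(CLASSES, softmax(scores)))
    action = max(confidence, key=confidence.get)
    return action, confidence

# Hypothetical raw scores for one time interval, ordered as CLASSES.
action, confidence = predict_with_confidence([1.0, 2.4, -2.0])
```

The predicted action is simply the class with the highest confidence, so both outputs come from the same pass through the model.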

According to the experiments that we ran on several traders, the difference between these model families is not significant. Features, context length, and aggregation window have a far more dramatic impact on a model than its type.

Input data

As inputs for both models, we used historical data. In other words, we looked back at changes in prices and a trader’s actions to predict what the trader would do next. We consider four important, interdependent parameters:

· Time limits for the historical data under analysis (context length);

· Essential features to be used for initial profiling;

· Relevant data aggregation;

· Raw tick data to better capture the current market situation.

Context length can drastically affect the accuracy of trader action prediction. To choose it objectively, compare it with the typical time intervals between a trader’s transactions. A context that is too short often causes accuracy to drop. A context that is too long, on the other hand, wastes computational resources and makes learning harder for the model. The traders we sampled in the initial exploratory data analysis traded about once every 1–2 hours, so we used 100 min as the standard context length for model training.
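The following sketch shows one way to turn a context length into training samples. The helper names, the toy prices, and the choice of the median as the "typical" interval are our assumptions for illustration; only the 100-min context comes from the project.

```python
from statistics import median

def typical_gap_minutes(trade_times_min):
    """Median time between consecutive transactions, in minutes."""
    gaps = [b - a for a, b in zip(trade_times_min, trade_times_min[1:])]
    return median(gaps)

def context_bars(context_minutes, bar_minutes):
    """Number of aggregated bars that fit into the context window."""
    return context_minutes // bar_minutes

def make_windows(prices, actions, context_len):
    """Pair each window of `context_len` prices with the next interval's action."""
    samples = []
    for end in range(context_len, len(prices)):
        window = prices[end - context_len:end]
        samples.append((window, actions[end]))
    return samples

# Made-up transaction times (minutes since start) and bar data.
gap = typical_gap_minutes([0, 70, 160, 250, 370])
bars = context_bars(100, 5)  # a 100-min context with 5-min bars

prices = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
actions = ["Idle", "Idle", "Idle", "Buy", "Idle", "Sell"]
samples = make_windows(prices, actions, context_len=3)
```

Each sample contains only prices that precede the interval being predicted, so the label never leaks into its own context.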

Open price with context and prediction time intervals visualized

The features that we employed at the training stage were based on the earlier-stage analysis. We collected data on the price, profits and position, volume for the aggregated interval, technical indicators, and the likelihood of price patterns. In principle, training on the price data alone can yield strong enough models, but pairing it with extra data like indicators and profits often adds a few percent to the accuracy. To extract the most relevant features for each trader, we developed a feature selection process based on the correlation with changes in the trader’s position. In this way, we can decrease the number of features we use, speed up the algorithm, and reduce the memory it requires.
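A correlation-based selection step can be sketched as follows. We assume Pearson correlation, a fixed top-k cut-off, and that each feature value is compared with the position change that follows it; the article does not specify these details, and the feature names and numbers are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def select_features(features, position, top_k):
    """Rank features by |correlation| with the next change in position."""
    changes = [b - a for a, b in zip(position, position[1:])]
    scores = {
        # values[:-1]: the feature as known before each position change
        name: abs(pearson(values[:-1], changes))
        for name, values in features.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores

# Toy data: position per interval and three hypothetical features.
position = [0, 0, 1, 1, 0, 0, 1]
features = {
    "momentum": [0.0, 0.9, 0.1, -0.8, 0.0, 1.0, 0.2],
    "volume":   [5, 1, 4, 2, 6, 3, 5],
    "constant": [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
}
selected, scores = select_features(features, position, top_k=2)
```

Dropping the lowest-ranked features shrinks the input vector, which is where the speed and memory savings mentioned above come from.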

Aggregation helps cut down the input size along the time axis. ML-driven models often have a hard time extracting the sparse meaningful signal from the noise that large-scale data inevitably produces. The best way to address this issue is to aggregate tick data by time intervals. We experimented with 1-hour and 30-, 15-, 5-, and 1-min aggregation periods. As with context length, the optimal aggregation period correlates with the trading frequency.
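As an illustration, tick aggregation can be sketched as grouping ticks into fixed-length bars with open, high, low, close, and volume. The tuple layout of a tick and the field names are our assumptions, and the ticks are made up.

```python
def aggregate_ticks(ticks, interval_s):
    """Aggregate (timestamp_s, price, volume) ticks, sorted by time, into bars."""
    bars = {}
    for ts, price, volume in ticks:
        start = ts - ts % interval_s  # start of the bar this tick falls into
        bar = bars.get(start)
        if bar is None:
            bars[start] = {"start": start, "open": price, "high": price,
                           "low": price, "close": price, "volume": volume}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # the last tick seen so far closes the bar
            bar["volume"] += volume
    return [bars[start] for start in sorted(bars)]

# Made-up ticks: (seconds since start, price, volume).
ticks = [(0, 100.0, 1), (20, 101.0, 2), (59, 99.5, 1),
         (60, 100.5, 3), (130, 102.0, 1)]
minute_bars = aggregate_ticks(ticks, interval_s=60)
```

Changing `interval_s` to 300, 900, 1800, or 3600 gives the 5-, 15-, 30-min, and 1-hour periods mentioned above.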

Raw ticks also proved useful for a share of traders. The first model used aggregated data, while the second model combined aggregated data, to get a larger context, with raw tick data, to capture the current situation. In some cases, raw ticks improved accuracy by up to 2–5%. In other cases, however, they had no effect or even decreased the resulting accuracy. It seems that raw ticks matter more for traders who transact frequently, while for traders who average one transaction per hour, all the necessary information may already be available in the open prices of the time intervals.
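A mixed input of this kind could look like the sketch below: a long context of aggregated open prices followed by the most recent raw ticks. Expressing every price relative to the latest tick is our own normalization choice, and the function name and numbers are hypothetical.

```python
def build_input(bar_opens, tick_prices, context_bars, recent_ticks):
    """Concatenate an aggregated context with the latest raw ticks."""
    if len(bar_opens) < context_bars or len(tick_prices) < recent_ticks:
        raise ValueError("not enough history for the requested input size")
    context = bar_opens[-context_bars:]   # larger, coarser context
    recent = tick_prices[-recent_ticks:]  # fine-grained current situation
    reference = recent[-1]                # latest known price
    # Relative differences keep both parts on the same scale.
    return [p / reference - 1.0 for p in context + recent]

# Made-up data: five aggregated opens and four raw tick prices.
bar_opens = [100.0, 100.4, 100.1, 99.8, 100.2]
tick_prices = [100.2, 100.3, 100.25, 100.3]
vector = build_input(bar_opens, tick_prices, context_bars=3, recent_ticks=2)
```

For traders with infrequent transactions, setting `recent_ticks` to zero would reduce this to the aggregated-only input.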

Value delivered

Our team created a consistent and reliable AI tool for trade analysis and prediction.

The project’s takeaways reveal the competitive advantages of using two ML-driven models for building bot traders:

· The LSTM-based family is more flexible for further extensions and works more naturally by treating input data as a sequence;

· The XGBoost model processes data faster. Most of our experiments used the XGBoost model to save computational time, and we implemented raw tick support only for it.

As a result, the customer enjoys an actionable AI-fueled solution that can perform complex large-scale data processing tasks and come up with high-accuracy trade forecasts. On the other hand, we as developers can take away the following key findings from the project:

· A robust price model can be created by applying the model multiple times to data with the same historical context but a different current price.

· The price model is trained to track small price changes, with huge price gaps being treated as false or irrelevant.

· We treat the current price fluctuation as merely one of the factors that affect a trader’s actions, and one that can overlap with previous price fluctuations.

· The rationale behind the choice of Sell, Buy, and Idle trade modes shows that the order of importance of the above-mentioned factors varies greatly from pattern to pattern, although some similarities do exist.
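The first two findings can be sketched as a price sweep: the same historical context is passed to a model several times, each time with a different candidate current price, and large price gaps are skipped. The `toy_model` below is a hypothetical stand-in for a trained classifier, and the `max_change` threshold is our assumption; neither comes from the project.

```python
def toy_model(context, price):
    """Stand-in classifier: trades more often the further price is from the mean."""
    mean = sum(context) / len(context)
    gap = (mean - price) / mean
    buy = min(max(gap * 10, 0.0), 0.9)    # price below the mean favors Buy
    sell = min(max(-gap * 10, 0.0), 0.9)  # price above the mean favors Sell
    return {"Idle": 1.0 - buy - sell, "Buy": buy, "Sell": sell}

def price_sweep(model, context, current_price, rel_changes, max_change=0.05):
    """Apply `model` to one context with several candidate current prices."""
    results = []
    for change in rel_changes:
        if abs(change) > max_change:
            continue  # huge price gaps are treated as irrelevant
        candidate = current_price * (1.0 + change)
        probs = model(context, candidate)
        results.append((change, 1.0 - probs["Idle"]))  # likelihood of any trade
    return results

context = [100.0, 101.0, 99.0, 100.0]  # made-up historical prices
sweep = price_sweep(toy_model, context, current_price=100.0,
                    rel_changes=[-0.5, -0.02, -0.01, 0.0, 0.01, 0.02])
```

The result is a list of (price change, transaction likelihood) pairs, which is the shape of the third model output described earlier.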

Sciforce

Ukraine-based IT company specializing in the development of software solutions based on science-driven information technologies #AI #ML #IoT #NLP #Healthcare #DevOps