6 rules for using machine learning in your trading strategies (part 1)

Feb 12, 2023

Machine learning (ML) algorithms have become prevalent in our daily lives. It is only natural to consider their potential applications in the financial world, specifically in the realm of algorithmic trading.

However, utilising ML in trading is not as straightforward as it may seem. Developing a successful trading strategy with them requires a deep understanding of the algorithms, as well as a robust philosophy for their implementation.

In this article, we share some of the key rules, or principles, sometimes discovered painfully, that help us efficiently develop successful ML-based trading strategies. Note that this is part 1; once you are done, you can move on to part 2.

Below is the summary:

Rule 1: Clearly define your problem

Rule 2: You don’t have to use ML

Rule 3: Think outside the box

Rule 4: Use high quality data

Rule 5: Don’t use models you don’t understand

Rule 6: Avoid data snooping

Rule 1: Clearly define your problem

“Give me six hours to chop down a tree and I will spend the first four hours sharpening the axe” — Abraham Lincoln (16th president of the United States)

The problem definition phase is crucial before applying ML, as it is the foundation upon which the entire project is built. By investing time in defining the problem clearly, we can ensure that we are approaching it with the right perspective and understanding, leading to more efficient problem solving. This phase allows for the identification of the right data and tools to solve the problem, which is crucial for a successful outcome.

During this initial step, you can rely on first-principles thinking: breaking a complex problem down into its most fundamental building blocks, understanding each component in isolation, and only then synthesising a solution. The approach rests on the idea that a problem can be reduced to a set of basic truths, or axioms, and it encourages critical thinking and independent problem-solving. By understanding the underlying principles and mechanics, you develop a deeper and more intuitive grasp of the problem at hand, which leads to more effective and innovative solutions.

Thinking in terms of first principles lets us build a solution from the ground up, starting from the core components of the problem, which tends to make that solution more robust.

In practice:

The following steps can be used to develop solutions that are based on a deep understanding of the problem:

Identify the problem: Clearly define the problem you are trying to solve, such as developing a profitable trading strategy.
Gather information: Research the problem and gather data that will help you understand it better. This can include market data, news, and other relevant information.
Break down the problem: Analyze the problem and identify its core components. For example, in algorithmic trading, you might consider factors like market trends, price movements, and the behavior of other traders.
Develop a hypothesis: Based on your analysis, develop a hypothesis about what causes the problem and what might be done to solve it.
Test the hypothesis: Use your hypothesis to design and implement a solution. Test the solution to see if it works and if not, iterate until you find a solution that works.
Evaluate the results: Analyze the results of your solution and determine if it is achieving the desired outcome. If it is not, revise your hypothesis and start the process again.
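To make the "develop a hypothesis" and "test the hypothesis" steps concrete, here is a minimal Python sketch. The hypothesis is purely illustrative (next-day returns are higher after an up day than after a down day), the function names are hypothetical, and the test statistic is a plain Welch t-statistic, not a method the article itself prescribes.

```python
import math
import statistics


def simple_returns(prices):
    """Percentage change between consecutive prices."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def check_momentum_hypothesis(prices):
    """Illustrative hypothesis: next-day returns are higher after up days than after down days.

    Returns (mean after up days, mean after down days, Welch t-statistic of the difference).
    """
    rets = simple_returns(prices)
    # Pair each day's return with the following day's return.
    after_up = [nxt for cur, nxt in zip(rets, rets[1:]) if cur > 0]
    after_down = [nxt for cur, nxt in zip(rets, rets[1:]) if cur < 0]
    if len(after_up) < 2 or len(after_down) < 2:
        raise ValueError("need at least two up days and two down days")
    mean_up = statistics.mean(after_up)
    mean_down = statistics.mean(after_down)
    # Standard error of the difference in means, without assuming equal variances.
    se = math.sqrt(statistics.variance(after_up) / len(after_up)
                   + statistics.variance(after_down) / len(after_down))
    t_stat = (mean_up - mean_down) / se if se > 0 else float("nan")
    return mean_up, mean_down, t_stat


# Usage with a toy price path (not real market data).
prices = [100.0]
for r in [0.01, 0.02, -0.01, -0.02, 0.03, -0.01, 0.01, -0.03]:
    prices.append(prices[-1] * (1 + r))
mean_up, mean_down, t_stat = check_momentum_hypothesis(prices)
```

On real data you would need far more observations, and any apparent effect still has to be confirmed on data that played no part in forming the hypothesis (see Rule 6).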

Rule 2: You don’t have to use ML

“Fall in love with the problem, not the solution” — Uri Levine (co-founder of Waze)

When using any tool, whether it be a hammer, a computer program, or machine learning, it is critical to remember that the tool is simply a means to an end. The goal is to solve the problem, not to use the tool for its own sake. You should focus on the problem you’re trying to solve and not get caught up in the tools themselves, as they are only one part of the process.

ML is just one of the many tools available for algorithmic trading, and its use is not always necessary. There are a number of other methods that traders can use to make decisions, such as rule-based systems, technical analysis, and quantitative analysis. The choice of method depends on a number of factors, including the trader’s experience, the market being traded, and the trader’s risk tolerance.

Additionally, ML can be complex and time-consuming to implement, and requires a large amount of data to be effective. It also requires expertise in both ML and financial markets, and can be prone to overfitting and other biases if not implemented correctly. Besides, the interpretability and transparency of these models can sometimes be an issue, particularly in a regulated environment like finance. In such cases, it may be better to use a simpler, more transparent approach that allows stakeholders to understand how decisions are being made.

In some cases, simpler methods may be more effective, or a combination of different methods may be more appropriate. The key is to choose the right tool for the job, and to understand the strengths and limitations of each approach.

In practice:

Here are a few signs that ML might not be the right tool for a particular trading strategy:

The problem is not well-defined (go back to Rule 1): If the problem is not well-defined, it is difficult to use ML to solve it. If the goal is unclear or the problem statement is too vague, ML may not be the best option.
The data quality is poor (see Rule 4): ML algorithms require large amounts of quality data to train effectively. If the data is not reliable, complete, or representative of the problem, the results of the algorithm are likely to be unreliable.
The problem is too simple: If the problem can be solved with simple rules or linear models, ML may not add much value. It is paramount to determine if the added complexity is worth the potential benefits.
The time horizon is too short: Machine learning models typically require a large amount of data to train and validate. If the available history is too short, there may not be enough data to build a robust model.

In these cases, it may be more appropriate to use a simpler tool or a more traditional algorithmic trading approach.
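Before reaching for ML, it is worth measuring what a simple rule achieves. A minimal sketch of such a baseline, assuming a moving-average crossover (an illustrative choice, not a rule the article prescribes) with hypothetical window lengths. Each signal is computed from prices up to day i and applied only to the return from day i to day i + 1, so the baseline does not peek at the future.

```python
def moving_average(values, window):
    """Trailing simple moving average; None until enough data is available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def crossover_signals(prices, fast=3, slow=5):
    """Rule-based baseline: long (1) when the fast MA is above the slow MA, flat (0) otherwise."""
    if fast >= slow:
        raise ValueError("fast window must be shorter than slow window")
    fast_ma = moving_average(prices, fast)
    slow_ma = moving_average(prices, slow)
    return [
        0 if f is None or s is None else int(f > s)
        for f, s in zip(fast_ma, slow_ma)
    ]


def strategy_returns(prices, signals):
    """Apply the signal known at the close of day i to the return from day i to day i + 1."""
    return [sig * (p1 / p0 - 1.0)
            for sig, p0, p1 in zip(signals, prices, prices[1:])]


# Usage on a steadily rising toy series.
prices = [float(p) for p in range(1, 11)]
signals = crossover_signals(prices, fast=3, slow=5)
returns = strategy_returns(prices, signals)
```

If an ML model cannot beat a baseline like this out of sample, and after trading costs, the added complexity is hard to justify.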

Rule 3: Think outside the box

“Think outside the box, collapse the box, and take a f**king sharp knife to it.” — Banksy (famous street artist)

As far as ML for algorithmic trading is concerned, the majority of use cases we come across (typically here on Medium) are centered around price prediction using market data.

Successfully predicting price movements with ML from market data alone is very hard, both because financial data often violates the assumptions behind many models and because of several practical pitfalls:

Stationarity assumption: Many ML models assume that the underlying data-generating process is stationary, which is often not the case in financial markets. In practice, the statistical properties of the data keep changing over time, so a model may rely on relationships that no longer hold, which makes accurate prediction of future price movements difficult.
Linearity assumption: Some models assume a linear relationship between the inputs and outputs, while financial data can often exhibit non-linear patterns.
Independence assumption: Some models assume that the observations are independent, while financial data can often exhibit strong dependencies and autocorrelations.
Homoscedasticity assumption: Some models assume that the errors have constant variance, while financial data can often exhibit heteroscedastic errors.
Overfitting: It is easy to overfit a model to the training data, leading to poor performance on new, unseen data.
Data quality: Financial data can be noisy and may contain errors or missing values, which can impact the model’s performance.
Changes in market dynamics: Market dynamics can change rapidly, making it challenging to develop models that remain relevant over time. The data is often subject to sudden and large changes in volatility, such as during market crashes or hype cycles. These events can greatly affect the relationships between different variables and can cause the statistical properties of the data to change rapidly.
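Some of these violations can be checked cheaply before any modelling. A minimal sketch in plain Python, assuming a list of closing prices: it converts prices to log returns (a common first step toward stationarity), measures the lag-1 autocorrelation of returns (dependence) and of squared returns (a rough sign of volatility clustering, i.e. heteroscedasticity), and compares volatility in the first and second halves of the sample. The function names and the half-split are illustrative assumptions, not formal statistical tests.

```python
import math
import statistics


def log_returns(prices):
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def autocorrelation(x, lag=1):
    """Sample autocorrelation at the given lag (0.0 for a constant series)."""
    mean = statistics.fmean(x)
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dev, dev[lag:])) / denom


def diagnose(prices):
    """Rough checks for dependence and changing volatility in a price series."""
    rets = log_returns(prices)
    half = len(rets) // 2
    return {
        "ret_autocorr": autocorrelation(rets),
        "sq_ret_autocorr": autocorrelation([r * r for r in rets]),
        "vol_first_half": statistics.pstdev(rets[:half]),
        "vol_second_half": statistics.pstdev(rets[half:]),
    }


# Toy series: calm first half, much more volatile second half.
report = diagnose([100, 101, 100, 101, 100, 110, 100, 110, 100])
```

A large gap between the two volatility figures is a warning that a model fitted on one regime may not carry over to the next.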

Instead of spending a tremendous amount of time transforming your market data with fancy mathematical and statistical tools so that it has the properties these models expect, why not simply use machine learning in more indirect but “natural” ways?

In practice:

Here are some examples of how to creatively use ML in your algorithmic trading projects:

Using it to analyze blockchain data, such as transaction history and network activity, to identify patterns that can then be turned into factors to predict price movements.
Incorporating sentiment analysis of social media and news articles to gauge market sentiment and turn the outcomes into factors to predict price movements.
Using natural language processing to extract information from forums, chat groups, and other online communities to identify emerging trends and potential market opportunities.
Using generative models to produce synthetic financial time series, both for training models and for simulating market conditions more realistically.
Applying reinforcement learning to train trading algorithms that can adapt to changing market conditions and execute trades in a dynamic and efficient way.
Using evolutionary algorithms to evolve trading strategies in a way that can adapt to the volatility and uncertainty of the crypto market.
Combining all of these techniques to create a multi-modal AI trading system that can analyze multiple data sources and make predictions based on a variety of inputs.
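Whatever model produces the sentiment scores, the step that turns them into a factor has to respect time. A minimal sketch, assuming you already have one sentiment score per news item or post; the `(day, score)` input format, the daily averaging, and the helper names are hypothetical. It pairs each day's factor with the *next* available day's return, so the factor only uses information published before the return it is meant to predict, then computes the Pearson correlation between the two as a simple first check for signal.

```python
import math
from collections import defaultdict


def daily_factor(scored_items):
    """Average the sentiment scores of all items published on the same day."""
    by_day = defaultdict(list)
    for day, score in scored_items:
        by_day[day].append(score)
    return {day: sum(s) / len(s) for day, s in by_day.items()}


def align_with_next_day(factor, returns_by_day):
    """Pair the factor on day t with the return on the next trading day (no look-ahead)."""
    days = sorted(returns_by_day)
    pairs = []
    for today, tomorrow in zip(days, days[1:]):
        if today in factor:
            pairs.append((factor[today], returns_by_day[tomorrow]))
    return pairs


def pearson(pairs):
    """Pearson correlation of (x, y) pairs."""
    xs, ys = zip(*pairs)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy inputs: days as integers, scores in [-1, 1], daily returns as fractions.
factor = daily_factor([(1, 0.5), (1, 0.1), (2, -0.4), (3, 0.2)])
pairs = align_with_next_day(factor, {1: 0.0, 2: 0.01, 3: -0.02, 4: 0.015})
```

With so few points the correlation means nothing; the point of the sketch is the alignment, which is where look-ahead bias most often creeps in.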

Rule 4: Use high quality data

“Garbage in, garbage out” — a popular expression in computer science

The accuracy of predictions made by ML models is heavily dependent on the quality of the input data.

Poor quality data can result in biased or incorrect predictions, which can negatively impact the overall performance of the strategy.

Good quality data, on the other hand, provides the model with the information it needs to make accurate predictions, helping to improve the performance of the strategy.

To ensure high quality data, it is important to carefully curate and pre-process the data, checking for missing or incorrect values, and ensuring that the data is representative of the market conditions it is meant to model.

In practice:

Here are some practical tips to use high-quality data:

Data sourcing: Obtain data from reliable sources, such as reputable exchanges (e.g. Binance in the context of cryptocurrencies).
Data cleaning: Clean and pre-process the data to correct any errors or outliers. Remove any irrelevant data that may affect the results.
Data verification: Verify the data by comparing it to other sources and checking for consistency.
Data augmentation: Use additional sources of data to augment the main data set. For instance, including news and social media data can be useful in the context of algorithmic trading.
Data monitoring: Continuously monitor the data for any changes or updates, and adjust the algorithmic trading model as necessary.
Data transparency: Make the data transparent and accessible to others, including relevant stakeholders, to ensure that the results are trustworthy and replicable.
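Parts of the cleaning and verification tips can be automated. A minimal sketch, assuming daily bars stored as dictionaries with hypothetical keys `date`, `open`, `high`, `low` and `close`, plus closing prices from a second source keyed by date; the checks are illustrative rather than exhaustive, and the tolerance is an arbitrary placeholder.

```python
def validate_bars(bars):
    """Return (index, message) pairs for bars that fail basic sanity checks."""
    problems = []
    seen = set()
    for i, bar in enumerate(bars):
        prices = [bar.get(k) for k in ("open", "high", "low", "close")]
        if any(p is None for p in prices):
            problems.append((i, "missing value"))
            continue
        if any(p <= 0 for p in prices):
            problems.append((i, "non-positive price"))
        if bar["high"] < max(bar["open"], bar["close"], bar["low"]):
            problems.append((i, "high below open/close/low"))
        if bar["low"] > min(bar["open"], bar["close"]):
            problems.append((i, "low above open/close"))
        if bar["date"] in seen:
            problems.append((i, "duplicate date"))
        seen.add(bar["date"])
    return problems


def cross_check(bars, other_closes, tol=0.005):
    """Dates whose close is missing from, or differs by more than `tol` (relative) from, a second source."""
    mismatches = []
    for bar in bars:
        ref = other_closes.get(bar["date"])
        if ref is None or abs(bar["close"] - ref) / ref > tol:
            mismatches.append(bar["date"])
    return mismatches


bars = [
    {"date": "2023-01-01", "open": 10, "high": 11, "low": 9, "close": 10.5},
    {"date": "2023-01-02", "open": 10, "high": 9.5, "low": 9, "close": 10},
    {"date": "2023-01-02", "open": 10, "high": 11, "low": 9, "close": None},
]
problems = validate_bars(bars)
mismatches = cross_check(bars[:2], {"2023-01-01": 10.5, "2023-01-02": 10.2})
```

Sensible tolerances depend on the asset and on how each source constructs its bars (time zone, session boundaries, which trades are included).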

Rule 5: Don’t use models you don’t understand

“A tool not understood is a tool not worth having.” — unknown

Using a model that you don’t understand can be dangerous for several reasons.

First, you won’t be able to identify any potential flaws or biases in the model that could negatively impact your trades.

Second, you won’t be able to make informed decisions about when to use the model and when to rely on other methods or intuition.

Third, you won’t be able to interpret the results of the model, which means you won’t be able to understand why a trade was executed or why a certain outcome occurred.

Again, the consequences of all these points are poor results and financial losses.

Rule 6: Avoid data snooping

“If you torture the data long enough, it will confess to anything.” — Ronald Coase (Nobel prize laureate in economics)

Data snooping is a statistical bias that occurs when the same data is used both to make modelling choices (such as selecting variables) and to fit and evaluate the model. This can lead to overfitting and artificially high performance estimates, because the model has been fine-tuned to the data at hand and may not generalise well to new, unseen data.

Data snooping can occur at many stages of the analysis process, such as feature selection, model selection, and hyperparameter tuning. For example, one may use a statistical test to select the best predictors for a model, but the resulting performance estimate will be overly optimistic if the same data is then used to fit and evaluate the model.

In your early days with ML, you have probably made the typical rookie mistake of training a model on the training set and then evaluating it on the same data. That is pretty much the most basic form of data snooping. However, data snooping can also occur in more subtle ways.

Consider the following example. We have a dataset with missing values for one of the features, and we decide to replace them with the median of the available values. Doing this before splitting the data into training and testing sets leads to data snooping, because the median operation carries information from the testing set into the training set. Instead, it is critical to first split the data into two independent datasets, compute the median on the training set only, and then use that value to fill the missing values in both sets.
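A minimal sketch of the leak-free ordering in plain Python. It splits chronologically (no shuffling, since the data is a time series), learns the median from the training portion only, and reuses that value to fill gaps in the test portion, which mirrors the fit-on-train, apply-to-test convention of common libraries such as scikit-learn's `SimpleImputer`. The helper names and the toy feature values are hypothetical.

```python
import statistics


def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows without shuffling: earlier rows train, later rows test."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_median(values):
    """Learn the imputation value from the observed (non-missing) values."""
    return statistics.median(v for v in values if v is not None)


def impute(values, fill_value):
    """Replace missing values (None) with a previously learned fill value."""
    return [fill_value if v is None else v for v in values]


feature = [1.0, None, 3.0, 2.0, None, 100.0, None, 200.0]

# Leaky: the median "sees" the test values 100.0 and 200.0.
leaky_median = fit_median(feature)

# Leak-free: split first, fit on the training part only, reuse it for the test part.
train, test = chronological_split(feature, train_fraction=0.5)
train_median = fit_median(train)
train_filled = impute(train, train_median)
test_filled = impute(test, train_median)
```

Here the leaky median is 3.0 while the training-only median is 2.0: a small difference, but exactly the kind of test-set information that quietly inflates backtest results.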

In practice:

To avoid data snooping in algorithmic trading with machine learning, here are a few practical tips:

Split the data into training and testing sets before preprocessing: It is important to split the data into two sets, training and testing, before performing any preprocessing steps. This prevents the model from using information from the testing data to make predictions, which can lead to overfitting and unrealistic performance metrics.
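Use cross-validation: Cross-validation splits the data into multiple folds and uses each fold in turn to test a model trained on the others. This helps to reduce the risk of overfitting and provides a more robust estimate of the model’s performance. With time-series data, use a time-ordered scheme (for example, walk-forward validation) so that the model is never trained on data from after the period it is tested on.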
Be transparent and document all steps: Keep a detailed record of all preprocessing and model selection steps, including any parameter tuning or feature selection methods. This helps to ensure that the results are transparent and can be reproduced.
Use multiple performance metrics: Evaluate the model with several metrics, such as precision, recall, and F1 score, rather than relying on a single one, which may be misleading.
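A minimal sketch of two of these tips in plain Python. For time-series data, ordinary shuffled k-fold cross-validation would let the model train on observations from after the test period, so the splitter below uses an expanding-window, walk-forward scheme in which every test fold lies strictly after its training data (scikit-learn's `TimeSeriesSplit` follows the same idea). The metric helper computes precision, recall and F1 for a binary up/down label. Function names, fold sizes and the labelling are assumptions for illustration.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices); each test fold comes strictly after its training data."""
    test_size = (n_samples - min_train) // n_folds
    if test_size <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size  # the training window expands each fold
        yield range(0, train_end), range(train_end, train_end + test_size)


def precision_recall_f1(y_true, y_pred):
    """Binary metrics where 1 means 'price goes up' (a hypothetical labelling)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


folds = [(list(tr), list(te)) for tr, te in walk_forward_splits(10, n_folds=3, min_train=4)]
scores = precision_recall_f1([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])
```

Any preprocessing, feature selection or tuning should happen inside each training window, never on the full dataset, for the same reason as the median example above.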

Conclusion

In conclusion, we have shared some of our key principles for incorporating machine learning algorithms into our trading strategies. These principles are a result of our experience and ongoing experimentation, and we believe that they can help other traders achieve success with ML as well. We hope that this article has been informative and helpful. If you are thirsty for more principles, the second part of the article is already available. Happy trading!

A quick note about us

At AlphaGrow, we are dedicated to helping you grow your portfolio and boost your trading revenues with an in-house, fully automated trading system hosted on robust cloud infrastructure. Machine learning is only one of the tools at our disposal to achieve that mission. We also rely on other statistical methods, as well as on mathematics and computer science techniques.
Our team of passionate quantitative analysts is constantly working on new strategies. If you are interested in learning more about our strategies and if you want to exchange ideas, feel free to contact us (see below) 🙂 🚀

How to contact us: contact@alphagrow.io

Our website: https://alphagrow.io

Written by AlphaGrow