6 more rules for using machine learning in your trading strategies (part 2)

AlphaGrow
15 min read · Mar 4, 2023



Welcome back! In a previous article, we presented six essential rules for using machine learning in trading strategies. We highlighted the importance of problem definition, questioning your need for ML, creativity, data quality, understanding the model and avoiding data snooping (if you don’t know what it means, go read that article). Here, we are excited to share with you six more rules that are just as critical.

The summary:

Rule 7: Choose transparency over sophistication

Rule 8: Focus on the right metrics

Rule 9: Never look ahead

Rule 10: Beware of the survivorship bias

Rule 11: Don’t forget to consider latency

Rule 12: Keep questioning your successful strategy

Rule 7: Choose transparency over sophistication

“Everything should be made as simple as possible, but not simpler.” — Albert Einstein

Let’s say that you have applied rule 1 (clearly define your problem) and rule 3 (think outside the box) to come up with a sound problem statement. Then after reflecting on rule 2 (you don’t have to use ML), you decide that ML would naturally fit the problem. You make sure that rule 4 is also checked (use high quality data). Then comes rule 5 (don’t use models you don’t understand). If you think to yourself “Hmm, this one is easy. This is my area of expertise. I have studied all existing ML models thoroughly. I can just choose the one that yields the best performance”, then think again.

Prioritising performance can lead to the selection of complex models at the cost of interpretability and explainability. It is actually often preferable to opt for simplicity and transparency over sophistication and opacity when choosing a machine learning model for algorithmic trading.

First and foremost, simpler models are easier to understand, interpret and troubleshoot. The underlying mechanics of simpler models are more intuitive, which allows faster identification of potential issues and therefore a faster pace of strategy improvement. When a model is too complex, it can be challenging to identify the source of errors.

Secondly, simple models tend to be more robust and less prone to overfitting. Overfitting occurs when a model is too complex (which usually means it has too many parameters) and fits the training data too closely. As a result, the model performs poorly on new data, which in trading translates directly into losses.

Thirdly, simpler models are more efficient, both in terms of time and computational resources. While complex models may offer marginal improvements in performance, they often come at a high cost. They require more data to train, more computational resources to run, and more time to tune. These factors can slow down the process of building and deploying the model, ultimately impacting your ability to react quickly to market events.

Finally, simple models can be more interpretable and explainable. This is crucial when you need to justify your decisions and strategies to stakeholders.

In practice:

Here are some machine learning models that are considered to be easy to interpret and explain:

  1. Linear regression: This model is simple and easy to understand because it involves a linear relationship between the input variables and output variables. The coefficients of the model can be easily interpreted to understand the effect of each input variable on the output.
  2. Decision trees: This model is intuitive as it can be represented as a flowchart. It is easy to understand how the model makes a decision by following the path through the decision tree based on the input features.
  3. Random forests: This model is an extension of decision trees that combines multiple decision trees to make a prediction. An ensemble of hundreds of trees is less transparent than a single tree, but aggregate feature-importance scores still make it reasonably easy to see which inputs drive its predictions.
  4. Naive Bayes: This model is based on Bayes’ theorem and makes predictions from the conditional probabilities of the input features. In its simplest form, it amounts to counting how often each input feature occurs in each class and using those counts to make a prediction.
  5. Logistic regression: This model is similar to linear regression but is used for classification tasks. The coefficients of the model can be interpreted to understand the impact of each input feature on the probability of the predicted class.
  6. K-Nearest Neighbours: This model is based on the distance between the input features and other data points in the training set. It is easy to interpret as the model makes predictions based on the closest neighbours in the training set.
  7. Support Vector Machines: This model works by finding the optimal boundary that separates the different classes. With a linear kernel and few features, the boundary can be visualised, and its position and orientation can be interpreted to understand how the model makes predictions; with non-linear kernels, interpretability drops quickly.

These models are simple, rely on intuitive principles, and involve a clear relationship between the input features and the output variable. They also have a limited number of parameters.
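As a minimal illustration of the first item on the list, here is how the coefficients of a linear model can be read off directly. The two features and their "true" effects are invented for the example; a real strategy would of course use actual market features:

```python
import numpy as np

# Hypothetical toy data: a target explained by two features. The feature
# names and effect sizes are invented for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))                  # feature matrix
true_coefs = np.array([0.8, -0.3])             # ground-truth effects
y = X @ true_coefs + rng.normal(scale=0.1, size=500)

# Fit ordinary least squares with plain numpy (no intercept needed here,
# since the features are centred around zero).
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted coefficients are directly readable: a one-unit increase in
# feature 0 moves the prediction by roughly 0.8, feature 1 by roughly -0.3.
print(coefs)
```

This transparency is exactly what rule 7 is about: each number in `coefs` has a plain-language meaning you can explain to anyone.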

If you want to use a more opaque model, there are also frameworks that can provide intuitive explanations. LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive exPlanations) are two popular frameworks used to explain the predictions of complex and opaque machine learning models.

LIME is a model-agnostic method that explains the output of any classifier or regressor by approximating it with a local interpretable model. LIME generates explanations for individual predictions by perturbing the input data and observing how the prediction changes. It then fits a simpler, interpretable model to the perturbed data in the local neighbourhood of the instance of interest, and extracts feature importance values from the local model to explain the prediction.

SHAP, on the other hand, is a model-agnostic method that uses game theory to explain the output of any classifier or regressor. It aims to provide a unified framework to explain the output of any model based on the contribution of each feature to the prediction. SHAP assigns a feature importance value to each feature for each data point based on the Shapley values from cooperative game theory. Shapley values measure the marginal contribution of each feature to the prediction by considering all possible subsets of features.
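To make the LIME idea concrete, here is a from-scratch toy sketch of the perturb-and-fit procedure described above. The "black box" is a hypothetical nonlinear function standing in for a real model, and all numbers are invented; in practice you would use the actual lime library on a trained model:

```python
import numpy as np

# Stand-in for an opaque model we want to explain locally.
def black_box(X):
    return np.sin(X[:, 0]) + X[:, 1] ** 2

rng = np.random.default_rng(0)
x0 = np.array([0.5, 1.0])                      # instance to explain

# 1. Perturb the instance with small Gaussian noise.
Z = x0 + rng.normal(scale=0.1, size=(200, 2))
yz = black_box(Z)

# 2. Weight samples by proximity to x0 (closer perturbations matter more).
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.02)

# 3. Fit a weighted least-squares linear surrogate locally.
A = np.column_stack([np.ones(len(Z)), Z])      # intercept + features
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(A * sw[:, None], yz * sw, rcond=None)

# beta[1:] approximates the local feature effects: d/dx of sin(x) at 0.5
# is cos(0.5) ~ 0.88, and d/dx of x^2 at 1.0 is 2.0.
print(beta[1:])
```

The fitted slopes are the "feature importance values" LIME reports: a locally faithful, human-readable summary of an opaque model's behaviour around one prediction.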

Rule 8: Focus on the right metrics

“The goal is not to have a good model. The goal is to solve a problem.” — Michael I. Jordan (the researcher, not the basketball player)

The Sharpe ratio is a financial metric that measures the risk-adjusted performance of an investment strategy. It was developed by Nobel laureate William F. Sharpe in 1966 and has since become a widely used tool in finance. The Sharpe ratio takes into account both the return of an investment and the risk involved in achieving that return, with the aim of helping investors to evaluate the attractiveness of different investment opportunities. A higher Sharpe ratio indicates better risk-adjusted returns, while a lower Sharpe ratio indicates lower returns relative to the risk taken. It is a key metric for evaluating the performance of a trading strategy and determining whether it is worth pursuing.

While the Sharpe ratio is a widely used metric for evaluating the performance of investment strategies, it is not the only option available. Other commonly used metrics include the Sortino ratio and the information ratio. In this article we will focus on the Sharpe ratio.

When developing an ML-based trading strategy, you might end up focusing your energy on optimising the ML evaluation metric (e.g. accuracy). However, don’t forget that the ultimate goal of a trading strategy is to make a profit, which is better captured by the Sharpe ratio. Optimising the ML metric is not equivalent to optimising the Sharpe ratio. For example, a model that predicts price direction with high accuracy may generate too few trades to achieve a high Sharpe ratio. Conversely, a model with lower accuracy that generates more trades with better risk management may achieve a higher Sharpe ratio. The ML model is only one component in your entire trading pipeline: other components like the take profit, the stop loss or the maximum holding time will also impact the strategy’s performance. This is why it is crucial to focus on the Sharpe ratio and not get lost in trying to optimise the ML metric. That doesn’t mean the ML metric is useless, but it should be viewed as a means to an end rather than an end in itself. It can, for instance, indicate whether the model has learnt a meaningful pattern or not.
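Since the Sharpe ratio is the metric to watch, here is a minimal implementation. The annualisation factor of 365 is an assumption suited to daily crypto data (traditional markets typically use 252), and the return series is invented:

```python
import math

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=365):
    """Annualised Sharpe ratio of a series of per-period returns.
    365 periods/year assumes daily crypto data (use 252 for equities)."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return (mean / math.sqrt(var)) * math.sqrt(periods_per_year)

# Hypothetical daily strategy returns
daily = [0.002, -0.001, 0.003, 0.001, -0.002, 0.004, 0.000, 0.001]
print(round(sharpe_ratio(daily), 2))   # → 9.55
```

Note that this quantity depends on the whole trade stream, not on any single model prediction, which is precisely why it cannot be optimised through the ML metric alone.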

Rule 9: Never look ahead

“If time travel is possible, where are the tourists from the future?” — Stephen Hawking (theoretical physicist)

Look-ahead bias occurs when information that would not have been available at the time of trading is used to evaluate the performance of a trading strategy. This creates an unrealistic expectation of the strategy’s effectiveness, as it is not based on historical data that would have been available for decision-making at the time of trading.

Therefore, it is critical to strictly adhere to a “no lookahead” policy. This means that data that is not available at the time of trading must not be used. Be certain about the timestamp for each data point.
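A common place where lookahead sneaks in is feature construction. As a minimal sketch of the rule, a moving-average feature for the bar at index t must be computed from bars strictly before t (the price series is invented):

```python
# "No lookahead" in feature engineering: the feature for bar t uses only
# prices[t - window : t], i.e. it excludes the current and future bars.
def past_moving_average(prices, window):
    feats = []
    for t in range(len(prices)):
        if t < window:
            feats.append(None)                 # not enough history yet
        else:
            feats.append(sum(prices[t - window:t]) / window)
    return feats

prices = [100, 101, 102, 103, 104]
print(past_moving_average(prices, 3))
# → [None, None, None, 101.0, 102.0]
```

Including `prices[t]` itself in the window would already be a subtle form of lookahead if that bar's close is not known at decision time.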

Signs of look-ahead bias include unrealistically high returns or excessively low drawdowns when evaluating a trading strategy. Basically be wary when the results look too good to be true.

In practice:

K-fold cross-validation is a technique used to evaluate the performance of a machine learning model in a way that is more reliable than a simple train-test split. It involves partitioning the data into k subsets (or folds) of equal size, and then training the model k times. In each iteration, the model is trained on k-1 folds and tested on the remaining fold. The final evaluation score is the average of the k individual scores.

However, k-fold cross-validation is not well-suited to models used in the context of algorithmic trading (time series data), because it is prone to look-ahead bias.

Indeed, we would end up with data points in the training sets that occur after some of the data points in the test set.

K-fold cross-validation data split with k=4

When dealing with time series data, an alternative is walk-forward validation.

The process is as follows:

  1. Divide the available data into an initial in-sample period and a subsequent out-of-sample period.
  2. Train a model on the in-sample period.
  3. Use the trained model to make predictions for the out-of-sample period.
  4. Evaluate the performance of the model on the out-of-sample period.
  5. Move the “window” forward by one step, so that the in-sample period now includes the previous out-of-sample period, and repeat the process from step 2.
  6. Continue this process until the end of the available data is reached.
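The steps above can be sketched in a few lines. The "model" here is a trivial placeholder that predicts the in-sample mean; a real strategy would plug in its own fit/predict logic, and the window sizes are arbitrary:

```python
def walk_forward(series, train_size, test_size):
    """Sliding-window walk-forward validation over a time series.
    Each split trains strictly on the past and tests on the future."""
    scores = []
    start = 0
    while start + train_size + test_size <= len(series):
        train = series[start:start + train_size]
        test = series[start + train_size:start + train_size + test_size]
        # Placeholder "model": predict the in-sample mean for every test point.
        prediction = sum(train) / len(train)
        mse = sum((x - prediction) ** 2 for x in test) / len(test)
        scores.append(mse)
        start += test_size                     # slide the window forward
    return scores

data = list(range(20))                         # stand-in price series
print(walk_forward(data, train_size=10, test_size=2))
```

Because `test` always lies strictly after `train`, no future information can leak into training, which is the whole point of the procedure.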

The advantage of walk-forward validation is that it prevents look-ahead bias because it only uses past data to train the model and test its performance on future data. In other words, the model is only evaluated on data that it has not seen before, which is what will happen in production.

Walk-forward cross-validation with sliding window

Rule 10: Beware of the survivorship bias

“Success is most often achieved by those who don’t know that failure is inevitable.” — Coco Chanel (French fashion designer)

Survivorship bias is a common error in data analysis: only the subjects or data points that have “survived” or succeeded in a particular process are considered, while those that failed or dropped out are ignored. This can lead to distorted or incomplete conclusions, as the omitted data can often be just as important as the included data. Survivorship bias can occur in a variety of contexts, such as historical research, finance, marketing, and even personal decision-making.

Let’s take an example to illustrate the concept in our context. Say I want to test my trading strategy on the top 10 cryptocurrencies by market capitalisation over the past year. I collect a year of data for these 10 cryptocurrencies and test the strategy on this dataset. However, I may not realise that some of the cryptocurrencies that were in the top 10 at the start of the year dropped out of the top 10 by the end of the year due to poor performance. By excluding these cryptocurrencies from my dataset, I have inadvertently introduced survivorship bias into my analysis: I am only evaluating my strategy on the survivors, the cryptocurrencies that performed well enough to remain in the top 10 over the entire year. This can lead to an overestimation of the effectiveness of the strategy, and to poor performance when it is applied to the entire set of cryptocurrencies. Here, it would have been more judicious to include delisted assets, thus ensuring that the dataset is representative of the entire market rather than just the assets that are currently popular or performing well.
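The fix is to build the backtest universe from point-in-time rankings rather than from today's list. A minimal sketch, with entirely made-up market-cap snapshots (including an asset that later disappears):

```python
# Hypothetical snapshots: date -> {asset: market cap}, including an asset
# (LUNA) that is delisted by the later date. All numbers are invented.
market_caps = {
    "2022-01": {"BTC": 900, "ETH": 450, "LUNA": 60, "SOL": 55},
    "2022-12": {"BTC": 330, "ETH": 150, "SOL": 5},   # LUNA is gone
}

def top_n_at(snapshot, n):
    """Universe as it actually looked on that date (point-in-time)."""
    caps = market_caps[snapshot]
    return sorted(caps, key=caps.get, reverse=True)[:n]

# Backtesting January 2022 must use the January universe, LUNA included:
print(top_n_at("2022-01", 3))   # → ['BTC', 'ETH', 'LUNA']
print(top_n_at("2022-12", 3))   # → ['BTC', 'ETH', 'SOL']
```

Selecting the universe per historical date, rather than once from the present, is what keeps the losers in the sample.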

Rule 11: Don’t forget to consider latency

“Time is money”

Latency refers to the delay between an event occurring and the time it takes for that event to be processed. In our context, latency can have a significant impact on both backtesting and the production environment.

In production, delays caused by latency can lead to missed opportunities, slippage, and reduced profitability. Slippage is a phenomenon where an order to buy or sell an asset is executed at a different price than expected. Besides delays in order execution, it can also be caused by factors such as market volatility or illiquidity.

In order to assess how big of a deal latency issues are in your use-case, you have to consider the trading timeframe you have chosen.

Intraday trading involves holding positions for a few hours or less, with trades being executed multiple times per day. Latency issues can have a significant impact on intraday trading as prices can move quickly, and traders need to enter and exit positions promptly to profit.
High-frequency trading (HFT) is a type of intraday trading that involves executing large numbers of trades in a short amount of time. In HFT, even small amounts of latency can result in missed trading opportunities and lost profits.

Swing trading involves holding positions for a few days to a few weeks, with trades being executed less frequently than in intraday trading. Latency issues can still impact swing trading, but not as significantly as in intraday trading.

Position trading involves holding positions for weeks, months, or even years, with trades being executed infrequently. Latency issues are less likely to impact position trading as there is more time to enter and exit positions.

If you cannot afford your current levels of latency, you might want to consider hardware upgrades, network optimisation, and the use of low-latency software tools.
In Python, tools such as Cython and Numba can help to speed up data processing. For ultra-low latency requirements, shifting to a faster language like C++ may also be worth considering.

We also advise optimising your infrastructure. To achieve this, having a team with strong skills in architecting and using the cloud computing platform of your choice is a must. Having the ability to design and implement the most efficient and cost-effective infrastructure for data storage, processing, and computation is a game changer. At AlphaGrow we use AWS, and the current infrastructure is the result of numerous rounds of refactoring. Besides latency, we also have criteria related to cost, scalability, stability and reliability. We will probably talk more extensively about the building blocks of a decent architecture for algorithmic trading in a future article.

Latency issues lead to discrepancies between production and backtesting. Indeed, the historical data used to train and backtest your ML models may not reflect real-world conditions if latency is not taken into account. Models that appear to perform well in backtesting may not work as well in real trading scenarios. Therefore, modelling the latency can be a rewarding yet challenging task because of the various factors that could be involved (e.g. network latency, processing latency, and data transmission latency). One approach is to incorporate latency as a feature in the model. For example, a model can be trained to predict the probability of a trade being executed based on the current market conditions and the expected latency of the trade. Another approach is to simulate different levels of latency during the training phase of the model. This allows the model to learn how to adjust its trading strategy based on the latency that is experienced in the real-world trading environment.
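The second approach, simulating latency in the backtest, can be sketched very simply: a signal generated at bar t is filled at the price observed a few bars later, not at the signal price. The numbers and the fixed bar delay are illustrative assumptions; real latency would be measured and possibly randomised:

```python
# Simulating execution latency in a backtest: each signal is filled at the
# price `delay` bars after it fires, exposing the slippage this causes.
def simulate_fills(prices, signal_bars, delay):
    """Return (signal_price, fill_price, slippage) per signal, skipping
    signals whose fill would land past the end of the data."""
    fills = []
    for t in signal_bars:
        if t + delay < len(prices):
            fills.append((prices[t], prices[t + delay],
                          round(prices[t + delay] - prices[t], 2)))
    return fills

prices = [100.0, 100.5, 101.2, 100.8, 101.5]
print(simulate_fills(prices, signal_bars=[0, 2], delay=1))
# → [(100.0, 100.5, 0.5), (101.2, 100.8, -0.4)]
```

Comparing backtest results with and without such a delay gives a quick estimate of how latency-sensitive the strategy really is.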

Rule 12: Keep questioning your successful strategy

“Stay hungry, stay young, stay foolish, stay curious, and above all, stay humble because just when you think you got all the answers, is the moment when some bitter twist of fate in the universe will remind you that you very much don’t.” — Tom Hiddleston (English actor)

One of the benefits of algorithmic trading over manual trading is the fact that human biases and errors are minimised. However, the designers and supervisors of these robots are humans. This means that human-related flaws can still manifest themselves in different ways (we have already mentioned some of them) and cause damage.

Overconfidence and complacency are common pitfalls that can arise when a strategy performs well. While a successful strategy can boost confidence and lead to a sense of security, it is paramount to remain vigilant and avoid becoming too comfortable.

Excessive confidence can make you take risks beyond what is appropriate for the situation. This can lead to large losses if the market turns against you, and can even wipe out an entire account. Overconfidence can also lead you to neglect your risk-management practices, such as setting stop-loss orders or sizing positions appropriately.

Complacency, on the other hand, can lead you to miss opportunities to improve your strategies. When things are going well, it can be tempting to stick with what is working and ignore new information or market conditions. The consequence is that you might miss out on potential profits or fail to adapt to changing market conditions.

From a mindset point of view, it is highly advisable to maintain a healthy level of self-doubt and to continually reassess your strategy, even when it is successful, so that it keeps adapting to changing market conditions. Remain humble and don’t forget that past returns do not guarantee future returns. Keep that state of mind when looking at all the components of your strategy. For instance, the models are only as good as the data they are trained on, and that data is not a perfect representation of future market conditions. Similarly, as market conditions change, the optimal universe of assets (the set of assets the strategy trades on) may also change, which requires periodically reviewing it to ensure it is still appropriate.

If developing a strategy that shows stellar performance in in-sample and out-of-sample backtesting runs, confirmed by weeks if not months of live results, is not enough, what else needs to be done?

Well, a constant monitoring of the strategy is the answer. This involves tracking the performance of the strategy over time and comparing it to the benchmarks. One way to monitor the strategy is to keep an eye on its key performance indicators (e.g. returns, Sharpe ratio, maximum drawdown, holding times etc.). They can help to identify whether the strategy is still performing as expected, and if not, it can trigger an investigation into what needs to be done to rectify the situation.
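As one concrete example from the KPI list above, maximum drawdown, the largest peak-to-trough drop of the equity curve, can be tracked in a few lines (the equity values are invented):

```python
# Maximum drawdown: the worst peak-to-trough decline of an equity curve,
# expressed as a fraction of the peak. A sudden jump in this KPI is a
# typical trigger for investigating a live strategy.
def max_drawdown(equity_curve):
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

equity = [100, 110, 105, 120, 90, 95, 130]
print(max_drawdown(equity))   # → 0.25  (the 120 -> 90 drop)
```

Recomputing such KPIs on a rolling window, rather than over the whole history, makes it easier to spot a recent degradation before it compounds.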

Another key component of strategy monitoring is post-trade analysis. It basically involves analysing the trades executed by the strategy to identify any shortcomings or inefficiencies. This can be done by comparing the actual trades executed by the strategy in production to the trades that were expected based on the backtesting engine. By doing so, it is possible to identify unexpected events that occurred in production and that were potentially not captured during the calibration (selection of the optimal strategy parameters using the backtesting engine) phase. These discoveries are often opportunities to upgrade the strategy and make it more robust.

Conclusion

In conclusion, we have presented six more principles that we have learned through years of developing automated trading strategies with machine learning. We hope that our insights and experiences can help guide others on their own journeys towards successful trading strategies. While these principles are specific to machine learning in some respects, we recognise that many of them apply more broadly to algorithmic trading as a whole. It is our hope that these principles, combined with those presented in our previous article, will inspire traders to continue to seek out and develop their own best practices, and to approach their work with humility and a willingness to learn from both their successes and failures. With perseverance and dedication, we believe that the possibilities for success in algorithmic trading are limitless.

A quick note about us

At AlphaGrow, we are dedicated to helping you grow your portfolio while boosting your trading revenues thanks to an in-house, fully automated trading system hosted on a robust cloud infrastructure. Machine learning is only one of the tools at our disposal to achieve that mission. We also rely on other statistical methods, mathematics and computer science techniques.
Our team of passionate quantitative analysts is constantly working on new strategies. If you are interested in learning more about our strategies and if you want to exchange ideas, feel free to contact us (see below) 🙂 🚀

How to contact us: contact@alphagrow.io

Our website: https://alphagrow.io


AlphaGrow is a proprietary algorithmic trading firm specialized in cryptocurrencies. More information available on our website: https://alphagrow.io