Fooled by Machine Learning Applied to Trading Algo Development
The chances of finding a profitable trading strategy through machine learning are extremely low. There is always a small probability of success, but one of the problems is that the results cannot easily be evaluated for significance because of multiple comparisons. Only in hindsight can one determine whether a strategy found by machine learning is profitable, and that can be costly.
- Mistake 1: Data-mining bias is not only due to curve-fitting (see related article)
- Mistake 2: Conventional statistical tests cannot be used with machine learning
- Mistake 3: Not being aware that machine learning effectiveness depends on quality of features
Leda Braga, a well-known fund manager and a trading system developer, made the following statement in an interview:
There’s a creative moment when you think of a hypothesis, maybe it’s that interest rate data drives currency rates. So we think about that first before mining the data. We don’t mine the data to come up with ideas.
Why did Leda Braga say that? Those who use machine learning to develop trading algos should pay close attention. She said it because using data mining to generate algos is essentially a data-fishing process with a low probability of success. Trying different combinations of technical and fundamental indicators involves multiple comparisons. As the number of trials increases, it becomes almost certain that some result will perform well out-of-sample, and even pass all cross-validation tests, purely by chance.
In general, the probability of finding, by mining the data, a trading algo that also performs well out-of-sample, denoted here as P(algo), approaches 1 as the number of tests gets very large:
P(algo) = lim [1 - a(n)/n] as n → ∞
where n is the number of tests made and a(n) is a real-valued function of n. If a(n) is bounded, or grows more slowly than n, the limit is 1. If a(n) also tends to infinity, a(n)/n is an indeterminate form and the limit depends on how fast a(n) grows relative to n.
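The effect of the number of tests can be illustrated with a simpler model than the formula above. This is only a sketch under a strong assumption that is not in the original article: every algo tested is worthless, the tests are independent, and each one passes all validation steps by luck with the same small probability. The function name and the pass rate p_pass are hypothetical. Under that assumption the probability of at least one fluke is 1 - (1 - p_pass)^n.

```python
def prob_at_least_one_fluke(p_pass: float, n_trials: int) -> float:
    """Probability that at least one of n_trials worthless algos passes by luck.

    Assumes independent trials with an identical pass probability p_pass.
    Real data-mined rules are correlated, so this is only an illustration.
    """
    if not 0.0 <= p_pass <= 1.0:
        raise ValueError("p_pass must be between 0 and 1")
    if n_trials < 0:
        raise ValueError("n_trials must be non-negative")
    return 1.0 - (1.0 - p_pass) ** n_trials


if __name__ == "__main__":
    # Hypothetical pass rate: 5% in-sample times 5% out-of-sample.
    p_pass = 0.05 * 0.05
    for n in (1, 100, 1_000, 10_000, 100_000):
        prob = prob_at_least_one_fluke(p_pass, n)
        print(f"{n:>7} trials: P(at least one fluke) = {prob:.4f}")
```

With a pass rate of 0.25% per trial, a thousand trials already give a probability of about 0.92 that some worthless algo survives validation.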
Mistake 1: Data-mining bias is not only due to curve-fitting
Some trading system developers think that data-mining bias arises only from curve-fitting. Although the result of data mining can be a curve-fitted strategy, a result that works by chance out-of-sample need not be curve-fitted in any particular sense. Data-mining bias arises from the practice of reusing data to test many different algos. Selecting an algo that cross-validates out-of-sample while ignoring the many other algos that failed the validation tests increases the probability that the selected algo is a fluke, even if it was not fitted in any particular sense. It could simply be an algo that survived until market conditions changed.
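A small simulation can make this point concrete. It is a sketch, not a description of any real data-mining program: the "algos" here are pure Gaussian noise with zero expected return and no parameters fitted to anything, and the function names, sample sizes and the 1.645 threshold (roughly a one-sided 5% test) are assumptions chosen for illustration.

```python
import math
import random
import statistics


def t_stat(returns):
    """t-statistic of the mean return against a true mean of zero."""
    n = len(returns)
    return statistics.mean(returns) / (statistics.stdev(returns) / math.sqrt(n))


def count_flukes(n_algos=1000, n_obs=100, threshold=1.645, seed=0):
    """Count zero-edge algos that pass in-sample, and both in- and out-of-sample.

    Every algo is pure noise, so any algo that passes is a fluke,
    even though nothing was curve-fitted.
    """
    rng = random.Random(seed)
    passed_in_sample = 0
    passed_both = 0
    for _ in range(n_algos):
        in_sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        out_of_sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if t_stat(in_sample) > threshold:
            passed_in_sample += 1
            if t_stat(out_of_sample) > threshold:
                passed_both += 1
    return passed_in_sample, passed_both


if __name__ == "__main__":
    in_sample_count, both_count = count_flukes()
    print(f"passed in-sample: {in_sample_count}, passed both: {both_count}")
```

Roughly 5% of the noise algos are expected to pass the in-sample test, and about 5% of those are expected to pass out-of-sample as well. Reporting only a survivor hides the hundreds of failed trials behind it.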
Mistake 2: Conventional statistical tests cannot be used with machine learning
Most trading system developers who avoid machine learning do so not because they think it necessarily leads to flukes, but because they know that conventional statistical tests cannot be used when multiple hypotheses are involved. One cannot simply take the Sharpe ratio and multiply it by the square root of the number of years in an out-of-sample test to calculate a t-statistic and test the observed results for significance. Serious adjustments must be made for the fact that the tests are not independent but are the result of multiple comparisons.
For example, a strict test (the Bonferroni correction) involves dividing the required significance level by the number of uncorrelated rules, call that m. If the significance level is set at 5%, the selected algo must be significant at the 5%/m level. Given that some machine learning programs test billions or even trillions of trading rules, the result required for significance is a very unusual one. There are ways of relaxing those tests but the idea is the same: corrections must be made, and that limits the chances of a significant result.
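The arithmetic can be sketched as follows. The naive t-statistic is the annualized Sharpe ratio times the square root of the number of years, and the strict correction, commonly known as the Bonferroni correction, compares the resulting p-value with the significance level divided by m. The function names are hypothetical, and the p-value uses a normal approximation to the t-distribution, which is an assumption made here for simplicity.

```python
import math


def t_stat_from_sharpe(annual_sharpe: float, years: float) -> float:
    """Naive t-statistic: annualized Sharpe ratio times sqrt(years)."""
    return annual_sharpe * math.sqrt(years)


def one_sided_p_value(t: float) -> float:
    """One-sided p-value under a normal approximation (an assumption)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def corrected_level(alpha: float, m: int) -> float:
    """Significance level divided by the number of uncorrelated rules m."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return alpha / m


def is_significant(annual_sharpe: float, years: float, m: int,
                   alpha: float = 0.05) -> bool:
    """True if the naive p-value still clears the corrected level alpha/m."""
    p_value = one_sided_p_value(t_stat_from_sharpe(annual_sharpe, years))
    return p_value < corrected_level(alpha, m)


if __name__ == "__main__":
    sharpe, years = 1.0, 4.0  # hypothetical out-of-sample result
    p_value = one_sided_p_value(t_stat_from_sharpe(sharpe, years))
    print(f"naive p-value: {p_value:.4f}")
    for m in (1, 10, 1_000, 1_000_000):
        verdict = is_significant(sharpe, years, m)
        print(f"m = {m:>9}: level {corrected_level(0.05, m):.2e}, significant: {verdict}")
```

A Sharpe ratio of 1.0 over four years gives a t-statistic of 2.0 and a naive p-value of about 0.023, which looks significant at the 5% level when m = 1 but no longer passes once m reaches 10.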
Mistake 3: Not being aware that machine learning effectiveness depends on quality of features
Machine learning was used extensively in the 1980s for predicting stock returns and exchange rates with dismal results. It was abandoned by quants in favor of the process Leda Braga described above that involves independent hypothesis testing. However, even that process is not perfect as I note in the referenced article.
The major factor that determines the results of machine learning, though, is feature engineering. This is an excerpt from another article:
Feature engineering is the hardest aspect of machine learning and algorithmic trading. If the features (predictors or factors) used do not have economic value, performance is unlikely to be satisfactory. Algorithmic trading and machine learning cannot find gold where there is none. The use of widely known features is unlikely to produce anything of value. Developing an algo and applying machine learning is the easy part of this process despite some common misconceptions. A few operators of platforms where aspiring traders gather to test their programming skills offer known features that are “tortured until they confess to anything”. These approaches will probably fail because of data-mining bias. Note that this bias is cumulative and at some point grows out of control.
The lack of creativity of machine learning
It is easy to get fooled by machine learning when developing trading strategies. Only those who understand the impact of multiple comparisons on statistical significance and know the appropriate tests can use this process effectively. Even then success is not guaranteed, because all tests are conditioned on past data and the future can differ from the past. A more important consideration, however, is that most machine learning algorithms do not discover anything more fundamental than their inputs. If the indicators or features used are ineffective in capturing returns in the first place, it is quite likely that any more complex algo that uses them will also be ineffective. In other words, machine learning lacks creativity, and that limits its effectiveness on top of the problems related to multiple comparisons.
This article was originally published on the Price Action Lab Blog.
If you have any questions or comments, I am happy to connect on Twitter: @mikeharrisNY
Disclaimer: No part of this article constitutes a trade recommendation. The past performance of any trading system or methodology is not necessarily indicative of future results. Read the full disclaimer here.
About the author: Michael Harris is a trader and best-selling author. Seventeen years ago he developed the first commercial software for identifying parameter-less patterns in price action. For the last seven years he has worked on the development of DLPAL, a software program that can be used to identify short-term anomalies in market data for use with fixed and machine learning models.