5 Big Obstacles for Investment Management AI

Dr. Justin Washtell-Blaise, ForecastThis

At the Deep Learning in Finance summit in London earlier this year, one topic floated in conversation was the inexplicably slow adoption of advanced AI technologies within finance.

It’s difficult to judge the accuracy of this premise with respect to investment management without knowing what’s going on in the geekier recesses of the world’s hedge funds and quant shops. But having followed and worked on AI solutions within the space for a few years now, it’s not hard to find some big reasons…


Rightly or wrongly, hype surrounding the application of AI within investment management tends to focus on the identification of profitable trading strategies.

However, framed simply as making optimal use of the available information, AI has the potential to benefit all aspects of the asset management pipeline, from economic forecasting, through alpha generation, to risk management.

Assessing the potential of AI in any one of these settings in isolation will at best give a limited picture, and at worst a misleading one. For example, it is well understood that simple “backward-looking” portfolio management can destroy the alpha generated by even the most intelligent investment strategy.

For this reason, if we’re going to start anywhere, it probably makes most sense to apply AI at the tail end of the chain. Better still, AI offers a unique opportunity to remove these bottlenecks altogether and to create highly informed, highly responsive systems in which alpha generation and risk management are one integrated objective.


Many of us would like to believe that we understand the limitations of back-tests, and the dangers of over-fitting. They are widely recognized and often written about. And yet we (both financial professionals and AI researchers) continue to obsess over such tests, and continue to make unjustifiable decisions on the back of them. While we get the basic principle and are happy to extol it to others, we rarely grasp the extent of its implications.

Some professionals have estimated that about 90% of AI algorithms fail when they hit live tests. Those that make it through live tests often fail at the very next step.

The green plot below shows the back-tested PnL curve of a (recently deceased) AI-driven hedge fund, proudly run by “a team of data scientists who have broken new ground in the discipline of predictive modelling”. The red part of the plot shows what actually happened to their clients’ capital when they began trading it.

“This is a big announcement… Welcome the Third Generation of our Predictor.
We will begin trading it this Tuesday. It is, by far, the sexiest equity curve yet.”

— Quote from partners’ report

What went wrong?

It wasn’t that their trading affected the market (although it certainly did in a small way). Nor was it that world events caused the market to suddenly shift gears and escape the grasp of their model (although that might also have happened). The likely truth is much simpler, and really should serve as a wake-up call…

Consider the chart below, which shows the back-tested Sharpe ratios of 50 completely meaningless, randomly generated long/short strategies trading the S&P 500 over 2016 year-to-date.

The first thing you ought to notice is that nearly a quarter of these random “strategies” have Sharpe ratios greater than 3. By conventional accounts this would be considered excellent, and we weren’t even trying! Looking at the alphas tells a similar story.
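
The experiment behind the chart can be sketched in a few lines. This is not the data or code behind the chart: it assumes random daily long/short positions (coin flips) applied to a synthetic, normally distributed daily return series standing in for the index, and the names `annualized_sharpe` and `random_strategy_sharpes` are hypothetical.

```python
import numpy as np

TRADING_DAYS_PER_YEAR = 252


def annualized_sharpe(daily_returns: np.ndarray) -> float:
    """Annualized Sharpe ratio of a daily return series, assuming a zero risk-free rate."""
    std = daily_returns.std(ddof=1)
    if std == 0:
        return 0.0
    return float(np.sqrt(TRADING_DAYS_PER_YEAR) * daily_returns.mean() / std)


def random_strategy_sharpes(market_returns: np.ndarray, n_strategies: int = 50,
                            seed: int = 0) -> np.ndarray:
    """Back-test meaningless strategies that go long (+1) or short (-1) at random
    each day, and return the annualized Sharpe ratio of each one."""
    rng = np.random.default_rng(seed)
    positions = rng.choice([-1.0, 1.0], size=(n_strategies, len(market_returns)))
    strategy_returns = positions * market_returns  # one row per random strategy
    return np.array([annualized_sharpe(r) for r in strategy_returns])


if __name__ == "__main__":
    # Hypothetical stand-in for roughly half a year of daily index returns.
    market = np.random.default_rng(42).normal(loc=0.0003, scale=0.01, size=120)
    sharpes = random_strategy_sharpes(market, n_strategies=50)
    print(f"best random Sharpe: {sharpes.max():.2f}")
    print(f"share of random strategies with Sharpe > 3: {(sharpes > 3).mean():.0%}")
```

Because the positions are coin flips, any high Sharpe ratio produced here is luck by construction. How many clear 3 depends strongly on the sample length and on how the random strategies are built, so this sketch is not expected to reproduce the chart’s exact proportion.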

The problem is that generally speaking it is trivially easy — if not comedically easy — to come up with a strategy which exhibits outstanding test performance. This is not a problem that has arrived with the advent of AI — it is a basic statistical truth which has been making it hard to confidently identify good fund managers for years. But the current predilection for “playing” intensively with different algorithms has hugely exacerbated the problem.

It is for precisely this reason that serious, pipe-smoking scientists typically consider a result “significant” only if it would be expected to arise by chance no more than once in 100 independent trials. So if you’re going to invest money predominantly on the basis of a back-test, you should probably start by asking to see the other 99 (statistical spoiler: the only way to run 100 independent trials is to test the same algorithm, unmodified, in something like 100 entirely unconnected market contexts).
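
The arithmetic behind “ask to see the other 99” can be sketched as follows. Assuming each of N unrelated back-tests would pass some threshold purely by luck with probability p, the chance that at least one of them passes is 1 - (1 - p)^N. The Bonferroni correction is one standard, conservative way to tighten the per-test threshold to compensate. The function names are hypothetical.

```python
def prob_at_least_one_fluke(p_single: float, n_trials: int) -> float:
    """Probability that at least one of n_trials worthless strategies passes a test
    that each would pass by pure luck with probability p_single.

    Assumes the trials are independent; back-tests on the same market history
    generally are not, so treat this as an illustration."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if n_trials < 0:
        raise ValueError("n_trials must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_trials


def bonferroni_threshold(family_alpha: float, n_trials: int) -> float:
    """Per-test significance level that caps the chance of any false positive
    across n_trials at family_alpha (the Bonferroni correction)."""
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    return family_alpha / n_trials


if __name__ == "__main__":
    print(f"{prob_at_least_one_fluke(0.01, 100):.3f}")  # 0.634
    print(bonferroni_threshold(0.01, 100))
```

With p = 0.01 and 100 attempts, there is roughly a 63% chance that at least one worthless strategy looks significant, which is why a single stand-out back-test drawn from many attempts says little on its own.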

This isn’t even the full extent of the problem. The very moment we go back to the drawing board and start using insights from our failed models to improve our algorithmic approach (and what’s the point of studying back-tests and paper trading output if we’re not going to do that), the requirements for achieving statistical significance go through the roof! This is because, with all the best scientific will in the world, we’re now actively selecting approaches which exhibit good test performance.

For this reason, if we want to ensure anything remotely like statistical rigor, we can never evaluate performance using the same data twice! This is of course patently unsustainable, and in fact it renders any back-test driven strategy selection methodology completely dead in the water.
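
The selection effect can be demonstrated with a small simulation, sketched here under deliberately artificial assumptions: a synthetic, driftless market and candidate “strategies” that are pure coin flips. Choosing the candidate with the best in-sample Sharpe ratio and then scoring it once on held-out data shows how much of the winner’s apparent skill came from selection rather than signal. The function names and parameters are hypothetical.

```python
import numpy as np


def annualized_sharpe(r: np.ndarray, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate."""
    sd = r.std(ddof=1)
    return 0.0 if sd == 0 else float(np.sqrt(periods_per_year) * r.mean() / sd)


def best_in_sample_vs_out_of_sample(n_candidates: int = 200, n_days: int = 500,
                                    seed: int = 0) -> tuple[float, float]:
    """Pick the candidate with the best in-sample Sharpe ratio, then report that
    candidate's in-sample and out-of-sample Sharpe ratios.

    Every candidate trades i.i.d. random long/short positions on a synthetic,
    driftless market, so none of them has any genuine edge."""
    rng = np.random.default_rng(seed)
    market = rng.normal(0.0, 0.01, size=n_days)
    positions = rng.choice([-1.0, 1.0], size=(n_candidates, n_days))
    returns = positions * market
    split = n_days // 2  # first half for "research", second half held out
    in_sample = np.array([annualized_sharpe(r[:split]) for r in returns])
    best = int(np.argmax(in_sample))  # the selection step that inflates results
    return float(in_sample[best]), annualized_sharpe(returns[best, split:])


if __name__ == "__main__":
    pairs = [best_in_sample_vs_out_of_sample(seed=s) for s in range(20)]
    ins, outs = zip(*pairs)
    print(f"mean in-sample Sharpe of the 'winner':     {np.mean(ins):.2f}")
    print(f"mean out-of-sample Sharpe of the 'winner': {np.mean(outs):.2f}")
```

One would expect the selected strategy’s in-sample figure to look impressive and its out-of-sample figure to scatter around zero. The held-out half is only informative the first time it is used: going back to tweak the strategy after looking at it turns that data into more in-sample data.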


This seems like a Catch-22. In a sense it is, but it’s not in any way specific to AI. We’re just asking the wrong questions.

Undue reliance on paper tests is an example of a broader issue facing AI: when assessing these systems, we need to refrain from applying arbitrary assessment criteria that we would not apply in other contexts, such as the hiring of human quants or the selection and application of more traditional statistical methods.

If we were considering hiring a human quant based on their past performance in a very narrow market context (and presumably either unimpressive or untested performance in all others), we would at least require that they could give a coherent, plausible and reasoned account of their working process.

Accordingly, if somebody can actually back up an AI system with a suite of consistently impressive back-tests across a host of different market contexts, then by all means we should sit up and take notice… but probably our very next question should be regarding how the system operates and what its models look like under the hood. Why should we believe that it works?

Conversely, consider ARIMA, ordinary least squares regression, and mean-variance optimization: workhorse models and algorithms that have been used in economics and quantitative finance for decades. We neither dismiss these algorithms outright nor stake everything on them based on their test performance in a handful of narrow settings. Rather, we accept them based on the soundness and limitations of their underlying math and logic… and we apply them with appropriate context and caveats.


It is entirely understandable that in a time of increasing economic uncertainty, and huge advances in AI and data availability, we should pursue a vision of fully automated investment management.

Clearly we are not there yet: we are at the profusely bleeding edge. Even so, it is hard to imagine a time soon when we will not want to know how and why our systems are making the recommendations they are. Even with an unbroken super-human track record. And even without regulatory restrictions.

There is evidence to suggest that, even once our AIs consistently have the edge on us, the best results for a long time will be achieved by combining human and machine capabilities. Years after Garry Kasparov was beaten by Deep Blue, the best chess players in existence are collaborative human-computer teams. As if in testament to the power of human intuition, Kasparov realized this fact and embraced it (anybody who blindly tells you that “pure AI” systems are best is just an ironic luddite).

The popularity (and by extension the contribution to industry) of linear regression, in the face of many more advanced methods, is a testament to its explicability as much as anything else. In order to take applied AI to the next level in finance, or any high-stakes field of human endeavor, we do not need ever more powerful black-box algorithms. We need transparent, interpretable, interrogable ones. We need AIs whose reasoning can be laid bare, from which we can take the best and leave the worst… AIs which can both fuel and be fuelled by our own expertise, curiosity and intuition.

Methods like Deep Learning (which is a family of methods, not an algorithm per se) are only black boxes to the extent that the world is presently preoccupied with what they can “do”. In many respects Deep Learning represents an opportunity to make the markets more understandable than ever before! We need to take this fork now. If we do not, we risk throwing out the baby of opportunity with the bathwater of risk-aversion.


Clearly there is an onus on AI developers to make the operational premises and the limitations of their technologies transparent, and not to be drawn into unsound or shallow demonstrations and justifications. It is, after all, unashamedly not in the purview of AI experts to devise and sell financial products and strategies, but rather to create enabling tools and technologies to empower industry professionals to do so.

Even so, in order to succeed in this, said AI experts need to have a realistic grasp of the unique requirements and challenges of the financial sector. The financial and mainstream media have a very important role to play here, in particular by resisting the perpetuation of “pop finance” objectives.

To pick one example of many, it is notable that all recent media coverage of Hong Kong AI hedge fund Aidyia, led by AGI guru and all-round dude Dr. Ben Goertzel, has focused solely on the absolute return of the fund, a figure which is widely recognized as being completely meaningless without reference to risk. Such reporting inevitably feeds back into research. Of the 20 or so highest-Google-ranking academic papers and presentations that I recently surveyed on the subject of automated market forecasting and trading, only a fraction made any reference to relevant concepts such as the Sharpe ratio, alpha, or even the standard deviation of returns, while the majority talked liberally about “profit” or “returns”.

Rightly or wrongly, such goings-on project an image to the financial industry that the current state-of-the-art has fundamental limitations… and to some extent, as long as researchers are focusing on the wrong problems, the technology will remain irrelevant.
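
Why an absolute return figure means little without reference to risk can be made concrete with a small sketch. Assuming a synthetic daily return stream for a hypothetical fund, and ignoring financing costs, levering the same bets three times changes the headline return and triples the volatility, but leaves the Sharpe ratio unchanged, because the mean and standard deviation of returns scale together. The `summarize` helper is hypothetical.

```python
import numpy as np


def summarize(daily_returns, periods_per_year: int = 252) -> dict:
    """Headline (total) return alongside risk measures, with a zero risk-free rate."""
    r = np.asarray(daily_returns, dtype=float)
    sd = r.std(ddof=1)
    return {
        "total_return": float(np.prod(1.0 + r) - 1.0),  # what headlines report
        "ann_volatility": float(sd * np.sqrt(periods_per_year)),
        "sharpe": float(r.mean() / sd * np.sqrt(periods_per_year)),
    }


if __name__ == "__main__":
    # Hypothetical fund: one year of synthetic daily returns.
    base = np.random.default_rng(7).normal(loc=0.0005, scale=0.01, size=252)
    levered = 3.0 * base  # same bets at 3x leverage, financing costs ignored
    for name, series in [("unlevered", base), ("3x levered", levered)]:
        s = summarize(series)
        print(f"{name:>11}: total return {s['total_return']:+.1%}, "
              f"volatility {s['ann_volatility']:.1%}, Sharpe {s['sharpe']:.2f}")
```

A headline return reported on its own therefore cannot distinguish skill from leverage; a risk-adjusted measure such as the Sharpe ratio at least separates the two.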