The Case For Finance-First Machine Learning Research

Today, we are announcing the launch of the Pit.AI Research Paper Series, which aims to contribute to the emergence of a global and open research protocol for AI in finance.

We will be uploading all papers to the research section of our website.

Why Do This?

The short answer to this question is that we are witnessing cultural and social obstacles on the way to an AI revolution in finance that we believe can — and perhaps should — be addressed with open, finance-first machine learning research.

Examples of such cultural and social obstacles range from old ideas that are strongly held and overly generalized, to the noise created by the AI hype, to the widespread adoption of modeling paradigms that can be dogmatic and sometimes fail to appreciate the limitations inherent to their working assumptions.

When The Special Case Becomes The Rule

In an attempt to build a simple map of the trading or investment management landscape, it is not uncommon for investment management professionals to form unnecessarily strong associations.

For instance, I’ve spoken to several seasoned fund managers who were under the impression that only stocks can be traded long-short — arguably because the idea of constructing long-short portfolios of assets originated in stock markets. Other fund managers I spoke to had a hard time understanding why one would express a currency trade through a long-short basket of currencies rather than crosses — arguably because currency traders typically express views through crosses.

Needless to say, like stocks, foreign currencies can be bought with, or sold for, the U.S. dollar. Hence, a long-short portfolio of foreign currencies (and other asset types) is as natural a concept as a long-short portfolio of stocks. Similarly, considering that trading one or multiple crosses effectively results in a long-short portfolio of currencies, one would think that expressing currency trading ideas directly through baskets of currencies should be as effective as doing so through crosses.
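
To make the equivalence concrete, here is a minimal bookkeeping sketch in Python (the notionals are hypothetical): a view expressed through crosses, long EURUSD and short GBPUSD, nets out to exactly a long-short basket of currencies, with the U.S. dollar exposure cancelling.

```python
# Minimal bookkeeping sketch with hypothetical notionals: a long EURUSD /
# short GBPUSD view expressed through crosses nets out to a long-short
# basket of currencies, with the U.S. dollar exposure cancelling.
crosses = {
    ("EUR", "USD"): +1_000_000,   # long EURUSD: long EUR, short USD
    ("GBP", "USD"): -1_000_000,   # short GBPUSD: short GBP, long USD
}

basket = {}
for (base, quote), notional in crosses.items():
    basket[base] = basket.get(base, 0) + notional
    basket[quote] = basket.get(quote, 0) - notional

print(basket)  # {'EUR': 1000000, 'USD': 0, 'GBP': -1000000}
```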

This example, which I chose for its simplicity, is one of many excessive generalizations that made me reevaluate the importance of being more open about how we think, and in this case about why we value thinking from first principles. In our Research Paper Series, we will be conducting and promoting research from first principles whenever appropriate.

Denoising The AI Hype

Another social obstacle which can be harmful to an AI revolution in finance is the noise generated by the AI hype.

AI is the new Big Data. Credit: Dan Ariely

Every quant fund claims to have AI, but two quant funds can hardly agree on what AI actually is. Some of the most established quant funds consider the algorithms they have been using over the past 20 years to be AI. Others, among the newer funds, go as far as equating any binary classifier, one of the most basic machine learning systems, to AI.

At Pit.AI Technologies, we believe that the future of investment management is one where computers are empowered, through a rigorous and innovative application of mathematics, to autonomously perform complex investment tasks once thought to require human intelligence, from automatically identifying investment opportunities in new data sources, to automatically and efficiently allocating capital to the opportunities found.

In order to do so, we believe that some clarity around the problem formulation is of paramount importance.

What exactly do we call AI, and what do we call Machine Learning?

Artificial Intelligence: We use the expression Artificial Intelligence or AI to refer to one or multiple autonomous computer processes collaborating on solving a complex problem that is traditionally thought to require human intelligence, and performing at least as well as human experts.

Machine Learning: We use machine learning on the other hand to refer to the mathematical and engineering methods contributing towards creating Artificial Intelligence.

To further clarify these two definitions, let’s consider a couple of counter-examples.

What do we not consider Artificial Intelligence (AI) or Machine Learning (ML)?

Binary Classifiers Are Not AI: It’s not uncommon to hear startups, including startup hedge funds, call any binary classifier ‘AI’. Most researchers with a basic graduate-level understanding of machine learning would agree that this is a misuse of the expression AI, to say the least. Binary classification is a machine learning problem, a very old one, of which a binary classifier is an algorithmic solution.

This is not to say that binary classifiers cannot be used as building blocks of an AI system; they can, and they have been used as such. However, whether said system is indeed AI boils down to three key questions: i) Is it autonomous? ii) How complex a problem is it trying to solve? iii) Does it perform as well as human experts?
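
For perspective, this is all it takes to build a binary classifier: a perfectly respectable building block, but nothing about it is autonomous, and the problem it solves is narrow. The sketch below uses scikit-learn on synthetic data, purely for illustration.

```python
# A complete binary classifier in a few lines (scikit-learn, synthetic data).
# A useful building block, but nothing about it is autonomous, and the
# problem it solves is narrowly scoped.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
print(f"Out-of-sample accuracy: {clf.score(X_test, y_test):.2f}")
```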

Machine Learning Is More Than Deep Learning: Another common misconception is that machine learning is all about deep learning. Despite the fact that most mainstream machine learning breakthroughs over the past decade can be attributed to deep learning, deep learning is not the only set of methods that can be used to build AI in general, let alone AI for investment management.

The essence of deep learning is the very idea that models need to be deep — typically with hundreds of thousands of parameters — to be effective. Successfully training such complex models, however, requires two key ingredients that are often taken for granted in physical applications, but that are rare commodities in finance: a lot of the same type of data, and a high signal-to-noise ratio.

Unlike with physical systems, it is not always possible to acquire more of the same type of financial data. Getting more market data, for instance, often involves sampling the state of the market at a higher frequency (e.g. tick data as opposed to aggregates). However, investment opportunities are typically frequency-specific.

Let’s take the most granular form of market data, namely tick-by-tick data, as an example. Ticks are available in vast amounts (petabytes and beyond), but mispricings at the tick level are often short-lived, and the resulting trading strategies typically cannot sustain large asset allocations — it takes time to trade big orders without moving the market.

A simple example of an overfitted binary classification frontier. Credit: Wikipedia

Lower frequency data aggregates (e.g. daily, weekly, monthly) can be used to detect investment opportunities that scale to larger asset allocations. However, at these frequencies, how long an asset has been listed on its exchange places a hard cap on the amount of data available, a limitation we can’t circumvent with crowdsourced data acquisition solutions such as Amazon’s Mechanical Turk — which have played, and are still playing, a key role in gathering the training data without which deep learning breakthroughs wouldn’t be possible. When dealing with physical signals, we can always collect more of the same data (at a cost). In finance this is not an option, and the amount of lower frequency returns data available for backtesting might not be sufficient to reliably train models with hundreds of thousands of parameters.
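
A rough back-of-the-envelope calculation makes the point. The figures below are illustrative, not from our paper: even an asset that has traded daily for 20 years yields only a few thousand return observations, orders of magnitude fewer than the parameter count of a modest deep network.

```python
# Back-of-the-envelope comparison (illustrative figures only).
trading_days_per_year = 252
years_listed = 20
n_daily_observations = trading_days_per_year * years_listed       # ~5,040

n_parameters_modest_deep_net = 300_000   # "hundreds of thousands" of parameters

print(f"Daily return observations available: {n_daily_observations:,}")
print(f"Parameters to fit:                   {n_parameters_modest_deep_net:,}")
print(f"Observations per parameter:          "
      f"{n_daily_observations / n_parameters_modest_deep_net:.3f}")
```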

To add insult to injury, financial markets are noisy, much noisier than physical systems, so much so that, for the same model complexity, one would need a lot more financial data than physical data to achieve the same level of accuracy on a traditional machine learning problem such as regression or classification.

Lastly, recent work has shown that neural networks — the building blocks of deep learning — correspond to special cases of other machine learning approaches, such as Gaussian Process models and methods based on Reproducing Kernel Hilbert Spaces, both of which offer a more principled and effective way of learning the model complexity one can afford for a given amount of data and level of noise; I happen to have written my PhD thesis on this subject.
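
To illustrate what learning the model complexity one can afford looks like in practice, here is a toy Gaussian process regression in which the kernel hyperparameters, including an explicit noise level, are learned by maximizing the marginal likelihood, so that the data itself dictates how much complexity to use. This is a generic scikit-learn sketch on synthetic data, not the specific approach from my thesis or from our papers.

```python
# Toy sketch: Gaussian process regression whose kernel hyperparameters,
# including an explicit noise level, are learned by maximizing the marginal
# likelihood -- the data itself dictates how much complexity to use.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.5, size=X.shape[0])  # noisy signal

kernel = ConstantKernel() * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# The fitted kernel reports how smooth a function (length scale) and how much
# noise the marginal likelihood deems consistent with the data.
print(gp.kernel_)
```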

In short, when it comes to investment management, deep learning is certainly not the only research avenue to build AI; model complexity learning should be a critical part of model building, and this is something that deep learning is notoriously bad at.

With the Pit.AI Research Paper Series, we aim to provide some clarity on the type of research we undertake, so as to help stakeholders see through buzzwords in pitches and raise their technical expectations. We believe this will help denoise the AI hype.

The Regularization Imbroglio

As previously discussed, training overly complex models on (noisy) financial data can be fraught with danger; this is something investment management professionals have, of course, long realized.

Use a deep enough neural network, and it can make your dataset tell any story you want! Such stories, however, will be specific to the training dataset rather than characteristic of the underlying phenomenon, and will thus fail to generalize to new data.

A trading strategy trained using too complex a model (relative to the amount of data available and the noise in the data) can perform very well over the training horizon, but will most likely fail to generalize to the future: enter backtest overfitting, the ultimate fight of hedge fund quants.

All models are wrong, some are useful, but one model’s imperfection can be another’s strength.

Multiple approaches are used in practice to mitigate backtest overfitting. The basic idea is often to use a scheme to carefully select a small percentage of the inputs powering the decision process, or to put an emphasis on a smaller subset of the overall model space, an idea referred to as regularization in machine learning. This can prove very useful, as long as one is aware of the limitations inherent to one’s approach, and actively seeks to compensate for said limitations with other models. However, some modeling paradigms that are very popular in the investment management community are treated as dogmas; this goes against the need for flexibility and open-mindedness in any scientific exploration, and in the search for AI in investment management in particular. I’ll briefly illustrate this point with three examples.

Dogma 1: In Linear Models With Variable Selection We Trust, In Alternative Data Lies Our Alpha!

A popular approach for dealing with the low signal-to-noise ratio in financial markets and the resulting risk/fear of overfitting is to use linear models for pretty much everything, from understanding risk to forecasting returns, in an effort to keep things simple, robust, and ‘explainable’. Of course, the resulting model wouldn’t be simple if it depends on too many input factors, so regularization techniques (such as Ridge, the LASSO or Elastic Net) are often in order.
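
For concreteness, the sketch below shows what the workhorse of this school of thought roughly looks like: a cross-validated LASSO that selects a handful of factors out of many candidates. The data is synthetic and the setup purely illustrative.

```python
# Illustrative only: a cross-validated LASSO regression of returns on many
# candidate factors, most of which end up with exactly zero weight -- the
# "linear model with variable selection" workhorse.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n_obs, n_factors = 1_000, 50
factors = rng.normal(size=(n_obs, n_factors))
# In this synthetic example only the first 3 factors actually drive returns.
returns = factors[:, :3] @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=1.0, size=n_obs)

model = LassoCV(cv=5).fit(factors, returns)
selected = np.flatnonzero(model.coef_)
print(f"Factors selected: {selected.tolist()} out of {n_factors} candidates")
```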

The underlying assumption here is that there is more competitive advantage to be gained from data (so-called alternative data, especially when one can secure exclusive access) than from smarter modeling. This can work for some time, but eventually all alternative datasets become linearly priced-in as they become widely adopted. In other words, researchers would have a hard time finding inefficiencies in easily available datasets using linear models, as other players are analyzing the same datasets with the same tools and models.

This is often thought to be the end of the story. However, there is strong empirical evidence of non-linear dependencies between financial time series, especially time series of returns (see our first paper). This suggests that, as long as this school of thought prevails, non-linearities will not be priced-in, there will be alpha to be mined from popular datasets using non-linear models, and this school of thought might therefore be throwing the baby out with the bathwater.
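
A toy example illustrates why linear tools can miss such dependencies: two series can have near-zero Pearson correlation while remaining strongly dependent, something a non-linear dependence measure such as mutual information does pick up. The example below is synthetic; it is not the empirical evidence presented in our paper.

```python
# Synthetic illustration: y depends on x only through |x|, so the Pearson
# correlation is ~0 while the dependence is strong and is picked up by a
# mutual information estimate.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = np.abs(x) + 0.1 * rng.normal(size=10_000)   # purely non-linear dependence

print(f"Pearson correlation: {np.corrcoef(x, y)[0, 1]:+.3f}")   # ~0
print(f"Mutual information:  "
      f"{mutual_info_regression(x.reshape(-1, 1), y)[0]:.3f}")  # clearly > 0
```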

Dogma 2: No Black-Box Please, The Strategy Needs To Be Driven By An Economic Narrative!

The most aggressive form of regularization, albeit rarely acknowledged as such, is the idea that all so-called ‘black-box’ strategies should be avoided, and that the best trading strategies ought to start with a simple economic narrative.

There is substance in this school of thought. After all, is it not part of human nature to always seek to understand why? The subtlety, though, is that we could be asking why at multiple stages of the investment process. Should we be asking why mispricings could occur as a prerequisite to finding mispricings, or should we be seeking to find reliable mispricings first, and then try to understand what structural reasons led to them? We believe the right answer depends on the circumstances.

When data to learn from is scarce, we might not be able to reliably identify mispricings from data, and asking why mispricings could occur can serve as a valuable regularizer, promoting ideas of mispricings that are consistent with one’s (hopefully expert) background knowledge. Here, background knowledge acts as a complement to the small amount of empirical evidence available.

On the other hand, when data abound, background knowledge might not be needed. In fact, requiring the outcome of the inference process to be consistent with background knowledge might strongly limit the amount of information asymmetry one can gain. Let’s consider a simple example.

Hedge funds A and B have access to the same alternative dataset consisting of all credit card transactions of all U.S. stores, except that only hedge fund A knows what the dataset is made of; B sees the dataset as a set of time series. Hedge fund A sought access to the dataset hoping that it would provide an early indicator of the strength of the economy, which it would use to anticipate market moves following scheduled economic releases. Based on this rationale, hedge fund A is comfortable using the alternative dataset in a forecasting model. Hedge fund B on the other hand would not run any experiment with the alternative dataset, despite the dataset being big enough to train any model in B’s toolbox, on the basis that it does not know what the alternative dataset is, and therefore cannot relate it to any economic narrative.

Hedge fund A didn’t use any background knowledge to exploit the alternative dataset; it used its understanding of what the dataset represents to determine whether to use it, not to train its forecasting model. Hence, hedge fund B could have trained a model as accurate as A’s, and is clearly missing out!

In short, the need to understand why can only serve the purpose of building confidence in a trading idea. However, there are multiple ways of achieving the same goal, and one should strike the right balance between relying on background knowledge and relying on empirical evidence. When data abound in high quality, we should almost always let the data speak for themselves, and be open to the idea of empirical evidence disproving our prior beliefs. However, whenever good quality data is scarce, background knowledge, which is becoming increasingly accessible to machines in the form of knowledge graphs, can prove crucial.

Dogma 3: Regularization By Token Staking Solves It All!

Another attempt to address overfitting, although not as widely adopted as the other two, is to require quants to put their money where their mouth is, so to speak.

Numerai’s implementation of this idea is to require users on their platform to express a probability that their stock market predictions will beat a benchmark in live mode, and to put down a ‘stake’ they are willing to lose in case they are wrong. The most confident user submissions are then selected for live trading, and selected users earn a payout proportional to their stakes if they beat the benchmark, but lose their stakes if they don’t.

This incentive mechanism is good at one thing: aligning the incentives of Numerai with those of its users. Thinking of this incentive mechanism as a general solution to backtest overfitting on a crowdsourced platform like Numerai implicitly assumes that backtest overfitting on such platforms can only arise intentionally; that is, if users are trying to game the platform. This is obviously not true: a data scientist with the best intentions can still produce an overfitted model!

In a world where all Numerai users have the best intentions, the likelihood of overfitting should have nothing to do with how much money users are willing to stake, or with the Numeraire token for that matter. Indeed, how much money users are willing to stake is a function of i) their confidence in their model, ii) their savings, and iii) their risk appetite. ii) and iii) obviously have no (direct) effect on overfitting. As for i), any rational user sharing Numerai’s incentives will form his/her confidence in his/her model based solely on model performance on training and validation data, which Numerai can also access in user submissions. Thus, the stake of a user in this ideal case can only inform Numerai about ii) and/or iii), and we are back to square one.

The aim here is not to single out these three regularization approaches — they all have their merits — but rather to illustrate how being dogmatic about a modeling approach can be suboptimal, thereby hindering the emergence of an AI revolution in investment management. With the Pit.AI Research Paper Series, we are open to revisiting even the most basic and popular modeling ideas, as long as there is a legitimate case for doing so.

Our Approach To Solving Intelligence For Investment Management

Our mission at Pit.AI is to Solve Intelligence for Investment Management.

To be specific, this means building an AI that successfully performs all the key investment functions at a hedge fund that are traditionally attributed to human beings, from finding new investment opportunities in any data source, to dynamically allocating capital to the investment opportunities found.

We break down our AI into three layers of abstraction, each corresponding to a separate (non-conventional) machine learning problem.

The Eyes of Our AI Traders: The goal here is to aggregate as much price-sensitive data as possible into low dimensional features that are as useful for finding investment opportunities as the original raw data. This is so that we may empower our AI with as much price-sensitive information as a human Quant would have access to, but without overwhelming our AI computationally. Said differently, we are bridging the information gap between humans (Quants) and our AI, under a feature budget constraint.

The best analogy is how we make sense of images. Images are made of pixels, yet when the human brain attempts to make sense of charts on a Bloomberg terminal, or news articles online, it does not make sense of pixels individually, which would be computationally impractical. Instead, the human brain makes sense of features/representations such as curves, levels, shapes, faces, sentiment in a text, etc.

If you are a machine learning researcher and this sounds familiar to you, it is because this problem is closely related to auto-encoding, lossy compression, and manifold learning. Sadly, the level of noise in financial data rules out applying nearly all these techniques out-of-the-box. For instance, financial data are unlikely to lie on a smooth manifold — noise and smoothness can’t co-exist. Moreover, using as reconstruction loss function some measure of the energy of the reconstruction error relative to the energy of the input data — as is done in auto-encoding and lossy compression — would be ill-advised. Indeed, since the energy in financial data is dominated by noise, a reconstruction error with low energy does not necessarily mean that the signal, or useful part of the input, was preserved by the compression. To take a concrete example, if a time series of returns has a signal-to-noise ratio of 10% — which is on the high end of the spectrum — one can achieve a ~90% reconstruction accuracy while losing literally all the useful part of the time series.
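
That concrete example can be reproduced in a few lines on synthetic data: with a roughly 10% signal-to-noise ratio, a ‘reconstruction’ that keeps only the noise still achieves about 90% accuracy in the usual relative-energy sense, while containing none of the signal.

```python
# With a ~10% signal-to-noise ratio, a "reconstruction" that keeps only the
# noise achieves ~90% accuracy in the relative-energy sense, while containing
# none of the signal (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
signal = rng.normal(scale=np.sqrt(0.1), size=n)   # ~10% of the total energy
noise = rng.normal(scale=np.sqrt(0.9), size=n)    # ~90% of the total energy
returns = signal + noise

reconstruction = noise                            # the signal is fully discarded
relative_error = np.sum((returns - reconstruction) ** 2) / np.sum(returns ** 2)

print(f"Reconstruction accuracy: {1 - relative_error:.1%}")   # ~90%
print(f"Correlation with signal: "
      f"{np.corrcoef(reconstruction, signal)[0, 1]:+.2f}")    # ~0
```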

The Brain of Our AI Traders: The objective here is to continuously learn, from the compressed features coming out of the eyes of our AI, trading strategies that perform as closely as possible to a set of user-specified performance criteria such as Sharpe ratio, annualized return, maximum drawdown, etc.
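
For reference, these are the kinds of performance criteria we have in mind. The sketch below computes them from a daily returns series using standard textbook definitions; it is not our internal implementation.

```python
# Standard textbook definitions of the performance criteria mentioned above
# (not our internal implementation), computed from a series of daily returns.
import numpy as np

def performance_criteria(daily_returns, periods_per_year=252):
    daily_returns = np.asarray(daily_returns)
    ann_return = np.mean(daily_returns) * periods_per_year
    ann_vol = np.std(daily_returns) * np.sqrt(periods_per_year)
    sharpe = ann_return / ann_vol if ann_vol > 0 else np.nan
    wealth = np.cumprod(1 + daily_returns)
    max_drawdown = np.max(1 - wealth / np.maximum.accumulate(wealth))
    return {"annualized_return": ann_return,
            "annualized_volatility": ann_vol,
            "sharpe_ratio": sharpe,
            "max_drawdown": max_drawdown}

# Example on random daily returns.
print(performance_criteria(np.random.default_rng(0).normal(5e-4, 1e-2, 2_500)))
```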

The outputs of this stage are trading strategies whose number keeps growing over time, with each AI trader getting better over time.

Our AI Fund Manager: The goal of the AI fund manager is to dynamically optimize capital allocation across the very large number of strategies found by our AI traders.

This problem can be regarded as the traditional asset allocation problem, but on a scale that renders all existing models impractical. Nearly all asset allocation models in the literature require computing, and then inverting, a square matrix whose size is the number of assets/strategies to allocate capital to (e.g. the covariance matrix, or a shrunk version thereof). These operations simply do not scale beyond 10,000 assets or strategies, a number that is orders of magnitude below our requirement.
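
To see why, consider the textbook minimum-variance allocation, which requires forming and inverting the covariance matrix of strategy returns: memory grows quadratically with the number of strategies and the inversion cost grows cubically. The arithmetic below is illustrative; it is not our allocation method.

```python
# Textbook minimum-variance weights are proportional to inv(cov) @ ones, which
# requires storing and inverting an n-by-n covariance matrix: memory grows
# quadratically in n and the inversion cost grows cubically.
import numpy as np

def min_variance_weights(cov):
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)     # effectively inv(cov) @ ones, O(n^3)
    return raw / raw.sum()

# Tiny demo on 5 strategies.
rng = np.random.default_rng(0)
sample_returns = rng.normal(size=(500, 5))
print(min_variance_weights(np.cov(sample_returns, rowvar=False)).round(3))

# Why this breaks down at our scale: the covariance matrix alone is n x n.
for n in (1_000, 10_000, 100_000, 1_000_000):
    memory_gb = n * n * 8 / 1e9          # float64 entries
    print(f"n = {n:>9,}: covariance matrix needs ~{memory_gb:,.2f} GB before inversion")
```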

Our First Paper

The Question: What Makes An Asset Useful?

Investment managers are routinely faced with determining whether to allocate capital to newly found trading strategies, or whether to consider trading new assets (classes). In our first paper, we provide an information-theoretic formalism for quantifying the potential value an asset or trading strategy can add to an investment manager’s setup, independently from the manager’s specific allocation process, and we derive scalable algorithmic tests.

Why Does It Matter?

A Fresh Perspective On Age-Old Problems

The first reason why this paper matters is that it provides a fresh, unified and, we hope, better perspective on age-old investment questions routinely and independently answered by practitioners and academics.

How much value is there in cross-asset-class diversification relative to within-asset-class diversification? How should one be thinking about balancing out exposures to different asset classes? How should fund-of-funds be thinking about manager and/or trading style diversification? Do linear i.i.d. factor models (i.e. the alpha-beta dichotomy) reflect the true risk in your portfolio? Do returns of funds, trading strategies or assets in your portfolio exhibit non-linearities, temporal dependencies and/or fatter tails than Gaussian? How hedged is your portfolio to Black-Swan events?

How do you build confidence in whether a returns time series can be forecasted, independently from, and prior to, attempting to forecast it? How do you build confidence in whether your new alternative dataset can improve the predictability of returns of assets you trade, before allocating any resources to exploiting it? When you have a hard time forecasting a time series of returns, which is to blame, your forecasting model or the lack of signal in the time series?

How do you assess whether a newly found trading strategy is as good as your best performing trading strategies, while accounting for the number of trials, so as to mitigate backtest overfitting?

All these questions can be regarded as special cases of the fundamental question “What Makes An Asset Useful?”, which is the object of our first paper. That said, whether you are interested in the full discussion or not, if you ever asked yourself any one of the questions listed above, you’ll find in our first paper rigorous new machine learning methodological contributions developed with a finance-first mindset, as well as scalable algorithmic solutions and empirical evidence, which we hope will provide you with a fresh perspective.

A New Paradigm For The Machine Learning Age

Feature learning plays a pivotal role in machine learning.

The first few breakthroughs in the ImageNet competition, which arguably paved the way for the computer vision revolution we’ve witnessed over the past decade, were based on the idea that, to classify objects in images, we are better off first learning a lower-dimensional representation (or features) characteristic of all the images we would like to classify, and then classifying images based on these lower-dimensional features, an idea known as unsupervised pre-training. Later on, it was found that unsupervised pre-training wasn’t necessary to classify images using deep learning, so long as enough training data are available. Even when enough training data are available, deep neural networks can always be thought of as hierarchical feature learning machines, where the output of each layer can be regarded as an abstraction of the raw input.

Similarly, kernel based alternatives to deep learning (e.g. Gaussian Process models, and frequentist kernel methods) are directly related to the idea of learning a feature space in which raw inputs will be easier to understand.

One would therefore expect feature learning to play a similar role in the quest for AI in investment management. In finance, a feature map relates any (partial) state of the market and/or the economy to a lower dimensional output. Features of financial markets can be of two types: tradable or non-tradable. The outputs of tradable feature maps can be interpreted as investment decisions, whereas non-tradable features cannot be associated with a target portfolio of assets to hold. For instance, exchange-traded assets are a special type of tradable features, whereas nonfarm payrolls are a simple example of non-tradable features. The feature map associated with an exchange-traded asset X simply maps any market state at any time to the same output {X: 1}, suggesting that one should allocate one’s entire capital to (buying) X.
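
In code, a tradable feature map is simply a function from a (partial) market state to target portfolio weights, and the exchange-traded-asset special case above is the constant map. The sketch below is schematic; the market-state representation and the toy momentum rule are hypothetical.

```python
# Schematic sketch: a tradable feature map takes a (partial) market state and
# returns target portfolio weights. An exchange-traded asset X is the special
# case of a constant map. The MarketState representation here is hypothetical.
from typing import Any, Callable, Dict

MarketState = Dict[str, Any]
TradableFeatureMap = Callable[[MarketState], Dict[str, float]]

def buy_and_hold(asset: str) -> TradableFeatureMap:
    """The constant map associated with an exchange-traded asset."""
    return lambda state: {asset: 1.0}

def toy_momentum(state: MarketState) -> Dict[str, float]:
    """A toy non-constant tradable feature map: long recent winners, short losers."""
    trailing = state["trailing_returns"]          # e.g. {"AAPL": 0.04, "XOM": -0.02}
    raw = {a: (1.0 if r > 0 else -1.0) for a, r in trailing.items()}
    gross = sum(abs(w) for w in raw.values()) or 1.0
    return {a: w / gross for a, w in raw.items()}

print(buy_and_hold("SPY")({}))                    # {'SPY': 1.0}
print(toy_momentum({"trailing_returns": {"AAPL": 0.04, "XOM": -0.02}}))
```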

The notion of tradable features, however, goes well beyond exchange-traded assets, and includes any trading strategy on or across any asset classes. To build an AI that finds trading strategies by itself, this is the level of abstraction at which we should be working; it might be beneficial to allow the AI to find opportunities across asset classes, and beyond well-known trading styles.

There is an infinite number of tradable feature maps, among which a large number might be of commercial interest. How should we then think about assessing whether any newly found tradable representation is worth considering, in the absence of a structural taxonomy? The traditional structural approach based on trading style, asset class, manager and so forth, isn’t applicable here. The protocol we propose in our paper would however work very well in this case, as it relies on information theory and empirical evidence rather than structural context.

Our Answer In A Nutshell

We propose that, for a new asset to be incrementally useful, it needs to satisfy four key criteria: two primary criteria and two secondary criteria.

Diversification Potential: The first primary criterion of incremental usefulness we propose is that, for an asset to be incrementally useful to an investment manager, its returns time series needs to sufficiently diversify the pool of assets the investment manager already has access to, as well as any reference set of risk factors or benchmarks he/she would like to avoid exposure to.

We provide an information-theoretic discussion on how to quantify the amount of diversification a new asset adds to a reference pool of assets and factors, and we provide a scalable algorithmic solution.
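
To convey the flavor of the idea without reproducing the paper’s formalism, here is a crude and purely illustrative proxy: regress the candidate asset’s returns on the existing pool and treat the unexplained variance as a rough stand-in for incremental diversification. This linear proxy is not the information-theoretic measure we develop in the paper.

```python
# Crude, purely illustrative proxy (NOT the information-theoretic measure in
# the paper): regress the candidate asset's returns on the existing pool and
# treat the unexplained variance as a rough stand-in for incremental
# diversification. Near-zero residual variance means the candidate can be
# (linearly) replicated by the pool and adds little.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
pool = rng.normal(size=(2_000, 10))                        # existing assets' returns
redundant = pool @ rng.normal(size=10) + 0.01 * rng.normal(size=2_000)
diversifier = rng.normal(size=2_000)                       # genuinely new return stream

for name, candidate in [("redundant", redundant), ("diversifier", diversifier)]:
    r2 = LinearRegression().fit(pool, candidate).score(pool, candidate)
    print(f"{name:>11}: variance unexplained by the pool = {1 - r2:.2f}")
```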

Active Alpha Potential: As the second primary criterion of incremental usefulness, we propose that the returns time series of a candidate asset needs to be sufficiently predictable for it to be considered useful. The intuition here is that, if an asset has a returns time series that is white noise, then, by definition, even the best active managers cannot do better at forecasting its moves than flipping a coin, and therefore cannot generate alpha out of trading the asset.

We provide an information-theoretic formalism for quantifying how predictable a time series of returns is, without presuming anything about what models can or should be used to forecast said time series of returns, and we provide a scalable algorithmic solution.
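
Again, the paper develops the model-free formalism; the sketch below only illustrates the underlying intuition with a simple proxy, namely an estimate of the dependence between past and future returns. White noise shows none, while an autocorrelated series does.

```python
# Illustrative proxy only (not the paper's model-free measure): estimate the
# dependence between lagged and current returns. White noise shows none; an
# AR(1) series shows some, i.e. it is at least partly predictable.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 20_000
white_noise = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.3 * ar1[t - 1] + rng.normal()

for name, series in [("white noise", white_noise), ("AR(1)", ar1)]:
    mi = mutual_info_regression(series[:-1].reshape(-1, 1), series[1:])[0]
    print(f"{name:>11}: estimated dependence between past and future = {mi:.3f}")
```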

We consider these two criteria primary in that, if one of the two is not met, then the new asset can safely be ignored. Indeed, if the incremental diversification test fails, then the new asset can be replicated with existing assets, and is therefore redundant. If the time series of returns of the new asset isn’t sufficiently predictable, then no investment manager can generate alpha out of trading the new asset.

Additionally, we identify two criteria of incremental usefulness which we consider secondary in that they cater to specific types of investment managers. If the secondary criteria are not met and the primary criteria are, then the new asset will be useful to fewer investment managers, but it will be useful to some investment managers nonetheless.

Passive Alpha Potential: One of the two secondary criteria of incremental usefulness is the ability for investment managers to generate decent returns without changing their investment decisions too often (i.e. through passive investing). This criterion caters specifically to the needs of passive investment managers who might not have the expertise required to generate an active premium, or whose AUM is so large that it takes time to rebalance their portfolios without incurring excessive execution costs, and who therefore cannot do so very often.

We propose a scalable statistical hypothesis test which uses Bayesian Nonparametrics to assess whether an asset has as much passive alpha potential as U.S. blue chips, while properly accounting for multiple trials to mitigate backtest overfitting.

Potential To Lighten Tails: Finally, our fourth criterion of incremental usefulness caters to investment managers turning to new assets to reduce their risk concentration, in an attempt to mitigate their exposure to Black-Swan types of events.

We propose an approach for quantifying the impact a new asset might have on the tails of a reference pool of assets, and we suggest that, for the new asset to be incrementally useful, it needs to make the tails of the pool of assets the investment manager currently trades lighter.
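
As a crude illustration of what ‘lighter tails’ means in practice (and not the approach developed in the paper), one can compare simple tail statistics of the pool’s aggregate returns with and without the candidate asset.

```python
# Illustrative sketch (not the approach in the paper): compare simple tail
# statistics of the equally-weighted pool returns with and without the
# candidate asset.
import numpy as np
from scipy.stats import kurtosis, t as student_t

rng = np.random.default_rng(0)
pool = student_t.rvs(df=5, size=(20_000, 3), random_state=0)   # fat-tailed pool
candidate = rng.normal(size=20_000)                            # thin-tailed candidate

without = pool.mean(axis=1)
with_candidate = np.column_stack([pool, candidate]).mean(axis=1)

for name, series in [("without candidate", without), ("with candidate", with_candidate)]:
    print(f"{name:>17}: excess kurtosis = {kurtosis(series):5.2f}, "
          f"1% quantile = {np.quantile(series, 0.01):+.2f}")
```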

Coming Up Soon

We’ll be making the contributions and empirical findings in our yellow paper accessible to a broader audience through a multipart Medium post.

Yves-Laurent Kom Samo, PhD