Why artificial intelligence isn’t intelligent yet: Reinforcement learning isn’t a silver bullet for crypto trading

Thomas Snell
Published in Aequicens
6 min read · Sep 14, 2018

A fascinating paper caused a stir in the crypto-trading world a couple of years ago: "Cryptocurrency Portfolio Management with Deep Reinforcement Learning" (https://arxiv.org/pdf/1612.01277.pdf). It appears to be the holy grail of crypto trading: an automated system that can learn to trade profitably on its own, with only historical price as an input. The authors make the eyebrow-raising claim of 10-fold returns over 1.8-month periods. While there is some underlying growth in the market over the 1.8-month periods they test, they consistently beat buy and hold, even after fees. To convince any doubters, they also have a GitHub repo giving access to the current version of their code: https://github.com/ZhengyaoJiang/PGPortfolio.
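To make the buy-and-hold comparison concrete, here is a minimal sketch (not the authors' code) of how a strategy's multiple can be compared against simply holding one asset, with proportional fees charged on every reallocation. The prices, weights and 0.25% fee rate are made-up illustrative values.

```python
# Minimal sketch: strategy equity curve vs. a buy-and-hold baseline, after fees.
# Prices, weights and the fee rate are hypothetical, for illustration only.
import numpy as np

prices = np.array([100.0, 104.0, 101.0, 108.0, 115.0])   # hypothetical close prices
rel_change = prices[1:] / prices[:-1]                     # price relatives per step

# Buy and hold: single entry, ride the price.
buy_and_hold = prices[-1] / prices[0]

# Hypothetical strategy: fraction of the portfolio held in the asset each step.
weights = np.array([1.0, 0.2, 1.0, 0.5])
fee = 0.0025                                              # fee per unit of turnover
portfolio = 1.0
prev_w = 0.0
for w, r in zip(weights, rel_change):
    portfolio *= 1 - fee * abs(w - prev_w)                # pay fees on reallocation
    portfolio *= 1 + w * (r - 1)                          # apply the asset's return
    prev_w = w

print(f"buy and hold: {buy_and_hold:.3f}x, strategy: {portfolio:.3f}x")
```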

Image after Jiang et al. 2016.

So how does it work? The system relies on deep reinforcement learning: it takes random but contiguous slices of open/high/low/close (OHLC) price data for a series of cryptocurrencies, normalises them, and passes them to a convolutional neural network. The network is trained to output a vector of portfolio allocations, one weight per coin. At any time, the system is given enough historical price data to see 30–50 steps back in time, and it should assign and reallocate the portfolio to return the highest overall profit. Because the input time slices are only 50–100 time steps long, whatever method it finds to make a profit has to work on a relatively short time scale (50 hours at most), so that profit should be relatively stable and low volatility.
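As a rough illustration of that shape of network (a sketch in PyTorch rather than the authors' TensorFlow implementation, with the 11-coin universe, 50-step window and layer sizes chosen only for the example), the model maps a features × coins × history tensor to a softmax over portfolio weights:

```python
# A minimal sketch, NOT the paper's exact topology: a small CNN over a
# (features x coins x history) price tensor that outputs portfolio weights.
import torch
import torch.nn as nn

N_COINS, WINDOW, FEATURES = 11, 50, 3   # e.g. high/low/close relative to last close

class PortfolioCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(FEATURES, 8, kernel_size=(1, 3)),         # convolve along time only
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=(1, WINDOW - 2)),       # collapse the time axis
            nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=(1, 1)),                # one score per coin
        )

    def forward(self, x):                  # x: (batch, FEATURES, N_COINS, WINDOW)
        scores = self.net(x).squeeze(1).squeeze(-1)              # (batch, N_COINS)
        return torch.softmax(scores, dim=-1)                     # weights sum to 1

weights = PortfolioCNN()(torch.randn(4, FEATURES, N_COINS, WINDOW))
print(weights.shape, weights.sum(dim=-1))                        # (4, 11), sums of 1
```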

Image after Jiang et al. 2016.

To let the network know when it is performing well at this task, it is trained with a policy gradient that rewards it when it makes a profit after fees; over many iterations it learns to maximise this reward in a way that generalises to new input. If I had to guess how the fitted neural network works in practice, I would say it seems to rely heavily on some modified mean-reversion strategy.
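A minimal sketch of that reward, under the simplifying assumption that fees are proportional to how much of the portfolio is reallocated: the objective is the mean log return after fees, which gradient ascent on the network's weights maximises directly.

```python
# Illustrative reward for policy-gradient training: mean log return after fees.
# The fee rate and the simple turnover-based fee model are assumptions here.
import torch

def reward(weights, rel_change, prev_weights, fee=0.0025):
    """weights, prev_weights: (batch, n_coins) portfolio allocations.
    rel_change: (batch, n_coins) price relatives y_t = p_t / p_{t-1}."""
    turnover = (weights - prev_weights).abs().sum(dim=-1)     # how much we reallocate
    gross = (weights * rel_change).sum(dim=-1)                # portfolio value change
    net = gross * (1 - fee * turnover)                        # pay fees on turnover
    return torch.log(net).mean()                              # average log return

# Training step, assuming a `model` and `optimizer` exist:
#   loss = -reward(model(batch_x), batch_y, prev_w)   # maximise reward = minimise -reward
#   loss.backward(); optimizer.step()
```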

So, how did we first come across this? My first task as Chief Data Scientist (and later CTO) of Aequicens was to implement this same system and adapt it to our needs. We can confirm that, for all the data the authors would have had access to at the time of writing, their results are correct. However, we started to notice issues when we ran tests that finished after February 2018; by that point Bitcoin's long bull market had ended and we were in bear territory. This was nothing new; people had started to notice it on the GitHub repo too.

What went wrong? To be fair to both authors, this is an entirely transparent academic paper where they show their methods, so there is nothing untoward going on here. As Jiang, one of the authors, says: "So once there are fundamental changes in the market, making the price movement patterns varied, the performance of the agent will drop." They show a technique that works in a limited set of market conditions, which is understandable: the data available at the time of writing contained only those market conditions.

Why is this so interesting? Jiang and Liang are highly intelligent, unbelievably competent researchers, and they had a system that worked fantastically in backtest until it didn't. This is the real crux: it is not sufficient to have a system that works in backtest, even on data it hasn't seen, for a market as immature as crypto.

So you could say: let's throw in some long-term data relating to market conditions, retrain it, and get it working again. You could do that, and it will work until the market changes in some other way and it breaks again. You don't have an automated system if every few months you must throw another input at it to keep accuracy up, particularly if the data you choose relies on a deeper human understanding of the market. The underlying issue is that your AI isn't all that intelligent if it continually draws on human knowledge to keep it working. AI is good at nonlinear mapping, at saying "if I get this input, I should do this", but it is data hungry: it needs to see a lot of examples before it can generalise, and those examples must run the full gamut of behaviours you want it to handle. Humans aren't like this. In situations of sparse data we can draw from multiple models of the world and make an educated guess; you can see a chair that looks nothing like any chair you have ever seen and still know it is a chair, by drawing on subconscious models of human behaviour, for instance. AI can't do this yet; it needs a similar example to extrapolate from.

As an aside, they use two years of data as an input; for 30-minute candles that is about 35,000 points, enough to reliably train a network of around 200 weights (certainly fewer than 1,000). To add further inputs they would have to increase the data set size too, which doesn't necessarily help them. For instance, different market conditions might make the model switch between sub-models of similar complexity, which would vastly increase the amount of training data required: instead of two years in total, it might become two years per market condition.
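A quick back-of-the-envelope check of that data budget (the three regimes at the end are purely illustrative):

```python
# Roughly how many 30-minute candles two years of history provides, and how the
# requirement grows if each distinct market regime needs its own two years.
candles_per_day = 24 * 60 // 30          # 48 half-hour candles per day
two_years = 2 * 365 * candles_per_day    # ~35,000 training points
print(two_years)                         # 35040

regimes = 3                              # e.g. bull, bear, sideways (illustrative)
print(regimes * two_years)               # data needed if every regime must be covered
```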

When we on the management team at Aequicens saw this, and the possibility of unexpected financial loss to our users, we knew we had to try something different. The only way to mitigate situations like this was to change the underlying strategy of the company. That realisation was the day Aequicens became a knowledge company rather than a technology company. We balance AI and expertise: we use AI in a way where we can put constraints on what it can do, and never without understanding what we hope it will fit. We use AI when it is useful, not just when it is impressive, and never without being able to say what we are doing and why. We accepted that AI is not intelligent yet; we use it to fit better models, not to replace our understanding.

This scientific approach has been lacking in the crypto-trading space for a long time. Backtesting is a necessary but not sufficient condition; you also have to understand what your assumptions are and how they are limiting you.

A prototype version of our next-gen platform is coming mid-October.

Visit us at www.aequicens.com and subscribe to our newsletter.

Contact us at info@aequicens.com

Follow us on: Twitter, Facebook, LinkedIn, Instagram

Join us on: Discord


Thomas Snell
Chief Data Scientist at Aequicens. I have previously been a physics student, biophysicist and geophysicist. General problem solver and mathematical coder.