Cryptoassets and Investments: Deep Learning Approach

AlexTavgen
8 min read · Feb 25, 2018


This is the second part of the article about investment strategies applied to the market of crypto assets.

With the breakthrough of Deep Neural Networks and Reinforcement Learning, we can now explore many entrenched problems in financial markets that were out of reach until recently.

Investors’ interest in the topic is growing rapidly, and there are some intriguing opinions about using Deep Learning on financial markets.

There are many existing Deep Learning approaches to financial market trading. However, most of them try to predict price movements or trends (Heaton et al., 2016; Niaki and Hoseinzade, 2013; Freitas et al., 2009). With the historical prices of all assets as its input, a neural network can output a predicted vector of asset prices for the next period. The performance of these price-prediction-based algorithms, however, depends heavily on the degree of prediction accuracy, and it turns out that future market prices are very difficult to predict. Claims of high predictability are either false or not scalable, and in markets with high volatility, using Deep Learning to predict market prices is not a good idea at all. Furthermore, price predictions are not market actions: converting them into actions that maximize profit requires an additional layer of logic, and that is where Reinforcement Learning can help us out.

“Nobody has a crystal ball for seeing the future.”

According to KDnuggets, “Reinforcement Learning is concerned with the problem of finding suitable actions to take in a given situation in order to maximize a reward.”

In other words, we cannot predict future market prices but we can teach an agent to take the actions which maximize reward.

Deep Reinforcement Learning has lately been drawing much attention due to its remarkable achievements in playing video games and board games. DeepMind built AI agents that could play classic Atari games and beat the best human players at Go. Even games with incomplete or imperfect information, such as Poker or Dota 2, are within reach of such algorithms.

So financial markets look like good candidates for a Reinforcement Learning approach. Some attempts have already been made for traditional financial institutions, such as Agent Inspired Trading, Reinforcement Learning for Trading, and Reinforcement Learning For Automated Trading.

JPMorgan announced that it will soon be using a first-of-its-kind robot to execute trades across its global equities business, after a European trial of the bank’s new artificial intelligence (AI) program showed it was much more efficient than traditional methods of buying and selling.

Let’s take a closer look at the idea of Reinforcement Learning and how it can help investors maximize their profits.

Reinforcement Learning

Agent

You can think of the agent as a human trader who opens the GUI of an exchange and makes trading decisions based on the current state of the exchange and his or her account.

Environment

We can see the exchange as the environment for our agent. An important thing to note is that there are many other agents, both human and algorithmic market players, trading on the same exchange.

State

We can see the State as an aggregation of all exchange events within a time range, let’s say 30 minutes.
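
As a minimal sketch of such an aggregation (using pandas; the `price` and `amount` column names are assumptions about the raw trade feed, not the exact schema we use):

```python
import pandas as pd

def aggregate_state(trades: pd.DataFrame, freq: str = "30min") -> pd.DataFrame:
    """Aggregate raw trade events (a DatetimeIndex with 'price' and 'amount'
    columns is assumed) into OHLCV bars at the given frequency."""
    bars = trades.resample(freq).agg(
        {"price": ["first", "max", "min", "last"], "amount": "sum"}
    )
    bars.columns = ["open", "high", "low", "close", "volume"]
    return bars.dropna()
```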

When we observe a new state, it is the response of the market environment, which includes the responses of the other agents. Thus, from the perspective of our agent, these other agents are also part of the environment; they are not something we can control. However, by lumping the other agents into one big complex environment, we lose the ability to model them explicitly.

A Reinforcement Learning problem can be formulated as a Markov Decision Process (MDP). We have an agent acting in an environment. At each time step t, the agent receives the current state s_t as input, takes an action a_t, and receives a reward r_{t+1} and the next state s_{t+1}. The agent chooses the action based on some policy. Our goal is to find a policy that maximizes the cumulative reward over some finite or infinite time horizon.
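
Here is a minimal sketch of that interaction loop; the Environment and Agent interfaces (reset, step, act, observe) are generic placeholders rather than the actual framework used in this work:

```python
def run_episode(env, agent, horizon=1000):
    """Generic MDP loop: at each step t the agent observes s_t, takes a_t,
    and receives r_{t+1} and s_{t+1}; the aim is a policy that maximizes
    the cumulative reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(horizon):
        action = agent.act(state)                          # policy: state -> action
        next_state, reward, done = env.step(action)        # environment response
        agent.observe(state, action, reward, next_state)   # learning update
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward
```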

It would be easy to teach our Agent to make decisions on a single asset, but this is not a good idea. “Don’t put all your eggs in one basket.”

Thus the Agent is the software program / portfolio manager performing trading Actions in the Environment, and it is impossible for the Agent to obtain complete information about the state of such a large and complex environment as the crypto asset market. Nonetheless, in the philosophy of technical traders, all relevant information is believed to be reflected in the prices of the assets, which are publicly available to the Agent. According to this point of view, an environmental state can be roughly represented by the prices of all orders throughout the market’s history up to the current moment.

Model

Historical price data, which takes the form of a tensor (a stack of matrices), is fed into a Neural Network to generate the portfolio vector as output. Each tensor is a slice of time-series market data.
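
A small sketch of how such a tensor could be assembled from the per-asset bars; the close/high/low feature choice and the (assets, window, features) layout are assumptions for illustration:

```python
import numpy as np

def price_tensor(bars_by_asset: dict, window: int = 50) -> np.ndarray:
    """Stack the last `window` close/high/low prices of each asset, normalised
    by the most recent close, into an (assets, window, features) tensor."""
    per_asset = []
    for bars in bars_by_asset.values():
        recent = bars[["close", "high", "low"]].iloc[-window:]
        per_asset.append((recent / recent["close"].iloc[-1]).to_numpy())  # (window, 3)
    return np.stack(per_asset)  # (n_assets, window, 3)
```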

Since we recalculate the market State at every time step, we can treat these steps as time-series data, and RNN/LSTM Neural Network topologies can be used.

Let’s say x are our price vectors and y the portfolio weights.
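
A minimal Keras sketch of that mapping from x to y; the layer sizes and the softmax output that turns the hidden state into portfolio weights are illustrative assumptions, not the exact architecture we trained:

```python
from tensorflow.keras import layers, models

def build_lstm_evaluator(n_assets: int, window: int, n_features: int = 3):
    """Map a window of price vectors x to portfolio weights y."""
    inputs = layers.Input(shape=(window, n_assets * n_features))  # flattened asset features per step
    h = layers.LSTM(64)(inputs)
    y = layers.Dense(n_assets, activation="softmax")(h)           # weights sum to 1
    return models.Model(inputs, y)
```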

While this could be a good solution, another possibility can be used as well.

If we represent our market state data as tensors, then we can use Convolutional Networks in the same way as we use them for image recognition.

Instead of an image, we have a matrix of market prices.
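
A hedged sketch of such a convolutional Evaluator, loosely following the per-asset convolutions of Jiang et al.; the kernel sizes and channel counts are assumptions:

```python
from tensorflow.keras import layers, models

def build_cnn_evaluator(n_assets: int, window: int, n_features: int = 3):
    """Treat the (assets, window, features) price tensor like an image and
    convolve along the time axis only, producing one score per asset."""
    inputs = layers.Input(shape=(n_assets, window, n_features))
    h = layers.Conv2D(8, kernel_size=(1, 3), activation="relu")(inputs)
    h = layers.Conv2D(16, kernel_size=(1, window - 2), activation="relu")(h)
    h = layers.Conv2D(1, kernel_size=(1, 1))(h)       # one score per asset
    scores = layers.Flatten()(h)
    weights = layers.Activation("softmax")(scores)     # portfolio weights
    return models.Model(inputs, weights)
```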

Launchpad used some derived market values (Moving Averages, Volatility Levels and Changes and so on) for their approach.

Let’s call that structure (the Neural Network model) the Evaluator: it evaluates the market state and produces an output, the portfolio weights. This mapping from states to actions (portfolio weights) is called a policy. The ultimate goal of Reinforcement Learning is to learn a policy that takes the wisest action in any given state.

JPMorgan announced the use of Q-Learning in their approach, the same technique DeepMind used for the models playing Atari games.
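
For reference, the standard Q-Learning update rule (the textbook formulation, not JPMorgan’s or DeepMind’s specific implementation) is:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Big]
```

where alpha is the learning rate and gamma the discount factor.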

Policy networks take a simpler approach: just input the state and out comes an action. Then you simply find out which actions worked well and increase their probability. This may sound exceedingly naive, but this approach helped AlphaGo crack Go!
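
A minimal sketch of that “increase the probability of what worked” idea, in the form of a REINFORCE-style policy-gradient step (illustrative only, with a discrete action space; our portfolio agent instead outputs continuous weights):

```python
import tensorflow as tf

def policy_gradient_step(policy, optimizer, states, actions, returns):
    """REINFORCE-style update: raise the log-probability of actions that were
    followed by high cumulative reward, lower it for the others."""
    with tf.GradientTape() as tape:
        logits = policy(states)                               # (batch, n_actions)
        log_probs = tf.nn.log_softmax(logits)
        chosen = tf.gather(log_probs, actions, batch_dims=1)  # log pi(a_t | s_t)
        loss = -tf.reduce_mean(chosen * returns)              # scale by return-to-go
    grads = tape.gradient(loss, policy.trainable_variables)
    optimizer.apply_gradients(zip(grads, policy.trainable_variables))
    return loss
```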

We used a Reinforcement Learning framework specially designed for the portfolio management task, proposed by Z. Jiang, D. Xu, and J. Liang in “A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem” (16 Jul 2017).

We used a Convolutional Neural Network Evaluator, which showed better results compared with simple RNN or LSTM models.

Experiments

The same dataset from the Poloniex exchange was used as in the first part of the article.

The Agent was trained on the historical prices of the 11 highest-volume non-cash assets, which were preselected for the portfolio.

We still impose the two hypotheses described in the first part of the article:

  1. Zero slippage: the liquidity of all market assets is high enough that each trade can be carried out immediately at the last price when the order is placed.
  2. Zero market impact: the capital invested by the software trading agent is so insignificant that it has no influence on the market.

One reason for selecting the top-volume cryptocurrencies (simply called coins below) is that higher volume implies better market liquidity of an asset, which in turn means the market conditions are closer to Hypothesis 1. Higher volumes also suggest that the investment has less influence on the market, establishing an environment closer to Hypothesis 2.

For the sake of simplicity we do not take transaction costs into account here, but we will add them later. The time step is set to 30 minutes, so every 30 minutes we make a new evaluation.

We can see the portfolio weights as a ranking of crypto assets on a specific exchange. Different metrics are used to measure the performance of a particular portfolio-selection strategy. The most direct measurement of how successful portfolio management is over a timespan is the accumulative portfolio value (APV). APVs here are measured in units of the initial value, or equivalently p_0 = 1, so p_t is the portfolio value at time t relative to p_0.
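
A minimal sketch of computing APV from per-period portfolio returns (variable names are illustrative):

```python
import numpy as np

def accumulative_portfolio_value(period_returns: np.ndarray) -> np.ndarray:
    """APV in units of the initial value: p_0 = 1, and every 30-minute period
    multiplies the value by (1 + return of that period)."""
    return np.cumprod(1.0 + period_returns)

# example: three periods with +2%, -1% and +3% portfolio returns
apv = accumulative_portfolio_value(np.array([0.02, -0.01, 0.03]))  # final value ~ 1.04
```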

All data is gathered with 30-minute granularity as pairs against BTC (ETH/BTC, LTC/BTC, USDT/BTC). All rebalancing decisions are made every 30 minutes, and transaction costs are set to zero, which we consider ideal conditions. But if we can show mathematically that some portfolio outperforms the market under ideal conditions, then it is possible to find rebalancing periods for which the transaction fees remain lower than the cumulative returns. Cumulative return here means the return on one BTC invested under ideal conditions.
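
Once we drop the zero-cost assumption, one common way to fold fees into the per-period return is to charge a proportional cost on the amount traded when rebalancing. This approximation and the fee value below are assumptions for illustration, not the exact cost model we add later:

```python
import numpy as np

def period_return(weights_new, weights_old, asset_returns, fee=0.0025):
    """Gross return of the rebalanced portfolio minus a proportional fee on
    the total weight that had to be traded to rebalance (fee is illustrative)."""
    gross = float(np.dot(weights_new, 1.0 + np.asarray(asset_returns)))
    turnover = float(np.sum(np.abs(np.asarray(weights_new) - np.asarray(weights_old))))
    return gross * (1.0 - fee * turnover) - 1.0
```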

Results

The backtest runs from 2017-10-19 20:00:00 to 2018-01-01 00:00:00.

CRP (Constant Rebalanced Portfolio): uses fixed weights all the time; uniform weights are commonly used as a benchmark. A minimal sketch of this baseline follows the list of strategies below.

OLMAR (On-Line Portfolio Selection with Moving Average Reversion): in short, this approach exploits Moving Average Reversion (MAR) by applying powerful online learning techniques. 2012 academic publication: https://icml.cc/2012/papers/168.pdf.

NNAGENT: our Reinforcement Learning agent.
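
As referenced above, here is a minimal sketch of the uniform CRP benchmark (ignoring transaction costs):

```python
import numpy as np

def uniform_crp_value(price_relatives: np.ndarray) -> np.ndarray:
    """price_relatives: (T, n_assets) array of p_t / p_{t-1}. The uniform CRP
    holds weight 1/n in every asset and rebalances back to it each period."""
    n = price_relatives.shape[1]
    period_growth = price_relatives @ (np.ones(n) / n)  # portfolio price relative per period
    return np.cumprod(period_growth)                     # accumulative portfolio value
```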

We can see that NNAGENT significantly outperforms the other strategies. But let’s take a closer look.

We can see that NNAGENT has an overall better drawdown profile. Similar results were achieved in other RL experiments, which suggests that an RL agent handles drawdowns better.

The same holds for the Sharpe ratio, which is higher than for the other strategies.

Even though CRP, used as the benchmark, has more positive and fewer negative periods than NNAGENT, the actions NNAGENT takes have a more effective impact on overall performance.

We are continuing the experiments with different Neural Network hyperparameters and on different exchanges.

Summary

Reinforcement Learning can be applied to algorithmic trading, producing strategies that are unique and outperform common baseline techniques.

In the near future we want to introduce a new service based on our research that will help investors make the right decisions on different crypto exchanges.

Tough for humans, easy for bots.

Many thanks to Z. Jiang, D. Xu, and J. Liang.

Fabio D Freitas, Alberto F De Souza, and Ailson R de Almeida. Prediction-based portfolio optimization model using neural networks. Neurocomputing, 72(10):2155–2170, 2009.

J. B. Heaton, N. G. Polson, and Jan Hendrik Witte. Deep learning for finance: deep portfolios. Applied Stochastic Models in Business and Industry, 2016. ISSN 1526-4025. doi: 10.1002/asmb.2209. URL http://www.ssrn.com/abstract=2838013.

Seyed Taghi Akhavan Niaki and Saeid Hoseinzade. Forecasting S&P 500 index using artificial neural networks and design of experiments. Journal of Industrial Engineering International, 9(1):1, 2013. ISSN 2251-712X. doi: 10.1186/2251-712X-9-1. URL http://www.jiei-tsb.com/content/9/1/1.
