Airbag: Crypto-Trading with Deep Learning

Airbag AI
14 min read · Jun 26, 2019


Airbag.ai is a crypto trading bot focused on simplicity and risk mitigation, built on the Binance API. We want to help people be more aware of their risk exposure and reduce their volatility in the crypto market. To do that, we have been using Deep Learning to make trading decisions. There are many good theoretical articles and books about using Deep Learning for trading. In this post, we explain what we have built in the real world.

When you trade, you need a competitive edge. One thing we like about Deep Learning is that it can make “un-human” decisions, doing something very different from what you would expect from a human or from a deterministic bot based on traditional indicators. This can lead to unexpected plays, like the famous move 37 that DeepMind’s AlphaGo made in its second game against Lee Sedol. Airbag’s trading might go against human logic and, hopefully, end up being correct. That’s Airbag’s edge.

The bot is rewarded for optimizing the Sharpe ratio (the return/risk ratio) in the long run, so expect its performance to include some intraday ups and downs. This is not a get-rich-quick scheme. It’s actually the contrary.

Also, note that during bull runs (like the one we seem to be in as of today), HODLing is often the best strategy. Of course, you never know when a bull market will end, but we want to remind you of this fact before you consider letting Airbag help you. You might want to think of Airbag as a hedge, and simply one part of a diversified portfolio.

Airbag has been running quietly for about 2 months. Today we have deployed our most significant improvements.

1. About Airbag’s AI technology:
  • The bot uses Deep Reinforcement Learning to make trading decisions, using a deep neural network. The bot does not know what “trading” means. We just give it pats on the back when it makes money, and so it learns to earn more pats. The bot does not know what a stop loss is, what a price is, what volume means, or what an order book is. It knows nothing about Bollinger bands or Ichimoku or Fibonacci. It just learns that more pats is better.
  • Some approaches use Deep Learning to predict prices, as in: “hey, if I know the latest 100 prices, what’s the next one?”. In our experience, these approaches often converge on estimating that the next price simply equals the last known one, which provides little information. In general, you don’t really want a neural network that knows the next price; you want one that knows whether to buy or sell.
  • But it is tough to teach a neural network when it’s good to buy and when it’s good to sell: What level of detail should be used? Do we want to perform many frequent trades? Maybe we could train the network to trade every 10 minutes. What if the neural network can’t predict what will happen in exactly 10 minutes, but it can, say, tell whether the price will go up at some point during the next hour? This uncertainty makes it hard to simply use what is technically known as supervised learning (i.e., directly teaching the network when to buy and when to sell according to some inputs). Instead, this is a problem for reinforcement learning, which trains an automated agent to learn what action to take in an environment that emits a reward for every action taken (i.e., the pats on the back).
  • We train Airbag’s bot over millions of epochs (i.e., “training lessons”), which in practice are time steps where the bot can either buy or sell crypto. During the first 10,000 or so, it trades randomly, and it starts to learn as it receives rewards and accumulates experience. At that point, we reduce the randomness in its actions and balance exploration (some random actions) against exploitation (letting the AI make the decisions it believes will earn a reward); a sketch of such a decay schedule follows the figure below. Over time there is less randomness and more exploitation. Right now it takes us about 24 hours to run a single training over 6 markets with 2 years of data each on a GPU-powered server (this is not your average TradingView-grade simulation resolution).
  • Some of the challenges we have encountered had to do with the bot memorizing the training set. Sometimes, with a sufficiently complex neural network, the AI starts to “remember” patterns. For most machine learning problems this is not bad news in itself: you want a model that identifies winning patterns, and it indicates that the input data has some predictive power. However, if those patterns do not hold outside of the training data set, we are overfitting, which is obviously not desirable. E.g., imagine the neural network learns to buy right before a price bump that wasn’t actually predictable, simply because it memorized the inputs present at that time.
  • To validate our framework, we synthesized a scenario that looks like a sinusoid of ups and downs with some noise; a sketch of such a toy series follows the figure below. A learning machine should be able to learn to win tons of pats on it very quickly. Once the bot could beat this tutorial game, we would take it to more complex scenarios. Many problems and errors in building a solid framework were caught by testing against this game, which saved us a lot of time later. For example, adding fees was tricky, and testing it here was much easier.
Simple game for an AI who wants to learn to be a trader (yes, that’s xkcd plotting style)
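For the curious, here is a minimal sketch of the kind of noisy-sinusoid price series we mean by “tutorial game”. The function name and parameter values are illustrative, not Airbag internals:

```python
import numpy as np

def toy_price_series(n_steps=5000, period=200, amplitude=0.05,
                     noise=0.01, base_price=100.0, seed=42):
    """Synthetic tutorial game: a noisy sinusoid of ups and downs.

    An agent that cannot profit here will not profit anywhere, so a
    series like this can validate the framework (rewards, fees,
    bookkeeping) before moving to real market data.
    """
    # amplitude/noise/period are illustrative choices for the example
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)
    return base_price * (1.0
                         + amplitude * np.sin(2.0 * np.pi * t / period)
                         + noise * rng.standard_normal(n_steps))
```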
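And here is a minimal sketch of the exploration/exploitation decay described above: fully random during a warm-up phase, then progressively more exploitation. Again, names and numbers are illustrative assumptions, not our production schedule:

```python
import random

def epsilon(step, eps_start=1.0, eps_end=0.02,
            warmup=10_000, decay_steps=500_000):
    """Exploration rate: fully random during warmup, then decaying."""
    # warmup/decay_steps are illustrative, not Airbag's actual values
    if step < warmup:
        return eps_start
    frac = min(1.0, (step - warmup) / decay_steps)
    return eps_start + frac * (eps_end - eps_start)

def choose_action(q_values, step):
    """Epsilon-greedy: explore with probability epsilon(step), else exploit."""
    if random.random() < epsilon(step):
        return random.randrange(len(q_values))                    # random trade
    return max(range(len(q_values)), key=lambda a: q_values[a])   # best Q-value
```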
  • Right now we use a deep neural network of 10 layers. More layers let a bot learn more complex patterns, but too many might make the network memorize all the winning patterns in the training set and fail to generalize. To avoid this, layers are kept small so there are fewer weights. Tricks like batch normalization between layers help train deep networks while avoiding vanishing gradients. Additionally, weight regularization, adding input gaussian noise, and using dropout layers help avoid overfitting (i.e., memorization). These are standard practices in Deep Learning; a sketch of this kind of topology follows this list.
  • We have worked with different Deep Reinforcement Learning techniques: Deep Q-Learning, Asynchronous Advantage Actor-Critic, Asynchronous Advantage Actor-Critic with Gaussian outputs, and Deep Deterministic Policy Gradients, each with different variants. The first two produce discrete actions such as “Buy” or “Sell”, while the latter two enable a continuous output, e.g., the optimal percentage of crypto to hold for a specific market. While in principle the continuous-output approaches seem more powerful and accurate for this problem, we got the best results with Deep Q-Learning (or simply, DQN). We will keep revisiting the different approaches over time as we aim to improve the bot’s performance.
  • In the case of DQN, the neural network gives us an estimate of the future reward for every possible action (either buying or selling). The bot’s policy could simply be to apply the action (buy or sell) with the highest estimated future reward. However, we’ve observed that it’s better practice to balance between the two actions when their estimates are close enough (see the blending sketch after this list). This means the bot will buy or sell smaller amounts of crypto in preparation for trend changes, or might keep half of the balance in the market’s asset and half in the market’s currency. We’ve seen that this approach reduces risk and improves long-term profit.
  • This means the bot now learns to make trades by taking into account what percentage of the asset it would like to hold. In the past, it would go all-in on a decision (buy or sell), and we forced it to act slowly (buy 50%, then, if still confident, another 20%, then 10%, etc.). Now it does this by itself. This is the starting point for the bot to build a diversified portfolio on its own.
  • The bot has learned to place limit buy orders. In the past, we had to force it to bid lower than it was prepared to (to avoid market buying). Now it does it by itself, which is cool to watch.
  • There are three main types of neural network layers: dense layers, convolutional layers, and recurrent layers. In dense layers, every input is connected to every output neuron. Convolutional layers also take all inputs into account, but weights are shared between the output neurons. Finally, recurrent layers keep a state, which is typically useful for time-based input data (such as crypto prices). Our neural network mainly combines convolutional and dense layers (among other types). We got the biggest performance improvements from the convolutional layers, which typically help generalize what is learned. When using dense layers exclusively, the network learned patterns that did not generalize to the test period. We didn’t get good results with recurrent networks, partly because their slow training forced us to use very simple topologies. We might explore them further in the future.
  • One time-consuming issue is that, many times, neural networks don’t learn at all. This was often caused by mistakes, such as not normalizing inputs, but certain approaches also failed to learn: too few inputs, too much input gaussian noise, or too high a drop probability in dropout layers (a way to make the network drunk so that it forgets data and relies more on intuition) sometimes prevented the neural network from learning.
  • Dealing with fees is hard. The moment the bot trades, it’s already losing money, and the AI needs to learn that buying can still be profitable once future rewards are taken into account. This comes naturally out of the reinforcement learning approach, where an automated agent optimizes for future reward, but networks are usually easier and faster to train when rewards are more immediate. This is one reason hourly candles are easier than minute candles: the bot can quickly identify trading moments where the immediate reward is positive even after fees are applied (a sketch of such a fee-aware reward follows this list). Hourly candles are also a compressed representation of the environment that reduces computational cost, speeding up learning.
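To make the topology discussion more concrete, here is a minimal Keras-style sketch of a small conv + dense Q-network with the tricks mentioned above (input gaussian noise, batch normalization, dropout, weight regularization). Layer counts, sizes, and parameters are illustrative assumptions, not our production network:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_q_network(n_timesteps=128, n_features=8, n_actions=2):
    """Conv + dense Q-network with standard anti-overfitting tricks."""
    l2 = regularizers.l2(1e-4)                    # weight regularization
    inputs = tf.keras.Input(shape=(n_timesteps, n_features))
    x = layers.GaussianNoise(0.01)(inputs)        # input gaussian noise
    for filters in (32, 32, 64):                  # small layers, few weights
        x = layers.Conv1D(filters, kernel_size=5, padding="causal",
                          kernel_regularizer=l2)(x)
        x = layers.BatchNormalization()(x)        # eases deep training
        x = layers.Activation("relu")(x)
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(64, activation="relu", kernel_regularizer=l2)(x)
    x = layers.Dropout(0.2)(x)                    # memorization guard
    outputs = layers.Dense(n_actions)(x)          # one Q-value per action
    return tf.keras.Model(inputs, outputs)
```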
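Here is one minimal way to express the Q-value balancing we described: a softmax over the two action values, so the bot goes near all-in only when one action clearly dominates, and stays near 50/50 when the estimates are close. The blending rule and temperature value are our illustrative choices for this sketch, not necessarily what runs in production:

```python
import numpy as np

def target_asset_fraction(q_buy, q_sell, temperature=0.1):
    """Turn the two Q-values into a fraction of the balance to hold.

    If one action clearly dominates, go (almost) all-in on it; if the
    two estimates are close, stay near 50/50 ahead of trend changes.
    """
    z = np.array([q_buy, q_sell]) / temperature   # temperature controls how
    z -= z.max()                                  # fast "close enough" snaps
    p = np.exp(z) / np.exp(z).sum()               # to an all-in decision
    return float(p[0])                            # fraction held in the asset
```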
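And a minimal sketch of a fee-aware per-candle reward, assuming the 0.1% Binance fee mentioned later in this post. The exact reward shaping we use differs, but the idea is the same: the moment the bot trades, it is already down the fee, and only future price movement can make up for it:

```python
def step_reward(position, price_prev, price_now, traded_fraction, fee=0.001):
    """Per-candle reward: position PnL minus fees on the traded amount.

    `position` is the fraction of the balance held in the asset during
    the candle; `traded_fraction` is how much of the balance changed
    hands at the start of it. The agent must learn that the immediate
    fee hit can still pay off through future rewards.
    """
    pnl = position * (price_now - price_prev) / price_prev
    return pnl - fee * traded_fraction
```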

2. Performance:

Below you can see how the neural network evolves and learns. The first iterations are very random, and it starts to consistently make money on most trades once it reaches about 800,000 iterations. You can observe that in the beginning it trades randomly and is therefore stuck at -0.2% (paying 0.1% fees for buying and 0.1% for selling). Also, note that the neural network still makes bad trades even after it has learned to make money, as exploration is required to keep learning. The orange line shows the moving average.

Trading skills over millions of epochs

Something interesting to note is that, although variance reduces over time, which is valuable, the machine does not seem able to improve returns beyond a certain glass ceiling. This is related to the actual capacity of the neural network to estimate the value of actions, even after many training epochs.

In the charts below you can see the bot’s performance on unseen test data, over different periods of USDT markets in recent years. The bot is trained on completely independent training data. We compare the bot’s performance against simply buying and holding the primary asset.

You can see how Airbag outperforms HODL in aggregated return, reduced volatility, and drawdown in most testing scenarios and for most trading pairs. When it loses against HODL (during strong bull seasons), it loses by a small amount, which is worth it given the reduced volatility (avoiding roller coasters) and the increased Sharpe (return/risk) ratio. When Bitcoin goes down, the bot maintains its value much better. From a drawdown perspective, the bot’s max drawdown is significantly lower than holding, across all trading pairs and testing scenarios. Drawdown is useful for assessing how far your holdings can fall after investing at any given time, so a lower drawdown reduces the risk of mistiming your entry into the market.
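For reference, these are the two metrics we keep quoting, sketched minimally (the annualization factor assumes hourly candles, which is our assumption for this example):

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=24 * 365):
    """Annualized Sharpe on per-candle returns (hourly candles assumed)."""
    r = np.asarray(returns, dtype=float)
    if r.std() == 0:
        return 0.0
    return np.sqrt(periods_per_year) * r.mean() / r.std()

def max_drawdown(equity):
    """Largest peak-to-trough drop of an equity curve, as a fraction."""
    e = np.asarray(equity, dtype=float)
    peaks = np.maximum.accumulate(e)     # running high-water mark
    return float(((peaks - e) / peaks).max())
```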

(Winning performance in bold. Note that negative Sharpe ratios don’t really make sense, but we kept the numbers for visibility.) The bot is good at optimizing for low risk and generalizes well across many different situations and trading pairs. It also beats HODL on pure return when aggregated over the different testing periods, and it outperforms significantly on risk and drawdown.

Regarding performance weaknesses, it’s clear that you cannot always win. In the simulations above, this network configuration beats HODL in most scenarios and in aggregate, but some scenarios, like tests #1 and #7, end up worse off. While it is unreasonable to expect our AI to win every time, we do expect some Airbag users to complain on those days about how stupid the bot is. We have tried several things to mitigate losses in specific scenarios (e.g., very sharp moves up or down), including forcing more weight onto the latest inputs, but we find those tweaks overfit to specific situations and eventually underperform in aggregate, and we believe we should reward the patient investor who looks at the long term.

3. Improvements and areas of work:

  • Different neural networks produce different results. At the moment we have taken the “best” neural network across all tests and trading pairs (performance + generalization), but the “second best” network could be just as good as the first, or even better, had it been luckier with some non-technical price movement. There is still an element of luck in all of this. However, most networks perform similarly, as the outputs are used to build a percentage of the asset to buy/sell, instead of going all-in on a decision.
  • We have tried training a single neural network that works for all markets. This gives us more data and more market situations for generalization. But we have also tested individual networks for each trading pair, and they seem to outperform the single network. It has pros and cons: in principle, we prefer a more general neural network because it is likely to cope better with changing conditions and changing risks, but the market-specific networks seem to have better trading performance, so we will try to find the balance and get the best of both worlds.
  • We are feeding the bot raw data: thousands of data points (prices, volume, etc.) as inputs every second. This is no different from how Deep Learning in face recognition takes raw pixels as input (instead of more traditional “features”, e.g., the distance between the eyes). It allows the AI to create its own indicators. However, there is an argument that there might be value in feeding it indicators already proven to be intelligent (e.g., EMAs, MACDs, BBs, etc.), so the network can focus more on the trading-generalization part and less on “understanding what is coming in”. It might also reduce the number of inputs needed. In our first tests this has not worked as well as the raw data, but we will try more approaches, as we believe there must be value in providing treated data to the AI, or at least it should reduce the computational power needed by giving the AI fewer things to figure out on its own.
  • We also want to feed more non-market data (e.g. blockchain transactions, live addresses, etc.) into the AI as inputs. The architecture we have built should be quite good at accepting those inputs.
  • We have not been able to make it work effectively on 1-minute candles. There are many reasons for this. With 1-minute candles it is more difficult for the bot to overcome the 0.1% trading-fee loss within the first minute it trades, so it won’t get short-term rewards. This is not the case with 1-hour candles, where the market often moves by more than 0.1% within a single candle. Also, 1-hour candles let us provide a longer history of data without overloading the network with too many inputs. We’ll research how to get the best of both worlds and enable the bot to trade at 1-minute resolution while using a longer history of data as input. We are also testing different ways of creating candles beyond the traditional approach. For example, we already create double candles (one for sell orders, one for buy orders; see the sketch after this list). We are also testing doing away with candles entirely and using something more probabilistic (price density), which we believe contains more information in fewer inputs.
  • We have been working on low-resolution simulation infrastructure to reject badly performing bots more quickly, and run only the good ones at high resolution. In general, it’s all about having an environment that lets us learn faster and draw conclusions more easily.
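As an illustration of the double candles mentioned above, here is a minimal sketch that splits trades by side and aggregates each side into its own OHLCV candles. The trade-tuple layout and function name are assumptions made for the example, not our actual pipeline:

```python
def double_candles(trades, bucket_seconds=3600):
    """Build separate OHLCV candles for buy-side and sell-side trades.

    `trades` is an iterable of (timestamp, price, volume, side) tuples,
    with side being "buy" or "sell". Returns {side: {bucket: candle}}.
    """
    candles = {"buy": {}, "sell": {}}
    for ts, price, volume, side in trades:
        bucket = int(ts // bucket_seconds)      # which hourly bucket
        c = candles[side].get(bucket)
        if c is None:                           # first trade in this bucket
            candles[side][bucket] = {"open": price, "high": price,
                                     "low": price, "close": price,
                                     "volume": volume}
        else:                                   # update running OHLCV
            c["high"] = max(c["high"], price)
            c["low"] = min(c["low"], price)
            c["close"] = price
            c["volume"] += volume
    return candles
```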

4. Upcoming features:

  • This new bot will run on several markets, but for the next few days only BTC/USDT will be live. The back-end is ready to operate more markets, but we’ll push features one at a time.
  • We mentioned in a previous post that we were trying to move to 1-minute candles, but as explained above, this is not ready yet. We aim to evaluate 1-hour candles every minute, but there are some complexities we still need to solve.
  • AI development is never finished. In fact, we made two significant improvements between starting this blog post a week ago and publishing it. We keep updating the algos every day (right now a new simulation is running that includes the recent significant value increase across all of crypto). In the future we will publish shorter blog posts about specific improvements to the algorithms, and we might also share failed tests that did not work out.

That's it for now. We keep running simulations every day, and we will continue to push improvements as they happen, focusing on simplicity for all types of users and on helping people manage their risk exposure more effectively.

If you are an early user, thank you for your patience in waiting for this update. If you just found out about us, you can still benefit from the last few days of Free Early Access on http://airbag.ai.

Neural Networks also love traditional whiteboards

You can also reach out to us at hello@airbag.ai.
