How to predict Bitcoin and Ethereum price with RNN-LSTM in Keras

2017 was a great year for Artificial Intelligence and cryptocurrency. AI research produced many breakthroughs, and AI is one of the trendiest technologies today and will likely be even more so in the future. What I personally did not see coming in 2017 was cryptocurrencies going mainstream. It was a massive bull run with some insane returns on investment in cryptocurrencies such as Bitcoin, Ethereum, Litecoin, Ripple and so on.

I started diving into the details of Machine Learning techniques in early 2017 and, as for many other ML experts and enthusiasts, applying these techniques to the cryptocurrency market is very tempting. The interesting part is the variety of ways in which ML and Deep Learning models can be used in the stock market or, in our case, the crypto market.

I found that building a single-point prediction model is a great starting point for exploring deep learning with time-series such as price data. Of course it doesn’t end here; there is always room for improvement and for adding more input data. My favorite is to use Deep Reinforcement Learning for automated trading agents, which I’m currently working on. However, learning to use LSTM networks and building a good prediction model is the first step.


Pre-requisites and Development Environment

I assume you have some coding skills in Python and basic knowledge of Machine Learning, in particular Deep Learning. If not, please check out this post for a quick overview.

My choice for the development environment is Google’s Colab. I chose Colab because of the simplicity of the environment setup and the access to a free GPU, which makes a big difference in training time. Here is a tutorial on how to set up and use Colab in your Google Drive. You can find my full Colab Notebook here and here on GitHub.

In case you wish to set up an AWS environment, I also wrote a tutorial a while ago on how to set up an AWS instance with Docker on GPU. Here is the link.

I will be using the Keras library with a TensorFlow backend to build the model and train it on historical data.

What is a Recurrent Neural Network?

To explain Recurrent Neural Networks, let’s first go back to a simple perceptron network with one hidden layer. Such a network will do an OK job on simple classification problems. By adding more hidden layers, the network becomes capable of inferring more complex patterns in our input data, which increases the accuracy of its predictions. However, these types of networks are suited to tasks that are independent of history, where the temporal order does not matter. An example is image classification, in which the prior sample in the training set does not affect the next sample. In other words, perceptrons have no memory of the past. The same is true of Convolutional Neural Networks, which are a more complicated architecture of perceptrons designed for image recognition.

A simple perceptron neural network with one hidden layer and two outputs

RNNs are a type of neural network that solves the memory problem of perceptrons by looping the hidden state of the previous time-step back into the network together with the current input sample.

Let me elaborate on this. At every time-step, when a new sample comes in, the network forgets what the sample in the previous step was. One way to solve this problem for time-series is to feed in the previous input sample together with the current sample, so our network has an idea of what happened previously. However, this way we won’t be able to capture the full history of the time-series before the previous step. A better approach is to take the resulting hidden state (the activations of the hidden layer, not its weights) from the previous input sample and feed it into our network alongside the current input sample.

I look at the hidden state as the state of mind of the network. Seen this way, the hidden layer has already captured the past in the form of activations across all of its neurons, which is a much richer representation of the past for our network. The image below, from colah’s blog, provides a good visualization of what happens in an RNN.

When Xt comes in, the hidden state from Xt-1 is concatenated with Xt and becomes the input for the network at time t. This process is repeated for every sample in a time-series.
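To make the looping concrete, here is a minimal NumPy sketch of a plain RNN forward pass. It is only an illustration under my own assumptions: the function name rnn_forward, the random weights and the toy input are hypothetical and are not part of the model built later in this post.

```python
import numpy as np

def rnn_forward(inputs, hidden_size, seed=0):
    """Run a plain RNN over a sequence and return the hidden state at every step."""
    rng = np.random.default_rng(seed)
    n_features = inputs.shape[1]
    # One weight matrix applied to the concatenation [h_{t-1}, x_t], plus a bias.
    W = rng.normal(scale=0.1, size=(hidden_size, hidden_size + n_features))
    b = np.zeros(hidden_size)

    h = np.zeros(hidden_size)           # initial hidden state: no memory yet
    states = []
    for x_t in inputs:                  # walk through the time-steps in order
        combined = np.concatenate([h, x_t])
        h = np.tanh(W @ combined + b)   # new state depends on the past and the present
        states.append(h)
    return np.stack(states)

# Toy sequence: 7 time-steps with 2 features each (made-up values).
sequence = np.arange(14, dtype=float).reshape(7, 2) / 14.0
states = rnn_forward(sequence, hidden_size=4)
print(states.shape)
```

Note that the weights W stay fixed while the sequence is processed; only the hidden state h changes from step to step, and that is what carries the memory.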

I tried to keep it as simple as possible. There are great resources if you’d like to dive deeper into RNNs, which I strongly recommend. Here are some good resources about RNNs:

What is Long Short-Term Memory?

Before I tell you what an LSTM is, let me tell you about the biggest problem with RNNs. Everything looks good about RNNs until we train them via back-propagation. As the gradient is propagated backward through the network, it gets weaker and weaker; by the time it reaches the time-steps that represent older data points in our time-series, it has no juice left to adjust the weights properly. This problem is called the Vanishing Gradient. An LSTM cell is a type of RNN unit that stores important information about the past and forgets the unimportant pieces. In this way, when the gradient back-propagates, it won’t be consumed by unnecessary information.
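A tiny numerical experiment shows how quickly the gradient can fade. The sketch below uses a scalar RNN, h_t = tanh(w * h_{t-1} + x_t), and multiplies the per-step derivatives together as the chain rule requires. The function name, the weight w = 0.5 and the random inputs are assumptions chosen for illustration only.

```python
import numpy as np

def gradient_through_time(n_steps, w=0.5, seed=0):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    rng = np.random.default_rng(seed)
    h, grad = 0.0, 1.0
    for x_t in rng.normal(size=n_steps):
        h = np.tanh(w * h + x_t)
        grad *= w * (1.0 - h ** 2)   # chain rule: d h_t / d h_{t-1}
    return abs(grad)

for n in (1, 7, 50):
    print(n, gradient_through_time(n))
```

Each factor is at most w in magnitude, so with w = 0.5 the gradient after n steps can be no larger than 0.5 to the power n. After a 50-step sequence there is practically nothing left to learn from.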

Think about yourself when you read a book. Often, after finishing a chapter, you can remember what the previous chapter was about, but you may not be able to remember all of its important points. One way to solve this problem is to highlight and take notes of the points that are important to remember and disregard the explanations and fillers that are not critical to the subject. Understanding LSTM Networks by Christopher Olah is a great resource for an in-depth understanding of LSTMs.
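In LSTM terms, the highlighting and discarding is done by gates. Below is a minimal NumPy sketch of one step of a standard LSTM cell, with a forget gate, an input gate, a candidate value and an output gate. The helper names, the random weights and the toy sequence are hypothetical; Keras implements all of this for us inside its LSTM layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time-step. W maps [h_prev, x_t] to four stacked gate pre-activations."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * n:1 * n])   # forget gate: how much of the old cell state to keep
    i = sigmoid(z[1 * n:2 * n])   # input gate: how much of the new candidate to write
    g = np.tanh(z[2 * n:3 * n])   # candidate values (the "notes" worth taking)
    o = sigmoid(z[3 * n:4 * n])   # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g      # updated long-term memory
    h_t = o * np.tanh(c_t)        # updated hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, features = 3, 2
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + features))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(7, features)):   # toy sequence of 7 time-steps
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

The cell state c is updated additively (f * c_prev + i * g), which is what lets the gradient travel back over many steps without being squashed at each one.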

Let’s Code

First things first, let’s import the libraries we need for our project.

import gc
import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import keras
from keras.models import Sequential
from keras.layers import Activation, Dense
from keras.layers import LSTM
from keras.layers import Dropout

Historical Data

I have used historical data from www.coinmarketcap.com. You may use any other source, but I found it simple and straightforward for this post. We will be getting the daily price data for Bitcoin; however, in the Colab notebook you will see the code for Ethereum as well. I wrote the code so that it is reusable for other cryptocurrencies.

Now let’s write a function for getting the market data.
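The notebook function is not reproduced in this text, so here is a hedged sketch of just the cleaning and tagging part. It does not download anything: it assumes you already have a raw daily table with 'Date', 'Close' and 'Volume' columns. The name tag_market_data and the sample values are hypothetical.

```python
import pandas as pd

def tag_market_data(raw, tag):
    """Clean a raw daily price table and prefix its value columns with a coin tag."""
    data = raw.copy()
    data["Date"] = pd.to_datetime(data["Date"])
    # Non-numeric volume entries (for example "-") become 0 so the column stays usable.
    data["Volume"] = pd.to_numeric(data["Volume"], errors="coerce").fillna(0)
    # Prefix every column except Date, e.g. Close -> BTC_Close.
    return data.rename(columns={c: f"{tag}_{c}" for c in data.columns if c != "Date"})

# Made-up rows for illustration only, not real market data.
raw = pd.DataFrame({
    "Date": ["2018-01-02", "2018-01-01"],
    "Close": [110.0, 100.0],
    "Volume": ["1000", "-"],
})
btc = tag_market_data(raw, tag="BTC")
print(btc)
```

Tagging the columns per coin is what makes it possible to merge several coins into one table later without name clashes.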

Now let’s get the data for Bitcoin, load it into the variable btc_data and show the first five rows of our data.

btc_data = get_market_data("bitcoin", tag='BTC')
btc_data.head()
market data for BTC

Let’s take a look at Bitcoin’s ‘Close’ price and its daily volume over time

show_plot(btc_data, tag='BTC')

Data Preparation

A big part of building any Deep Learning model is preparing our data to be consumed by the neural network for training or prediction. This step is called pre-processing, and it can include multiple steps depending on the type of data we are using. In our case we will be doing the tasks below as part of our pre-processing:

  • Data cleaning, filling up missing data points
  • Merging multiple data channels: Bitcoin and Ethereum in one DataFrame
  • Remove unnecessary columns
  • Sort our data in ascending order based on date
  • Split the data for Training and Test
  • Create input samples and normalize them between 0 and 1
  • Create target outputs for the training and test sets and normalize them between 0 and 1
  • Convert our data to NumPy arrays to be consumed by our model

The data cleaning part has already been done in our first function, where we loaded the data. Below you can find the necessary functions for the above tasks:
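Since the notebook functions are not shown in this text, here is a rough sketch of how the merge, split and windowing steps could look. These are stand-ins written for illustration, not the notebook code: the names merge_data, split_data and make_windows are hypothetical, the min-max scaling per window is one possible way to get values between 0 and 1, and the demo prices are made up.

```python
import numpy as np
import pandas as pd

def merge_data(a, b, from_date):
    """Join two coin tables on Date and keep rows on or after from_date."""
    merged = pd.merge(a, b, on="Date")
    merged = merged[merged["Date"] >= pd.Timestamp(from_date)]
    return merged.sort_values("Date").reset_index(drop=True)   # ascending by date

def split_data(data, training_size=0.8):
    """Chronological split: the first part trains, the remainder tests."""
    cut = int(len(data) * training_size)
    return data.iloc[:cut], data.iloc[cut:]

def make_windows(data, window_len=7):
    """Slice overlapping windows and min-max scale each column within its window."""
    windows = []
    for start in range(len(data) - window_len):
        window = data.iloc[start:start + window_len]
        lo, hi = window.min(), window.max()
        span = (hi - lo).replace(0, 1)   # flat columns would otherwise divide by zero
        windows.append(((window - lo) / span).to_numpy())
    return np.array(windows)             # shape: (samples, window_len, features)

# Synthetic demo data, 20 days of linearly rising prices.
dates = pd.date_range("2016-01-01", periods=20, freq="D")
btc = pd.DataFrame({"Date": dates, "BTC_Close": np.linspace(100, 119, 20)})
eth = pd.DataFrame({"Date": dates, "ETH_Close": np.linspace(10, 29, 20)})

merged = merge_data(btc, eth, from_date="2016-01-06")
train, test = split_data(merged, training_size=0.8)
X_train_demo = make_windows(train.drop("Date", axis=1), window_len=7)
print(X_train_demo.shape)
```

The split is done before windowing and without shuffling, so that no test-period prices leak into the training windows.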

Here is the code for the plotting functions and for creating date labels:

Here we call the above functions in order to create the final datasets for our model.

train_set = train_set.drop('Date', axis=1)
test_set = test_set.drop('Date', axis=1)
X_train = create_inputs(train_set)
Y_train_btc = create_outputs(train_set, coin='BTC')
X_test = create_inputs(test_set)
Y_test_btc = create_outputs(test_set, coin='BTC')
Y_train_eth = create_outputs(train_set, coin='ETH')
Y_test_eth = create_outputs(test_set, coin='ETH')
X_train, X_test = to_array(X_train), to_array(X_test)

Now let’s build our LSTM-RNN model. In this model I have used 3 LSTM layers with 512 neurons per layer, each followed by a Dropout of 0.25 to prevent over-fitting, and finally a Dense layer to produce our outputs.

TensorFlow computation graph exported from TensorBoard

I have used ‘tanh’ as my activation function, Mean Squared Error as my loss and ‘adam’ as my optimizer. I’d suggest playing around with different choices for these and seeing how they affect the performance of your model.
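As a hedged sketch of what such a build_model function could look like in Keras: three stacked LSTM layers, Dropout after each, and a Dense output. This is my reconstruction from the description above, not the notebook code. In particular, applying ‘tanh’ inside the LSTM layers and leaving the Dense output linear are assumptions, and the demo input is random.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Input, LSTM, Dropout, Dense

def build_model(inputs, output_size, neurons, activation_function="tanh",
                dropout=0.25, loss="mse", optimizer="adam"):
    """Three stacked LSTM layers, each followed by Dropout, then a Dense output."""
    model = Sequential()
    model.add(Input(shape=(inputs.shape[1], inputs.shape[2])))  # (window_len, features)
    # return_sequences=True hands the full sequence to the next LSTM layer.
    model.add(LSTM(neurons, return_sequences=True, activation=activation_function))
    model.add(Dropout(dropout))
    model.add(LSTM(neurons, return_sequences=True, activation=activation_function))
    model.add(Dropout(dropout))
    # The last LSTM layer returns only its final hidden state.
    model.add(LSTM(neurons, activation=activation_function))
    model.add(Dropout(dropout))
    model.add(Dense(units=output_size))
    model.compile(loss=loss, optimizer=optimizer)
    return model

# Small random input just to check the shapes: 5 samples, 7 days, 4 features.
demo_inputs = np.random.rand(5, 7, 4).astype("float32")
demo_model = build_model(demo_inputs, output_size=1, neurons=8)
print(demo_model.output_shape)
```

The demo uses 8 neurons so it builds quickly; swap in the real hyperparameters and training arrays for actual use.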

Here is our model summary:
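The summary output itself is not reproduced in this text. As a rough guide to the size of such a model, the parameter count of a standard LSTM layer can be worked out by hand: four gates, each with input weights, recurrent weights and a bias. The sketch below does that arithmetic for 3 LSTM layers of 512 neurons and a single-output Dense layer. The number of input features (4) is my assumption and is not stated in the post, so the real summary may differ.

```python
def lstm_params(input_dim, units):
    """Trainable parameters of a standard LSTM layer:
    4 gates x (input weights + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Trainable parameters of a Dense layer: weights plus biases."""
    return units * input_dim + units

neurons, n_features = 512, 4   # n_features = 4 is an assumption, not from the post
layers = [
    ("lstm_1", lstm_params(n_features, neurons)),
    ("lstm_2", lstm_params(neurons, neurons)),
    ("lstm_3", lstm_params(neurons, neurons)),
    ("dense", dense_params(neurons, 1)),
]
for name, count in layers:
    print(f"{name:8s}{count:>12,}")
total = sum(count for _, count in layers)
print(f"{'total':8s}{total:>12,}")
```

Dropout layers add no trainable parameters, so they are left out of the count.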

I have declared my hyperparameters at the beginning of the full code so that changes for different variations can be made in one place. Here are my hyperparameters:

neurons = 512
activation_function = 'tanh'
loss = 'mse'
optimizer = 'adam'
dropout = 0.25
batch_size = 12
epochs = 53
window_len = 7
training_size = 0.8
merge_date = '2016-01-01'

Now it’s time to train our model on the data that we have collected.

# clean up the memory
gc.collect()
# seed NumPy's random generator (TensorFlow keeps its own, separately seeded generator)
np.random.seed(202)
# initialise model architecture
btc_model = build_model(X_train, output_size=1, neurons=neurons)
# train model on data; shuffle=False keeps the samples in chronological order
btc_history = btc_model.fit(X_train, Y_train_btc, epochs=epochs, batch_size=batch_size,
                            verbose=1, validation_data=(X_test, Y_test_btc), shuffle=False)

The above code may take a while to finish depending on your computing power, and when it is done, your trained model is ready too :)

Let’s check out the results for BTC and ETH
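The result plots are not reproduced in this text. If you want numbers next to the figures, one simple option is to compute the mean squared error and mean absolute error between the targets and the predictions. The sketch below is a hypothetical helper with toy values, not results from the model in this post.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return (MSE, MAE) between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    errors = y_pred - y_true
    return float(np.mean(errors ** 2)), float(np.mean(np.abs(errors)))

# Toy numbers for illustration only.
y_true = np.array([0.10, 0.20, 0.15, 0.30])
y_pred = np.array([0.12, 0.18, 0.15, 0.26])
mse, mae = evaluate(y_true, y_pred)
print(f"MSE={mse:.5f}  MAE={mae:.5f}")
```

With a trained model, the same helper could be called as evaluate(Y_test_btc, btc_model.predict(X_test)); the ravel() call flattens the (samples, 1) prediction array so the shapes match.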

Not bad for a start :)

There is a great blog post by David Sheehan where I first learned how to use LSTMs for cryptocurrency price prediction. Here is the link to his blog.

Update 1:

  • Removed add_volatility as it did not affect the performance
  • Created plot_results() and added date labels to the figure
  • Replaced MAE with MSE as the loss function
  • Increased batch size from 64 to 128
  • Added TF computational graph diagram

Update 2:

  • I could slightly improve the performance (reducing the loss) by reducing the window length from 7 days to 3 days and setting the number of neurons to 1024.

I hope you enjoyed this post!