# Bitcoin price forecasting with deep learning algorithms

*Disclaimer: All the information in this article, including the algorithm, is provided and published for educational purposes only. It is neither a solicitation for investment nor investment advice. Any reliance you place on such information is therefore strictly at your own risk.*

Bitcoin is the first decentralized digital currency, meaning it is not governed by a central bank or any other authority. The cryptocurrency was created in 2009 but became extremely popular in 2017.

Some experts call bitcoin “the currency of the future” or even cite it as an example of a social revolution. The bitcoin price increased severalfold during 2017. At the same time, it is very volatile. Many economic actors are interested in tools for predicting bitcoin prices. This is especially important for existing or potential investors and for government bodies; the latter need to be ready for significant price movements in order to prepare a consistent economic policy. So, the demand for a Bitcoin price prediction mechanism is high.

This notebook demonstrates bitcoin price prediction with neural network models. We use a 2-layer long short-term memory (LSTM) network as well as a gated recurrent unit (GRU) architecture, both of which are types of recurrent neural network (RNN). You can read more about these types of networks here:

- https://deeplearning4j.org/lstm.html
- http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- https://arxiv.org/pdf/1412.3555v1.pdf
- http://karpathy.github.io/2015/05/21/rnn-effectiveness/

The dataset we are using is available here: Bitcoin Historical Data

The first thing we do is import all the necessary Python libraries.

Now we load the dataset into memory and check it for null values:

Out[2]:

`False`
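A null check of this kind can be sketched with pandas as below. The notebook's own cell is not shown, so the tiny stand-in frame, the column names `Timestamp` and `Weighted_Price`, and the helper `has_nulls` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the real CSV; the column names are assumptions.
df_demo = pd.DataFrame({
    "Timestamp": [1451606400, 1451606460, 1451606520],
    "Weighted_Price": [430.0, 431.5, np.nan],
})


def has_nulls(frame):
    """Return True if any cell in the frame is NaN or None."""
    return bool(frame.isnull().values.any())


print(has_nulls(df_demo))           # the demo frame contains one NaN
print(has_nulls(df_demo.dropna()))  # after dropping the incomplete row
```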

We can see that there are no null values in the dataset. Now we want to preview the head of the dataset to learn the structure of the data:

Next we transform the data to get the average price for each day and convert the raw timestamps to a regular datetime format.
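One way to express this transformation in pandas is sketched here. It assumes the timestamps are Unix seconds in a column called `Timestamp` and that the price column is called `Weighted_Price`; both names, and the helper `daily_mean_price`, are assumptions rather than details confirmed by the notebook.

```python
import pandas as pd


def daily_mean_price(frame, ts_col="Timestamp", price_col="Weighted_Price"):
    """Convert Unix-second timestamps to calendar dates and average the price per day."""
    dates = pd.to_datetime(frame[ts_col], unit="s").dt.date
    return frame.groupby(dates)[price_col].mean()


# Three hypothetical rows: two on 2016-01-01 (UTC) and one on 2016-01-02.
minute_data = pd.DataFrame({
    "Timestamp": [1451606400, 1451606460, 1451692800],
    "Weighted_Price": [430.0, 432.0, 435.0],
})
daily = daily_mean_price(minute_data)
print(daily)
```

The result is a Series indexed by date, which is a convenient shape for the date-based split that follows.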

We need to split our dataset because we want to train and test the model on only part of the data. So, in the next cell, we compute the parameters needed for the split (the number of days between certain dates). We want to train our model on the data from January 1, 2016, until August 21, 2017, and to test it on the data from August 21, 2017, until October 20, 2017.

`654`

`61`

`6`
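The day counts for the stated date ranges can be checked with the standard library. This sketch only covers the train and test ranges named in the text; the helper `days_between` is hypothetical, and it does not attempt to reproduce the other printed values.

```python
from datetime import date


def days_between(start, end):
    """Number of days from start to end, not counting the end date itself."""
    return (end - start).days


train_start = date(2016, 1, 1)
split_date = date(2017, 8, 21)
test_end = date(2017, 10, 20)

train_days = days_between(train_start, split_date)  # 598
test_days = days_between(split_date, test_end)      # 60

# Counting both endpoints of each range adds one day to each figure.
print(train_days + 1, test_days + 1)
```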

Now we split our data into train and test sets:

`599 61`
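A chronological split like this keeps the most recent observations for testing and never shuffles. A minimal sketch, assuming the daily prices are already in time order; `split_by_position` and the placeholder data are hypothetical.

```python
def split_by_position(values, n_test):
    """Split an ordered sequence so that the last n_test items form the test set."""
    if not 0 < n_test < len(values):
        raise ValueError("n_test must be between 1 and len(values) - 1")
    return values[:-n_test], values[-n_test:]


# Placeholder series of 660 ordered daily values.
prices = list(range(660))
train, test = split_by_position(prices, 61)
print(len(train), len(test))
```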

# Exploratory Data Analysis

We want to estimate some properties of our data because this can be useful when designing the model. The first important thing when forecasting a time series is to check whether the data is stationary. A stationary series has statistical properties, such as its mean and variance, that do not change over time; a series influenced by factors such as trend or seasonality is not stationary.

In the next cell, we concatenate the train and test data so that we can analyze and transform them together.

In the next couple of cells, we perform a seasonal decomposition of the data to estimate its trend and seasonality. You can see the actual price movements on the plot below (“observed”) as well as the trend and seasonality in our data.
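To make the idea concrete, here is a hand-rolled NumPy sketch of a classical additive decomposition (observed = trend + seasonal + residual). It is not the notebook's own code, and ready-made routines exist in libraries such as statsmodels. The function name `decompose_additive`, the weekly period of 7, and the synthetic series are assumptions chosen for illustration.

```python
import numpy as np


def decompose_additive(x, period):
    """Split a series into trend, seasonal and residual parts (odd periods only)."""
    x = np.asarray(x, dtype=float)
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half = period // 2
    # Trend: centered moving average; the first and last `half` points stay NaN.
    trend = np.full_like(x, np.nan)
    trend[half:len(x) - half] = np.convolve(x, np.ones(period) / period, mode="valid")
    # Seasonal: average detrended value for each position within the period.
    detrended = x - trend
    phase_means = np.array([np.nanmean(detrended[p::period]) for p in range(period)])
    phase_means -= phase_means.mean()  # force the seasonal part to average to zero
    seasonal = np.tile(phase_means, len(x) // period + 1)[:len(x)]
    resid = x - trend - seasonal
    return trend, seasonal, resid


# Synthetic example: a linear trend plus a repeating zero-mean weekly pattern.
t = np.arange(70)
pattern = np.array([3.0, -1.0, -2.0, 0.0, 1.0, -3.0, 2.0])
series = 100.0 + 2.0 * t + pattern[t % 7]
trend, seasonal, resid = decompose_additive(series, period=7)
```

On this synthetic series the recovered trend is the straight line and the seasonal component is the weekly pattern, which makes the function easy to sanity-check before applying it to real prices.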

Next we examine the autocorrelation, which is the similarity between observations as a function of the time lag between them. It is important for finding repeating patterns in the data.
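The sample autocorrelation at a given lag can be computed directly, as in the sketch below. The function name `autocorrelation` and the synthetic weekly signal are assumptions for illustration; the notebook's own plotting cell is not shown.

```python
import numpy as np


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given non-negative lag."""
    x = np.asarray(x, dtype=float)
    if not 0 <= lag < len(x):
        raise ValueError("lag must be between 0 and len(x) - 1")
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    return float(np.dot(centered[:len(x) - lag], centered[lag:]) / denom)


# A signal that repeats every 7 steps correlates strongly with itself at lag 7.
t = np.arange(70)
weekly = np.sin(2 * np.pi * t / 7)
print(autocorrelation(weekly, 0), autocorrelation(weekly, 7))
```

A peak at some lag suggests a repeating pattern with that period, which is what this step of the analysis looks for.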

Now we need to recover our df_train and df_test datasets:

# Data preparation

We need to prepare our dataset according to the requirements of the model, as well as split it into train and test parts. In the next cell, we define a function that creates the X inputs and Y labels for our model. In sequential forecasting, we predict a future value based on previous and current values. So, our Y label is the value at the next (future) point in time, while the X inputs are one or several values from the past. We can set the number of these past values with the *look_back* parameter of our function. If we set it to 1, we predict the current value *t* based on the previous value *(t-1)*.
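A function of this kind might look like the sketch below. The name *create_lookback* and the *look_back* parameter come from the text, but the body is an assumption, as is the input layout of a 2-D array with one price column.

```python
import numpy as np


def create_lookback(dataset, look_back=1):
    """Build (X, Y) pairs: X holds look_back past values, Y holds the value that follows."""
    X, Y = [], []
    for i in range(len(dataset) - look_back):
        X.append(dataset[i:i + look_back, 0])
        Y.append(dataset[i + look_back, 0])
    return np.array(X), np.array(Y)


# Toy series 0, 1, 2, 3, 4 stored as a single column.
toy = np.arange(5, dtype=float).reshape(-1, 1)
X1, Y1 = create_lookback(toy, look_back=1)
X2, Y2 = create_lookback(toy, look_back=2)
```

With `look_back=1`, each input row holds one past value and the label is the value right after it; with `look_back=2`, each row holds two consecutive past values.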

Now we perform final data preparation:

- Reshape the train and test datasets according to the requirements of the model.
- Scale the dataset by using the MinMaxScaler because LSTM models are scale sensitive.
- Apply our *create_lookback* function.
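The scaling and reshaping steps can be sketched in plain NumPy. This hand-rolled min-max scaling stands in for scikit-learn's MinMaxScaler with its default 0 to 1 range. Fitting the bounds on the train data only, the helper names, and the `(samples, timesteps, features)` layout are assumptions, not details confirmed by the notebook.

```python
import numpy as np


def fit_min_max(train_values):
    """Learn the scaling bounds from the training data."""
    lo, hi = float(np.min(train_values)), float(np.max(train_values))
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def scale(values, lo, hi):
    """Map values so that lo becomes 0 and hi becomes 1."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)


def unscale(scaled, lo, hi):
    """Invert scale() to get back to the original price units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo


# Hypothetical prices, for illustration only.
train_prices = np.array([400.0, 450.0, 500.0, 600.0])
test_prices = np.array([650.0, 700.0])

lo, hi = fit_min_max(train_prices)
train_scaled = scale(train_prices, lo, hi)
test_scaled = scale(test_prices, lo, hi)  # can exceed 1.0 when test prices are higher

# Recurrent layers in Keras expect 3-D input: (samples, timesteps, features).
X_train = train_scaled[:-1].reshape(-1, 1, 1)
```

Keeping `lo` and `hi` around matters later, because predictions have to be mapped back to USD with `unscale` before errors are measured.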

We trained several different models and compared their results. You can find them in the table below. These results were obtained on the following hardware: a 4-core CPU and 16 GB of RAM, with each model trained ten times using different random states. As we can see, the best result is obtained with the 2-layer stacked LSTM. Nevertheless, this model is much slower than the GRU or the 1-layer LSTM. The autoregressive integrated moving average (ARIMA) model shows the worst results in both performance and training time. We can also see that the 1-layer LSTM model is not able to recognize patterns in the data, so we need more complex models. We are going to demonstrate the 2-layer LSTM neural network in more detail.

# Training the 2-layer LSTM Neural Network

Finally, we can build and train our model. We use the Keras framework for deep learning. Our model consists of two stacked LSTM layers with 256 units each and a densely connected output layer with one neuron. We use the Adam optimizer and MSE as the loss. We also use early stopping, which ends training if the result doesn’t improve for 20 training iterations (epochs). We performed several experiments and found that the optimal number of epochs and batch_size are 100 and 16 respectively. It is also important to set *shuffle=False* because we don’t want to shuffle time series data.

`Train on 599 samples, validate on 59 samples`

Epoch 1/100

599/599 [==============================] - 2s 3ms/step - loss: 0.0074 - val_loss: 0.1025

Epoch 2/100

599/599 [==============================] - 1s 2ms/step - loss: 0.0644 - val_loss: 0.2629

Epoch 3/100

599/599 [==============================] - 1s 2ms/step - loss: 0.0107 - val_loss: 0.0181

Epoch 4/100

599/599 [==============================] - 1s 2ms/step - loss: 0.0019 - val_loss: 0.0070

Epoch 5/100

599/599 [==============================] - 1s 2ms/step - loss: 5.3863e-04 - val_loss: 0.0017

Epoch 6/100

599/599 [==============================] - 1s 2ms/step - loss: 4.1020e-04 - val_loss: 0.0027

Epoch 7/100

599/599 [==============================] - 1s 2ms/step - loss: 2.1977e-04 - val_loss: 0.0022

Epoch 8/100

599/599 [==============================] - 1s 2ms/step - loss: 2.5272e-04 - val_loss: 0.0022

Epoch 9/100

599/599 [==============================] - 1s 2ms/step - loss: 2.4554e-04 - val_loss: 0.0020

Epoch 10/100

599/599 [==============================] - 1s 2ms/step - loss: 2.6365e-04 - val_loss: 0.0019

Epoch 11/100

599/599 [==============================] - 1s 2ms/step - loss: 2.5525e-04 - val_loss: 0.0018

Epoch 12/100

599/599 [==============================] - 1s 2ms/step - loss: 2.6679e-04 - val_loss: 0.0018

Epoch 13/100

599/599 [==============================] - 1s 2ms/step - loss: 2.5337e-04 - val_loss: 0.0017

Epoch 14/100

599/599 [==============================] - 1s 2ms/step - loss: 2.5953e-04 - val_loss: 0.0017

Epoch 15/100

599/599 [==============================] - 1s 2ms/step - loss: 2.4082e-04 - val_loss: 0.0016

Epoch 16/100

599/599 [==============================] - 1s 2ms/step - loss: 2.4312e-04 - val_loss: 0.0016

Epoch 17/100

599/599 [==============================] - 1s 2ms/step - loss: 2.2189e-04 - val_loss: 0.0016

Epoch 18/100

599/599 [==============================] - 1s 2ms/step - loss: 2.2231e-04 - val_loss: 0.0016

Epoch 19/100

599/599 [==============================] - 1s 2ms/step - loss: 2.0289e-04 - val_loss: 0.0016

Epoch 20/100

599/599 [==============================] - 1s 2ms/step - loss: 2.0255e-04 - val_loss: 0.0016

Epoch 21/100

599/599 [==============================] - 1s 2ms/step - loss: 1.8815e-04 - val_loss: 0.0016

Epoch 22/100

599/599 [==============================] - 1s 2ms/step - loss: 1.8700e-04 - val_loss: 0.0016

Epoch 23/100

599/599 [==============================] - 1s 2ms/step - loss: 1.7834e-04 - val_loss: 0.0016

Epoch 24/100

599/599 [==============================] - 1s 2ms/step - loss: 1.7617e-04 - val_loss: 0.0016

Epoch 25/100

599/599 [==============================] - 1s 2ms/step - loss: 1.7182e-04 - val_loss: 0.0016

Epoch 26/100

599/599 [==============================] - 1s 2ms/step - loss: 1.6926e-04 - val_loss: 0.0016

Epoch 27/100

599/599 [==============================] - 1s 2ms/step - loss: 1.6698e-04 - val_loss: 0.0016

Epoch 28/100

599/599 [==============================] - 1s 2ms/step - loss: 1.6496e-04 - val_loss: 0.0016

Epoch 29/100

599/599 [==============================] - 1s 2ms/step - loss: 1.6336e-04 - val_loss: 0.0016

Epoch 30/100

599/599 [==============================] - 1s 2ms/step - loss: 1.6200e-04 - val_loss: 0.0016

Epoch 31/100

599/599 [==============================] - 1s 2ms/step - loss: 1.6081e-04 - val_loss: 0.0016

Epoch 32/100

599/599 [==============================] - 1s 2ms/step - loss: 1.5982e-04 - val_loss: 0.0016

Epoch 33/100

599/599 [==============================] - 1s 2ms/step - loss: 1.5899e-04 - val_loss: 0.0016

Epoch 34/100

599/599 [==============================] - 1s 2ms/step - loss: 1.5830e-04 - val_loss: 0.0016

Epoch 35/100

599/599 [==============================] - 1s 2ms/step - loss: 1.5775e-04 - val_loss: 0.0016

Epoch 36/100

599/599 [==============================] - 1s 2ms/step - loss: 1.5735e-04 - val_loss: 0.0016

Epoch 00036: early stopping
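The architecture and training settings described before the log could be written roughly as follows. This is a sketch rather than the notebook's original cell: it assumes the Keras API bundled with TensorFlow (`tensorflow.keras`) and inputs shaped `(samples, look_back, 1)`, and the function names `build_lstm` and `train` are hypothetical. Passing the test data as `validation_data` mirrors the "validate on 59 samples" line in the log.

```python
from tensorflow import keras


def build_lstm(look_back=1, units=256):
    """Two stacked LSTM layers followed by a single-neuron dense output."""
    model = keras.Sequential([
        keras.Input(shape=(look_back, 1)),
        # return_sequences=True passes the full sequence on to the second LSTM.
        keras.layers.LSTM(units, return_sequences=True),
        keras.layers.LSTM(units),
        keras.layers.Dense(1),
    ])
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model


def train(model, X_train, Y_train, X_test, Y_test):
    """Fit without shuffling; stop once val_loss has not improved for 20 epochs."""
    stopper = keras.callbacks.EarlyStopping(monitor="val_loss", patience=20, verbose=1)
    return model.fit(
        X_train, Y_train,
        epochs=100,
        batch_size=16,
        shuffle=False,
        validation_data=(X_test, Y_test),
        callbacks=[stopper],
    )


model = build_lstm()
model.summary()
```

With a patience of 20, a run whose validation loss last improves around epoch 16 ends at epoch 36, which is consistent with the log.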

We have trained our model. You can see that it performs well after only a few epochs. On the plot above, we compare the train and test loss at each iteration of the training process. We can see that after some iterations the train and test loss become very similar, which is a good sign (it means we are not overfitting the train set). Below, we use our model to predict labels for the test set. Then we invert the scaling to restore the original scale of our data. You can see a comparison of the true and predicted labels on the chart below. It looks like our model gives good results (the lines are very similar)!

Below we calculate the root mean squared error (RMSE). This indicator is the square root of the mean squared difference between the predicted points on the test set and the actual (true) labels, so it can be read as the typical size of our error, in the same units as the price. The lower this number, the better. We can see that our model’s RMSE is not very large (consider that the price in our dataset is in thousands of USD, and we are off by only tens of USD).

`Test RMSE: 18.724`
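For reference, RMSE can be computed in a few lines of NumPy. The function name `rmse` and the numbers in the example are hypothetical; the calculation itself is the standard definition.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


# Hypothetical prices in USD: errors of 3 and 4 give sqrt((9 + 16) / 2).
example = rmse([4000.0, 4100.0], [4003.0, 4104.0])
print("Example RMSE: %.3f" % example)
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.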

Below we extract the convenient format of dates and plot the same chart as above, but with these dates on the X-axis.

The results we obtained can be improved. To do this, we will try the following. We take 10 different train and test datasets, train the model on each train set, and then test it on the corresponding test set. After this, we calculate the RMSE for each train/test pair. Then we find the average RMSE over all these datasets and subtract this value from each prediction obtained from our current model. This can improve the performance.

We want to demonstrate this approach on the GRU model just to show different models.

First we define three functions that act as consecutive stages of a pipeline. These functions are very similar to what we did when preparing the data and training our previous 2-layer LSTM model.

The function below chains the three previous functions into a single workflow and returns the RMSE and the predictions of the model.

Now we can run a *workflow* function to calculate RMSE for a single GRU model:

`Test GRU model RMSE: 32.764`

Now we can run a *cross_validate* function to trigger calculations:

`Iteration: 1`

Test RMSE: 9.233

Iteration: 2

Test RMSE: 16.251

Iteration: 3

Test RMSE: 12.337

Iteration: 4

Test RMSE: 38.239

Iteration: 5

Test RMSE: 49.088

Iteration: 6

Test RMSE: 3.908

Iteration: 7

Test RMSE: 7.206

Iteration: 8

Test RMSE: 38.290

Iteration: 9

Test RMSE: 4.388

Iteration: 10

Test RMSE: 9.347

Average RMSE: 18.8287473079

RMSE list: [9.233330864072622, 16.25122406236244, 12.3374370718704, 38.2387143974303, 49.08764082707623, 3.908100289970251, 7.206358361324355, 38.29018303096499, 4.387561412580847, 9.346922761677483]

Next, we subtract the mean RMSE from each prediction our model produced. Then, we recalculate the RMSE for the model.

`Test GRU model RMSE_new: 14.223`
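The correction step itself is simple arithmetic, sketched below using the RMSE list printed above. The function name `shift_predictions` and the example predictions are hypothetical. Since RMSE is never negative, the adjustment always lowers the predictions, so it can only help when the model tends to over-predict.

```python
import numpy as np

# RMSE values of the ten runs, copied from the output above.
rmse_list = [9.233330864072622, 16.25122406236244, 12.3374370718704,
             38.2387143974303, 49.08764082707623, 3.908100289970251,
             7.206358361324355, 38.29018303096499, 4.387561412580847,
             9.346922761677483]
mean_rmse = float(np.mean(rmse_list))


def shift_predictions(predictions, offset):
    """Subtract a constant offset from every prediction."""
    return np.asarray(predictions, dtype=float) - offset


# Hypothetical predictions in USD, for illustration only.
example_predictions = np.array([4000.0, 4100.0, 4250.0])
corrected = shift_predictions(example_predictions, mean_rmse)
print("Average RMSE:", mean_rmse)
```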

We can see that the RMSE has been reduced significantly, from 32.764 to 14.223. This means that our experiment was successful. On the plot below you can see the difference between the predicted and true test labels.

Let’s calculate the symmetric mean absolute percentage error (SMAPE). It shows how far off our predictions are in percentage terms. We define the function *symmetric_mean_absolute_percentage_error*, which performs all the necessary calculations.

`Test SMAPE (percentage): 0.304`
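SMAPE has more than one definition in common use, and the notebook's own cell is not shown. The sketch below assumes the variant that divides each absolute error by the mean of the absolute actual and predicted values and reports the result as a percentage. The function name comes from the text; the body and the example numbers are assumptions.

```python
import numpy as np


def symmetric_mean_absolute_percentage_error(actual, predicted):
    """SMAPE in percent, using mean(|actual|, |predicted|) as the denominator."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # The result is NaN if actual and predicted are both zero at the same position.
    denominator = (np.abs(actual) + np.abs(predicted)) / 2.0
    return float(np.mean(np.abs(predicted - actual) / denominator) * 100.0)


# Hypothetical example: predicting 110 when the true value is 100.
example_smape = symmetric_mean_absolute_percentage_error([100.0], [110.0])
print("Example SMAPE (percentage): %.3f" % example_smape)
```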

We can see that our SMAPE is less than 1%, which means that the error of our model is very small.

In this notebook, we trained a 2-layer long short-term memory neural network as well as a gated recurrent unit neural network on Bitcoin Historical Data. These models can be used to predict future price movements of bitcoin. The performance of the models is quite good. On average, both models considered here make an error measured only in tens of USD.

Sources: