Bitcoin forecasts with stateless LSTM in TensorFlow

Alberto Prospero
Coinmonks
9 min read · Jul 22, 2018

Introduction

#Motivation Cryptocurrencies are one of the most disruptive technologies of recent years. Today the total market capitalization has reached $278 billion, and it has been estimated that 24 million people have invested in Bitcoin (by Bitcoin I mean the whole market, while I use bitcoin to refer to the single cryptocurrency). Even though the market is currently unstable and quite speculative, data science techniques can provide valuable information to investors who want to predict future behaviours and trends. Because of noise, volatility and the lack of important indicators that probably cannot be found online (or at least not in free, public sources), we cannot set up an extremely accurate forecasting environment. However, data analysis techniques can still give investors some information up to a certain extent, and future market prices might be estimated more precisely. Here, I provide a complete Python framework to make predictions. Even if I don’t expect the forecasts to be really precise, this work might be a starting point for further developments and improvements.

#Abstract In this article I explain how to predict future bitcoin prices using multi-input, multi-output stateless recurrent neural networks. After downloading some freely available data, I built a machine learning model to predict both next-day and next-month prices.

#Prerequisites I assume you are familiar with Python, recurrent neural networks and TensorFlow. If this is not the case, don’t worry: there is plenty of good material out there (take a look at the great explanation by Karpathy, the course by Andrew Ng, and the many tutorials for learning TF). I will also explain some of the technical details along the way! The complete source code can be found on my GitHub page, so take a look! Now, let’s code!

Coding — Preprocessing

#Data You need to give the network all the data you believe are important for identifying bitcoin trends. This step is crucial: the better and more relevant the data you give the network, the better the predictions it will generate! To this end, I downloaded some Bitcoin-related data using the pandas datareader library, taking both litecoin and bitcoin data from the Quandl marketplace.

#Plotting The next step is plotting the data to get a sense of what is going on. In a Jupyter Notebook you can use Plotly, a great resource for visualizing data: it lets you zoom in on the plot and move around it interactively, in a JavaScript-based interface.

#RNNs Now we are ready to define the recurrent neural network architecture. Previously I mentioned that the network is multi-input, multi-output and stateless. But what does this mean?

  • Unlike many other machine learning algorithms, ANNs (and RNNs) can naturally handle multiple time series as inputs and outputs simultaneously. In other words, you can feed many time series to the network at the same time and have it produce several different predictions. For example, the very same network might predict the temperature and pressure of a tyre (2 outputs) from GPS coordinates and weather conditions (3 inputs). That’s a multi-input, multi-output recurrent neural network.
  • LSTM neural networks can be either stateless or stateful. As the words suggest, stateless LSTMs do not carry the inner state propagated in time by the network over from one batch to the next: after processing each batch of sequences, they reset the inner state (but not their weights) to its initial value. Stateful LSTMs, on the contrary, use their final states as the initial states for the next batch (the weights are kept across batches in both cases). Therefore, while the former can never take into account dependencies longer than the sequence length, the latter can remember relations in the data across many time steps.
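
To make the difference concrete, here is a minimal NumPy sketch, not the article’s code. For brevity it swaps the LSTM for a plain tanh recurrent cell. The sizes are illustrative assumptions: 3 input features, 2 outputs, hidden size 5, and batches of 4 sequences of length 10. The names `run_batch`, `stateless_out` and `stateful_out` are hypothetical. The only difference between the two loops is where each batch’s initial state comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy recurrent cell: a plain tanh RNN stands in for the LSTM to keep the sketch short.
# 3 input features, hidden size 5, 2 outputs read from the final hidden state.
F_IN, HIDDEN, F_OUT = 3, 5, 2
W_x = rng.normal(scale=0.5, size=(F_IN, HIDDEN))
W_h = rng.normal(scale=0.5, size=(HIDDEN, HIDDEN))
W_out = rng.normal(scale=0.5, size=(HIDDEN, F_OUT))


def run_batch(x, h0):
    """Run a batch x of shape (B, L, F_IN) from initial state h0 of shape (B, HIDDEN).

    Returns the (B, F_OUT) outputs and the final hidden state.
    """
    h = h0
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h)
    return h @ W_out, h


# Three consecutive batches of 4 sequences, each 10 steps long.
batches = [rng.normal(size=(4, 10, F_IN)) for _ in range(3)]

# Stateless: every batch starts again from the zero state (the weights are the same throughout).
stateless_out = [run_batch(b, np.zeros((4, HIDDEN)))[0] for b in batches]

# Stateful: the final state of one batch is the initial state of the next one.
h = np.zeros((4, HIDDEN))
stateful_out = []
for b in batches:
    y, h = run_batch(b, h)
    stateful_out.append(y)
```

With a stateful network, the order of the batches matters: sequence i of one batch must continue sequence i of the previous one. This is why stateful training data cannot be shuffled freely, while stateless training data can.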

#LSTM inputs The first step in every machine learning task is to pre-process the input data. Depending on the dataset and the model you want to implement, this might involve several different tasks, from cleaning the data to normalizing columns or smoothing input signals. In this regard, I did the following:

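  • Normalising input features. This common practice in machine learning puts all features on comparable scales, which mainly helps gradient-based optimization algorithms converge faster and more reliably. It also prevents features with large magnitudes, such as trading volume, from dominating the others.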
  • Reshaping data inputs. Data must be provided to stateless RNNs in the right format: input tensors of shape B x L x F, where B is the batch size, L is the length of the time-step sequence, and F is the number of input features. In the code, the sentences_generator function transforms a generic time series of shape T x F, where T is the total number of time steps, into two arrays of shape N x L x F and N x F respectively. Here N is the total number of rolling sequences you can extract from the data (with a stride of one step, N = T - L). The former is used as training input and the latter as the target.
  • Splitting data into training, validation and test sets. Next, I split the data into three disjoint sets: the training set, used to build the model; the validation set, used to tune the network’s hyper-parameters; and the test set, an untouched and completely unseen dataset used to evaluate the model. Since I want to predict future prices, the training, validation and test sets are ordered in time. This way the network simulates the real case of using historical data to predict the future.
  • Shuffling training data at the beginning of each epoch. Because the network is stateless, the training sequences can be shuffled. This helps reduce overfitting and makes each mini-batch gradient a better approximation of the full-batch gradient.
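
The first three steps can be sketched in NumPy as follows. This is an illustrative sketch, not the author’s implementation. The names `chronological_split`, `normalise` and `make_windows` are hypothetical (`make_windows` plays the role the article gives to `sentences_generator`). The 70/15/15 split is an assumption, and random numbers stand in for the 4 downloaded features. The order differs slightly from the list above: the data are split first, so that the normalisation statistics come from the training split only and no information from the future leaks into training.

```python
import numpy as np


def chronological_split(series, train_frac=0.7, val_frac=0.15):
    """Split a (T, F) series along time, without shuffling: later rows go to validation/test."""
    i_val = round(len(series) * train_frac)
    i_test = i_val + round(len(series) * val_frac)
    return series[:i_val], series[i_val:i_test], series[i_test:]


def normalise(train, *others):
    """Z-score each feature using the mean and std of the training split only."""
    mean, std = train.mean(axis=0), train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    return [(part - mean) / std for part in (train, *others)]


def make_windows(series, seq_len):
    """Turn a (T, F) series into inputs (N, L, F) and next-step targets (N, F), N = T - L.

    Window i covers rows i .. i + L - 1 and its target is row i + L.
    """
    n = len(series) - seq_len
    x = np.stack([series[i:i + seq_len] for i in range(n)])
    y = series[seq_len:]
    return x, y


# Stand-in data: 500 days x 4 features (the real inputs would be the downloaded prices).
series = np.random.default_rng(0).normal(size=(500, 4))
train, val, test = chronological_split(series)
train, val, test = normalise(train, val, test)
x_train, y_train = make_windows(train, seq_len=10)
x_val, y_val = make_windows(val, seq_len=10)
```

Windows are built separately inside each split, so no window straddles the boundary between training and validation data.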

Below you can follow the code implementing all these steps. Since training data are shuffled at the beginning of each epoch, this operation is performed in the next block of code.
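
The per-epoch shuffle for a stateless network can be sketched as a small generator. The name `epoch_batches` is hypothetical. Dropping the last incomplete batch is also an assumption, made so that every batch has exactly `batch_size` windows, like the fixed batch of 128 used in the model. Inputs and targets are indexed with the same permutation, so they stay aligned.

```python
import numpy as np


def epoch_batches(x, y, batch_size, rng):
    """Yield (inputs, targets) mini-batches in a fresh random order on every call.

    x: (N, L, F) windows, y: (N, F) targets. The last incomplete batch is dropped.
    """
    order = rng.permutation(len(x))  # new permutation each epoch
    for start in range(0, len(x) - batch_size + 1, batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]


rng = np.random.default_rng(42)
# Toy data where window i starts with value 40 * i and target i starts with 4 * i.
x = np.arange(300 * 10 * 4, dtype=float).reshape(300, 10, 4)
y = np.arange(300 * 4, dtype=float).reshape(300, 4)

for epoch in range(3):
    for xb, yb in epoch_batches(x, y, batch_size=128, rng=rng):
        pass  # one optimisation step on (xb, yb) would go here
```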

Coding — Modeling

#TF The model was implemented in TF. Unlike Keras, TF gives you more control over the code you’re developing, letting you define neural networks at a much lower level. In TF, you can completely customize your model, building specific loss functions or network structures. TF also provides a complete framework, called TensorBoard, to evaluate the performance of your model at runtime. It lets you track the values of chosen tensors at specific time steps. In this way you can, for example, identify the exact point in training where the weights started to diverge or the validation loss stopped decreasing.

#Model structure To carry out predictions, I built an LSTM model whose input is a batch of 128 sequences of length 10, each time step formed by 4 features (a 128 x 10 x 4 tensor). The LSTM processes the input and produces 10 output tensors, one per time step, each containing the 128 training instances described by 128 features. Since I only need the final time step (corresponding to the next-day forecast), I keep only the last output of the network, a 128 x 128 tensor. This is in turn fed to 2 fully connected layers that progressively reduce the feature dimension, producing 128 training elements of 4 features each. This output is compared with the real next values through a mean squared error loss. You can view the network structure in TensorBoard:

Network structure in TensorBoard
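
The shapes above can be traced with a small NumPy forward pass. This is a sketch, not the article’s TensorFlow code: the weights are random and untrained, and the standard LSTM gate equations are written out by hand. The size of the intermediate dense layer (32 units), its ReLU activation and the weight initialisation are assumptions. The article only says that the two layers progressively reduce the dimension from 128 to 4.

```python
import numpy as np

rng = np.random.default_rng(0)
# Batch, sequence length, features, LSTM units, intermediate dense units (D = 32 is assumed).
B, L, F, H, D = 128, 10, 4, 128, 32


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init(*shape):
    return rng.normal(scale=0.1, size=shape)


# LSTM parameters: the four gates (input, forget, candidate, output) share one stacked matrix.
W, b = init(F + H, 4 * H), np.zeros(4 * H)
# Two fully connected layers shrinking 128 features to D, then to the 4 predicted features.
W1, b1 = init(H, D), np.zeros(D)
W2, b2 = init(D, F), np.zeros(F)


def forward(x):
    """x: (B, L, F) -> (next-step predictions (B, F), per-step LSTM outputs (B, L, H))."""
    h = np.zeros((x.shape[0], H))  # stateless: fresh zero state for every batch
    c = np.zeros((x.shape[0], H))
    outputs = []
    for t in range(x.shape[1]):
        z = np.concatenate([x[:, t, :], h], axis=1) @ W + b  # (B, 4H)
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    outputs = np.stack(outputs, axis=1)  # (B, L, H): one 128-feature output per time step
    last = outputs[:, -1, :]  # keep only the final time step, (B, H)
    hidden = np.maximum(0.0, last @ W1 + b1)  # (B, D); ReLU is an assumption
    return hidden @ W2 + b2, outputs  # predictions have shape (B, F)


x = rng.normal(size=(B, L, F))
y_true = rng.normal(size=(B, F))
y_pred, step_outputs = forward(x)
mse = np.mean((y_pred - y_true) ** 2)  # mean squared error against the real next values
```

In TensorFlow the same structure would typically be an LSTM cell unrolled over the 10 steps, followed by the two dense layers and a mean squared error loss minimised by an optimiser.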

#Logging In the code, some of the tensors are added to the set of variables tracked by TensorBoard. Open a terminal and type the following:

You can now follow the evolution of the tracked parameters. Note also that I save the model at pre-selected training steps, so that the whole network state can be recovered at fixed points. By monitoring the losses you can choose the best final model: the checkpoint with the lowest validation loss is a good candidate.

Evolution of the training and validation losses
Weights and biases of the last two layers of the network during training
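
Choosing the final model from the saved checkpoints then comes down to picking the step with the lowest validation loss. Here is a minimal sketch with a hypothetical `best_checkpoint` helper and made-up loss values (not results from this article):

```python
def best_checkpoint(val_losses):
    """Return the training step whose checkpoint scored the lowest validation loss.

    val_losses: dict mapping the step at which a checkpoint was saved to the
    validation loss measured at that step.
    """
    return min(val_losses, key=val_losses.get)


# Made-up values for illustration: the validation loss falls, then rises again as the model overfits.
val_losses = {500: 0.041, 1000: 0.027, 1500: 0.022, 2000: 0.025, 2500: 0.031}
best_step = best_checkpoint(val_losses)  # 1500
```

The checkpoint saved at that step would then be restored for evaluation.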

In the following you can see the block of code performing all these operations:

Coding — Evaluation

#Test data The last part of every machine learning process is evaluating model performance. The test data used for this must be observations the algorithm did not process at all during training, so that you can really validate the model in a real-world scenario. It is also important that the test records follow the same data distribution as the training set. Keep in mind that the algorithm was built on the training data and (hopefully!) adapted to their underlying distribution. Therefore, I assume that the data distribution does not change over time, so that I can safely use historical training data to predict future market prices.

#Predicting Predictions are made after loading the saved network. Two kinds of predictions are performed:

  • Next-day predictions. For each element in the training, validation and test sets, I predict the next values of all the input features. As you can see in the plot below, the errors are small.
  • Next-month predictions. To really evaluate the model, next-day forecasts are not enough: predictions further ahead in time are needed. Next-day predictions are expected to be close to the ground truth anyway, and can therefore be misleading. Long-term predictions, instead, give a sense of the real model performance, better reflect the real situation and provide more useful estimates. To produce them, I repeatedly shift all the input time series forward by one step, appending the latest prediction, and then make a new forecast for the next time step. Note that in this second case errors are more evident, even if the forecasts seem to capture some of the future trends.
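
The next-month procedure is a recursive forecast. Each prediction is appended to the input window, the oldest row is dropped, and the model is queried again. Here is a minimal sketch. `forecast_recursive` is a hypothetical helper, and a toy "continue the last change" function stands in for the trained network. With daily data, about 30 steps would correspond to one month.

```python
import numpy as np


def forecast_recursive(predict_next, window, steps):
    """Forecast `steps` time steps ahead by feeding each prediction back as input.

    predict_next: callable mapping an (L, F) window to the next (F,) row
                  (the trained network would play this role).
    window:       the last L observed rows, shape (L, F).
    """
    window = np.array(window, dtype=float)
    preds = []
    for _ in range(steps):
        nxt = predict_next(window)
        preds.append(nxt)
        # Shift the window by one step: drop the oldest row and append the prediction.
        window = np.vstack([window[1:], nxt])
    return np.array(preds)


def drift_model(w):
    """Toy stand-in model: each feature repeats its last one-step change."""
    return w[-1] + (w[-1] - w[-2])


history = np.array([[1.0, 10.0], [2.0, 10.5], [3.0, 11.0]])
preds = forecast_recursive(drift_model, history, steps=3)
```

Every step after the first consumes earlier predictions, so any error is fed back into the inputs. This is why errors grow in this second case.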

Improvements

This is maybe the most important part of the article. Here, I propose some suggestions to improve the whole process:

  • Collect more data. Many typical problems in machine learning can be solved by collecting more data. In this article, I used a very small sample to build the neural network. Collecting data at higher frequencies (hourly or by the minute) could greatly help the network improve its results and generalise better.
  • Perform a better selection of the variables. Finding relevant data for the problem is crucial to building good models. Several variables might affect future Bitcoin prices, and it is fundamental to select those that provide valuable information for the task. Is the Dollar/Euro exchange rate affecting the price of Bitcoin? Are trends in other cryptocurrencies affecting the market?
  • Smooth input time series. One problem in predicting prices is the intrinsic noise present in the input time series (if you look at the plot of the volume, you can easily recognise this behaviour). Standard neural networks are just optimized deterministic mappings from input to output spaces, so they struggle with stochastic behaviour in the target variable.
  • Make time series stationary. A stationary time series has constant mean, variance and autocovariance over time. Most statistical forecasting methods assume that the time series can be rendered approximately stationary through mathematical transformations. A stationarized series is relatively easy to forecast: you simply predict that its statistical properties will be the same in the future as they have been in the past!
  • Tune the network’s hyper-parameters. No tuning of the network was performed, but this is a major step in modeling. Starting from the learning rate, you can tune the network to find the optimal parameters for the data you provided.
  • Change the target variable. The network predicts the next-day values of all the input time series. To forecast further ahead in time, I progressively shift the input time series to include the latest predictions and then make a fresh forecast. It is therefore natural that errors accumulate (and grow) as we keep predicting forward in time. Since everybody is interested in knowing whether the price will increase or decrease after, say, one month, you might want to redefine the target variable as a binary output (increasing vs decreasing).
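
Three of these suggestions can be sketched as simple transformations of a price series. These are illustrative choices, not the only options:

- a trailing simple moving average for smoothing (trailing, so no future values leak into past inputs);
- first differences of log prices (log returns), one common transformation towards stationarity;
- an up/down label over a chosen horizon as a binary target.

The function names and the toy prices are hypothetical.

```python
import numpy as np


def moving_average(prices, window):
    """Trailing simple moving average; the output has len(prices) - window + 1 values."""
    c = np.cumsum(np.insert(np.asarray(prices, dtype=float), 0, 0.0))
    return (c[window:] - c[:-window]) / window


def log_returns(prices):
    """First differences of log prices: log(p[t] / p[t-1])."""
    return np.diff(np.log(np.asarray(prices, dtype=float)))


def up_down_labels(prices, horizon):
    """Binary target: 1 if the price `horizon` steps later is higher than today, else 0 (horizon >= 1)."""
    prices = np.asarray(prices, dtype=float)
    return (prices[horizon:] > prices[:-horizon]).astype(int)


prices = np.array([100.0, 102.0, 101.0, 105.0, 110.0, 108.0])  # toy daily closes
smooth = moving_average(prices, 3)
returns = log_returns(prices)
labels = up_down_labels(prices, 1)  # with daily data, horizon=30 would mean roughly one month later
```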

Thank you for reading the article! I hope you enjoyed it. If I find the time (eheheh!), I would like to write a second post where I try to improve the results by applying some of the suggestions above. Let me know in the comments whether this might be a good idea for you (or not :) )! Feel free to ask questions if something is unclear; I will do my best to answer!

Head of Data Science @ Revo. Enthusiastic data scientist, math lover!