Time series forecasting of exchange currency with Tensor Flow

Edward Ortiz
Published in Analytics Vidhya
10 min read · Sep 21, 2020


A case study of a single-step time series forecast model implemented with an RNN-LSTM.

Image credit: raconteur.net

In this article, we will cover the concept of time series, injecting some machine learning along the way. By the end of this read, I promise you will feel more drawn to machine learning and to applying time series forecasting with a neural network model. And who knows? Maybe you will be the next wolf of Wall Street, predicting prices of stocks, ETFs, currency exchange rates, and more.

If you are new to machine learning, feel free to read some of my previous articles, where I explain some complex topics comprehensively, in a way you can understand without feeling frustrated.

The basics

A time series is an ordered sequence of values of a variable at equally spaced time intervals. Applications of time series are now widespread; mostly, these models seek to understand the underlying forces and structure that produced the observed data, fit a model, and proceed to forecasting, monitoring, or even feedback and feedforward control.

Time series analysis accounts for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend, or seasonal variation) that should be accounted for.

The context

Having defined the concept of a time series, it is time to look at which machine learning techniques can enhance time series forecasting. One of the main advantages of applying neural networks to real problems is that we can build a model that trains itself and is not limited to a specific range of data. In other words, our model can receive as many parameters as we want, as long as the computational expense can be sustained by the machine.

Another advantage of using neural networks for time series is that the input is stored in the network itself rather than in a database, so the model is robust to loss of data. The most common neural networks used for time series prediction are convolutional neural networks, deep neural networks, and recurrent neural networks.

Another particular aspect of working with ANNs (artificial neural networks) is that, when forecasting, the model can be trained with a single-feature method, in which the model trains on and predicts one variable versus multiple variables. This one-to-many evaluation is the first step when implementing an ANN for time series. If you want to take the model a step further, the other technique is forecasting over multiple steps.

Multi-step forecasting with an ANN can perform a single-shot prediction of all the parameters at once. For example, in a forecast for a logistics company, the model can capture the correlations of multiple variables (e.g., supply, demand, stock, and production) at the same time, and between them. It may seem unnecessary or incoherent, but depending on the problem to solve, multi-step time series forecasting with an ANN can give specific insights that a traditional model cannot deliver; it can even make one prediction at a time and feed the output back into the model.

The problem

Bitcoin is surrounded by mystery about who created this digital currency; it is also a highly volatile asset, and some governments still forbid the use of this currency because of its decentralized commerce model. Nevertheless, the hype, and the very fact of being a decentralized currency, has kept this digital currency alive, and now banks are investing in it and have started adopting it to offer bitcoin-related services.

In 2018, Bitcoin's transaction price peaked, and many people started looking for ways to learn more about the trend of this currency, hoping to invest and earn a large profit on that investment. We are going to use a dataset from coinbase.com containing minute-by-minute historical data of BTC transactions in US dollars from 2017 to 2020. Based on that, we are going to use an ANN to predict the close price of BTC for the next hour, based on the last 24 hours.

EDA and Data Engineering

The dataset from Coinbase contains around 1,485,982 records, comprising the open price, close price, high, low, volume in BTC, volume in USD, weighted price, and transaction time, at one-minute intervals. This is a lot of data, and we first need to explore it to gather as much accurate information about it as possible.

This EDA (exploratory data analysis) revealed a lot of NaN values, but those NaN values belonged to the first four years, when Bitcoin was not as popular as it is today. Taking this into account, the first step in handling this data is to use the records from 01-01-2017 until the last date, 02-02-2020.

Because the problem intends to forecast the close price of BTC for each hour based on the last 24 hours, the data is resampled to hourly intervals, from 01:00:00 to 00:00:00. This will reduce the dataset, but the smaller number of records will not affect the performance of the model.
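The date filtering and hourly resampling described above can be sketched with pandas. This is a minimal sketch: it assumes the raw Coinbase CSV has a Unix-epoch `Timestamp` column and the OHLC/volume column names shown here, which are assumptions about the file layout rather than verified names.

```python
import pandas as pd

# Minimal sketch of the cleaning step, assuming a Unix-epoch 'Timestamp'
# column and OHLC/volume columns (column names are assumptions).
def clean_and_resample(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='s')
    df = df.set_index('Timestamp')
    # Drop the NaN-heavy early years: keep 2017-01-01 through 2020-02-02.
    df = df.loc['2017-01-01':'2020-02-02']
    # Resample minute records into hourly bars.
    hourly = df.resample('1H').agg({
        'Open': 'first', 'High': 'max', 'Low': 'min',
        'Close': 'last', 'Volume_(BTC)': 'sum',
    })
    # Forward-fill short gaps, then drop any remaining empty hours.
    return hourly.ffill().dropna()
```

The aggregation mirrors how OHLC bars are usually rolled up: first open, max high, min low, last close, summed volume.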

Table 1: EDA basic data of coinbase
Line plot: time series per hour BTC

At this stage, further exploration is possible, from a correlation heat map to a distribution analysis, to identify other patterns that can help train a real ANN model.

Setting up the model: data splitting & normalization

Once the initial exploration and cleaning of the data is done, it is time to decide which final data to feed into the neural network model. In this case, the volume-currency column was dropped because it had the least correlation with the other variables. The new total of records to be evaluated with the neural network model is 17,687.

The data was split into 70% for training, 20% for validation, and 10% for testing. This proportion is a standard one, but if you want to work with an 80%/20% ratio instead, that is also possible. The final record counts are:

# Split the hourly records chronologically: 70% train, 20% validation, 10% test.
column_indices = {name: i for i, name in enumerate(df.columns)}
n = len(df)
train_df = df[0:int(n*0.7)]
val_df = df[int(n*0.7):int(n*0.9)]
test_df = df[int(n*0.9):]
  • Train set: 12,380 records with six features
  • Valid set: 3,538 records with six features
  • Test set: 1,769 records with six features

It is also important to scale the data (normalization). This can be done by subtracting the mean and dividing by the standard deviation, using the statistics of the training set only, so no information leaks from the validation and test sets.

train_mean = train_df.mean()
train_std = train_df.std()
train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std

This graphic shows that only the Volume_BTC column has a long tail; since this feature is not the most relevant one, it will not affect the training of the model.

Violin plot of normalized dataset BTC

Forecasting with TF: Introducing Data Windowing

Once the dataset is split and normalized, the next step is to define the variable to be evaluated in a single-step model. For the time series forecasting, the library used in this implementation is TensorFlow, and a process called data windowing needs to be done first.

Data windowing means creating a window (array) that will contain a number of time steps (the width) of the features. That means that if we are going to predict the close price of BTC each hour using the last 24 hours of transaction records, the input width is going to be 24. The following graphical example gives an intuition of this.

A window that makes a prediction 1h into the future, given 6h of history. Image credit: TensorFlow.org
w1 = WindowGenerator(input_width=24, label_width=1, shift=1,
                     label_columns=['Close'])
w1
Total window size: 25
Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Label indices: [24]
Label column name(s): ['Close']

This piece of code shows the output of implementing the data window, taking the main feature (close price) as the label column.

Having identified the window to construct, the next step is to split the window into pairs of features and labels. The most efficient way to achieve this is to create a Python class named WindowGenerator and then generate batches of these windows from the training, validation, and test data using tf.data.Dataset.

Two methods were also created, split_window and make_dataset. Both of them handle the label_columns defined in the WindowGenerator class, so they can serve single-step training with the ANN.
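As a rough NumPy analogue of what split_window does (the function name make_windows and its defaults here are illustrative, not the tutorial's API): each window of input_width + shift consecutive hours is split into 24 input steps and one label step.

```python
import numpy as np

# Illustrative NumPy analogue of the windowing/splitting idea: slide a
# window of (input_width + shift) steps over the series, then split each
# window into inputs (the first 24 steps) and a label (the step 1h ahead).
def make_windows(series: np.ndarray, input_width: int = 24, shift: int = 1):
    total = input_width + shift                              # 25 for w1 above
    windows = np.lib.stride_tricks.sliding_window_view(series, total)
    inputs = windows[:, :input_width]                        # last 24 hours
    labels = windows[:, input_width + shift - 1]             # close 1h ahead
    return inputs, labels
```

In the real implementation, make_dataset wraps this logic in tf.data.Dataset batches over the train, validation, and test splits.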

Finally, the model is going to compute three different scenarios in which the prediction will take place.

Implement the architecture in a single step model RNN-LSTM

Recently, for business decisions, logistics monitoring, and financial prediction, among other uses, the complexity of forecasting problems has demanded clever solutions. With the continuous improvement of ANN models, and by applying them properly, a robust alternative is now available to predict accurately and to extract unseen features and relationships in this kind of time series problem.

The ANN chosen for this example was an RNN using an LSTM network. LSTM (Long Short-Term Memory) is a modified version of an RNN (recurrent neural network) that can remember past data more easily. LSTMs are popular for predicting time series with time lags of unknown duration. Training is done using back-propagation. Below is a diagram of an RNN-LSTM.

LSTM diagram Image credit: mahnerak

For this implementation, the model consists of an LSTM layer with 16 units, optimized with Adam. Because a window was generated (24-hour steps), this window determines the input dimensions of the cells of the network. The model also has a dense layer for the output of the prediction.

Finally, a callback was set up to track the MSE (mean squared error) loss and to keep the most accurate checkpoint, i.e., the one with the smallest validation error over the complete training.
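A minimal tf.keras sketch of this architecture follows. Everything beyond the 16 LSTM units, the Adam optimizer, the MSE loss, and the best-checkpoint callback is an assumption (in particular, the number of features and the checkpoint file name are illustrative):

```python
import tensorflow as tf

num_features = 6  # assumption: six columns after dropping volume currency

# Single-step RNN-LSTM: 16 LSTM units over a (24, num_features) input
# window, with a Dense layer producing the one-hour-ahead prediction.
inputs = tf.keras.Input(shape=(24, num_features))
x = tf.keras.layers.LSTM(16)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
lstm_model = tf.keras.Model(inputs, outputs)

lstm_model.compile(
    loss=tf.keras.losses.MeanSquaredError(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=[tf.keras.metrics.MeanAbsoluteError()],
)

# Callback that keeps the weights with the smallest validation loss.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_lstm.weights.h5', monitor='val_loss',
    save_best_only=True, save_weights_only=True,
)
# history = lstm_model.fit(train_ds, validation_data=val_ds,
#                          epochs=20, callbacks=[checkpoint])
```

The fit call is commented out since it depends on the windowed tf.data datasets built earlier.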

Baseline result

The baseline result is the time series forecast of the dataset without any ANN model implemented. The absolute error of this training was 0.0051. To understand these diagrams, look at the pattern between the label points and the predictions: when two data points are connected, it means the close price at that point will occur according to the computed MSE.
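A baseline of this kind can be sketched as a naive last-value predictor: for each 24-hour window, predict that the next hour's close equals the last observed close. This is a minimal NumPy sketch of that idea, not the exact baseline used here:

```python
import numpy as np

# Naive baseline: the forecast for the next hour is simply the last
# observed (normalized) close price in the input window.
def baseline_predict(inputs: np.ndarray) -> np.ndarray:
    # inputs has shape (num_windows, input_width); return the last step.
    return inputs[:, -1]

def mse(pred: np.ndarray, labels: np.ndarray) -> float:
    return float(np.mean((pred - labels) ** 2))
```

Comparing this loss against the trained network's loss is what the performance section below does.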

However, the pattern differs across the three scenarios. A way to improve this is to merge the three predictions into a single line plot to see which point is most likely to connect; that point should be the investment momentum (the prediction) of the model.

Baseline Time series forecasting BTC predictions

RNN-LSTM result

The ANN model used for this forecast scored 0.0089. There is not a considerable difference compared with the baseline model, except in the third line plot, where the trend of the BTC is not relevant. The hypothesis for this is that some of the inputs are not highly correlated with the close price of BTC (the parameter defined for prediction), so when the data was normalized, some outliers affected the training. You could remove the columns with weak or no correlation, but the idea of this technique is to create a prediction based on evaluating multiple variables against the close price of BTC.

RNN-LSTM Time series forecasting BTC predictions

Performance of the model

Performance Baseline vs RNN-LSTM BTC

The baseline model, with no ANN implementation, got the better MSE loss at 0.0051, versus 0.0089 for the RNN-LSTM. Both models perform well, and notice that the RNN-LSTM is a much more complex model that takes into account the regression of multiple variables; hence the small difference in loss with respect to the naive implementation.

The idea of comparing both models is to understand the behavior of a traditional implementation versus a machine learning approach and, based on both models, to create a strategy that can support the forecast decision: in this case, the advance purchase decision for bitcoin 24 hours into the future.

Final thoughts

Time series forecasting is a hot topic these days, and people seem to think that mastering this technique is the dream job, the way to become the wolf of Wall Street. However, time series prediction still has limitations: a naive machine learning implementation only computes over the historical record fed into the model. In a real-life scenario, as in the stock or currency markets, volatility is wild, a kind of entropy that is quite hard to measure for this specific case. So anything new that happens outside the environment the model was trained in will produce an output different from the one predicted.

I hope you liked this fun article, and if you get rich with this, buy me a coffee, and why not donate me a bitcoin? I will be motivated to do more crazy predictions using ML.

Skimming the code

Feel free to use the code of this example and try to understand the intuition, so you can implement time series forecasting yourself on currencies, ETFs, or stocks.

