Time Series Forecasting — LSTM

Venkatakrishna Reddy · Published in Analytics Vidhya · 5 min read · Mar 6, 2020

In this blog, we will understand the concept of RNN networks, the different types of RNN architectures available, and their practical implementation. We will also see a performance comparison between RNN and ANN networks.

RNN Network: The idea behind RNNs is to make use of sequential information. In a traditional neural network, we assume that all inputs (and outputs) are independent of each other, but for many tasks that is a very bad idea. If you want to predict the next word in a sentence, you had better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. In other words, they have a “memory” which captures information about what has been calculated so far.

RNN Architecture

Applications

Different types of RNNs

  1. One to One: This network typically has one input and one output node. It can have ’n’ number of hidden nodes. This network is used in predicting the share price of a company/stock.
  2. One to Many: This network typically has one input and multiple output nodes. It can have ’n’ number of hidden nodes. This network is used in image captioning.
  3. Many to One: This network typically has multiple inputs and a single output node. It can have ’n’ number of hidden nodes. This network is used for sentiment analysis.
  4. Many to Many: This network typically has multiple inputs and multiple output nodes. It can have ’n’ number of hidden nodes. This network is used for language translation.

Disadvantages of Recurrent Neural Network

  1. Vanishing Gradient and Exploding Gradient.
  2. Transfer learning is not possible.
  3. Training an RNN is a very difficult task.

Let’s quickly see what vanishing and exploding gradients are and how to solve these problems. When the weights are updated too slowly as we propagate back from the last layers to the first layers, we call it the vanishing gradient problem. We can mitigate it by using LSTM or GRU networks and by introducing dropout in the layers. An exploding gradient is the opposite problem, where the weights are updated too quickly, which causes training to diverge.

LSTM — Long Short Term Memory: Remembering information for long periods of time is the default behavior of LSTMs. This network has three gates that help it remember long sequences.

Forget gate, Input gate, and Output gate

The forget gate decides what information should be remembered or forgotten. While dealing with long sentences, it is important to keep the old state until a new state occurs; for example, the network should forget the gender of the previous person once a new person is being discussed. The input gate decides what new information has to be added to the network. The output gate combines the results of the other gates and forwards the response to the next layer.

GRU has two gates, the update and reset gates. It is less complex and takes fewer operations, hence it is much faster than LSTM. Generally, two layers have been shown to be enough to detect more complex features. More layers can be better, but they are also harder to train. As a general rule of thumb, one hidden layer works for simple problems, and two are enough to find reasonably complex features.

So far we discussed the RNN architecture and LSTM. Now let’s jump into my favorite and exciting part: practical problem solving using LSTM.

Problem: Predicting the Open, Close, High and Low values of shares using LSTM. It is a Many to Many problem, and here I will explain each and every step of the implementation. This is implemented using PyTorch, and we used 50 years of data for this prediction.

Importing necessary packages

warnings.filterwarnings('ignore')
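The rest of the import cell is not reproduced above; a minimal sketch of the packages the later steps rely on (the exact cell in the original notebook may differ) would be:

import warnings            # needed for the filterwarnings call above
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader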

Making use of CUDA for faster processing

device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')

This line helps run the process on the GPU at a higher speed. If we don't specify CUDA, it will run on the CPU by default.
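As a quick hedged illustration (with model and inputs standing in for the objects created in the next sections), everything that takes part in the forward pass has to be moved to that device:

model = model.to(device)     # move the model parameters to the GPU if one is available
inputs = inputs.to(device)   # tensors fed to the model must be on the same device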

Creating Custom dataset

First, we selected the necessary columns; then we applied feature scaling to the data. PyTorch has no built-in packages for feature scaling, so we have to write custom functions to do this. The most important step is to create batches of data: here we use the last thirty samples to predict the next output, and the data is built accordingly (see the sketch below).
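The dataset code itself is not reproduced here; a minimal sketch of the idea, building on the imports above (hand-written min-max scaling and a sliding window of 30 time steps; the column names, file name and window size are my assumptions, not the author's exact code), could look like this:

class StockDataset(Dataset):
    def __init__(self, csv_path, window=30):
        df = pd.read_csv(csv_path)
        data = df[['Open', 'High', 'Low', 'Close']].values.astype('float32')
        # hand-written min-max scaling, since PyTorch has no built-in scaler
        self.data_min = data.min(axis=0)
        self.data_max = data.max(axis=0)
        self.data = (data - self.data_min) / (self.data_max - self.data_min)
        self.window = window

    def __len__(self):
        return len(self.data) - self.window

    def __getitem__(self, idx):
        # the last `window` samples are the input, the next sample is the target
        x = torch.tensor(self.data[idx:idx + self.window])   # shape (30, 4)
        y = torch.tensor(self.data[idx + self.window])        # shape (4,)
        return x, y

train_loader = DataLoader(StockDataset('shares.csv'), batch_size=64, shuffle=False)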

Creating Model

We created LSTM layers followed by Linear layers. The input and output size is 4 in this case, as we are predicting the Open, Close, Low and High values. LSTM uses a hidden state and a cell state to store the previous output, so we defined h0 and c0. The forward function is a default function used to pass data from one layer to the next.
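The model definition is not shown above; one hedged way to write it that matches this description (LSTM layers followed by a Linear layer, input and output size 4, h0 and c0 created in forward; the hidden size and number of layers are my own choices) is:

class StockLSTM(nn.Module):
    def __init__(self, input_size=4, hidden_size=64, num_layers=2, output_size=4):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # h0 and c0 are the initial hidden and cell states for every sequence in the batch
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        out, _ = self.lstm(x, (h0, c0))
        # the output at the last time step predicts the next Open/High/Low/Close values
        return self.linear(out[:, -1, :])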

Training Model

This function is used for training; the steps consist of predicting the output, calculating the loss, backpropagation, and parameter optimization.

The training process starts here. We apply backpropagation at each batch to make the training more effective; this process is called stochastic gradient descent. We calculate the loss for each batch and each epoch (see the sketch below).
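The training code is not reproduced above; a hedged sketch of the per-batch loop described here (MSE loss, the Adam optimizer and the epoch count are my assumptions) would be:

model = StockLSTM().to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
num_epochs = 50   # assumption; the original epoch count is not stated

for epoch in range(num_epochs):
    model.train()
    epoch_loss = 0.0
    for x_batch, y_batch in train_loader:
        x_batch, y_batch = x_batch.to(device), y_batch.to(device)
        optimizer.zero_grad()              # clear gradients from the previous batch
        y_pred = model(x_batch)            # predict the next day's values
        loss = criterion(y_pred, y_batch)  # calculate the loss
        loss.backward()                    # backpropagation
        optimizer.step()                   # parameter optimization
        epoch_loss += loss.item()
    print(f'epoch {epoch + 1}: loss {epoch_loss / len(train_loader):.6f}')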

Validating Model

In the validation phase, we shouldn't do backpropagation. The model and the inputs should be moved to the GPU for faster processing: the GPU can only access GPU memory, so all inputs required by the model have to be sent to the GPU.
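A hedged sketch of validation under those constraints (torch.no_grad() to skip gradient tracking and everything moved to the same device; val_loader is assumed to be built the same way as train_loader) might be:

model.eval()
val_loss = 0.0
with torch.no_grad():   # no backpropagation in the validation phase
    for x_batch, y_batch in val_loader:
        # the GPU can only read GPU memory, so the inputs go to the same device as the model
        x_batch, y_batch = x_batch.to(device), y_batch.to(device)
        y_pred = model(x_batch)
        val_loss += criterion(y_pred, y_batch).item()
print(f'validation loss: {val_loss / len(val_loader):.6f}')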

Performance Comparison

LSTM Predictions
ANN Predictions

From the above predictions, we can clearly see that the RNN works much better than the ANN for time-series data.

Conclusion: Time series forecasting is one of the most interesting and exciting domains in the Deep Learning space. It is used in Retail, Healthcare, Agriculture, Banking, Security and many other industries. I hope the concepts presented here are clear enough to get you started in the Time series domain. In the next blogs, I will explain more techniques and their implementations.
