RNN with PyTorch

Let's Build an RNN, PyTorch Style!

Nishil Madhani
IET-VIT
7 min read · Aug 21, 2020


Have you ever wondered how Deep Learning tries to mimic the concept of memory from the human brain? I guess it’s a yes because it is the very reason for you being here. So, let’s dive right into the topic of Recurrent Neural Networks which have made this possible!

Why RNNs?

• We all know how Convolutional Neural Networks (CNNs) and other Artificial Neural Networks allow us to analyze the spatial information in a given input image. CNNs excel at tasks that rely on finding spatial and visual patterns in the training data.

• Each time you train the CNN on a new batch, it starts fresh: it doesn't remember what it saw in the previous iteration while processing the current set of data. This is a big disadvantage when identifying correlations, data patterns and temporal dependencies. This is where Recurrent Neural Networks (RNNs) come into the picture.

The distinguishing feature of RNNs:

RNNs have a unique architecture that lets them model memory units (the hidden state). This enables them to persist information across time steps, giving them the ability to model short-term dependencies. For this reason, RNNs are extensively used in time-series forecasting to identify data correlations and patterns.

Architecture of an RNN:

Schematic diagram of an RNN

A simple RNN contains:

· An input layer (x) — the layer into which we feed the data.

· A hidden layer (s) — the layer where the computations on the data happen; the information it carries is stored in a unit known as the memory (hidden state) and is also passed on to the output layer. A rough sketch of this update follows the list.

· An output layer (o) at time t — the layer that receives the processed data from the hidden layer and makes the final predictions on our data.
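
Concretely, the hidden state at time t is computed from the current input and the previous hidden state, which is what gives the network its memory. As a minimal conceptual sketch (the weight names below are made up purely for illustration; this is not the code we will train later), one step of a vanilla RNN with a tanh activation looks like this:

import numpy as np

# One conceptual RNN step: the new hidden state depends on the current input x_t
# AND the previous hidden state h_prev, which is how earlier time steps are "remembered"
def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)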

So, let's create a simple recurrent neural network using PyTorch!

Okay, let's take the first step towards building your own Recurrent Neural Network using one of the most user-friendly deep learning libraries available in Python: PyTorch!

First, let's understand what PyTorch is. PyTorch is an open-source machine learning library and a Python-based scientific computing package.

The most basic intuition behind using PyTorch is that it serves as:

1. A replacement for NumPy to use the power of GPUs.

2. A deep learning research platform that provides maximum flexibility and speed.
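
As a quick illustration of the first point (a hypothetical snippet, not part of this tutorial's code), a NumPy array can be wrapped as a PyTorch tensor and moved to a GPU when one is available:

import numpy as np
import torch

a = np.random.rand(3, 3)        # an ordinary NumPy array
t = torch.from_numpy(a)         # convert it to a PyTorch tensor
if torch.cuda.is_available():   # use the GPU only if one is present
    t = t.to('cuda')
print(t.device)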

Now let's understand how to use PyTorch by creating our very own time-series prediction model using Recurrent Neural Networks.

Prerequisites:

To understand our RNN better, there are a few prerequisites:

1. A basic understanding of the Python programming language.

2. Knowledge of how Artificial Neural Networks work.

Importing the basic libraries required.

The first step is to import all the libraries that will help us create our recurrent neural network with PyTorch.

import torch
from torch import nn
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

The next step would normally be importing the data, but for this example we will create our own. Time-series prediction (stock prices are the classic example) is one of the first stops for deep learning enthusiasts, so here we are going to build a time-series prediction model with an RNN.

seq_length = 20
time_steps = np.linspace(0, np.pi, seq_length + 1)
data = np.sin(time_steps)
data.resize((seq_length + 1, 1))
#size becomes (seq_length+1, 1), adds an input_size dimension

Here, we first create a series of time steps consisting of 21 points at equal intervals in the range 0 to pi; their corresponding sine values make up our dataset.

x = data[:-1] # all but the last piece of data
y = data[1:] # all but the first

x, our input variable, consists of the first 20 sine values of our time steps, and y, our output variable, consists of the values starting from the second point. This means that y is one time step ahead of x: the target at each position is the next value in the sequence.

Visualization of our data points

This is just a glimpse of the time series data that I created in the above step.
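
If you want to reproduce the plot yourself, a snippet along these lines should do (the plotting choices here are my own, not from the original notebook):

plt.plot(time_steps[1:], x, 'r.', label='input, x')   # each red point is the value one step behind
plt.plot(time_steps[1:], y, 'b.', label='target, y')  # the blue curve is the red one shifted by one step
plt.legend(loc='best')
plt.show()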

Building our Recurrent Neural Network:

Finally, we have reached the most awaited step, i.e. building our RNN. So come along and let's have a look at how to implement it in PyTorch!

The code of our RNN is given below for your reference:

class RNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(RNN, self).__init__()

        self.hidden_dim = hidden_dim
        # batch_first means that the first dim of the input and output will be the batch_size
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        # last, fully-connected layer
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden):
        # x (batch_size, seq_length, input_size)
        # hidden (n_layers, batch_size, hidden_dim)
        # r_out (batch_size, seq_length, hidden_dim)
        batch_size = x.size(0)

        # get RNN outputs
        r_out, hidden = self.rnn(x, hidden)
        # shape output to be (batch_size*seq_length, hidden_dim)
        r_out = r_out.view(-1, self.hidden_dim)

        # get final output
        output = self.fc(r_out)

        return output, hidden

We pass our data to the input layer, which consists of 1 node; it is connected to the hidden state with 32 nodes, which eventually leads to the final output layer, again with 1 node. Here we use only 1 recurrent layer.

We then create an object of the RNN class, which we'll name rnn.

input_size=1 
output_size=1
hidden_dim=32
n_layers=1

rnn = RNN(input_size, output_size, hidden_dim, n_layers)
print(rnn)
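
Before training, it is worth sanity-checking the shapes flowing through the model. The dummy input below is just for illustration; the expected sizes in the comments follow from seq_length=20, hidden_dim=32 and n_layers=1:

# quick shape check with a dummy batch (batch_size=1, seq_length, input_size=1)
test_input = torch.Tensor(np.sin(np.linspace(0, np.pi, seq_length))).view(1, seq_length, 1)
test_out, test_hidden = rnn(test_input, None)
print('Input size: ', test_input.size())          # torch.Size([1, 20, 1])
print('Output size: ', test_out.size())           # torch.Size([20, 1]) after the view + fc
print('Hidden state size: ', test_hidden.size())  # torch.Size([1, 1, 32])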

Now for the loss function and optimizer:

Since we are working on a regression problem, we will use mean-squared error to calculate our loss during training. It's typical to use the Adam optimizer for recurrent models.

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.01)

Now it’s time to train our model!

To train the model we define a function train(), which takes the model, n_steps, the number of steps for which it trains the model, and print_every, which determines how often the results are displayed.

def train(rnn, n_steps, print_every):

    # initialize the hidden state
    hidden = None

    for batch_i, step in enumerate(range(n_steps)):
        # defining the training data
        time_steps = np.linspace(step * np.pi, (step + 1) * np.pi, seq_length + 1)
        data = np.sin(time_steps)
        data.resize((seq_length + 1, 1))  # input_size=1

        x = data[:-1]
        y = data[1:]

        x_tensor = torch.Tensor(x).unsqueeze(0)  # unsqueeze gives a 1, batch_size dimension
        y_tensor = torch.Tensor(y)

        # outputs from the rnn
        prediction, hidden = rnn(x_tensor, hidden)

        # Representing Memory #
        # make a new variable for hidden and detach the hidden state from its history
        # this way, we don't backpropagate through the entire history
        hidden = hidden.data

        # calculate the loss
        loss = criterion(prediction, y_tensor)
        # zero gradients
        optimizer.zero_grad()
        # perform backprop and update weights
        loss.backward()
        optimizer.step()

        # display loss and predictions
        if batch_i % print_every == 0:
            print('Loss: ', loss.item())
            plt.plot(time_steps[1:], x, 'r.')  # input
            plt.plot(time_steps[1:], prediction.data.numpy().flatten(), 'b.')  # predictions
            plt.show()

    return rnn

Quite similar to training a regular Neural Network, right? If you are training a network for the first time, you can check out https://medium.com/iet-vit/pytorch-101-an-introduction-to-deep-learning-5797272fa618, which will ease you through it.
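
To actually run the training, we just call the function. The step count and print frequency below are my assumptions, chosen so that five loss values get printed, matching the results reported in the next section:

# train the rnn for 75 steps, printing the loss and a plot every 15 steps
n_steps = 75
print_every = 15
trained_rnn = train(rnn, n_steps, print_every)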

Visualizing the final results:

As we can observe, in the initial stages our model does not make accurate predictions. But after training the model for 75 steps, we are able to get pretty good results.

Loss: 0.11401782184839249

Loss: 0.045131802558898926

Loss: 0.0026211580261588097

Loss: 0.00037287137820385396

Loss: 0.0005231606774032116

Conclusion:

Congratulations on making your very first time-series prediction with a simple RNN. Time-series prediction can be applied to many tasks: think about weather forecasting or predicting the ebb and flow of stock market prices. You can even try to generate predictions much further into the future than just one time step, as sketched below.
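
If you want to experiment with forecasting more than one step ahead, one common approach (a rough sketch under my own assumptions, not part of the original notebook) is to feed the model's prediction back in as the next input:

# roll the trained model forward by repeatedly feeding back its last prediction
def forecast(rnn, seed_seq, n_future):
    preds = []
    hidden = None
    inp = torch.Tensor(seed_seq).view(1, -1, 1)   # (batch=1, seq_len, input_size=1)
    with torch.no_grad():
        for _ in range(n_future):
            out, hidden = rnn(inp, hidden)
            next_val = out[-1]                    # prediction for the last time step
            preds.append(next_val.item())
            inp = next_val.view(1, 1, 1)          # feed it back as a length-1 sequence
    return preds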

“Predicting the future isn’t magic, it’s artificial intelligence.” ~Dave Waters

You can find the Jupyter Notebook with the full Python code here:

~ Written by Nishil Madhani
