In this tutorial, I will first teach you how to build a recurrent neural network (RNN) with a single layer, consisting of one single neuron, with PyTorch and Google Colab. I will also show you how to implement a simple RNN-based model for image classification.
This work is heavily inspired by Aurélien Géron’s book “Hands-On Machine Learning with Scikit-Learn and TensorFlow”. Although his neural network implementations are purely in TensorFlow, I adopted some of his notation and variable names and implemented everything using PyTorch only. I really enjoyed his book and learned a lot from his explanations. His work inspired this tutorial and I strongly recommend the book.
We will be using Google Colab so we need to manually install the PyTorch library first. You can do this by using the following command:
Now we can import the necessary libraries we will use in the tutorial:
RNN with A Single Neuron
The idea of this tutorial is to show you the basic operations necessary for building an RNN architecture using PyTorch. This guide assumes you have knowledge of basic RNNs and that you have read the tutorial on building neural networks from scratch using PyTorch. I will try to review RNNs wherever possible for those that need a refresher but I will keep it minimal.
First, let’s build the computation graph for a single-layer RNN. Again, we are not concerned with the math for now; I just want to show you the PyTorch operations needed to build your RNN models.
For illustration purposes, this is the architecture we are building:
And here is the code:
In the above code, I have implemented a simple one-layer, one-neuron RNN. I initialized two weight matrices, Wx and Wy, with values from a normal distribution. Wx contains connection weights for the inputs of the current time step, while Wy contains connection weights for the outputs of the previous time step. We also added a bias term. The forward function computes two outputs, one for each time step. Note that we are using tanh as the non-linearity (activation function) via torch.tanh(...).
As for the input, we are providing 4 instances, with each instance being a sequence of two time steps.
For illustration purposes, this is how the data is being fed into the RNN model:
And this is the code to test the model:
After we have fed the input into the computation graph, we obtain outputs for each time step (Y0, Y1), each of size 4 X 1, which represents the batch size and the number of hidden units, respectively. (See output below)
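The model and the test described above can be sketched roughly as follows. This is a minimal sketch rather than the exact listing: the class name SingleRNN and the weights Wx and Wy come from the text, while the bias name b, the number of input features (3), and the random input values are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class SingleRNN(nn.Module):
    """One recurrent layer unrolled over exactly two time steps."""

    def __init__(self, n_inputs, n_neurons):
        super().__init__()
        # Plain tensors drawn from a normal distribution; nothing is trained here.
        self.Wx = torch.randn(n_inputs, n_neurons)   # current input -> layer
        self.Wy = torch.randn(n_neurons, n_neurons)  # previous output -> layer
        self.b = torch.zeros(1, n_neurons)           # bias (name is an assumption)

    def forward(self, X0, X1):
        # Time step 0: there is no previous output yet.
        Y0 = torch.tanh(torch.mm(X0, self.Wx) + self.b)
        # Time step 1: combine the previous output with the new input.
        Y1 = torch.tanh(torch.mm(Y0, self.Wy) + torch.mm(X1, self.Wx) + self.b)
        return Y0, Y1


N_INPUTS = 3   # assumed number of features per instance
N_NEURONS = 1  # a single neuron

# 4 instances, each observed at two time steps (t=0 and t=1).
X0_batch = torch.randn(4, N_INPUTS)
X1_batch = torch.randn(4, N_INPUTS)

model = SingleRNN(N_INPUTS, N_NEURONS)
Y0_val, Y1_val = model(X0_batch, X1_batch)
print(Y0_val.shape, Y1_val.shape)  # torch.Size([4, 1]) torch.Size([4, 1])
```

Because the weights are random, the values differ from run to run, but both outputs always have one row per instance and one column per neuron.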
Increasing Neurons in RNN Layer
Next, I will show you how to generalize the RNN we have just built so that the single layer supports n neurons. In terms of the architecture, nothing really changes since we have already parameterized the number of neurons in the computation graph we have built. However, the size of the output changes since we have changed the number of units (i.e., neurons) in the RNN layer.
Here is an illustration of what we will build:
And here is the code:
Now when we print the outputs produced for each time step, each is of size 4 X 5, which represents the batch size and the number of neurons, respectively.
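To see which shapes depend on the number of neurons, here is a small functional sketch of the same two-step computation. The helper name two_step_rnn and the input size of 3 features are assumptions made for illustration, not names from the tutorial's code.

```python
import torch


def two_step_rnn(X0, X1, Wx, Wy, b):
    """Unroll a basic RNN layer over two time steps (hypothetical helper)."""
    Y0 = torch.tanh(X0 @ Wx + b)
    Y1 = torch.tanh(Y0 @ Wy + X1 @ Wx + b)
    return Y0, Y1


batch_size, n_inputs = 4, 3  # assumed sizes for illustration

for n_neurons in (1, 5):
    Wx = torch.randn(n_inputs, n_neurons)   # grows in its second dimension
    Wy = torch.randn(n_neurons, n_neurons)  # grows in both dimensions
    b = torch.zeros(1, n_neurons)
    X0 = torch.randn(batch_size, n_inputs)
    X1 = torch.randn(batch_size, n_inputs)
    Y0, Y1 = two_step_rnn(X0, X1, Wx, Wy, b)
    print(n_neurons, tuple(Wx.shape), tuple(Wy.shape), tuple(Y1.shape))
```

Only the weight and output shapes change with n_neurons; the computation itself is identical.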
PyTorch Built-in RNN Cell
If you take a closer look at the BasicRNN computation graph we have just built, it has a serious flaw. What if we wanted to build an architecture that supports extremely large inputs and outputs? The way it is currently built, it would require us to individually compute the outputs for every time step, increasing the lines of code needed to implement the desired computation graph. Below I will show you how to consolidate and implement this more efficiently and cleanly using the built-in RNNCell module.
Let’s first try to implement this informally to analyze the role RNNCell plays:
With the above code, we have basically implemented the same model that was implemented in BasicRNN. nn.RNNCell(...) (that is, torch.nn.RNNCell) does all the magic of creating and maintaining the necessary weights and biases for us. It accepts a tensor as input and outputs the next hidden state for each element in the batch. Read more about this module in the PyTorch documentation.
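As a rough illustration of that behavior, the sketch below calls nn.RNNCell once per time step and feeds each hidden state back in. The sizes and variable names are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration.
n_steps, batch_size, n_inputs, n_neurons = 2, 4, 3, 5

rnn_cell = nn.RNNCell(n_inputs, n_neurons)  # tanh non-linearity by default

X_batch = torch.randn(n_steps, batch_size, n_inputs)  # time-major input
hx = torch.zeros(batch_size, n_neurons)               # initial hidden state

outputs = []
for t in range(n_steps):
    # One call per time step: (input at t, previous hidden) -> next hidden.
    hx = rnn_cell(X_batch[t], hx)
    outputs.append(hx)

print(len(outputs), outputs[-1].shape)  # 2 torch.Size([4, 5])
```

The cell owns its weights (for example rnn_cell.weight_ih and rnn_cell.weight_hh), so nothing like Wx or Wy has to be created by hand.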
Now, let’s formally build the computation graph using the same information we used above.
The output of the above code is as follows:
You can see how the code is much cleaner since we don’t need to explicitly operate on the weights as shown in the previous code snippet; everything is handled implicitly and elegantly behind the scenes by PyTorch.
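A module along these lines might look like the sketch below. The names CleanBasicRNN and self.rnn appear in this tutorial's update notes; the constructor arguments, the zero initial state, and the stacked return value are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class CleanBasicRNN(nn.Module):
    """Wraps nn.RNNCell and unrolls it over however many steps the input has."""

    def __init__(self, n_inputs, n_neurons):
        super().__init__()
        self.n_neurons = n_neurons
        self.rnn = nn.RNNCell(n_inputs, n_neurons)

    def forward(self, X):
        # X: n_steps X batch_size X n_inputs
        hx = torch.zeros(X.size(1), self.n_neurons, device=X.device)
        outputs = []
        for t in range(X.size(0)):
            hx = self.rnn(X[t], hx)  # weights and bias live inside self.rnn
            outputs.append(hx)
        # All per-step outputs plus the final hidden state.
        return torch.stack(outputs), hx


model = CleanBasicRNN(n_inputs=3, n_neurons=5)
X_batch = torch.randn(7, 4, 3)  # 7 time steps, 4 instances, 3 features (assumed)
outputs, hx = model(X_batch)
print(outputs.shape, hx.shape)  # torch.Size([7, 4, 5]) torch.Size([4, 5])
```

The loop length comes from the input itself, so the same module handles 2 or 200 time steps without any extra lines of code.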
RNN for Image Classification
Now that you have learned how to build a simple RNN from scratch and using the built-in RNNCell module provided in PyTorch, let’s do something more sophisticated and special.
Let’s try to build an image classifier using the MNIST dataset. The MNIST dataset consists of images of handwritten digits from 0 to 9. Essentially, we want to build a classifier that predicts the digit shown in each image. I know this sounds strange, but you will be surprised by how well RNNs perform on this image classification task.
In addition, we will also be using the RNN module instead of the RNNCell module since we want to generalize the computation graph to be able to support n layers as well. We will only use one layer in the following computation graph, but you can experiment with the code later on by adding more layers.
Importing the dataset
Before building the RNN-based computation graph, let’s import the MNIST dataset, split it into test and train portions, do a few transformations, and further explore it. You will need the following PyTorch libraries and lines of code to download and import the MNIST dataset to Google Colab.
The code above loads and prepares the dataset to be fed into the computation graph we will build later on. Take a few minutes to play around with the code and understand what is happening. Notice that we needed to provide a batch size. This is because trainloader and testloader are iterables that yield one minibatch at a time, which makes it easier to iterate over the dataset and train our RNN model with minibatches.
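To make the role of the batch size concrete, here is a small sketch that uses random tensors as a stand-in for MNIST so that it runs without downloading anything. The batch size of 64 and the dataset size of 256 are assumptions; in the tutorial the dataset would come from torchvision instead of TensorDataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 64  # assumed value

# Random stand-in for MNIST: 1 x 28 x 28 images with labels 0-9.
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
trainset = TensorDataset(images, labels)

trainloader = DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True)

# Each iteration over the loader yields one minibatch of images and labels.
batch_images, batch_labels = next(iter(trainloader))
print(batch_images.shape, batch_labels.shape)
print(len(trainloader))  # number of minibatches per epoch: 256 / 64 = 4
```

A training loop then simply iterates over trainloader and processes one minibatch per step.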
Exploring the dataset
Here are a few lines of code to explore the dataset. I won’t cover much of what’s going on here, but you can take some time to look at it yourself.
And the output of the code is a batch of images displayed on a grid:
Let’s construct the computation graph. Below are the parameters:
And finally, here is a figure of the RNN-based classification model we are building:
And here is the code for the model:
The ImageRNN model is doing the following:
- The initialization function __init__(...) declares a few variables, and then a basic RNN layer, basic_rnn, followed by a fully-connected layer.
- The init_hidden function initializes the hidden state with zero values.
- The forward function accepts an input of size n_steps X batch_size X n_inputs. The data flows through the RNN layer and then through the fully-connected layer.
- The output represents the unnormalized class scores (logits) of the model, which the loss function later turns into log probabilities.
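The steps above can be sketched as follows. This is a hedged reconstruction, not the exact listing: ImageRNN, basic_rnn, init_hidden, and forward are named in the text, while the attribute name FC, the hidden size of 150, and the random stand-in batch are assumptions. The values 28, 28, and 10 follow from treating each 28 x 28 MNIST image as 28 rows of 28 pixels with 10 digit classes.

```python
import torch
import torch.nn as nn

N_STEPS = 28     # one image row per time step
N_INPUTS = 28    # pixels per row
N_NEURONS = 150  # assumed hidden size
N_OUTPUTS = 10   # digit classes


class ImageRNN(nn.Module):
    def __init__(self, n_steps, n_inputs, n_neurons, n_outputs):
        super().__init__()
        self.n_steps = n_steps
        self.n_neurons = n_neurons
        self.basic_rnn = nn.RNN(n_inputs, n_neurons)  # single layer, time-major
        self.FC = nn.Linear(n_neurons, n_outputs)     # attribute name is assumed

    def init_hidden(self, batch_size, device):
        # Shape: num_layers X batch_size X n_neurons, all zeros.
        return torch.zeros(1, batch_size, self.n_neurons, device=device)

    def forward(self, X):
        # X: n_steps X batch_size X n_inputs
        hidden = self.init_hidden(X.size(1), X.device)
        _, hidden = self.basic_rnn(X, hidden)
        # hidden[-1] is the final state of the last layer: batch_size X n_neurons.
        return self.FC(hidden[-1])


model = ImageRNN(N_STEPS, N_INPUTS, N_NEURONS, N_OUTPUTS)
images = torch.rand(64, 1, 28, 28)  # random stand-in for one MNIST batch
inputs = images.view(-1, N_STEPS, N_INPUTS).permute(1, 0, 2)  # rows as time steps
logits = model(inputs)
print(logits.shape)  # torch.Size([64, 10])
```

Running a single batch through the untrained model like this is a quick way to confirm the dimensions before training.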
Testing the model with some samples
A very good practice encouraged by PyTorch developers throughout their documentation, and which I really like and highly recommend, is to always test the model with a portion of the dataset before actual training. This is to ensure that you have the correct dimensions specified and that the model is producing the information you expect. Below I show an example of how to test your model:
Now let’s look at the code for training the image classification model. But first, let’s declare a few helper functions needed to train the model:
Before training a model in PyTorch, you can programmatically specify what device you want to use during training; the torch.device(...) function tells the program that we want to use the GPU if one is available, otherwise the CPU will be the default device.
Then we create an instance of the model, ImageRNN(...), with the proper parameters. The criterion represents the function we will use to compute the loss of the model. The nn.CrossEntropyLoss() function basically applies a log softmax followed by a negative log likelihood loss operation over the output of the model. To compute the loss, the function therefore needs both the raw model outputs (logits) and the targets. We will see later in our code how to provide these to the criterion.
For training, we also need an optimization algorithm which helps to update weights based on the current loss. This is achieved with the optim.Adam optimization function, which requires the model parameters and a learning rate. Alternatively, you can also use optim.SGD or any other optimization algorithm that's available.
The get_accuracy(...) function simply computes the accuracy of the model given the model outputs and target values. As an exercise, you can write code to test this function as we did with the model before.
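The pieces described above can be checked on a tiny hand-made example. The sketch below assumes one possible implementation of get_accuracy (argmax over the class dimension, returned as a percentage) and uses made-up scores for three classes; it also verifies that nn.CrossEntropyLoss matches a log softmax followed by a negative log likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Use the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
criterion = nn.CrossEntropyLoss()


def get_accuracy(logits, target):
    """Percentage of rows whose highest-scoring class equals the target."""
    predictions = logits.argmax(dim=1)
    return 100.0 * (predictions == target).float().mean().item()


# Made-up scores for 4 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.1, 3.0],
                       [1.5, 0.3, 0.2],
                       [0.1, 0.2, 0.3]])
target = torch.tensor([0, 2, 1, 2])

loss = criterion(logits, target)
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(loss, manual))  # True
print(get_accuracy(logits, target))  # 75.0 (3 of 4 predictions are correct)
```

Note that the criterion receives raw scores; applying a softmax yourself before calling it would apply the normalization twice.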
Let’s put everything together and train our image classification model:
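As a rough outline of what such a training loop involves, here is a self-contained sketch that trains on random stand-in data so it runs without downloading MNIST. TinyRNNClassifier and train_one_epoch are hypothetical names, and the hidden size, learning rate, batch size, and epoch count are assumptions; with random labels the accuracy is expected to stay near chance level.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


class TinyRNNClassifier(nn.Module):  # hypothetical stand-in for ImageRNN
    def __init__(self, n_inputs=28, n_neurons=32, n_outputs=10):
        super().__init__()
        self.rnn = nn.RNN(n_inputs, n_neurons)
        self.fc = nn.Linear(n_neurons, n_outputs)

    def forward(self, X):        # X: n_steps X batch_size X n_inputs
        _, hidden = self.rnn(X)  # initial hidden state defaults to zeros
        return self.fc(hidden[-1])


def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one pass over the loader; return mean loss and accuracy in percent."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        inputs = images.view(-1, 28, 28).permute(1, 0, 2)  # rows as time steps
        optimizer.zero_grad()             # clear gradients from the last step
        logits = model(inputs)
        loss = criterion(logits, labels)  # expects raw scores and class indices
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, 100.0 * correct / seen


torch.manual_seed(0)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Random stand-in data; swap in the MNIST training loader for real training.
data = TensorDataset(torch.rand(128, 1, 28, 28), torch.randint(0, 10, (128,)))
loader = DataLoader(data, batch_size=32, shuffle=True)

model = TinyRNNClassifier().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(3):
    loss, acc = train_one_epoch(model, loader, criterion, optimizer, device)
    print(f"Epoch {epoch}: loss {loss:.4f}, accuracy {acc:.2f}%")
```

Keeping the per-epoch logic in one function makes it easy to reuse the same loop shape for evaluation by dropping the backward pass and optimizer step.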
We can also compute accuracy on the testing dataset to test how well the model performs on the image classification task. As you can see below, our RNN model is performing very well on the MNIST classification task.
Please note that we are not relying on the GPU in this tutorial since the models we are building are relatively simple. As an exercise, you can take a look at the PyTorch documentation to learn how to program specific operations to execute on the GPU. You can then try to optimize the code to run on the GPU. If you need help with this, reach out to me on Twitter.
That’s it for this tutorial. Congratulations! You are now able to implement a basic RNN in PyTorch. You also learned how to apply RNNs to solve a real-world, image classification problem. I have also implemented this on Google Colab already so you can take a look at the result here.
In the next tutorial, we will do more advanced things with RNNs and try to solve even more complex problems, such as sarcasm detection and sentiment classification. Until next time!
- Fixed the output of the SingleRNN model; the wrong output was initially added. Thanks to Arnav for catching this.
- (27/Feb/2020) In CleanBasicRNN, the code was changed to use self.rnn. This was a bug in the code. Thanks to the folks in the comment section who caught this one.
- (28/Feb/2020) The Colab notebook was updated to reflect the changes above.