Faster AI: Lesson 6— TL;DR version of Fast.ai Part 1

Kshitiz Rimal
Deep Learning Journal

--

This is Lesson 6 of a series called Faster AI. If you haven't read Lesson 0, Lesson 1, Lesson 2, Lesson 3, Lesson 4, and Lesson 5, please go through them first.

This lesson is all about Recurrent Neural Networks (RNNs). For the sake of simplicity this lesson is divided into 4 parts:

  1. Overview of Recurrent Neural Networks [Time: 15:33]
  2. Implementing simple RNN to predict a character [Time: 25:30]
  3. Architectures in RNNs [Time: 51:36]
  4. LSTM and more [Time: 1:05:55]

1. Overview of Recurrent Neural Networks

If we look at the picture above, three things stand out about RNNs. First, memory: our model needs to remember things from before and predict accordingly. Second, long-term dependency: it needs to relate to things from the past and act on them at each step. Third, state: there needs to be some sort of state representation in the model that keeps track of things as the model progresses.

Recurrent networks interpret inputs differently from other networks in deep learning. They keep track of how the inputs are fed into the network, what the present state is, and how the next input will be combined with that state.

2. Implementing Simple RNN to predict a character

This is a simple representation of an RNN. Here we can see that the first input, char 1, is fed into the network through the first layer and passed through fully connected layer 1 (FC1) via some activation. The second input, char 2, instead of passing through layer 1, goes directly to the FC2 layer: the output of FC1 for char 1 and the char 2 input are added together and fed into FC2. The result of that sum is passed through an activation function and then to the output layer, which predicts char 3 by reading char 1 and char 2 in this manner.

Now let's add another input to the network, say char 3.

Using the same process as above, instead of predicting char 3 we now feed char 3 in as an input, and the network predicts char 4, given the sequence char 1 to char 3.

This network is then implemented in Keras and tested to check the accuracy of the model.
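For reference, here is a minimal sketch of what that model can look like in code. This is not the exact notebook code from the lesson (the course used an older Keras version; the sketch below uses the newer functional API, and the vocabulary, embedding, and hidden sizes are assumptions):

```python
# A minimal sketch of the 3-character model described above (not the lesson's
# exact code). Sizes and names below are assumptions for illustration.
from keras.layers import Input, Embedding, Flatten, Dense, add
from keras.models import Model

vocab_size = 86   # number of distinct characters (assumed)
n_fac = 42        # embedding size (assumed)
n_hidden = 256    # hidden layer size (assumed)

def char_input(name):
    inp = Input(shape=(1,), dtype='int64', name=name)
    emb = Flatten()(Embedding(vocab_size, n_fac)(inp))
    return inp, emb

c1_in, c1 = char_input('char1')
c2_in, c2 = char_input('char2')
c3_in, c3 = char_input('char3')

dense_in = Dense(n_hidden, activation='relu')        # input -> hidden
dense_hidden = Dense(n_hidden, activation='tanh')    # hidden -> hidden
dense_out = Dense(vocab_size, activation='softmax')  # hidden -> output

h1 = dense_in(c1)
h2 = add([dense_hidden(h1), dense_in(c2)])   # combine FC1 output with char 2
h3 = add([dense_hidden(h2), dense_in(c3)])   # combine previous state with char 3
out = dense_out(h3)                          # predict char 4

model = Model([c1_in, c2_in, c3_in], out)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
```

Note that the same dense layers are reused for every character, so the weights are shared across steps, just as in the diagram.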

Here, as you can see, if the given three characters are 'phi' then the network predicts 'l' as the fourth character. Similarly, given 'th', the network predicts 'e'.

The diagram we used above is called an unrolled network diagram, and this is how Keras interprets RNNs when TensorFlow is set as the backend. With Theano, another form, called the recurrent form, can be used, which is faster than the unrolled form.

3. Architectures in RNNs

Here, as mentioned above, the recurrent form of the network is used. Instead of specifying each input to the network separately, this looping diagram specifies how the inner hidden layers will interpret the incoming inputs. Input 1, which is char 1, goes through the first layer; only then does the network enter the loop, and after the loop completes it goes to the output layer and predicts the next character.

Three different kinds of arrows, input->hidden (green), hidden->output (blue), and hidden->hidden (orange), are used to specify distinct operations on the data flowing through the network. Some of these arrows sit inside the loop, which means those operations are repeated: only the input differs at each step, while the function stays the same.
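In Keras, this recurrent form does not have to be wired up by hand: the SimpleRNN layer handles the hidden-to-hidden loop internally. A rough sketch (layer sizes are assumptions, not the lesson's exact code):

```python
# A rough sketch of the recurrent form using Keras' SimpleRNN layer.
# vocab_size, n_fac, n_hidden and seq_len are assumed values.
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, Dense

vocab_size, n_fac, n_hidden, seq_len = 86, 42, 256, 8

model = Sequential([
    Embedding(vocab_size, n_fac, input_length=seq_len),
    SimpleRNN(n_hidden, activation='relu'),   # the hidden->hidden loop lives here
    Dense(vocab_size, activation='softmax'),  # predict the next character
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
```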

Let's look at another form of the network.

Here, the thing to notice is that the output layer is inside the loop, so at the end of each pass through the loop the output layer predicts an output. This type of network continuously predicts an output at each iteration, which eventually increases the accuracy of the model.

To implement such a network in Keras there is a parameter called return_sequences; if we set it to True, we can implement the network above.
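A hedged sketch of what that looks like (again with assumed sizes): setting return_sequences=True makes the RNN emit an output at every timestep, and TimeDistributed applies the same output layer to each of them.

```python
# Sequence-output version: one prediction per timestep (assumed sizes).
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, TimeDistributed, Dense

vocab_size, n_fac, n_hidden, seq_len = 86, 42, 256, 8

model = Sequential([
    Embedding(vocab_size, n_fac, input_length=seq_len),
    SimpleRNN(n_hidden, activation='relu', return_sequences=True),
    TimeDistributed(Dense(vocab_size, activation='softmax')),
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
```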

In the networks above, the computations are fairly large compared to other neural models, which is why, when dealing with large recurrent layers, the activation values can become massively large and cause something called 'exploding gradients'. This simply means the values blow up towards infinity as the matrix products in these layers are computed over and over. That is why simple RNNs are not used for complex tasks. For such tasks, another variant of the RNN called the LSTM (long short-term memory) is used.
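A tiny numeric illustration (not from the lesson) of why this happens: applying the same hidden-to-hidden matrix over and over makes the values grow (or shrink) exponentially with the number of steps.

```python
# Repeatedly multiplying by the same hidden-to-hidden matrix (no gating).
import numpy as np

rng = np.random.RandomState(0)
W = rng.randn(256, 256) * 0.1   # hidden-to-hidden weights (assumed scale)
h = rng.randn(256)              # initial hidden state

for step in range(50):
    h = W @ h                   # one recurrent step

print(np.abs(h).max())          # already enormous after 50 steps
```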

4. LSTM and more

LSTM (Long Short-Term Memory) models are a variant of RNNs. They are used for complex tasks that require more computation than simple RNNs can handle.

One of the reasons they are good at these tasks and can compute without exploding gradients is that they use mini neural networks inside, which let them normalize the activations and decide what to keep from the given input values.

Here, an LSTM model is used for the same application as above, with the addition of state being recorded by the model.
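A hedged sketch of that change (assumed sizes, not the exact notebook code): the SimpleRNN layer is swapped for an LSTM, and stateful=True keeps the hidden state around between batches so the model carries its state forward.

```python
# LSTM version of the character model; stateful=True carries state across batches.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense

vocab_size, n_fac, n_hidden = 86, 42, 512   # assumed sizes
seq_len, batch_size = 8, 64                 # assumed sequence and batch sizes

model = Sequential([
    Embedding(vocab_size, n_fac, batch_input_shape=(batch_size, seq_len)),
    LSTM(n_hidden, return_sequences=True, stateful=True),
    TimeDistributed(Dense(vocab_size, activation='softmax')),
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
```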

We can also use more than one LSTM to predict characters.

Here the output from one loop of an LSTM is used as input to the loop of another LSTM. This architecture can be implemented in code as shown below.

Here, as you can see, two LSTMs are used, as in the diagram above. One thing to notice is that the model uses two different dropout arguments, dropout_U and dropout_W. As mentioned above, an LSTM uses mini neural networks inside, and these are the dropouts applied to those networks.
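For reference, a rough sketch of that stacked architecture (assumed sizes, written with the older Keras-style dropout_W/dropout_U arguments mentioned above; newer Keras versions renamed them):

```python
# Two stacked LSTMs: the sequence output of the first feeds the second.
# dropout_W acts on the input weights, dropout_U on the recurrent weights.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense

vocab_size, n_fac, n_hidden, seq_len = 86, 42, 512, 8   # assumed sizes

model = Sequential([
    Embedding(vocab_size, n_fac, input_length=seq_len),
    LSTM(n_hidden, return_sequences=True, dropout_W=0.2, dropout_U=0.2),
    LSTM(n_hidden, return_sequences=True, dropout_W=0.2, dropout_U=0.2),
    TimeDistributed(Dense(vocab_size, activation='softmax')),
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
```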

LSTM is only briefly covered here because the next lesson covers it in more detail.

Theano

Up to now we have been using Keras to implement our models, but under the hood Keras uses Theano to compute them.

Theano uses something called a static computational graph. This means that before actually executing the code, it builds the computational graph ahead of time, works out all the ins and outs of that graph, and only then does the actual execution take place.
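A tiny example (not from the lesson) of what this feels like in practice: you first describe the computation symbolically, then compile it, and only then do actual numbers flow through the graph.

```python
import theano
import theano.tensor as T

# 1. Build the graph symbolically: nothing is computed yet.
a = T.vector('a')
b = T.vector('b')
dot = T.dot(a, b)

# 2. Compile the graph into a callable function.
f = theano.function([a, b], dot, allow_input_downcast=True)

# 3. Only now does actual computation happen.
print(f([1., 2., 3.], [4., 5., 6.]))   # 32.0
```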

This approach has both advantages and disadvantages: it can catch bad code before anything runs, but it makes it harder to quickly try out new ideas.

In this lesson Jeremy implements an RNN entirely in Theano, without using Keras on top of it.

This shows that coding in Theano is an entirely different process. Up to now we have designed our models in Keras only in terms of layers, models, and inputs.

If we code in Theano, we have to specify every detail: how each layer is computed, how values are stored in variables (as matrices or vectors), and how to loop through them. We even have to code our own accuracy-checking mechanism, which Keras offers out of the box.
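A condensed sketch of what that looks like (assumed shapes, not the full lesson notebook): the weights live in shared variables and the hidden-to-hidden loop is written out explicitly with theano.scan.

```python
import numpy as np
import theano
import theano.tensor as T

n_input, n_hidden = 86, 256                 # assumed sizes
floatX = theano.config.floatX

def shared_weights(shape):
    return theano.shared((np.random.randn(*shape) * 0.01).astype(floatX))

W_x = shared_weights((n_input, n_hidden))   # input -> hidden weights
W_h = shared_weights((n_hidden, n_hidden))  # hidden -> hidden weights
b_h = shared_weights((n_hidden,))           # hidden bias

x_seq = T.matrix('x_seq')                   # one-hot characters, (seq_len, n_input)
h0 = T.zeros((n_hidden,), dtype=floatX)     # initial hidden state

def step(x_t, h_prev):
    # one recurrent step, written out by hand
    return T.tanh(T.dot(x_t, W_x) + T.dot(h_prev, W_h) + b_h)

h_seq, _ = theano.scan(step, sequences=x_seq, outputs_info=h0)
get_hidden = theano.function([x_seq], h_seq)   # compile the whole graph
```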

It is a bit harder in Theano, but this approach also gives us direct access to details of the model that we couldn't reach before. Implementing something at this level of detail will teach you a great deal about the inner workings of the model.

You can watch the video from here to see it in action.

Here is the timeline to jump to any particular topic.

Here are the notes for this lesson.

In the next lesson we will talk about LSTMs in detail, along with some CNN architectures.

See you there.

Final Lesson: Lesson 7

--


AI Developer, Google Developers Expert (GDE) on ML, Intel AI Student Ambassador, Co-founder @ AI for Development: ainepal.org, City AI Ambassador: Kathmandu