Recurrent Neural Network using Tensor Maths

Understanding how data flows through an RNN internally is of primary importance. Let's work through it in this blog post.

Prakash Jay
4 min read · Jun 17, 2017
Recurrent Neural Network — Network Flow

RNN theory has been written about by many people, and the two links I've shared below are undoubtedly the best. In simple words, RNNs are networks with loops, allowing information to persist.


We will be using TensorFlow for the tensor math. PyTorch is equally good, and dynamic too.

Let's import TensorFlow and define the important parameters: num_steps is the max length of a sentence, state_size is the internal state size (3 neurons here), and we'll consider a corpus of 400 words (n_words).

import TensorFlow and define important parameters
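The original snippet is embedded as an image, so here is a minimal sketch of what it likely contains; batch_size = 32 is my assumption, the other values come from the text:

```python
import tensorflow as tf

num_steps = 30    # max length of a sentence
state_size = 3    # internal state size (3 neurons)
n_words = 400     # number of words in the corpus
batch_size = 32   # assumed batch size (not stated in the text)
```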

Create the placeholders for the input and output vectors.

Input placeholders
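Since this snippet is also an image, a TF 1.x sketch might look like the following; the shape of y assumes one class label per sequence, which is an assumption on my part:

```python
# x holds word indices for each sentence; y holds a label per sequence
x = tf.placeholder(tf.int32, [batch_size, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [batch_size], name='labels_placeholder')
```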

Now convert x to a one-hot vector and unstack it along the time axis, so that each time step's input goes into its own FC network (see the diagram above, where the word1, word2, ... inputs each feed a network). In the snippet below you can clearly see rnn_inputs[0], which will be the input to our first (word1) network; we have 30 (num_steps, the max length of a sentence) such tensors, each entering the network.

one_hot encoding and unstacking in TensorFlow
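A sketch of this step, using the standard tf.one_hot and tf.unstack ops:

```python
x_one_hot = tf.one_hot(x, n_words)          # (batch_size, num_steps, n_words)
# Unstack along the time axis: a list of num_steps tensors, each of shape
# (batch_size, n_words); rnn_inputs[0] feeds the word1 network
rnn_inputs = tf.unstack(x_one_hot, axis=1)
```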

Now define the weights and bias. There will be two weight matrices and one bias:

  1. Wih: between the input layer (batch_size, n_words) and the hidden layer, with shape (n_words, hidden_layer_neurons).
  2. Whh: between the hidden state (hidden_state_neurons, hidden_layer_neurons) and the hidden layer. hidden_state_neurons and hidden_layer_neurons must be the same, or the addition of the states won't work; you will see this soon.
  3. b: the bias on the hidden layer.
weight and bias initializer — TensorFlow
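A plausible initializer for these variables; the random-normal initialization is my choice, not necessarily the author's:

```python
Wih = tf.Variable(tf.random_normal([n_words, state_size], stddev=0.01))     # input  -> hidden
Whh = tf.Variable(tf.random_normal([state_size, state_size], stddev=0.01))  # hidden -> hidden
b   = tf.Variable(tf.zeros([state_size]))                                   # hidden bias
```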

Now define an rnn_cell, which essentially does a simple feed-forward multiplication and a hidden-state multiplication, combines the two, adds the bias, and finally sends the result through an activation function (tanh is used here).

RNN Cell using Tensor Math
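The cell itself reduces to one line of tensor math, h_t = tanh(x_t·Wih + h_{t-1}·Whh + b); a sketch:

```python
def rnn_cell(rnn_input, state):
    # h_t = tanh(x_t . Wih + h_{t-1} . Whh + b)
    return tf.tanh(tf.matmul(rnn_input, Wih) + tf.matmul(state, Whh) + b)
```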

We need to perform the above cell operation on all 30 (num_steps) inputs in the sequence. We can run a simple for loop and feed each step's output state as the input state to the next step.

RNN Cell Looping — TensorFlow
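A sketch of the loop, starting from a zero initial state:

```python
state = tf.zeros([batch_size, state_size])   # initial hidden state
rnn_outputs = []
for rnn_input in rnn_inputs:                 # 30 steps, one per word
    state = rnn_cell(rnn_input, state)       # this step's state feeds the next
    rnn_outputs.append(state)
final_state = rnn_outputs[-1]                # last state goes to the dense layers
```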

Let's add an output layer on top of the last state. This will be one layer below the Dense Layer we mentioned in the diagram.

Dense Layer using TensorFlow
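A sketch of this layer; dense_units = 10 and the ReLU activation are my assumptions:

```python
dense_units = 10  # assumed width of the dense layer
W1 = tf.Variable(tf.random_normal([state_size, dense_units], stddev=0.01))
b1 = tf.Variable(tf.zeros([dense_units]))
dense1 = tf.nn.relu(tf.matmul(final_state, W1) + b1)
```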

Let's add one more dense layer (you can add more, depending on the complexity of the task you are working on).

Dense Layer using TensorFlow
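Stacking a second dense layer follows the same pattern, sketched here with the same assumed width:

```python
W2 = tf.Variable(tf.random_normal([dense_units, dense_units], stddev=0.01))
b2 = tf.Variable(tf.zeros([dense_units]))
dense2 = tf.nn.relu(tf.matmul(dense1, W2) + b2)
```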

Add the final output layer and apply softmax to it.

softmax and output layer — TensorFlow
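A sketch of the output layer; n_classes = 2 is a placeholder for however many classes your task has:

```python
n_classes = 2  # assumed number of output classes
Wo = tf.Variable(tf.random_normal([dense_units, n_classes], stddev=0.01))
bo = tf.Variable(tf.zeros([n_classes]))
logits = tf.matmul(dense2, Wo) + bo
predictions = tf.nn.softmax(logits)          # class probabilities
```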

It takes a lot more code and effort to write an LSTM this way, but it is worth the effort if you want to understand exactly how it works. Try it if you have time.

Do we need to write all this code every time?

No. You can wrap all of this in a function and reuse it every time. All deep learning frameworks provide an RNN cell function; you can just use it, and it comes down to a line or two, as in the sketch below.
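For example, in TF 1.x the loop we wrote by hand collapses to something like this (depending on your version, these may live under tf.contrib.rnn instead):

```python
cell = tf.nn.rnn_cell.BasicRNNCell(state_size)
rnn_outputs, final_state = tf.nn.static_rnn(cell, rnn_inputs, dtype=tf.float32)
```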

Note:

I love Keras and use it to experiment with different models, but if you really want to understand deep learning, jump on to TensorFlow or PyTorch and use tensor math to solve your problems. This will help you in the long run.

References:

Code on GitHub:

Let me know your feedback. If you like it, please recommend and share it. Thank you.
