Recurrent Neural Networks

Introduction to Recurrent Neural Networks

Deep Learning and Natural Language Processing (NLP) are among the buzzwords of our generation in the field of Artificial Intelligence, and everyone wants to learn them. The RNN is one of the most basic deep neural network structures for NLP tasks. NLP requires data to be understood as a sequence, because the words in a sequence are related to one another. Even humans find it difficult to understand the exact meaning of a sentence if its words are considered separately, i.e., in a bag-of-words model.
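To see why word order matters, the short Python sketch below compares two sentences that contain the same words in a different order. The function name `bag_of_words` and the example sentences are illustrative only; the point is that a bag-of-words model cannot tell the sentences apart, while the sequences an RNN reads are different.

```python
from collections import Counter

def bag_of_words(sentence):
    """Return word counts, discarding word order (a bag-of-words model)."""
    return Counter(sentence.lower().split())

s1 = "the dog bit the man"
s2 = "the man bit the dog"

# As bags of words the two sentences are indistinguishable...
print(bag_of_words(s1) == bag_of_words(s2))  # True
# ...but as ordered sequences (what an RNN consumes) they differ.
print(s1.split() == s2.split())  # False
```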

Important Points to Ponder before we dive into the architecture of RNNs:

1. Time stamps here have nothing to do with the past, present, and future of clock time. A time stamp simply represents the position of a word or an item in a long sequence.
Example: Consider the sequence [“I”, “am”, “a”, “boy”]. “I” is at time stamp 0, so it is x(0); “am” is x(1), “a” is x(2), and “boy” is x(3).
If t=1,
x(t) = “am” →“Event at current time stamp”
x(t-1) = “I” → “Event at previous time stamp”
2. Do not be confused by the unrolled version of the network. In most libraries (e.g. TensorFlow, Theano), RNNs are not implemented in unrolled form; internally they use a loop over the time stamps to calculate the values.
3. An RNN shares its weights across time. W and U in the RNN equation remain the same for all inputs at t and all hidden states from t-1.
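The three points above can be illustrated with a toy RNN whose input and hidden state are single numbers. This is a minimal sketch, not any library's implementation: it assumes tanh as the activation and the update described later in this section (weighted current input plus weighted previous hidden state), and the weight values and inputs are made up. It shows that the unrolled computation (one line per time stamp) and a plain loop produce identical hidden states, because the same W and U are reused at every step.

```python
import math

# Hypothetical scalar weights, shared by every time stamp (point 3).
W = 0.5   # input-to-hidden weight
U = 0.8   # hidden-to-hidden weight

def step(x_t, h_prev):
    """One RNN update: h_t = tanh(W*x_t + U*h_prev)."""
    return math.tanh(W * x_t + U * h_prev)

def rnn_loop(xs, h_init=0.0):
    """Run the RNN with an ordinary loop over time stamps (point 2)."""
    h = h_init
    hidden_states = []
    for x_t in xs:            # t = 0, 1, 2, ...
        h = step(x_t, h)      # same W and U at every step
        hidden_states.append(h)
    return hidden_states

# Toy 1-D inputs standing in for x(0)..x(3) = "I", "am", "a", "boy".
xs = [1.0, -0.5, 0.25, 2.0]

# The "unrolled" view writes out each time stamp explicitly.
h_t0 = step(xs[0], 0.0)
h_t1 = step(xs[1], h_t0)
h_t2 = step(xs[2], h_t1)
h_t3 = step(xs[3], h_t2)

looped = rnn_loop(xs)
print(looped == [h_t0, h_t1, h_t2, h_t3])  # True
```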

What is a Recurrent Neural Network (RNN)?

A Recurrent Neural Network (RNN) is a class of artificial neural network that combines the hidden state produced from the previous input with the current input to produce the current output. In other words, it uses memory to produce the desired output.

An RNN remembers what it has learned from previous inputs using a simple loop. This loop takes the information from the previous time stamp and adds it to the input of the current time stamp.

The figure above shows the basic RNN structure. x_t is the input to the network and h_t is the output of the network at time t. A is an RNN cell. An RNN cell contains a neural network, just like a feed-forward network or a perceptron. The hidden state produced by this network is fed back and used along with the input at the next step.

The image above shows how, in a vanilla Recurrent Neural Network, the hidden state from the previous time stamp is reused along with the input at the current time stamp.

An RNN takes its input as a sequence and produces its output as a sequence. It can take input one word at a time and produce another word from the vocabulary at each step (e.g. language modeling tasks), or it can read the whole sentence and then produce another sentence using words from the vocabulary (e.g. text summarization, language translation).
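The two usage patterns above differ mainly in which hidden states are used. In this minimal sketch (a scalar toy RNN with made-up weights; `run_rnn` is a hypothetical helper), a task that emits one word per input step would pass every hidden state to an output layer (assumed, not shown), while a task that reads the whole sentence first would keep only the final hidden state as a summary of the sentence.

```python
import math

W, U = 0.5, 0.8  # hypothetical shared weights (toy scalar RNN)

def run_rnn(xs, h_init=0.0):
    """Return the hidden state after every time stamp."""
    h, states = h_init, []
    for x_t in xs:
        h = math.tanh(W * x_t + U * h)
        states.append(h)
    return states

xs = [1.0, -0.5, 0.25, 2.0]   # toy inputs for a 4-word sentence
states = run_rnn(xs)

# One output per input (e.g. language modeling): use every hidden state.
per_step_outputs = states
# Read the whole sentence first (e.g. the encoder side of translation or
# summarization): keep only the final hidden state as a sentence summary.
sentence_summary = states[-1]

print(len(per_step_outputs))  # 4
```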

How does an RNN work?

The current output of an RNN depends on the hidden state produced by the previous inputs and on the current input. In NLP terms, the choice of the current output word depends on the current input word and on the words that came before it. This is why RNNs are considered to have memory. An RNN uses this memory to capture the semantic information of the sequence.

The semantic information of the sequence is preserved in the hidden states of the recurrent neural network. This information keeps changing as each new input is observed, and it is passed on to the next time stamp. This allows sequential information to flow seamlessly through the RNN. Passing information from one time stamp to the next helps the network find correlations among the events or words in a sentence; correlations between words that are far apart in the sequence are known as “long-term dependencies”.
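The claim that the hidden state carries information forward can be checked directly. In this minimal sketch (a scalar toy RNN with made-up weights and tanh activation; `final_state` is an illustrative name), two input sequences differ only in their first element, yet their final hidden states differ, so the last state still depends on the first input.

```python
import math

W, U = 0.5, 0.8  # hypothetical shared weights

def final_state(xs, h_init=0.0):
    """Run the toy RNN over xs and return only the last hidden state."""
    h = h_init
    for x_t in xs:
        h = math.tanh(W * x_t + U * h)
    return h

sentence_a = [1.0, -0.5, 0.25, 2.0]
sentence_b = [-1.0, -0.5, 0.25, 2.0]   # only the first input differs

# The first input still influences the state after the last input.
print(final_state(sentence_a) != final_state(sentence_b))  # True
```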

To properly understand how an RNN works and how it propagates information, we need to look at its mathematical equation:

h_t = φ(W · x_t + U · h_t-1)

where:
h_t : Hidden state at time stamp t
h_t-1 : Hidden state at time stamp t-1
x_t : Input at time stamp t
W : Weight matrix from the input to the hidden layer at time stamp t
U : Weight matrix from the hidden layer at t-1 to the hidden layer at t
φ : Activation function, usually Sigmoid or Tanh

The input x_t at time t is multiplied by the input-to-hidden weight matrix W, and the result is added to the hidden state h_t-1 of the previous time stamp multiplied by the hidden-to-hidden weight matrix U. The network learns these weights, U and W, during training using backpropagation. The weights determine how much importance is given to the hidden state of the previous time stamp and to the current input: by scaling each of them, the network decides how much of each is used to generate the current output.
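The computation described in the paragraph above translates almost line for line into code. The sketch below is plain Python (no NumPy) with made-up sizes and weights; `matvec` and `rnn_step` are illustrative names, tanh is assumed as the activation function, and bias terms are omitted to match the description.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def rnn_step(x_t, h_prev, W, U, activation=math.tanh):
    """Compute h_t = activation(W x_t + U h_prev), applied element-wise."""
    wx = matvec(W, x_t)        # contribution of the current input
    uh = matvec(U, h_prev)     # contribution of the previous hidden state
    return [activation(a + b) for a, b in zip(wx, uh)]

# Hypothetical sizes: 3-dimensional input, 2-dimensional hidden state.
W = [[0.1, 0.2, 0.3],
     [0.0, -0.1, 0.4]]         # hidden x input
U = [[0.5, 0.0],
     [0.2, 0.3]]               # hidden x hidden

x_t = [1.0, 0.0, 2.0]
h_prev = [0.0, 0.0]            # e.g. the initial hidden state
h_t = rnn_step(x_t, h_prev, W, U)
print(len(h_t))  # 2
```

Feeding `h_t` back in as `h_prev` for the next input, with the same `W` and `U`, gives the recurrence over the whole sequence.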

The sum of the weighted previous hidden state and the weighted current input is then squashed by an activation function, usually the logistic sigmoid or tanh, to add non-linearity to the network. These activation functions also simplify the calculation of gradients for backpropagation, because their derivatives can be computed from their own outputs: σ'(z) = σ(z)(1 - σ(z)) and tanh'(z) = 1 - tanh²(z).
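The gradient point can be made concrete: because each derivative is expressed through the function's own output, values already computed in the forward pass can be reused during backpropagation. The sketch below checks those identities against a central finite-difference estimate; the helper names are illustrative, and the finite difference is only a sanity check, not how gradients are computed in practice.

```python
import math

def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad_from_output(s):
    """d/dz sigmoid(z), expressed through the output s = sigmoid(z)."""
    return s * (1.0 - s)

def tanh_grad_from_output(t):
    """d/dz tanh(z), expressed through the output t = tanh(z)."""
    return 1.0 - t * t

def numerical_grad(f, z, eps=1e-6):
    """Central finite-difference estimate of f'(z), used only as a check."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

for z in [-2.0, 0.0, 0.7]:
    s, t = sigmoid(z), math.tanh(z)   # outputs known from the forward pass
    print(z,
          sigmoid_grad_from_output(s), numerical_grad(sigmoid, z),
          tanh_grad_from_output(t), numerical_grad(math.tanh, z))
```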

The diagram below shows how hidden states propagate from one time stamp to the next.

Next steps in RNN:

Once you have understood the architecture of RNNs, the next steps are:

1. Training RNNs using Backpropagation Through Time (BPTT).
2. Exploring and solving the vanishing and exploding gradient problems in RNNs and LSTMs.