RNNs and LSTMs - Easy Concept Grasp in Under 5 Minutes

Deepak_Raj
Published in Analytics Vidhya
5 min read · Mar 16, 2021

RNN = Recurrent Neural Network:

The primary use of RNNs is predicting sequences of data, e.g., a poem generator or the auto-filling of sentences. When we talk about RNNs, pictures like these come to mind. These are nothing but basic RNN cells.

RNN-Model Demo

Some basics to keep in mind:

1) Gradient Update Rule:

New weight = Current Weight - (learning rate * gradient)
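The update rule above can be sketched in a few lines of Python. This is a minimal illustration on a toy loss L(w) = (w - 3)², whose gradient is 2·(w - 3); the loss function and starting values are made up for the example.

```python
# Gradient descent update: new weight = current weight - (learning rate * gradient)
w = 0.0             # current weight
learning_rate = 0.1

for step in range(50):
    gradient = 2 * (w - 3)            # dL/dw for the toy loss L(w) = (w - 3)^2
    w = w - learning_rate * gradient  # the update rule from the text

print(round(w, 3))  # → 3.0, the minimum of the toy loss
```

Each step moves the weight a little in the direction that lowers the loss; the learning rate controls how big that step is.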

2) Backpropagation:

In short, backpropagation is the backward wave of gradient calculations, layer by layer, driven by the loss function. The loss is computed during feed-forward training (the forward wave of calculations). During backpropagation, the model updates the weights of each neuron, moving backward, so that it performs better (minimizes the loss) in the subsequent feed-forward pass.

Model training is nothing but a forward mathematical wave of error calculation and a reversed mathematical wave of error correction, updating the weights of each neuron.

Backpropagation.
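The forward/backward waves can be seen on the simplest possible model: one neuron with one weight. This is a hand-rolled sketch (the data point and learning rate are invented for illustration), where the gradient comes from the chain rule.

```python
# One neuron: y_pred = w * x, loss = (y_pred - y)^2
x, y = 2.0, 8.0      # a single training example (target y = 8)
w = 1.0              # initial weight
learning_rate = 0.05

for epoch in range(100):
    # Forward wave: compute the prediction and the loss
    y_pred = w * x
    loss = (y_pred - y) ** 2
    # Backward wave: chain rule gives dL/dw = 2 * (y_pred - y) * x
    grad_w = 2 * (y_pred - y) * x
    # Weight update, so the next forward pass has a smaller loss
    w -= learning_rate * grad_w

print(round(w, 2))  # → 4.0, since 4.0 * 2 = 8
```

In a real network the same two waves happen, just with many weights and the chain rule applied through every layer.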

3) Vanishing Gradient:

During backpropagation (the wave of backward calculations), the gradient diminishes with each consecutive layer going backward, so the weights of the neurons in the front layers receive only minimal updates. In a neural network, neurons with little weight update do not learn much. Because the gradient effectively vanishes on the way back, this is called the vanishing gradient problem. RNNs also suffer from it (short-term memory loss). To prevent this, the LSTM (Long Short-Term Memory) is used.
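A quick way to see why the gradient shrinks: backpropagation multiplies one derivative factor per layer, and the sigmoid's derivative is at most 0.25. The sketch below just multiplies that worst-case factor twenty times.

```python
# Chain rule through 20 sigmoid layers: one derivative factor per layer.
# The sigmoid derivative never exceeds 0.25, so the product shrinks fast.
sigmoid_grad_max = 0.25
gradient = 1.0
for layer in range(20):
    gradient *= sigmoid_grad_max  # one chain-rule factor per layer, going backward

print(gradient)  # 0.25**20 ≈ 9.1e-13 — the front layers barely learn
```

With a gradient this tiny, the update rule `weight - learning_rate * gradient` barely changes the front-layer weights at all.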

LSTM (Long Short-Term Memory):

Follow the appropriate explanation for each number in the picture to understand the inputs and outputs better.

Symbol Explanation used in the above picture for better understanding:

Compare it with the above image

Both the inputs and outputs of an LSTM cell are arrays (vectors) of numbers. These values are manipulated with the help of gates inside the unit.

Inputs :

1 → Previous cell state (an array of numbers).

2 → Information from the previous hidden state (also an array of numbers, holding the previous memory).

3 → Information from the current input (also an array of numbers).

Outputs:

4 → Manipulated cell state (output array of numbers, passed on to the next cell as the new cell state).

5 → Manipulated hidden state (output array of numbers, passed on to the next cell as the new hidden state).

Now that we know what is sent in as input and what is sent out as output, we can go into the details of how an LSTM processes the input data.

The LSTM unit is basically made up of three gates that decide which information to keep and which to throw away. You can imagine them as checkpoints for data flow. There is also an additional component called the cell state, which is passed on to the next LSTM cell. The three gates manipulate the data in the cell state and determine which data passes through it.

The three types of gates are shown in the figure:

All the gates do is manipulate these arrays of values
Compare it with the above image

Please take a couple of minutes and compare each of the following steps with the above pictures for clear understanding.

Step 1: First, the current input (3) and the previous hidden state (2) → [2+3] are combined and passed as one array to the forget gate. The forget gate determines what to keep and what to throw away by passing this array through a sigmoid function, whose output ranges between 0 and 1; the forget gate's output is then multiplied with the previous cell state.

Step 2: Now, in order to update the cell state, the same array [2+3] is passed through the input gate's sigmoid and tanh functions individually; their outputs are then multiplied together to give a single array, which is later added to the cell state.

Step 3: The same array [2+3] is also passed through the output gate (a sigmoid function), and its output is ready to be multiplied.

Step 4: Meanwhile, the cell state is updated (from the outputs of Step 1 and Step 2). This updated cell state is sent to the next LSTM cell via output 4.

Step 5: At the same time, to calculate the new hidden state, the updated cell state is passed through a tanh function and multiplied with the output from the output gate, which was ready to be multiplied (Step 3). The result is sent out via output 5 as the new hidden state.

Notes to remember :

Please note that the array [2+3] is sent to four different functions individually, in four different branches.

The previous cell state is updated twice (once by multiplication, once by addition) before going out through output 4.

The output of the sigmoid function ranges between 0 and 1 while the output of the tanh function ranges between -1 and 1.
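The two ranges are easy to verify numerically; the snippet below just evaluates both functions over a spread of inputs.

```python
import numpy as np

# Sigmoid squashes any input into (0, 1); tanh squashes it into (-1, 1)
z = np.linspace(-10, 10, 1001)
sig = 1 / (1 + np.exp(-z))
tnh = np.tanh(z)

print(sig.min() > 0, sig.max() < 1)    # True True
print(tnh.min() > -1, tnh.max() < 1)   # True True
```

This is why the sigmoid works as a gate (a soft keep/throw switch between 0 and 1), while tanh is used to produce values that can push the cell state up or down.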

LSTMs can be daunting at first. But please don’t lose hope. Once the neurons in your brain have been trained on understanding this algorithm it will be much easier.

I have written this blog with minimal mathematical expressions for beginners. For a deeper mathematical understanding of LSTMs and RNNs, please refer to the following blog, which is very mathematically descriptive.

Please feel free to clap, follow and share this blog if you have learned something. In case of any questions and suggestions please feel free to comment.


Masters in Robotics and AI @ Technical University of Munich | AI Developer | Passionate about Computer Vision and Transformers