Recurrent Neural Network | An introduction for beginners

What makes RNNs so special? The difference between a CNN and an RNN

vasanth ambrose
PerceptronAI
4 min read · Jul 9, 2020


Photo by Joey Kyber from Pexels

Recurrent Neural Network

A Recurrent Neural Network (RNN) is an artificial neural network: a multi-layered network designed to make predictions from sequential data. Speech and text are common forms of sequence data.

First, let’s understand sequential memory with a thought experiment.

Source: https://stories.freepik.com/illustration/thoughts/bro

We can recite the alphabet from A to Z with ease, but reciting it from Z to A is very hard unless we have practiced it. Our brain recalls each letter by building on the letter that came before it. Recurrent Neural Networks use the same technique.

First, the RNN breaks the sentence down into separate words. The words are then transformed into machine-readable vectors, as in the sketch below.

source: https://easyai.tech/en/ai-definition/rnn/
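To make that step concrete, here is a minimal Python sketch; the toy sentence, vocabulary, and one-hot encoding are assumptions for illustration (real systems use proper tokenizers and learned embeddings).

```python
import numpy as np

# Toy example: break a sentence into words, then turn each word into a vector.
sentence = "what time is it"
words = sentence.split()                                  # step 1: separate words
vocab = {w: i for i, w in enumerate(sorted(set(words)))}  # assign each word an index

def one_hot(index, size):
    """Return a machine-readable vector with a 1 at the word's index."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

vectors = [one_hot(vocab[w], len(vocab)) for w in words]  # step 2: vectors
for word, vec in zip(words, vectors):
    print(word, vec)
```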

Like a feed-forward neural network, an RNN has an input layer, a hidden layer, and an output layer. The difference is that the RNN adds a looping mechanism to the hidden layer.

While processing a sequence, the network passes the previous hidden state to the next step. The hidden state therefore holds information about both the previous inputs and the current input.

source: https://easyai.tech/en/ai-definition/rnn/

Let’s see how the hidden state is calculated:

The previous hidden state is combined with the current input to form a vector. This vector carries information about the current input and all the previous inputs. It is passed through an activation function, and the result becomes the new hidden state. The activation function used here is tanh, because it keeps all values between negative one and positive one (see the sketch below).

source: https://easyai.tech/en/ai-definition/rnn/
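In symbols, the update is h_t = tanh(W_xh · x_t + W_hh · h_(t-1) + b). Below is a minimal NumPy sketch of that update applied over a short random sequence; the weight matrices and dimensions are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3                       # toy dimensions
W_xh = rng.normal(size=(hidden_size, input_size))    # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))   # hidden-to-hidden weights (the loop)
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # Combine the previous hidden state with the current input,
    # then squash with tanh so every value stays between -1 and 1.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

sequence = [rng.normal(size=input_size) for _ in range(5)]
h = np.zeros(hidden_size)                            # initial hidden state
for x_t in sequence:
    h = rnn_step(x_t, h)                             # each step reuses the last hidden state

print(h)  # the final hidden state summarizes the whole sequence
```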

When training is complete, the RNN has learned on its own how much weight to assign to each input feature.

https://towardsdatascience.com/animated-rnn-lstm-and-gru-ef124d06cf45

The features here are the individual words in the sentence. Through gradient descent, the RNN learns what information should be passed along the feedback loop. During backpropagation, however, the RNN suffers from the vanishing gradient problem, which gives it a short-term memory. To mitigate this, two specialized RNN variants were developed. They have internal mechanisms called gates that regulate the flow of information, so the network keeps only the information that is relevant for making predictions.
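A quick way to see the vanishing gradient is to multiply the step-by-step Jacobians of a tanh RNN and watch their norm shrink. The sketch below is a toy demonstration with assumed dimensions and a deliberately small weight scale, not a full backpropagation implementation.

```python
import numpy as np

np.random.seed(0)
steps, n = 50, 8
W_hh = np.random.randn(n, n) * 0.1   # small recurrent weights keep each step's Jacobian norm below 1

h = np.zeros(n)
grad = np.eye(n)                     # accumulates d h_t / d h_0 across the sequence
norms = []
for t in range(steps):
    x_t = np.random.randn(n) * 0.1
    h = np.tanh(W_hh @ h + x_t)
    J = np.diag(1.0 - h ** 2) @ W_hh     # one-step Jacobian: diag(1 - tanh^2) @ W_hh
    grad = J @ grad
    norms.append(np.linalg.norm(grad))

print([f"{v:.2e}" for v in norms[::10]])  # the gradient norm shrinks toward zero
```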

Long Short Term Memory (LSTM)

The LSTM has feedback connections and three gates: a forget gate, an input gate, and an output gate. During training, these gates learn which information is relevant to remember or forget. Each gate contains a sigmoid activation function. The LSTM also has a cell state, which acts as a transport highway carrying relevant information all the way down the sequence chain; it serves as the memory of the network. This lets the LSTM process an entire sequence of data and gives it a much longer memory than a plain RNN, as sketched below.

https://towardsdatascience.com/animated-rnn-lstm-and-gru-ef124d06cf45
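As a usage sketch, PyTorch's built-in nn.LSTM layer exposes exactly these pieces: it returns the hidden state at every step plus the final hidden state and cell state. The dimensions here are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(2, 5, 8)             # 2 sequences, 5 time steps, 8 features per step
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([2, 5, 16]) -> hidden state at every time step
print(h_n.shape)     # torch.Size([1, 2, 16]) -> final hidden state
print(c_n.shape)     # torch.Size([1, 2, 16]) -> final cell state (the "transport highway")
```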

Gated Recurrent Unit (GRU)

The GRU is similar to the LSTM but has fewer parameters, and it performs well on smaller datasets. It uses the hidden state to transfer information instead of a separate cell state. It has a reset gate and an update gate: the reset gate decides how much past information to forget, and the update gate decides what information to throw away and what new information to add, as sketched below.

https://towardsdatascience.com/animated-rnn-lstm-and-gru-ef124d06cf45
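The same sketch with PyTorch's nn.GRU shows the difference directly: there is no separate cell state, only the hidden state. Dimensions are again arbitrary.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(2, 5, 8)
output, h_n = gru(x)                 # no cell state is returned

print(output.shape)  # torch.Size([2, 5, 16])
print(h_n.shape)     # torch.Size([1, 2, 16]) -> all memory lives in the hidden state
```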

Applications of RNN:

  1. Sentiment analysis (see the sketch after this list)
  2. Text analysis
  3. Speech recognition
  4. Language translation
  5. Video analysis
  6. Stock prediction
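As an example of the first application, here is a hedged PyTorch sketch of a sentiment classifier: an embedding layer turns word indices into vectors, an LSTM reads the sequence, and a final layer outputs a positive/negative score. The vocabulary size, dimensions, and layer choices are assumptions, not taken from the article.

```python
import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=32, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)              # word indices -> vectors
        self.lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)                         # sentiment score

    def forward(self, token_ids):
        x = self.embed(token_ids)
        _, (h_n, _) = self.lstm(x)               # keep only the final hidden state
        return torch.sigmoid(self.head(h_n[-1]))

model = SentimentRNN()
fake_batch = torch.randint(0, 10_000, (4, 20))   # 4 sentences, 20 token ids each (made up)
print(model(fake_batch).shape)                   # torch.Size([4, 1]) -> one score per sentence
```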

How Is an RNN Different from a CNN?

  • Convolutional, pooling, and fully connected layers are the three layer types in a CNN. These layers transform the data to produce the prediction. An RNN instead uses a looping mechanism while analyzing the input data to produce its output.
  • In a CNN, every hidden layer performs a separate, fixed function. In an RNN, the output depends on the looping mechanism in the hidden layer.
  • A CNN is best suited for spatial data such as images for classification. An RNN, on the other hand, is suited for sequential data such as text.
