Fundamentals of Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) Networks

Neha patil
7 min read · Jun 5, 2020


What is a Recurrent Neural Network?

Let's look at Recurrent Neural Networks and the types of situations in which they can be used to solve a problem. A Recurrent Neural Network, or RNN for short, is a deep learning approach that tries to solve the problem of modeling sequential data.

Whenever the points in a dataset are dependent on the previous points, the data is said to be sequential.

For example,

  1. A stock market price is a sequential type of data because the price of any given stock in tomorrow’s market depends, to a great extent, on its price today. As such, predicting tomorrow’s stock price is something RNNs can be used for. We simply feed the network the sequential data; it then maintains the context of the data and thus learns the patterns within it.
  2. We can also use RNNs for sentiment analysis. Say you’re scrolling through your product catalogue on a social network site and you see many comments related to one of your products. Rather than reading through dozens and dozens of comments yourself and manually tallying whether they are mostly positive, you can let an RNN do that for you. An RNN can examine the sentiment of the keywords in those reviews. Remember, though, that the sequence of the words and sentences, as well as the context in which they are used, matters too. By feeding a sentence into an RNN, it takes all of this into account and determines whether the sentiment of those product reviews is positive or negative (a minimal code sketch follows this list).
  3. RNNs can also be used to predict the next word in a sentence. We’ve all seen how our mobile phones suggest words while we’re typing an email or a text. This is a form of language modeling, where the model has learned from a large text corpus and can now predict the next word in the sentence. Thinking sequentially, the word being suggested depends heavily on the previously typed words and the context of the message. And when people need a quick translation of a phrase into another language, many of them use Google Translate: we enter a sequence of words in English and it outputs a sequence of words in French. This type of text translation is another example of how RNNs can be used.
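As a rough illustration of the sentiment-analysis example above, here is a minimal sketch using Keras (TensorFlow). The vocabulary size, sequence length, and the training arrays are placeholders invented for illustration, not data from this article; the reviews are assumed to be already tokenized into padded integer sequences.

```python
# Minimal sentiment-classification sketch with an LSTM (Keras / TensorFlow).
# vocab_size, max_len, x_train and y_train are illustrative placeholders:
# x_train would hold reviews tokenized and padded to max_len integers,
# y_train would hold 0/1 labels (negative/positive).
import tensorflow as tf

vocab_size = 10_000   # number of distinct tokens kept
max_len = 200         # reviews padded/truncated to this length

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),       # word ids -> dense vectors
    tf.keras.layers.LSTM(64),                        # reads the word sequence in order
    tf.keras.layers.Dense(1, activation="sigmoid")   # probability the review is positive
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=3, batch_size=32)  # once real data is available
```

Because the LSTM layer consumes the words in order, the same words arranged differently can (and should) produce a different sentiment score.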

The Sequential Problem

Whenever the points in a dataset are dependent on the other points, the data is said to be sequential. A common example of this is a time series, such as a stock price or sensor data, where each data point represents an observation at a certain point in time.

There are other examples of sequential data, like sentences, gene sequences, and weather data. But traditional neural networks typically can’t handle this type of data.

Why can’t we use feedforward neural networks to analyze sequential data?

Let’s consider a sequential problem to see how well-suited a basic neural network might be. Suppose we have a dataset that contains temperature and humidity values for every day. Our goal is to build a neural network that takes the temperature and humidity values of a given day as input and predicts whether the weather for that day is sunny or rainy.

This is a straightforward task for traditional feedforward neural networks. Using our dataset, we first feed a data point into the input layer.

The data then flows to the hidden layer or layers, where the weights and biases are applied.

Then, the output layer classifies the results from the hidden layer, which ultimately produces the output of sunny or rainy. Of course, we can repeat this for the second day and get its result.
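To make this concrete, here is a minimal sketch of such a feedforward classifier. The tiny arrays and layer sizes are placeholders made up for illustration: each row is one day's (temperature, humidity) pair, and the label is 1 for sunny, 0 for rainy.

```python
# Feedforward sketch for the weather example: each day is classified on its own.
# `days` and `labels` are illustrative placeholders, not real measurements.
import numpy as np
import tensorflow as tf

days = np.array([[31.0, 0.40], [24.0, 0.85], [29.0, 0.55]])  # temp (°C), humidity
labels = np.array([1, 0, 1])                                  # 1 = sunny, 0 = rainy

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),                      # two features per day
    tf.keras.layers.Dense(8, activation="relu"),     # hidden layer: weights and biases
    tf.keras.layers.Dense(1, activation="sigmoid")   # sunny vs. rainy
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(days, labels, epochs=10, verbose=0)

# Note: every row is treated as an independent sample; yesterday's weather
# plays no role at all in today's prediction.
```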

However, it’s important to note that the model does not remember the data it just analyzed. All it does is accept input after input, producing an individual classification for each day.

In fact, a traditional neural network assumes that the data is non-sequential, and that each data point is independent of the others. As a result, the inputs are analyzed in isolation, which can cause problems if there are dependencies in the data.

To see how this can be a limitation, let’s go back to the weather example. As you can imagine, the weather on one day often has a strong influence on the weather in the days that follow.

That is, if it was sunny on one day in the middle of summer, it’s not unreasonable to presume that it’ll also be sunny on the following day.

A traditional neural network model does not use this information, however, so we’d have to turn to a different type of model, such as a recurrent neural network.

A Recurrent Neural Network has a mechanism that can handle a sequential dataset.
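That mechanism is essentially a loop over the sequence in which a hidden state is carried from one step to the next. Below is a bare-bones sketch of a vanilla RNN cell in NumPy; the sizes and weight matrices are random placeholders rather than trained values.

```python
import numpy as np

# Toy vanilla RNN: one hidden state carried across time steps.
# Dimensions and weights are illustrative placeholders, not trained values.
input_size, hidden_size, seq_len = 2, 4, 5
rng = np.random.default_rng(0)

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input  -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden
b_h = np.zeros(hidden_size)

x = rng.normal(size=(seq_len, input_size))  # e.g. (temperature, humidity) per day
h = np.zeros(hidden_size)                   # initial hidden state

for t in range(seq_len):
    # The previous hidden state h carries context from earlier time steps.
    h = np.tanh(W_xh @ x[t] + W_hh @ h + b_h)
    print(f"step {t}: hidden state = {h}")
```

Unlike the feedforward model above, the prediction at step t depends on everything the hidden state has accumulated from steps 0 through t-1.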

Long Short-Term Memory, or LSTM for short

Introduction:

The recurrent neural network is a great tool for modeling sequential data, but there are a few issues that need to be addressed in order to use the model on a large scale.

For example, recurrent nets need to keep track of states, which is computationally expensive. There are also issues with training, such as the vanishing gradient and the exploding gradient.

As a result, the RNN, or to be precise, the “vanilla” RNN, cannot learn long sequences very well. A popular method of solving these problems is a specific type of RNN called Long Short-Term Memory, or LSTM.

LSTM cells connected to each other (source: Google)
Visual representation of an LSTM cell (source: Google)

LSTM maintains a strong gradient over many time steps. This means you can train an LSTM with relatively long sequences.

An LSTM unit in a recurrent neural network is composed of four main elements: the memory cell and three logistic gates. The memory cell is responsible for holding data. The write, read, and forget gates define the flow of data inside the LSTM.

The write gate is responsible for writing data into the memory cell. The read gate reads data from the memory cell and sends that data back to the recurrent network. The forget gate maintains or deletes data from the memory cell, or in other words, determines how much old information to forget.

In fact, these gates are the operations in the LSTM that apply a function to a linear combination of the inputs to the network, the network’s previous hidden state, and its previous output.
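Written out as equations (this is the standard formulation found in most references, not spelled out in this article), the write, read, and forget gates described above correspond to the input, output, and forget gates below:

```latex
% Standard LSTM cell equations (one common convention).
% x_t: input at step t, h_{t-1}: previous hidden state/output,
% c_{t-1}: previous cell (memory) state, \sigma: logistic sigmoid.
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)            % forget gate
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)            % input (write) gate
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)     % candidate memory
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t   % new cell state
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)            % output (read) gate
h_t = o_t \odot \tanh(c_t)                        % new hidden state / output
```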

What is important here is that, by manipulating these gates, a recurrent network is able to remember what it needs in a sequence and simply forget what is no longer useful.

Now, let’s look at the data flow in LSTM Recurrent Networks.

So, let’s see how data is passed through the network.

In the first time step, the first element of the sequence is passed to the network.

The LSTM unit uses the randomly initialized hidden state and output to produce a new hidden state and the first step’s output. The LSTM then passes its output and hidden state to the network at the next time step, and the process continues for the following time steps.

So, as you can see, the LSTM unit keeps two pieces of information as it propagates through time.

First, a hidden state, which is, in fact, the memory the LSTM accumulates through its gates over time, and second, the previous time-step output.
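The sketch below makes this explicit, following the gate equations given earlier: two pieces of state are carried from one step to the next. In the usual notation, the accumulated memory this article calls the hidden state is the cell state c, and the per-step output is h. All weights and sizes are random placeholders.

```python
import numpy as np

# Toy LSTM step in NumPy, following the standard gate equations above.
# Weights are random placeholders; sizes are illustrative only.
rng = np.random.default_rng(1)
input_size, hidden_size, seq_len = 3, 4, 6

def init(shape):
    return rng.normal(scale=0.1, size=shape)

# One weight matrix and bias per gate, each applied to [h_prev, x_t].
W_f, W_i, W_c, W_o = (init((hidden_size, hidden_size + input_size)) for _ in range(4))
b_f, b_i, b_c, b_o = (np.zeros(hidden_size) for _ in range(4))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z + b_f)          # forget gate
    i = sigmoid(W_i @ z + b_i)          # input (write) gate
    c_tilde = np.tanh(W_c @ z + b_c)    # candidate memory
    c = f * c_prev + i * c_tilde        # new cell (memory) state
    o = sigmoid(W_o @ z + b_o)          # output (read) gate
    h = o * np.tanh(c)                  # new output / hidden state
    return h, c

x = rng.normal(size=(seq_len, input_size))
h = np.zeros(hidden_size)   # previous time-step output
c = np.zeros(hidden_size)   # accumulated memory

for t in range(seq_len):
    h, c = lstm_step(x[t], h, c)   # both pieces of state flow to the next step
```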

The original LSTM model has only a single hidden LSTM layer. But, as you know, in the case of simple feedforward neural networks, we usually stack layers to create a hierarchical feature representation of the input data.

So, does this also apply to LSTMs? What if we want an RNN with stacked LSTMs, for example, a 2-layer LSTM? In this case, the output of the first layer is fed as the input into the second layer. The second LSTM then blends it with its own internal state to produce an output. Stacking LSTMs allows for greater model complexity.

So, the second LSTM can create a more complex feature representation of the current input. That is, stacking LSTM hidden layers makes the model deeper and most likely leads to more accurate results.
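As a rough sketch of a 2-layer (stacked) LSTM in Keras: setting return_sequences=True on the first layer makes it emit an output at every time step, and that sequence of outputs becomes the input of the second layer. The input shape below is a placeholder.

```python
import tensorflow as tf

# Sketch of a stacked (2-layer) LSTM. The shape is a placeholder:
# sequences of 30 time steps with 8 features each.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(30, 8)),
    tf.keras.layers.LSTM(64, return_sequences=True),  # emits an output per time step
    tf.keras.layers.LSTM(64),                          # consumes that output sequence
    tf.keras.layers.Dense(1)
])
model.summary()
```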

Now, let’s see what happens during the training process.

The network learns to determine how much old information to forget through the forget gate. So the weights, denoted Wf, and the biases, denoted bf, are learned through the training procedure.

We also determine how much new information to incorporate through the input gate and its weights.

We also calculate the new cell state based on the current and the previous internal state, so the network has to learn its corresponding weights and biases.

Finally, we determine how much of the cell state we want to output through the output gate. Basically, the network is learning the weights and biases used in each gate, for each layer.
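One way to see the "weights and biases per gate" concretely: an LSTM layer with n units reading d-dimensional inputs learns four weight matrices and four bias vectors (the forget, write/input, and read/output gates plus the candidate-memory transformation), each acting on the concatenation of the previous hidden state and the current input. Under that assumption, the parameter count works out as in this small helper (the example numbers are arbitrary):

```python
def lstm_param_count(input_dim: int, units: int) -> int:
    # Four weight matrices of shape (units, input_dim + units) and four bias
    # vectors of length units: one set per gate plus the candidate memory.
    return 4 * units * (input_dim + units + 1)

print(lstm_param_count(input_dim=8, units=64))  # 18688 trainable parameters
```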

Wrap Up

As we have seen, these are the basics of the Recurrent Neural Network (RNN) and the Long Short-Term Memory (LSTM) network.
