DLOA (Part-22)-Standard RNN

Dewansh Singh
7 min read · May 23, 2023


Hey readers, hope you all are doing well, safe, and sound. I hope you have already read the previous blog, which briefly discussed RNNs and their types. If you haven’t read it yet, you can go through this link. In this blog, we’ll be discussing the Standard RNN, how it works, and its basic implementation.

Introduction

Standard Recurrent Neural Networks (RNNs) are a type of neural network specifically designed to process sequential data. They were among the earliest models built to capture temporal dependencies in a sequence, which makes them well suited for tasks such as natural language processing, speech recognition, and time series analysis. Standard RNNs operate on the principle of maintaining a hidden state that carries information from previous time steps and influences the computations at the current time step. In this blog, we will dive into the architecture and workings of standard RNNs.

Standard RNN unit Architecture

Architecture

The architecture of a standard RNN consists of three main components: the input layer, the hidden layer, and the output layer. Let’s explore each component in detail.

Unfolding RNNs

To make the recurrence explicit, we can unfold the network over k time steps to obtain the output at time step k+1. The unfolded network looks very much like a feedforward neural network, with each rectangle representing an operation taking place at a particular time step.

  1. Input Layer: The input layer of a standard RNN receives sequential data as input. In the case of natural language processing, the input can be a sequence of words or characters, while in time series analysis, it can be a sequence of numerical values. Each element in the sequence is represented as a vector, and the input layer processes these vectors one by one at each time step.
  2. Hidden Layer: The hidden layer is the core component of a standard RNN. It maintains a hidden state that carries information from previous time steps and influences the computations at the current time step. The hidden state is updated iteratively as the network processes each element in the input sequence. The hidden layer performs a linear transformation of the current input and the previous hidden state, followed by a non-linear activation function. This computation can be represented as follows: h_t = f(W_hh * h_{t-1} + W_hx * x_t + b_h) where h_t is the hidden state at time step t, h_{t-1} is the hidden state at the previous time step, x_t is the input at time step t, W_hh and W_hx are weight matrices, b_h is the bias term, and f denotes the activation function. A minimal sketch of this update appears right after this list.
  3. Output Layer: The output layer of a standard RNN takes the hidden state at the current time step and produces the desired output. The specific configuration of the output layer depends on the task at hand. For example, in a language modeling task, the output layer can be a softmax layer that predicts the probability distribution over the vocabulary, while in a sentiment analysis task, it can be a sigmoid layer that predicts the sentiment of a given text.
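To make the hidden-state update concrete, here is a minimal NumPy sketch of a single time step. The sizes, the random initialization, and the tanh activation are illustrative assumptions, not taken from any particular library:

import numpy as np

# Illustrative sizes for this sketch (assumptions)
input_size = 4    # dimensionality of each input vector x_t
hidden_size = 3   # dimensionality of the hidden state h_t

# Randomly initialized parameters
W_hh = np.random.randn(hidden_size, hidden_size)   # hidden-to-hidden weights
W_hx = np.random.randn(hidden_size, input_size)    # input-to-hidden weights
b_h = np.zeros(hidden_size)                        # bias term

def rnn_step(h_prev, x_t):
    # h_t = f(W_hh * h_{t-1} + W_hx * x_t + b_h), with f = tanh
    return np.tanh(W_hh @ h_prev + W_hx @ x_t + b_h)

h_prev = np.zeros(hidden_size)       # hidden state from the previous step
x_t = np.random.randn(input_size)    # current input vector
h_t = rnn_step(h_prev, x_t)
print(h_t.shape)   # (3,)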

Working

The working of a standard RNN can be understood through the following steps:

  1. Initialization: At the beginning of processing a sequence, the hidden state of the RNN is initialized to a vector of zeros. This hidden state serves as the initial memory of the network.
  2. Sequential Computation: The RNN processes the elements of the input sequence one by one, updating the hidden state at each time step. At time step t, the current input vector x_t and the previous hidden state h_{t-1} are fed into the hidden layer to compute the new hidden state h_t. This computation is performed recursively for each element in the sequence (a minimal sketch of the full loop appears after this list).
  3. Forward Propagation: During the sequential computation, the hidden state is updated iteratively, allowing the RNN to capture the dependencies between elements in the sequence. The forward propagation process continues until the final element of the sequence is processed.
  4. Output Generation: Once the RNN has processed the entire sequence, the final hidden state h_T (where T is the length of the sequence) contains information about the entire input sequence. This hidden state can be used to generate the desired output by passing it through the output layer. The output can be a single value, a vector, or a probability distribution depending on the task.
  5. Backpropagation and Training: To train the standard RNN, a loss function is defined to measure the discrepancy between the predicted output and the target output. The gradients of the loss with respect to the network parameters (weights and biases) are computed using the backpropagation algorithm. The gradients are then used to update the parameters via an optimization algorithm, such as gradient descent, to minimize the loss and improve the performance of the network.
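Putting these steps together, here is a minimal NumPy sketch of a full forward pass: the hidden state starts at zeros, is updated once per element of the sequence, and the final hidden state is passed through a softmax output layer. The sizes and random data are illustrative assumptions:

import numpy as np

# Illustrative sizes for this sketch (assumptions)
T, input_size, hidden_size, output_size = 5, 4, 3, 2

# Parameters of the hidden and output layers
W_hh = np.random.randn(hidden_size, hidden_size)
W_hx = np.random.randn(hidden_size, input_size)
b_h = np.zeros(hidden_size)
W_hy = np.random.randn(output_size, hidden_size)
b_y = np.zeros(output_size)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Step 1: initialize the hidden state to zeros
h = np.zeros(hidden_size)

# Steps 2-3: process the sequence one element at a time
sequence = [np.random.randn(input_size) for _ in range(T)]
for x_t in sequence:
    h = np.tanh(W_hh @ h + W_hx @ x_t + b_h)

# Step 4: generate the output from the final hidden state h_T
y = softmax(W_hy @ h + b_y)
print(y)   # a probability distribution over output_size classes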

It is important to note that standard RNNs suffer from the vanishing gradient problem, which limits their ability to capture long-term dependencies in sequences. As the gradients are backpropagated through time, they can diminish or vanish, making it challenging for the network to propagate useful information over long sequences.
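As a toy illustration, suppose the gradient is scaled by a constant factor of 0.9 at each backward step (the real factor depends on the recurrent weights and the activation’s derivatives; 0.9 is purely an assumption for this example):

# Toy illustration of the vanishing gradient; the 0.9 factor is an assumption
factor = 0.9        # per-time-step scaling of the gradient during backprop
gradient = 1.0
for step in range(100):
    gradient *= factor
print(gradient)     # roughly 2.7e-05 -- the signal has almost vanished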

This limitation led to the development of more advanced RNN architectures, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), which address the vanishing gradient problem and allow for more effective modeling of long-range dependencies.

Implementation

Here’s example code that implements a standard RNN using the TensorFlow framework:

import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN, Dense

# Example hyperparameters -- adjust these for your specific problem
timesteps = 10      # length of each input sequence
input_dim = 8       # dimensionality of each element in the sequence
hidden_units = 32   # number of neurons in the RNN (hidden) layer
output_units = 3    # number of output classes

# Define the RNN model
def rnn_model():
    input_shape = (timesteps, input_dim)

    # Input layer
    inputs = tf.keras.Input(shape=input_shape)

    # RNN layer
    rnn = SimpleRNN(units=hidden_units, activation='tanh')(inputs)

    # Output layer
    outputs = Dense(output_units, activation='softmax')(rnn)

    # Create the model
    model = tf.keras.Model(inputs=inputs, outputs=outputs)

    return model

In this code:

  1. We import the necessary libraries, including TensorFlow and the required layers from tensorflow.keras.
  2. We define the rnn_model function that creates the RNN model.
  3. We define example hyperparameters (timesteps, input dimensionality, hidden units, and output units) and use the first two to set the input shape of the RNN model. Adjust these values based on your specific problem.
  4. We define the input layer of the model using the tf.keras.Input function.
  5. We add a Simple RNN layer to the model using the SimpleRNN class. We specify the number of hidden units (neurons) in the units parameter and the activation function as 'tanh'. Adjust these values based on your specific problem.
  6. We add an output layer to the model using the Dense class. We specify the number of output units and the activation function as 'softmax'. Adjust these values based on your specific problem.
  7. We create the model using the tf.keras.Model function, specifying the input and output layers.
  8. We return the created model.

To train and evaluate the model, you will need to compile it with an appropriate loss function, optimizer, and metrics, and then fit it to your training data. Additionally, make sure to preprocess your input data and encode your target variables if necessary.
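As a rough sketch, continuing from the code above, compiling and fitting might look like the following; the optimizer, loss, number of epochs, and dummy data are assumptions to replace with choices appropriate for your task:

import numpy as np

# Build the model defined above
model = rnn_model()

# Compile with an assumed optimizer, loss, and metric for a multi-class task
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Dummy data with the assumed shapes -- replace with your real dataset
X_train = np.random.randn(100, timesteps, input_dim)
y_train = tf.keras.utils.to_categorical(
    np.random.randint(output_units, size=100), num_classes=output_units)

# Train the model
model.fit(X_train, y_train, epochs=5, batch_size=16)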

Note that this code provides a basic implementation of a standard RNN. You can customize it further by adding more layers, adjusting hyperparameters, or incorporating regularization techniques based on the requirements of your specific task.

Conclusion

In conclusion, a standard Recurrent Neural Network (RNN) is a type of neural network architecture designed to handle sequential data. It is particularly effective in tasks involving natural language processing, speech recognition, time series analysis, and other tasks that require modeling temporal dependencies.

The architecture of a standard RNN consists of an input layer, a hidden layer, and an output layer. The input layer receives sequential data as input, while the hidden layer maintains a hidden state that carries information from previous time steps. The hidden layer performs computations using the current input and the previous hidden state, updating the hidden state at each time step. The output layer generates the desired output based on the final hidden state or intermediate hidden states.

The working of a standard RNN involves initializing the hidden state, sequentially processing the input elements, and updating the hidden state iteratively. The hidden state captures the contextual information from previous time steps, enabling the network to model dependencies in the sequence. The final hidden state can be used to generate the output, and the entire network is trained using backpropagation and optimization algorithms to minimize the discrepancy between the predicted and target outputs.

Standard RNNs have some limitations, including the vanishing gradient problem, which hinders their ability to capture long-term dependencies in sequences. As the gradients are backpropagated through time, they can diminish or vanish, resulting in the loss of important information. To address this issue, more advanced RNN architectures such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) have been developed, which incorporate gating mechanisms to selectively retain and update information over long sequences.

Despite its limitations, a standard RNN can still be a powerful tool for modeling sequential data when the dependencies are relatively short-term. It can capture patterns and context within the sequence, making it suitable for tasks like sentiment analysis, named entity recognition, machine translation, and more. Additionally, standard RNNs are computationally efficient and have a relatively straightforward architecture, making them easier to implement and understand compared to more complex architectures.

Overall, standard RNNs provide a foundation for understanding and working with sequential data. They serve as a building block for more advanced architectures and have proven to be effective in a wide range of applications. By understanding the principles and workings of standard RNNs, one can gain insights into how to model and process sequential data effectively.

That’s it for now… I hope you liked my blog and got to know about Standard RNNs, how they work, and their basic implementation.

In the next blog, I will be discussing the different types of Gated Recurrent Units (GRU) in detail one by one.

If you feel my blogs are helpful, please share them with others.

Till then, stay tuned for the next blog…

***Next Blog***
