DLOA (Part-23) - Gated Recurrent Unit (GRU)

Dewansh Singh
May 26, 2023


Hey readers, hope you are all doing well, safe, and sound. The previous blog briefly discussed standard RNNs and their implementation; if you haven’t read it yet, you can go through this link. In this blog, we’ll discuss the Gated Recurrent Unit (GRU), how it works, and its basic implementation.

Introduction

The Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) architecture that was introduced as a variation of the traditional RNN and addresses some of its limitations. GRU, like other RNN variants such as Long Short-Term Memory (LSTM), is designed to capture long-term dependencies in sequential data by maintaining a hidden state that carries information from previous time steps. However, GRU achieves this with a simpler structure and fewer gating mechanisms compared to LSTM, making it computationally more efficient.

RNNs are powerful models for handling sequential data, but they suffer from vanishing and exploding gradient problems, which make it difficult for them to effectively capture long-term dependencies. GRU was proposed as a solution to these issues by introducing gating mechanisms that allow the model to selectively update and retain information over time.

(Figure: GRU and its cells, with a description of the functions used)

Architecture of GRU

The architecture of a GRU consists of a hidden state and two types of gates: an update gate and a reset gate. The update gate controls how much of the past information should be retained, and the reset gate determines how much of the past information should be forgotten.

(Figure: GRU architecture)
  1. Update Gate: The update gate decides which parts of the hidden state should be updated with new information at the current step. It takes the previous hidden state and the current input as input and outputs a value between 0 and 1 for each element of the hidden state. A value of 0 indicates that the corresponding element should be completely updated, while a value of 1 means that the element should be carried over unchanged.
  2. Reset Gate: The reset gate determines which parts of the previous hidden state should be ignored when it is combined with the current input. It also takes the previous hidden state and the current input as input and outputs a value between 0 and 1 for each element of the hidden state. A value of 0 means that the element should be completely forgotten, while a value of 1 indicates that it should be kept as-is.
  3. Hidden State Update: Once the update and reset gates are computed, the new hidden state is obtained in the following steps (a minimal NumPy sketch follows this list):
  • The reset gate is multiplied element-wise with the previous hidden state, so the gate controls how much past information is kept.
  • This reset-gated hidden state is combined with the current input and passed through a non-linear activation function, typically the hyperbolic tangent, to produce a candidate hidden state.
  • The update gate determines the weighting between the previous hidden state and the candidate. In the convention used here, a value of 1 means the previous hidden state is retained completely, while a value of 0 means only the candidate is used.
  • Finally, the updated hidden state is the element-wise mix of the previous hidden state and the candidate, weighted by the update gate.
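To make these steps concrete, here is a minimal NumPy sketch of a single GRU step. The weight names (W_z, U_z, and so on) and the convention that the update gate weights the previous hidden state rather than the candidate are assumptions chosen to match the description above, not a fixed standard.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    # Gates: values between 0 and 1 for each element of the hidden state
    z = sigmoid(p['W_z'] @ x_t + p['U_z'] @ h_prev + p['b_z'])   # update gate
    r = sigmoid(p['W_r'] @ x_t + p['U_r'] @ h_prev + p['b_r'])   # reset gate
    # Candidate hidden state built from the current input and the reset-gated past state
    h_cand = np.tanh(p['W_h'] @ x_t + p['U_h'] @ (r * h_prev) + p['b_h'])
    # z = 1 keeps the previous state, z = 0 takes the candidate (convention used above)
    return z * h_prev + (1.0 - z) * h_cand

# Tiny usage example with random weights: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
shapes = {'W_z': (4, 3), 'U_z': (4, 4), 'b_z': (4,),
          'W_r': (4, 3), 'U_r': (4, 4), 'b_r': (4,),
          'W_h': (4, 3), 'U_h': (4, 4), 'b_h': (4,)}
params = {name: rng.standard_normal(shape) for name, shape in shapes.items()}
h = gru_step(rng.standard_normal(3), np.zeros(4), params)
print(h.shape)   # (4,)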

Working of GRU

The working of GRU involves processing sequential data and updating the hidden state at each time step. The steps involved are as follows:

  • Initialization:
  1. Initialize the hidden state to a vector of zeros or small random values.
  2. Define the update gate and reset gate weights and biases.
  • Sequential Processing: for each time step:
  1. Compute the update gate and reset gate values based on the previous hidden state and the current input.
  2. Update the hidden state based on the update and reset gates.
  • Output:
  1. After processing all the time steps, the final hidden state can be used for making predictions or passed through additional layers for further processing (a short Keras example after this list shows this in practice).
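To see this step-by-step processing in practice, the snippet below pushes a batch of random sequences through Keras’s built-in GRU layer. The shapes (8 sequences, 20 time steps, 5 features, 16 hidden units) are arbitrary values chosen just for illustration.

import numpy as np
from tensorflow.keras import layers

# A batch of 8 random sequences, each with 20 time steps of 5 features
x = np.random.rand(8, 20, 5).astype('float32')

# By default the GRU layer processes the whole sequence and returns only the final hidden state
final_state = layers.GRU(16)(x)
print(final_state.shape)    # (8, 16)

# With return_sequences=True it returns the hidden state at every time step instead
all_states = layers.GRU(16, return_sequences=True)(x)
print(all_states.shape)     # (8, 20, 16)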

The training process involves feeding the sequential data along with the corresponding target outputs to the GRU model. The model is trained using backpropagation through time (BPTT), where the gradients are calculated with respect to the model parameters and used to update the weights and biases via an optimization algorithm such as gradient descent.
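If you are curious what a single BPTT update looks like outside of model.fit, here is a minimal TensorFlow sketch. The toy shapes, the two-class labels, and the choice of Adam with categorical cross-entropy are assumptions made only for this example.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Toy batch: 4 sequences of 10 time steps with 3 features each, and 2 classes
x = np.random.rand(4, 10, 3).astype('float32')
y = tf.one_hot([0, 1, 0, 1], depth=2)

model = tf.keras.Sequential([layers.GRU(8), layers.Dense(2, activation='softmax')])
loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

with tf.GradientTape() as tape:
    probs = model(x, training=True)
    loss = loss_fn(y, probs)

# The gradients flow backwards through every time step of the GRU -- this is BPTT
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))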

The benefits of GRU lie in its ability to capture long-term dependencies while being computationally efficient compared to LSTM. It achieves this by using fewer gating mechanisms and simplifying the update and reset processes. GRU has shown promising results in various applications such as machine translation, speech recognition, sentiment analysis, and more.

Implementation

We import the necessary modules: TensorFlow as tf, and the layers module from tensorflow.keras for defining the layers of our model.

import tensorflow as tf
from tensorflow.keras import layers
  • We define a function build_gru_model that takes input_shape (the shape of the input data) and num_classes (the number of output classes) as arguments.
  • Inside the function, we create an instance of the Sequential class from tf.keras to build our model. This allows us to stack layers sequentially.
def build_gru_model(input_shape, num_classes):
    model = tf.keras.Sequential()
  • We add a GRU layer to our model using the add method of the model object.
  • The GRU layer has 64 units (or cells). We leave return_sequences at its default of False, so the layer outputs only its final hidden state, which is what the dense layers below need for classifying each sequence as a whole.
  • We also specify input_shape as the shape of our input data (time steps, features).
    model.add(layers.GRU(64, input_shape=input_shape))
  • We add a fully connected layer (Dense layer) to our model.
  • The layer has 64 units, and we use the ReLU activation function to introduce non-linearity into the model.
    model.add(layers.Dense(64, activation='relu'))
  • We add an output layer to our model.
  • The output layer has num_classes units, corresponding to the number of classes in our classification problem.
  • We use the softmax activation function to obtain class probabilities.
    model.add(layers.Dense(num_classes, activation='softmax'))
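  • The function then returns the model, and we build it by calling build_gru_model with our input shape and number of classes, exactly as in the complete code further below.
    return model

model = build_gru_model(input_shape, num_classes)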
  • We compile the model by specifying the optimizer, loss function, and evaluation metrics.
  • In this case, we use the Adam optimizer, categorical cross-entropy loss (suitable for multi-class classification), and accuracy as the evaluation metric.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
  • We train the model using the fit method of the model object.
  • We pass the training data (X_train and y_train), set the batch size to 32 and the number of epochs to 10, and provide validation data (X_val and y_val) to monitor the model's performance during training.
model.fit(X_train, y_train, batch_size=32, epochs=10, validation_data=(X_val, y_val))
  • We evaluate the trained model using the evaluate method.
  • We pass the test data (X_test and y_test) to compute the loss and accuracy of the model on the unseen data.
  • Finally, we print the test loss and accuracy to assess the model’s performance.
loss, accuracy = model.evaluate(X_test, y_test)
print('Test Loss:', loss)
print('Test Accuracy:', accuracy)

Remember to replace X_train, y_train, X_val, y_val, X_test, and y_test with your actual data when implementing the code.

Complete Code:

import tensorflow as tf
from tensorflow.keras import layers

# Define the GRU model
def build_gru_model(input_shape, num_classes):
    model = tf.keras.Sequential()

    # GRU layer: returns only the final hidden state (return_sequences defaults to False)
    model.add(layers.GRU(64, input_shape=input_shape))

    # Fully connected layer
    model.add(layers.Dense(64, activation='relu'))

    # Output layer
    model.add(layers.Dense(num_classes, activation='softmax'))

    return model

# Set the input shape and number of classes
input_shape = (sequence_length, input_dim) # Define the input shape of your sequence data
num_classes = 10 # Define the number of output classes

# Build the GRU model
model = build_gru_model(input_shape, num_classes)

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, batch_size=32, epochs=10, validation_data=(X_val, y_val))

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print('Test Loss:', loss)
print('Test Accuracy:', accuracy)
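If you just want a quick smoke test before plugging in a real dataset, you could define the placeholders near the top of the script using random arrays of the expected shapes. The sizes below (200 training sequences, 20 time steps, 8 features, 10 classes) are purely illustrative.

import numpy as np
import tensorflow as tf

sequence_length, input_dim, num_classes = 20, 8, 10   # illustrative sizes

def make_random_split(n):
    # Random sequences and one-hot labels, shaped like the real data would be
    X = np.random.rand(n, sequence_length, input_dim).astype('float32')
    y = tf.keras.utils.to_categorical(np.random.randint(num_classes, size=n), num_classes)
    return X, y

X_train, y_train = make_random_split(200)
X_val, y_val = make_random_split(50)
X_test, y_test = make_random_split(50)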

Conclusion

In conclusion, the Gated Recurrent Unit (GRU) is an effective variant of the traditional RNN architecture that addresses the limitations of capturing long-term dependencies. With its simplified structure and gating mechanisms, GRU provides a computationally efficient solution for processing sequential data while maintaining competitive performance in various tasks.

That’s it for now… I hope you liked my blog and got to know about GRU RNNs, how they work, and how to implement them.

In the next blog, I will be discussing the different types of Long Short-Term Memory (LSTM) networks in detail, one by one.

If you feel my blogs are helpful, please share them with others.

Till then, stay tuned for the next blog…

***Next Blog***
