Building Multi-Layer RNN from Scratch

Yash Bhaskar
3 min read · Jun 23, 2024



In this article, we build a multi-layer RNN from scratch and use it for tasks like Many-To-Many, Many-To-One, and One-To-Many input/output. Using these as a reference, you can model them in any way you want.

First, let's look at the code for a single RNN cell in PyTorch.
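A minimal sketch of such a cell, assuming a tanh non-linearity and two linear layers (x2h for the input, h2h for the hidden state) as discussed below:

import math
import torch
import torch.nn as nn

class RNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, bias=True):
        super().__init__()
        self.hidden_size = hidden_size
        # input-to-hidden and hidden-to-hidden transformations
        self.x2h = nn.Linear(input_size, hidden_size, bias=bias)
        self.h2h = nn.Linear(hidden_size, hidden_size, bias=bias)
        self.reset_parameters()

    def reset_parameters(self):
        # uniform initialization with bound 1 / sqrt(hidden_size)
        std = 1.0 / math.sqrt(self.hidden_size)
        for w in self.parameters():
            w.data.uniform_(-std, std)

    def forward(self, input, hx=None):
        if hx is None:
            hx = torch.zeros(input.size(0), self.hidden_size, device=input.device)
        # new hidden state: tanh(x2h(x) + h2h(h))
        return torch.tanh(self.x2h(input) + self.h2h(hx))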

self.parameters() is a method inherited from nn.Module, which this class extends.

In PyTorch, self.parameters() is a method that returns an iterator over all the parameters of the model. In this context, parameters typically refer to the weights and biases of the neural network layers.

For this RNNCell class, the parameters would include:

  1. The weights and (if enabled) biases of self.x2h, which is a linear layer transforming the input.
  2. The weights and (if enabled) biases of self.h2h, which is a linear layer transforming the hidden state.

The reset_parameters() method uses self.parameters() to iterate over all of these parameters and initializes them with values drawn from a uniform distribution.
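For instance, instantiating the sketch above and listing its named parameters shows exactly those four tensors (sizes are illustrative):

cell = RNNCell(input_size=8, hidden_size=16)
for name, p in cell.named_parameters():
    print(name, tuple(p.shape))
# x2h.weight (16, 8), x2h.bias (16,), h2h.weight (16, 16), h2h.bias (16,)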

This choice of standard deviation (std) for weight initialization is based on mathematical considerations and is known as “Xavier initialization”. There are good reasons for this approach, one of them being variance preservation.

Variance preservation: The goal is to maintain the variance of activations and gradients as they flow through the network. This helps prevent the vanishing or exploding gradient problem.

This initialization helps ensure that the input to each neuron has unit variance, which is particularly important at the start of training. It helps the network converge faster and avoid issues with very large or very small activations.
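As a concrete example, with hidden_size = 128 the weights are drawn from the range (-1/√128, 1/√128) ≈ (-0.088, 0.088), so every activation starts as a sum of many small, zero-centered terms.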

Need for non-linearity

The non-linearity is crucial because:

  1. It allows the RNN to learn and represent complex, non-linear relationships in the data.
  2. Without it, the RNN would only be capable of learning linear combinations of its inputs (see the sketch after this list).
  3. When using a bounded non-linearity such as tanh, it also keeps the hidden state within [-1, 1], which helps control exploding activations.
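A quick way to see point 2: composing two linear layers with no activation in between collapses into a single linear map, so stacking them adds no expressive power. A tiny sketch (shapes are illustrative):

import torch
import torch.nn as nn

torch.manual_seed(0)
a = nn.Linear(4, 4, bias=False)
b = nn.Linear(4, 4, bias=False)
x = torch.randn(1, 4)

# two stacked linear layers are equivalent to a single matrix multiply
W = b.weight @ a.weight
print(torch.allclose(b(a(x)), x @ W.T, atol=1e-6))  # True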

This RNNCell processes a single time step. To create different RNN architectures:

a) Many-to-One:

hx = None  # initial hidden state (the cell treats None as zeros)
for t in range(sequence_length):
    hx = rnn_cell(input[t], hx)
output = some_output_layer(hx)  # use only the last hidden state

b) One-to-Many:

outputs = []
hx = rnn_cell(input, None)
for t in range(output_sequence_length):
    output = some_output_layer(hx)
    outputs.append(output)
    hx = rnn_cell(zero_input, hx)

c) Many-to-Many:

outputs = []
hx = None  # initial hidden state
for t in range(sequence_length):
    hx = rnn_cell(input[t], hx)
    output = some_output_layer(hx)
    outputs.append(output)

Here is a flowchart showing how data flows through a multi-layer RNN.

How data passes through multi-layer RNNs (simple representation)
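Here is a sketch of that flow in code, assuming num_layers stacked copies of the RNNCell above, where the hidden state of layer l at each time step becomes the input of layer l + 1:

class MultiLayerRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        # first layer reads the input, later layers read the previous layer's hidden state
        self.cells = nn.ModuleList(
            [RNNCell(input_size if l == 0 else hidden_size, hidden_size)
             for l in range(num_layers)]
        )

    def forward(self, inputs):                     # inputs: (seq_len, batch, input_size)
        hidden = [None] * len(self.cells)
        outputs = []
        for x in inputs:                           # step through time
            for l, cell in enumerate(self.cells):  # then through layers
                x = cell(x, hidden[l])
                hidden[l] = x
            outputs.append(x)                      # top layer's hidden state per time step
        return torch.stack(outputs), hidden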

Example of Many-To-One:
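A sketch of a many-to-one setup using the MultiLayerRNN above, for example sequence classification (all sizes are illustrative):

seq_len, batch, input_size, hidden_size, num_classes = 10, 32, 8, 16, 3

rnn = MultiLayerRNN(input_size, hidden_size, num_layers=2)
classifier = nn.Linear(hidden_size, num_classes)  # plays the role of some_output_layer

x = torch.randn(seq_len, batch, input_size)
outputs, hidden = rnn(x)
logits = classifier(outputs[-1])                  # use only the last time step
print(logits.shape)                               # torch.Size([32, 3])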

I will publish similar articles for LSTM, GRU, and Transformers soon…

Update:
LSTM Article — Link
GRU Article — Link

Thank you for reading; I hope you found this article helpful.

Connect with me:
LinkedIn | GitHub | Medium | Email
