Much Ado About PyTorch

Constructing RNN Models (LSTM, GRU, standard RNN) in PyTorch

Eniola Alese
ExplainingML
Jun 13, 2018


The model in this tutorial is a simplified version of the RNN model used to build a text classifier for the Toxic Comment Challenge on Kaggle. The full model and project notebook, which also cover preprocessing the dataset in TorchText, loading pre-trained vectors, creating the model in PyTorch, fitting the model in FastAI, and submitting to Kaggle, can be found on my GitHub repo here.

After looking at the math and theory behind training RNNs in our previous posts here and here, the next step is to actually implement the models in code. To do this we will use the PyTorch deep learning framework.

Creating our PyTorch Model Class involves two main activities:

  • Initialize Model Class
  • Declare forward() method.

Initialize Model Class

We subclass nn.Module and initialize it with the modules we would like to use in the model.

In our model, we use the following (a sketch of the constructor follows the list):

  • self.embedding_layer = nn.Embedding(input_size, embz_size) : an embedding layer to look up the embedding vectors for the input sequence.
  • self.rnn = nn.RNN(embz_size, hidden_size) : a standard RNN module to compute the hidden state activation and output features.
  • self.output_layer = nn.Linear(hidden_size, output_size) : a linear layer to apply linear transformation to the output features from the RNN module.
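A minimal sketch of the constructor, based on the three modules above (the class name RNNClassifier and the num_layers argument are placeholders of mine, not necessarily the names used in the notebook):

    import torch
    import torch.nn as nn

    class RNNClassifier(nn.Module):
        def __init__(self, input_size, embz_size, hidden_size, output_size, num_layers=1):
            super().__init__()
            self.num_layers = num_layers
            self.hidden_size = hidden_size
            # embedding layer: looks up embedding vectors for the input token indices
            self.embedding_layer = nn.Embedding(input_size, embz_size)
            # standard RNN: computes hidden state activations and output features
            self.rnn = nn.RNN(embz_size, hidden_size, num_layers)
            # linear layer: maps the RNN output features to the output size
            self.output_layer = nn.Linear(hidden_size, output_size)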

Declare forward() method

The next step is to declare the forward() method, which defines the computations to be performed at every call (a sketch follows the list).

  • hidden = torch.zeros(self.num_layers, batch_size, self.hidden_size) : creates a tensor of zeros for the initial hidden state.
  • input_tensor = self.embedding_layer(input_sequence) : passes the input sequence to the embedding layer and returns its embedding vector.
  • output, hidden = self.rnn(input_tensor, hidden) : passes the input embeddings and initial hidden state to the RNN module, and returns a tensor containing the output features for every time step of the input sequence and a tensor containing the final hidden state.
  • output = self.output_layer(output) : passes the output features through a linear layer y = Ax + b .
  • output = F.sigmoid(output[-1, :, :]) : returns a tensor containing the output activation (in this case sigmoid) of the last time step. We output only the last time step because we are building a text classifier, which uses a many-to-one architecture.
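Continuing the class sketched above, the five steps translate into a forward() method like this (one assumption: the input has shape (seq_len, batch_size); I also use torch.sigmoid, the non-deprecated equivalent of F.sigmoid):

    def forward(self, input_sequence):
        # input_sequence: (seq_len, batch_size) tensor of token indices
        batch_size = input_sequence.size(1)
        # initial hidden state of zeros: (num_layers, batch_size, hidden_size)
        hidden = torch.zeros(self.num_layers, batch_size, self.hidden_size)
        # embedding lookup: (seq_len, batch_size, embz_size)
        input_tensor = self.embedding_layer(input_sequence)
        # output holds the features for every time step; hidden is the final state
        output, hidden = self.rnn(input_tensor, hidden)
        # apply the linear transformation y = Ax + b at every time step
        output = self.output_layer(output)
        # sigmoid activation on the last time step only (many-to-one)
        return torch.sigmoid(output[-1, :, :])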

Putting it all together:
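Combining the two sketches gives a complete, runnable version (the sizes in the usage example are illustrative; the six output units mirror the six labels of the Toxic Comment Challenge):

    import torch
    import torch.nn as nn

    class RNNClassifier(nn.Module):
        def __init__(self, input_size, embz_size, hidden_size, output_size, num_layers=1):
            super().__init__()
            self.num_layers = num_layers
            self.hidden_size = hidden_size
            self.embedding_layer = nn.Embedding(input_size, embz_size)
            self.rnn = nn.RNN(embz_size, hidden_size, num_layers)
            self.output_layer = nn.Linear(hidden_size, output_size)

        def forward(self, input_sequence):
            batch_size = input_sequence.size(1)
            hidden = torch.zeros(self.num_layers, batch_size, self.hidden_size)
            input_tensor = self.embedding_layer(input_sequence)
            output, hidden = self.rnn(input_tensor, hidden)
            output = self.output_layer(output)
            return torch.sigmoid(output[-1, :, :])

    # usage: vocabulary of 1,000 tokens, batch of 4 sequences of length 20
    model = RNNClassifier(input_size=1000, embz_size=50, hidden_size=64, output_size=6)
    dummy_batch = torch.randint(0, 1000, (20, 4))  # (seq_len, batch_size)
    predictions = model(dummy_batch)               # shape: (4, 6)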

Conclusion

The above shows how to construct a simple RNN model in PyTorch. You can find the full version of the model here, which also accommodates the LSTM and GRU variants.
