PyTorch Deep Learning Nanodegree: Recurrent Neural Networks

Andrew Lukyanenko

Published in

DataDrivenInvestor

Aug 1, 2019

The fourth part of the Nanodegree: RNN

Introduction

Neural Networks

Convolutional Neural Networks

Recurrent Neural Networks

Generative Adversarial Networks

Deploying a Model

The end of this journey

General

In this part, we learn about recurrent neural nets, try word2vec, implement attention and do many other things. We also work on the third project: generating TV scripts.

Recurrent Neural Nets

In this lesson, we go through the basics of RNNs (Recurrent Neural Networks). This type of neural net has many applications, and one of them is generating sequences, such as text or time series. For example, this little program draws sketches based on your drawing!

RNN Introduction

RNN History

RNN Applications

Feedforward Neural Network - Reminder

Second part of the reminder

RNN (part a)

The basic three-layer neural network with feedback connections that serve as memory inputs is called the Elman Network and is depicted in the following picture:
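To make the feedback loop concrete, here is a minimal NumPy sketch of an Elman network unrolled over a short sequence. It is my own illustration, not code from the course: the parameter names (W_x, W_h, W_y, b_h, b_y) and the tanh hidden activation are assumptions. The hidden state computed at one step is fed back in as the memory input at the next step.

```python
import numpy as np

def elman_step(x_t, h_prev, W_x, W_h, W_y, b_h, b_y):
    """One step of a simple Elman RNN (parameter names are illustrative)."""
    # The new hidden state mixes the current input with the previous hidden state (the "memory")
    h_t = np.tanh(W_x @ x_t + W_h @ h_prev + b_h)
    # The output is a linear readout of the hidden state
    y_t = W_y @ h_t + b_y
    return h_t, y_t

def elman_forward(xs, h0, params):
    """Unroll the network over a sequence, carrying the hidden state from step to step."""
    h, outputs = h0, []
    for x_t in xs:
        h, y = elman_step(x_t, h, *params)
        outputs.append(y)
    return np.stack(outputs), h

rng = np.random.default_rng(0)
input_size, hidden_size, output_size, seq_len = 3, 4, 2, 5
params = (
    rng.normal(size=(hidden_size, input_size)),   # W_x: input -> hidden
    rng.normal(size=(hidden_size, hidden_size)),  # W_h: hidden -> hidden (the feedback loop)
    rng.normal(size=(output_size, hidden_size)),  # W_y: hidden -> output
    np.zeros(hidden_size),                        # b_h
    np.zeros(output_size),                        # b_y
)
xs = rng.normal(size=(seq_len, input_size))
ys, h_last = elman_forward(xs, np.zeros(hidden_size), params)
print(ys.shape, h_last.shape)  # (5, 2) (4,)
```

The same weights are reused at every time step; only the hidden state changes as the sequence is read.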

RNN (part b)

RNN - Unfolded Model

RNN - Example

Backpropagation Through Time (part a)

Backpropagation Through Time (part b)

Backpropagation Through Time (part c)
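Backpropagation through time is ordinary backpropagation applied to the unfolded network: because the same weight is reused at every step, its gradient is the sum of the contributions from all time steps. The sketch below assumes a toy scalar RNN, h_t = tanh(w·h_{t-1} + u·x_t), with a squared-error loss on the last hidden state only. All names and numbers are hypothetical; the hand-written BPTT gradients are checked against finite differences.

```python
import math

def forward(w, u, xs, h0=0.0):
    """Run a scalar tanh RNN and keep every hidden state for the backward pass."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss_fn(w, u, xs, y):
    """Squared error on the final hidden state only."""
    return 0.5 * (forward(w, u, xs)[-1] - y) ** 2

def bptt(w, u, xs, y):
    """Walk the unrolled graph from the last step back to the first."""
    hs = forward(w, u, xs)
    dw = du = 0.0
    dh = hs[-1] - y                      # dL/dh_T
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)     # back through tanh at step t
        dw += da * hs[t - 1]             # w is shared by every step, so contributions add up
        du += da * xs[t - 1]
        dh = da * w                      # pass the gradient on to h_{t-1}
    return dw, du

w, u, xs, y = 0.5, -0.3, [1.0, 0.5, -1.0, 2.0], 0.25
dw, du = bptt(w, u, xs, y)

# Sanity check with central finite differences
eps = 1e-6
dw_num = (loss_fn(w + eps, u, xs, y) - loss_fn(w - eps, u, xs, y)) / (2 * eps)
du_num = (loss_fn(w, u + eps, xs, y) - loss_fn(w, u - eps, xs, y)) / (2 * eps)
print(abs(dw - dw_num) < 1e-6, abs(du - du_num) < 1e-6)
```

The repeated multiplication by w in `dh = da * w` is also where vanishing and exploding gradients come from: over many steps the gradient is scaled by w (and the tanh derivative) again and again.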

RNN Summary

From RNN to LSTM

Long Short-Term Memory Networks (LSTM)
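An LSTM step can also be written out by hand. This is a sketch, not the course's code: it follows the weight layout PyTorch documents for nn.LSTMCell (input, forget, cell and output gates stacked in that order) and checks the result against nn.LSTMCell itself. The helper name lstm_cell is my own.

```python
import torch
import torch.nn as nn

def lstm_cell(x, h, c, W_ih, W_hh, b_ih, b_hh):
    """One LSTM step written out by hand, using PyTorch's gate order (i, f, g, o)."""
    gates = x @ W_ih.T + b_ih + h @ W_hh.T + b_hh
    i, f, g, o = gates.chunk(4, dim=1)
    i = torch.sigmoid(i)             # input gate: how much new information to write
    f = torch.sigmoid(f)             # forget gate: how much of the old cell state to keep
    g = torch.tanh(g)                # candidate values for the cell state
    o = torch.sigmoid(o)             # output gate: how much of the cell state to expose
    c_next = f * c + i * g           # updated long-term memory
    h_next = o * torch.tanh(c_next)  # updated short-term memory / output
    return h_next, c_next

torch.manual_seed(0)
batch, input_size, hidden_size = 2, 3, 4
cell = nn.LSTMCell(input_size, hidden_size)
x = torch.randn(batch, input_size)
h = torch.randn(batch, hidden_size)
c = torch.randn(batch, hidden_size)

with torch.no_grad():
    h_ref, c_ref = cell(x, (h, c))
    h_ours, c_ours = lstm_cell(x, h, c, cell.weight_ih, cell.weight_hh, cell.bias_ih, cell.bias_hh)

print(torch.allclose(h_ours, h_ref, atol=1e-5), torch.allclose(c_ours, c_ref, atol=1e-5))
```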

Putting it All Together

Other architectures

Implementation of RNN & LSTM

In this lesson, we learn how to implement RNNs. I'll skip this section. Why? Because it is available and free to anyone. There is a free course by Udacity, Introduction to Neural Networks, and Lesson 7 of that course contains this lesson, so you can go through it if you are interested.

Hyperparameters

This is quite an interesting section. Here we see the importance of several hyperparameters and how to select good values for them.

Introduction

Learning Rate

Minibatch Size

Number of Training Iterations / Epochs

Number of Hidden Units / Layers

RNN Hyperparameters
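Cell type, number of layers and hidden size also determine how many weights the model has to learn, which is one practical way to compare settings. The sketch below is my own (the helper names are hypothetical). It assumes unidirectional PyTorch nn.RNN, nn.GRU and nn.LSTM modules with default biases, where each layer has two bias vectors and the gate count is 1, 3 and 4 respectively, and it compares the formula with PyTorch's own count.

```python
import torch.nn as nn

def recurrent_param_count(cell_type, input_size, hidden_size, num_layers):
    """Count the weights of a unidirectional PyTorch RNN/GRU/LSTM with both bias vectors."""
    gates = {"rnn": 1, "gru": 3, "lstm": 4}[cell_type]
    total = 0
    for layer in range(num_layers):
        # The first layer reads the input; deeper layers read the previous layer's hidden state
        layer_in = input_size if layer == 0 else hidden_size
        # input-to-hidden weights + hidden-to-hidden weights + bias_ih + bias_hh
        total += gates * hidden_size * (layer_in + hidden_size + 2)
    return total

def actual_count(module):
    """Number of trainable numbers PyTorch actually allocated."""
    return sum(p.numel() for p in module.parameters())

results = {}
for cell_type, cls in [("rnn", nn.RNN), ("gru", nn.GRU), ("lstm", nn.LSTM)]:
    module = cls(input_size=200, hidden_size=256, num_layers=2)
    results[cell_type] = (recurrent_param_count(cell_type, 200, 256, 2), actual_count(module))
    print(cell_type, *results[cell_type])
```

Because every weight block is multiplied by the gate count, an LSTM layer has exactly 4/3 as many weights as a GRU layer of the same size.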

If you want to learn more about hyperparameters, these are some great resources on the topic:

Practical recommendations for gradient-based training of deep architectures by Yoshua Bengio
Deep Learning book — chapter 11.4: Selecting Hyperparameters by Ian Goodfellow, Yoshua Bengio, Aaron Courville
Neural Networks and Deep Learning book — Chapter 3: How to choose a neural network’s hyper-parameters? by Michael Nielsen
Efficient BackProp (pdf) by Yann LeCun

More specialized sources:

How to Generate a Good Word Embedding? by Siwei Lai, Kang Liu, Liheng Xu, Jun Zhao
Systematic evaluation of CNN advances on the ImageNet by Dmytro Mishkin, Nikolay Sergievskiy, Jiri Matas
Visualizing and Understanding Recurrent Networks by Andrej Karpathy, Justin Johnson, Li Fei-Fei

Embeddings & Word2Vec

This is an important lesson. Even though huge models like BERT are becoming more common, “usual” RNNs are still efficient, so it is worth learning how to use them. Embeddings play a key role in training them.
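The key idea behind an embedding layer is that multiplying a one-hot vector by the weight matrix simply selects one row, so the matrix can be used as a lookup table instead. A small PyTorch sketch (the sizes are arbitrary assumptions) showing that the three approaches give the same vectors:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, embedding_dim = 10, 4
embedding = nn.Embedding(vocab_size, embedding_dim)
word_ids = torch.tensor([3, 7, 3])

# 1) The "full" way: one-hot vectors times the weight matrix
one_hot = F.one_hot(word_ids, num_classes=vocab_size).float()
via_matmul = one_hot @ embedding.weight

# 2) The lookup-table way: just pick rows of the weight matrix
via_indexing = embedding.weight[word_ids]

# 3) What nn.Embedding does for us
via_module = embedding(word_ids)

print(torch.allclose(via_matmul, via_indexing), torch.allclose(via_indexing, via_module))
```

With a vocabulary of tens of thousands of words, the lookup avoids a huge and mostly zero matrix multiplication.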

Notebooks with exercises and solutions are available here: https://github.com/udacity/deep-learning-v2-pytorch/tree/master/word2vec-embeddings

Word Embeddings

Embedding Weight Matrix/Lookup Table

Data & Subsampling

Subsampling Solution
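Subsampling discards very frequent words (like “the”) with a probability that grows with their frequency. Below is a minimal sketch of the rule from the word2vec paper, P(drop w) = 1 - sqrt(t / f(w)), where f(w) is the word's relative frequency and t is a threshold. The function name and seed handling are my own; the paper suggests t around 1e-5 for large corpora, while the toy example uses t = 0.01 so that the effect is visible on 1,000 words.

```python
import random
from collections import Counter

def subsample(words, threshold=1e-5, seed=0):
    """Randomly drop frequent words with probability 1 - sqrt(threshold / freq)."""
    rng = random.Random(seed)
    counts = Counter(words)
    total = len(words)
    p_drop = {w: 1 - (threshold / (c / total)) ** 0.5 for w, c in counts.items()}
    # rng.random() is in [0, 1), so words with p_drop <= 0 (rarer than the threshold) are always kept
    kept = [w for w in words if rng.random() >= p_drop[w]]
    return kept, p_drop

words = ["the"] * 900 + ["cat"] * 90 + ["zebra"] * 10
kept, p_drop = subsample(words, threshold=0.01)
print(len(words), "->", len(kept), {w: round(p, 3) for w, p in p_drop.items()})
```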

Context Word Targets
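For the skip-gram model, each word is paired with the words around it. A common trick, sketched below, is to draw a random window size R between 1 and window_size for every word, so that closer words are picked more often. The name get_context and its signature are my assumptions, not necessarily those of the course notebook.

```python
import random

def get_context(words, idx, window_size=5, rng=random):
    """Return the words around position idx within a random window R in [1, window_size]."""
    R = rng.randint(1, window_size)   # randint includes both end points
    start = max(0, idx - R)           # do not run past the beginning of the text
    stop = idx + R                    # slicing already stops at the end of the text
    return words[start:idx] + words[idx + 1:stop + 1]

words = list(range(10))
print(get_context(words, 5, window_size=3, rng=random.Random(1)))
```

The center word itself is never included; each returned word becomes one (center, context) training pair.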

Batching Data, Solution

Word2Vec Model

Model & Validations

Negative Sampling

SkipGramNeg, Model Definition

Complete Model & Custom Loss
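Negative sampling replaces the full softmax over the vocabulary with a binary task: score the true context word high and a handful of sampled noise words low. Below is a standalone sketch of that loss, -log σ(u_o·v_c) - Σ_k log σ(-u_k·v_c), with noise words drawn from the unigram distribution raised to the 3/4 power as in the word2vec paper. It is not the notebook's loss class: the function names and tensor shapes are my assumptions, and the loop at the end only double-checks the vectorised version.

```python
import torch
import torch.nn.functional as F

def noise_distribution(word_counts):
    """Unigram distribution raised to the 3/4 power, then renormalised."""
    dist = (word_counts / word_counts.sum()) ** 0.75
    return dist / dist.sum()

def negative_sampling_loss(input_vectors, output_vectors, noise_vectors):
    """
    input_vectors:  (batch, dim)          embeddings of the center words
    output_vectors: (batch, dim)          embeddings of the true context words
    noise_vectors:  (batch, n_noise, dim) embeddings of the sampled noise words
    """
    # log sigma(u_o . v_c) for each real (center, context) pair
    pos = F.logsigmoid((output_vectors * input_vectors).sum(dim=1))
    # log sigma(-u_k . v_c) for every noise word, summed over the noise samples
    noise_scores = torch.bmm(noise_vectors, input_vectors.unsqueeze(2)).squeeze(2)
    neg = F.logsigmoid(-noise_scores).sum(dim=1)
    return -(pos + neg).mean()

torch.manual_seed(0)
batch, n_noise, dim = 3, 5, 4
counts = torch.tensor([50.0, 20.0, 10.0, 5.0, 1.0])
noise_dist = noise_distribution(counts)
noise_ids = torch.multinomial(noise_dist, batch * n_noise, replacement=True).view(batch, n_noise)

input_vectors = torch.randn(batch, dim)
output_vectors = torch.randn(batch, dim)
noise_table = torch.randn(len(counts), dim)  # stand-in for an output embedding matrix
noise_vectors = noise_table[noise_ids]       # (batch, n_noise, dim)

loss = negative_sampling_loss(input_vectors, output_vectors, noise_vectors)

# The same quantity with explicit loops
reference = 0.0
for b in range(batch):
    term = torch.log(torch.sigmoid(output_vectors[b] @ input_vectors[b]))
    for k in range(n_noise):
        term = term + torch.log(torch.sigmoid(-(noise_vectors[b, k] @ input_vectors[b])))
    reference = reference - term / batch
print(loss.item(), reference.item())
```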

Sentiment Prediction RNN

In this lesson, we’ll build a recurrent neural network that can accurately predict the sentiment of movie reviews. I’ll skip this section. Why? Because it is available and free to anyone. There is a free course by Udacity: Introduction to Neural Networks. Lesson 8 of this course contains this lesson, so you can go through it if you are interested.

Project: Generate TV Scripts

This is the third project of this course. In this project, we will try to write an RNN which will generate TV scripts. The code template for this project can be found here: https://github.com/udacity/deep-learning-v2-pytorch/tree/master/project-tv-script-generation

Data used

We will be using scripts from 9 seasons of the TV series Seinfeld. Let’s have a look at it:

First of all, if we want to use word embeddings, we need to create unique ids for all words:

from collections import Counter

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    # Count word occurrences and sort the vocabulary from most to least frequent
    counts = Counter(text)
    vocab = sorted(counts, key=counts.get, reverse=True)

    # The most frequent word gets id 0, the next one id 1, and so on
    vocab_to_int = {word: i for i, word in enumerate(vocab)}
    int_to_vocab = {i: word for word, i in vocab_to_int.items()}
    return (vocab_to_int, int_to_vocab)

Then we write a function which will convert punctuation into tokens:

def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenized dictionary where the key is the punctuation and the value is the token
    """
    # Tokens must be unique (so they can be mapped back to punctuation after generation)
    # and must not contain any of the punctuation keys themselves
    return {'.': '||period||',
            ',': '||comma||',
            '"': '||quotation_mark||',
            ';': '||semicolon||',
            '!': '||exclamation_mark||',
            '?': '||question_mark||',
            '(': '||left_parenthesis||',
            ')': '||right_parenthesis||',
            '-': '||dash||',
            '\n': '||newline||'}
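The tokens are meant to be used roughly like this: each punctuation mark is replaced by its token surrounded by spaces, so that it becomes a separate “word” before the text is lowercased and split. A minimal sketch of that step, using only a few of the tokens above (the function name apply_tokens is hypothetical):

```python
def apply_tokens(text, token_dict):
    """Replace each punctuation mark with its space-padded token, then lowercase and split."""
    for punct, token in token_dict.items():
        text = text.replace(punct, f" {token} ")
    return text.lower().split()

# A subset of the tokens returned by token_lookup
tokens = {'.': '||period||', ',': '||comma||', '?': '||question_mark||', '\n': '||newline||'}
words = apply_tokens("Jerry: Hello, Newman.\nNewman: Hello?", tokens)
print(words)
```

This is also why every punctuation mark needs its own token: if '?' shared the period's token, every generated question would come back with a period.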

After this we write a custom data loader which will generate batches of data:

import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

def batch_data(words, sequence_length, batch_size):
    """
    Batch the neural network data using DataLoader
    :param words: The word ids of the TV scripts
    :param sequence_length: The sequence length of each batch
    :param batch_size: The size of each batch; the number of sequences in a batch
    :return: DataLoader with batched data
    """
    # Slide a window over the text: each feature is sequence_length word ids,
    # and its target is the word id that immediately follows it
    features = []
    targets = []
    for idx in range(0, len(words) - sequence_length):
        features.append(words[idx: idx + sequence_length])
        targets.append(words[idx + sequence_length])

    # Keep only as many (feature, target) pairs as fill complete batches
    n_batches = len(features) // batch_size
    features = np.asarray(features[:n_batches * batch_size])
    targets = np.asarray(targets[:n_batches * batch_size])

    data = TensorDataset(torch.from_numpy(features), torch.from_numpy(targets))
    data_loader = DataLoader(data, shuffle=False, batch_size=batch_size)

    # return a dataloader
    return data_loader

And at last we build a simple LSTM:

import torch
import torch.nn as nn

class RNN(nn.Module):

    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5):
        """
        Initialize the PyTorch RNN Module
        :param vocab_size: The number of input dimensions of the neural network (the size of the vocabulary)
        :param output_size: The number of output dimensions of the neural network
        :param embedding_dim: The size of embeddings, should you choose to use them
        :param hidden_dim: The size of the hidden layer outputs
        :param n_layers: The number of stacked LSTM layers
        :param dropout: dropout to add in between LSTM/GRU layers
        """
        super(RNN, self).__init__()

        # set class variables
        self.vocab_size = vocab_size
        self.output_size = output_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers

        # define model layers: word ids -> embeddings -> LSTM -> scores over the vocabulary
        self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim)
        self.lstm = nn.LSTM(self.embedding_dim, self.hidden_dim, self.n_layers,
                            dropout=dropout, batch_first=True)
        self.linear = nn.Linear(self.hidden_dim, self.output_size)

    def forward(self, nn_input, hidden):
        """
        Forward propagation of the neural network
        :param nn_input: The input to the neural network
        :param hidden: The hidden state
        :return: Two Tensors, the output of the neural network and the latest hidden state
        """
        bs = nn_input.size(0)
        embedding = self.embedding(nn_input.long())

        # lstm_out: (batch, seq_len, hidden_dim); h: tuple of two (n_layers, batch, hidden_dim) tensors
        lstm_out, h = self.lstm(embedding, hidden)
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)

        output = self.linear(lstm_out)
        # keep only the prediction for the last time step of each sequence
        output = output.view(bs, -1, self.output_size)[:, -1]
        return output, h

    def init_hidden(self, batch_size):
        '''
        Initialize the hidden state of an LSTM/GRU
        :param batch_size: The batch_size of the hidden state
        :return: hidden state of dims (n_layers, batch_size, hidden_dim)
        '''
        # new_zeros creates tensors with the same dtype and device as the model weights,
        # so the hidden state is on the GPU whenever the model is
        weight = next(self.parameters())
        hidden = (weight.new_zeros(self.n_layers, batch_size, self.hidden_dim),
                  weight.new_zeros(self.n_layers, batch_size, self.hidden_dim))

        return hidden
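To generate text with a trained model like this one, its raw scores for the next word can be turned into probabilities and sampled from. A common approach, sketched here under my own naming (sample_next_word is hypothetical), is top-k sampling: keep only the k most likely words, renormalise, and draw one at random. With top_k=1 it reduces to always picking the most likely word.

```python
import torch
import torch.nn.functional as F

def sample_next_word(logits, top_k=5, generator=None):
    """Pick the next word id from a 1-D tensor of scores, restricted to the top_k candidates."""
    probs = F.softmax(logits, dim=-1)
    top_probs, top_ids = probs.topk(top_k)
    top_probs = top_probs / top_probs.sum()   # renormalise over the kept candidates
    choice = torch.multinomial(top_probs, num_samples=1, generator=generator)
    return top_ids[choice].item()

logits = torch.tensor([0.1, 3.0, -1.0, 0.5])  # pretend scores over a 4-word vocabulary
print(sample_next_word(logits, top_k=1))      # 1 (greedy choice)
print(sample_next_word(logits, top_k=2, generator=torch.Generator().manual_seed(0)))
```

In a generation loop, you would pass the last sequence_length word ids through the RNN with a batch size of 1, give output[0] to this function, append the sampled id to the sequence, and repeat. A small top_k keeps the text coherent, while a larger one makes it more varied.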

This is an example of a script generated by my model:

Attention

Here are the notebooks with exercises and solutions: https://github.com/udacity/deep-learning-v2-pytorch/tree/master/attention

Introduction to Attention

Sequence to Sequence Models

part 2

The encoder and decoder do not have to be RNNs; they can be CNNs too!

In computer vision, we can use this kind of encoder-decoder model to generate words or captions for an input image or even to generate an image from a sequence of input words. We’ll focus on the first case: generating captions for images, and you’ll learn more about caption generation in the next lesson. For now, know that we can input an image into a CNN (encoder) and generate a descriptive caption for that image using an LSTM (decoder).
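A toy PyTorch sketch of that CNN-encoder / LSTM-decoder pairing follows. It is my own illustration with arbitrary layer sizes, not the course's captioning model. Feeding the image feature to the LSTM as the first input, before the caption's word embeddings, is one common design that I assume here.

```python
import torch
import torch.nn as nn

class TinyCaptioner(nn.Module):
    """Toy encoder-decoder: a small CNN encodes the image, an LSTM decodes a caption."""

    def __init__(self, vocab_size, embed_size=32, hidden_size=64):
        super().__init__()
        # Encoder: two conv layers, then global average pooling down to one feature vector
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, embed_size, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Decoder: word embeddings -> LSTM -> scores over the vocabulary
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, images, captions):
        features = self.encoder(images).unsqueeze(1)  # (batch, 1, embed_size)
        words = self.embed(captions)                  # (batch, seq_len, embed_size)
        # The image feature acts as the first "word" of the input sequence
        inputs = torch.cat([features, words], dim=1)
        out, _ = self.lstm(inputs)
        return self.fc(out)                           # (batch, seq_len + 1, vocab_size)

torch.manual_seed(0)
model = TinyCaptioner(vocab_size=100)
scores = model(torch.randn(2, 3, 32, 32), torch.randint(0, 100, (2, 7)))
print(scores.shape)
```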

Sequence to Sequence Recap

Encoding — Attention Overview

Decoding — Attention Overview

Attention Encoder

Attention Decoder

Bahdanau and Luong Attention

Multiplicative Attention
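Multiplicative (Luong) attention scores each encoder state h_s against the current decoder state h_t with a dot product, either plain (h_tᵀ h_s) or through a learned matrix (h_tᵀ W h_s). Below is a NumPy sketch with assumed shapes and a hypothetical function name: the softmaxed scores become the attention weights, and the context vector is the weighted sum of encoder states.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def multiplicative_attention(decoder_hidden, encoder_outputs, W=None):
    """
    decoder_hidden:  (d,)    current decoder state h_t
    encoder_outputs: (T, d)  one encoder state h_s per input step
    W:               (d, d)  learned matrix for the "general" score; None means a plain dot product
    """
    # Row s of keys is W h_s (or just h_s), so keys @ h_t gives h_t^T W h_s for every s
    keys = encoder_outputs if W is None else encoder_outputs @ W.T
    scores = keys @ decoder_hidden        # (T,) one score per encoder step
    weights = softmax(scores)             # attention weights, sum to 1
    context = weights @ encoder_outputs   # (d,) weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(0)
encoder_outputs = rng.normal(size=(6, 4))  # 6 input steps, hidden size 4
decoder_hidden = rng.normal(size=4)
context, weights = multiplicative_attention(decoder_hidden, encoder_outputs)
context_general, weights_general = multiplicative_attention(
    decoder_hidden, encoder_outputs, W=rng.normal(size=(4, 4)))
print(weights.round(3), context.shape)
```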

Additive Attention
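Additive (Bahdanau) attention instead passes both states through a small feed-forward layer: score(h_t, h_s) = vᵀ tanh(W1 h_t + W2 h_s). The NumPy sketch below is only an illustration: W1, W2 and v would be learned in a real model, and their names and shapes here are assumptions. Because W1 and W2 project both states into a common space, this score also works when encoder and decoder states have different sizes.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def additive_attention(decoder_hidden, encoder_outputs, W1, W2, v):
    """
    decoder_hidden:  (d_dec,)    decoder state
    encoder_outputs: (T, d_enc)  one encoder state per input step
    W1: (a, d_dec), W2: (a, d_enc), v: (a,)  parameters (random stand-ins here)
    """
    # Broadcasting adds the projected decoder state to every projected encoder state
    hidden = np.tanh(decoder_hidden @ W1.T + encoder_outputs @ W2.T)  # (T, a)
    scores = hidden @ v                                               # (T,)
    weights = softmax(scores)
    context = weights @ encoder_outputs                               # (d_enc,)
    return context, weights

rng = np.random.default_rng(1)
T, d_enc, d_dec, attn_dim = 6, 4, 3, 5
encoder_outputs = rng.normal(size=(T, d_enc))
decoder_hidden = rng.normal(size=d_dec)
W1 = rng.normal(size=(attn_dim, d_dec))
W2 = rng.normal(size=(attn_dim, d_enc))
v = rng.normal(size=attn_dim)
context, weights = additive_attention(decoder_hidden, encoder_outputs, W1, W2, v)
print(weights.round(3), context.shape)
```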

Computer Vision Applications

Other Attention Methods

The Transformer and Self-Attention
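Self-attention in the Transformer uses scaled dot-product attention, softmax(QKᵀ / √d_k) V, where queries, keys and values are all projections of the same sequence. Here is a single-head NumPy sketch without masking; the real Transformer runs several such heads in parallel, and the projection matrices here are random stand-ins for learned weights.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a sequence X of shape (T, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v  # every position produces a query, a key and a value
    d_k = K.shape[-1]
    # (T, T) matrix: how strongly each position attends to every position, itself included
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(2)
T, d_model, d_k, d_v = 5, 8, 4, 6
X = rng.normal(size=(T, d_model))
out, weights = self_attention(X,
                              rng.normal(size=(d_model, d_k)),
                              rng.normal(size=(d_model, d_k)),
                              rng.normal(size=(d_model, d_v)))
print(out.shape, weights.shape)
```

Dividing by √d_k keeps the dot products from growing with the key size, which would otherwise push the softmax toward near one-hot weights.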

This was the fourth part of the Deep Learning Nanodegree. We learned how to write RNNs and use them for a variety of tasks. The next part will be about Generative Adversarial Networks!