PyTorch Deep Learning Nanodegree: Recurrent Neural Networks
The fourth part of the Nanodegree: RNN
In this part we learn about recurrent neural nets, try word2vec, implement attention, and more. We also work on the third project: generating TV scripts.
Recurrent Neural Nets
In this lesson, we go through the basics of recurrent neural nets (RNNs). This type of network has many applications, and one of them is generating sequences, such as text or time series. This little program draws sketches based on your drawing!
Feedforward Neural Network-Reminder
The basic three-layer neural network with feedback that serves as memory input is called the Elman network and is depicted in the following picture:
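To make the feedback loop concrete, here is a minimal sketch of one Elman-style step in plain Python. The weight names (`w_xh`, `w_hh`, `w_hy`), the tanh activation, and the tiny sizes are assumptions chosen for illustration; they are not taken from the lesson.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def elman_step(x, h_prev, w_xh, w_hh, w_hy):
    """One time step: the new hidden state mixes the input with the previous state."""
    from_input = matvec(w_xh, x)
    from_memory = matvec(w_hh, h_prev)  # the feedback that acts as memory
    h = [math.tanh(a + b) for a, b in zip(from_input, from_memory)]
    y = matvec(w_hy, h)
    return y, h

def run_sequence(inputs, hidden_size, w_xh, w_hh, w_hy):
    """Feed a sequence through the network, carrying the hidden state along."""
    h = [0.0] * hidden_size
    outputs = []
    for x in inputs:
        y, h = elman_step(x, h, w_xh, w_hh, w_hy)
        outputs.append(y)
    return outputs, h

# Hypothetical weights: 2 inputs, 2 hidden units, 1 output.
w_xh = [[0.5, -0.3], [0.1, 0.8]]
w_hh = [[0.2, 0.0], [0.0, 0.2]]
w_hy = [[1.0, -1.0]]

# The same input is fed twice; the outputs differ only because of the memory.
sequence = [[1.0, 0.0], [1.0, 0.0]]
outputs, final_h = run_sequence(sequence, 2, w_xh, w_hh, w_hy)
```

The second output differs from the first even though the input is identical, which is exactly what the feedback connection is for.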
Backpropagation Through Time (part a)
Backpropagation Through Time (part b)
Backpropagation Through Time (part c)
Long Short-Term Memory Networks (LSTM)
Implementation of RNN & LSTM
This lesson covers how to implement an RNN. I’ll skip it here because the material is freely available: Lesson 7 of Udacity's free course Introduction to Neural Networks contains this lesson, so you can go through it if you are interested.
Hyperparameters
This is quite an interesting section. Here we see the importance of several hyperparameters and how to select good values for them.
Number of Training Iterations / Epochs
Number of Hidden Units / Layers
If you want to learn more about hyperparameters, these are some great resources on the topic:
- Practical recommendations for gradient-based training of deep architectures by Yoshua Bengio
- Deep Learning book — chapter 11.4: Selecting Hyperparameters by Ian Goodfellow, Yoshua Bengio, Aaron Courville
- Neural Networks and Deep Learning book — Chapter 3: How to choose a neural network’s hyper-parameters? by Michael Nielsen
- Efficient BackProp by Yann LeCun
More specialized sources:
- How to Generate a Good Word Embedding? by Siwei Lai, Kang Liu, Liheng Xu, Jun Zhao
- Systematic evaluation of CNN advances on the ImageNet by Dmytro Mishkin, Nikolay Sergievskiy, Jiri Matas
- Visualizing and Understanding Recurrent Networks by Andrej Karpathy, Justin Johnson, Li Fei-Fei
Embeddings & Word2Vec
This is an important lesson. Even though huge models like BERT are becoming more common, ordinary RNNs are still efficient, so it is worth learning how to use them, and embeddings play a key role in training them.
Notebooks with exercises and solutions are available here: https://github.com/udacity/deep-learning-v2-pytorch/tree/master/word2vec-embeddings
Embedding Weight Matrix/Lookup Table
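The idea behind the lookup table can be shown in a few lines: multiplying a one-hot vector by the embedding weight matrix selects a single row, so the multiplication can be replaced by reading that row directly. This is a minimal sketch in plain Python; the matrix values and helper names are made up for illustration.

```python
# Hypothetical embedding weight matrix: 4 words in the vocabulary, 3 dimensions each.
embedding_weights = [
    [0.1, 0.2, 0.3],   # word id 0
    [0.4, 0.5, 0.6],   # word id 1
    [0.7, 0.8, 0.9],   # word id 2
    [1.0, 1.1, 1.2],   # word id 3
]

def one_hot(word_id, vocab_size):
    """Vector of zeros with a single 1.0 at position `word_id`."""
    return [1.0 if i == word_id else 0.0 for i in range(vocab_size)]

def one_hot_times_matrix(word_id, weights):
    """The slow way: multiply a one-hot row vector by the weight matrix."""
    vector = one_hot(word_id, len(weights))
    n_dims = len(weights[0])
    return [sum(vector[i] * weights[i][d] for i in range(len(weights)))
            for d in range(n_dims)]

def lookup(word_id, weights):
    """The fast way: just read the row that belongs to the word id."""
    return weights[word_id]

slow = one_hot_times_matrix(2, embedding_weights)
fast = lookup(2, embedding_weights)
```

Both calls return the row for word id 2. In PyTorch, `nn.Embedding` provides this kind of row lookup with trainable weights.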
Sentiment Prediction RNN
In this lesson, we build a recurrent neural network that predicts the sentiment of movie reviews. I’ll skip it here because the material is freely available: Lesson 8 of Udacity's free course Introduction to Neural Networks contains this lesson, so you can go through it if you are interested.
Project: Generate TV Scripts
This is the third project of this course. In this project, we will try to write an RNN which will generate TV scripts. The code template for this project can be found here: https://github.com/udacity/deep-learning-v2-pytorch/tree/master/project-tv-script-generation
Data used
We will use scripts from nine seasons of the TV series Seinfeld. Let’s have a look at the data:
First of all, if we want to use word embeddings, we need to create unique ids for all words:
from collections import Counter

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    # Order words from most to least frequent, so common words get the smallest ids
    counts = Counter(text)
    vocab = [word for word, _ in sorted(counts.items(), key=lambda x: x[1], reverse=True)]
    vocab_to_int = {word: i for i, word in enumerate(vocab)}
    int_to_vocab = {i: word for word, i in vocab_to_int.items()}
    return (vocab_to_int, int_to_vocab)
Then we write a function which will convert punctuation into tokens:
def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenized dictionary where the key is the punctuation and the value is the token
    """
    # Every punctuation mark needs its own token; otherwise two marks
    # become indistinguishable after preprocessing
    return {'.': '||period||',
            ',': '||comma||',
            '"': '||quotation_mark||',
            ';': '||semicolon||',
            '!': '||exclamation_mark||',
            '?': '||question_mark||',
            '(': '||left_parenthesis||',
            ')': '||right_parenthesis||',
            '-': '||dash||',
            '\n': '||newline||'}
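The dictionary is meant to be applied before the text is split into words, so that "bye" and "bye." map to the same word id and the punctuation becomes a word of its own. Here is a minimal sketch of that step, using a shortened dictionary and a hypothetical `tokenize` helper. The project's own preprocessing code may differ in details.

```python
def tokenize(text, token_dict):
    """Replace each punctuation mark with its token, then split on whitespace."""
    for mark, token in token_dict.items():
        # Surround the token with spaces so it becomes a separate word
        text = text.replace(mark, ' {} '.format(token))
    return text.lower().split()

# A shortened, illustrative dictionary in the same style as token_lookup().
token_dict = {'.': '||period||',
              '!': '||exclamation_mark||',
              '\n': '||newline||'}

words = tokenize("Hello Jerry!\nBye.", token_dict)
print(words)
# ['hello', 'jerry', '||exclamation_mark||', '||newline||', 'bye', '||period||']
```

The resulting list of words is what a function like `create_lookup_tables` would receive as input.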
After this we write a custom data loader which will generate batches of data:
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

def batch_data(words, sequence_length, batch_size):
    """
    Batch the neural network data using DataLoader
    :param words: The word ids of the TV scripts
    :param sequence_length: The sequence length of each batch
    :param batch_size: The size of each batch; the number of sequences in a batch
    :return: DataLoader with batched data
    """
    # Drop trailing words that do not fill a whole batch
    # (the last batch of windows can still be smaller than batch_size)
    n_batches = len(words) // batch_size
    words = words[:n_batches * batch_size]
    features = []
    targets = []
    # Slide a window over the text: each feature is sequence_length word ids,
    # and its target is the word id that follows the window
    for idx in range(0, len(words) - sequence_length):
        features.append(words[idx: idx + sequence_length])
        targets.append(words[idx + sequence_length])
    features = np.asarray(features)
    targets = np.asarray(targets)
    data = TensorDataset(torch.from_numpy(features), torch.from_numpy(targets))
    data_loader = DataLoader(data, shuffle=False, batch_size=batch_size)
    return data_loader
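To see what the loop inside `batch_data` produces, here is the same sliding-window logic on its own, in plain Python and without the DataLoader. The helper name `sliding_windows` and the toy word ids are illustrative only.

```python
def sliding_windows(words, sequence_length):
    """Pair every window of `sequence_length` ids with the id that follows it."""
    features, targets = [], []
    for idx in range(len(words) - sequence_length):
        features.append(words[idx: idx + sequence_length])
        targets.append(words[idx + sequence_length])
    return features, targets

features, targets = sliding_windows([1, 2, 3, 4, 5, 6, 7], sequence_length=4)
for feature, target in zip(features, targets):
    print(feature, '->', target)
# [1, 2, 3, 4] -> 5
# [2, 3, 4, 5] -> 6
# [3, 4, 5, 6] -> 7
```

Each row is one training example: the network sees four word ids and has to predict the fifth.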
Finally, we build a simple LSTM:
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5):
        """
        Initialize the PyTorch RNN Module
        :param vocab_size: The number of input dimensions of the neural network (the size of the vocabulary)
        :param output_size: The number of output dimensions of the neural network
        :param embedding_dim: The size of embeddings, should you choose to use them
        :param hidden_dim: The size of the hidden layer outputs
        :param n_layers: The number of stacked LSTM layers
        :param dropout: dropout to add in between LSTM/GRU layers
        """
        super().__init__()
        # set class variables
        self.vocab_size = vocab_size
        self.output_size = output_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        # define model layers
        self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim)
        self.lstm = nn.LSTM(self.embedding_dim, self.hidden_dim, self.n_layers, dropout=dropout, batch_first=True)
        self.linear = nn.Linear(self.hidden_dim, self.output_size)

    def forward(self, nn_input, hidden):
        """
        Forward propagation of the neural network
        :param nn_input: The input to the neural network
        :param hidden: The hidden state
        :return: Two Tensors, the output of the neural network and the latest hidden state
        """
        bs = nn_input.size(0)
        embedding = self.embedding(nn_input.long())
        lstm_out, h = self.lstm(embedding, hidden)
        # flatten to (batch * seq_len, hidden_dim) for the linear layer
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
        output = self.linear(lstm_out)
        # keep only the scores for the last time step of every sequence
        output = output.view(bs, -1, self.output_size)[:, -1]
        return output, h

    def init_hidden(self, batch_size):
        '''
        Initialize the hidden state of an LSTM/GRU
        :param batch_size: The batch_size of the hidden state
        :return: hidden state of dims (n_layers, batch_size, hidden_dim)
        '''
        # new_zeros creates the tensors on the same device (CPU or GPU) and with
        # the same dtype as the model weights, so no separate GPU flag is needed
        weight = next(self.parameters()).data
        hidden = (weight.new_zeros((self.n_layers, batch_size, self.hidden_dim)),
                  weight.new_zeros((self.n_layers, batch_size, self.hidden_dim)))
        return hidden
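The model returns one score per vocabulary word for the next position. To generate a script, one common approach is to keep only the k highest-scoring words, sample one of them in proportion to its softmax probability, append it to the input, and repeat. The sketch below shows a single sampling step in plain Python. The function name `sample_top_k`, the value of `k`, and the example scores are assumptions, and the post does not state which sampling scheme was actually used.

```python
import math
import random

def sample_top_k(scores, k, rng=random):
    """Pick the next word id by sampling among the k highest-scoring ids."""
    # Keep the ids of the k largest scores
    top_ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax numerators over the kept scores (subtract the max for stability)
    max_score = max(scores[i] for i in top_ids)
    weights = [math.exp(scores[i] - max_score) for i in top_ids]
    # choices() normalizes the weights itself, so they need not sum to 1
    return rng.choices(top_ids, weights=weights, k=1)[0]

# Hypothetical scores for a 6-word vocabulary.
scores = [0.1, 2.5, -1.0, 3.0, 0.0, 2.9]
rng = random.Random(0)  # seeded generator, so the run is repeatable
next_word_id = sample_top_k(scores, k=3, rng=rng)
```

With `k=1` this reduces to always picking the most likely word. Larger values of `k` trade some coherence for more varied text.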
This is an example of a script generated by my model:
Attention
Here are the notebooks with exercises and solutions: https://github.com/udacity/deep-learning-v2-pytorch/tree/master/attention
The encoder and decoder do not have to be RNNs; they can be CNNs too!
In computer vision, we can use this kind of encoder-decoder model to generate words or captions for an input image or even to generate an image from a sequence of input words. We’ll focus on the first case: generating captions for images, and you’ll learn more about caption generation in the next lesson. For now, know that we can input an image into a CNN (encoder) and generate a descriptive caption for that image using an LSTM (decoder).
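Whatever the encoder is, the attention step itself has the same shape: score each encoder state against the current decoder state, normalize the scores with a softmax, and take the weighted sum of encoder states as the context vector. Here is a minimal sketch in plain Python. Dot-product scoring and all names and numbers are assumptions for illustration; the linked notebooks may use a different scoring function.

```python
import math

def softmax(values):
    """Turn arbitrary scores into positive weights that sum to 1."""
    max_value = max(values)
    exps = [math.exp(v - max_value) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step using dot-product scores."""
    scores = [sum(d * e for d, e in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    n_dims = len(encoder_states[0])
    # Context vector: weighted sum of the encoder states
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(n_dims)]
    return weights, context

# Three hypothetical encoder states and one decoder state, all 2-dimensional.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attention_context(decoder_state, encoder_states)
```

Here the first and third encoder states receive equal weight, larger than the second, because they align best with the decoder state.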
This concludes the RNN part of the Deep Learning Nanodegree. We learned how to write RNNs and use them for a variety of tasks. The next part will be about Generative Adversarial Networks!