Long Short Term Memory(LSTM): Practical Application
Part 2 of NLP with Disaster Tweets Series
This is the part 2 of NLP with Disaster Tweets Series. The previous Part along with the blog post can be found in this link.
In the previous part, we created a baseline nlp model, we covered the basics of building a NLP model and ended up with an accuracy which is far from good. In this part, we are going to implement NLP model with LSTM architecture.
We will be seeing in detail, how a LSTM model works and how to implement it in our code.
Let’s get started.
This is the prerequisite block which we coded in previous tutorial. This code downloads the data set, imports all the necessary packages and imports the useful methods.
We should also do preprocessing on our data.
We clean our data by removing the URLs and make everything lowercase. This prevents our word embedding from duplicating a same word because it is either uppercase or lowercase.
The next step is to split our data into training and testing instances. Before we split data, we need to perform word embedding, get the word index and pad them. This was all explained in the previous blog post.
In this block, we perform word embedding, we index the words and pad them. We also create a function which can decode an encoded tweet back to its original form.
Before we get to building our model, we need to understand what is an LSTM (Long Short Term Memory).
LSTM is a type of Recurrent Neural Network (RNN).
Traditional neural networks are called Feed-Forward Networks because information only flows in one direction.
In a RNN, information can flow both in the forward direction and backward direction.
Let us consider an example to see, why feed forward networks are not useful for NLP.
Consider the sentence:
“Hello, I am Gakuto Kajiwara, I am from Japan… I speak”
When we read this sentence, we can correctly predict that the next word is japanese.
This is possible because the word japan has importance on the prediction even though it is not the last word before prediction. It might even be a few sentences ahead.
The ability of a network to remember information learned from previous words to use it in future predictions is embedded in something called a cell state.
Cell state is implemented in RNN where the hidden layers are connected and information flows between them.
LSTMs are much better modifications on RNN, they leverage few activation functions and gates.
Gates have the ability to add or remove information from cell state.
For a much better deep dive read colah’s blog on LSTM.
In the final block, we create a model with a LSTM layer. We set the embedding dimension to be 16, so that would be the output dimension of the first layer.
We need to set the input layer of the subsequent layers to be 16 to be able to predict using these inputs.
We are implementing CuDNNLSTM. We change the activation function of LSTM layer to be ‘tanh’ and set recurrent_activation to be ‘sigmoid’. The main difference between CuDNNLSTM and LSTM is the speed of processing. GPUs perform much better on CuDNNLSTM thereby saving us training time.
Our model was able to get 75% validation accuracy, this is much better than previous model which we built in previous post.
This is an ongoing series to build the best model for performing NLP on Disaster Tweets data set.