Emotion Classification from Tweets with LSTM, NLTK, and Keras
In this post I'll walk through using an LSTM (a recurrent neural network) to classify and analyse sequential text data.
Full Code Available at : https://github.com/saitejdandge/Sentimental_Analysis_LSTM_Conv1D
Problem Statement
We have to train a model that outputs an emotion for a given input text. Since the output we are trying to predict is a label rather than a continuous number, we can frame this as a classification problem.
We'll break this problem into 3 modules:
1. Data Preparation
1.0 Understanding Data
1.1 Removing punctuations, words that start with ‘@’ and stop words
1.2 Tokenising words / Converting words to indices
1.3 Padding Words
1.4 Building Word Embeddings
1.5 One hot encoding labels
2. Building Model
2.1 Understanding Embedding Layer
2.2 Understanding LSTM Layer
2.3 Understanding Dense Layer
2.4 Adding Activations at each Layer.
2.5 Model Architecture with input and output shapes
3. Training our Model
3.1 Splitting data into training and testing dataset
3.2 Training the network
3.3 Plotting training and testing accuracies
Imports
Let's start by importing the modules we need.
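A likely set of imports for this tutorial (a sketch; the repo's exact import list may differ):

```python
import re

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense, Embedding, Flatten
from tensorflow.keras.models import Sequential
```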
1. Data Preparation
1.0 Understanding Data
We have 4,000 tweets, each labelled with one of the sentiments (labels) below:
{ anger, boredom, empty, enthusiasm, fun, happiness, hate, love, neutral, relief, sadness, surprise, worry}
What features should we consider as input? (Feature Selection)
This step is called feature selection: we keep only the columns we expect to affect the output. We can ignore the tweet_id and author columns, since the emotional outcome doesn't depend on them.
1.1 Removing punctuations, words that start with ‘@’ and stop words
- Word vectors are sensitive to punctuation and letter case.
- Words that start with "@" are user and page references; they're just usernames and page names, so they add no value to the output.
- Stop words like a, an, the, etc. need to be removed, as they might bias our model's output. We want to concentrate on the key words that we expect to have an impact on the output.
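A minimal cleaning sketch of the steps above (`clean_tweet` is a hypothetical helper, and the stop-word set here is a small sample; NLTK's full stop-word list is much larger):

```python
import re

STOP_WORDS = {"a", "an", "the", "is", "to"}  # small sample set

def clean_tweet(text):
    text = text.lower()
    text = re.sub(r"@\w+", "", text)      # drop @user / @page references
    text = re.sub(r"[^a-z\s]", "", text)  # strip punctuation and digits
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)

print(clean_tweet("@John The battery is DEAD!!!"))  # -> "battery dead"
```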
1.2 Tokenising words / Converting words to indices
Now that we have preprocessed the text by removing unnecessary words and normalising the rest, we convert each word into an index. We get the indices by sorting all the words alphabetically and adding 1 (index 0 is reserved for unknown words).
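A sketch of the alphabetical indexing just described (the repo may use Keras' Tokenizer instead; these helper names are illustrative):

```python
# Build a word -> index map from a corpus; index 0 stays reserved
# for unknown words, so known words start at 1.
def build_word_index(texts):
    vocab = sorted({w for t in texts for w in t.split()})
    return {w: i + 1 for i, w in enumerate(vocab)}

def texts_to_indices(texts, word_index):
    return [[word_index.get(w, 0) for w in t.split()] for t in texts]

word_index = build_word_index(["battery dead", "love battery"])
print(word_index)                                   # {'battery': 1, 'dead': 2, 'love': 3}
print(texts_to_indices(["dead cat"], word_index))   # [[2, 0]] -- "cat" is unknown
```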
1.3 Padding Words
We'll pad each input entry to 20 words; if an entry has fewer words, we fill the remaining positions with the unknown-word index.
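A padding sketch (Keras' `pad_sequences` does the same job; this illustrates the idea):

```python
MAX_LEN = 20

# Truncate to MAX_LEN, then fill any remaining slots with the
# unknown-word index 0.
def pad(seq, max_len=MAX_LEN, pad_value=0):
    seq = seq[:max_len]
    return seq + [pad_value] * (max_len - len(seq))

padded = pad([4, 8, 15])
print(len(padded))   # 20
print(padded[:5])    # [4, 8, 15, 0, 0]
```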
1.4 Building Word Embeddings
Word embeddings are vectorised representations of words. Assume we have a space of n dimensions; each word in our dictionary becomes an n-dimensional point in that space. This preserves the relative distances between words and gives our neural network a semantic understanding.
Example: the distance between "kitchen" and "battery" should be large compared to the distance between "kitchen" and "bathroom", since "kitchen" and "bathroom" are closely related (both are rooms).
We'll use the pretrained GloVe 50D model (each word has 50 dimensions) for the word embeddings, and simply transfer its weights instead of training embeddings from scratch. On the whole, this gives our neural network a kickstart.
Download the GloVe 50D word embeddings (glove.6B.50d.txt). We'll add an Embedding layer to our network; given a word index, this layer returns the word vector.
The Embedding layer internally holds an embedding matrix of shape (vocab + 1, embedding dimension). In our case that is (vocab + 1, 50), since we are using GloVe 50D vectors: each word is represented as a 50-dimensional vector, and the rows are ordered by the indices obtained when labelling the words alphabetically.
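A sketch of turning GloVe lines ("word v1 v2 … v50") into an embedding matrix of shape (vocab + 1, dim), with row 0 left as zeros for unknown words. With the real file you would iterate over `open("glove.6B.50d.txt")`; here tiny 3-dimensional toy vectors stand in just to show the shapes:

```python
import numpy as np

def build_embedding_matrix(glove_lines, word_index, dim):
    matrix = np.zeros((len(word_index) + 1, dim))  # row 0 = unknown word
    for line in glove_lines:
        parts = line.split()
        word, vector = parts[0], np.asarray(parts[1:], dtype="float32")
        if word in word_index:
            matrix[word_index[word]] = vector
    return matrix

lines = ["kitchen 0.1 0.2 0.3", "bathroom 0.1 0.2 0.4"]
m = build_embedding_matrix(lines, {"kitchen": 1, "bathroom": 2}, dim=3)
print(m.shape)  # (3, 3)
```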
1.5 One Hot Encoding Labels
In machine learning, a one-hot encoding is a group of bits among which the only legal combinations of values are those with a single high (1) bit and all the others low (0).
Labels : { anger, boredom, empty, enthusiasm, fun, happiness, hate, love, neutral, relief, sadness, surprise, worry}
Each unique emotion is assigned to an integer value (Label Encoding).
For example, "anger" is 0, "boredom" is 1, "empty" is 2, and so on, in alphabetical order.
After this, a one-hot encoding can be applied to the integer representation. This is where the integer encoded variable is removed and a new binary variable is added for each unique integer value.
There are 13 unique labels/emotions and therefore 13 binary variables are needed. A “1” value is placed in the binary variable for the emotion and “0” values for the other emotions.
Example:
- anger: 1000000000000
- boredom: 0100000000000
- empty: 0010000000000
…etc.
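A one-hot encoding sketch for the 13 emotions (Keras' `to_categorical` produces the same result from the integer labels; `one_hot` is an illustrative helper):

```python
# Alphabetical order gives each emotion its integer label.
LABELS = sorted(["anger", "boredom", "empty", "enthusiasm", "fun",
                 "happiness", "hate", "love", "neutral", "relief",
                 "sadness", "surprise", "worry"])

def one_hot(label):
    vec = [0] * len(LABELS)
    vec[LABELS.index(label)] = 1  # single high bit for this emotion
    return vec

print(one_hot("anger"))    # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(one_hot("boredom"))  # [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```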
2. Building Model
2.1 Understanding Embedding Layers (First Layer)
This layer acts as a lookup table: given a word index, it returns the embedded word vector.
In Keras, an Embedding layer can only be used as the first layer of a model.
Our input will be of size (None, 20); None means a variable batch size. Since we padded each input to 20 words in the data-preparation stage, each row holds 20 word indices.
The Embedding layer converts each index to its corresponding vector using the embedding matrix. The output dimension is 50 because we used GloVe 50D in the word-embedding step.
input (None, 20) = >(Embedding Layer) => (None,20,50)
2.2 Understanding LSTM / GRU layers (Hidden Layers)
They fall under the category of recurrent neural networks (RNNs). A recurrent network feeds its output from the previous timestep back in as input for the current timestep.
This gives it an internal memory, which makes it well suited to machine-learning problems that involve sequential data.
Output of Embedding layer will be fed to this LSTM layer.
We'll use an LSTM layer with 100 units. The layer has 100 recurrent cells; this number is a hyperparameter that can be adjusted to the needs and complexity of our data.
Input to the LSTM is shaped (batch_size, timesteps, features) (from the Keras documentation).
We can set the return_sequences argument in the LSTM constructor; there are two scenarios based on its value.
return_sequences = True
The layer's output will include the output from every timestep.
(None, 20,50) = > LSTM(100, return_sequences=True) => (None,20,100)
In the next step, we’ll flatten.
(None, 20,100) = > Flatten => (None,2000)
return_sequences = False
The layer's output will include only the output from the last timestep.
(None, 20, 50) => LSTM(100, return_sequences=False) => (None, 100)
We can go with either scenario depending on our requirement; in the end we'll have an output shape in 2 dimensions, either
output from LSTM layer + Flatten => (None, 2000) if return_sequences=True
or
output from LSTM layer => (None, 100) if return_sequences=False
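The two shapes can be checked directly (assumes TensorFlow/Keras; layer sizes match the article's example):

```python
import tensorflow as tf

x = tf.keras.Input(shape=(20, 50))  # (batch, timesteps, features)
seq_out = tf.keras.layers.LSTM(100, return_sequences=True)(x)    # every timestep
last_out = tf.keras.layers.LSTM(100, return_sequences=False)(x)  # last timestep only
flat = tf.keras.layers.Flatten()(seq_out)

print(seq_out.shape)   # (None, 20, 100)
print(last_out.shape)  # (None, 100)
print(flat.shape)      # (None, 2000)
```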
2.3 Understanding Dense Layer (Last Layers)
We combine everything from the previous layers using Dense (fully connected) layers, reducing the number of output units step by step until we reach (None, labels_count).
(None, 2000) or (None,100)= > Dense(300) => (None,300)
Adding another dense layer
(None,300) => Dense(labels_count) => (None,13)
13 is the number of labels in our problem, i.e. the total number of emotions.
2.4 Adding Activation at each layer
We'll add an activation at each layer to give our model a non-linear understanding: ReLU on the hidden layers and softmax on the last layer.
Softmax turns the last layer's outputs into a probability distribution over the labels for a given input, which helps achieve better results.
After adding this, we get 13 outputs for each input, each lying between 0 and 1. Each output represents the probability of that emotion for the given input, and the one with the highest value is our prediction.
2.5 Model Architecture with input and output shapes
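Putting the pieces together, a sketch of the architecture with the shapes discussed above (assumes TensorFlow/Keras; layer sizes follow the article, while the vocab size and all-zeros weight matrix are stand-ins for the real GloVe matrix built earlier):

```python
import numpy as np
import tensorflow as tf

VOCAB, DIM, MAX_LEN, NUM_LABELS = 10000, 50, 20, 13
embedding_matrix = np.zeros((VOCAB + 1, DIM))  # placeholder for GloVe weights

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB + 1, DIM, trainable=False,
                              name="glove_embedding"),         # (None, 20, 50)
    tf.keras.layers.LSTM(100, return_sequences=True),          # (None, 20, 100)
    tf.keras.layers.Flatten(),                                 # (None, 2000)
    tf.keras.layers.Dense(300, activation="relu"),             # (None, 300)
    tf.keras.layers.Dense(NUM_LABELS, activation="softmax"),   # (None, 13)
])
# Transfer the pretrained GloVe weights instead of learning them.
model.get_layer("glove_embedding").set_weights([embedding_matrix])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```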
Additional info: we can add 1D convolution (Conv1D) layers as hidden layers for better results.
3. Training our Model
3.1 Splitting data into training and testing dataset
We need to split our data into two parts: training data and testing data.
We use the training dataset to train our neural network, and the test dataset to provide an unbiased evaluation of the final model fit on the training data.
This helps us find the sweet spot between underfitting and overfitting.
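A split sketch using scikit-learn's helper (the 80/20 ratio and the stand-in arrays are illustrative choices, not necessarily the repo's):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # stand-in features (50 samples)
y = np.arange(50)                  # stand-in labels

# Hold out 20% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 40 10
```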
3.2 Training the network
Now we define the number of epochs and the checkpoint conditions; these checkpoints will save our model locally whenever there's an improvement.
Let's now start training our model.
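A checkpointing sketch: save the model only when validation accuracy improves (the file name, epoch count, and batch size below are illustrative choices):

```python
import tensorflow as tf

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5", monitor="val_accuracy",
    save_best_only=True, verbose=1)

# history = model.fit(X_train, y_train,
#                     validation_data=(X_test, y_test),
#                     epochs=20, batch_size=64,
#                     callbacks=[checkpoint])
```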
3.3 Plotting training and testing accuracies
This will start training; we can now monitor the accuracies and plot them as a graph to understand the results.
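A plotting sketch (assumes matplotlib; the hard-coded lists stand in for `history.history["accuracy"]` and `history.history["val_accuracy"]` returned by `fit()`):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

train_acc = [0.20, 0.30, 0.38, 0.41]  # illustrative numbers
val_acc = [0.25, 0.33, 0.31, 0.35]

fig, ax = plt.subplots()
ax.plot(train_acc, label="train accuracy")
ax.plot(val_acc, label="validation accuracy")
ax.set_xlabel("epoch")
ax.set_ylabel("accuracy")
ax.legend()
fig.savefig("accuracy.png")
```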
As you can see, our training accuracy reached around 40%, while validation accuracy (testing accuracy) fluctuated.
From these results we can say that our model still needs more data to learn the signal behind each emotion; it ended up with a training and testing accuracy of around 50%.
Full code Available at : https://github.com/saitejdandge/Sentimental_Analysis_LSTM_Conv1D