Sentiment Analysis from Tweets using Recurrent Neural Networks

Gabriel Mayers
5 min read · May 7, 2020

How I built a Deep Learning model to detect the sentiment of a simple Tweet.

In this post, we’ll see the theory behind my project, from a really interesting field of Machine Learning and Artificial Intelligence.

Some Terminologies

  • NN: Neural Network
  • AI: Artificial Intelligence
  • RNN: Recurrent Neural Networks
  • LSTM: Long Short-Term Memory
  • val_acc: Accuracy in validation set
  • val_loss: Loss in validation set

Natural Language Processing(NLP)

When you want to build applications involving text generation, speech recognition, or text recognition, you need NLP!

Basically, NLP is the capacity to process words, but not in the “human form”.

We’ll talk more about this in the NN approach below.

Suppose you’re building a model to detect buzzwords in news…

You need to follow some steps before starting to build the model.

I know, this is boring to so many Data Scientists, but it’s necessary, bro =(

First, you need to collect the data that you’ll later feed into your model.

But when we’re talking about NLP using Neural Networks, we need to adapt this data for our NN.

Adapting our Data for the NN

To feed words into our NN, we need to transform these words into numbers.

And why do we need to transform the words into numbers instead of just feeding them into the NN directly?

To answer this question, first take a look at the image below:

Simple NN Structure

As you can see, in any NN’s structure, the magic happens because of the math =)

Numbers are the universal language of Artificial Intelligence, no matter the field. It can be algorithms, Reinforcement Learning, or anything else; the math will always reign.

To transform words into numbers, we can use a technique called Tokenization.

Tokenization

The Tokenization process basically takes a sentence and splits it word by word, like the image below:

Tokenization (https://www.kdnuggets.com/2020/03/tensorflow-keras-tokenization-text-data-prep.html)

After tokenizing all the words of our dataset, we need to use some other techniques to assign a number to each token.

Following this approach, in the end we’ll have a vector representing each word of our data, and we’ll be able to speak our NN’s language.
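To make the idea concrete, here is a minimal sketch of what tokenization does, in plain Python (in practice you’d use the `Tokenizer` class from TensorFlow, which handles punctuation, casing, and out-of-vocabulary words for you):

```python
# A minimal sketch of tokenization: split sentences into words and
# assign each unique word an integer id (id 0 is reserved for padding).
def fit_tokenizer(sentences):
    word_index = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index

def texts_to_sequences(sentences, word_index):
    # Replace each word with its id; words not in the vocabulary are skipped.
    return [[word_index[w] for w in s.lower().split() if w in word_index]
            for s in sentences]

tweets = ["i love this movie", "i hate this movie"]
word_index = fit_tokenizer(tweets)
print(word_index)                              # {'i': 1, 'love': 2, 'this': 3, 'movie': 4, 'hate': 5}
print(texts_to_sequences(tweets, word_index))  # [[1, 2, 3, 4], [1, 5, 3, 4]]
```

Notice that “this” and “movie” get the same ids in both tweets; that shared numbering is what lets the NN see patterns across sentences.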

Me after adapting all my data…

To do this, we’ll use another technique called “Word Embedding”.

Word Embedding

Word Embedding is a very useful technique when we want to represent words using numbers.

This technique transforms words into vectors and then builds a cluster with all the vectors representing our words.

This cluster can be better visualized using the TensorFlow Projector. Below, you can see a representation of the cluster in multi-dimensional space:

Image from (https://www.lewuathe.com/t-sne-visualization-by-tensorflow.html)

Each colored point represented in the image is a vector, and each vector corresponds to a word.

My friends watching me visualize my dataset in a multidimensional plane

Quick Tip:

Don’t worry about all these techniques; they can all be done easily using TensorFlow.

After embedding our words, we need to make one additional change before feeding them into our NN.

Keep calm, this’ll be easy to understand and apply!

This change is called “Pad Sequence”.

Pad Sequence

To remember what this means, just recall the “Zero Padding” from Convolutional NNs.

In Convolutional NNs, we use Zero Padding so the convolutions can reach the borders of our images. We do that by putting zeros around the borders of our images, like below:

Zero Padding (https://medium.com/machine-learning-algorithms/what-is-padding-in-convolutional-neural-network-c120077469cc)

If you wanna know more about Convolutional NN’s and their techniques, read this article.

We need to do that because the sentences of our dataset have different lengths, and our model needs to receive sequences of the same length to work.

And you can do this using pad_sequences from TensorFlow.
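Here is a minimal sketch of what padding does, in plain Python (the real `pad_sequences` in TensorFlow also lets you choose pre- vs. post-truncation; this sketch truncates from the end):

```python
# A minimal sketch of sequence padding: pad every sequence with zeros
# (or truncate it) so they all share the same length.
def pad_sequences(sequences, maxlen, padding="post"):
    padded = []
    for seq in sequences:
        seq = seq[:maxlen]                    # truncate if too long
        pad = [0] * (maxlen - len(seq))       # zeros to fill the gap
        padded.append(seq + pad if padding == "post" else pad + seq)
    return padded

sequences = [[1, 2, 3, 4], [1, 5], [2, 3, 4, 4, 1, 5]]
print(pad_sequences(sequences, maxlen=5))
# [[1, 2, 3, 4, 0], [1, 5, 0, 0, 0], [2, 3, 4, 4, 1]]
```

After this step, every tweet is a fixed-length row of integers, exactly the rectangular shape a NN expects.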

Our Model

To feed the tokenized data into our model, you’ll use an Embedding layer:

Embedding Layer (https://labs.bawi.io/deep-learning-word2vec-and-embedding-3b00ff571cc1?gi=a2b571d979f1)
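In code, the Embedding layer is just a trainable lookup table from token ids to dense vectors. A tiny example (assuming TensorFlow 2; the vocabulary and vector sizes here are arbitrary):

```python
import tensorflow as tf

# A trainable lookup table: each of the 1000 possible token ids
# maps to a dense vector of 16 floats, learned during training.
embedding = tf.keras.layers.Embedding(input_dim=1000, output_dim=16)

token_ids = tf.constant([[1, 2, 3, 4, 0]])  # one padded tweet of length 5
vectors = embedding(token_ids)
print(vectors.shape)                        # (1, 5, 16)
```

So one tweet of 5 token ids becomes a 5×16 matrix of floats, which is what the recurrent layers downstream consume.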

For this project, I personally recommend using an RNN. That’s because I tested a classical Feed-Forward NN and the results were terrible.

Besides that, when you’re working with sequential data, where the previous information matters, Feed-Forward NNs won’t deliver good results.

The choice of RNNs can be explained in just one word: memory.

RNNs are capable of storing data from previous steps, which helps us a lot when we’re working with sequential data.

RNN Structure (https://medium.com/towards-artificial-intelligence/whirlwind-tour-of-rnns-a11effb7808f)

But here, you can find a problem…

I know, it hurts…

If you’re working with a big dataset, with long sentences, a plain RNN won’t perform well, because it can’t memorize the data for very long.

My suggestion for you, and what I did, is to use the LSTM architecture.

LSTM Architecture

This is a variation of the RNN and a very powerful alternative when you need your network to be able to memorize information for a longer period of time.

LSTM is based on gates that decide what information should pass to the next cell, among some other mechanisms.

LSTM (https://medium.com/deep-math-machine-learning-ai/chapter-10-1-deepnlp-lstm-long-short-term-memory-networks-with-math-21477f8e4235)

Using an LSTM, your problems with long sequences will be solved!
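Putting the pieces together, here is a sketch of a model along these lines: Embedding, then LSTM, then a sigmoid output. The layer sizes are illustrative assumptions, not my exact configuration:

```python
import tensorflow as tf

vocab_size, embedding_dim = 10000, 16  # assumed sizes; tune to your dataset

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),  # token ids -> vectors
    tf.keras.layers.LSTM(64),                              # memory over the sequence
    tf.keras.layers.Dense(1, activation="sigmoid"),        # positive vs. negative
])

# Binary sentiment (positive/negative), so binary cross-entropy is the loss.
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])
model.summary()
```

A double-LSTM variant just stacks a second recurrent layer (the first one with `return_sequences=True`) before the Dense layer.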

Me after discovering LSTMs

Some results

For the Twitter Sentiment Analysis model, I tried different architectures using LSTMs.

My Best 2 Models Results:

  • Single-layer LSTM with sigmoid activation in the last layer: val_loss: 0.1066 — val_accuracy: 0.8634
  • Double-layer LSTM with sigmoid activation in the last layer: val_loss: 0.4794 — val_accuracy: 0.9537

I used the Sentiment140 Dataset for this project.

Final Considerations

The objective of this post was to introduce you to the world of NLP problems. I hope you enjoyed it, and if you have any considerations, questions, or anything else, you can comment on this post!

The Twitter Sentiment Analysis Repository is on my GitHub, link below:

https://github.com/gabrielmayers/twitter_sentiment_analysis

My Social Media:

Linkedin: https://www.linkedin.com/in/gabriel-mayer-779b5a162/

GitHub: https://github.com/gabrielmayers

Instagram: https://www.instagram.com/gabrielmayerl/


Gabriel Mayers

Artificial Intelligence Engineer, Science enthusiast, Self-taught and so curious.