Finding Positivity on Twitter (The nerdy way)

Understanding Recurrent Neural Networks and LSTMs to do sentiment analysis and much more

Jatin Mehta
8 min read · Oct 27, 2019

Positivity seems out of reach nowadays with so many Twitter accounts spreading negativity. However, life doesn’t have to be so bad, and you can decide to find more positivity on Twitter (the nerdy way). We can use deep learning to detect whether an account is mainly positive or negative. This article is going to be a long one, so buckle up and get ready for a thrilling ride!

Basics of Deep Learning and Neural Networks

If you do not know what Neural Networks are and how they work, you are in for a treat. Check out this video to learn about the vast and exciting world of neural networks. I will not be diving into the depths of neural networks in this piece to avoid repetitiveness.

I learned Deep Learning from a course co-taught by Luis Serrano on Udacity!

Intro to Recurrent Neural Networks

When you read a sentence, you understand each word based on the words or sentences that came before it. It would be impossible to understand a standalone word in the context of a thought or an idea without being given any other words. It would be like forgetting the last word as soon as you start reading the next word. It sounds like a terrible way to read, doesn’t it?

Well, it is terribly inefficient. But do you know of anything that actually reads like that? Probably not. But I do: traditional neural networks. They can be thought of as having an exceptionally terrible memory. However, since human thoughts have persistence, conventional neural networks find tasks like translation, language modelling and speech recognition inherently hard. What’s the answer? Recurrent Neural Networks, otherwise known as RNNs (nerdy people love their acronyms).

What makes RNNs so good?

RNNs have shown great success when dealing with sequential data. Sequential data refers to any type of data in which the order of the data matters. To predict the next word in a sentence, it is crucial to understand what the preceding sentence was about for context, as well as what type of word came before (e.g. verb, noun, pronoun, adjective, etc.) to follow grammatical rules.

The most basic representation of an RNN module. A part of the neural network, A, looks at some input, xt, and outputs a value, ht. A loop allows information to be passed from one step of the network to the next. Credit: Chris Olah

At the most basic level, RNNs have loops to transfer information from one step of the network to the next. If we examine RNNs further, we can see that they form a chain-like structure where the hidden state contains information from all the steps leading up to the current state. This hidden state is also passed into the network along with the specified input to take into consideration previous data. Since RNNs deal with a series of inputs, they do not have a constraint for the input, output, or the number of computational steps to go from one to the other.
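
To make that loop concrete, here’s a minimal sketch of a single RNN step in plain NumPy (the weight names like W_xh are my own for illustration, not from any particular library):

```python
import numpy as np

# Toy dimensions, chosen arbitrarily for illustration
input_size, hidden_size = 10, 16

# Weights: input-to-hidden, hidden-to-hidden, and a bias
W_xh = np.random.randn(hidden_size, input_size) * 0.01
W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One step of a vanilla RNN: mix the new input with the
    previous hidden state, then squash with tanh."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# The "loop" that passes information from one step to the next
h = np.zeros(hidden_size)
for x_t in np.random.randn(5, input_size):  # a sequence of 5 inputs
    h = rnn_step(x_t, h)  # h now carries context from all earlier steps
```

Notice that nothing constrains the sequence length: the same rnn_step is simply applied again for every new input.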

Character-Level Models

One of the most basic and fundamental uses of RNNs is to predict what comes next in a sentence given a large chunk of data as input. The RNNs deployed for this task usually use character-level models rather than word-level models. This keeps the list of possible outputs far smaller (a few dozen characters vs. millions of words in the English language and its subcultures). Once given text as input, the network calculates a probability distribution over what the next letter will be. It then chooses the letter that is most likely to be the correct one and passes some values along to the next node in the network to predict the following letter, and so on.
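
Here’s a rough sketch of that prediction loop; next_char_probs is a hypothetical stand-in for a trained network’s output layer (a real model would compute the logits from its hidden state rather than at random):

```python
import numpy as np

vocab = list("abcdefghijklmnopqrstuvwxyz ")  # a tiny character vocabulary

def next_char_probs(hidden):
    """Stand-in for the output layer: turn the hidden state into a
    probability distribution over the vocabulary (a softmax)."""
    logits = np.random.randn(len(vocab))  # placeholder; a real model uses `hidden`
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

h = np.zeros(16)  # hidden state carried between steps
generated = "the "
for _ in range(10):
    probs = next_char_probs(h)
    generated += vocab[int(np.argmax(probs))]  # greedy: pick the most likely letter
print(generated)  # gibberish here, since the "model" above is random
```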

LSTMs

After first hearing about RNNs, it may seem that they are absolutely perfect. Just like everything else in this world, RNNs have their own shortcomings. One of the biggest problems with vanilla or plain RNNs is their inability to utilize long-term memory. The next predicted word or character will be almost entirely influenced by the past 10 words or so. This isn’t the ideal scenario when dealing with long passages of text, speech, or any data which requires a network to remember information across long gaps between inputs.

What an LSTM module looks like (notice the three gates with sigmoid activation layers in the module). Credit: Chris Olah

As always, the computer science gods (otherwise known as computer science academics) answered the cries of die-hard nerds. This time, they produced Long Short-Term Memory networks, capable of learning long-term dependencies. How do they achieve this? While a traditional RNN module has one single neural network layer, an LSTM module has four, organized into a cell state, a hidden state and gates. These four neural network layers interact with each other in such a way as to preserve long-term memory and improve overall performance over a traditional RNN.

Cell State

The component dealing with long-term memory is the cell state. It passes the long-term memory from module to module, getting slightly updated each time.

Hidden State

The hidden state is responsible for storing and updating the working memory. This memory is often referred to as short-term memory and is the same state found in vanilla RNNs.

Forget/Remember Gate

While long-term memory provides LSTM networks with enhanced capability, it can also be impractical to store all the long-term memory all the time. So, who decides what information is retained and what is discarded? This problem is solved by introducing the forget gate. Technically, the layer produces a vector of numbers which assigns each piece of long-term information a value between 0 (forget it completely) and 1 (keep it).

Input (Save) Gate

The primary function of the input gate is to decide what new information should be added to the long-term memory (cell state). It saves a selective portion of the input and adds it to the cell state for future use.

Output (Focus) Gate

You may be wondering how the short-term memory gets updated. Output gate to the rescue! The output gate transfers, or focuses on, the part of the long-term memory that will be useful immediately and uses it to produce the new hidden state.
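
Putting the gates together, here is a hedged NumPy sketch of a single LSTM step using the standard formulation (the variable names are mine, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x_t] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f = sigmoid(z[0:H])       # forget gate: 0 = drop, 1 = keep each cell value
    i = sigmoid(z[H:2*H])     # input (save) gate: how much new info to store
    g = np.tanh(z[2*H:3*H])   # candidate values to add to the cell state
    o = sigmoid(z[3*H:4*H])   # output (focus) gate: what to expose right now
    c_t = f * c_prev + i * g  # cell state: long-term memory, slightly updated
    h_t = o * np.tanh(c_t)    # hidden state: the working (short-term) memory
    return h_t, c_t

# Toy usage with made-up sizes
H, X = 8, 4
W = np.random.randn(4 * H, H + X) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(np.random.randn(X), h, c, W, b)
```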

What about finding Positivity?

That is where sentiment analysis, or opinion mining, comes in. We will be using sentiment analysis to detect positivity among Twitter users by analyzing 200 of their latest tweets.

Sentiment Analysis using DL

You may have heard of sentiment analysis, as it is used by many companies to track customer satisfaction across social media, news, reviews and more. The basics of sentiment analysis are straightforward: identify whether the given data has an overall positive, negative or neutral sentiment. There are multiple methods of doing sentiment analysis with varying degrees of accuracy. The most exciting of these uses RNNs and follows the standard deep learning pipeline:

Data Processing

While there can be many steps involved in preparing the data, I will breeze through data processing. To train the model, labelled data is necessary. This data usually takes the form of short passages of text, each with a label indicating whether it has a positive or negative sentiment. After that, you can create mini-batches of the data for more effective training and then let the neural networks work their magic.
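
As a rough illustration (with a tiny made-up dataset, not the one actually used for this project), the tokenizing and mini-batching might look like this:

```python
# Made-up labelled data; a real dataset would have thousands of examples
reviews = ["what a great day", "this is awful", "love it", "so bad"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Build a vocabulary mapping each word to an integer index (0 = padding/unknown)
words = sorted({w for r in reviews for w in r.split()})
word_to_idx = {w: i + 1 for i, w in enumerate(words)}

seq_len = 5
def tokenize(text):
    """Turn text into a fixed-length list of word indices."""
    ids = [word_to_idx.get(w, 0) for w in text.split()]  # 0 for unseen words
    return [0] * (seq_len - len(ids)) + ids[:seq_len]    # left-pad / truncate

tokens = [tokenize(r) for r in reviews]

# Mini-batches: small chunks of (tokens, labels) for more efficient training
batch_size = 2
batches = [(tokens[i:i + batch_size], labels[i:i + batch_size])
           for i in range(0, len(tokens), batch_size)]
```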

Training the Model

The hardest part of training the classifier is getting sufficient computation power (or using cloud GPUs, i.e. AWS, Google Cloud Platform, etc.) and selecting the network architecture. Selecting the model’s architecture involves choosing which layers you want in your model and in which order. You also have to choose parameters like the input size, output size and the number of hidden layers.

Note: For the sentiment analysis model, I have decided to go with LSTM layers instead of plain RNNs.
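
For illustration, a plausible PyTorch version of such an architecture might look like the sketch below. The layer sizes are made up, not the actual model’s values:

```python
import torch.nn as nn

class SentimentLSTM(nn.Module):
    """A hedged sketch of one plausible architecture: an embedding layer,
    stacked LSTM layers, then a linear layer squashed to a 0-1 sentiment score."""
    def __init__(self, vocab_size, embed_dim=400, hidden_dim=256, n_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, n_layers,
                            batch_first=True, dropout=0.5)
        self.fc = nn.Linear(hidden_dim, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                        # x: (batch, seq_len) word indices
        out, _ = self.lstm(self.embedding(x))    # out: (batch, seq_len, hidden_dim)
        return self.sigmoid(self.fc(out[:, -1])) # sentiment from the last step
```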

What’s happening during the training process?

To keep it simple: a whole lot of math. All of this results in the computer trying to find correlations between individual words, phrases, letters and their position in a sentence to predict how they will impact the sentiment. It starts off from scratch but then realizes whether its prediction was correct or incorrect by referencing the labels. If it was wrong, it tries to tweak individual weights within the various layers to get the desired output. Once it has a good idea of what keywords or phrases trigger a particular sentiment, it starts making better predictions.
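
In code, that weight-tweaking is what the backward pass and optimizer step do. Here is a minimal training-loop sketch, assuming the SentimentLSTM, word_to_idx and batches from the earlier sketches:

```python
import torch

model = SentimentLSTM(vocab_size=len(word_to_idx) + 1)
criterion = torch.nn.BCELoss()  # measures how wrong each prediction is
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(4):
    for batch_tokens, batch_labels in batches:
        x = torch.tensor(batch_tokens)                       # (batch, seq_len)
        y = torch.tensor(batch_labels, dtype=torch.float32)  # 1 = positive
        optimizer.zero_grad()
        preds = model(x).squeeze(1)  # predicted sentiment scores, 0..1
        loss = criterion(preds, y)   # compare predictions to the labels
        loss.backward()              # work out how to tweak each weight...
        optimizer.step()             # ...and tweak them
```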

Testing and Validation

After training over thousands (or even millions) of iterations, the network has a good idea of what keywords or phrases trigger a particular sentiment. And before we know it, it starts spitting out the sentiment with 95% accuracy when tested on separate testing and validation data (the dataset gets split into training (most of the data), testing and validation sets).

That wasn’t so hard after all. Now that we have our model, we can tokenize (data processing) any piece of text and pass it through the model to get the sentiment.
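
A quick sketch of that inference step, reusing the hypothetical tokenize helper and trained model from the sketches above:

```python
def predict_sentiment(text, model):
    """Tokenize a piece of text and run it through the trained model."""
    x = torch.tensor([tokenize(text)])  # a batch of one
    with torch.no_grad():               # no training here, just a forward pass
        score = model(x).item()         # a value between 0 and 1
    return ("positive" if score >= 0.5 else "negative"), score

label, score = predict_sentiment("what a great day", model)
print(label, round(score, 3))
```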

The Results are In!

Sentiment analysis of the following accounts (left to right): Donald Trump, Justin Bieber and Ninja (Tyler Blevins), run on PosiTweet (my own sentiment analysis tool)

If you look closely, you will see that many of the tweets were incorrectly classified, and that is simply because the model was trained on a relatively small dataset. I did not have enough computation power to train on the 1.4-million-tweet dataset (GCP would get really mad 😂 at me for using their free plan and training nonstop for maybe even a couple of days). However, check out my GitHub for projects like teaching the computer to speak Shakespearean.

TL;DR

- Recurrent Neural Networks are preferred over other types of neural networks when dealing with sequential data.

- LSTMs are an improvement over classic RNNs as they can hold long-term memory, and they are the most commonly used variant.

- You can use RNNs (/LSTM networks) to detect the sentiment in a particular piece of text.

If you liked this article, feel free to give it some 👏 and share your thoughts in the comments! If you want to stay updated about new articles covering everything from self-growth to machine learning, follow me on Medium.

I would love to meet anyone new, connect with others and chat about literally anything. Feel free to reach out via my LinkedIn or by email: mehta.r.jatin@gmail.com! Don’t feel shy, just drop by (wow, that rhymes).
