Natural Language Processing (NLP)

Sentiment Analysis with VADER and Twitter-roBERTa

Benchmarking two different algorithms for short social media text analysis

Amanmyrat Abdullayev
6 min read · Oct 14, 2022


Image by Author

A Brief Intro to Sentiment Analysis

Sentiment analysis is an approach to identifying the emotional tone behind textual data. It helps organizations and businesses gather insights from unstructured text coming from online sources such as surveys, social media channels, and comments. Various algorithms (models) are available for sentiment analysis tasks, each with its own pros and cons:

  • Rule-based (lexicon-based): These models rely on built-in dictionaries (lexicons) of words and emojis with positive or negative weights. They count the positive and negative words in a given text: if positives outnumber negatives, they return a positive sentiment; if the counts are equal, a neutral one. The rules and dictionaries can be customized, and no model training is required.
  • Supervised machine learning: These algorithms are fed large amounts of labeled text until they learn the patterns, or essence, of a statement rather than relying on explicitly defined rules. However, this approach requires labeled data, which is usually costly to obtain for practical applications!
  • Unsupervised deep learning: These algorithms learn patterns through multiple layers from unstructured, unlabeled data using various learning mechanisms, e.g. self-attention.
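To make the rule-based idea concrete, here is a toy sketch of a lexicon counter. The mini-lexicon is made up for illustration; real lexicons such as VADER's are far larger and assign graded weights rather than simple counts:

```python
# Toy rule-based sentiment: count positive vs. negative lexicon hits.
# The two word sets below are invented examples, not a real lexicon.
POSITIVE = {"good", "great", "love", "win"}
NEGATIVE = {"bad", "terrible", "hate", "crash"}

def rule_based_sentiment(text: str) -> str:
    """Return 'positive', 'negative', or 'neutral' by comparing hit counts."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```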

Selected Models and Data

One of the best-known rule-based algorithms is VADER from the NLTK package. According to its developers, it is specifically attuned to sentiments expressed in social media. It is quite easy to implement yet very powerful. The other algorithm I will talk about is deep learning-based: Twitter-roBERTa from the Transformers package. This model was pretrained on 124 million tweets and fine-tuned for sentiment analysis. Neither model requires any training, which means they can be applied to textual data out of the box.

Why social media posts specifically? Because they are short, rarely follow grammar rules, and are full of slang and emojis, all of which makes them harder to analyze.

The world economy is on the brink of recession and everyone is talking about it nowadays, and Twitter is one of the most used social media platforms, so I collected relevant data for the analysis using the Twitter API. I selected tweets matching the search term “economy” and will analyze 1000 of them, posted between 01.10.2022 and 11.10.2022 (readers are referred elsewhere for data collection details using the Twitter API, e.g. here). The top 5 rows of my dataframe are shown in the following image (you can find the whole dataset in my GitHub repo):

Image by Author

VADER Implementation

For VADER, clean your data as much as possible, i.e. remove mentions (usernames), URLs, hashtag symbols, and so on. As I said, it is quite easy to implement; check out the following code snippet:

After applying the above operations to our Twitter dataset, the dataframe looks as follows, with the preprocessed content for the VADER model and the corresponding polarity score and sentiment label:

Image by Author

The polarity score ranges from -1 (most negative) to 1 (most positive). Sentiment labels are assigned according to the polarity score: -1 to -0.25 => negative; -0.25 to 0.25 => neutral; 0.25 to 1 => positive.
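The thresholding above can be sketched as a small helper function (the function name is mine):

```python
# Map a compound polarity score in [-1, 1] to a sentiment label,
# using the thresholds described above.
def to_label(score: float) -> str:
    if score < -0.25:
        return "negative"
    if score <= 0.25:
        return "neutral"
    return "positive"
```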

roBERTa Implementation

The Twitter-roBERTa implementation is a bit more involved than VADER, particularly because it does not provide a compound score, so we need to calculate one ourselves. The code snippet below shows, step by step, how to create the model, predict sentiment probabilities, and calculate a scaled (normalized) polarity score from them using the tanh function:
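Here is a minimal sketch of such a pipeline. The checkpoint name (`cardiffnlp/twitter-roberta-base-sentiment-latest`) and the exact scaling formula (tanh of the positive-minus-negative probability gap) are my assumptions for illustration, not necessarily the original notebook's code:

```python
# Sketch of a roBERTa polarity pipeline: predict class probabilities,
# then squash the positive-minus-negative gap into (-1, 1) with tanh.
import numpy as np
from scipy.special import softmax

def scaled_polarity(logits: np.ndarray) -> float:
    """Map raw [negative, neutral, positive] logits to a polarity score."""
    neg, _, pos = softmax(logits)
    return float(np.tanh(pos - neg))  # tanh keeps the sign and bounds the range

def predict_polarity(text: str) -> float:
    """End to end: tokenize, run the model, return the scaled polarity.
    Imports transformers lazily; downloads the checkpoint on first call."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    name = "cardiffnlp/twitter-roberta-base-sentiment-latest"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)
    logits = model(**tokenizer(text, return_tensors="pt")).logits[0]
    return scaled_polarity(logits.detach().numpy())
```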

Implementation of a BERT-based sentiment analyzer model from Transformers.

As a result, our dataset will take the following form:

Image by Author

Benchmarking

As mentioned before, the comparison is based on 1000 tweets related to “economy”, with a minimal amount of preprocessing of the data (as implemented above).

1. Distribution of Polarities

Deep learning is amazing, isn't it?! Yes, we can write very sophisticated rules, but nothing beats patterns learned from large amounts of data. As the figure below shows, VADER mostly assigned a polarity score of 0, meaning it could not find many positive or negative words in the tweets. roBERTa, on the other hand, which captures the overall meaning of a text rather than individual words, found that most of the tweets lean negative. Especially in times when everyone is pessimistic and the state of the economy is not promising, we would expect more negative tweets, which is exactly what roBERTa found.

Image by Author

2. Fraction of Sentiment Labels

Even though the distribution of polarities above is enough to describe the results, the pie chart below shows the difference more vividly. It shows how many of the 1000 tweets were labeled negative, neutral, and positive. According to the rule-based VADER model, positive and neutral tweets occur in roughly equal amounts, with negative tweets slightly ahead. According to the deep learning-based Twitter-roBERTa model, by contrast, almost two-thirds of the tweets are negative and very few are positive. The latter predictions sound more realistic nowadays.

Image by Author

3. Time Required for Predictions

In the two sections above, we saw that roBERTa is much more accurate and better reflects reality. But in the world of data science, accuracy is not the only deciding factor. Since computational resources are limited, a model should also be as computationally light as possible, particularly for production.
It was already obvious from the implementation snippets that VADER is much easier to apply than roBERTa. Moreover, the following figure shows how much faster VADER is: on my personal laptop, VADER predicted the sentiment of 1000 tweets in only 0.31 seconds, whereas roBERTa took 42.6 seconds for the same task (~137-fold slower).
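The timing itself can be reproduced with a simple wall-clock wrapper. This is only a sketch of the mechanics: `benchmark` and the placeholder scorer are mine, and either of the two pipelines above would be passed in as `scorer` in practice:

```python
# Minimal wall-clock benchmark: time how long a scorer takes to
# process every text in a list once.
import time

def benchmark(scorer, texts) -> float:
    """Return elapsed seconds for scoring all texts."""
    start = time.perf_counter()
    for text in texts:
        scorer(text)
    return time.perf_counter() - start

texts = ["the economy is in trouble"] * 1000
elapsed = benchmark(len, texts)  # `len` is a trivial placeholder scorer
```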

Image by Author

Conclusion

There is no silver bullet: depending on the task and the computational resources available, one can choose either VADER or roBERTa. However, given its minimal data pre-processing requirements and impressive accuracy, Twitter-roBERTa looks more promising, especially when the amount of data is not too big.

The Jupyter notebook, including the figure-creation code, is available on my GitHub!


Amanmyrat Abdullayev

Data Scientist with a Ph.D. in Materials Science/Engineering.