AI — Sentiment analysis on Twitter and Reddit using a transformer model

Shyam BV · Published in Code Sprout · Aug 15, 2021 · 4 min read

Introduction

Technical and fundamental analysis deliver only average returns. Community sentiment and meme stocks, on the other hand, have produced outsized returns since Covid; we have seen and heard it multiple times. This article focuses on sentiment analysis of Twitter and Reddit comments about a particular stock, using a transformer-based model.

Photo by George Pagan III on Unsplash

Background

This article is a follow-up to, and builds closely on, the article below.

Here, I will focus only on the sentiment part (the Twitter section of the diagram below).

Also, I am going to use a Hugging Face transformer-based model to perform the sentiment analysis.

Setup

We are going to work with Twitter, Reddit, and transformers. Below are the packages we need, along with the other standard packages.

pip install praw==7.4.0
pip install tweepy==3.10.0
pip install transformers==4.9.2

We also need developer accounts with API keys for Twitter and Reddit. There are many articles that show how to create them.

Fetch tweets and comments

Let's assume we have a stock in mind and need to find its sentiment on Twitter and Reddit. As a first step, let's get the tweets.

Fetching tweets

In the piece of code below, we connect to the Twitter API and search for tweets using the stock ticker and the stock name.
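
Here is a minimal sketch of that step with tweepy 3.10; the credential placeholders, the fetch_tweets helper name, and the query format are illustrative assumptions, so adapt them to your own setup.

import tweepy

# Placeholder credentials -- replace with your own Twitter developer keys
CONSUMER_KEY = "YOUR_CONSUMER_KEY"
CONSUMER_SECRET = "YOUR_CONSUMER_SECRET"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET"

def fetch_tweets(ticker, stock_name, count=100):
    """Search recent English tweets that mention the ticker or the company name."""
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    query = f"${ticker} OR {stock_name} -filter:retweets"
    cursor = tweepy.Cursor(api.search, q=query, lang="en", tweet_mode="extended")
    return [status.full_text for status in cursor.items(count)]

tweets = fetch_tweets("TSLA", "Tesla")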

Fetching Reddit

Similar to Twitter, let's get the Reddit comments from the wallstreetbets subreddit.
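
Again, a minimal sketch with praw 7.4; the fetch_reddit_comments helper and the search-based approach are assumptions on my part, and any way of pulling recent wallstreetbets comments will work just as well.

import praw

# Placeholder credentials -- replace with your own Reddit app keys
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="stock-sentiment-script",
)

def fetch_reddit_comments(ticker, post_limit=25):
    """Collect comments from recent wallstreetbets posts that mention the ticker."""
    comments = []
    subreddit = reddit.subreddit("wallstreetbets")
    for submission in subreddit.search(ticker, sort="new", limit=post_limit):
        submission.comments.replace_more(limit=0)  # drop "load more comments" stubs
        comments.extend(comment.body for comment in submission.comments.list())
    return comments

comments = fetch_reddit_comments("TSLA")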

Now we have all the required tweets and comments to perform the sentiment analysis.

Create sentiment model

As a next step, we have to build our sentiment model for analyzing the tweets and comments. There are many models we can use for sentiment analysis. If you want to optimize the results, try a different model or fine-tune on top of it.

We will use a pre-trained RoBERTa model, which is optimized for Twitter. Here is a quick intro detour about the RoBERTa model.

RoBERTa

Introduced by Facebook, RoBERTa (Robustly Optimized BERT Approach) is a retraining of BERT with an improved training methodology, roughly 10x more training data, and more compute power.

RoBERTa removes the Next Sentence Prediction (NSP) task from BERT’s pre-training to improve the training procedure. It introduces dynamic masking, so the masked tokens change across training epochs. Larger training batch sizes were also found to be useful. Here is a quick comparison with other models.

Model comparison table [Source]

Use sentiment model

The fine-tuned Twitter RoBERTa base sentiment model is trained to understand the emojis and slang used in tweets.

From pre-trained model
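
A minimal sketch of loading and saving the model, following the Hugging Face model card for cardiffnlp/twitter-roberta-base-sentiment (linked in the references); the local folder name is just an example.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "cardiffnlp/twitter-roberta-base-sentiment"

# First run downloads the pre-trained weights and tokenizer from the Hugging Face hub
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

# Save a local copy so future predictions do not need to download again
tokenizer.save_pretrained("./twitter-roberta-sentiment")
model.save_pretrained("./twitter-roberta-sentiment")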

If you run the above code, it will download the pre-trained model to your system; save the model and tokenizer so you can reuse them for future predictions.

Return the sentiment
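
A sketch of such a function, assuming the model and tokenizer loaded above; the label order (negative, neutral, positive) follows the model card, and get_sentiment is an illustrative name.

import numpy as np
from scipy.special import softmax

LABELS = ["negative", "neutral", "positive"]  # label order from the model card

def get_sentiment(text):
    """Return the predicted sentiment label and its probability for one text."""
    encoded = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    scores = softmax(model(**encoded).logits[0].detach().numpy())
    return LABELS[int(np.argmax(scores))], float(np.max(scores))

print(get_sentiment("Tesla deliveries beat expectations this quarter!"))
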
Sample Tesla tweet sentiment

Now we need to combine all the functions to get the results.
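
A sketch of that driver, reusing the hypothetical fetch_tweets, fetch_reddit_comments, and get_sentiment helpers from the earlier sketches:

from collections import Counter

def analyze_stock(ticker, stock_name):
    """Tally sentiment labels for a stock across Twitter and Reddit."""
    twitter_counts = Counter(get_sentiment(t)[0] for t in fetch_tweets(ticker, stock_name))
    reddit_counts = Counter(get_sentiment(c)[0] for c in fetch_reddit_comments(ticker))
    return twitter_counts, reddit_counts

twitter_counts, reddit_counts = analyze_stock("SOFI", "SoFi")
print("Twitter:", twitter_counts)
print("Reddit:", reddit_counts)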

Code to check Sofi
Results of Sofi stock in Twitter
Results of Sofi stock in Reddit

We can also use different sentiment models such as BERTweet, a retrained RoBERTa, FastText, etc.

Final thoughts

  1. We have performed sentiment analysis on Twitter tweets and Reddit comments.
  2. The code can easily be expanded to StockTwits and other micro-blogging websites.
  3. We have used the RoBERTa model to perform the predictions; different models can also be used.

References:

  1. https://towardsdatascience.com/bert-roberta-distilbert-xlnet-which-one-to-use-3d5ab82ba5f8
  2. https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment

Get Code

Please subscribe to my newsletter to get the complete working code for my articles and other updates.
