US Election 2020 Sentiment Analysis


The 2020 US presidential election takes place on November 3, 2020. It may be the most highly anticipated election in years, and people in the US have held many different opinions in the lead-up to it and still do right now. After months of speculation about who would stand against Donald Trump, the challenger turned out to be Joe Biden.

I decided to use Twitter to collect my data. My approach is to go through recent tweets from Donald Trump and Joe Biden and collect the replies to those tweets to determine people's sentiment toward each of them.

Some indication of how the election will play out can be drawn from the opinions expressed on Twitter. The objective of this project was to determine, analyze, and visualize the sentiment in tweets pertaining to the 2020 US presidential election. I used TextBlob to perform sentiment analysis and determine whether each tweet was positive, negative, or neutral. Finally, I visualized the tweets using a WordCloud, which was useful for understanding the most common words used in the tweets.

Collecting data this way has its issues, especially when one candidate has far more followers than the other. Demographics are also an issue: most Twitter users are in the younger age brackets, so we miss out on older people, who are very politically active. Also, not everyone is on the same platform.

Tools

  • Python — a programming language
  • Tweepy — Python library for accessing the Twitter API
  • TextBlob — library for processing textual data (ships with ready-made sentiment analyzers)
  • Pandas — data manipulation and analysis library
  • NodeJS — backend
  • NumPy — scientific computing library
  • Matplotlib — plotting library
  • NLTK — suite of libraries for symbolic and statistical natural language processing
  • Regular expressions — sequences of characters that form a search pattern, used here to parse and clean strings in the dataset
  • JSON — data file format
  • Seaborn — data visualization library based on Matplotlib
  • WordCloud — library for a visual representation of textual data

Using Tweepy

Tweepy is open source, hosted on GitHub, and lets Python communicate with the Twitter platform through its API.

Create a Twitter App and Authorize

You have to request access from Twitter: register and create a Twitter Developer account for the API.

Then go to the Developer Portal, click on Projects & Apps, click the key icon, and copy the “API Key”, “API Secret”, and “Bearer Token”.

After that is done we can start using Twitter’s API. We need to install Tweepy using the pip package manager and then import it. I also imported the csv module.

Next we enter our OAuth keys and set up Twitter authentication. The OAuthHandler takes the credentials and uses them to allow access to the Twitter API’s features.

To check that it is working, I verify my credentials, which returns my account name.
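Here is a minimal sketch of that setup, not necessarily the exact code used for this project. It assumes the credentials are stored in environment variables; the variable names and the helpers `load_credentials` and `build_api` are hypothetical names chosen for illustration. Note that Tweepy's OAuthHandler flow needs an access token and access token secret in addition to the API key and secret. Tweepy is imported inside `build_api` so the credential check can run even where Tweepy is not installed.

```python
import os

# Hypothetical environment variable names; use whatever names you prefer.
REQUIRED_KEYS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=os.environ):
    """Read the four OAuth credentials, failing early if any is missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


def build_api(credentials):
    """Create an authenticated Tweepy client from the loaded credentials."""
    import tweepy  # third-party: pip install tweepy

    auth = tweepy.OAuthHandler(
        credentials["TWITTER_API_KEY"],
        credentials["TWITTER_API_SECRET"],
    )
    auth.set_access_token(
        credentials["TWITTER_ACCESS_TOKEN"],
        credentials["TWITTER_ACCESS_TOKEN_SECRET"],
    )
    # wait_on_rate_limit makes Tweepy sleep until the rate limit resets
    # instead of raising an error.
    return tweepy.API(auth, wait_on_rate_limit=True)
```

With the variables set, `api = build_api(load_credentials())` gives a client object for the later steps. Keeping the keys out of the source file also means they are not shared by accident along with the code.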
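A check like this can be sketched as follows, assuming `api` is an authenticated `tweepy.API` object. The helper name `authenticated_name` is hypothetical. Depending on the Tweepy version, a rejected login either raises an exception or returns a falsy value, so the sketch handles the second case explicitly.

```python
def authenticated_name(api):
    """Return the display name of the account the credentials belong to."""
    user = api.verify_credentials()
    if not user:
        # Some Tweepy versions return False instead of raising on a bad login.
        raise RuntimeError("Twitter rejected the credentials")
    return user.name
```

Calling `print(authenticated_name(api))` should print the name on the developer account if the keys were entered correctly.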

The next step is scraping tweets from a specific Twitter handle. The name is the user’s Twitter handle (Trump’s or Biden’s), and the tweet_id is the string after the username in the tweet’s URL, which is unique for every tweet. Using these, I scraped thousands of their recent tweets and stored them in a CSV file.
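The timeline part of this step could look roughly like the sketch below. It is an illustration under assumptions, not the project's actual code: `fetch_recent_tweets` and `save_rows` are hypothetical helpers, the column names are made up, and `api` is assumed to be an authenticated `tweepy.API` object.

```python
import csv


def fetch_recent_tweets(api, handle, limit=1000):
    """Yield (tweet_id, created_at, text) for a handle's recent tweets."""
    import tweepy  # third-party: pip install tweepy

    cursor = tweepy.Cursor(
        api.user_timeline,
        screen_name=handle,
        count=200,              # largest page size this endpoint accepts
        tweet_mode="extended",  # ask for the full, untruncated text
    )
    for status in cursor.items(limit):
        yield status.id_str, status.created_at.isoformat(), status.full_text


def save_rows(rows, path):
    """Write the rows to a UTF-8 CSV file with a header line."""
    with open(path, "w", encoding="utf-8", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["tweet_id", "created_at", "text"])
        writer.writerows(rows)
```

For example, `save_rows(fetch_recent_tweets(api, "JoeBiden"), "biden_tweets.csv")` would write one file for one handle; the output file name here is only a placeholder.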

Sentiment Analysis

Sentiment analysis is the process of computationally determining whether a piece of writing is positive, negative, or neutral. Positive and negative features are extracted from positive and negative reviews, respectively. It is also known as opinion mining: deriving the opinion or attitude of a speaker. For this project I used TextBlob.

After collecting the tweets from both Trump and Biden, we can analyze them, but first we need to clean them up.

The TweetAnalyzer class has three functions whose goal is to analyze and categorize content from tweets. The first step is to clean the tweets by removing links and special characters using regular expressions. Next, tokenize the tweets by splitting the body of text into individual words. Finally, remove the stop words from the tokens; words like “I”, “am”, “are”, and “you” carry little meaning in text analysis.
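The three steps can be sketched with the standard library alone. This is a simplified stand-in, not the actual TweetAnalyzer class: the function names are hypothetical, and the tiny stop-word set below is an assumption standing in for a fuller list such as NLTK's.

```python
import re

# A tiny illustrative stop-word set; a real list would be much longer.
STOP_WORDS = {"i", "am", "are", "you", "the", "a", "an", "is", "to", "and"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase the tweet, then strip links and any non-letter characters."""
    text = URL_PATTERN.sub(" ", text.lower())
    text = NON_LETTER_PATTERN.sub(" ", text)
    return " ".join(text.split())  # collapse runs of whitespace


def tokenize(text):
    """Split cleaned text into individual words."""
    return text.split()


def remove_stop_words(tokens):
    """Drop words that carry little meaning for the analysis."""
    return [token for token in tokens if token not in STOP_WORDS]


tweet = "I am SO happy!!! https://t.co/abc #Vote2020"
print(remove_stop_words(tokenize(clean_tweet(tweet))))
# ['so', 'happy', 'vote']
```

One side effect of this simple cleaning is that apostrophes are removed too, so a word like "don't" is split into "don" and "t". Whether that matters depends on the analyzer used afterwards.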

Sentiment Analysis with TextBlob

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

TextBlob gives each text a polarity score that ranges from -1 (negative) to +1 (positive) and tells whether the text expresses negative or positive sentiment. It also gives a subjectivity score from 0 to 1, which tells whether the text is closer to factual information (0) or personal opinion (1).
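Turning the polarity score into one of the three labels can be sketched like this. The zero threshold and the helper names `label_from_polarity` and `tweet_sentiment` are assumptions for illustration; TextBlob itself only supplies the score.

```python
def label_from_polarity(polarity):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def tweet_sentiment(text):
    """Score a cleaned tweet with TextBlob and return its label."""
    from textblob import TextBlob  # third-party: pip install textblob

    return label_from_polarity(TextBlob(text).sentiment.polarity)
```

A text with no words the analyzer recognizes gets a polarity of exactly 0, which is why tweets that are mostly links tend to end up labeled neutral under this rule.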

Now that I had a CSV file saved with data for both Donald Trump and Joe Biden, I could review the dataset.

I checked the sentiment of Trump’s and Biden’s datasets.

Then I plotted the results as a graph, since visuals make the data easier to read and understand.

I will ignore the neutrals, as many of them were basically links and were not classified accurately. Looking at the positives versus the negatives, I see that Trump has more negative than positive sentiment on Twitter, while Biden has a lower number of negatives in comparison.
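A grouped bar chart of this kind could be produced along these lines. This is a sketch under assumptions: it counts labels with `collections.Counter` rather than Pandas, the helper names and the output file name are hypothetical, and no real results are included.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def sentiment_counts(labels):
    """Count how many tweets fall under each label, in LABELS order."""
    counts = Counter(labels)
    return [counts[label] for label in LABELS]


def plot_sentiment(counts_by_candidate, path="sentiment.png"):
    """Draw one group of bars per label and one bar per candidate."""
    import matplotlib.pyplot as plt  # third-party: pip install matplotlib

    fig, ax = plt.subplots()
    width = 0.8 / len(counts_by_candidate)
    for index, (name, counts) in enumerate(counts_by_candidate.items()):
        positions = [x + index * width for x in range(len(LABELS))]
        ax.bar(positions, counts, width=width, label=name)
    # Put each tick in the middle of its group of bars.
    offset = (len(counts_by_candidate) - 1) * width / 2
    ax.set_xticks([x + offset for x in range(len(LABELS))])
    ax.set_xticklabels(LABELS)
    ax.set_ylabel("Number of tweets")
    ax.legend()
    fig.savefig(path)
```

Given two lists of labels, a call such as `plot_sentiment({"Trump": sentiment_counts(trump_labels), "Biden": sentiment_counts(biden_labels)})` would place the two candidates side by side for each label.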

Trials and Errors

While working through trial and error in my code, I ran into Twitter’s rate limits, so I had to wait an hour or so before I could run my code again. I also had trouble storing the dataset in a CSV as a UTF-8 file. I wasn’t sure what the problem was; I cleared the code and rewrote it, and it magically worked. All I changed was the file mode used to write the CSV, from write (w) to append (a+), which I didn’t expect to change much. But sometimes in coding, things fix themselves and work out in weird ways.
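One possible explanation, offered as a guess since it depends on how the file was opened: mode "w" empties the file every time it is opened, while "a+" keeps what is already there and adds to the end. If the file is opened once per batch of tweets, the two modes give different results. The sketch below shows the difference; `write_batch`, `count_rows`, and `rows_after_two_batches` are hypothetical helpers.

```python
import csv
import os
import tempfile


def write_batch(path, rows, mode):
    """Open the CSV with the given mode and write one batch of rows."""
    with open(path, mode, encoding="utf-8", newline="") as csv_file:
        csv.writer(csv_file).writerows(rows)


def count_rows(path):
    """Return the number of rows currently stored in the CSV."""
    with open(path, encoding="utf-8", newline="") as csv_file:
        return sum(1 for _ in csv.reader(csv_file))


def rows_after_two_batches(mode):
    """Write two batches to a fresh file and report how many rows survive."""
    path = os.path.join(tempfile.mkdtemp(), "tweets.csv")
    write_batch(path, [["1", "first batch"]], mode)
    write_batch(path, [["2", "second batch"]], mode)
    return count_rows(path)


print(rows_after_two_batches("w"))   # 1: the second open erased the first batch
print(rows_after_two_batches("a+"))  # 2: both batches are kept
```

Passing `encoding="utf-8"` explicitly also avoids depending on the operating system's default encoding, which is a common source of errors when tweets contain emoji or accented characters.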

Conclusion

I feel that Twitter is a great place to get a feel for people’s sentiments, but like most social media platforms it has limitations. People are always tweeting away their feelings about topics, much like our very own President Donald Trump does.

One thing I did notice is that the analyzer isn’t really capable of detecting sarcastic comments properly. This isn’t surprising, because it works on the tokens of a sentence and classifies accordingly. So if a negative comment written in a sarcastic way contains a large number of positive words like “greatest” or “excellent”, it will most likely be classified as positive.

As I stated at the start of this article, statistics show that not everyone is on Twitter, and the demographics of people on social media platforms like Twitter skew toward younger users.
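The effect is easy to reproduce with a toy word-level scorer. The sketch below is not TextBlob's actual algorithm; the miniature lexicon and the `toy_polarity` function are made up purely to illustrate why scoring individual tokens misses sarcasm.

```python
# Hypothetical miniature lexicon: word -> polarity score.
LEXICON = {
    "greatest": 1.0,
    "excellent": 1.0,
    "terrible": -1.0,
    "worst": -1.0,
}


def toy_polarity(text):
    """Average the scores of the known words; 0.0 if none are known."""
    tokens = [token.strip(".,!?\"'").lower() for token in text.split()]
    scores = [LEXICON[token] for token in tokens if token in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


sarcastic = "Oh sure, excellent idea, the greatest plan ever."
print(toy_polarity(sarcastic))  # 1.0, even though the writer means the opposite
```

The scorer only sees “excellent” and “greatest”, so the sentence comes out as strongly positive. Detecting the sarcasm would need context that single tokens do not carry.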
