Data Science in the Real World

2020 US Presidential Election Twitter Sentiment Analysis

Using Tweepy, TextBlob, NodeJS, Pandas, NumPy, Matplotlib, NLTK, re, JSON

Shiyan Boxer

6 min read · Sep 2, 2019

Tools

Python — a programming language
Tweepy — a Python library for accessing the Twitter API
TextBlob — a library for processing textual data
NodeJS — backend runtime
Pandas — data manipulation and analysis library
NumPy — scientific computing library
Matplotlib — plotting library
NLTK — a suite of libraries for symbolic and statistical natural language processing
Regular Expression (re) — a sequence of characters that forms a search pattern, used here to parse strings and clean the dataset
JSON — a text-based data interchange format

2020 US Presidential Election

The 2020 United States presidential election, scheduled for Tuesday, November 3, 2020, will be the 59th quadrennial US presidential election. The series of presidential primary elections and caucuses is held during the first six months of 2020. This nominating process is an indirect election: voters cast ballots to select a slate of delegates to a political party’s nominating convention, who then elect their party’s presidential nominee.

Much can be inferred about how the election will play out by looking at the opinions expressed on Twitter. The objective of this project was to determine, analyze, and visualize the sentiment in tweets pertaining to the 2020 US Presidential Election. Raw text from tweets containing specific hashtags was collected from Twitter using the Tweepy library. The tweets were cleaned and tokenized using the Regular Expression library. Then, TextBlob was used to perform sentiment analysis to determine whether each tweet was positive, negative, or neutral. Finally, the tweets were visualized using a WordCloud, which was useful in understanding the common words used in the tweets.
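The TextBlob step described above can be sketched roughly as follows. TextBlob exposes a polarity score between -1.0 and 1.0 through `TextBlob(text).sentiment.polarity`; the helper name `label_from_polarity` and its `threshold` parameter are assumptions made for illustration and are not taken from the project code.

```python
def label_from_polarity(polarity, threshold=0.0):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    `threshold` is a hypothetical tuning knob: scores within
    +/- threshold of zero are treated as neutral.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# With TextBlob installed, the score for a tweet would come from:
#     from textblob import TextBlob
#     polarity = TextBlob(tweet_text).sentiment.polarity
# Hand-picked scores are used here so the sketch runs without TextBlob.
for score in (0.6, -0.3, 0.0):
    print(score, label_from_polarity(score))
```

Raising `threshold` slightly above zero would push weakly opinionated tweets into the neutral bucket, which may or may not suit a given dataset.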

Steps

Import libraries
Create a Twitter App and Authorize Twitter API
Authenticate
Stream tweets
Build Dataset
Sentiment Analysis
Analyze sentiment as positive, negative, or neutral
Plot

1. Import libraries and Tweepy API

import os
import tweepy
from textblob import TextBlob
from wordcloud import WordCloud
import pandas as pd
import numpy as np
import csv
import time
import re
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import nltk
nltk.download('punkt') # https://www.nltk.org/data.html
nltk.download('stopwords') # needed by nltk.corpus.stopwords
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from string import punctuation

2. Create a Twitter App and Authorize Twitter API

Create a Twitter App and store its credentials in variables so that the script can access the Twitter API and fetch tweets.

1. Register and create a New App in Twitter Developer

Apply for access at developer.twitter.com.

2. Copy Access Tokens

ACCESS_TOKEN = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
ACCESS_TOKEN_SECRET = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
CONSUMER_KEY = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
CONSUMER_SECRET = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'

3. Save Access Tokens in your Repository

# Authentication Keys https://developer.twitter.com/en/portal/projects/1268606526274580480/apps/18064440/keys

# Think of these as the user name and password that represents
# your Twitter developer app when making API requests. 
consumerKey = ' '
consumerSecret = ' '

# User-specific credentials used to authenticate OAuth 1.0a API requests. 
# They specify the Twitter account the request is made on behalf of.
accessToken = ' '
accessTokenSecret = ' '
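One way to fill in these four variables without typing the secrets into a file that gets committed is to read them from environment variables. The sketch below is only an illustration: the environment variable names and the `load_credentials` helper are hypothetical, not part of the project.

```python
import os

# Hypothetical environment variable names; any names would work.
ENV_NAMES = {
    "consumerKey": "TWITTER_CONSUMER_KEY",
    "consumerSecret": "TWITTER_CONSUMER_SECRET",
    "accessToken": "TWITTER_ACCESS_TOKEN",
    "accessTokenSecret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [env for env in ENV_NAMES.values() if not environ.get(env)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[env] for name, env in ENV_NAMES.items()}


# Typical use, once the variables are set in the shell:
#     creds = load_credentials()
#     consumerKey = creds["consumerKey"]
```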

3. Create an Authentication Object

# Create the authentication object
auth = tweepy.OAuthHandler(consumerKey, consumerSecret) 
    
# Set the access token and access token secret
auth.set_access_token(accessToken, accessTokenSecret) 

# Creating the API object while passing in auth information
api = tweepy.API(auth)

4. Stream Tweets

1. Define Constants

# Constants 
START_DATE = '2020-05-01'
TWEET_NUMBER = 100
SEARCH_WORD = '#Trump'
RATE_LIMIT = 180
SLEEP_TIME = 900/180 # 15 minutes = 900 seconds
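The last two constants encode the rate limit: 180 requests per 15-minute window works out to one request every 5 seconds. A minimal sketch of how such a pause could be applied between requests is shown below; the `paced` helper and `WINDOW_SECONDS` are hypothetical names, and the article's own code does not show where `SLEEP_TIME` is used.

```python
import time

RATE_LIMIT = 180          # requests allowed per window (from the constants above)
WINDOW_SECONDS = 15 * 60  # 15 minutes = 900 seconds
SLEEP_TIME = WINDOW_SECONDS / RATE_LIMIT  # seconds to wait between requests


def paced(items, delay=SLEEP_TIME, sleep=time.sleep):
    """Yield items one at a time, pausing `delay` seconds between them."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay)
        yield item


# Example: iterate over search terms without exceeding the limit.
#     for term in paced(["#Trump", "#Biden"]):
#         ...issue one request for `term`...
```

Tweepy can also wait out the limit on its own when the API object is created with `tweepy.API(auth, wait_on_rate_limit=True)`.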

2. Stream Tweets using Cursor Method

# Collect tweets using Cursor method
# http://docs.tweepy.org/en/v3.5.0/cursor_tutorial.html
# Note: api.search is the Tweepy v3.x name; Tweepy 4 renamed it to api.search_tweets.
def buildTestSet():
    tweet_list = []
    tweets_fetched = tweepy.Cursor(api.search,
                                   q=SEARCH_WORD,
                                   lang='en',
                                   since=START_DATE).items(TWEET_NUMBER)

    for tweet in tweets_fetched:
        tweet_list.append({"text": tweet.text, "label": None})

    # Print the collected tweets once, after the loop
    print(tweet_list)

    # List where the test set is stored
    return tweet_list

5. Build the Data

# Build the test set
testDataSet = buildTestSet()
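The tweets in testDataSet are still raw text, and the labeling step later on reads from a preprocessedTestSet that is not defined in this walkthrough. A minimal sketch of the cleaning and tokenizing step is given below, assuming each entry becomes a (tokens, label) tuple. The function names and the tiny stop word list are hypothetical stand-ins; the imports at the top suggest NLTK's `word_tokenize` and `stopwords` corpus were used in the project itself.

```python
import re

# Small stand-in stop word list; NLTK's stopwords corpus is far more complete.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "rt"}


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and the '#' symbol."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)                   # remove @mentions
    text = text.replace("#", " ")                       # keep the hashtag word itself
    return text


def tokenize(text):
    """Split cleaned text into word tokens, dropping stop words."""
    words = re.findall(r"[a-z']+", clean_tweet(text))
    return [word for word in words if word not in STOP_WORDS]


def preprocess(dataset):
    """Turn [{'text': ..., 'label': ...}] into a list of (tokens, label) tuples."""
    return [(tokenize(row["text"]), row["label"]) for row in dataset]


sample = [{"text": "RT @user: The #Trump rally is HUGE https://t.co/abc", "label": None}]
print(preprocess(sample))
```

Under this assumption, `preprocessedTestSet = preprocess(testDataSet)` would produce the structure that the labeling step indexes with `tweet[0]`.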

6. Train the Model

# Training the classifier
# Thanks to NLTK, it will only take us a function call to train the model as a Naive Bayes Classifier, 
# since the latter is built into the library:

NBayesClassifier=nltk.NaiveBayesClassifier.train(trainingFeatures)
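The call above relies on trainingFeatures, and the labeling step relies on extract_features, but neither is defined in this walkthrough. NLTK's `NaiveBayesClassifier.train` expects a list of (feature dictionary, label) pairs, so one plausible shape is sketched below. The helpers `build_vocabulary` and `make_extractor`, the "contains(word)" feature naming, and the two hand-labeled examples are all assumptions for illustration.

```python
def build_vocabulary(labeled_tweets):
    """Collect every distinct token seen in the training tweets."""
    vocabulary = set()
    for tokens, _label in labeled_tweets:
        vocabulary.update(tokens)
    return sorted(vocabulary)


def make_extractor(vocabulary):
    """Return a function that maps a token list to a word-presence feature dict."""
    def extract_features(tokens):
        token_set = set(tokens)
        return {"contains(%s)" % word: (word in token_set) for word in vocabulary}
    return extract_features


# Hypothetical hand-labeled examples, for illustration only.
trainingData = [
    (["great", "rally"], "positive"),
    (["terrible", "debate"], "negative"),
]

extract_features = make_extractor(build_vocabulary(trainingData))
trainingFeatures = [(extract_features(tokens), label) for tokens, label in trainingData]
print(trainingFeatures[0])
```

A real training set would need many labeled tweets per class, including neutral ones if that label is expected in the results.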

7. Sentiment Analysis

1. Label Tweets

NBResultLabels = [NBayesClassifier.classify(extract_features(tweet[0])) for tweet in preprocessedTestSet]
print(NBResultLabels)

2. Get the Majority Vote

# Get the majority vote

if NBResultLabels.count('positive') > NBResultLabels.count('negative'):
    print("Overall Positive Sentiment")
    print("Positive Sentiment Percentage = " + str(100*NBResultLabels.count('positive')/len(NBResultLabels)) + "%")
elif NBResultLabels.count('positive') < NBResultLabels.count('negative'): 
    print("Overall Negative Sentiment")
    print("Negative Sentiment Percentage = " + str(100*NBResultLabels.count('negative')/len(NBResultLabels)) + "%")
else: 
    print("Overall Neutral Sentiment")
    print("Neutral Sentiment Percentage = " + str(100*NBResultLabels.count('neutral')/len(NBResultLabels)) + "%")
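The same majority vote can be written more compactly with `collections.Counter`, which counts every label in one pass. This is an alternative sketch rather than the project's code; the `summarize` function name is hypothetical.

```python
from collections import Counter


def summarize(labels):
    """Return (overall_label, percentages) for a list of sentiment labels."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return "neutral", {}
    percentages = {label: 100 * n / total for label, n in counts.items()}
    positive, negative = counts["positive"], counts["negative"]
    if positive > negative:
        overall = "positive"
    elif negative > positive:
        overall = "negative"
    else:
        overall = "neutral"  # a tie is reported as neutral, as in the code above
    return overall, percentages


overall, percentages = summarize(["positive", "negative", "positive", "neutral"])
print(overall, percentages)
```

Returning the percentages for every label also avoids dividing by zero when the list of labels is empty.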

3. Assign Positive, Negative, and Neutral Variables

# Assign positive, negative, and neutral counts

positive = NBResultLabels.count('positive')
negative = NBResultLabels.count('negative')
neutral = NBResultLabels.count('neutral')

print(positive)
print(negative)
print(neutral)

8. Plot the Results

# Plot
levels = ('Positive', 'Neutral', 'Negative')
y_pos = np.arange(len(levels))
performance = [positive, neutral, negative]

plt.bar(y_pos, performance, align='center', alpha=0.5)
plt.xticks(y_pos, levels)
plt.ylabel('Number of Tweets')
plt.title('Sentiment Analysis Results')

plt.show()
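The introduction also mentions a WordCloud of common words, which is not shown in the steps above. A word cloud is driven by word frequencies, so a minimal sketch of that counting step follows. The `word_frequencies` helper and the sample tokens are hypothetical; the commented lines show how the `wordcloud` package could render the result.

```python
from collections import Counter


def word_frequencies(token_lists, top_n=50):
    """Count how often each token appears across all tweets."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return dict(counts.most_common(top_n))


# Hypothetical tokenized tweets, for illustration only.
tokens_per_tweet = [["trump", "rally"], ["trump", "debate"], ["vote"]]
frequencies = word_frequencies(tokens_per_tweet)
print(frequencies)

# With the wordcloud package installed, the image could then be drawn with:
#     from wordcloud import WordCloud
#     import matplotlib.pyplot as plt
#     cloud = WordCloud(width=800, height=400).generate_from_frequencies(frequencies)
#     plt.imshow(cloud, interpolation="bilinear")
#     plt.axis("off")
#     plt.show()
```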

Conclusion

Our results suggest that Twitter is becoming a more reliable platform for gauging the true sentiment on a given topic. When compared against reliable polling data, the sentiment of tweets showed a correlation as high as 84% after smoothing with a moving average.
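The comparison procedure itself is not shown in this article. As a rough illustration of what smoothing with a moving average and then correlating two series involves, here is a minimal sketch in plain Python. The function names, the window size, and both data series are made up for the example and are not results from this project.

```python
from math import sqrt


def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up numbers for illustration only; not data from this project.
daily_positive_share = [0.40, 0.55, 0.35, 0.60, 0.45, 0.65, 0.50]
daily_poll_share = [0.44, 0.45, 0.45, 0.47, 0.48, 0.49, 0.50]

window = 3
smoothed = moving_average(daily_positive_share, window)
aligned_polls = daily_poll_share[window - 1:]  # match each trailing window's last day
print(round(pearson(smoothed, aligned_polls), 2))
```

Smoothing damps the day-to-day noise in tweet sentiment, which is why a smoothed series can track slow-moving poll numbers more closely than the raw one.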

Key Terms

Twitter — A popular online news and social media platform with 330 million monthly users as of April 2019 (Statista). Users post short messages known as “tweets” and interact by replying to and retweeting them. People express their opinions on a topic in 280 characters or fewer.

Sentiment Analysis — The process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a topic is positive, negative, or neutral (Google Dictionary).

Machine Learning (ML) — A method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make a decision with minimal human intervention (SAS).

Natural Language Processing (NLP) — A branch of artificial intelligence that helps computers understand, interpret and manipulate human language. NLP draws from many disciplines, including computer science and computational linguistics, in its pursuit to fill the gap between human communication and computer understanding (SAS).

Naive Bayes Classifier — A probabilistic classifier that applies Bayes' theorem under the assumption that features are independent of one another given the label. It learns from a labeled training set how likely each feature is under each label, then assigns new text the most probable label.

Resources

Shiyan Boxer

Hey, I’m Shiyan! I’m a Computer Engineering Innovation student at Queen’s University. I’m fascinated by Artificial Intelligence, Design, Environmental Sustainability, and Entrepreneurship.

Personal Website: shiyan-boxer.wixsite.com
LinkedIn: www.linkedin.com
GitHub: github.com
