How to Do Sentiment Analysis on a Twitter Account
Figure out ahead of time if this is a person you want to follow
In this article, I’ll show you how to get and analyze the sentiment of tweets from a Twitter user using sentiment analysis.
“Sentiment analysis is the measurement of neutral, negative, and positive language. It is a way to evaluate spoken or written language to determine if the expression is favorable (positive), unfavorable (negative), or neutral, and to what degree.” — Clarabridge
“Sentiment analysis: the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral.”— Oxford English Dictionary
There are many methods and algorithms to implement sentiment-analysis systems, which can be classified as:
- Rule-based systems that perform sentiment analysis based on a set of manually crafted rules
- Automatic systems that rely on machine learning techniques to learn from data
- Hybrid systems that combine both rule-based and automatic approaches
The Twitter user whose tweets I’ll be analyzing is none other than Microsoft co-founder Bill Gates. I chose Bill Gates because he’s trying to make a positive impact on the world, so I suspect his tweets will also be mostly positive.
If you’d prefer not to read this article and would like a video representation of it, you can check out the YouTube video below. It goes through everything in this article with a little more detail and will help make it easy for you to start developing your own program — or you can use both as supplementary materials for learning.
I’ll start by stating what I want this program to do. This program will analyze the sentiment of a Twitter account. More specifically, it’ll analyze the tweets/posts of one of Microsoft founders, Bill Gates.
# Description : This is a sentiment analysis program that parses the tweets fetched from Twitter using Python
Import The Libraries and Packages
Next, import some of the libraries that’ll be used within this program.
# Import the libraries
from textblob import TextBlob
from wordcloud import WordCloud
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Load the Data and Authenticate to Twitter
I’m using Google’s website to write this program, so I’ll be using Google’s library to upload the CSV file that contains my Twitter app keys. You can just input your keys directly into the variables if you want.
# Load the data
from google.colab import files
uploaded = files.upload()# Get the data
log = pd.read_csv("Login.csv")
Next, I’ll store the Twitter keys/API credentials in variables.
# Twitter Api Credentials
consumerKey = log["key"]
consumerSecret = log["key"]
accessToken = log["key"]
accessTokenSecret = log["key"]
Now, it’s time to create the authentication object. Set the access token and access token secret, and authenticate to Twitter.
# Create the authentication object
authenticate = tweepy.OAuthHandler(consumerKey, consumerSecret)
# Set the access token and access token secret
# Creating the API object while passing in auth information
api = tweepy.API(authenticate, wait_on_rate_limit = True)
Extract and Analyze the Data
It’s time to extract the tweets of a Twitter user. First, I’ll get the last 100 posts for the Twitter user Bill Gates and show only the five most recent tweets.
# Extract 100 tweets from the twitter user
posts = api.user_timeline(screen_name="BillGates", count = 100, lang ="en", tweet_mode="extended")
# Print the last 5 tweets
print("Show the 5 recent tweets:\n")
for tweet in posts[:5]:
print(str(i) +') '+ tweet.full_text + '\n')
DataFrame with a column called
Tweets that’ll contain the posts from the Twitter user, and then show the first five rows.
# Create a dataframe with a column called Tweets
df = pd.DataFrame([tweet.full_text for tweet in posts], columns=['Tweets'])
# Show the first 5 rows of data
The data is a little dirty because it contains the @ symbol, hyperlinks, RTs, and may contain hash tags, so I’ll clean it up by creating a function to remove these symbols from the tweets. Then, I’ll apply that function to the tweets and show the results.
# Create a function to clean the tweets
text = re.sub('@[A-Za-z0–9]+', '', text) #Removing @mentions
text = re.sub('#', '', text) # Removing '#' hash tag
text = re.sub('RT[\s]+', '', text) # Removing RT
text = re.sub('https?:\/\/\S+', '', text) # Removing hyperlink
# Clean the tweets
df['Tweets'] = df['Tweets'].apply(cleanTxt)
# Show the cleaned tweets
I want to add the tweets’ subjectivity and polarity to the
DataFrame. In order to do this, I’ll create two functions: one to get the tweets called
Subjectivity (how subjective or opinionated the text is — a score of 0 is fact, and a score of +1 is very much an opinion) and the other to get the tweets called
Polarity (how positive or negative the text is, — score of -1 is the highest negative score, and a score of +1 is the highest positive score).
Next I’ll store the results into two columns — one called
Subjectivity and the other called
Polarity — and show the results.
# Create a function to get the subjectivity
# Create a function to get the polarity
# Create two new columns 'Subjectivity' & 'Polarity'
df['Subjectivity'] = df['Tweets'].apply(getSubjectivity)
df['Polarity'] = df['Tweets'].apply(getPolarity)
# Show the new dataframe with columns 'Subjectivity' & 'Polarity'
Let’s see how well the sentiments are distributed. A good way to accomplish this task is by understanding the common words by plotting word clouds.
A word cloud (also known as text clouds or tag clouds) is a visualization, the more a specific word appears in the text, the bigger and bolder it appears in the word cloud.
Let’s visualize all the words in the data using the word-cloud plot. It looks like the word “health” appears a lot in Bill Gates past 100 tweets.
# word cloud visualization
allWords = ' '.join([twts for twts in df['Tweets']])
wordCloud = WordCloud(width=500, height=300, random_state=21, max_font_size=110).generate(allWords)
Create a function to compute the negative (-1), neutral (0), and positive (+1) analysis, and add the information to a new column called
Analysis. Then, show the results.
# Create a function to compute negative (-1), neutral (0) and positive (+1) analysis
def getAnalysis(score):if score < 0:
elif score == 0:
return 'Positive'df['Analysis'] = df['Polarity'].apply(getAnalysis)# Show the dataframe
Print the positive tweets in ascending order. The most positive tweet is the #1 tweet.
# Printing positive tweets
print('Printing positive tweets:\n')
sortedDF = df.sort_values(by=['Polarity']) #Sort the tweets
for i in range(0, sortedDF.shape ):
if( sortedDF['Analysis'][i] == 'Positive'):
print(str(j) + ') '+ sortedDF['Tweets'][i])
Print the negative tweets in descending order. The most negative tweet is the #1 tweet.
# Printing negative tweets
print('Printing negative tweets:\n')
sortedDF = df.sort_values(by=['Polarity'],ascending=False) #Sort the tweets
for i in range(0, sortedDF.shape ):
if( sortedDF['Analysis'][i] == 'Negative'):
print(str(j) + ') '+sortedDF['Tweets'][i])
Plot the polarity and subjectivity as a scatter plot. It looks like the majority of the tweets are positive, as many of the points are on the right side of the polarity at value 0.00.
for i in range(0, df.shape):
plt.scatter(df["Polarity"][i], df["Subjectivity"][i], color='Blue') # plt.scatter(x,y,color)
Print the percentage of positive tweets. About 81/100 of the tweets are considered positive tweets.
# Print the percentage of positive tweets
ptweets = df[df.Analysis == 'Positive']
ptweets = ptweets['Tweets']
round( (ptweets.shape / df.shape) * 100 , 1)
Print the percentage of negative tweets. About 9/100 tweets are considered negative tweets.
# Print the percentage of negative tweets
ntweets = df[df.Analysis == 'Negative']
ntweets = ntweets['Tweets']
round( (ntweets.shape / df.shape) * 100, 1)
Show the value counts. Now we can see 81 tweets are positive, 10 are neutral, and 9 are negative.
# Show the value counts
Visually show the value counts.
# Plotting and visualizing the counts
df['Analysis'].value_counts().plot(kind = 'bar')
That’s all folks!
If you’re also interested in reading more on machine learning to immediately get started with problems and examples, then I strongly recommend you check out “Hands-on Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems.”
It’s a great book for helping beginners learn how to write machine learning programs and for understanding machine learning concepts.
Thanks for reading this article. I hope it’s helpful to you all!