Performing Twitter Sentiment Analysis
What is a sentiment analysis?
Well, a sentiment analysis is formally defined as
a process of computationally identifying and categorizing emotions and opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral.
Some of the real-world applications of sentiment analysis are social media monitoring (how people are reacting to certain things on social media), marketing (whether people like your product, judged by analyzing product reviews), and political analysis (which politician is more popular among the majority of people), among many others.
Why use Twitter to collect data?
Twitter is a great source for collecting and analyzing thousands of diverse opinions and emotions expressed by real people all over the world on a diverse range of topics every single second of every single day. Besides being a great repository of data published by real people, tweets are ideal for sentiment analysis for two other reasons:
i) tweets are easy to collect and categorize
ii) tweets are short (140 characters), so each one takes up relatively little memory.
Alright, how do I get started?
First, you need a computer, obviously. You need to have Python and its associated environment set up on your machine and a good text editor to write your script. If you are confused, check out my article for a watered-down explanation of getting everything necessary installed.
Step 1: Register for Twitter API
If you have never heard of an API before, well, it is basically a set of functions that help you create a program that can access the features or data of other services, for example Twitter in our case. First, you need to create a Twitter Application here.
- Click on Create New App and fill in the details
Step 2: Install our dependencies
For this project, we will use two libraries: tweepy and textblob. Tweepy is an easy-to-use Python library for accessing the Twitter API, and TextBlob is a library for processing textual data, which we will use to measure the sentiment of our tweets. The installation process is fairly simple: run pip install tweepy textblob (refer to my article if you don’t know how to install Python packages).
Step 3: Let’s start writing our script!
First we will need to import our two dependencies.
import tweepy
from textblob import TextBlob
Next, we need to authenticate our program with the Twitter API. If you go back to your Twitter Application and head over to “Keys and Access Tokens” tab, you will see these four entities (which I have screened out because they are my precious 😇) that you will need to import in your script to create your authentication channel.
To include them in your script, create four variables and assign them with your magical access strings.
consumer_key = 'YOUR CONSUMER KEY'
consumer_secret = 'YOUR CONSUMER SECRET'
access_token = 'YOUR ACCESS TOKEN'
access_token_secret = 'YOUR ACCESS TOKEN SECRET'
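If you plan to share your script, you may not want the four credentials sitting in it as plain strings. Here is a minimal sketch of one alternative, reading them from environment variables instead. The variable names (TWITTER_CONSUMER_KEY and friends) and the load_credentials helper are my own assumptions for illustration, not something Twitter or tweepy requires.

```python
import os

# Hypothetical environment variable names; use whatever names you prefer.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    if environ is None:
        environ = os.environ
    # Collect the names of variables that are unset or empty.
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}
```

With this in place, you would call load_credentials() once and use the values in the returned dictionary wherever the four variables above appear.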
Next, we will use a class called OAuthHandler from the tweepy library and pass our consumer key and consumer secret as the arguments. For our project, we don’t need to know the details of this class (because bro, abstractions). Let’s just say tweepy uses it to handle the OAuth handshake with Twitter for us. We then have to call the set_access_token method of our auth variable and pass our access token and access token secret as arguments.
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
After that, we will create our magical variable that will help us use all the API services we will need for our project. We will call it api and assign it an instance of tweepy’s API class, which we create by passing a single argument: auth.
api = tweepy.API(auth) #IT'S MAGIC IT'S MAGIC
Now that we have established our access to the Twitter API, we can start doing a bunch of things. For starters, we will search for tweets.
Okay so what should we look for? How about my favorite comedian Trump? 😂
So we will basically need to create a query and assign it Trump so that our program looks for all the tweets talking about Trump and fetches them for us. We can also set an upper limit on the number of tweets we want our program to fetch. How about we go with ten thousand! *evil laugh*
query = 'Trump'
max_tweets = 10000
Tweepy provides Cursor objects, a way of iterating through timelines, user lists, direct messages, etc. that makes traversing paginated results easier. You can read its docs to know more about what you can do with it. We will pass our api’s search method and the query to tweepy’s Cursor to tell it what to look for, and then call items with max_tweets to tell it how many items (tweets) to fetch at most. We will assign the result to our variable public_tweets.
# Note: Tweepy 4.0 renamed api.search to api.search_tweets; use that name on newer versions.
public_tweets = tweepy.Cursor(api.search, q=query).items(max_tweets)
We are almost done. To print all the tweets, we need to write a simple for-loop. I implemented two additional filters within my for-loop to only get tweets which are in English and exclude all the retweets (you can tweak the filters according to your preference).
for tweet in public_tweets:
    if (tweet.lang == "en") and (not tweet.retweeted) and ('RT @' not in tweet.text): #my filters. MINE!
        print(tweet.text)
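If you want to tweak the filters, it can help to pull them out into a small function so you can try them without hitting the API. The sketch below does that with the same three conditions. The function name is_original_english and the fake tweets built with SimpleNamespace are my own stand-ins, not tweepy objects; they only carry the three attributes the filter reads.

```python
from types import SimpleNamespace


def is_original_english(tweet):
    """True for English tweets that are not retweets (same filters as the loop)."""
    return (
        tweet.lang == "en"
        and not tweet.retweeted
        and "RT @" not in tweet.text
    )


# Stand-in objects with only the attributes the filter looks at.
samples = [
    SimpleNamespace(lang="en", retweeted=False, text="Great speech today"),
    SimpleNamespace(lang="en", retweeted=False, text="RT @someone: Great speech today"),
    SimpleNamespace(lang="fr", retweeted=False, text="Bonjour tout le monde"),
]

kept = [t.text for t in samples if is_original_english(t)]
print(kept)  # ['Great speech today']
```

Inside the real loop you would then write if is_original_english(tweet): in place of the long condition.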
If you have done everything right, this should get all the tweets pouring down your console like a freakin’ sandstorm! Now we need to calculate the sentiment of each tweet as we fetch it. Time to remember our good old friend TextBlob. A TextBlob object has a property called sentiment, which analyzes the text and returns its polarity (range [-1.0, 1.0], where -1.0 is very negative and 1.0 is very positive) and subjectivity (range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective). We will create a variable, analysis, and assign it the tweet’s text wrapped in a TextBlob. Then we can read its sentiment property and print it along with the tweet. We just need to add two lines at the end of our original for-loop to get each tweet’s sentiment.
for tweet in public_tweets:
    if (tweet.lang == "en") and (not tweet.retweeted) and ('RT @' not in tweet.text):
        print(tweet.text)
        analysis = TextBlob(tweet.text)
        print(analysis.sentiment)
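A polarity score is just a number, so to get back to the positive, negative, or neutral verdict from the definition at the top, you need to pick a cut-off. Here is a minimal sketch of one way to do it. The label_polarity function and the 0.1 threshold are my own assumptions, not something TextBlob prescribes, and the scores are made-up numbers standing in for analysis.sentiment.polarity.

```python
from collections import Counter


def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    The threshold is an arbitrary choice; scores within
    +/- threshold of zero are treated as neutral.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Made-up polarity scores standing in for real tweets.
scores = [0.5, -0.8, 0.0, 0.3]

# Tally how many tweets fall under each label.
counts = Counter(label_polarity(score) for score in scores)
print(counts["positive"], counts["negative"], counts["neutral"])  # 2 1 1
```

In the real loop you would pass analysis.sentiment.polarity to label_polarity for each tweet, and tallies like these are the kind of numbers you could later feed into a chart.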
We can use these sentiments to create beautiful visualizations like pie charts, scatter plots, and others using amazing Python libraries like matplotlib, seaborn, etc., which could be the subject of a separate article on its own.
That’s all from me. Hope this article gently nudges the sail of your ship on an amazing journey! Peace 🙏