Using Twitter API for Tweets Sentiment Analysis

Harsh Khandewal
Published in The Startup
4 min read · Jan 25, 2021

Every day a lot of data is generated in different forms, and one of those forms is text. One good source of this text data is Twitter, where people actively share their thoughts. Our goal in this article is to use the Twitter API to extract tweets and perform sentiment analysis on them.

Required Libraries and modules:

import re
import csv
import string

import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
import tweepy
from tweepy import OAuthHandler
from textblob import TextBlob

import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.porter import PorterStemmer
from nltk.stem import WordNetLemmatizer
from wordcloud import WordCloud, STOPWORDS

n_words = set(stopwords.words('english'))
porter = PorterStemmer()
lemmatizer = WordNetLemmatizer()

How to use the Twitter API:

Click here to see how to generate Twitter API credentials.

You will get four keys:

# I replaced my keys with "xxxxxxx" :)
consumer_key = 'xxxxxxxx'
consumer_secret = 'xxxxxxxxxxx'
access_token = 'xxxxxxxxxx'
access_token_secret = 'xxxxxxxxxxx'
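Hardcoding credentials works for a quick demo, but a safer pattern is to read them from environment variables. A minimal sketch (the variable names below are my own choice, not anything mandated by Twitter; export them in your shell first):

```python
import os

# read credentials from the environment instead of hardcoding them;
# these variable names are arbitrary -- e.g. run
#   export TWITTER_CONSUMER_KEY=...
# in your shell before starting the notebook
consumer_key = os.environ.get('TWITTER_CONSUMER_KEY', '')
consumer_secret = os.environ.get('TWITTER_CONSUMER_SECRET', '')
access_token = os.environ.get('TWITTER_ACCESS_TOKEN', '')
access_token_secret = os.environ.get('TWITTER_ACCESS_TOKEN_SECRET', '')
```

This also keeps the keys out of version control if you share the notebook.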

Let's use these for authentication via Python :)

def TwitterClient():
    # keys and tokens from the Twitter Dev Console
    consumer_key = 'xxxxxxxx'
    consumer_secret = 'xxxxxxxxxxx'
    access_token = 'xxxxxxxxxx'
    access_token_secret = 'xxxxxxxxxxx'

    # attempt authentication
    try:
        # create OAuthHandler object
        auth = OAuthHandler(consumer_key, consumer_secret)
        # set access token and secret
        auth.set_access_token(access_token, access_token_secret)
        # create tweepy API object to fetch tweets
        api = tweepy.API(auth)
    except Exception:
        print("Error: Authentication Failed")
        api = None

    return api  # now we can make requests to Twitter using this api

Preprocessing of TEXT

Preprocessing is a key step that helps the model understand the text better; it also decreases computational cost and time by reducing the space in which the model has to work, as we will see later.

Let's list the points which we want to focus on while cleaning the data:

  1. Links: they don't take part in expressing any sentiment.
  2. Tags: # tags and @ tags are useless to us.
  3. Punctuation marks
  4. Numerical values
  5. Stopwords: words like "and", "the", etc. do not contribute to sentiment.
  6. Word normalization: stemming, lemmatization, etc.
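To make the list above concrete, here is a minimal sketch of each kind of cleanup on a made-up example tweet, using only Python's re and string modules (the tiny stopword set here is hand-picked for illustration; the real pipeline below uses NLTK's full list):

```python
import re
import string

tweet = "Loving the new update!! 100% better :) https://t.co/abc #update @dev"

# 1. links: drop anything that looks like scheme://...
text = re.sub(r"\w+:\/\/\S+", " ", tweet)
# 2. tags: drop #hashtags and @mentions
text = re.sub(r"[@#]\w+", " ", text)
# 3 & 4. punctuation marks and numerical values
text = text.translate(str.maketrans('', '', string.punctuation + string.digits))
# 5. stopwords (a tiny hand-picked set, just for illustration)
stop = {'the', 'a', 'an', 'and', 'is'}
words = [w for w in text.lower().split() if w not in stop]
print(' '.join(words))  # loving new update better
```

Each step removes one category of noise from the list, leaving only the words that can carry sentiment.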

Let's take a look at what I meant with the word Normalization:

Stemming:

It is a process of converting words to their root forms; for example, "cats" is converted to "cat". We will be using the Porter stemmer offered by NLTK.

The Porter stemmer is an algorithm consisting of a set of rules for stripping the suffix of a word; this process is also known as suffix stripping. Sometimes the root generated by this algorithm is not even a meaningful English word, but the algorithm is widely used because of its speed and simplicity.
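To get a feel for suffix stripping, here is a toy sketch of a few Porter-style rules (roughly step 1a of the algorithm; this is just an illustration of the idea, not NLTK's full Porter implementation):

```python
def toy_stem(word):
    # a few Porter-style suffix-stripping rules;
    # order matters: longer suffixes are checked first
    if word.endswith('sses'):
        return word[:-2]   # caresses -> caress
    if word.endswith('ies'):
        return word[:-2]   # ponies -> poni (not a real English word!)
    if word.endswith('ss'):
        return word        # caress -> caress (unchanged)
    if word.endswith('s'):
        return word[:-1]   # cats -> cat
    return word

print([toy_stem(w) for w in ['caresses', 'ponies', 'caress', 'cats']])
# ['caress', 'poni', 'caress', 'cat']
```

Note how "ponies" becomes "poni" — not a dictionary word, but related forms still collapse to the same stem, which is all the model needs.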

def clean(text):
    # removing @ tags and links from the text
    text = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", text).split())
    # converting all letters to lower case and replacing '-' with spaces
    text = text.lower().replace('-', ' ')
    # removing punctuation and numbers
    table = str.maketrans('', '', string.punctuation + string.digits)
    text = text.translate(table)
    # tokenizing words
    tokens = word_tokenize(text)
    # stemming the words
    stemmed = [porter.stem(word) for word in tokens]
    # removing stopwords
    words = [w for w in stemmed if w not in n_words]

    text = ' '.join(words)
    return text

Analyzing the sentiment:

We will be using the TextBlob library for analyzing the sentiment or polarity of the tweets.

analysis = TextBlob(tweet)
senti = analysis.sentiment.polarity

if senti < 0:
    emotion = "NEG"
elif senti > 0:
    emotion = "POS"
else:
    emotion = "NEU"
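The thresholding above can be wrapped into a small helper for reuse (a sketch; the name label_emotion is my own, not part of TextBlob — the polarity score itself is what TextBlob returns, a float in [-1.0, 1.0]):

```python
def label_emotion(polarity):
    """Map a TextBlob polarity score in [-1.0, 1.0] to a coarse label."""
    if polarity < 0:
        return "NEG"
    elif polarity > 0:
        return "POS"
    else:
        return "NEU"

print(label_emotion(-0.5), label_emotion(0.8), label_emotion(0.0))
# NEG POS NEU
```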

Let's fetch the tweets using the API and label them with their emotion:

def get_tweets(query, count=10):

    tweets = []

    try:
        # call twitter api to fetch tweets
        fetched_tweets = api.search(q=query, count=count)
        for tweet in fetched_tweets:
            # cleaning the tweets
            tweet = clean(tweet.text)
            # getting the sentiment from textblob
            analysis = TextBlob(tweet)
            senti = analysis.sentiment.polarity
            # labeling the sentiment
            if senti < 0:
                emotion = "NEG"
            elif senti > 0:
                emotion = "POS"
            else:
                emotion = "NEU"

            # appending all data
            tweets.append((tweet, senti, emotion))

        return tweets
    except tweepy.TweepError as e:
        # print error (if any)
        print("Error : " + str(e))


# getting the api access
api = TwitterClient()
# calling the function to get tweets; count is the number of tweets
tweets = get_tweets(query="Farmer's Protest", count=200)
df = pd.DataFrame(tweets, columns=['tweets', 'senti', 'emotion'])
# dropping retweets
df = df.drop_duplicates()
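Before going further, it's worth checking how the labels are distributed. A quick sketch with pandas (on made-up rows standing in for the fetched tweets, since your results will differ):

```python
import pandas as pd

# made-up rows standing in for the fetched, cleaned tweets
df = pd.DataFrame(
    [("great protest support", 0.8, "POS"),
     ("sad situation", -0.5, "NEG"),
     ("farmers gather today", 0.0, "NEU"),
     ("really inspiring", 0.6, "POS")],
    columns=['tweets', 'senti', 'emotion'])

# how many tweets fall into each emotion bucket
counts = df['emotion'].value_counts()
print(counts['POS'], counts['NEG'], counts['NEU'])  # 2 1 1
```

A heavily skewed distribution is a hint to inspect the cleaning step or fetch more tweets before drawing conclusions.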

Congrats, we have successfully analyzed the sentiment, or polarity, of tweets using the Twitter API.

Let's take a look at the most used words in tweets expressing positive emotion. We will create a word cloud using the wordcloud library.

def wordcloud_draw(data, color='black'):
    words = ' '.join(data)
    # keeping only words that are not hashtags or 'rt'
    cleaned_word = " ".join([word for word in words.split()
                             if not word.startswith('#')
                             and word != 'rt'])
    wordcloud = WordCloud(
        background_color=color,
        width=2500,
        height=2000
    ).generate(cleaned_word)
    # using matplotlib to display the image in the notebook itself
    plt.figure(1, figsize=(13, 13))
    plt.imshow(wordcloud)
    plt.axis('off')
    plt.show()

df_pos = df[df['emotion'] == 'POS']
df_pos = df_pos['tweets']
wordcloud_draw(df_pos, 'white')

Get the code here :)

Thanks for Reading; share if you like. See you in the next story ✌️.

Further Readings:

Sentiment Analysis of a Tweet With Naive Bayes.


Hi! My name is Harsh Khandelwal. I am a computer science student at NIT Tiruchirappalli, India, and I have good experience in data science.