Using Twitter API for Tweets Sentiment Analysis

Harsh Khandewal
Published in The Startup
4 min read · Jan 25, 2021

Every day a lot of data is generated in different forms, and one of those forms is text. One good source of this text data is Twitter, where people actively share their thoughts. Our goal in this article is to use the Twitter API to extract tweets and perform sentiment analysis on them.

Required Libraries and modules:

import re
import csv
import string

import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
import tweepy
from tweepy import OAuthHandler
from textblob import TextBlob

import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.porter import PorterStemmer
from nltk.stem import WordNetLemmatizer
from wordcloud import WordCloud, STOPWORDS

n_words = set(stopwords.words('english'))
porter = PorterStemmer()
lemmatizer = WordNetLemmatizer()

How to use the Twitter API:

Click here to see how to generate Twitter API credentials.

You will get four keys:

# I replaced my keys with "xxxxxxx" :)
consumer_key = 'xxxxxxxx'
consumer_secret = 'xxxxxxxxxxx'
access_token = 'xxxxxxxxxx'
access_token_secret = 'xxxxxxxxxxx'
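Hardcoding credentials works for a quick demo, but a safer pattern is to read them from environment variables. A minimal sketch (the variable names below are my own choice, not anything mandated by Twitter; export them in your shell first):

```python
import os

# read credentials from the environment instead of hardcoding them;
# these variable names are arbitrary -- e.g. run
#   export TWITTER_CONSUMER_KEY=...
# in your shell before starting the notebook
consumer_key = os.environ.get('TWITTER_CONSUMER_KEY', '')
consumer_secret = os.environ.get('TWITTER_CONSUMER_SECRET', '')
access_token = os.environ.get('TWITTER_ACCESS_TOKEN', '')
access_token_secret = os.environ.get('TWITTER_ACCESS_TOKEN_SECRET', '')
```

This also keeps the keys out of version control if you share the notebook.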

Let's use these for authentication via Python :)

def TwitterClient():
    # keys and tokens from the Twitter Dev Console
    consumer_key = 'xxxxxxxx'
    consumer_secret = 'xxxxxxxxxxx'
    access_token = 'xxxxxxxxxx'
    access_token_secret = 'xxxxxxxxxxx'

    # attempt authentication
    try:
        # create OAuthHandler object
        auth = OAuthHandler(consumer_key, consumer_secret)
        # set access token and secret
        auth.set_access_token(access_token, access_token_secret)
        # create tweepy API object to fetch tweets
        api = tweepy.API(auth)
    except Exception:
        print("Error: Authentication Failed")
        api = None

    return api  # now we can make requests to Twitter using this api

Preprocessing of TEXT

Preprocessing is a key step that helps the model understand the text better; it also decreases computational cost and time by reducing the space in which the model has to work, as we will see later.

Let's list the points which we want to focus on while cleaning the data:

  1. Links: they don't take part in expressing any sentiment.
  2. Tags: # tags and @ tags are useless to us.
  3. Punctuation marks
  4. Numerical values
  5. Stopwords: words like "and", "the", etc. do not contribute to sentiment.
  6. Word normalization: stemming, lemmatization, etc.
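To make the list above concrete, here is a minimal sketch of each kind of cleanup on a made-up example tweet, using only Python's re and string modules (the tiny stopword set here is hand-picked for illustration; the real pipeline below uses NLTK's full list):

```python
import re
import string

tweet = "Loving the new update!! 100% better :) https://t.co/abc #update @dev"

# 1. links: drop anything that looks like scheme://...
text = re.sub(r"\w+:\/\/\S+", " ", tweet)
# 2. tags: drop #hashtags and @mentions
text = re.sub(r"[@#]\w+", " ", text)
# 3 & 4. punctuation marks and numerical values
text = text.translate(str.maketrans('', '', string.punctuation + string.digits))
# 5. stopwords (a tiny hand-picked set, just for illustration)
stop = {'the', 'a', 'an', 'and', 'is'}
words = [w for w in text.lower().split() if w not in stop]
print(' '.join(words))  # loving new update better
```

Each step removes one category of noise from the list, leaving only the words that can carry sentiment.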

Let's take a look at what I meant with the word Normalization:

Stemming:

It is a process of converting words to their root forms; for example, "cats" is converted to "cat". We will be using the Porter stemmer offered by NLTK.

The Porter stemmer is an algorithm consisting of a set of rules for stripping the suffix of a word; this process is also known as suffix stripping. Sometimes the root generated by this algorithm is not even a meaningful English word, but the algorithm is widely used because of its speed and simplicity.
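To get a feel for suffix stripping, here is a toy sketch of a few Porter-style rules (roughly step 1a of the algorithm; this is just an illustration of the idea, not NLTK's full Porter implementation):

```python
def toy_stem(word):
    # a few Porter-style suffix-stripping rules;
    # order matters: longer suffixes are checked first
    if word.endswith('sses'):
        return word[:-2]   # caresses -> caress
    if word.endswith('ies'):
        return word[:-2]   # ponies -> poni (not a real English word!)
    if word.endswith('ss'):
        return word        # caress -> caress (unchanged)
    if word.endswith('s'):
        return word[:-1]   # cats -> cat
    return word

print([toy_stem(w) for w in ['caresses', 'ponies', 'caress', 'cats']])
# ['caress', 'poni', 'caress', 'cat']
```

Note how "ponies" becomes "poni" — not a dictionary word, but related forms still collapse to the same stem, which is all the model needs.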

def clean(text):
    # removing @ tags and links from the text
    text = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", text).split())
    # converting all letters to lower case and replacing '-' with spaces
    text = text.lower().replace('-', ' ')
    # removing punctuation and numbers
    table = str.maketrans('', '', string.punctuation + string.digits)
    text = text.translate(table)
    # tokenizing words
    tokens = word_tokenize(text)
    # stemming the words
    stemmed = [porter.stem(word) for word in tokens]
    # removing stopwords
    words = [w for w in stemmed if w not in n_words]

    text = ' '.join(words)
    return text

Analyzing the sentiment:

We will be using the TextBlob library for analyzing the sentiment or polarity of the tweets.

analysis = TextBlob(tweet)
senti = analysis.sentiment.polarity

if senti < 0:
    emotion = "NEG"
elif senti > 0:
    emotion = "POS"
else:
    emotion = "NEU"
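The thresholding above can be wrapped into a small helper for reuse (a sketch; the name label_emotion is my own, not part of TextBlob — the polarity score itself is what TextBlob returns, a float in [-1.0, 1.0]):

```python
def label_emotion(polarity):
    """Map a TextBlob polarity score in [-1.0, 1.0] to a coarse label."""
    if polarity < 0:
        return "NEG"
    elif polarity > 0:
        return "POS"
    else:
        return "NEU"

print(label_emotion(-0.5), label_emotion(0.8), label_emotion(0.0))
# NEG POS NEU
```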

Let's fetch the tweets using the API and label them with their emotion:

def get_tweets(query, count=10):

    tweets = []

    try:
        # call twitter api to fetch tweets
        fetched_tweets = api.search(q=query, count=count)
        for tweet in fetched_tweets:
            # cleaning the tweets
            tweet = clean(tweet.text)
            # getting the sentiment from textblob
            analysis = TextBlob(tweet)
            senti = analysis.sentiment.polarity
            # labeling the sentiment
            if senti < 0:
                emotion = "NEG"
            elif senti > 0:
                emotion = "POS"
            else:
                emotion = "NEU"

            # appending all data
            tweets.append((tweet, senti, emotion))

        return tweets
    except tweepy.TweepError as e:
        # print error (if any)
        print("Error : " + str(e))


# getting the api access
api = TwitterClient()
# calling the function to get tweets; count is the number of tweets
tweets = get_tweets(query="Farmer's Protest", count=200)
df = pd.DataFrame(tweets, columns=['tweets', 'senti', 'emotion'])
# dropping retweets
df = df.drop_duplicates()
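Before going further, it's worth checking how the labels are distributed. A quick sketch with pandas (on made-up rows standing in for the fetched tweets, since your results will differ):

```python
import pandas as pd

# made-up rows standing in for the fetched, cleaned tweets
df = pd.DataFrame(
    [("great protest support", 0.8, "POS"),
     ("sad situation", -0.5, "NEG"),
     ("farmers gather today", 0.0, "NEU"),
     ("really inspiring", 0.6, "POS")],
    columns=['tweets', 'senti', 'emotion'])

# how many tweets fall into each emotion bucket
counts = df['emotion'].value_counts()
print(counts['POS'], counts['NEG'], counts['NEU'])  # 2 1 1
```

A heavily skewed distribution is a hint to inspect the cleaning step or fetch more tweets before drawing conclusions.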

Congrats, we have successfully analyzed the sentiment, or polarity, of tweets using the Twitter API.

Let's take a look at the most used words in tweets expressing positive emotion. We will create a word cloud using the wordcloud library.

def wordcloud_draw(data, color='black'):
    words = ' '.join(data)
    # keeping only words that are not hashtags or 'rt'
    cleaned_word = " ".join([word for word in words.split()
                             if not word.startswith('#')
                             and word != 'rt'])
    wordcloud = WordCloud(
        background_color=color,
        width=2500,
        height=2000
    ).generate(cleaned_word)
    # using matplotlib to display the image in the notebook itself
    plt.figure(1, figsize=(13, 13))
    plt.imshow(wordcloud)
    plt.axis('off')
    plt.show()

df_pos = df[df['emotion'] == 'POS']
df_pos = df_pos['tweets']
wordcloud_draw(df_pos, 'white')

Get the code here :)

Thanks for Reading; share if you like. See you in the next story ✌️.

Further Readings:

Sentiment Analysis of a Tweet With Naive Bayes.


Hi! My name is Harsh Khandelwal. I am a computer science student at NIT Tiruchirappalli, India, and I have good experience in data science.