Step-by-step guide to Twitter sentiment analysis

using Python and the twint library

Meera Maurya
Nerd For Tech
4 min read · May 2, 2021


Twint is an advanced Python library for scraping tweets from Twitter. It does not require any authentication credentials to connect to Twitter.

Photo by Lindsay Henwood on Unsplash

Let’s begin. I am using a Jupyter notebook to write and execute the code.

First of all, you need to install the twint library (I installed it from the Anaconda prompt). Make sure you get the correct, up-to-date version of twint.
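One way to run the install from inside the notebook is sketched below. The install source is an assumption on my part: the twint README has suggested installing straight from the project's GitHub repository, so check the README for the currently recommended command. The helper names `pip_install_command` and `install` are made up for this example.

```python
import subprocess
import sys

# Assumed install source: the twint GitHub repository (verify against the README).
TWINT_SOURCE = "git+https://github.com/twintproject/twint.git@origin/master#egg=twint"


def pip_install_command(package_spec):
    """Build a pip command that targets the interpreter running this notebook."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", package_spec]


def install(package_spec):
    """Run pip; raises subprocess.CalledProcessError if the install fails."""
    subprocess.check_call(pip_install_command(package_spec))


# Uncomment to install (needs network access and git on the PATH):
# install(TWINT_SOURCE)
```

Calling pip through `sys.executable` keeps the package in the same environment as the notebook kernel, which avoids the common case where pip installs into a different Python.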

Import all the necessary libraries

Create a twint configuration object and set its parameters. Let’s search for tweets containing the words ‘covid india’ on Twitter.

‘Since’ and ‘Until’ set the date range of the tweets. If you want to save the search results in a pandas DataFrame (df), include c.Pandas = True. twint.run.Search(c) pulls the data from Twitter.
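Here is a minimal sketch of that configuration. The option names (Search, Since, Until, Pandas) and twint.run.Search come from the text above. The two helper functions, the date strings and the setattr loop are my own packaging for illustration. I am also assuming the scraped frame is read from twint.storage.panda.Tweets_df, which is where twint keeps it when Pandas is enabled.

```python
def build_search_settings(query, since, until):
    """Collect the twint options described above in a plain dict."""
    return {
        "Search": query,   # words the tweets should contain
        "Since": since,    # start of the date range, "YYYY-MM-DD"
        "Until": until,    # end of the date range, "YYYY-MM-DD"
        "Pandas": True,    # keep the results in a pandas DataFrame
    }


def scrape_tweets(settings):
    """Copy the settings onto a twint.Config, run the search, return the DataFrame."""
    import twint  # imported here so the helper above works without twint installed

    c = twint.Config()
    for name, value in settings.items():
        setattr(c, name, value)
    twint.run.Search(c)
    return twint.storage.panda.Tweets_df


settings = build_search_settings("covid india", "2021-04-18", "2021-04-24")
# df = scrape_tweets(settings)   # uncomment to actually hit Twitter
```

Inside a Jupyter notebook, twint may complain that an event loop is already running. Calling nest_asyncio.apply() before the search is the usual workaround.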

After scraping, the DataFrame has shape (4648, 37), i.e. 4,648 tweets and 37 columns. Let’s have a look at the columns.

columns extracted

We have a ‘date’ column. From it we can derive new columns such as year, month and day, and use them as needed.
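A small sketch of that step with pandas, assuming the ‘date’ column holds timestamp strings. The two-row frame is a made-up stand-in for the scraped data.

```python
import pandas as pd

# Tiny stand-in for the scraped DataFrame; the real one comes from twint.
df = pd.DataFrame({"date": ["2021-04-18 09:15:00", "2021-04-23 21:40:05"]})

# Parse the strings into datetimes, then read the parts through the .dt accessor.
df["date"] = pd.to_datetime(df["date"])
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day

print(df)
```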

Once we have our columns ready, let’s pre-process the tweets (i.e. remove URLs, usernames and stopwords), as these do not add value to the sentiment. For this we will write a function and apply it to every tweet with a lambda. Here, I am using a stopwords.txt file which contains the list of stopwords to be removed.

After this, we can check what a processed tweet looks like. Let’s look at one record:
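The cleaning function could look roughly like this. It is a sketch, not the exact function from my notebook. The short inline stopword set stands in for stopwords.txt, load_stopwords assumes the file has one word per line, and stripping punctuation and digits is an extra step I added so that words like "in," still match the stopword list.

```python
import re

# Small stand-in for the contents of stopwords.txt (illustrative only).
STOPWORDS = {"a", "an", "and", "as", "at", "in", "is", "of", "on", "the", "to"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def load_stopwords(path):
    """Read stopwords from a text file, assuming one word per line."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def preprocess_tweet(text, stopwords=STOPWORDS):
    """Lowercase a tweet and drop URLs, @usernames, punctuation and stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove links
    text = MENTION_RE.sub(" ", text)     # remove @usernames
    text = NON_LETTER_RE.sub(" ", text)  # extra step: drop punctuation and digits
    words = [word for word in text.split() if word not in stopwords]
    return " ".join(words)


sample = "India in crisis as new COVID cases break global record https://t.co/bLt0NV73w7"
print(preprocess_tweet(sample))
```

With a DataFrame, the same function can be applied row by row, for example df["tweet"].apply(lambda t: preprocess_tweet(t)), where "tweet" is the text column in twint's output.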

Tweet: India in crisis as new COVID cases break global record https://t.co/bLt0NV73w7
Processed tweet: india crisis new covid cases break global record

So, at this stage we have everything we need. Now, let’s classify the processed tweets into different sentiments, using the textblob library. Run pip install textblob to install it.

The ‘polarity’ column will hold numerical values. Let’s create a new column that maps those values to the labels ‘positive’, ‘negative’ or ‘neutral’.
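Both steps can be sketched as follows. TextBlob's polarity is a float between -1 and 1. The cut-offs used here (above zero is positive, below zero is negative, exactly zero is neutral) are an assumption for illustration, and the column name processed_tweet in the comments is hypothetical.

```python
def polarity_of(text):
    """Return TextBlob's polarity score for a piece of text (-1.0 to 1.0)."""
    from textblob import TextBlob  # imported here so label_sentiment works without textblob

    return TextBlob(text).sentiment.polarity


def label_sentiment(polarity):
    """Map a polarity score to a label; the zero thresholds are an assumption."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


# With the scraped DataFrame this would be roughly:
# df["polarity"] = df["processed_tweet"].apply(lambda t: polarity_of(t))
# df["sentiment"] = df["polarity"].apply(label_sentiment)
print([label_sentiment(score) for score in (0.4, -0.25, 0.0)])
```

If many tweets land very close to zero, a small neutral band (for example between -0.05 and 0.05) is an alternative worth trying.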

Now, our table looks like this:

Let’s do some analysis on the prepared data

Let’s plot the sentiment counts along with their percentages.
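The counting and plotting can be sketched like this. I am using plain matplotlib here as an assumption, any plotting library would do. Both function names are made up for the example, and the labels argument can be the sentiment column of the DataFrame or any list of labels.

```python
from collections import Counter


def sentiment_percentages(labels):
    """Return {label: (count, percent of all tweets)} for an iterable of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.items()}


def plot_sentiment_counts(labels):
    """Draw a bar per sentiment and write its percentage above the bar."""
    import matplotlib.pyplot as plt  # imported here so the counting works without matplotlib

    stats = sentiment_percentages(labels)
    names = list(stats)
    counts = [stats[name][0] for name in names]
    fig, ax = plt.subplots()
    bars = ax.bar(names, counts)
    for bar, name in zip(bars, names):
        x = bar.get_x() + bar.get_width() / 2
        ax.annotate(f"{stats[name][1]}%", (x, bar.get_height()), ha="center", va="bottom")
    ax.set_xlabel("sentiment")
    ax.set_ylabel("number of tweets")
    return ax


example = ["positive", "negative", "neutral", "positive"]
print(sentiment_percentages(example))
# plot_sentiment_counts(df["sentiment"])   # with the real data
```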

sentiment count plot

We can see that, of all the tweets on ‘covid crisis india’ posted between 18-Apr-21 and 24-Apr-21, 25.24% are negative.

Let’s check the number of tweets per day (we have data for one week).
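A quick way to get the per-day counts with pandas is sketched below. The four timestamps are made-up sample rows, and day_name is a column name I picked for the example.

```python
import pandas as pd

# Made-up sample rows; the real 'date' column comes from the scraped tweets.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-04-18 10:00:00",
        "2021-04-23 08:30:00",
        "2021-04-23 19:45:00",
        "2021-04-24 12:00:00",
    ])
})

# Turn each timestamp into a weekday name, then count tweets per weekday.
df["day_name"] = df["date"].dt.day_name()
tweets_per_day = df["day_name"].value_counts()

print(tweets_per_day)
# tweets_per_day.plot(kind="bar")   # optional bar chart, needs matplotlib
```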

We can see that more tweets were posted on Friday. Now, let’s look at the most used words in these tweets by making a word cloud from the processed tweets. This can be done with Python’s ‘wordcloud’ library.
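A rough sketch of the word cloud step follows. The image size and background colour are arbitrary choices of mine, and both function names are made up. Counting the words first and handing the counts to generate_from_frequencies is one option. Passing one long string to WordCloud.generate is another.

```python
from collections import Counter


def word_frequencies(processed_tweets):
    """Count how often each word appears across the processed tweets."""
    counts = Counter()
    for tweet in processed_tweets:
        counts.update(tweet.split())
    return counts


def build_word_cloud(processed_tweets):
    """Create a WordCloud object sized by word frequency."""
    from wordcloud import WordCloud  # imported here so the counting works without wordcloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    return cloud.generate_from_frequencies(word_frequencies(processed_tweets))


tweets = ["india crisis covid", "covid oxygen crisis"]
print(word_frequencies(tweets).most_common(2))

# To display it in the notebook:
# import matplotlib.pyplot as plt
# plt.imshow(build_word_cloud(tweets), interpolation="bilinear")
# plt.axis("off")
```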

word cloud

From the word cloud, we can make out that people are talking about the hospital crisis, government efforts, the oxygen crisis, the poor, etc. This is very helpful for getting first-hand information on what is being talked about.

So, here we saw how to generate an analysis report based on tweets. This was a simple introduction to sentiment analysis using Python, twint and textblob. We can always go deeper and create a more thorough report that covers other aspects as well.

Thanks for reading this article, hope you enjoyed learning it!


Meera Maurya
Nerd For Tech

Not a writer, but yes.. love to share the things I know. I am here to share my thoughts and understanding on technical topics.