Sentiment Analysis of Twitter Timelines

Dmitry Rastorguev
4 min read · Sep 25, 2017


This post will show and explain how to build a simple tool for sentiment analysis of Twitter posts using Python and a few supporting libraries. The full code is available on GitHub.

The basic flow of data for the purpose of this analysis is as follows (the relevant Python libraries are in brackets):

  • Pull tweets via Twitter API (tweepy)
  • For each tweet run the sentiment analysis (TextBlob)
  • Save the underlying data and the score to the database (sqlite3)
  • Pull all results together and save the output as a CSV file (csv)

Stage 1

To gain access to Twitter’s API you will need to register here. Once your registration is complete, update the credentials in crds.py:

consumer_key = 'YOUR TWITTER CONSUMER KEY'
consumer_secret = 'YOUR TWITTER CONSUMER SECRET'
access_token = 'YOUR TWITTER ACCESS TOKEN'
access_token_secret = 'YOUR TWITTER ACCESS TOKEN SECRET'

These details will then need to be imported into twittertosql.py, where the login is handled by the following script:

import tweepy
from crds import *

# Authenticate against the Twitter API with the credentials from crds.py
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

Unfortunately, Twitter’s API only allows 15 calls every 15 minutes (so one call a minute), with a maximum of the 3,200 latest tweets (source) for a single Twitter profile. This makes it necessary to use a database to store the data, so it can be reused for further analysis without having to go back to Twitter’s API.
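As a side note, tweepy can also handle the rate limit for you by pausing automatically whenever it is hit. A minimal sketch, assuming a tweepy 3.x install (these flags are standard tweepy options, not part of the original script):

# Let tweepy sleep through the rate limit window automatically
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)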

Each API call returns at most 200 tweets, so the results come back in pages. Tweepy’s Cursor handles this pagination: tweepy.Cursor(api.user_timeline, id=twittertargetprofile, count=200).pages(20). Each page is a list of tweets, so it is important to loop through each one. Selected keys from the JSON response are then extracted, with the date of the tweet being slightly reformatted (a combined sketch follows the snippet below):

from datetime import datetime

for tweet in page:
    tweetid = tweet.id
    tweetdate = datetime.strptime(str(tweet.created_at)[:10], '%Y-%m-%d').strftime('%d-%m-%Y')
    tweettext = tweet.text
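Putting the Cursor call and the per-tweet loop together, a minimal sketch of the whole collection loop might look like this (an assembled illustration rather than the author’s exact script; the 65-second pause is explained in Stage 3):

import time

for page in tweepy.Cursor(api.user_timeline, id=twittertargetprofile, count=200).pages(20):
    for tweet in page:
        tweetid = tweet.id
        tweetdate = datetime.strptime(str(tweet.created_at)[:10], '%Y-%m-%d').strftime('%d-%m-%Y')
        tweettext = tweet.text
        # ... score and store each tweet here (Stages 2 and 3) ...
    time.sleep(65)  # stay within the 15-calls-per-15-minutes limit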

Stage 2

The next stage consists of the sentiment analysis itself, using the TextBlob library and its sentiment property. This returns a polarity score between -1 (very negative) and 1 (very positive), which is then rounded to 4 decimal places.

from textblob import TextBlob

polarity = round(TextBlob(tweettext).sentiment.polarity, 4)
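For anyone unfamiliar with TextBlob, here is a quick sanity check of the property on its own (the sample sentences are just illustrations):

from textblob import TextBlob

print(TextBlob('I love this!').sentiment.polarity)  # positive (> 0)
print(TextBlob('I hate this!').sentiment.polarity)  # negative (< 0)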

Stage 3

With the results available, it is time to store them in a database. Fortunately, SQLite is a perfect choice, as it ships with Python and requires no further installation. A new table is created for each new profile, and the column holding the tweet’s id is used as the primary key, ensuring that no duplicate tweets are saved in the database. The overall script is as follows:

import sqlite3

conn = sqlite3.connect('twitterprofiles.db')
c = conn.cursor()

# One table per profile; the tweet id serves as the primary key
c.execute('CREATE TABLE '+targettwitterprofile+' (dbtweetid int PRIMARY KEY, dbtweetdate date, dbtweettext text, polarity real)')
...
...
...
c.execute("insert into "+targettwitterprofile+" values (?,?,?,?)", (tweetid, tweetdate, tweettext, polarity))
...
...
...
conn.commit()
c.close()

The insert statement above saves the relevant data points for each tweet to the database, while the final two lines commit the changes and close the connection once all tasks are complete.
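One caveat: because of the primary key, re-running the script over the same profile will raise an IntegrityError on the first duplicate tweet. A simple way around this, if desired, is SQLite’s INSERT OR IGNORE, which silently skips rows whose primary key already exists (a standard SQLite feature, not part of the original script):

# Skip any tweet whose id is already stored, instead of raising an error
c.execute("insert or ignore into "+targettwitterprofile+" values (?,?,?,?)", (tweetid, tweetdate, tweettext, polarity))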

Finally, to ensure API calls to Twitter are made only once a minute, time.sleep(65) pauses the script for 65 seconds between calls. The PrettyTable library was also used to help visualise the data collected and stored in the database.
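As an illustration, minimal PrettyTable usage might look like this (the column names mirror the database schema above; a sketch rather than the original script):

from prettytable import PrettyTable

table = PrettyTable(['dbtweetid', 'dbtweetdate', 'dbtweettext', 'polarity'])
table.add_row([tweetid, tweetdate, tweettext, polarity])
print(table)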

Stages 1, 2 and 3 together form the twittertosql.py script, available here.

Stage 4

With the underlying data and sentiment scores stored in the database, the final stage is to carry out the actual analysis. In this case, an average sentiment score is calculated for each day; the results are then sorted by time and the output is saved as a CSV file. The final script for this stage is available here.
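Since that script is only linked rather than shown, here is a minimal sketch of what this stage could look like, assuming the table layout from Stage 3 (targettwitterprofile is the same table-name variable used there, and the output filename is illustrative):

import csv
import sqlite3

conn = sqlite3.connect('twitterprofiles.db')
c = conn.cursor()

# Average polarity per day; dates were stored as 'DD-MM-YYYY' strings,
# so the sort key rearranges them into 'YYYYMMDD' for a chronological order
rows = c.execute('select dbtweetdate, avg(polarity) from ' + targettwitterprofile + ' group by dbtweetdate').fetchall()
rows.sort(key=lambda r: r[0][6:] + r[0][3:5] + r[0][:2])

with open('sentiment_by_day.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['date', 'avg_polarity'])
    writer.writerows(rows)

c.close()
conn.close()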

Sample Output

Below is an example of a visual display of the results, created as part of this analysis. It is very clear that the selected profile contained very few negative tweets. The average score for the period analysed was just below 0.2, as shown by the purple line.

Conclusion

Overall, this simple methodology allows you to get up and running within an hour. The analysis could then be easily extended to other profiles, with the main constraint being the number of permitted Twitter API calls over time.

With the data collected and aggregated, further statistical analysis (most likely of the time-series variety) would be required to identify any potential patterns.

There are, of course, further opportunities to improve on this basic approach.
