Mining Twitter for sentiment analysis using Python 🐍

Wilame
4 min read · Oct 16, 2018


From June 2020, I will no longer be using Medium to publish new stories. Please visit my personal blog if you want to continue reading my articles: https://vallant.in.

When you need data from Twitter for sentiment analysis, there are many ways to get it. You can scrape the Twitter website, or use a simpler technique: connecting to the Twitter API with Python.

The second technique is the one we'll be using today. It's more convenient because Twitter offers many ways to get information from its servers: you can look for tweets by hashtag, keyword, place, username and so on. You can also use the same filters available in Twitter search to refine your capture.

In order to show you how to use Python to access Twitter, I will again use the Brazilian elections as an example. This year, we Brazilians chose our new president, and the campaign was very polarised, with two candidates who are at the same time the most loved and the most hated by the Brazilian population.

Let's say we will harvest Twitter for tweets that mention the two candidates: Haddad and Bolsonaro. How could we collect text from Twitter that will help us build a machine learning model for sentiment analysis? Let's do it.

Creating a Twitter app

Before anything else, you must have developer access to Twitter and create an app that will be used to connect to the Twitter API. This used to be simpler, but Twitter now reviews each developer access request. You have to submit a request here: https://developer.twitter.com/.

After you have been accepted, it's time to create an app. You can consult the Twitter documentation to see how to do it. Once you have finished, take note of your credentials (and don't forget to protect them). You will need:

  • The API key
  • The API secret key
  • The access token
  • The access token secret

All of this can be found in your app details, under the "Keys and tokens" tab.

Using Tweepy to connect to Twitter API

First thing to do: import pandas, Tweepy and jsonpickle. Then create four variables, one for each of the credentials you collected for your app.

Tweepy is a library that helps you connect to the Twitter API with no hassle; it supports access to Twitter via OAuth. Jsonpickle is another library, used to deal with JSON data — which is the format we'll be working with here.

After defining your key variables, let's create a function that connects to Twitter. We'll assign the object this function returns to a variable called api.

What we need now is a way to query Twitter search and return all tweets that match what we are looking for. We then need to save these tweets to a local file that we'll transform into a pandas DataFrame.

So, let's start by building a function that will:

  • Create a JSON file that will hold all the tweets
  • Access the Twitter API, query it and return the tweets
  • Save the tweets into the file we just created

The function will accept as parameters:

  • filepath: where the file should be saved and its name
  • api: the api object we created earlier
  • query: the query that will be used by Twitter to retrieve the tweets
  • max_tweets: your developer account limits how many requests you can make every 15 minutes. So, this parameter only sets an upper bound on how many tweets we receive; the actual number can be affected by the limits Twitter imposes on your account and by how many tweets match your search criteria.
  • lang: the language of the tweets. We'll use Portuguese ('pt').

This is our function:

And this is our query. I am using only hashtags that are related to the two candidates (for or against them).
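The exact hashtags from the original embed are not reproduced here; as an illustration only, a query covering both candidates could look like this (the -filter:retweets operator excludes retweets):

```python
# Illustrative hashtags only — substitute whichever campaign tags
# you want to track, combined with OR.
query = '#Bolsonaro OR #Haddad -filter:retweets'

# Hypothetical usage, with the function and api object from above:
# harvest_tweets('tweets.json', api, query, max_tweets=1000, lang='pt')
```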

And that's it! You should have your tweets saved in a file now. I usually like to back up the file before transforming it into a data frame. So, do it now.
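A quick way to make that backup, assuming the capture was saved as tweets.json (adjust the file names to your own):

```python
import shutil
from pathlib import Path

source = Path('tweets.json')  # the capture file from the previous step
if source.exists():
    # Keep an untouched copy of the raw data before any processing.
    shutil.copyfile(source, 'tweets_backup.json')
```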

Open your file now and take a look at its contents.

Each line in our file is an actual tweet saved in JSON format. So, our next step is to build a function that:

  • Opens our JSON file
  • Iterates through the file and decodes each line so we can read the JSON
  • Extracts the information from the JSON and converts it into lists
  • Merges the lists into a pandas DataFrame
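The original embed is missing, so this is a sketch of such a function, assuming the one-tweet-per-line file produced above and an illustrative choice of metadata fields (pick whichever fields you need):

```python
import json
import pandas as pd

def tweets_to_dataframe(filepath):
    """Decode one JSON tweet per line and merge the fields into a DataFrame."""
    texts, dates, users, retweets, likes = [], [], [], [], []
    with open(filepath) as f:
        for line in f:
            tweet = json.loads(line)
            # 'full_text' exists when tweet_mode='extended' was used;
            # fall back to 'text' otherwise.
            texts.append(tweet.get('full_text', tweet.get('text')))
            dates.append(tweet.get('created_at'))
            users.append(tweet.get('user', {}).get('screen_name'))
            retweets.append(tweet.get('retweet_count'))
            likes.append(tweet.get('favorite_count'))
    return pd.DataFrame({'created_at': dates, 'user': users, 'text': texts,
                         'retweets': retweets, 'likes': likes})
```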

Something that's really useful about getting tweets this way is that you have access to the metadata each tweet carries. A list of the available metadata can be found here: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.html. You can select the fields that are most important for you.

We are done! Just inspect your data frame using the head() function. If you want, you can save it for later use; you just need the to_csv() function provided by pandas.
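For instance — using a tiny stand-in DataFrame here; with real data you would pass the DataFrame built from your tweets file:

```python
import pandas as pd

# Stand-in data; replace with the DataFrame built in the previous step.
df = pd.DataFrame({'user': ['alice', 'bob'],
                   'text': ['primeiro tweet', 'segundo tweet']})

print(df.head())                      # inspect the first rows
df.to_csv('tweets.csv', index=False)  # save for later use
```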
