Mining Twitter for sentiment analysis using Python 🐍

Wilame
4 min read · Oct 16, 2018


From June 2020, I will no longer be using Medium to publish new stories. Please visit my personal blog if you want to continue reading my articles: https://vallant.in.

When you need data from Twitter for sentiment analysis, there are many ways to get it. You can scrape the Twitter website, or use a simpler technique: connecting to the Twitter API with Python.

The second technique is the one we'll be using today. It's more convenient because Twitter offers many ways to get information from its servers: you can look for tweets by hashtag, keyword, place, username and so on. You can also use the same filters available in Twitter search to refine your capture.

In order to show you how to use Python to access Twitter, I will again use the Brazilian elections as an example. This year, we Brazilians chose our new president, and the campaign was very polarised, with two candidates who are at the same time the most loved and the most hated by the Brazilian population.

Let's say we will harvest Twitter for tweets that mention the two candidates: Haddad and Bolsonaro. How could we collect text from Twitter that will help us build a machine learning model for sentiment analysis? Let's do it.

Creating a Twitter app

Before anything else, you must have developer access to Twitter and create an app that will be used to connect to the Twitter API. This used to be simpler, but Twitter now reviews each developer access request. You have to submit a request here: https://developer.twitter.com/.

After you have been accepted, it's time to create an app. You can consult the Twitter documentation to see how to do it. Once you have finished, take note of your credentials (and don't forget to protect them). You will need:

  • The API key
  • The API secret key
  • The access token
  • The access token secret

All of this can be found in your app details, under the "Keys and tokens" tab.

Using Tweepy to connect to Twitter API

First thing to do: import pandas, Tweepy and jsonpickle. Then create four variables, one for each of the credentials you collected for your app.

Tweepy is a library that helps you connect to the Twitter API with no hassle; it supports access to Twitter via OAuth. Jsonpickle is another library, used to deal with JSON data — which is the format we'll be working with here.

After defining your key variables, let's create a function that connects to Twitter. We'll assign the object this function returns to a variable called api.

What we need now is a way to query Twitter search and return all tweets that match what we are looking for. We then need to save these tweets to a local file that we'll transform into a pandas DataFrame.

So, let's start by building a function that will:

  • Create a JSON file that will hold all the tweets
  • Access the Twitter API, query it and return the tweets
  • Save the tweets into the file we just created

The function will accept as parameters:

  • filepath: where the file should be saved and its name
  • api: the api object we created earlier
  • query: the query that will be used by Twitter to retrieve the tweets
  • max_tweets: your developer account limits how many requests you can make every 15 minutes. So, this parameter only sets an upper bound on how many tweets we receive; the actual number can be affected by the limits Twitter imposes on your account and by how many tweets match your search criteria.
  • lang: the language of the tweets. We'll use Portuguese ('pt').

This is our function:

And this is our query. I am using only hashtags that are related to the two candidates (for or against them).
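The exact hashtags from the original embed are not reproduced here; as an illustration only, a query covering both candidates could look like this (the -filter:retweets operator excludes retweets):

```python
# Illustrative hashtags only — substitute whichever campaign tags
# you want to track, combined with OR.
query = '#Bolsonaro OR #Haddad -filter:retweets'

# Hypothetical usage, with the function and api object from above:
# harvest_tweets('tweets.json', api, query, max_tweets=1000, lang='pt')
```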

And that's it! You should have your tweets saved in a file now. I usually like to back up the file before transforming it into a data frame. So, do it now.
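A quick way to make that backup, assuming the capture was saved as tweets.json (adjust the file names to your own):

```python
import shutil
from pathlib import Path

source = Path('tweets.json')  # the capture file from the previous step
if source.exists():
    # Keep an untouched copy of the raw data before any processing.
    shutil.copyfile(source, 'tweets_backup.json')
```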

Open your file now and take a look at its contents.

Each line in our file is an actual tweet saved in JSON format. So, our next step is to build a function that:

  • Opens our JSON file
  • Iterates through the file and decodes each line so we can read the JSON
  • Extracts the information from the JSON and converts it into lists
  • Merges the lists into a pandas DataFrame
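The original embed is missing, so this is a sketch of such a function, assuming the one-tweet-per-line file produced above and an illustrative choice of metadata fields (pick whichever fields you need):

```python
import json
import pandas as pd

def tweets_to_dataframe(filepath):
    """Decode one JSON tweet per line and merge the fields into a DataFrame."""
    texts, dates, users, retweets, likes = [], [], [], [], []
    with open(filepath) as f:
        for line in f:
            tweet = json.loads(line)
            # 'full_text' exists when tweet_mode='extended' was used;
            # fall back to 'text' otherwise.
            texts.append(tweet.get('full_text', tweet.get('text')))
            dates.append(tweet.get('created_at'))
            users.append(tweet.get('user', {}).get('screen_name'))
            retweets.append(tweet.get('retweet_count'))
            likes.append(tweet.get('favorite_count'))
    return pd.DataFrame({'created_at': dates, 'user': users, 'text': texts,
                         'retweets': retweets, 'likes': likes})
```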

Something that's really useful about getting tweets this way is that you have access to the metadata each tweet carries. A list of the available metadata can be found here: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.html. You can select the fields that are most important for you.

We are done! Just inspect your data frame using the head() function. If you want, you can save it for later use; you just need the to_csv() function provided by pandas.
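For instance — using a tiny stand-in DataFrame here; with real data you would pass the DataFrame built from your tweets file:

```python
import pandas as pd

# Stand-in data; replace with the DataFrame built in the previous step.
df = pd.DataFrame({'user': ['alice', 'bob'],
                   'text': ['primeiro tweet', 'segundo tweet']})

print(df.head())                      # inspect the first rows
df.to_csv('tweets.csv', index=False)  # save for later use
```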
