Scraping Tweets Using Twitter APIs

Manoj Nain
Published in Analytics Vidhya
3 min read · Jul 4, 2020

Web scraping is an essential skill for any data scientist who needs to gather data from various sources. Twitter is a great source of large volumes of quality data, and I find Twitter data especially helpful for sentiment analysis. Twitter makes it easy to gather publicly available data through its APIs.

This article shows how to scrape tweets related to COVID-19 from Twitter using the Twitter APIs. Twitter has user-friendly APIs for accessing publicly available data. If you prefer to learn through videos, go to this link: https://www.youtube.com/watch?v=L7p1O-3a-Wc&t=82s

To create your API keys, follow the steps below:

  1. Sign up for a Twitter account if you don’t have one.
  2. Create a new application:
    Go to: https://dev.twitter.com/apps/new
    Enter your application name, description, and website address. You can leave the callback URL empty.
    Copy the consumer key (API key) and consumer secret from the screen into your application.
  3. Create the application.
  4. After creating the application, copy your secret key.
  5. To get the access token, click “Create my Access Token” and copy the access token and the access token secret.

Now that you have the API key, API secret key, access token, and access token secret, save them in a text file so that they stay out of your code. To read the keys, use ConfigParser:

import configparser
config = configparser.RawConfigParser()
config.read(filenames='/path/twitter.txt')

This creates a config object and reads the keys from the file. This matters because we do not want to expose our keys to others by hard-coding them in the script.
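For reference, here is a minimal sketch of what such a key file might contain and how configparser reads it. The section name `twitter` and the option names match the `.get` calls used in this article; the placeholder values, the `load_keys` helper, and the use of `read_string` (instead of reading a file from disk) are assumptions made so the example is self-contained.

```python
import configparser

# Hypothetical contents of twitter.txt; replace the placeholders with real keys.
SAMPLE_KEY_FILE = """
[twitter]
apikey = YOUR_API_KEY
apisecretkey = YOUR_API_SECRET_KEY
accesstoken = YOUR_ACCESS_TOKEN
accesstokensecret = YOUR_ACCESS_TOKEN_SECRET
"""

KEY_NAMES = ("apikey", "apisecretkey", "accesstoken", "accesstokensecret")


def load_keys(text):
    """Parse INI-style text and return the four Twitter credentials as a dict."""
    config = configparser.RawConfigParser()
    config.read_string(text)
    # config.get raises NoSectionError / NoOptionError if anything is missing.
    return {name: config.get("twitter", name) for name in KEY_NAMES}


keys = load_keys(SAMPLE_KEY_FILE)
print(sorted(keys))
```

Keeping the real file outside the project directory, or listing it in .gitignore, helps keep the keys out of version control.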

We will use the Tweepy library to extract data from Twitter. You can read more about Tweepy in its documentation.

Install and import tweepy:
!pip install tweepy
import tweepy as tw

Now we need to read our API keys from the config object. We can do that with the .get method:

accesstoken = config.get('twitter', 'accesstoken')
accesstokensecret = config.get('twitter', 'accesstokensecret')
apikey = config.get('twitter', 'apikey')
apisecretkey = config.get('twitter', 'apisecretkey')

The next step is to authenticate using OAuthHandler:

auth = tw.OAuthHandler(apikey, apisecretkey)
auth.set_access_token(accesstoken, accesstokensecret)
api = tw.API(auth, wait_on_rate_limit=True)

Now we have an api object that is set up with our credentials and ready to make requests to Twitter.

The next step is to define the search word (a Twitter hashtag) and the date from which we want to scrape tweets:

search_word = '#coronavirus'
date_since = '2020-05-28'
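As an optional refinement, the sketch below builds the search word from a bare hashtag and checks the date string before it is sent anywhere. The helper names `build_query` and `validate_date` are hypothetical, and appending the `-filter:retweets` search operator to skip retweets is an assumption about what you may want, not part of the original steps.

```python
from datetime import datetime


def build_query(hashtag, exclude_retweets=True):
    """Return a search query for a hashtag, optionally excluding retweets."""
    query = hashtag if hashtag.startswith("#") else "#" + hashtag
    if exclude_retweets:
        # Standard search operator that drops retweets from the results.
        query += " -filter:retweets"
    return query


def validate_date(date_text):
    """Parse a year-month-day string and return it normalized to YYYY-MM-DD.

    Raises ValueError if the text is not a valid calendar date.
    """
    return datetime.strptime(date_text, "%Y-%m-%d").date().isoformat()


search_word = build_query("coronavirus")
date_since = validate_date("2020-05-28")
print(search_word)
print(date_since)
```

Catching a malformed date here is cheaper than discovering it after the search returns no results.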

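To scrape tweets, create a Tweepy Cursor and pass it the API search method along with parameters such as the search word, start date, and language. Calling .items() on the cursor returns an ItemIterator object. (In Tweepy 4.0 and later, api.search is named api.search_tweets.)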

tweets = tw.Cursor(api.search, q=search_word, lang='en', since=date_since).items(1000)

Now the tweets object gives us up to 1,000 tweets related to coronavirus. To get the details of these tweets, we loop over them with a list comprehension and grab fields such as geo, tweet text, user name, and user location.

tweet_details = [
    [tweet.geo, tweet.text, tweet.user.screen_name, tweet.user.location]
    for tweet in tweets
]
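If you want to keep the results, one option is to write the same four fields to CSV. The sketch below is an illustration under stated assumptions: the helper names `tweet_rows` and `write_csv` are hypothetical, and the stand-in objects built with `SimpleNamespace` only mimic the attributes read in the list comprehension above so that the example runs without calling Twitter.

```python
import csv
import io
from types import SimpleNamespace

FIELDS = ["geo", "text", "screen_name", "location"]


def tweet_rows(tweets):
    """Yield one dict per tweet with the four fields used in this article."""
    for tweet in tweets:
        yield {
            "geo": tweet.geo,
            "text": tweet.text,
            "screen_name": tweet.user.screen_name,
            "location": tweet.user.location,
        }


def write_csv(rows, handle):
    """Write the rows to an open text handle as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Stand-in objects (hypothetical data) mimicking the attributes of a tweet.
fake_tweets = [
    SimpleNamespace(geo=None, text="Stay safe #coronavirus",
                    user=SimpleNamespace(screen_name="user_a", location="Delhi")),
    SimpleNamespace(geo=None, text="Wash your hands",
                    user=SimpleNamespace(screen_name="user_b", location="")),
]

buffer = io.StringIO()
write_csv(tweet_rows(fake_tweets), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With real data, pass the tweets iterator in place of fake_tweets and write to a file opened with open('tweets.csv', 'w', newline='', encoding='utf-8'). Note that the iterator can only be consumed once, so build either the list or the CSV from a single pass.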

That is all it takes to scrape tweets from Twitter.
