Scraping Tweets Using Twitter APIs
Web scraping is an essential skill for any data scientist who needs to gather data from various sources. Twitter is a great source of large volumes of quality data, and I find Twitter data especially helpful for sentiment analysis. Twitter makes it easy to gather publicly available data through its APIs.
This article shows how to scrape tweets related to COVID-19 from Twitter using its APIs. If you prefer to learn through videos, go to this link: https://www.youtube.com/watch?v=L7p1O-3a-Wc&t=82s
To create the API keys, follow the steps below:
1. Sign up for a Twitter account if you don’t already have one.
2. Create a new application. Go to: https://dev.twitter.com/apps/new
3. Enter your application name, description, and website address. You can leave the callback URL empty. Then copy the consumer key (API key) and consumer secret from the screen into your application.
4. After creating the application, you can copy your secret key.
5. To get the access token, click on “Create my Access Token” and copy the access token and the access token secret.
Now that you have the API key, API secret key, access token, and access token secret, save them in a text file so that they are not hard-coded in your script. To read the keys, use ConfigParser:
import configparser
config = configparser.RawConfigParser()
config.read(filenames='/path/twitter.txt')
This creates a config object and reads the keys from the file. This is important because we do not want to expose our keys to others.
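To make the file layout concrete, here is a minimal sketch of what the key file could look like and how it is read back. It assumes an INI-style file with a [twitter] section, which is what the config.get('twitter', ...) calls in this article expect. The key values, the temporary path, and the helper name read_key_file are placeholders for illustration only.

```python
import configparser
import os
import tempfile

# Placeholder contents for the key file; the values are made up.
# The section and option names match the config.get calls used in this article.
SAMPLE_KEYS = """\
[twitter]
apikey = YOUR_API_KEY
apisecretkey = YOUR_API_SECRET_KEY
accesstoken = YOUR_ACCESS_TOKEN
accesstokensecret = YOUR_ACCESS_TOKEN_SECRET
"""


def read_key_file(path):
    """Parse an INI-style key file and return the parser object (hypothetical helper)."""
    parser = configparser.RawConfigParser()
    loaded = parser.read(path)  # returns the list of files that were parsed
    if not loaded:
        raise FileNotFoundError(f"Could not read key file: {path}")
    return parser


# Write the sample to a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    key_path = os.path.join(tmp_dir, "twitter.txt")
    with open(key_path, "w") as handle:
        handle.write(SAMPLE_KEYS)
    sample_config = read_key_file(key_path)

print(sample_config.sections())
```

Checking the return value of read is worthwhile because configparser silently skips files it cannot find, so a wrong path would otherwise only show up later as a missing-section error.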
We will use the Tweepy library to extract data from Twitter. You can read more about Tweepy in its documentation.
Install and import Tweepy:
!pip install tweepy
import tweepy as tw
Now we need to read our API keys from the config object. We can do that with the .get method:
accesstoken = config.get('twitter', 'accesstoken')
accesstokensecret = config.get('twitter', 'accesstokensecret')
apikey = config.get('twitter', 'apikey')
apisecretkey = config.get('twitter', 'apisecretkey')
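A typo in the key file only surfaces when config.get raises an error, so it can help to check for all four keys up front. The sketch below is one way to do that, assuming the same section and option names as above. The helper find_missing_keys and the demo values are hypothetical and not part of Tweepy or configparser.

```python
import configparser

# Option names used by the config.get calls in this article.
REQUIRED_KEYS = ("apikey", "apisecretkey", "accesstoken", "accesstokensecret")


def find_missing_keys(parser, section="twitter"):
    """Return the required option names that are absent or empty (hypothetical helper)."""
    if not parser.has_section(section):
        return list(REQUIRED_KEYS)
    return [
        key
        for key in REQUIRED_KEYS
        if not parser.get(section, key, fallback="").strip()
    ]


# Demo with an incomplete, made-up configuration held in memory.
demo_config = configparser.RawConfigParser()
demo_config.read_string("[twitter]\napikey = abc\naccesstoken = xyz\n")
missing = find_missing_keys(demo_config)
print(missing)
```

With the real config object, an empty list from find_missing_keys(config) means all four keys are present.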
The next step is to authenticate using OAuthHandler:
auth = tw.OAuthHandler(apikey, apisecretkey)
auth.set_access_token(accesstoken, accesstokensecret)
api = tw.API(auth, wait_on_rate_limit=True)
We have now authenticated and connected to Twitter through its API.
The next step is to define the search word (a Twitter hashtag) and the date from which we want to scrape tweets:
search_word = '#coronavirus'
date_since = '2020-05-28'
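A hard-coded date goes stale quickly. As far as I know, the standard search endpoint only covers roughly the last week of tweets, so it may be more practical to compute the start date relative to today. The sketch below does that and also shows the -filter:retweets search operator for dropping retweets. The helper names days_ago and build_query are hypothetical, and the seven-day window is an assumption you should check against the current Twitter documentation.

```python
from datetime import date, timedelta


def days_ago(days, today=None):
    """Return the date `days` before `today` as a 'YYYY-MM-DD' string."""
    today = today or date.today()
    return (today - timedelta(days=days)).isoformat()


def build_query(hashtag, exclude_retweets=True):
    """Build a search string, optionally excluding retweets."""
    if exclude_retweets:
        return f"{hashtag} -filter:retweets"
    return hashtag


# Same variable names as in the article, computed instead of hard-coded.
search_word = build_query("#coronavirus")
date_since = days_ago(7)
print(search_word, date_since)
```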
To scrape tweets, create a Tweepy Cursor, which returns an ItemIterator object, and pass it parameters such as the API search method, search word, start date, and language:
tweets = tw.Cursor(api.search, q=search_word, lang='en', since=date_since).items(1000)
The tweets object now holds the tweets related to coronavirus. To get the details of these tweets, we loop over them with a list comprehension and grab fields such as geo, tweet text, user name, and user location:
tweet_details = [[tweet.geo, tweet.text, tweet.user.screen_name, tweet.user.location] for tweet in tweets]
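Once the details are in a list of lists, a natural next step is to save them for later analysis. The sketch below writes rows of the same shape to CSV using only the standard library. It uses stand-in objects instead of real tweets so that it runs without API access. The column names, the fake tweet, and the helpers tweet_to_row and rows_to_csv are all hypothetical.

```python
import csv
import io
from types import SimpleNamespace

# Column order mirrors the list comprehension in the article.
COLUMNS = ["geo", "text", "screen_name", "location"]


def tweet_to_row(tweet):
    """Pull the four fields used in the article out of a tweet-like object."""
    return [tweet.geo, tweet.text, tweet.user.screen_name, tweet.user.location]


def rows_to_csv(rows, handle):
    """Write a header line followed by the rows to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# Stand-in for a real tweet: only the attributes read above are defined.
fake_tweets = [
    SimpleNamespace(
        geo=None,
        text="Stay safe, everyone",
        user=SimpleNamespace(screen_name="example_user", location="Somewhere"),
    ),
]

rows = [tweet_to_row(tweet) for tweet in fake_tweets]
buffer = io.StringIO()
rows_to_csv(rows, buffer)
print(buffer.getvalue())
```

With real data, you could pass tweet_details as rows and replace the in-memory buffer with a file opened as open('tweets.csv', 'w', newline='', encoding='utf-8').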
That is all it takes to scrape tweets from Twitter.