Absolute Beginner's Guide to Tweepy and the Twitter Search Archive APIs

Katia Ossetchkina
8 min read · Apr 15, 2022


Twitter data is noteworthy for its ability to connect users to causes and public figures that they care about. The platform provides near real-time feedback, which makes this data incredibly valuable for businesses and government bodies.

Main Page from the Twitter Developer Portal 🚀

Fortunately, Twitter developed several API connectors for programmers like you and me :) By signing up for a Twitter Developer Account, you will have access to several API endpoints, allowing you to tap into core elements of Twitter including tweets, retweets, mentions, users, likes, times, custom queries and much more for free!

This tutorial will cover how to sign up for the Twitter Developer API account, a quick 5-min guide to using the API with the Python Tweepy library, an example of using the Twitter Search Full Archive API to query for mentions and some resources to begin a sentiment analysis.

Pre-requisites:

  • A Twitter account with a valid phone-number
  • A tool to run your Python code! I will be using Jupyter Notebooks through Anaconda

Signing Up for a Twitter Developer Account (~2 minutes)

  1. Navigate to the Twitter developer account page: https://developer.twitter.com/en/products/twitter-api and click on the “Sign Up” button. You will need to login with your Twitter credentials.

2. Have the following ready: a verified phone number and your Twitter use-case. You will then need to accept the developer agreement, and you will receive an email verifying your Twitter developer account:

Confirmation Email from Twitter 🎉

That was easy :)

Signing up Elevated Access for a Twitter Developer Account (~5 minutes + Approval Process)

Elevated Access is a free upgrade to the Twitter Developer account that gives you access to the Twitter Premium APIs, which include:

  • 30 Day Archive search API
  • Full Archive search API
  • Account Activity API

IMPORTANT: if you only want to search for Tweets on particular user profiles without any querying, then skip this step! However, if you want to access information like mentions or query for hashtags, you will need to apply for Elevated Access.

In the Developer Portal, click on “View Products” to apply for Essential and Elevated access:

Twitter Developer portal, click on View products

Apply for the Elevated Access, which is free :) Note, the more detail you can add about your intended use (which particular accounts you are interested in monitoring and querying, the type of data you want from Twitter), the faster the review and approval process will be:

Apply for Elevated Access screen

If you only plan to scrape data for a sentiment analysis, then you only need to provide information for the first option: “Are you planning to analyze Twitter data?”

Example text for the Twitter Elevated Access intended use page

Confirm your information and sign the “Terms” on the final page. Once approved, you will receive an email confirmation. If you are a student, mentioning this information will speed up the Elevated Access approval process, which can take up to 48 hours.

Getting Started with the Twitter Developer API (~5–10 min)

  1. Create your first project inside the Twitter Developer Portal:
Click on Create Project

Once you have named your first project, create an app within it:

Give your app a meaningful name

After creating the app, you will be provided with the API Key, API Key Secret and Bearer Token. Save a copy of these somewhere secure! These will be used to set up the live connection to Twitter in Python:

Please save these somewhere safe for future you :)

2. Let’s open our Jupyter notebook! 🎉🎉🎉

A world of possibility awaits! 🎉

Begin by pip installing the following libraries if not already installed: pip install tweepy and pip install pandas. The json module ships with Python, so no installation is needed for it.

Once everything is installed, import the libraries like so:

3. Let’s make a new cell in our notebook, and copy-paste the consumer API secret and token from earlier into string variables like this:
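A sketch with placeholder values — substitute the keys you saved from the app creation step, and never commit real keys to version control:

```python
# Placeholders: replace with your own keys from the Developer Portal
consumer_api_key = "YOUR_API_KEY"
consumer_api_secret = "YOUR_API_KEY_SECRET"
bearer_token = "YOUR_BEARER_TOKEN"
```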

4. Create a new cell, where we will add 4 lines of code. Create a variable callback_uri set to 'oob' (out-of-band), which tells Twitter to issue a one-time pin instead of redirecting to a callback URL.

callback_uri = 'oob'

Now, let’s use tweepy’s OAuthHandler to create a connection to Twitter using your consumer_api_key, consumer_api_secret and callback_uri:

auth = tweepy.OAuthHandler(consumer_api_key,consumer_api_secret, callback_uri) 

Add this code to create a redirect_url for a secure link:

redirect_url = auth.get_authorization_url()

Print the redirect_url using the print method:

print(redirect_url)

Once you print this url, a link should appear below the cell of your Jupyter notebook. Click on it to authorize the connection and get a unique one-time pin from Twitter.

Your cell should look like this:

Redirect URL as Output in Jupyter — Click on this Link
Once redirected, click on the Blue Button to Authorize your App
Save your one-time pin! Note: every time you re-run this block of code, a new pin is needed.

5. Make a new cell, and store your pin in a variable. I will be using the input() method to simplify future copy-pasting a bit:

6. If the pin is valid, an access token can be created from it, like so:

7. Last step: In a new cell, let’s initialize the API endpoint using the Tweepy method API, with our authorization token:

Well done! You are now ready to start using the Twitter APIs :)

Your code so far should look something like this (splitting the code into this many cells is not necessary, but recommended for troubleshooting):

Using the Twitter APIs: Search Full Archive to Retrieve all Mentions for a Time-Frame (~5 min)

In Twitter, users mention an account by using the @ symbol for tagging like so: @MyTestUser

Note: Elevated access is needed for all Premium APIs, which includes the Search Full Archive endpoint.

In the Twitter Developer Portal, create an environment for the Search Full Archive API. On the left-hand menu, navigate to Dev Environments.

  • Click on “Search Tweets: Full Archive / Sandbox”. Set up your environment and link to your app.
  • Give your app a label and save this name. This will be important when running the API query:
Example of Environment Setup for Search Full Archive where label = dev

Let’s go back to Jupyter Notebooks. In my query, I am interested in seeing all people mentioning the account @TO_WinterOps (you can search for hashtags or multiple items in your queries), from January 1st 2022 to March 31st 2022. I am creating an object tweets_3_months to store the results and use the following arguments for my query (for more arguments, read here):

  • label = ‘dev’ (name of my environment from Developer Portal)
  • fromDate = “202201010000” (string in format yyyymmddhhmm — note your query will not execute if not passed in this format!)
  • toDate = “202203310000”
  • maxResults = 100 (int maximum value allowed by the API call)
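The arguments above can be collected and passed to Tweepy's search_full_archive method like this — a sketch, where api is the authenticated client from the setup steps and 'dev' is my environment label (substitute yours):

```python
# Query arguments for the Full Archive search; label must match the
# Dev Environment name from the Developer Portal
query_args = {
    "label": "dev",
    "query": "@TO_WinterOps",
    "fromDate": "202201010000",  # yyyymmddhhmm
    "toDate": "202203310000",
    "maxResults": 100,
}

# Requires the authenticated client from the earlier steps:
# tweets_3_months = api.search_full_archive(**query_args)
```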

Run this cell of code! Your variable will contain a list of raw Status objects, which looks something like this when printed:

[Status(_api=<tweepy.api.API object at 0x7fe4cb917400>, _json={'created_at': 'Wed Mar 30 16:00:35 +0000 2022', 'id': 1509198916096561154, 'id_str': '1509198916096561154', 'text': 'RT @TO_WinterOps: Expressways, major roads, residential roads and trails are being salted where required.  #CityofTOWinterAlert @311Toronto', 'source': '<a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>', 'truncated': False, 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 238984493, 'id_str': '238984493', 'name': 'TO Transportation', 'screen_name': 'TO_Transport' ...

To get a more useful Tweet output, use this printer function for the first 5 results:

A complete list of all the Tweet attributes for this API can be found in the Tweepy documentation.

Sample output from the printer function:

Printed Output from Raw Twitter JSON File

Now, we are not done yet! This API returns at most 100 Tweets per request to limit web traffic, meaning that even if you provided a date range, the API collects Tweets backwards from the end of your date range until it hits either your start date or the 100-Tweet limit.

We can check the oldest date in our JSON API pull like so:

tweets_3_months[-1].created_at
Result for the Oldest Date from Twitter Pull

In this case, we see that the oldest date is 2022-02-18, but we wanted to go back to 2022-01-01. To overcome this, we can create a while loop that updates the toDate argument with the oldest timestamp retrieved so far, while a new list all_mentions accumulates the results:

Now we can finally put our Twitter pull in all_mentions into a data frame which we will call mentions_df, using enumerate to parse the JSON attributes we want to include.

For this example we are going to store the following for each Tweet:

  • id: unique id generated of the Tweet
  • created_at: UTC timestamp of creation date
  • favorite_count: how many users favorited this Tweet
  • retweet_count: how many users re-tweeted this Tweet
  • text.encode(“utf-8”).decode(“utf-8”): text of the Tweet. Note, that the Search Full Archive API currently has a limit of 128 characters
  • entities: all mentions and hashtags stored in a dictionary
  • screen_name: name of user who created the Tweet

And our data frame should look like this:

Mentions Dataframe

And that’s all folks :) You are now ready to begin analyzing some Twitter data in Python.

Your final code should look like this:

Next Steps: Sentiment Analysis

Our complete project can be found at this GitHub repository, where we perform a complete sentiment analysis of all the @TO_WinterOps mentions during January to February 2022, and the correlation of this data to the severity of the weather event. The Vader lexicon under the MIT license was used to perform that analysis.


Katia Ossetchkina

Masters of Applied Science and Engineering at University of Toronto