Using APIs in Python for Data Collection (Part 1)

Nathan Laundry
A Tinkerer’s Journal
7 min read · Mar 31, 2023


Photo by Quino Al on Unsplash

What is an API and Why are they Useful?

APIs, or Application Programming Interfaces, are useful to anyone who wants to access and use data or services that are provided by other applications or systems. APIs set a standard way for different applications and systems to communicate and share data with each other. This way, we don’t have to build every app or dataset from scratch.

APIs are the bread and butter of data collection. There are tons of APIs that we, as programmers, can access to gather data, build new datasets for analysis, or create apps. Some examples of useful APIs are:

  1. Twitter API: This API allows you to collect data from Twitter, including tweets, user profiles, and search results. This could be useful for social media analysis, sentiment analysis, and trend analysis.
  2. OpenAI API: This API lets you send text input and receive a variety of responses, such as generated, summarized, or translated text. This means you can use the models behind ChatGPT, along with other OpenAI models, programmatically.
  3. NASA API: This API allows you to access data from NASA’s various missions and programs, including images, videos, and scientific data. This could be useful for space exploration, scientific research, and educational applications.

Example of Using the Twitter API for Social Psychology Research

Suppose a psychologist wants to study how social outrage affects mental health.

To do this, they could use the Twitter API to collect tweets related to a recent social outrage event, such as a controversial political decision or a high-profile court case. They could then analyze the language used in these tweets to determine the level of outrage expressed by the users.

Here’s how the psychologist could use the Twitter API to collect the tweets:

  1. First, they would need to apply for a Twitter Developer account and create a new Twitter app.
  2. They would then need to use the Twitter API to search for tweets containing keywords related to the social outrage event. For example, they could search for tweets containing a specific hashtag or the names of people involved in the controversial decision.
  3. Once they have collected the tweets, they could use natural language processing (NLP) techniques to analyze the language used in the tweets. For example, they could use sentiment analysis to determine the overall sentiment of the tweets (positive, negative, or neutral), or they could use topic modeling to identify the key themes and topics discussed in the tweets.
  4. Finally, they could use statistical analysis to explore the relationship between the level of outrage expressed in the tweets and measures of mental health, such as self-reported stress or depression.
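To make step 3 a little more concrete, here is a deliberately tiny, lexicon-based sentiment classifier. It is a toy sketch, not a research-grade method: the word lists and example tweets are hypothetical placeholders, and a real study would use a validated lexicon or an established NLP library. It assumes the tweet texts have already been collected into a list of strings.

```python
# Toy lexicon-based sentiment scoring for collected tweet texts.
# The word lists are hypothetical placeholders, not a validated lexicon.
import re
from collections import Counter

POSITIVE_WORDS = {"good", "great", "fair", "support", "happy"}
NEGATIVE_WORDS = {"outrageous", "disgusting", "unfair", "angry", "shameful"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify_sentiment(text):
    """Label a text 'positive', 'negative', or 'neutral' by counting lexicon hits."""
    tokens = tokenize(text)
    positive_hits = sum(t in POSITIVE_WORDS for t in tokens)
    negative_hits = sum(t in NEGATIVE_WORDS for t in tokens)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical example tweets (not real data)
tweets = [
    "This decision is outrageous and unfair!",
    "Great to see the court support this.",
    "The ruling was announced today.",
]

labels = [classify_sentiment(t) for t in tweets]
print(labels)           # ['negative', 'positive', 'neutral']
print(Counter(labels))  # overall tally of labels across the collected tweets
```

Counting words this way ignores negation and sarcasm ("not great"), which is exactly why real analyses lean on better-tested tools.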

Okay, but how do APIs work?

When we work with another organization’s API to get data or use a service they provide, we say we’re making an API Call.

When we make an API call, there are four important components to keep in mind: your program making the call, the request your program makes, the server that answers the call, and the response object it sends back.

  1. Your program making the call: This is like you picking up the phone and dialing a phone number to call a friend. In the context of an API call, your program is the code that initiates the call and sends the request to the API server.
  2. The request your program makes: This is like the message that you leave on your friend’s voicemail asking for information. In the context of an API call, the request is the message that your program sends to the API server asking for data.
  3. The Server that answers the call: This is like your friend listening to your voicemail and gathering the information you requested. In the context of an API call, the server is the computer that receives your request, processes it, and sends back a response with the data you requested.
  4. The response object they send back: This is like your friend calling you back and telling you the information that you asked for. In the context of an API call, the response is the message that the server sends back to your program with the data you requested.
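The four components can be seen together in one runnable sketch that uses only Python's standard library. To keep it offline, the server is a tiny stand-in running on your own machine (hypothetical, not the randomuser server); it simply echoes back, as JSON, the nat parameter it was sent.

```python
# The four parts of an API call, using only the standard library.
# The "server" is a local stand-in, not a real public API.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
from urllib.request import build_opener, ProxyHandler


class FriendHandler(BaseHTTPRequestHandler):
    """3. The server: reads the request and sends back a response."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)  # parameters from the request URL
        body = json.dumps({"you_asked_for": query.get("nat", [None])[0]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


# Start the stand-in server on a free local port in a background thread.
server = HTTPServer(("127.0.0.1", 0), FriendHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# 1. Our program and 2. the request it makes (endpoint + parameters).
request_url = f"http://127.0.0.1:{port}/api/?nat=us"
opener = build_opener(ProxyHandler({}))  # bypass any system proxy for this local demo
with opener.open(request_url) as resp:   # 4. the response object
    status = resp.status
    payload = json.loads(resp.read().decode())

server.shutdown()
server.server_close()
print(status, payload)  # 200 {'you_asked_for': 'us'}
```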

API Code Example

In this section we're going to do a detailed walkthrough of some Python code that makes an API call to the Random User API (randomuser.me). This is a free API that works well for teaching, and the process for calling it is very similar to the one you would use to make calls to Reddit, Twitter, Scopus (for scholarly articles), OpenAI's API, and more.

If you'd prefer, a full tutorial is also available as a .ipynb file in this GitHub repository. You can run it via GitHub Codespaces or by cloning the repository.

Making an API Call

To make an API call you need to know two things:

  • Where you’re sending it
  • What you’re asking it for

We call the place we're sending it the endpoint, and what we're asking it for the parameters.

import requests
import json

# API endpoint and parameters
url = 'https://randomuser.me/api/' # Where we're sending our API Call
params = {'nat': 'us'} # What we're asking it for

# Make the API call
response = requests.get(url, params=params)

Explaining the Call

After the imports, the first line of code sets the address we're making the call to. Instead of a phone number, we use a URL.

We set the address in the url variable:

url = 'https://randomuser.me/api/'

Parameters
In the next line we set the parameters. In this case, we're asking the Random User API to provide us with a random user whose nationality is US.

params = {'nat': 'us'}
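Under the hood, requests.get() turns the params dictionary into a query string appended to the URL. A quick way to see the result is to build the same URL with the standard library's urlencode; for simple parameters like these it should match what requests sends, and after a real call response.url shows the exact URL used. The results parameter below is, to my knowledge, the randomuser option for requesting several users at once; check the randomuser documentation before relying on it.

```python
# Rebuild the URL that a params dictionary produces, using the standard library.
from urllib.parse import urlencode

url = 'https://randomuser.me/api/'
params = {'nat': 'us'}
full_url = f"{url}?{urlencode(params)}"
print(full_url)  # https://randomuser.me/api/?nat=us

# More parameters just add more key=value pairs, joined with '&'.
# 'results' (assumed randomuser option) asks for several users at once.
more_params = {'nat': 'us', 'results': 3}
print(f"{url}?{urlencode(more_params)}")  # https://randomuser.me/api/?nat=us&results=3
```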

Sending the Call
Calling requests.get() is what actually sends the request. Note that we pass requests.get() both the address (url in this case) and the params, so it knows where to send the call and what to ask for.

requests.get(url, params=params)

Receiving the Answer

The response variable holds what the server sent back to us. Think of it like your friend calling you back and giving you the information you asked for.

response = requests.get(url, params=params)
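Before using the response, it is worth checking that the call actually succeeded. Here is a minimal sketch: a hypothetical helper, parse_api_response, that treats any 2xx status code as success and otherwise raises an error. It only assumes the object has .status_code and .text attributes, as a requests.Response does, so stand-in objects let it run without a network connection.

```python
# Check the status code before trusting the response body.
# parse_api_response is a hypothetical helper, not part of requests.
import json
from types import SimpleNamespace


def parse_api_response(response):
    """Return the parsed JSON body, or raise if the server reported an error."""
    if not 200 <= response.status_code < 300:
        raise RuntimeError(f"API call failed with status {response.status_code}")
    return json.loads(response.text)


# Stand-in responses so this runs without a network connection.
ok = SimpleNamespace(status_code=200, text='{"results": []}')
not_found = SimpleNamespace(status_code=404, text='Not Found')

print(parse_api_response(ok))  # {'results': []}
try:
    parse_api_response(not_found)
except RuntimeError as err:
    error_message = str(err)
    print(error_message)  # API call failed with status 404
```

With a real call this would be data = parse_api_response(requests.get(url, params=params)). If you'd rather not write the check yourself, requests also provides response.raise_for_status(), which raises an HTTPError for 4xx and 5xx responses.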

Translating the Data

We get the data back as JSON text that looks something like this (abridged):

{
  "results": [
    {
      "gender": "female",
      "name": {
        "title": "Mrs",
        "first": "Gina",
        "last": "Johnson"
      },
      "email": "gina.johnson@example.com",
      "picture": {
        "large": "https://randomuser.me/api/portraits/women/0.jpg",
        "medium": "https://randomuser.me/api/portraits/med/women/0.jpg",
        "thumbnail": "https://randomuser.me/api/portraits/thumb/women/0.jpg"
      },
      "nat": "US"
    }
  ]
}

Next, it's best to translate it into something your Python program can use more easily. In Python, that is usually a data structure called a dictionary. To translate it we do the following:

# Parse the response as JSON
data = json.loads(response.text)

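The json.loads function takes the response text, parses it as JSON, and translates it into a Python dictionary (it raises an error if the text isn't valid JSON). The requests library also offers a shortcut, response.json(), which does the same parsing in one step.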
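It helps to know how JSON types map onto Python types, because that decides whether you reach into each level with a key or an index. This sketch parses an abridged copy of the sample response and prints the type found at three levels.

```python
# How json.loads maps JSON text onto Python types.
import json

raw = '''
{
  "results": [
    {"gender": "female",
     "name": {"title": "Mrs", "first": "Gina", "last": "Johnson"},
     "nat": "US"}
  ]
}
'''

data = json.loads(raw)
print(type(data))                       # <class 'dict'>  JSON object -> dict
print(type(data['results']))            # <class 'list'>  JSON array  -> list
print(type(data['results'][0]['nat']))  # <class 'str'>   JSON string -> str
```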

Finally, you have the data you asked for from your API call ready to use in the rest of your program.

Accessing the Data we Want from the JSON Response Object

Now we can access specific items from the dictionary we called data. We made a request to the Random User API, which gives us a fake user's data. Let's walk through how to access that data properly.

Dictionaries Example

First we need to know how a dictionary works. A dictionary in Python is a collection of key-value pairs, where each key is associated with a corresponding value. The key is used to retrieve the value from the dictionary.

For example, we might have a dictionary like this:

ages = {'Alice': 25, 'Bob': 30, 'Charlie': 35}

Here, the dictionary is called ages. The keys are Alice, Bob, and Charlie, and the values are 25, 30, and 35. If we want to extract a specific value, we use its key like so:

ages['Alice']

In this case, we’re using the key: Alice, to get access to its value: 25.
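Square-bracket lookup fails if the key is missing, which happens often with real API data. Here is a short sketch of the alternatives a standard dict provides ('Dana' is just an example of a key that isn't there):

```python
ages = {'Alice': 25, 'Bob': 30, 'Charlie': 35}

print(ages['Alice'])        # 25
print('Dana' in ages)       # False: check whether a key exists
print(ages.get('Dana'))     # None: .get() returns None for a missing key
print(ages.get('Dana', 0))  # 0: or a default value you choose

try:
    ages['Dana']            # square brackets raise KeyError for a missing key
except KeyError as err:
    print('Missing key:', err)  # Missing key: 'Dana'
```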

Now back to our User Dictionary

Let's access the first name, last name, and email. The user data is more complex than the ages dictionary from the example: it contains a list, and dictionaries nested inside other dictionaries. To deal with that, we chain keys (and, for the list, an index) one after the other.

# Extract the relevant information
first_name = data['results'][0]['name']['first']
last_name = data['results'][0]['name']['last']
email = data['results'][0]['email']

# Print the results
print(f"Name: {first_name} {last_name}")
print(f"Email: {email}")

Data Extraction Code Explanation

The data variable in the code is a dictionary that contains the user’s information. The code extracts the relevant information by using dictionary keys to navigate through the dictionary object and access the specific pieces of information that are needed.

For example, the line first_name = data['results'][0]['name']['first'] gets the person's first name by stepping through the nested structure one key or index at a time. data['results'] is a list of users, and [0] selects the first (and here only) user in that list. ['name'] then selects that user's nested name dictionary, and ['first'] retrieves the value stored under its 'first' key, which is the person's first name.

Similarly, the lines last_name = data['results'][0]['name']['last'] and email = data['results'][0]['email'] extract the person's last name and email address, respectively.

Once the relevant information has been extracted, it is printed to the console using the print() function.
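The code above only reads the first user. If you ask for several users, or a field is sometimes absent, a loop with .get() fallbacks is more robust. A minimal sketch, assuming the same response shape as before; the helper name extract_users and the sample data (including a second user with no email) are made up for illustration.

```python
# Extract the same fields from every user in "results", tolerating missing fields.
# extract_users and the sample data are hypothetical.

def extract_users(data):
    """Return a list of (first, last, email) tuples, one per result."""
    users = []
    for person in data.get('results', []):
        name = person.get('name', {})
        users.append((name.get('first', ''),
                      name.get('last', ''),
                      person.get('email', 'unknown')))
    return users


sample = {'results': [
    {'name': {'first': 'Gina', 'last': 'Johnson'}, 'email': 'gina.johnson@example.com'},
    {'name': {'first': 'Sam', 'last': 'Lee'}},  # no email field
]}

for first, last, email in extract_users(sample):
    print(f"Name: {first} {last} | Email: {email}")
# Name: Gina Johnson | Email: gina.johnson@example.com
# Name: Sam Lee | Email: unknown
```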

Summing Up

In this tutorial we explained what an API is, why they’re valuable, and walked through a basic API call to the random-user API. We also talked about the nature of the data we receive back from API calls.

APIs are a way for different programs to talk to each other and share information. By using APIs we can avoid writing code to do things that other programmers have already done for us.

For example, let’s say you wanted to create an app that shows you tweets from Twitter. Instead of starting from scratch and building a whole new system, you can use Twitter’s API to access their data and show it in your app.

There are tons of APIs available that can give us access to all sorts of data. I’ll be following this tutorial with an example of how to use the Reddit API to collect data on various interest groups.

See you in the next one!

Cheers,

Nathan Laundry from the Intelligent Adaptive Interventions lab at UofT

Check out my Newsletter #GuidingQuestions here
