Multiprocessing API Calls: A Simple Guide

Akash Gupta
Plumbers Of Data Science
3 min read · Apr 10, 2023

As data volumes grow day by day, processing large datasets quickly is becoming critical. Consider the case of making API calls to a third-party data provider. API calls are essential when we need to extract information from external sources, but when we need to fetch data for a large number of records, making those calls one after another quickly becomes a bottleneck.

Fortunately, Python’s multiprocessing module lets us spread work across multiple processes, so we can make several API calls at the same time instead of waiting for each one to finish.

In this blog, we will discuss how to make API calls concurrently using Python’s multiprocessing module. We will use the requests library to make API calls and the PySpark library to process data.

Step 1: Import Required Libraries

First, we need to import the required libraries. We will be using the following libraries:

  • requests: to make API calls
  • json: to handle JSON responses
  • pyspark: to process data
  • multiprocessing: to make API calls concurrently
  • itertools: to chain the results of API calls
import json
import requests
import pyspark
import multiprocessing as mp
from requests import get as getRequest
from pyspark.sql.functions import col, explode
from pyspark.sql import functions as f
from itertools import chain
from pyspark.sql.types import *
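
If you are running this in a Databricks notebook or the pyspark shell, a SparkSession named spark is already available. Otherwise, as a minimal sketch, you would create one yourself (the app name below is just an example):

from pyspark.sql import SparkSession

# Only needed outside notebooks/shells where `spark` is not predefined;
# the app name is arbitrary.
spark = SparkSession.builder.appName("api-multiprocessing-demo").getOrCreate()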

Step 2: Get IDs from a CSV File

We will read the IDs from a CSV file using PySpark’s read.csv() function, filter the rows on the Country column, and collect the ID column into a Python list.

# Reading input data: filter rows by Country and collect the ID column (the fourth column here) as a Python list
IDs = spark.read.csv(Source_File_Path, header=True) \
    .filter(col("Country").isin(Country)) \
    .rdd.map(lambda x: x[3]) \
    .collect()

Step 3: Generate API Token

Before making API calls, we need to generate an API token using an Access Identifier and a Private Key. We will use the requests library to make a POST request to the Authentication URL and read the token from the JSON response.

auth_url = Authentication_URL
param = {
    "accessIdentifier": Authentication_Access_Identifier,
    "privateKey": Authentication_PK
}
access_token_response = requests.post(
    auth_url,
    headers={'Content-type': 'application/json'},
    data=json.dumps(param)
)
access_token_text = json.loads(access_token_response.text)
auth_token = access_token_text['token']

Step 4: Make API Calls

Now, we will define a function that makes an API call for a given ID and returns the JSON response. We will use the requests library to make a GET request, appending the ID to the base URL and passing the API token in the Authorization header.

We will then define a wrapper function that calls this function for a single ID and returns the JSON response wrapped in a list; the multiprocessing pool in Step 5 will apply this wrapper to every ID in the ID list.

# `url` is assumed to be a module-level variable holding the data API's base endpoint
def getData(url, ID, token):
    base_url = url + ID
    headers = {
        "Content-Type": 'application/json',
        "Authorization": f"JWT {str(token)}"
    }
    try:
        response = getRequest(base_url, headers=headers)
    except Exception:
        return {}
    if response is not None and response.status_code == 200:
        return response.json()
    return {}

def getIDData(ID):
    global url
    token = auth_token
    _dataList = []
    jsonDataBatch = getData(url, ID, token)
    _dataList.append(jsonDataBatch)
    return _dataList

Step 5: Implement Multiprocessing

Finally, we will use Python’s multiprocessing module to make API calls concurrently. We will create a multiprocessing pool and use its map() function to call the function defined in Step 4 for each ID in the ID list. The results of the API calls will be returned as a list of lists, which we will chain together to get a final list.

# One worker process per CPU core; map() blocks until every API call has returned
pool = mp.Pool(mp.cpu_count())
results = pool.map(getIDData, IDs)
pool.close()
pool.join()
# Each worker returns a one-element list, so flatten the list of lists
_actualDataList = list(chain(*results))
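
One caveat if you run this as a standalone script rather than in a notebook: on platforms that start worker processes with the spawn method (Windows, and macOS on recent Python versions), the pool creation must sit under an if __name__ == "__main__" guard, otherwise each worker re-imports the script and tries to create its own pool. A minimal sketch of the same pool logic with the guard:

# Assumption: same pool logic as above, just wrapped in the standard
# entry-point guard required by the spawn start method.
if __name__ == "__main__":
    pool = mp.Pool(mp.cpu_count())
    results = pool.map(getIDData, IDs)
    pool.close()
    pool.join()
    _actualDataList = list(chain(*results))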

That’s it! We now have our results in _actualDataList and can use PySpark to transform and analyze the data as required.
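
For example, here is a minimal sketch (assuming spark is the active SparkSession, as in Step 2) that loads the collected responses into a DataFrame by letting Spark infer the schema from the JSON:

# Drop the empty dicts returned for failed calls, then let Spark
# infer a schema from the JSON strings.
valid_records = [record for record in _actualDataList if record]
json_rdd = spark.sparkContext.parallelize([json.dumps(record) for record in valid_records])
df = spark.read.json(json_rdd)
df.printSchema()
df.show(truncate=False)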

Conclusion

In this blog, we have discussed how to make API calls concurrently using Python’s multiprocessing module. By making multiple API calls in parallel instead of one after another, we can significantly reduce the time it takes to fetch data for a large number of IDs.
