How to make async calls to OpenAI’s API

Amandeep Singh
3 min read · Feb 25, 2024


A guide to making asynchronous calls to the OpenAI API with rate limiting.

Ever faced the challenge of sluggish response times when calling OpenAI’s API? Making one call at a time might seem straightforward, but in real-world applications it can lead to significant delays. Imagine waiting x seconds for each of 1,000 prompts: the script’s total runtime balloons. In this tutorial, our goal is to make your OpenAI API calls more efficient.

We’ll delve into making asynchronous calls using asyncio and explore how to implement an effective retry strategy, particularly for handling the OpenAI API rate limit.

Before we dive into the implementation details, let’s first understand the libraries involved — Asyncio and Backoff. Then, we’ll explore step-by-step how to transform your OpenAI API calls.

What is Asyncio?

Asyncio is a Python library that provides asynchronous I/O. It relieves you of the burden of worrying about locks and threads when writing concurrent programs. The foundation of asyncio is the idea of coroutines, or functions that can be interrupted and then resumed.
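
To make this concrete, here is a minimal, self-contained sketch (unrelated to OpenAI) of two coroutines suspending at an await point so the event loop can run them concurrently:

import asyncio

async def greet(name, delay):
    # The coroutine suspends here; the event loop can run other coroutines
    await asyncio.sleep(delay)
    print(f"Hello, {name}!")

async def demo():
    # Both greetings run concurrently, so this takes ~1 second, not ~2
    await asyncio.gather(greet("Alice", 1), greet("Bob", 1))

asyncio.run(demo())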

What is Backoff?

Backoff is a Python library that provides decorators that can be used to wrap a function and retry it until a specified condition is met.
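
As a toy example (not OpenAI-specific), here backoff retries a randomly failing function with exponentially growing waits, giving up after five attempts:

import random

import backoff

@backoff.on_exception(backoff.expo, ValueError, max_tries=5)
def flaky():
    # Raises ValueError ~70% of the time; backoff retries it automatically.
    # If all five attempts fail, the last ValueError propagates.
    if random.random() < 0.7:
        raise ValueError("transient failure")
    return "success"

print(flaky())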

A tutorial to make async calls to OpenAI

Step 1: Adding all the dependencies.

Make a new folder with a name of your choice. In this case, we’ll call it OpenAITutorial.

Let’s create a virtual environment and install the dependencies. Note that asyncio ships with Python’s standard library, so only openai and backoff need to be installed.

# Create the virtual environment
python3 -m venv venv

# Activate the virtual environment
source venv/bin/activate

# Install the openai client library
pip3 install openai

# Install backoff
pip3 install backoff

Step 2: Now let’s make an async function in Python

We will write a coroutine for calling the API. You can add this code to a file called async_tutorial.py:

# Import necessary libraries
import asyncio
import openai

# Set up OpenAI client with your API key
client = openai.AsyncOpenAI(api_key='Your API key')

# Define the coroutine for making API calls to GPT
async def make_api_call_to_gpt(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = await client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content

The make_api_call_to_gpt coroutine function is designed to facilitate asynchronous calls to the OpenAI API. This function accepts two parameters:

  1. prompt: This parameter signifies the text prompt that you intend to process using the OpenAI API. It serves as the input content for generating a response.
  2. model: This parameter is optional and lets you specify the GPT model used to process the prompts. By default it is set to “gpt-3.5-turbo”, but you can customize it based on your requirements (see the quick sketch after this list).
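
For example, here is a quick sketch of overriding the default model; the model name is only an illustration, so substitute whichever model your account has access to:

# Run a single call with a custom model; "gpt-4" is just an example name
answer = asyncio.run(make_api_call_to_gpt("Explain coroutines in one line", model="gpt-4"))
print(answer)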

Step 3: Concurrent OpenAI API Calls with Asyncio

async def main():
    prompts = ["Hello, ChatGPT!", "How does async programming work?"]

    # Asynchronously call the function for each prompt
    tasks = [make_api_call_to_gpt(prompt) for prompt in prompts]

    # Gather and run the tasks concurrently
    results = await asyncio.gather(*tasks)
    print(results)

# Run the main asynchronous function
asyncio.run(main())

The line asyncio.run(main()) starts the event loop and runs the main coroutine. Within main, a list of coroutine objects, one per prompt, is created using make_api_call_to_gpt. The key step is asyncio.gather(*tasks), which runs these calls concurrently and returns their results in the same order as the tasks.
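
To see the payoff, here is a self-contained sketch that fakes the API latency with asyncio.sleep (no OpenAI call involved): ten one-second “calls” complete in roughly one second overall, and gather returns the results in the same order as the tasks.

import asyncio
import time

async def fake_api_call(i):
    await asyncio.sleep(1)  # stand-in for network latency
    return f"response {i}"

async def timed():
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_api_call(i) for i in range(10)))
    print(results)  # same order as the tasks were passed in
    print(f"elapsed: {time.perf_counter() - start:.1f}s")  # ~1s, not ~10s

asyncio.run(timed())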

Step 4: Using backoff to stay within the rate limit

OpenAI enforces rate limits for good reasons: they protect the API against misuse and cap per-user consumption so that everyone gets responsive access without the servers being overloaded.

The backoff library solves exactly this problem: its decorators wrap a function and retry it until a condition is met. The coroutine from Step 2 can be decorated like this:

@backoff.on_exception(backoff.expo, SomeException)

Here, on_exception retries the function whenever the specified exception is raised, and backoff.expo makes the wait between retries grow exponentially. In place of SomeException, we need the exception that is raised when we hit the OpenAI API rate limit.
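
on_exception also takes optional keyword arguments such as max_tries and max_time to stop retrying eventually; the values below are arbitrary examples, not recommendations:

import backoff

# Give up after 8 attempts or 60 seconds in total, whichever comes first
@backoff.on_exception(backoff.expo, ValueError, max_tries=8, max_time=60)
def unreliable():
    ...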

So in our case we will have something like this:


import asyncio
import openai
import backoff

# Set up OpenAI client with your API key
client = openai.AsyncOpenAI(api_key='Your API key')

# Retry with exponential backoff whenever the rate limit is hit
@backoff.on_exception(backoff.expo, openai.RateLimitError)
async def make_api_call_to_gpt(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = await client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content

The exception we are catching is openai.RateLimitError, which the openai library raises when the API’s rate limit is exceeded.
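
Since the decorator doesn’t change the coroutine’s signature, the main function from Step 3 works unchanged with the decorated version; putting it together:

async def main():
    prompts = ["Hello, ChatGPT!", "How does async programming work?"]

    # Each call now retries automatically on openai.RateLimitError
    tasks = [make_api_call_to_gpt(prompt) for prompt in prompts]
    results = await asyncio.gather(*tasks)
    print(results)

asyncio.run(main())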

You can check out the code on GitHub.

Happy coding!

Let’s connect on LinkedIn: aman-tech

If you enjoyed this tutorial, please click the 👏 button and share to help others find it! Feel free to leave a comment below.
