Improving OpenAI GPT Response Time

3 min read · Sep 7, 2023

When developing applications with OpenAI models, the speed of response from the API can often be a critical factor and a potential hindrance to progress. This is a shared concern among numerous developers who work on solutions utilizing OpenAI.

In this article, we will explore several strategies to reduce the response time of GPT API calls.

Asynchronization in Python

Imagine you have a big toy box filled with all kinds of toys. Usually, when you want to play with a toy, you take it out and play with it one by one, right? That’s like how most things work in Python — one thing happens at a time. But sometimes, in special situations, you might want to play with more than one toy at the same time! That’s where “asynchronization” in Python comes in. It’s like having a magical hand that can play with different toys at once without waiting for one toy to finish before starting with another. This is super helpful when you have lots of things to do, like talking to your friend on the phone while watching your favorite show on TV. Asynchronization in Python helps you do many things together without getting stuck, just like a clever multitasking friend!

Important Concepts in Asynchronization

Python implements asynchronization (asynchronous programming) with the built-in asyncio library.
Instead of blocking while a task waits (for example, on a network response), you await it, and Python switches to another task in the meantime, so multiple tasks make progress concurrently.
To make a function asynchronous, you put the async keyword before the def keyword, like this: async def my_function():
To wait for an asynchronous function to finish and get its result, you put the await keyword before the function call, like this: await some_function()
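The concepts above can be sketched in a few lines. This is a minimal illustration, not part of the original notebook: fetch_toy is a hypothetical function, and asyncio.sleep stands in for any slow operation such as a network call. Both tasks wait 0.2 seconds, yet the whole run takes roughly 0.2 seconds rather than 0.4, because the waits overlap.

```python
import asyncio
import time


async def fetch_toy(name: str, delay: float) -> str:
    # asyncio.sleep stands in for a slow network call (hypothetical workload).
    await asyncio.sleep(delay)
    return f"{name} ready"


async def main() -> list:
    # gather schedules both coroutines on the event loop; while one is
    # waiting, the other makes progress.
    return await asyncio.gather(
        fetch_toy("car", 0.2),
        fetch_toy("robot", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

In a plain script, asyncio.run starts the event loop. Inside a Jupyter notebook a loop is already running, so you would typically write await main() in a cell instead.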

Implementation of the Code

import asyncio
import openai
import nest_asyncio
nest_asyncio.apply()

openai.api_key = "<api_key>"
engine = "<engine>"

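First, we need to install and import the required packages: asyncio (part of the Python standard library) for asynchronization, openai to make the OpenAI calls, and nest_asyncio so that the event loop can be reused inside a Jupyter notebook.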

async def gpt_call():
    response = await openai.ChatCompletion.acreate(
        engine=engine,  # The engine name defined above
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
            {"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},
            {"role": "user", "content": "Do other Azure AI services support this too?"}
        ]
    )
    return response['choices'][0]['message']['content']

This is a sample OpenAI call. It uses a chat-based model (gpt-35-turbo/gpt-4). For an async call, we use the acreate method of OpenAI's ChatCompletion class, the awaitable counterpart of create (available in openai library versions before 1.0).

Difference between OpenAI calls in a loop vs. the async method

If we make 20 API calls one after another in a loop, it takes 3 min 30 sec, whereas if we group multiple async calls together and leverage concurrency, the total time drops drastically, to 12.9 sec.
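The shape of that comparison can be reproduced without an API key. The sketch below is an assumption-laden stand-in, not the notebook's code: fake_gpt_call is a hypothetical replacement for gpt_call, and its 0.1 second delay is arbitrary. The loop version awaits each call before starting the next, while the concurrent version hands all calls to asyncio.gather at once.

```python
import asyncio
import time


async def fake_gpt_call(i: int) -> str:
    # Stand-in for `await gpt_call()`; the 0.1 s delay is an arbitrary assumption.
    await asyncio.sleep(0.1)
    return f"response {i}"


async def run_in_loop(n: int) -> list:
    # Each call is awaited to completion before the next one starts.
    return [await fake_gpt_call(i) for i in range(n)]


async def run_concurrently(n: int) -> list:
    # All calls are started together; gather returns results in input order.
    return await asyncio.gather(*(fake_gpt_call(i) for i in range(n)))


def timed(coro):
    # Run a coroutine to completion and report how long it took.
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


loop_results, loop_seconds = timed(run_in_loop(5))
async_results, async_seconds = timed(run_concurrently(5))
print(f"loop: {loop_seconds:.2f}s, concurrent: {async_seconds:.2f}s")
```

With real API calls the speedup depends on server-side latency and rate limits, so the numbers will differ from this simulation.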

Code — openAI-project/GPT API Calling (Sync vs Async).ipynb at main · anurag899/openAI-project · GitHub

A few points to keep in mind

While asynchronous calls can improve performance, be mindful of the rate limits that OpenAI imposes on API requests, expressed in requests per minute and tokens per minute. Review the rate limit for your particular model before settling on an async call strategy. Beyond async calls, a few other methods can help reduce response times:

Avoid redundant responses: Use well-crafted prompts that ask for concise, brief responses.
Shift from GPT-4 to GPT-3.5: If your downstream tasks can be handled by GPT-3.5, use it for those tasks.
Multiple instances of OpenAI: Create multiple OpenAI instances and distribute your API calls among them based on rate limits and availability.
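Returning to the rate-limit point above, one common way to keep a burst of async calls in check is to cap how many run at the same time with asyncio.Semaphore. The sketch below is only an illustration: MAX_CONCURRENT, limited_call, and the tracker dictionary are hypothetical names, and asyncio.sleep again stands in for the real API call. Note that a semaphore limits in-flight requests, not requests per minute, so it is an approximation of a true rate limiter.

```python
import asyncio

MAX_CONCURRENT = 3  # hypothetical cap; choose it from your model's actual rate limit


async def limited_call(sem: asyncio.Semaphore, i: int, tracker: dict) -> str:
    async with sem:  # at most MAX_CONCURRENT coroutines are inside this block at once
        tracker["active"] += 1
        tracker["peak"] = max(tracker["peak"], tracker["active"])
        await asyncio.sleep(0.05)  # stand-in for `await gpt_call()`
        tracker["active"] -= 1
        return f"response {i}"


async def main(n: int):
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    tracker = {"active": 0, "peak": 0}  # records the highest concurrency observed
    results = await asyncio.gather(
        *(limited_call(sem, i, tracker) for i in range(n))
    )
    return results, tracker["peak"]


results, peak = asyncio.run(main(10))
print(len(results), "responses, peak concurrency", peak)
```

All ten calls are still submitted together, but only three are in flight at any moment; the rest wait for a slot.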

I hope you found this article helpful. If you want more content like this, feel free to follow me (https://www.linkedin.com/in/anurag-mishra-660961b7/).

Written by Anurag Mishra