LangChain and OpenAI API — Beyond “RateLimitError” — a solution.

Soumen Sardar
3 min read · Jun 5, 2023


Photo by Silas Köhler on Unsplash

LangChain is a framework that is transforming the way we build language-model-driven applications. By composing language models with prompts, chains, and external data, it pushes beyond what a single conventional API call can accomplish.

OpenAI is at the forefront of artificial intelligence research and development. Its language models, particularly GPT (Generative Pre-trained Transformer), have revolutionized the field of natural language processing (NLP) and found applications across many domains. In this article, we will use OpenAI's models through LangChain for text embedding and tackle a practical obstacle you will quickly run into.

GPT has gained significant attention for its remarkable language-generation capabilities, and with the release of the OpenAI API, developers can now tap into this powerful language model from their own applications. But a "RateLimitError" occurs when you exceed the allowed rate limit: you have submitted too many tokens or requests within a given time frame, so OpenAI's services temporarily restrict further submissions from your end.

In this blog, we will work through a LangChain + OpenAI text-embedding example and try to solve the "RateLimitError".

Understanding RateLimitError

Rate limits are measured in two ways: RPM (requests per minute) and TPM (tokens per minute). For commonly used models such as ada, free-trial users of the text and embedding endpoints are limited to 3 RPM and 150,000 TPM. https://platform.openai.com/docs/guides/rate-limits/what-are-the-rate-limits-for-our-api

Install Libraries

pip install langchain openai tiktoken

Use LangChain and OpenAI for Text Embedding

A basic example that performs text embedding:

from langchain.embeddings import OpenAIEmbeddings
from openai.error import RateLimitError

caption = "A quick brown fox jumps over the lazy dog"
key = "sk-x12x12x12x12xcc12522***" # OPENAI API KEY
try:
    # create an OpenAIEmbeddings() object
    model = OpenAIEmbeddings(openai_api_key=key, model="ada", max_retries=1)
    embedding = model.embed_query(caption)
    # as of now it returns a 1536-dimensional vector (a plain Python list)
    print(len(embedding)) # 1536
except RateLimitError as e:
    # this is what we want to solve
    ...

Reduce Tokens:

To reduce the number of tokens per request, we can remove stop words and punctuation:

import re
from spacy.lang.en.stop_words import STOP_WORDS # one possible source of a stop-word list

def clean(text):
    # lowercase, drop stop words, then remove punctuation and extra whitespace
    text = text.strip().lower()
    text = " ".join(filter(lambda word: word not in STOP_WORDS, text.split(" ")))
    text = re.sub(r'[^\w\s\']', ' ', text)
    text = re.sub(r'[\s\n]+', ' ', text)
    return text.strip()

print(clean("A quick brown fox jumps over the lazy dog")) # quick brown fox jumps lazy dog
...
embedding = model.embed_query(clean(caption))
...
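Since tiktoken is already installed, you can measure the savings by counting tokens before and after cleaning. A minimal sketch, assuming the cl100k_base encoding used by text-embedding-ada-002 (pick the encoding that matches your model if it differs):

import tiktoken

# cl100k_base matches text-embedding-ada-002; other models may use another encoding
enc = tiktoken.get_encoding("cl100k_base")

before = "A quick brown fox jumps over the lazy dog"
after = clean(before)
print(len(enc.encode(before)), len(enc.encode(after))) # token counts before/after cleaning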

Problem:

Although we have reduced the number of tokens using clean(), we are still bound by the RPM and TPM limits. As OpenAI recommends, we can time.sleep() between API calls. https://platform.openai.com/docs/guides/rate-limits/how-do-rate-limits-work
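For illustration, here is a minimal sketch of that pacing approach. It assumes captions is an iterable of cleaned texts and model is the OpenAIEmbeddings object from the earlier snippet; the RPM value is the free-trial limit and should be adjusted to your own quota.

import time

RPM = 3 # assumed free-trial limit; adjust to your account's quota
emb_db = [] # collected (caption, embedding) pairs

for caption in captions: # captions: your iterable of cleaned texts
    embedding = model.embed_query(caption)
    emb_db.append((caption, embedding))
    time.sleep(60 / RPM) # ~20 s gap keeps us under RPM requests per minute

This keeps you under the limit, but it throttles every request even when quota remains, which brings us to a less patient option.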

Solution:

You can borrow an API key from your colleague or friend and use it. Sorry to disappoint you!!

A little help:

Since you now have two credentials, you can alternate between them to access the API service. OpenAICredentialManager() helps you do this seamlessly. First, create an openai.apikey file (a simple CSV file) and save your credentials:

sk-x12x12x12x12xcc125221c22344***,soumen
sk-x12x12x12x12xcc125221c22345***,suman

I have designed a credential manager: you point it at the openai.apikey file and it seamlessly accesses the OpenAI API services, automatically detecting and loading an active API key from the pool in the file. The class is designed as a simple Python iterable.
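The class is not yet published (see the Discussion below), so here is a minimal sketch of the idea; the published version may differ, and the fixed one-minute cooldown here is a simplifying assumption.

import csv
import itertools
import time

class OpenAICredentialManager:
    """Cycle through API keys stored as key,nickname rows in a CSV file."""

    def __init__(self, path, cooldown=60):
        with open(path) as f:
            self.credentials = [tuple(row) for row in csv.reader(f) if row]
        self.cooldown = cooldown # seconds to bench a key after it hits its limit
        self.exhausted_at = {}   # nickname -> timestamp when the limit was hit

    def __iter__(self):
        # cycle endlessly over the pool, yielding (key, nickname) pairs
        return itertools.cycle(self.credentials)

    def is_limit_exhausted(self, nickname):
        ts = self.exhausted_at.get(nickname)
        return ts is not None and time.time() - ts < self.cooldown

    def set_limit_exhausted(self, nickname):
        self.exhausted_at[nickname] = time.time()

With such a manager in place, the loop below rotates to the next key whenever the current one hits its limit.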

from langchain.embeddings import OpenAIEmbeddings
from openai.error import RateLimitError

emb_db = [] # a list to save (caption, embedding) pairs
text_loader = ... # a generator that yields cleaned texts

# create a credential manager object
cred_man = OpenAICredentialManager("./openai.apikey")
cm = iter(cred_man)
key, nickname = next(cm)

# pick caption texts one by one
for caption in text_loader:
    while True:
        model = OpenAIEmbeddings(openai_api_key=key, model="ada", max_retries=1)
        try:
            if cred_man.is_limit_exhausted(nickname):
                raise RateLimitError("Rate limit exhausted for {}".format(nickname))

            # embed a single caption
            embedding = model.embed_query(caption)
            emb_db.append((caption, embedding))
            # time.sleep(60 / rpm)
            break
        except RateLimitError:
            # mark this key as exhausted and rotate to the next one
            cred_man.set_limit_exhausted(nickname)
            key, nickname = next(cm)

Discussion:

I am yet to publish this work to pip and conda. Please leave a comment to gain access to this OpenAICredentialManager().

Thanks for giving your precious time to this post. I hope it is helpful and makes our lives a little easier. If you made it this far and found the article useful, please consider a like, comment your thoughts, and share it.

Regards,

Soumen Sardar


Soumen Sardar is Data Science Lead, AI at Smiths Detection. He holds a B.Tech in CSE, an MS in Data Science from LJMU, a PG ML Certification from Stanford Online, and a Diploma in DL from IIITB.