Integrating ChatGPT/OpenAI

Lessons learned from connecting to OpenAI: use your own data via prompts, think async, collaborate for prompt engineering

Ronny Roeller
NEXT Engineering
3 min read · Mar 25, 2023


Photo by Ecole polytechnique

We recently integrated OpenAI into our application. Here are some of the lessons we learned along the way:

1. Use your own data via prompts

As of today, the latest OpenAI models operate only on public data from the internet. Hence, we can’t ask OpenAI directly about our own proprietary data.

To nonetheless analyze our data, we inject it via the prompt:

Find the most surprising items in below list of user feedback:

- I love OpenAI.
- This is magical.
- Is this for real?
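
For illustration, here is a minimal sketch of this pattern using the Node.js OpenAI SDK. The feedback list and the model name are placeholders, not our production setup:

// Minimal sketch: inject our own data into the prompt (Node.js OpenAI SDK).
// The feedback array and model name are illustrative placeholders.
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function findSurprisingFeedback(feedback: string[]): Promise<string> {
  const prompt =
    "Find the most surprising items in below list of user feedback:\n\n" +
    feedback.map((item) => `- ${item}`).join("\n");

  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: prompt }],
  });

  return completion.choices[0].message.content ?? "";
}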

Before you get too excited, there are a couple of catches.

First, the amount of data that can be injected via prompts is limited. Most OpenAI models allow a maximum of ~4,000 tokens, which translates to roughly 12,000 characters for the prompt and response combined.

To calculate how much data we can include, we subtract from the 12,000 characters the length of the prompt and then reserve another 20% for the response. For a 700 character prompt, this gives us ~9,000 characters of data to inject. If we require more data, we split the input data and create multiple requests to OpenAI.
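
A rough sketch of that budget calculation and the chunking, with the numbers from above hard-coded for illustration:

// Rough character budget, assuming ~12,000 characters for prompt + response.
// The constants mirror the calculation above; chunking is deliberately naive.
const MAX_CHARS = 12_000;
const RESPONSE_RESERVE = Math.floor(MAX_CHARS * 0.2); // ~2,400 characters for the response

function chunkFeedback(promptTemplate: string, items: string[]): string[][] {
  const budget = MAX_CHARS - promptTemplate.length - RESPONSE_RESERVE;
  const chunks: string[][] = [];
  let current: string[] = [];
  let used = 0;

  for (const item of items) {
    const cost = item.length + 3; // "- " prefix plus newline
    if (used + cost > budget && current.length > 0) {
      chunks.push(current); // start a new chunk -> one OpenAI request per chunk
      current = [];
      used = 0;
    }
    current.push(item);
    used += cost;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}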

This brings us to the second catch: OpenAI can quickly get pretty expensive. A GPT-4 query with 4,000 tokens costs a whopping $0.24 at the moment. If we need lots of input data, we fall back to cheaper models like GPT-3.5-turbo, for which the same 4,000 tokens cost less than $0.01.

And there is a final catch: OpenAI might train their models on your prompts, which could surface your data in the queries of other users. Here is a great article discussing the risks of leaking confidential data.

2. Think async and prepare for timeouts

The OpenAI endpoints are synchronous. Yet that’s a bit misleading, as requests might actually take minutes to return. Even more challenging, there is huge variance in response times: the same call might return in a couple of seconds or in minutes. Unsurprisingly, we were soon running into all kinds of timeouts in our stack, from AppSync to Lambda.

To prevent these timeouts, we added an asynchronous wrapper around all OpenAI calls. For example: if an AppSync endpoint triggers an OpenAI request, we immediately return a handle to the caller. Once we receive the OpenAI response, we store the result in DynamoDB. The client can then query AppSync for the state of the handle and receive the OpenAI response via DynamoDB.
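
In simplified form, the wrapper looks roughly like this. Table name, field names, and the fire-and-forget call are illustrative; in our stack the OpenAI call runs in its own Lambda invocation:

// Simplified sketch of the async wrapper (AWS SDK v3 + Node.js OpenAI SDK).
// Table and field names are illustrative, not our exact schema.
import { randomUUID } from "crypto";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand, GetCommand } from "@aws-sdk/lib-dynamodb";
import OpenAI from "openai";

const db = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const openai = new OpenAI();
const TABLE = "openai-results"; // hypothetical table name

// Called by the AppSync resolver: returns a handle immediately,
// then completes the OpenAI call in the background.
export async function startRequest(prompt: string): Promise<string> {
  const handle = randomUUID();
  await db.send(new PutCommand({ TableName: TABLE, Item: { handle, state: "PENDING" } }));

  // Fire-and-forget here only for illustration; error handling omitted for brevity.
  void openai.chat.completions
    .create({ model: "gpt-3.5-turbo", messages: [{ role: "user", content: prompt }] })
    .then((res) =>
      db.send(new PutCommand({
        TableName: TABLE,
        Item: { handle, state: "DONE", result: res.choices[0].message.content },
      }))
    );

  return handle;
}

// Called when the client polls AppSync for the state of the handle.
export async function getResult(handle: string) {
  const { Item } = await db.send(new GetCommand({ TableName: TABLE, Key: { handle } }));
  return Item; // { handle, state, result? } — client keeps polling until state is "DONE"
}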

3. Collaborate for prompt engineering

Crafting good prompts turned out to be rather challenging because it requires a) an in-depth understanding of the business domain to get sensible results, as well as b) a solid grasp of engineering to receive the results in a machine-readable format.

We therefore have somebody from the business side create the initial version of our prompts via the OpenAI playground/ChatGPT. Afterwards, an engineer adjusts the prompt to make the responses machine-readable. From there, we iterate until we’re happy with the prompt.

To receive machine-readable responses, we instruct OpenAI to return results as JSON (consider a more compact format if you expect long responses):

Find the most surprising items in below list of user inputs.

Return the result as a JSON object, following the format template.
---BEGIN FORMAT TEMPLATE---
{
"title": ${TITLE},
"summary": ${SUMMARY}
}
---END FORMAT TEMPLATE---
---BEGIN USER INPUTS---
{{DATA}}
---END USER INPUTS---

This prompt almost always returns JSON, but not quite always. Sometimes OpenAI decides to add further explanation before or after the JSON. We use a simple regex to strip that unwanted text away (/[^{]*({.*})[^}]*/).
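
A small helper along those lines; the s flag is our addition so the match also spans multi-line JSON:

// Strip any explanation OpenAI wraps around the JSON, then parse it.
// Mirrors the regex above; returns null when no JSON object is found or parsing fails.
function extractJson(response: string): unknown | null {
  const match = response.match(/[^{]*({.*})[^}]*/s);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null;
  }
}

// Example: extractJson('Sure! Here is the result: {"title": "A", "summary": "B"}')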

Happy coding!


CTO at nextapp.co # Product discovery platform for high performing teams that bring their customers into every decision