How to leverage ChatGPT’s new JSON mode to build better user experiences

Sean Grindal
5 min read · Nov 15, 2023


With OpenAI’s first developer conference came a new feature for ChatGPT: JSON mode. If you’re interested in building powerful AI tools, you should be leveraging it.

Motivation — Why JSON is better

In my experience, JSON responses are better than plain string responses for almost every application.

Let’s demonstrate this with the example of a language-translation app. First, let’s ask ChatGPT a standard translation question: “How do I say hello in Japanese?”
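The exact wording varies, but the answer usually comes back as a single block of prose along these lines:

“Hello” in Japanese is こんにちは, which is romanized as “Konnichiwa.” It’s a common greeting used during the daytime. Let me know if you’d like help with pronunciation or with other greetings!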

This response is pretty good, but it’s not ideal. It’s not easy to skim, and it doesn’t offer any extra functionality. For example, how do I correctly pronounce Konnichiwa anyway? And what if I want to use this response to create a flash card for future review? With this format I’d have to manually copy sections of the response into other parts of my application.

Yet, this certainly doesn’t need to be the case. The above response is just stringing together four discrete bits of data with natural language:

  1. The initial phrase
  2. The romanized translation
  3. The translation in Japanese writing
  4. Extra context

Considering this is the only information I care about, ideally the response should just be the data itself.
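Using the key names from the prompt we’ll write in the Backend section below (the exact fields are up to you), that might look like:

{
  "type": "to-japanese",
  "phrase": "Hello",
  "romaji": "Konnichiwa",
  "kana": "こんにちは",
  "context": "A standard daytime greeting."
}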

Better yet, we aren’t limited to just one response format. In my experience, the newly released GPT-4 Turbo can reliably choose which of several prompt-specified JSON formats is best for the current question. You can then leverage those different formats for different question styles.

So now let’s say we are going from Japanese → English. We can force the response into a similar but slightly different format:

  1. The initial Japanese phrase
  2. The English translation
  3. Extra context
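Reusing the key names from the upcoming prompt, a response in that format might look like:

{
  "type": "to-english",
  "phrase": "お元気ですか (o-genki desu ka)",
  "translation": "How are you?",
  "context": "A polite way to ask about someone's well-being."
}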

On the frontend, each of these formats can then be rendered with its own tailored UI component.

The same benefits apply, at least in part, to almost any application, because the underlying principle holds everywhere.

Most language model responses are a combination of some useful data with a bunch of natural language junk tossed in. JSON allows a developer to cut out all the garbage and just show the data the user wants in an optimized format.

This structured data can then also be used for other purposes in the interface, like performing additional operations on sections of the response. With the above response as an example, I can add a single button to my application that turns the response into a flash card. It can do that because the data is already parsed. Overall, this pattern leads to a much cleaner and more powerful UX for interfacing with AI language models.
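As a minimal sketch (the flash-card shape here is hypothetical, not from any real library), that button’s handler only has to map fields across:

// Because the response is already structured, building a flash card
// is just a field mapping — no string parsing required.
function toFlashCard(response) {
  return {
    front: response.phrase,                        // e.g. "Hello"
    back: `${response.kana} (${response.romaji})`, // e.g. "こんにちは (Konnichiwa)"
    notes: response.context,
  };
}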

Finally, if a question doesn’t conform to our preset formats, we can always fall back to a standard-looking response, losing nothing in the process.

Now that the motivation is out of the way, let’s talk implementation.

Backend

First we need our language model to respond in JSON only. For this article I’ll be using GPT-4 Turbo. We can do that with the following config.

import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.chat.completions.create({
  model: 'gpt-4-1106-preview', // GPT-4 Turbo
  stream: true,
  response_format: {
    type: 'json_object', // JSON mode: the complete response is guaranteed to parse
  },
  messages: [...], // system prompt + chat history
});

Next we need to feed the model a prompt that tells it what JSON formats to enforce. The following is an example prompt that specifies formats for different cases.

You are a Japanese tutor and will be helping users answer questions about Japanese. You will have a few formats to respond with, but you will always respond in JSON.

If the user asks you to translate something into Japanese, you will respond with the JSON keys “type”, “phrase”, “romaji”, “kana”, and “context”. For example, you will use this format if the user asks “How would I say ‘hello’ in Japanese?”. The “type” key will always be “to-japanese” for this kind of response.

If the user asks you to translate Japanese into English, you will also respond in JSON, but with only the keys “type”, “phrase”, “translation”, and “context”. For example, use this format if they ask questions like, “What does ‘o-genki desu ka’ mean?”. The “type” key will always be “to-english” for this kind of response.

If a user asks you any other general question that doesn’t fall into the above categories, respond in JSON with just the “type” and “answer” keys. For example, if they ask, “Can you please explain the ‘wa’ particle to me?”, you will respond in this format. The “type” key will always be “general” for this kind of response.

Pay close attention to the last case, which acts as a backup. On the frontend we can then use whatever display format we choose by reading the type and changing the UI component accordingly.
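As a sketch, that dispatch can be a simple switch over the type key (the component names here are hypothetical placeholders):

// Pick a renderer based on the "type" key, falling back to a plain
// text view when the type is missing or unrecognized.
function renderResponse(data) {
  switch (data?.type) {
    case 'to-japanese': return ToJapaneseCard(data);
    case 'to-english':  return ToEnglishCard(data);
    case 'general':     return GeneralAnswer(data);
    default:            return PlainTextFallback(data);
  }
}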

This pattern extends to an arbitrary number of cases and formats. Note that this prompt is shortened for readability; in practice I would provide more examples and cases. If you’re interested in writing good prompts, check out OpenAI’s tips. Prompts are powerful, so you want to get them right.

Now we have a bit of a problem. JSON mode only guarantees that the final, complete response will be parsable JSON. However, the response is streamed just like any regular string response, so the partial text received so far won’t always be valid JSON. Mid-stream it might look like the following.

`{"fruit": "tangerine", "origin": "United Sta`

That means we can’t parse the JSON with a simple JSON.parse() until the entire response has been received. This is obviously less than ideal, but solving for this isn’t much trouble. For now let’s pass the response onto the frontend, and solve this there.
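As a sketch, assuming an Express-style server (the route and response handling here are assumptions; the stream iteration is the openai SDK’s standard async-iterator API):

// Forward each streamed token to the client as it arrives, leaving
// the partial-JSON problem for the frontend to solve.
app.post('/api/chat', async (req, res) => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4-1106-preview',
    stream: true,
    response_format: { type: 'json_object' },
    messages: req.body.messages,
  });

  res.setHeader('Content-Type', 'text/plain; charset=utf-8');
  for await (const chunk of response) {
    // each chunk carries a small delta of the overall JSON string
    res.write(chunk.choices[0]?.delta?.content ?? '');
  }
  res.end();
});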

Frontend

Now that we are receiving our incomplete JSON strings, we need to parse them into usable data. We can do that with a function capable of converting incomplete JSON strings into valid objects. In code:

const incomplete = `{"fruit": "tangerine", "origin": "United Sta`;
const usableData = parse(incomplete);
console.log(usableData);

/*
{
  fruit: 'tangerine',
  origin: 'United Sta'
}
*/

Then we just run that parse function on every AI stream update. Writing this function is non-trivial, so I’m using an NPM package wrapped with my own fallbacks, which works well for this purpose. You can find that parsing package here.
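If you’d rather see the idea than pull in a dependency, here is a simplified sketch of such a parser. It only handles closing a dangling string and unbalanced braces/brackets; a real package covers many more edge cases:

// Best-effort parse: try the string as-is, then try again after
// closing any unterminated string and unbalanced braces/brackets.
function parse(incomplete) {
  try {
    return JSON.parse(incomplete);
  } catch {}

  const closers = [];
  let inString = false;
  let escaped = false;

  for (const ch of incomplete) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === '{') closers.push('}');
    else if (ch === '[') closers.push(']');
    else if (ch === '}' || ch === ']') closers.pop();
  }

  let repaired = incomplete;
  if (escaped) repaired = repaired.slice(0, -1); // drop a dangling backslash
  if (inString) repaired += '"';                 // close the open string
  repaired = repaired.replace(/[,:]\s*$/, (m) =>
    m.trimStart().startsWith(':') ? ': null' : '' // complete or drop a dangling separator
  );
  while (closers.length) repaired += closers.pop();

  try {
    return JSON.parse(repaired);
  } catch {
    return null; // still unparsable; the caller should fall back
  }
}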

Great, now we have usable data! Finally, we can support each of the different type values of the parsed response object in the frontend. Just make sure to provide fallbacks in case the JSON can’t be parsed or the type key isn’t present.
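Putting the pieces together, the client-side update loop might look like this sketch (aiStream, render, and the fallback component are placeholders; how you receive chunks depends on your transport):

// On every stream update: accumulate the raw text, best-effort parse
// it, and re-render only when we get something usable back.
let buffer = '';
for await (const chunk of aiStream) { // however your client receives chunks
  buffer += chunk;
  const data = parse(buffer);
  if (data) {
    render(renderResponse(data)); // dispatch on "type", as shown earlier
  } else {
    render(PlainTextFallback({ answer: buffer })); // unparsable: show raw text
  }
}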

To see the parsing in action, I spun up a Japanese learning app called Lingoza, which uses the examples discussed here. Check it out and you should see the different JSON formats get parsed live.

Thanks for reading, and I hope you’ve found some of this helpful. If you want to chat further, you can connect with me on Twitter.

Happy building!
