Mastering the GPT-4o API: turn mumbo jumbo into a neat JSON schema!

Bogusz Stefańczyk
6 min read · Jul 16, 2024


In recent months, LLMs have slowly started flipping the world of programming on its head. Some text- and image-processing tasks that are virtually impossible to express in a programming language can now be solved with just a few English sentences. But as soon as we plug ChatGPT into our code and words start flowing out of it, we face a new challenge: reliably parsing the output text into variables we can use downstream.

Why ChatGPT outputs are not code-friendly

Let's start with a practical example: say we're building a diet-planning app, and we need a list of ingredients given a dish name. Luckily, ChatGPT knows all the recipes, so let's ask it!
Note: the examples use the openai Python library and require setting an OpenAI API key first. Learn how to set it up, and the details of the API, at https://platform.openai.com/docs/api-reference/introduction

import openai

PROMPT = """
You're a meal planner. Provided dish name, output list of ingredients.
"""
response = openai.ChatCompletion.create(model="gpt-4o-2024-05-13", messages=[
    {"role": "system", "content": PROMPT},
    {"role": "user", "content": "baked beans"},
])
print(response["choices"][0]["message"]["content"])

Sure! Here are the ingredients typically needed for baked beans:

1. Navy beans (or any small white beans)
2. Bacon or salt pork (optional for added flavor)
3. Onion
4. Brown sugar
5. Molasses
6. Ketchup or tomato paste
7. Mustard (usually yellow or Dijon)

Feel free to adjust quantities according to your taste and dietary preferences!

Nice, we have the ingredients list! But in this format it's basically useless: parsing it into something we can manage in Python is close to impossible.
How much better would it be if we could just get JSON that we can parse directly! Let's try giving it an example to follow:

PROMPT = """
You're a meal planner. Provided dish name, output list of ingredients following this example:
{
  "ingredients": [
    {"name": "carrot", "amount": 250, "unit": "g"},
    {"name": "tomatoes", "amount": 300, "unit": "g"}
  ]
}
"""
response = openai.ChatCompletion.create(model="gpt-4o-2024-05-13", messages=[
    {"role": "system", "content": PROMPT},
    {"role": "user", "content": "baked beans"},
])
print(response["choices"][0]["message"]["content"])

{
  "ingredients": [
    {"name": "canned beans", "amount": 400, "unit": "g"},
    {"name": "onion", "amount": 1, "unit": "medium"},
    {"name": "garlic cloves", "amount": 2, "unit": "pieces"},
    {"name": "tomato paste", "amount": 2, "unit": "tbsp"},
    {"name": "mustard", "amount": 1, "unit": "tsp"},
    {"name": "water", "amount": 120, "unit": "ml"}
  ]
}

Voilà! Seems like we got the job done, right? Let’s take a closer look.

First: we have no control over what the units will be. Just in this example we got: g, medium(?), pieces, tbsp, tsp & ml.
I don't even want to think about what else it could come up with. Imagine you want your app to prepare a shopping list, adding up amounts of the same ingredient across multiple dishes; there is no formula for adding 1 medium onion to 200 g of onion.
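To make the aggregation problem concrete, here is a minimal sketch (data invented for illustration). Summing only works when entries for the same ingredient share a unit, so the shopping list below is keyed by (name, unit) to keep mismatched units apart rather than silently mixing them:

```python
from collections import defaultdict

# Hypothetical parsed ingredient lists for two dishes -- addable only
# because amounts for the same ingredient share a unit
dish_a = [{"name": "onion", "amount": 150, "unit": "g"},
          {"name": "garlic", "amount": 10, "unit": "g"}]
dish_b = [{"name": "onion", "amount": 200, "unit": "g"}]

shopping_list = defaultdict(float)
for ingredient in dish_a + dish_b:
    # key by (name, unit): "1 medium onion" and "200 g onion" would land
    # in separate buckets instead of being added together incorrectly
    shopping_list[(ingredient["name"], ingredient["unit"])] += ingredient["amount"]

print(dict(shopping_list))
# {('onion', 'g'): 350.0, ('garlic', 'g'): 10.0}
```

With free-form units like "medium", every onion entry would end up in its own bucket, which is exactly the failure mode described above.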

Second: we have no guarantees on the output format for different prompts. I ran a little experiment where the user input, instead of "baked beans", was "baked beans, with cooking instructions included", and the output changed to:

{
  "ingredients": [...],
  "instructions": "Preheat oven to 350°F (175°C). In a large oven-safe pot, sauté the chopped onion and minced garlic over medium heat until softened. [...]"
}

This is definitely not something we want for a production application.

What are ChatGPT tools

The key to solving these issues is a simple yet powerful mechanism called tools. Tools are available in the APIs of both OpenAI's ChatGPT and Google's Gemini, so this solution can easily be applied to both.

If you're using ChatGPT Plus, you might recognize tools better as Plugins. These are additional APIs or actions that the assistant can choose to execute to get more information, browse the web, run code, etc.

ChatGPT (https://chat.openai.com/) assistant using the Web Browsing tool.

Under the hood, the LLM behind ChatGPT gets a list of tools it can use, each with a name, description and allowed parameters. The model then chooses whether to output regular text or to call one of the tools, specifying its name and building a list of parameters.
It's important to note that the ChatGPT model doesn't actually execute any code, send requests or call any API. All it does is output text saying "call tool X with params Y, Z". What's important for us is that this text obeys a strict schema and parameter definitions, so we can reliably read it from Python.
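For illustration, a single tool call in the Chat Completions response has roughly the following shape (the id and argument values here are made up); note that `arguments` arrives as a JSON-encoded string, not a dict:

```python
import json

# Sketch of one entry of message["tool_calls"] in the API response
# (id and values invented for illustration)
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {
        "name": "list_of_ingredients_callback",
        # arguments is a JSON *string* that still needs parsing
        "arguments": '{"ingredient_0": {"name": "onion", "amount": 150, "unit": "g"}}',
    },
}

# One json.loads turns the arguments into plain Python dicts
args = json.loads(tool_call["function"]["arguments"])
print(args["ingredient_0"]["name"])  # -> onion
```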

How to use ChatGPT tools to enforce output schema

Tools are designed to give LLMs the capability to execute actions beyond text generation. So how can we use them to generate text, but in a predefined schema? We need to trick the model into thinking it's executing an action!

One way to do this is to describe the tool as a "callback" the model needs to invoke in order to provide the response. Think of it like a callback in programming languages: you pass a function that should be called when the response is ready.

Let's start with the basics: how does one define a tool when calling the ChatGPT API? Tools are basically a list of Python objects following the OpenAI schema: https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools

A single tool definition consists of:

  • name — think of it as the name of a function
  • description — defining the purpose of the function along with why and when it should be called
  • parameters — defining what data needs to be passed to the function, along with its types, allowed values and constraints

Parameters are defined as JSON Schema objects, a very powerful and robust system with dozens of options: https://json-schema.org/understanding-json-schema/reference/type
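To give a feel for the options, here is a small standalone sketch (not yet wired into any API call) of a few common JSON Schema constraints:

```python
# A few common JSON Schema constraint keywords, as plain Python dicts
amount_schema = {
    "type": "number",
    "minimum": 0,  # numeric range constraint: reject negative amounts
    "description": "Amount of the ingredient used",
}
unit_schema = {
    "type": "string",
    "enum": ["g", "ml", "piece"],  # closed set of allowed values
}
name_schema = {
    "type": "string",
    "maxLength": 50,  # length constraint on free-text fields
}
```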

The last step is to enforce using the tool. In the OpenAI API this can be done either by passing tool_choice='required', which forces calling any of the defined tools, or by passing a specific function to force using that single one.
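For reference, forcing one specific function uses a dict instead of the 'required' string. A sketch of the two variants (the function name is the one defined in the example below):

```python
# Model must call *some* defined tool:
any_tool = "required"

# Model must call this exact function (per the OpenAI tool_choice spec):
forced_choice = {
    "type": "function",
    "function": {"name": "list_of_ingredients_callback"},
}
# Either value would be passed as the tool_choice argument of
# openai.ChatCompletion.create(...)
```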

Example of enforcing response schema using tools

Going back to the example, let's define a tool that we will pass to the API:

PARAM_SPEC = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Name of the ingredient"},
        "unit": {"type": "string", "enum": ["g", "piece", "ml", "tbsp", "tsp"], "description": "Enum unit of amount"},
        "amount": {"type": "number", "description": "Amount of the ingredient used"},
    },
    "required": ["name", "unit", "amount"],
}
TOOLS = [{
    "type": "function",
    "function": {
        "name": "list_of_ingredients_callback",
        "description": "Send list of ingredients back to the user",
        "parameters": {
            "type": "object",
            "properties": {
                f"ingredient_{i}": PARAM_SPEC
                for i in range(10)  # max number of ingredients - see note below
            },
        },
    },
}]

The function is called list_of_ingredients_callback. It accepts multiple ingredient_X parameters, each with 3 keys: name (string), amount (number) and unit (an enum with 5 allowed values).
NOTE: ingredient_X is a hack to work around OpenAI API spec problems. Normally, you would use a single parameter, ingredients, which would accept a list of objects. You can do this, but then the enum values of the parameter will not be respected, and the model will continue outputting whatever it likes. As a bonus, this way you can limit the maximum number of results in the array.

Now let’s once again execute the prompt, this time with the tool:

import json

response = openai.ChatCompletion.create(
    model="gpt-4o-2024-05-13",
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": "baked beans + add an instruction how to cook it"},
    ],
    tools=TOOLS,
    tool_choice='required',
)

tool_call = response['choices'][0]['message']['tool_calls'][0]['function']
print(tool_call['name'])
print(json.loads(tool_call['arguments']))

list_of_ingredients_callback
{'ingredient_0': {'name': 'canned beans', 'unit': 'g', 'amount': 800}, 'ingredient_1': {'name': 'onion', 'unit': 'g', 'amount': 150}, 'ingredient_2': {'name': 'tomato sauce', 'unit': 'ml', 'amount': 400}, 'ingredient_3': {'name': 'brown sugar', 'unit': 'tbsp', 'amount': 2}, 'ingredient_4': {'name': 'molasses', 'unit': 'tbsp', 'amount': 1}, 'ingredient_5': {'name': 'mustard', 'unit': 'tsp', 'amount': 1}, 'ingredient_6': {'name': 'bacon', 'unit': 'g', 'amount': 100}, 'ingredient_7': {'name': 'salt', 'unit': 'tsp', 'amount': 1}, 'ingredient_8': {'name': 'black pepper', 'unit': 'tsp', 'amount': 0.5}}

As we can see, the model called the tool, and the arguments match exactly the schema we specified in the tool configuration.
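One practical follow-up: the ingredient_X workaround leaves us with numbered keys rather than a list, so a small post-processing step (a sketch, using values copied from the output above, truncated) restores the natural list shape:

```python
# Parsed tool-call arguments (truncated copy of the example output)
arguments = {
    "ingredient_0": {"name": "canned beans", "unit": "g", "amount": 800},
    "ingredient_1": {"name": "onion", "unit": "g", "amount": 150},
    "ingredient_2": {"name": "tomato sauce", "unit": "ml", "amount": 400},
}

# Sort by the numeric suffix so "ingredient_10" doesn't sort before "ingredient_2"
ingredients = [arguments[key]
               for key in sorted(arguments, key=lambda k: int(k.split("_")[1]))]
print(ingredients[0])  # -> {'name': 'canned beans', 'unit': 'g', 'amount': 800}
```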

Summary

An LLM's text-based interface can be cumbersome when you interact with it from code.

Luckily, both the ChatGPT and Gemini APIs provide mechanisms to help with that! Tools are a life-saver when it comes to making these APIs manageable, providing a fixed, reliable JSON-based schema.

And that's not the end of what tools can do! By leveraging the tool_choice parameter, you can let the LLM decide whether to output raw text (for example, an error message) or call one or more of your predefined callbacks, and react based on its choice. This really opens up a whole new world of possibilities in human-computer interaction.
