DATA STORIES | GENAI | KNIME ANALYTICS PLATFORM

Exploring the Benefits of OpenAI’s Structured Outputs in KNIME

Learn what Structured Outputs are, why they matter, and see them in action in 5 use cases

Martin Dieste
Low Code for Data Science

--

Photo by Coline Beulin on Unsplash.

In this article, I will walk you through OpenAI's recently announced Structured Outputs feature and how it can significantly benefit various use cases in KNIME.

We will explore what structured outputs are, their importance in leveraging large language models (LLMs), and practical applications within KNIME.

Understanding Structured Outputs

What Are Structured Outputs?

Structured outputs allow developers to define a specific response format, expressed as a JSON schema, that the large language model (LLM) must follow when generating responses. For instance, if we require the LLM to always include a field labeled “key” in its response, it will consistently return its output in that JSON format.
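
As a toy illustration (my own simplified example, not one of the schemas from the workflows below), a schema requiring that single “key” field could look like this:

# Toy schema: force the model to return exactly one string field "key".
# (Simplified for exposition; complete schemas follow later in the article.)
schema = {
    "type": "object",
    "properties": {
        "key": {"type": "string"}
    },
    "required": ["key"],
    "additionalProperties": False,
}

# A compliant model response will then always look like: {"key": "some value"}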

Why Is This Important?

Structured outputs are crucial in applications involving LLMs. Take a chat application that accesses current weather data as an example: the LLM needs to reliably extract data from user queries (e.g., location and date) and format this information in a way that is compatible with external APIs. Historically, achieving this level of reliability was challenging.

Performance Improvements in Structured Outputs

The idea of forcing a large language model to respond in a specific format is not new. The first attempts relied on simple prompt engineering, but performance was poor. As models started to get fine-tuned for this use case, results improved, and later features like JSON mode were introduced to make things like tool calling possible. The image below is taken from the OpenAI blog post on this topic. On the right-hand side, you can see that the green bar in the bar chart shows 100% compliance when the new Structured Outputs feature is used (with “strict” set to true). This is a significant improvement over the early days, when models like gpt-4-0613 were only 40% or less reliable.

This new level of reliability will allow developers to use LLMs more comfortably and confidently in their applications.

Performance development — LLMs responding in pre-defined format.

Practical Use Cases in KNIME

Since the release of the Structured Outputs feature, I've been trying to identify potential use cases and implement them as minimum viable products in KNIME. And I'm happy to say that I was rather successful.

I also recorded a video and published it on my YouTube Channel — feel free to watch and subscribe, if you want to see the below use cases in action:

Download the workflow showcasing the five use cases below from the KNIME Community Hub.

Use Case 1: Chat Application with Structured Outputs

To start, I implemented one of the examples provided in the OpenAI blog post: a chat application for question answering. The key point here is that the answer is split into two parts. One is the final answer, and the second is a list of reasoning steps that show exactly how the large language model arrived at that answer. It is implemented as a data app with a composite view, which provides input fields for an OpenAI API key, a model selection, a system message, and a user query.

User Interface — Chat App that provides final answer and separate reasoning steps.

Once the request is sent to the API via the update button, the large language model returns a response in accordance with a JSON schema that requires two fields: “answer” and “reasoning_steps”, the latter containing the list of reasoning steps. I used the following schema inside the data app:

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "reasoning_schema",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "reasoning_steps": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    },
                    "description": "The reasoning steps leading to the final conclusion."
                },
                "answer": {
                    "type": "string",
                    "description": "The final answer, taking into account the reasoning steps."
                }
            },
            "required": ["reasoning_steps", "answer"],
            "additionalProperties": False
        }
    }
}
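
For context, here is a minimal sketch of how such a response_format can be passed to the OpenAI API from a Python Script node (my own illustration; the model name and messages are placeholders, and the actual component adds more handling around this call):

import json
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # key comes from the data app input field

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # a model that supports Structured Outputs
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How can I solve 8x + 7 = -23?"},
    ],
    response_format=response_format,  # the schema defined above
)

# With "strict": True, the content is guaranteed to match the schema,
# so it can be parsed directly into the two output tables.
parsed = json.loads(completion.choices[0].message.content)
print(parsed["answer"])
for step in parsed["reasoning_steps"]:
    print("-", step)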

The corresponding response from the LLM, provided it matches the defined response format, is parsed into the following two tables that are shown in the data app:

User Interface — Table containing Final Answer.
User Interface — Table containing reasoning steps.

I felt quite positive after implementing this very simple example, but I also recognized that this data app was quite rigid, as it did not allow the provided schema to be modified easily. That's a weakness I tried to remedy in my second use case.

Use Case 2: Extracting Structured Data from Unstructured Sources

I decided to build a component intended for data processing within the workflow, rather than for interaction in a data app. I also wanted to take up the challenge of building in the flexibility to provide a custom schema that is not hard-coded into my Python code. That's why I removed the interactivity and UI offered by component composite views. Instead, the component in this second use case relies on Configuration nodes, whose settings are exposed in the component configuration dialog.

New Component — Configuration Dialogue.

The required inputs remain largely the same. However, in this case there is one additional field: a valid JSON schema. The user can provide a schema, and the component will output a table that contains a column for each parameter defined under the “properties” object.

You may notice that the field for the user query (i.e., the unstructured text) is missing. This is because the query text is provided as an input table, here built with a Table Creator node.

Synthetic data in Input Table.

I created the dummy data shown in the table above, with the goal of extracting the customer name, the contract value, the date, and the purchased item. I also tried to trick the model by using different terms for the contract value: in the first row I referred to it as “contract price”, in the second as “contract value”. I did the same for the purchased products: in the first row I wrote “purchased”, in the second “product”.
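
A schema along the following lines would instruct the model to return those four fields (an illustrative reconstruction; the field names are my assumptions, not the exact schema I entered into the component):

# Illustrative extraction schema for the synthetic contract data.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "contract_extraction_schema",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "customer_name": {"type": "string"},
                "contract_value": {"type": "number"},
                "date": {"type": "string"},
                "purchased_item": {"type": "string"},
            },
            "required": ["customer_name", "contract_value", "date", "purchased_item"],
            "additionalProperties": False,
        },
    },
}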

New Component Output.

After running the data through the modified component, I was quite pleased with the outcome: it was very accurate and reliable. I was especially surprised that the dates were parsed into a consistent format, making them easy to work with in downstream nodes.

As the artificial data I created was very simplistic, I decided to use the same component again on the data for Just KNIME It Season 3 — Challenge 13, where we had to extract contract data from 100 investment agreements in PDF format.

Input Data Excerpt — Just KNIME It Season 3 — Challenge 13.
Output Data — Component used on two longer PDF documents from Just KNIME It Season 3 — Challenge 13.

The Structured Outputs feature again worked very well on this larger corpus of documents, so I was ready to start exploring the third use case.

Use Case 3: Workflow Routing Based on Structured Outputs

Next, I wanted to explore constraining the values a model can respond with to a defined list provided in the schema. This is achieved by adding the “enum” attribute to a property defined in the JSON schema and providing a list of allowed response options.

I decided to simply reuse my component from the previous example and to redefine the schema.

Schema with “enum” parameter which limits the response options for the attribute “documentType”.
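
In text form, the enum-constrained schema looks roughly like this (an illustrative reconstruction of the screenshot above; only the “documentType” attribute is taken from the article):

# The "enum" attribute restricts the model to the listed values.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "document_classification_schema",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "documentType": {
                    "type": "string",
                    "enum": ["invoice", "purchase order"],
                    "description": "The classification of the provided document.",
                }
            },
            "required": ["documentType"],
            "additionalProperties": False,
        },
    },
}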

I found different examples of invoices and purchase orders online from different datasets (17 in total) and tried to make OpenAI classify them as “invoice” or “purchase order” for me.

Output of Document classification.

This worked very well: the model did not make a single error. I then used the new Expression node with its built-in switch function to add a new column named Routing, which contains 0 if the document is an invoice, 1 if it is a purchase order, and 2 if it is anything else, in case the model hallucinates.
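
The logic of that switch expression, written out in Python for clarity (the workflow itself uses the Expression node, not a script):

def route(document_type: str) -> int:
    # Map the LLM's classification to a branch index for the Case Switch.
    mapping = {"invoice": 0, "purchase order": 1}
    return mapping.get(document_type, 2)  # 2 = fallback for anything else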

Routing workflow data based on LLM classification.

The above screenshot illustrates the rest of the workflow, where I iterated over each row inside a Chunk Loop, using the Routing column as a variable to define which branch a Case Switch Start node activates. The top port leads to processing an invoice, the middle port to processing a purchase order, and the bottom port handles the exception in case the LLM doesn't stick to the provided schema. The latter did not happen a single time in this example or in any of my other runs.

Use Case 4: Email Categorization and Summarization

After experimenting a fair bit with topics that I personally would not really be using, I decided to challenge myself and build something I might actually benefit from.

I used the KNIME Email Processing extension to connect to my YouTube channel’s Gmail account, generated some artificial emails using ChatGPT, and sent them to my Gmail account from my business account.

After fetching these emails, it occurred to me that simply feeding emails people sent me into the OpenAI API may not be the best idea I have ever had. So, from a data privacy perspective, I decided to find a solution that sends only anonymized data.

Luckily, since the release of KNIME version 5.3, there is an extension called the KNIME Presidio extension, and Microsoft Presidio does just that: it analyzes a corpus of input text for named entities, identifies them, and anonymizes them.

Output of Presidio Analyzer.

Once the Presidio Analyzer identifies named entities, its output can easily be connected to a Presidio Anonymizer to replace the entities with placeholder values:

Example — original Email (left) vs. Anonymised Version (right).
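
Under the hood, these nodes wrap Microsoft's open-source Presidio libraries. A minimal sketch of the same analyze-then-anonymize step in plain Python (assuming the presidio-analyzer and presidio-anonymizer packages; the email text is made up) looks like this:

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

email_text = "Hi, this is John Doe from Acme Corp, call me at 555-0199."

# Detect named entities (persons, organizations, phone numbers, ...).
analyzer_results = AnalyzerEngine().analyze(text=email_text, language="en")

# Replace the detected entities with placeholder values.
anonymized = AnonymizerEngine().anonymize(
    text=email_text, analyzer_results=analyzer_results
)
print(anonymized.text)  # e.g., "Hi, this is <PERSON> from <ORGANIZATION>, ..."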

With the data privacy concerns alleviated, I was ready to modify the schema of my data processing component and send the anonymized emails on their way to OpenAI. My goal was to categorize the emails based on their content and to extract a concise summary. I reflected these changes in the system message and in the JSON schema I entered into the component.

Component configuration — Email Classification and Summarization.
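
The schema followed the same pattern as before, roughly as sketched here (illustrative; the exact field names and category list are in the screenshot above, and the enum values below are my assumptions):

# Illustrative email schema; the category values are assumptions.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "email_schema",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "enum": ["lead", "support request", "invoice", "other"],
                },
                "summary": {
                    "type": "string",
                    "description": "A concise summary of the email content.",
                },
            },
            "required": ["category", "summary"],
            "additionalProperties": False,
        },
    },
}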

This once again worked without any errors or problems, so it was quickly time for de-anonymization. Luckily, the Presidio extension makes this very easy as well: connect the gray output port of the Presidio Anonymizer to the gray input port of the Presidio Deanonymizer, and provide the table with the anonymized data to the Deanonymizer:

Example Data after Deanonymization.

For future development, I'm thinking of combining this use case with use case 3 and defining downstream processing that is conditional on the email classification. For example, a lead could be sent to a branch that extracts the customer name and contact details using this component with a different schema and then inserts them into my CRM database.

Use Case 5: LLM Routing for Cost-Effective Query Handling

Sticking to the pattern of increasing complexity, I decided to see whether I could replicate what I had read a certain Python package does.

I had heard about RouteLLM, which takes the user input and assesses how difficult the question is to answer. Based on that assessment, RouteLLM picks the best-suited model out of a range of options for answering the user request. Apparently, this approach made it possible to achieve 90% of the quality of GPT-4o at only 20% of the cost. That's what I call a very fair trade-off.

The KNIME team kindly created the interface of a chat app and made it available on the KNIME Community Hub. I decided to start from that and modify it to build in LLM-routing logic.

Once again I was able to reuse my component from the previous examples and just modify the system prompt and the JSON schema.

System Prompt and Schema for LLM routing.

I added this component inside the chat app so that any user input is first routed to GPT-4o-mini for assessment to determine which model is best suited. In the system prompt, I gave some guidance as to when to use which model. In total, I offered four model options: GPT-4o-mini as the most powerful, plus three locally hosted models, namely a coding model (CodeQwen), an 8-billion-parameter model (Llama 3.1), and a smaller 4-billion-parameter model (Phi-3).
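
The routing schema again relies on an enum, this time over model names, roughly as follows (an illustrative reconstruction; the exact system prompt and schema are shown in the screenshot above, and the model identifiers are my assumptions based on the text):

# Illustrative routing schema; GPT-4o-mini must pick exactly one model.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "model_routing_schema",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "model": {
                    "type": "string",
                    "enum": ["gpt-4o-mini", "codeqwen", "llama3.1", "phi3"],
                    "description": "The model best suited to answer the user query.",
                }
            },
            "required": ["model"],
            "additionalProperties": False,
        },
    },
}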

Workflow Overview.

Here's a brief explanation of the modifications I made inside the chat app component; I numbered them in the screenshot above.

  1. A user request is sent for assessment to GPT-4o-mini using my component, with the system prompt and schema illustrated above.
  2. A mapping table contains model names and their corresponding base URLs, i.e., the standard OpenAI endpoint for GPT-4o-mini and the local Ollama endpoint for the three locally hosted models (sketched after this list).
  3. The mapping table is filtered for the model chosen by GPT-4o-mini, and the values are turned into flow variables.
  4. A base_url flow variable is used to parameterize the base URL in the OpenAI Authenticator node.
  5. A model flow variable is used to parameterize the model ID in the OpenAI Chat Model Connector node.
  6. The user request is then fed into the Chat Model Prompter node, which sends the request to the chosen model to generate the actual response.
  7. The selected model and its AI-generated response are combined before visualization, making it clear to the user which model is providing the answer.
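
The mapping table from step 2 could be sketched as follows (my illustration; Ollama's OpenAI-compatible endpoint is assumed to run at its default local address):

# Model-to-endpoint mapping; the filtered row becomes flow variables (steps 3-5).
model_endpoints = {
    "gpt-4o-mini": "https://api.openai.com/v1",  # standard OpenAI endpoint
    "codeqwen": "http://localhost:11434/v1",     # local Ollama endpoint
    "llama3.1": "http://localhost:11434/v1",
    "phi3": "http://localhost:11434/v1",
}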

With all these modifications working as expected, it was time to experiment inside the chat app and see if I could find four questions that would be routed to the four different models. The outcome can be seen in the four images below.

Coding Question answered by locally hosted coding model.
Simple Math question answered by 4b parameter model Phi3.
JSON Extraction handled by Llama3.1 8b model.
Reasoning Problem solved by gpt-4o-mini.

I was very surprised by how well this actually worked. These questions were inspired by Matthew Berman, a YouTuber who makes videos about large language models and has a set of questions that he puts each model through to assess their performance.

My prompt engineering skills are obviously fairly limited. That said, I’m quite proud that I managed to make this work and got the routing implemented into the chat app.

Final thoughts

Let me conclude this article by saying that it was a lot of fun to experiment with this new OpenAI feature. Considering the success I had with the five use cases above, I see a lot of potential for future development.

Let me know your thoughts. I'm also interested to hear about use cases you might build using my components.

Resources

You can find these demo examples on my KNIME Community Hub space, together with all other relevant links.


Martin Dieste
Low Code for Data Science

Martin is a KNIME enthusiast who publishes KNIME-related content on Medium and YouTube. He started a consulting business, “Finnovation Flows”.