Extract Structured Output from Unstructured Texts with LLM Tool-Calling

Joshua Phuong Le
Published in MITB For All
11 min read · Jun 19, 2024
Photo by Zoe Schaeffer on Unsplash

1. INTRODUCTION

a. What is Tool-Calling

Tool calling (or function calling) is the ability of a generative large language model (LLM) to produce outputs that match a user-defined schema. A function typically requires its input arguments to be populated with the correct data types (i.e., the schema must be satisfied). Hence, if the LLM can produce these arguments, it has effectively produced output conforming to the function’s schema. Note that despite the word “calling”, the tool is not actually invoked by the LLM. Instead, the LLM merely populates the arguments that the function requires.
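For intuition, here is a minimal, purely illustrative sketch (the get_weather function and the response shape are hypothetical, not any specific provider’s API): the model does not run the function, it only proposes a call whose arguments fit the schema.

# Hypothetical tool that our application exposes to the LLM.
def get_weather(city: str, unit: str = "celsius") -> str:
    """Return a weather report for a city (implementation omitted)."""
    return f"Weather for {city} in {unit} (stub)"

# What a tool-calling LLM returns is not the weather itself, but a proposed
# call: the tool's name plus arguments that satisfy its schema.
proposed_call = {
    "name": "get_weather",
    "arguments": {"city": "Singapore", "unit": "celsius"},
}

# It is up to our own code to decide whether to actually execute the call.
result = get_weather(**proposed_call["arguments"])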

b. Why is Tool-Calling Useful?

Given this background, tool calling is clearly useful when you want to build agents that can interact with external tools and APIs. Another extremely useful application is parsing unstructured information into a structured format. The benefits are two-fold:

  1. We are very used to seeing large language models generate free-form responses that are conversational and natural. However, it is challenging to extract the relevant pieces of information from such unstructured, free-form text and parse them into the data formats required by other applications or processes.
  2. In addition, this approach helps reduce hallucination, which occurs when LLMs get too “creative” and generate more than what we need.

c. Why Frameworks like LangChain Are Useful

Many leading LLM providers, including Anthropic, Cohere, Google, Mistral, and OpenAI, expose APIs with variants of a tool-calling feature. These features typically allow a request to the LLM to include the available tools and their schemas, and the LLM can then generate responses that include calls to these tools.

Because each provider uses a slightly different syntax and requires the tools to be formatted into its own schema (usually JSON; see the subsequent section), it is more convenient to use a common framework like LangChain as a universal “wrapper” around these LLMs and to define the schema in a more “Pythonic” way (with Pydantic). This makes switching between LLMs more seamless and less error-prone.
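A minimal sketch of this idea, assuming the langchain-mistralai and langchain-openai packages are installed and API keys are set in the environment; the Sentiment schema and the model names are placeholders chosen purely for illustration:

from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_mistralai.chat_models import ChatMistralAI
from langchain_openai import ChatOpenAI

class Sentiment(BaseModel):
    """Placeholder schema for illustration only."""
    label: str = Field(description="Overall sentiment: positive, negative, or neutral.")

# The same Pydantic schema works across providers; only the chat model changes.
mistral_chain = ChatMistralAI(model="mistral-small-latest").with_structured_output(Sentiment)
openai_chain = ChatOpenAI(model="gpt-4o-mini").with_structured_output(Sentiment)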

In this article, I will explore the tool-calling ability of Mistral AI models for producing structured information from free-form texts. You can do the same with other LLM providers like OpenAI and Cohere simply by swapping in their Python bindings for the Mistral ones used below.

For the list of models with tool-calling support and LangChain integration, refer to this link.

2. SAMPLE PROBLEM STATEMENT

Let’s imagine we are working for a restaurant owner. The owner has developed a customer feedback system that allows customers to write free-form reviews. The owner wants to know which areas are the most crucial to improve (i.e., receiving mostly negative reviews) and which areas are doing well (i.e., receiving mostly positive reviews). These areas are primarily food, service, pricing, and ambience, plus any other aspects that customers may mention.

The problem is that the reviews do not follow any specific structure. Sometimes they mention the food, sometimes the service, as you can see from the examples below. The data consists of Yelp reviews obtained from Kaggle.

sample_text = """
Was it worth the 21$ for a salad and small pizza? Absolutely not! Bad service. Maybe the guys grandma died I don't know. I want to tell you what really made me mad about the experience. We order the small pizza and salad and the guys could have cared less and took our $ and we sat down. We were looking around and hmm, there's a sign saying "x large pizza and large salad only 23$". Wow that would have been nice if the guy told us that. I left hungry, mad and unsatisfied.
To the owner: teach your employees the value of upselling and telling the specials. Something so small can affect a customers experience negatively.
And your salads are severely overpriced
Won't go back unless I'm desperate.
"""

sample_text_2 = """
Drop what you're doing and drive here. After I ate here I had to go back the next day for more. The food is that good.
This cute little green building may have gone competely unoticed if I hadn't been driving down Palm Rd to avoid construction. While waiting to turn onto 16th Street the "Grand Opening" sign caught my eye and my little yelping soul leaped for joy! A new place to try!
It looked desolate from the outside but when I opened the door I was put at easy by the decor, smell and cleanliness inside. I ordered dinner for two, to go. The menu was awesome. I loved seeing all the variety: poblano peppers, mole, mahi mahi, mushrooms...something wrapped in banana leaves. It made it difficult to choose something. Here's what I've had so far: La Condesa Shrimp Burro and Baja Sur Dogfish Shark Taco. They are both were very delicious meals but the shrimp burro stole the show. So much flavor. I snagged some bites from my hubbys mole and mahi mahi burros- mmmm such a delight. The salsa bar is endless. I really stocked up. I was excited to try the strawberry salsa but it was too hot, in fact it all was, but I'm a big wimp when it comes to hot peppers. The horchata is handmade and delicious. They throw pecans and some fruit in there too which is a yummy bonus!
As if the good food wasn't enough to win me over the art in this restaurant sho did! I'm a sucker for Mexican folk art and Frida Kahlo is my Oprah. There's a painting of her and Diego hanging over the salsa bar, it's amazing. All the paintings are great, love the artist.
"""

For the first example, we want to produce something like the JSON-like structure below: a short one-sentence summary that gives the owner the context, a sentiment rating for each restaurant aspect, and any other details beyond these aspects that are useful for the owner to know.

{'summary': 'Customer was unhappy with the food, service, and pricing, and felt the salads were overpriced.',
 'food': 'negative',
 'service': 'negative',
 'price': 'negative',
 'ambience': 'none',
 'other': 'The customer suggests teaching employees about upselling and informing customers about specials.'}

From this you can see that the LLM is asked not only to parse the text into a structure, but also to perform summarization and classification at the same time.

3. TOOL CALLING WITH MISTRAL MODELS

Luckily for us, the problem above is easily solvable by modern LLMs. First, we define a Pydantic model to serve as a convenient interface for building tools for the LLM. Each attribute of the class is treated as an argument for the LLM to fill in later during invocation. The field descriptions tell the LLM what to do for each field, so you can “embed” the reasoning instructions here. The enum argument limits the options the LLM is allowed to choose from (essentially turning each field into a classification task). You can also specify whether a field is optional (with Optional) or required (the default).

In addition, we define a prompt template that provides the initial conditioning for the model, as well as a placeholder for the review text under the user_input field.

import os
from typing import Optional

from langchain_mistralai.chat_models import ChatMistralAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field


MISTRAL_API_KEY = os.getenv("MISTRAL_SMALL_API_KEY")


class ReviewInfo(BaseModel):
    """Information extracted from the text."""

    summary: str = Field(
        description="A one-sentence summary of the review, maximum 50 words."
    )
    food: str = Field(
        description="Classify the customer sentiment regarding the food of the restaurant in the review as positive, negative, or neutral. If there is no mention of food, return none.",
        enum=["positive", "negative", "neutral", "none"],
    )
    service: str = Field(
        description="Classify the customer sentiment regarding the service of the restaurant in the review as positive, negative, or neutral. If there is no mention of service, return none.",
        enum=["positive", "negative", "neutral", "none"],
    )
    price: str = Field(
        description="Classify the customer evaluation regarding the pricing of the restaurant in the review as positive, negative, or neutral. If there is no mention of pricing, return none.",
        enum=["positive", "negative", "neutral", "none"],
    )
    ambience: str = Field(
        description="Classify the customer sentiment regarding the ambience of the restaurant in the review as positive, negative, or neutral. If there is no mention of ambience, return none.",
        enum=["positive", "negative", "neutral", "none"],
    )
    other: Optional[str] = Field(
        description="Extract any other useful information from the review text to the business owner. If there is no other information, return an empty string."
    )


review_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert extraction algorithm, specialized in restaurant review and customer sentiment analysis. "
            "Only extract relevant information from the text as specified by the provided JSON schema. "
            "Do not generate any new information or extra characters outside of the JSON schema.",
        ),
        ("human", "{user_input}"),
    ]
)

a. Quick Implementation with the Built-in Wrapper

LangChain implements a quick wrapper called .with_structured_output() to solve this problem. Before we call it, let’s take a look at what is under the hood.

llm = ChatMistralAI(model="open-mistral-7b", mistral_api_key=MISTRAL_API_KEY)

# Inspect the tool schema and arguments that LangChain binds to the model
print(llm.with_structured_output(schema=ReviewInfo).first.kwargs)

The output is the JSON schema that aligns with what Mistral expects (see the Mistral function-calling documentation linked in the references). The Pydantic model details are populated under the LLM’s tools as a single function called ReviewInfo (the same name as the class), and the individual fields such as summary and food are stored under the parameters.properties key of the function specification.

Additionally, the field data types, whether each field is optional, and the allowed options for each field are all parsed nicely into a format compatible with the LLM API.

An important point is that the tool_choice argument is set to “any”, which forces the LLM to always use this tool.

From all of the points above, you can see that you can keep working in Python while LangChain does the heavy (and tedious) job of creating the JSON format:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "ReviewInfo",
        "description": "Information extracted from the text.",
        "parameters": {
          "type": "object",
          "properties": {
            "summary": {
              "description": "A one-sentence summary of the review, maximum 50 words.",
              "type": "string"
            },
            "food": {
              "description": "Classify the customer sentiment regarding the food of the restaurant in the review as positive, negative, or neutral. If there is no mention of food, return none.",
              "enum": ["positive", "negative", "neutral", "none"],
              "type": "string"
            },
            "service": {
              "description": "Classify the customer sentiment regarding the service of the restaurant in the review as positive, negative, or neutral. If there is no mention of service, return none.",
              "enum": ["positive", "negative", "neutral", "none"],
              "type": "string"
            },
            "price": {
              "description": "Classify the customer evaluation regarding the pricing of the restaurant in the review as positive, negative, or neutral. If there is no mention of pricing, return none.",
              "enum": ["positive", "negative", "neutral", "none"],
              "type": "string"
            },
            "ambience": {
              "description": "Classify the customer sentiment regarding the ambience of the restaurant in the review as positive, negative, or neutral. If there is no mention of ambience, return none.",
              "enum": ["positive", "negative", "neutral", "none"],
              "type": "string"
            },
            "other": {
              "description": "Extract any other useful information from the review text to the business owner. If there is no other information, return an empty string.",
              "type": "string"
            }
          },
          "required": ["summary", "food", "service", "price", "ambience"]
        }
      }
    }
  ],
  "tool_choice": "any"
}
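Incidentally, the same tool schema can be generated directly from the Pydantic class with LangChain's converter utility, which is essentially what bind_tools() and with_structured_output() rely on under the hood. A minimal sketch, assuming the ReviewInfo class defined above:

from langchain_core.utils.function_calling import convert_to_openai_tool

# Convert the Pydantic class into the provider-agnostic tool/function schema
tool_schema = convert_to_openai_tool(ReviewInfo)
print(tool_schema["function"]["name"])        # ReviewInfo
print(tool_schema["function"]["parameters"])  # JSON schema of the fields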

Now that the schema is nicely produced, what happens if we invoke the LLM as part of a chain? The Mistral documentation gives an easy-to-understand summary:

The goal in this step is not for the Mistral model to run the function directly. It’s to 1) determine the appropriate function to use, 2) identify if there is any essential information missing for a function, and 3) generate necessary arguments for the chosen function.

Let’s substitute sample_text into the user_input placeholder of the prompt we defined above and invoke the chain with it.

chain = review_prompt | llm.with_structured_output(schema=ReviewInfo)
review_structured = chain.invoke({"user_input": sample_text})
review_structured.dict()

The output is what we expect: because the LLM supports function calling, it can “parse” the information in the text into the structure we want.

{'summary': 'The customer was dissatisfied with the service, pricing, and food quality.',
 'food': 'negative',
 'service': 'negative',
 'price': 'negative',
 'ambience': 'none',
 'other': 'The customer suggests that the staff should be trained in upselling and communicating specials.'}
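Since the owner will eventually have many reviews, the same chain can process them in bulk with LangChain's batch method on the chain. A minimal sketch, reusing the two sample reviews defined earlier (field access assumes the ReviewInfo schema above):

# Run the structured-output chain over several reviews at once.
reviews = [sample_text, sample_text_2]
results = chain.batch([{"user_input": r} for r in reviews])

# Each result is a ReviewInfo instance; convert to plain dicts for downstream use.
structured_rows = [r.dict() for r in results]
for row in structured_rows:
    print(row["summary"], "| food:", row["food"], "| service:", row["service"])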

b. With Native Tool Calling

However, we are more interested in how the tool is used behind the scenes. With the with_structured_output() method above, this is not clear, as the details are abstracted away from you. Let’s take the alternative route of explicit tool calling and examine each step:

  • Firstly, we turn the Pydantic schema into a tool and bind it to the LLM with the .bind_tools() method, creating a version of the LLM equipped with the tool. This reformats the Pydantic model into what the LLM expects (the Mistral format above) and passes that format to the model whenever it is invoked.
  • Next, we fill the prompt template (with the system instruction) with the sample text to form a full-fledged query.
  • Finally, we invoke the tool-equipped LLM object with the query and get back an AIMessage object.
  • Note that I switched to another model, “Mistral Small”, which also supports tool calling. For the list of supported models, please check the Mistral documentation.

tools = [ReviewInfo]
llm = ChatMistralAI(model="mistral-small-2402", mistral_api_key=MISTRAL_API_KEY)
llm_with_tools = llm.bind_tools(tools)

# Fill the prompt template with the review; format_messages keeps the
# system and human roles separate instead of flattening them into one string
query = review_prompt.format_messages(user_input=sample_text)
ai_message = llm_with_tools.invoke(query)
ai_message.additional_kwargs['tool_calls'][0]

Here, we can examine the response: the Pydantic schema’s fields reside under the arguments key of the function. Note that we gave the LLM only one tool, so we take index zero of the tool_calls field. This means our LLM has performed the three steps stated in the documentation above:

  1. Determine the appropriate function to use: ReviewInfo
  2. Identify if there is any essential information missing for a function: no
  3. Generate necessary arguments for the chosen function: shown under the arguments key

Note that the free-form response under “other” is slightly different from what you saw previously. This is due to the non-deterministic nature of the generated content allowed in this field. The other fields are “controlled” by the enum argument, so the model only chooses one of the given options. (A sketch of validating these arguments back into the ReviewInfo model follows after the output below.)

{'id': 'ORAHkXsiy',
 'function': {'name': 'ReviewInfo',
              'arguments': '{
                "summary": "The reviewer was disappointed with the experience due to poor service, expensive salads, and lack of information about specials.",
                "food": "negative",
                "service": "negative",
                "price": "negative",
                "ambience": "none",
                "other": "The reviewer mentioned a missed upselling opportunity and suggested improving customer service."
              }'}}
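To close the loop, the JSON string under arguments can be parsed and validated back into the ReviewInfo Pydantic model, giving the same convenient object we obtained from with_structured_output() earlier. A minimal sketch:

import json

# The arguments come back as a JSON string; parse it and validate it
# against the original Pydantic schema.
tool_call = ai_message.additional_kwargs["tool_calls"][0]
args = json.loads(tool_call["function"]["arguments"])
review_structured = ReviewInfo(**args)

print(review_structured.summary)
print(review_structured.dict())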

4. CONCLUSION

As you can see, tool calling is extremely helpful not only in more advanced agentic applications, but also in the simpler, and probably more common, combination of information extraction and reasoning tasks. It also helps reduce hallucination, which often happens when you simply use natural language to tell the LLM to output the schema. I hope you found this simple sharing useful.

5. References

  • Tool Calling with LangChain
  • https://python.langchain.com/v0.2/docs/concepts/#tools
  • Tool/function calling | 🦜️🔗 LangChain
  • Function calling | Mistral AI Large Language Models
  • https://platform.openai.com/docs/guides/function-calling
  • https://superface.ai/blog/llm-function-calling
  • https://community.sap.com/t5/artificial-intelligence-and-machine-learning-blogs/function-calling-llms-for-sap-structured-outputs-and-api-calling/ba-p/13698273

Disclaimer: All opinions and interpretations are that of the writer, and not of MITB. I declare that I have full rights to use the contents published here, and nothing is plagiarized. I declare that this article is written by me and not with any generative AI tool such as ChatGPT. I declare that no data privacy policy is breached, and that any data associated with the contents here are obtained legitimately to the best of my knowledge. I agree not to make any changes without first seeking the editors’ approval. Any violations may lead to this article being retracted from the publication.


I’m a data scientist having fun writing about my learning journey. Connect with me at https://www.linkedin.com/in/joshua3112/