Solving Math Problems with LLMs: Part 1.1

Deep Dive into Instructor’s Inner Workings

Artur Grygorian
4 min read · Aug 30, 2024

Introduction

In our previous article, we introduced Instructor and its role in obtaining structured outputs from Large Language Models (LLMs) for our use case of solving math problems. Now, let’s take a deeper dive into how Instructor works its magic. We’ll walk step by step through how Instructor generates a schema for function calling, passes it to the LLM, and uses Pydantic to validate the output.

Step 1: Defining the Pydantic Model

The journey begins with defining a Pydantic model. This model serves as the blueprint for the structure we want our LLM output to follow. Let’s use our MathSolution model as an example:

from pydantic import BaseModel, Field

# Define the MathSolution model
class MathSolution(BaseModel):
    answer: str = Field(..., description="The final numerical answer to the problem")
    step_by_step: str = Field(..., description="A detailed, step-by-step explanation of how to solve the problem")
    python_code: str = Field(..., description="Python code that implements the solution and returns the answer")

Step 2: Generating the Function Schema

When we use Instructor to make a request to the LLM, it doesn’t just send our prompt as-is. Instead, it generates a function-like schema based on our Pydantic model above. Here’s how this process works:

  1. Instructor introspects our Pydantic model using Pydantic’s built-in model_json_schema() method.
  2. It then wraps this schema in a function-like structure that the LLM can understand.
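
You can see the first step for yourself: Pydantic’s built-in model_json_schema() method returns the raw JSON Schema that Instructor then wraps into a function definition. A minimal way to inspect it:

import json

# Print the JSON Schema that Pydantic derives from the MathSolution model
print(json.dumps(MathSolution.model_json_schema(), indent=2))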

Here’s what the generated schema looks like for the MathSolution class:

{
  "properties": {
    "answer": {
      "description": "The final numerical answer to the problem",
      "title": "Answer",
      "type": "string"
    },
    "step_by_step": {
      "description": "A detailed, step-by-step explanation of how to solve the problem",
      "title": "Step By Step",
      "type": "string"
    },
    "python_code": {
      "description": "Python code that implements the solution and returns the answer",
      "title": "Python Code",
      "type": "string"
    }
  },
  "required": [
    "answer",
    "step_by_step",
    "python_code"
  ],
  "title": "MathSolution",
  "type": "object"
}

This schema tells the LLM exactly what structure we expect the output to have, including the types and descriptions of each field.

Step 3: Sending the Request to the LLM

Instructor then sends this function schema along with our prompt to the LLM. Behind the scenes, Instructor uses OpenAI’s function calling feature to achieve this. Here’s how a typical request might look:

import instructor
from openai import OpenAI

client = instructor.patch(OpenAI())

# Illustrative problem statement (the one behind the example response below)
prompt = (
    "The length of a rectangle is 3 units longer than its width, "
    "and its perimeter is 26 units. What are the dimensions of the rectangle?"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # or whichever model you're using
    response_model=MathSolution,
    messages=[
        {"role": "system", "content": "You are an expert mathematics tutor."},
        {"role": "user", "content": prompt},
    ],
)

By specifying the response_model, Instructor is essentially telling the LLM: “Please format your response according to this structure.” Instructor patches the OpenAI client so that it accepts the response_model parameter and uses it to generate the appropriate function call for the LLM.
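
Under the hood, the patched client translates this into an ordinary function-calling (tools) request. A rough hand-written equivalent, a sketch rather than Instructor’s exact payload, might look like this:

from openai import OpenAI

raw_client = OpenAI()

# Sketch of the underlying tools request Instructor constructs; the exact
# payload (function name, description, retry handling) may differ.
raw_response = raw_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are an expert mathematics tutor."},
        {"role": "user", "content": prompt},
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "MathSolution",
                "description": "A structured solution to the math problem",
                "parameters": MathSolution.model_json_schema(),
            },
        }
    ],
    tool_choice={"type": "function", "function": {"name": "MathSolution"}},
)

# The structured output comes back as a JSON string of function arguments
raw_json = raw_response.choices[0].message.tool_calls[0].function.arguments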

LLM Response

The LLM, understanding the function schema, structures its response accordingly. The response might look something like this:

{
  "answer": "Length = 8 units, Width = 5 units",
  "step_by_step": "1. Let the width of the rectangle be represented by 'w' units.\n2. According to the problem, the length is 3 units longer than the width, so we can express the length as 'l = w + 3' units.\n3. The formula for the perimeter (P) of a rectangle is given by: P = 2l + 2w.\n4. The problem states that the perimeter is 26 units, so we can set up the equation: 2(w + 3) + 2w = 26.\n5. Simplifying this equation:\n - 2w + 6 + 2w = 26\n - 4w + 6 = 26\n - 4w = 26 - 6\n - 4w = 20\n - w = 20 / 4\n - w = 5 units (this is the width)\n6. Now, substitute w back into the equation for length:\n - l = w + 3\n - l = 5 + 3 = 8 units (this is the length)\n7. Therefore, the dimensions of the rectangle are: Length = 8 units, Width = 5 units.",
  "python_code": "def find_rectangle_dimensions(perimeter):\n    # Check if the perimeter is even, as the dimensions must be integers\n    if perimeter % 2 != 0:\n        return \"Perimeter must be an even number for integer dimensions.\"\n    # Let the width be 'w'\n    # The formula for perimeter of rectangle: P = 2(l + w)\n    # Since length l = w + 3 (from the problem statement)\n    # We have: 2(w + 3 + w) = perimeter\n    # Simplifying: 2(2w + 3) = perimeter\n    # 4w + 6 = perimeter\n    # Rearranging gives the width:\n    w = (perimeter - 6) / 4\n    # If width is negative, return an error\n    if w < 0:\n        return \"Invalid dimensions for the given perimeter.\"\n    # Calculate length\n    l = w + 3\n    return (l, w)\n\n# Example usage:\nresult = find_rectangle_dimensions(26)\nresult"
}

Step 4: Parsing and Validating with Pydantic

Instructor takes the LLM’s response and attempts to parse it into our MathSolution Pydantic model. Here’s what happens:

1. Instructor passes the LLM output to the Pydantic model constructor.
2. Pydantic performs validation:
   - It checks that all required fields are present.
   - It ensures each field’s value matches the specified type (string in this case).
   - If we had any additional validators defined in our model, they would be run at this stage.

If the validation succeeds, Instructor returns a MathSolution object. If it fails, Pydantic raises a validation error, which Instructor can catch and handle (for example, by re-asking the model when retries are enabled).
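
Conceptually, this final step boils down to a single Pydantic call. Here is a minimal sketch (not Instructor’s literal code path), using the raw_json string from the function-calling sketch above:

from pydantic import ValidationError

try:
    # Parse and validate the LLM's JSON against the MathSolution model
    solution = MathSolution.model_validate_json(raw_json)
    print(solution.answer)
except ValidationError as e:
    # Missing fields or wrong types land here; Instructor catches errors
    # like this and can surface them or retry the request.
    print(e)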

Conclusion

Understanding the inner workings of Instructor reveals the elegance of its approach to structuring LLM outputs. By leveraging Pydantic’s powerful data validation capabilities and the function calling feature of advanced LLMs, Instructor provides a robust solution to one of the key challenges in working with LLMs: obtaining consistent, structured outputs.

[GitHub link to reproducible example]
