Solving Math Problems with LLMs: Part 1

Structured Outputs and Effective Prompting

Artur Grygorian
Aug 29, 2024

Introduction

One of the fascinating applications of LLMs is solving mathematical problems. This article, the first in a four-part series, explores how we can leverage LLMs to solve mathematics problems, focusing on structured outputs, effective prompting techniques, and generating executable Python code.

Our journey is inspired by the LLM Zoomcamp 2024 Competition, where participants are challenged to use LLMs to solve high school mathematics problems from the ЕГЭ (Unified State Exam) in Russia. The competition’s goal is to develop systems that can solve these problems, showcasing the potential of LLMs. Along the way, we explore techniques for reliably extracting step-by-step solutions and executable Python code, which can also be useful in broader educational and problem-solving contexts.

4-part series:

1st part: Structured Outputs and Effective Prompting. This part focuses on using the Instructor library to create structured outputs from LLMs, ensuring consistent and parseable responses for math problems. We’ll cover how to define a Pydantic model for math solutions and craft effective prompts to guide the LLM.

(Note: based on audience feedback, this part was later expanded into an additional article.)

2nd part: Executing Python Code Safely. This section will delve into the process of generating Python code from LLM responses and executing it safely. We’ll cover techniques for sandboxing, error handling, and ensuring the generated code meets our requirements.

3rd part: Multithreading for Efficient Problem Solving. In this part, we’ll explore how to use multithreading to solve multiple math problems concurrently, improving the overall efficiency of our system.

4th part: Error Handling and Retry Mechanisms. The final part will focus on implementing robust error handling and retry mechanisms. We’ll explore how to deal with API errors, incorrect LLM outputs, and execution errors in generated code, ensuring our system is resilient and reliable.

The Challenge of Consistent Outputs

One of the primary challenges when working with LLMs is obtaining consistent, structured outputs. LLMs are trained on vast amounts of text data and can generate human-like responses, but they don’t inherently produce responses in a specific format or structure. This can lead to inconsistencies in output format, making it difficult to parse and use the generated information programmatically.

For our math problem solver, we need more than just the final answer. We want:

1. The numerical answer(s) to the problem
2. A step-by-step explanation of how to solve the problem
3. Python code that implements the solution

To put it simply, we’re asking the LLM to solve a math problem and give us the answer, explain how it got there, and write a computer program to solve similar problems. But because LLMs are designed to be flexible, they might give this information in different ways each time. Our job is to make sure we always get the information in the same, easy-to-use format.
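To make that concrete, the structured result we are aiming for might look like the following Python dictionary. The values here are purely illustrative; the field names are the ones we formalize with a Pydantic model in the next section.

# Illustrative target structure: the values are made up for this example.
desired_output = {
    "answer": "16",
    "step_by_step": "Step 1: Define the variables. Step 2: Set up the equation...",
    "python_code": "def solve():\n    # ...\n    return answer",
}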

Instructor: Harnessing the Power of Pydantic

To address this challenge, we turn to a powerful library called Instructor. Instructor is a Python library that leverages Pydantic, a data validation library, to enforce structure on LLM outputs.

Why Instructor?

Instructor acts as a bridge between the free-form text generation of LLMs and the structured data we need for our application. It allows us to define Pydantic models that specify the exact structure we want our LLM outputs to follow. When we make a request to the LLM through Instructor, it ensures that the response adheres to our defined structure.

The benefits of using Instructor include:

1. Consistency: Every response follows the same structure, making it easy to process programmatically.
2. Type Safety: Pydantic models provide runtime type checking, reducing the risk of type-related errors. This means that Pydantic checks if the data it receives matches the types we’ve specified in our model. For example, if we expect a number but get text, Pydantic will catch this error, helping us avoid issues later in our code.
3. Validation: We can add custom validators to ensure the content meets our specific requirements.
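As a sketch of what point 3 can look like in practice, here is a minimal custom validator on a toy model. This assumes Pydantic v2, where validators are declared with `field_validator`; the specific check is just an example.

from pydantic import BaseModel, field_validator

class AnswerOnly(BaseModel):
    answer: str

    @field_validator("answer")
    @classmethod
    def answer_must_not_be_empty(cls, value: str) -> str:
        # Pydantic runs this check whenever the model is created,
        # so an empty answer from the LLM fails immediately.
        if not value.strip():
            raise ValueError("answer must not be empty")
        return value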

Defining Our Math Solution Model

Let’s define a Pydantic model for our math problem solutions:

from pydantic import BaseModel, Field

class MathSolution(BaseModel):
    answer: str = Field(..., description="The final numerical answer to the problem")
    step_by_step: str = Field(..., description="A detailed, step-by-step explanation of how to solve the problem")
    python_code: str = Field(..., description="Python code that implements the solution and returns the answer")

This model defines the structure we want our LLM to follow when providing solutions. Let’s break down each field:

- answer: This field will contain the final numerical answer to the problem. We use a string type to allow for fractional answers or multiple values.
- step_by_step: This field will contain a detailed explanation of how to solve the problem, which is crucial for understanding the reasoning behind the solution.
- python_code: This field will contain executable Python code that solves the problem, allowing us to verify the solution programmatically.

By using this model with Instructor, we ensure that every response from our LLM will contain these three elements, properly labeled and structured.

The `description` parameter in each Field is crucial for guiding the LLM. Here’s how it works:

1. When Instructor sends a request to the LLM, it includes these descriptions in the prompt.
2. The LLM uses these descriptions to understand what kind of content should go in each field.
3. This helps the LLM structure its response correctly, ensuring each part of the answer matches what we’re expecting.
4. If the LLM’s response doesn’t match our model (e.g., missing a field or wrong data type), Instructor will raise an error, allowing us to handle it appropriately.

So, the `description` parameter serves as both a guide for the LLM and a form of documentation for our code, making it clear what each field represents.
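One way to see exactly what the LLM is guided by is to inspect the JSON schema Pydantic generates from our model; the field descriptions appear in it verbatim. A quick sketch, assuming Pydantic v2 (where the method is `model_json_schema()`):

import json

# Print the schema derived from MathSolution; each property includes
# the "description" we set on its Field, which is what steers the LLM.
schema = MathSolution.model_json_schema()
print(json.dumps(schema, indent=2))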

Crafting Effective Prompts

With our model in place, the next crucial step is crafting an effective prompt. The prompt is the instruction we give to the LLM, guiding it to produce the desired output. For our math problem solver, we need a prompt that:

1. Clearly states the problem to be solved
2. Instructs the LLM to provide the answer, step-by-step explanation, and Python code
3. Provides any necessary context or constraints

Here’s an example of how we might structure our prompt:

def create_math_prompt(problem_text: str) -> str:
    return f"""
Solve the following high school mathematics problem:
{problem_text}
Provide your solution in the following format:
1. The final numerical answer to the problem
2. A detailed, step-by-step explanation of how to solve the problem
3. Python code that implements the solution and returns the answer
Ensure that your Python code is executable and follows these guidelines:
- Use only Python's built-in functions and the math module
- Include comments explaining each step
- Handle potential edge cases or invalid inputs
- Return the final answer as the last line of the function
Remember, this is a high school level problem, so advanced mathematical concepts or libraries should not be necessary.
"""

This prompt does several important things:

1. It clearly presents the problem to be solved.
2. It explicitly requests the three components we need (answer, explanation, and code).
3. It provides guidelines for the Python code, ensuring it’s useful and executable.
4. It reminds the LLM of the context (high school level mathematics), which helps prevent overly complex solutions.
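Before wiring this into an LLM call, it can be worth rendering the prompt for a sample problem and reading it back; a quick check like this (the problem text is only an example) catches formatting mistakes early.

# Any problem text works here; this one is only for illustration.
sample_problem = (
    "A rectangle has a length that is 3 units longer than its width. "
    "If the perimeter of the rectangle is 26 units, what are its dimensions?"
)

# Print the fully rendered prompt to verify the format and guidelines.
print(create_math_prompt(sample_problem))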

Putting It All Together

Now that we have our Pydantic model and our prompt function, we can use Instructor to make a request to our LLM. Here’s how it might look:

import instructor
from openai import OpenAI

# Initialize the OpenAI client with Instructor
client = instructor.patch(OpenAI())

def solve_math_problem(problem_text: str) -> MathSolution:
    prompt = create_math_prompt(problem_text)
    response = client.chat.completions.create(
        model="gpt-4",  # or whichever model you're using
        response_model=MathSolution,
        messages=[
            {"role": "system", "content": "You are an expert mathematics tutor."},
            {"role": "user", "content": prompt},
        ],
    )
    return response

# Example usage

problem = "A rectangle has a length that is 3 units longer than its width. If the perimeter of the rectangle is 26 units, what are the dimensions of the rectangle?"
solution = solve_math_problem(problem)
print(f"Answer: {solution.answer}")
print(f"\nStep-by-step solution:\n{solution.step_by_step}")
print(f"\nPython code:\n{solution.python_code}")

In this setup:

1. We use Instructor to patch the OpenAI client, enabling structured outputs.
2. Our `solve_math_problem` function takes a problem text, creates a prompt, and sends it to the LLM.
3. We specify our `MathSolution` model as the `response_model`, ensuring the output follows our defined structure.
4. We include a system message to set the context for the LLM.
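Because the returned `MathSolution` is an ordinary Pydantic model, it is also easy to serialize for logging or later analysis. A minimal sketch, assuming Pydantic v2:

# Convert the structured solution to a dict or a JSON string,
# which is handy for logging results or storing them for review.
solution_dict = solution.model_dump()
solution_json = solution.model_dump_json(indent=2)
print(solution_json)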

Conclusion and Next Steps

In this article, we’ve explored how to use Instructor and effective prompting to get structured outputs from LLMs for solving math problems. This approach provides consistency and reliability in LLM responses, making it easier to build robust applications.

However, solving math problems at scale introduces new challenges. In the next article of this series, we’ll dive into ways to effectively execute LLM-generated Python code, store execution errors, and use them in a retry mechanism.

Check out Part 2: Executing Python Code Safely, which covers the process of running generated Python code at scale!

[GitHub link to reproducible example]
