Building a Self-Healing SQL Query Generator Agent with Pydantic.ai and Groq

Plaban Nayak
The AI Forum
Published in
19 min readDec 21, 2024
Self Healing SQL Agent Workflow

Introduction

In today’s data-driven world, the ability to query databases efficiently is crucial. However, writing SQL queries can be challenging, especially for those not well-versed in database operations. This article introduces a self-healing SQL query generator that converts natural language requests into SQL queries, making database interactions more accessible and robust.

What is Pydantic AI?

Pydantic AI is an agentic framework designed to simplify the process of creating and managing AI agents. It builds upon the solid foundation of the Pydantic library, which is widely used for data validation and type checking in Python projects. With over 2 million downloads per month, Pydantic has proven its reliability and efficiency in the developer ecosystem.

Key Features and Advantages

1. Ease of Use

One of the standout features of Pydantic AI is its simplicity. Creating an agent is straightforward, requiring just a few lines of code. You can define an agent by specifying the model name and a system prompt, making it accessible even for those new to AI development.

2. Built on Vanilla Python

Unlike some other frameworks that abstract away much of the underlying code, Pydantic AI is built on vanilla Python. This approach gives developers full control and visibility into the inner workings of their agents, making it easier to customize and debug when needed.

3. Dependency Injection

Pydantic AI incorporates best practices from software engineering, including dependency injection. This feature allows for more flexible and maintainable code, especially in larger projects.

4. Integration with Observability Tool

For those concerned with observability, Pydantic AI offers seamless integration with Logfire. This tool allows for comprehensive tracing of prompts, costs, and other important metrics related to your AI agents.

5. Function Calling and Structured Output

Pydantic AI excels in function calling and generating structured output. You can easily define object models for your agent’s responses, ensuring that the output matches your specific requirements. This is particularly useful when working with APIs or other systems that expect data in a certain format.

6. Cost Tracking

Understanding the resource usage of your AI agents is crucial, especially in production environments. Pydantic AI makes this easy by providing built-in cost tracking features, allowing you to monitor token usage and associated costs for each interaction.

Technology Stack

This implementation leverages several modern technologies:

  • Pydantic.ai: For building AI agents with strong type safety
  • Groq: A fast and efficient LLM provider
  • SQLite: Lightweight, serverless database
  • aiosqlite: Async SQLite driver for Python
  • Python 3.11+: For modern async/await support

Workflow Description

The SQL Agent follows a sophisticated workflow:

  • User Input: Receives natural language query requests
  • Query Generation: Converts requests to SQL using LLM
  • Validation: Ensures query syntax and structure
  • Execution: Tests the query against the database
  • Self-Healing: Retries with error context if execution fails
  • Result Delivery: Returns results or detailed error messages

Code Walkthrough

1. Core Components

The implementation consists of several key components:

# Models for type safety
class Success(BaseModel):
type: str = Field("Success", pattern="^Success$")
sql_query: Annotated[str, MinLen(1)]
explanation: str

class InvalidRequest(BaseModel):
type: str = Field("InvalidRequest", pattern="^InvalidRequest$")
error_message: str

Defining dependencies

@dataclass
class Deps:
conn: aiosqlite.Connection
db_schema: str = DB_SCHEMA

2. Agent Configuration

sqlagent = Agent(
openai_model,
deps_type=Deps,
retries=3,
result_type=Response,
system_prompt=("""You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.
You must respond with a Success object containing a sql_query and an explanation.

Database schema:
{DB_SCHEMA}

Format your response exactly like this, with no additional text or formatting:
{{
"type": "Success",
"sql_query": "<your SQL query here>",
"explanation": "<your explanation here>"
}}

Examples:
User: show me all users who have published posts
{{
"type": "Success",
"sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",
"explanation": "This query finds all users who have at least one published post by joining the users and posts tables."
}}

User: count posts by user
{{
"type": "Success",
"sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",
"explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."
}}

If you receive an error message about a previous query, analyze the error and fix the issues in your new query.
Common fixes include:
- Correcting column names
- Fixing JOIN conditions
- Adjusting GROUP BY clauses
- Handling NULL values properly

If you cannot generate a valid query, respond with:
{{
"type": "InvalidRequest",
"error_message": "<explanation of why the request cannot be processed>"
}}

Important:
1. Respond with ONLY the JSON object, no additional text
2. Always include the "type" field as either "Success" or "InvalidRequest"
3. All queries must be SELECT statements
4. Provide clear explanations
5. Use proper JOIN conditions and WHERE clauses as needed
""")
)

3. Query Execution and Retry Logic

async def query_database(prompt: str, conn: aiosqlite.Connection) -> Response:
max_retries = 3
last_error: Optional[str] = None

for attempt in range(max_retries):
try:
result = await agent.run(prompt, deps=deps)
success, error = await execute_query(conn, result.sql_query)
if success:
return result

last_error = error
prompt = f"""
Previous query failed with error: {error}
Please generate a corrected SQL query for the original request: {prompt}
"""
except Exception as e:
last_error = str(e)
continue

Code Implementation

  1. Install required dependencies
%pip install 'pydantic-ai-slim[openai,groq,logfire]'
%pip install aiosqlite

2. Setup API Keys

from google.colab import userdata
import os
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["GROQ_API_KEY"] = userdata.get('GROQ_API_KEY')

3. Import required dependencies

from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.models.groq import GroqModel
import aiosqlite
import asyncio
from typing import Union, TypeAlias, Annotated,Optional,Tuple
from dataclasses import dataclass
from pydantic import BaseModel, Field
from annotated_types import MinLen
from pydantic_ai import Agent, ModelRetry, RunContext

Google Colab runs a Jupyter notebook backend, which inherently has an active event loop to manage interactions and outputs. Therefore, any attempt to initiate another event loop without patching will lead to conflicts and errors.

Role of nest_asyncio

  • Patching the Event Loop: The nest_asyncio library modifies the behavior of asyncio to allow for nested calls to its functions. By calling nest_asyncio.apply(), we essentially instruct your code to use the already running event loop instead of trying to create a new one. This enables us to execute asynchronous tasks seamlessly within the Colab environment
  • Usage: To implement this in our Colab notebook, we simply need to import nest_asyncio and apply it right after your imports:
import nest_asyncio
nest_asyncio.apply()

4. Instantiate LLM

openai_model = OpenAIModel('gpt-4o-mini')
groq_model = GroqModel("llama3-groq-70b-8192-tool-use-preview")

5. Define DB_SCHEMA

# Define the schema for our example database
DB_SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS posts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_id INTEGER,
title TEXT NOT NULL,
content TEXT,
published BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(id)
);
"""

6. Define Agents

Agents are PydanticAI’s primary interface for interacting with LLMs.

In some use cases a single Agent will control an entire application or component, but multiple agents can also interact to embody more complex workflows.

The Agent class has full API documentation, but conceptually you can think of an agent as a container for:

  • A system prompt — a set of instructions for the LLM written by the developer
  • One or more function tool — functions that the LLM may call to get information while generating a response
  • An optional structured result type — the structured datatype the LLM must return at the end of a run
  • A dependency type constraint — system prompt functions, tools and result validators may all use dependencies when they’re run
  • Agents may optionally also have a default LLM model associated with them; the model to use can also be specified when running the agent
sqlagent = Agent(
openai_model,
deps_type=Deps,
retries=3,
result_type=Response,
system_prompt=("""You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.
You must respond with a Success object containing a sql_query and an explanation.

Database schema:
{DB_SCHEMA}

Format your response exactly like this, with no additional text or formatting:
{{
"type": "Success",
"sql_query": "<your SQL query here>",
"explanation": "<your explanation here>"
}}

Examples:
User: show me all users who have published posts
{{
"type": "Success",
"sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",
"explanation": "This query finds all users who have at least one published post by joining the users and posts tables."
}}

User: count posts by user
{{
"type": "Success",
"sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",
"explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."
}}

If you receive an error message about a previous query, analyze the error and fix the issues in your new query.
Common fixes include:
- Correcting column names
- Fixing JOIN conditions
- Adjusting GROUP BY clauses
- Handling NULL values properly

If you cannot generate a valid query, respond with:
{{
"type": "InvalidRequest",
"error_message": "<explanation of why the request cannot be processed>"
}}

Important:
1. Respond with ONLY the JSON object, no additional text
2. Always include the "type" field as either "Success" or "InvalidRequest"
3. All queries must be SELECT statements
4. Provide clear explanations
5. Use proper JOIN conditions and WHERE clauses as needed
""")
)

7. Initiate the DB connection

async def init_database(db_path: str = "test.db") -> aiosqlite.Connection:
"""Initialize the database with schema"""
conn = await aiosqlite.connect(db_path)

# Enable foreign keys
await conn.execute("PRAGMA foreign_keys = ON")

# Create schema
await conn.executescript(DB_SCHEMA)

# Add some sample data if the tables are empty
async with conn.execute("SELECT COUNT(*) FROM users") as cursor:
count = await cursor.fetchone()
if count[0] == 0:
sample_data = """
INSERT INTO users (name, email) VALUES
('John Doe', 'john@example.com'),
('Jane Smith', 'jane@example.com');

INSERT INTO posts (user_id, title, content, published) VALUES
(1, 'First Post', 'Hello World', TRUE),
(1, 'Draft Post', 'Work in Progress', FALSE),
(2, 'Jane''s Post', 'Hello from Jane', TRUE);
"""
await conn.executescript(sample_data)
await conn.commit()

return conn

8. Helper function to execute the query

from typing import Tuple, Optional
async def execute_query(conn: aiosqlite.Connection, query: str) -> Tuple[bool, Optional[str]]:
"""
Execute a SQL query and return success status and error message if any.
Returns: (success: bool, error_message: Optional[str])
"""
try:
async with conn.execute(query) as cursor:
await cursor.fetchone()
return True, None
except Exception as e:
return False, str(e)

9. Helper function to invoke SQL Agent and process the query

async def main():
# Ensure GROQ API key is set
if not os.getenv("GROQ_API_KEY"):
raise ValueError("Please set GROQ_API_KEY environment variable")

# Initialize database
conn = await init_database("test.db")

try:
# Example queries to test
test_queries = [
"show me all users and the number of posts posted",
"find users who have published posts",
"show me all draft posts with their authors",
"what is the count of users table",
"show me the title of the posts published",
"show me the structure of the posts",
"show me the names of all the users"
]

for query in test_queries:
print(f"\nProcessing query: {query}")
result = await query_database(query, conn)
print(f"\nProcessing query result: {result}")
if isinstance(result, InvalidRequest):
print(f"Error: {result.error_message}")
else:

print("\n✅ Generated SQL:")
print(result.data.sql_query)
print("\n✅ Explanation:")
print(result.data.explanation)
print("\n✅ Cost:")
print(result._usage)

# Execute the query to show results
try:
async with conn.execute(result.data.sql_query) as cursor:
rows = await cursor.fetchall()
print("\n📊 Results:")
for row in rows:
print(row)
except Exception as e:
print(f"Error executing query: {query}")
continue


print("\n" + "="*50)

finally:
await conn.close()

10. Invoke the SQL Agent

asyncio.run(main()) 


##################################RESPONSE####################################
Processing query: show me all users and the number of posts posted

Processing query result: RunResult(_all_messages=[ModelRequest(parts=[SystemPromptPart(content='You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.\nYou must respond with a Success object containing a sql_query and an explanation.\n\nDatabase schema:\n{DB_SCHEMA}\n\nFormat your response exactly like this, with no additional text or formatting:\n{{\n "type": "Success",\n "sql_query": "<your SQL query here>",\n "explanation": "<your explanation here>"\n}}\n\nExamples:\n User: show me all users who have published posts\n {{\n "type": "Success",\n "sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",\n "explanation": "This query finds all users who have at least one published post by joining the users and posts tables."\n }}\n\n User: count posts by user\n {{\n "type": "Success",\n "sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",\n "explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."\n }}\n\n If you receive an error message about a previous query, analyze the error and fix the issues in your new query.\n Common fixes include:\n - Correcting column names\n - Fixing JOIN conditions\n - Adjusting GROUP BY clauses\n - Handling NULL values properly\n\nIf you cannot generate a valid query, respond with:\n{{\n "type": "InvalidRequest",\n "error_message": "<explanation of why the request cannot be processed>"\n}}\n\nImportant:\n1. Respond with ONLY the JSON object, no additional text\n2. Always include the "type" field as either "Success" or "InvalidRequest"\n3. All queries must be SELECT statements\n4. Provide clear explanations\n5. Use proper JOIN conditions and WHERE clauses as needed\n', part_kind='system-prompt'), UserPromptPart(content='show me all users and the number of posts posted', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 29, 420161, tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request'), ModelResponse(parts=[ToolCallPart(tool_name='final_result_Success', args=ArgsJson(args_json='{"type":"Success","sql_query":"SELECT users.id, users.name, COUNT(posts.id) AS post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name","explanation":"This query retrieves all users along with a count of the posts they have made. It uses a LEFT JOIN to include users even if they have not posted anything, ensuring that users with a post count of zero are still displayed."}'), tool_call_id='call_hFetDeIfPDo92NQNtyRJBuAS', part_kind='tool-call')], timestamp=datetime.datetime(2024, 12, 21, 16, 25, 29, tzinfo=datetime.timezone.utc), kind='response'), ModelRequest(parts=[ToolReturnPart(tool_name='final_result_Success', content='Final result processed.', tool_call_id='call_hFetDeIfPDo92NQNtyRJBuAS', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 31, 17975, tzinfo=datetime.timezone.utc), part_kind='tool-return')], kind='request')], _new_message_index=0, data=Success(type='Success', sql_query='SELECT users.id, users.name, COUNT(posts.id) AS post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name', explanation='This query retrieves all users along with a count of the posts they have made. It uses a LEFT JOIN to include users even if they have not posted anything, ensuring that users with a post count of zero are still displayed.'), _usage=Usage(requests=1, request_tokens=630, response_tokens=103, total_tokens=733, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}))

✅ Generated SQL:
SELECT users.id, users.name, COUNT(posts.id) AS post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name

✅ Explanation:
This query retrieves all users along with a count of the posts they have made. It uses a LEFT JOIN to include users even if they have not posted anything, ensuring that users with a post count of zero are still displayed.

✅ Cost:
Usage(requests=1, request_tokens=630, response_tokens=103, total_tokens=733, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})

📊 Results:
(1, 'John Doe', 2)
(2, 'Jane Smith', 1)

==================================================

Processing query: find users who have published posts

Processing query result: RunResult(_all_messages=[ModelRequest(parts=[SystemPromptPart(content='You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.\nYou must respond with a Success object containing a sql_query and an explanation.\n\nDatabase schema:\n{DB_SCHEMA}\n\nFormat your response exactly like this, with no additional text or formatting:\n{{\n "type": "Success",\n "sql_query": "<your SQL query here>",\n "explanation": "<your explanation here>"\n}}\n\nExamples:\n User: show me all users who have published posts\n {{\n "type": "Success",\n "sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",\n "explanation": "This query finds all users who have at least one published post by joining the users and posts tables."\n }}\n\n User: count posts by user\n {{\n "type": "Success",\n "sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",\n "explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."\n }}\n\n If you receive an error message about a previous query, analyze the error and fix the issues in your new query.\n Common fixes include:\n - Correcting column names\n - Fixing JOIN conditions\n - Adjusting GROUP BY clauses\n - Handling NULL values properly\n\nIf you cannot generate a valid query, respond with:\n{{\n "type": "InvalidRequest",\n "error_message": "<explanation of why the request cannot be processed>"\n}}\n\nImportant:\n1. Respond with ONLY the JSON object, no additional text\n2. Always include the "type" field as either "Success" or "InvalidRequest"\n3. All queries must be SELECT statements\n4. Provide clear explanations\n5. Use proper JOIN conditions and WHERE clauses as needed\n', part_kind='system-prompt'), UserPromptPart(content='find users who have published posts', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 31, 26011, tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request'), ModelResponse(parts=[ToolCallPart(tool_name='final_result_Success', args=ArgsJson(args_json='{"sql_query":"SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE","explanation":"This query identifies all users who have at least one post marked as published by joining the users and posts tables on their respective IDs."}'), tool_call_id='call_1ecYqTquJhhYEiBSRELqQAuT', part_kind='tool-call')], timestamp=datetime.datetime(2024, 12, 21, 16, 25, 31, tzinfo=datetime.timezone.utc), kind='response'), ModelRequest(parts=[ToolReturnPart(tool_name='final_result_Success', content='Final result processed.', tool_call_id='call_1ecYqTquJhhYEiBSRELqQAuT', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 32, 229726, tzinfo=datetime.timezone.utc), part_kind='tool-return')], kind='request')], _new_message_index=0, data=Success(type='Success', sql_query='SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE', explanation='This query identifies all users who have at least one post marked as published by joining the users and posts tables on their respective IDs.'), _usage=Usage(requests=1, request_tokens=626, response_tokens=68, total_tokens=694, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}))

✅ Generated SQL:
SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE

✅ Explanation:
This query identifies all users who have at least one post marked as published by joining the users and posts tables on their respective IDs.

✅ Cost:
Usage(requests=1, request_tokens=626, response_tokens=68, total_tokens=694, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})

📊 Results:
(1, 'John Doe', 'john@example.com', '2024-12-21 14:05:32')
(2, 'Jane Smith', 'jane@example.com', '2024-12-21 14:05:32')

==================================================

Processing query: show me all draft posts with their authors

Processing query result: RunResult(_all_messages=[ModelRequest(parts=[SystemPromptPart(content='You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.\nYou must respond with a Success object containing a sql_query and an explanation.\n\nDatabase schema:\n{DB_SCHEMA}\n\nFormat your response exactly like this, with no additional text or formatting:\n{{\n "type": "Success",\n "sql_query": "<your SQL query here>",\n "explanation": "<your explanation here>"\n}}\n\nExamples:\n User: show me all users who have published posts\n {{\n "type": "Success",\n "sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",\n "explanation": "This query finds all users who have at least one published post by joining the users and posts tables."\n }}\n\n User: count posts by user\n {{\n "type": "Success",\n "sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",\n "explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."\n }}\n\n If you receive an error message about a previous query, analyze the error and fix the issues in your new query.\n Common fixes include:\n - Correcting column names\n - Fixing JOIN conditions\n - Adjusting GROUP BY clauses\n - Handling NULL values properly\n\nIf you cannot generate a valid query, respond with:\n{{\n "type": "InvalidRequest",\n "error_message": "<explanation of why the request cannot be processed>"\n}}\n\nImportant:\n1. Respond with ONLY the JSON object, no additional text\n2. Always include the "type" field as either "Success" or "InvalidRequest"\n3. All queries must be SELECT statements\n4. Provide clear explanations\n5. Use proper JOIN conditions and WHERE clauses as needed\n', part_kind='system-prompt'), UserPromptPart(content='\nPrevious query failed with error: no such column: posts.status\nPlease generate a corrected SQL query for the original request: show me all draft posts with their authors\n', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 33, 648254, tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request'), ModelResponse(parts=[ToolCallPart(tool_name='final_result_Success', args=ArgsJson(args_json='{"sql_query": "SELECT posts.*, users.name as author_name FROM posts JOIN users ON posts.user_id = users.id WHERE posts.published = FALSE", "explanation": "This query selects all draft posts (those not published) along with their authors by joining the posts and users tables based on the user ID."}'), tool_call_id='call_PFcdShmLLrI4ATOqh4MplMAt', part_kind='tool-call')], timestamp=datetime.datetime(2024, 12, 21, 16, 25, 33, tzinfo=datetime.timezone.utc), kind='response'), ModelRequest(parts=[ToolReturnPart(tool_name='final_result_Success', content='Final result processed.', tool_call_id='call_PFcdShmLLrI4ATOqh4MplMAt', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 35, 126566, tzinfo=datetime.timezone.utc), part_kind='tool-return')], kind='request')], _new_message_index=0, data=Success(type='Success', sql_query='SELECT posts.*, users.name as author_name FROM posts JOIN users ON posts.user_id = users.id WHERE posts.published = FALSE', explanation='This query selects all draft posts (those not published) along with their authors by joining the posts and users tables based on the user ID.'), _usage=Usage(requests=1, request_tokens=654, response_tokens=90, total_tokens=744, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}))

✅ Generated SQL:
SELECT posts.*, users.name as author_name FROM posts JOIN users ON posts.user_id = users.id WHERE posts.published = FALSE

✅ Explanation:
This query selects all draft posts (those not published) along with their authors by joining the posts and users tables based on the user ID.

✅ Cost:
Usage(requests=1, request_tokens=654, response_tokens=90, total_tokens=744, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})

📊 Results:
(2, 1, 'Draft Post', 'Work in Progress', 0, '2024-12-21 14:05:32', 'John Doe')

==================================================

Processing query: what is the count of users table

Processing query result: RunResult(_all_messages=[ModelRequest(parts=[SystemPromptPart(content='You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.\nYou must respond with a Success object containing a sql_query and an explanation.\n\nDatabase schema:\n{DB_SCHEMA}\n\nFormat your response exactly like this, with no additional text or formatting:\n{{\n "type": "Success",\n "sql_query": "<your SQL query here>",\n "explanation": "<your explanation here>"\n}}\n\nExamples:\n User: show me all users who have published posts\n {{\n "type": "Success",\n "sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",\n "explanation": "This query finds all users who have at least one published post by joining the users and posts tables."\n }}\n\n User: count posts by user\n {{\n "type": "Success",\n "sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",\n "explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."\n }}\n\n If you receive an error message about a previous query, analyze the error and fix the issues in your new query.\n Common fixes include:\n - Correcting column names\n - Fixing JOIN conditions\n - Adjusting GROUP BY clauses\n - Handling NULL values properly\n\nIf you cannot generate a valid query, respond with:\n{{\n "type": "InvalidRequest",\n "error_message": "<explanation of why the request cannot be processed>"\n}}\n\nImportant:\n1. Respond with ONLY the JSON object, no additional text\n2. Always include the "type" field as either "Success" or "InvalidRequest"\n3. All queries must be SELECT statements\n4. Provide clear explanations\n5. Use proper JOIN conditions and WHERE clauses as needed\n', part_kind='system-prompt'), UserPromptPart(content='what is the count of users table', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 35, 135611, tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request'), ModelResponse(parts=[ToolCallPart(tool_name='final_result_Success', args=ArgsJson(args_json='{"sql_query": "SELECT COUNT(*) as user_count FROM users", "explanation": "This query counts the total number of rows in the users table, providing the total number of users."}'), tool_call_id='call_vvBQ3QO4UwLHvpEpn0Gam5zt', part_kind='tool-call')], timestamp=datetime.datetime(2024, 12, 21, 16, 25, 35, tzinfo=datetime.timezone.utc), kind='response'), ModelRequest(parts=[ToolReturnPart(tool_name='final_result_Success', content='Final result processed.', tool_call_id='call_vvBQ3QO4UwLHvpEpn0Gam5zt', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 36, 421186, tzinfo=datetime.timezone.utc), part_kind='tool-return')], kind='request')], _new_message_index=0, data=Success(type='Success', sql_query='SELECT COUNT(*) as user_count FROM users', explanation='This query counts the total number of rows in the users table, providing the total number of users.'), _usage=Usage(requests=1, request_tokens=627, response_tokens=65, total_tokens=692, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}))

✅ Generated SQL:
SELECT COUNT(*) as user_count FROM users

✅ Explanation:
This query counts the total number of rows in the users table, providing the total number of users.

✅ Cost:
Usage(requests=1, request_tokens=627, response_tokens=65, total_tokens=692, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})

📊 Results:
(2,)

==================================================

Processing query: show me the title of the posts published

Processing query result: RunResult(_all_messages=[ModelRequest(parts=[SystemPromptPart(content='You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.\nYou must respond with a Success object containing a sql_query and an explanation.\n\nDatabase schema:\n{DB_SCHEMA}\n\nFormat your response exactly like this, with no additional text or formatting:\n{{\n "type": "Success",\n "sql_query": "<your SQL query here>",\n "explanation": "<your explanation here>"\n}}\n\nExamples:\n User: show me all users who have published posts\n {{\n "type": "Success",\n "sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",\n "explanation": "This query finds all users who have at least one published post by joining the users and posts tables."\n }}\n\n User: count posts by user\n {{\n "type": "Success",\n "sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",\n "explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."\n }}\n\n If you receive an error message about a previous query, analyze the error and fix the issues in your new query.\n Common fixes include:\n - Correcting column names\n - Fixing JOIN conditions\n - Adjusting GROUP BY clauses\n - Handling NULL values properly\n\nIf you cannot generate a valid query, respond with:\n{{\n "type": "InvalidRequest",\n "error_message": "<explanation of why the request cannot be processed>"\n}}\n\nImportant:\n1. Respond with ONLY the JSON object, no additional text\n2. Always include the "type" field as either "Success" or "InvalidRequest"\n3. All queries must be SELECT statements\n4. Provide clear explanations\n5. Use proper JOIN conditions and WHERE clauses as needed\n', part_kind='system-prompt'), UserPromptPart(content='show me the title of the posts published', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 36, 428434, tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request'), ModelResponse(parts=[ToolCallPart(tool_name='final_result_Success', args=ArgsJson(args_json='{"sql_query":"SELECT title FROM posts WHERE published = TRUE","explanation":"This query retrieves the titles of all posts that have been marked as published."}'), tool_call_id='call_wkb9qISadI545XilG1RXjr9f', part_kind='tool-call')], timestamp=datetime.datetime(2024, 12, 21, 16, 25, 36, tzinfo=datetime.timezone.utc), kind='response'), ModelRequest(parts=[ToolReturnPart(tool_name='final_result_Success', content='Final result processed.', tool_call_id='call_wkb9qISadI545XilG1RXjr9f', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 37, 566200, tzinfo=datetime.timezone.utc), part_kind='tool-return')], kind='request')], _new_message_index=0, data=Success(type='Success', sql_query='SELECT title FROM posts WHERE published = TRUE', explanation='This query retrieves the titles of all posts that have been marked as published.'), _usage=Usage(requests=1, request_tokens=628, response_tokens=44, total_tokens=672, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}))

✅ Generated SQL:
SELECT title FROM posts WHERE published = TRUE

✅ Explanation:
This query retrieves the titles of all posts that have been marked as published.

✅ Cost:
Usage(requests=1, request_tokens=628, response_tokens=44, total_tokens=672, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})

📊 Results:
('First Post',)
("Jane's Post",)

==================================================

Processing query: show me the structure of the posts

Processing query result: type='InvalidRequest' error_message="Failed to generate valid query after 3 attempts. Last error: 'InvalidRequest' object has no attribute 'sql_query'"
Error: Failed to generate valid query after 3 attempts. Last error: 'InvalidRequest' object has no attribute 'sql_query'

==================================================

Processing query: show me the names of all the users

Processing query result: RunResult(_all_messages=[ModelRequest(parts=[SystemPromptPart(content='You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.\nYou must respond with a Success object containing a sql_query and an explanation.\n\nDatabase schema:\n{DB_SCHEMA}\n\nFormat your response exactly like this, with no additional text or formatting:\n{{\n "type": "Success",\n "sql_query": "<your SQL query here>",\n "explanation": "<your explanation here>"\n}}\n\nExamples:\n User: show me all users who have published posts\n {{\n "type": "Success",\n "sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",\n "explanation": "This query finds all users who have at least one published post by joining the users and posts tables."\n }}\n\n User: count posts by user\n {{\n "type": "Success",\n "sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",\n "explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."\n }}\n\n If you receive an error message about a previous query, analyze the error and fix the issues in your new query.\n Common fixes include:\n - Correcting column names\n - Fixing JOIN conditions\n - Adjusting GROUP BY clauses\n - Handling NULL values properly\n\nIf you cannot generate a valid query, respond with:\n{{\n "type": "InvalidRequest",\n "error_message": "<explanation of why the request cannot be processed>"\n}}\n\nImportant:\n1. Respond with ONLY the JSON object, no additional text\n2. Always include the "type" field as either "Success" or "InvalidRequest"\n3. All queries must be SELECT statements\n4. Provide clear explanations\n5. Use proper JOIN conditions and WHERE clauses as needed\n', part_kind='system-prompt'), UserPromptPart(content='show me the names of all the users', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 40, 312604, tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request'), ModelResponse(parts=[ToolCallPart(tool_name='final_result_Success', args=ArgsJson(args_json='{"sql_query":"SELECT name FROM users","explanation":"This query retrieves the names of all users from the users table."}'), tool_call_id='call_c06zk9SA5TyoqBG0sjVEhpdJ', part_kind='tool-call')], timestamp=datetime.datetime(2024, 12, 21, 16, 25, 40, tzinfo=datetime.timezone.utc), kind='response'), ModelRequest(parts=[ToolReturnPart(tool_name='final_result_Success', content='Final result processed.', tool_call_id='call_c06zk9SA5TyoqBG0sjVEhpdJ', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 41, 194161, tzinfo=datetime.timezone.utc), part_kind='tool-return')], kind='request')], _new_message_index=0, data=Success(type='Success', sql_query='SELECT name FROM users', explanation='This query retrieves the names of all users from the users table.'), _usage=Usage(requests=1, request_tokens=628, response_tokens=38, total_tokens=666, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}))

✅ Generated SQL:
SELECT name FROM users

✅ Explanation:
This query retrieves the names of all users from the users table.

✅ Cost:
Usage(requests=1, request_tokens=628, response_tokens=38, total_tokens=666, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})

📊 Results:
('John Doe',)
('Jane Smith',)

==================================================

Future Scope

The system could be enhanced with:

  1. Query Optimization: Analyzing and optimizing generated queries
  2. Schema Learning: Dynamically adapting to different database schemas
  3. Multi-database Support: Extending beyond SQLite to other databases
  4. Query History: Learning from successful queries
  5. Performance Metrics: Adding query execution time analysis

Pros and Cons

Pros:

  • Natural language interface
  • Self-healing capabilities
  • Strong type safety
  • Async operation
  • Detailed error reporting

Cons:

  • Limited to SELECT queries
  • Dependency on external LLM service
  • Potential cost implications
  • Limited to predefined schema
  • May require multiple attempts for complex queries

Conclusion

The SQL Agent demonstrates how modern AI technologies can make database interactions more accessible while maintaining robustness through self-healing mechanisms. While there are limitations, the system provides a solid foundation for natural language database interactions.

This article provides a comprehensive overview of the SQL Agent implementation, its architecture, and potential improvements. Feel free to adapt and expand upon this structure based on your specific needs and audience.

References

connect with me

--

--

The AI Forum
The AI Forum

Published in The AI Forum

Its AI forum where all the topics spread across Data Analytics, Data Science, Machine Learning, Deep Learning are discussed.

Plaban Nayak
Plaban Nayak

Written by Plaban Nayak

Machine Learning and Deep Learning enthusiast

No responses yet