Building a Self-Healing SQL Query Generator Agent with Pydantic.ai and Groq
Introduction
In today’s data-driven world, the ability to query databases efficiently is crucial. However, writing SQL queries can be challenging, especially for those not well-versed in database operations. This article introduces a self-healing SQL query generator that converts natural language requests into SQL queries, making database interactions more accessible and robust.
What is Pydantic AI?
Pydantic AI is an agentic framework designed to simplify the process of creating and managing AI agents. It builds upon the solid foundation of the Pydantic library, which is widely used for data validation and type checking in Python projects. With over 2 million downloads per month, Pydantic has proven its reliability and efficiency in the developer ecosystem.
Key Features and Advantages
1. Ease of Use
One of the standout features of Pydantic AI is its simplicity. Creating an agent is straightforward, requiring just a few lines of code. You can define an agent by specifying the model name and a system prompt, making it accessible even for those new to AI development.
2. Built on Vanilla Python
Unlike some other frameworks that abstract away much of the underlying code, Pydantic AI is built on vanilla Python. This approach gives developers full control and visibility into the inner workings of their agents, making it easier to customize and debug when needed.
3. Dependency Injection
Pydantic AI incorporates best practices from software engineering, including dependency injection. This feature allows for more flexible and maintainable code, especially in larger projects.
4. Integration with Observability Tool
For those concerned with observability, Pydantic AI offers seamless integration with Logfire. This tool allows for comprehensive tracing of prompts, costs, and other important metrics related to your AI agents.
5. Function Calling and Structured Output
Pydantic AI excels in function calling and generating structured output. You can easily define object models for your agent’s responses, ensuring that the output matches your specific requirements. This is particularly useful when working with APIs or other systems that expect data in a certain format.
6. Cost Tracking
Understanding the resource usage of your AI agents is crucial, especially in production environments. Pydantic AI makes this easy by providing built-in cost tracking features, allowing you to monitor token usage and associated costs for each interaction.
Technology Stack
This implementation leverages several modern technologies:
- Pydantic.ai: For building AI agents with strong type safety
- Groq: A fast and efficient LLM provider
- SQLite: Lightweight, serverless database
- aiosqlite: Async SQLite driver for Python
- Python 3.11+: For modern async/await support
Workflow Description
The SQL Agent follows a sophisticated workflow:
- User Input: Receives natural language query requests
- Query Generation: Converts requests to SQL using LLM
- Validation: Ensures query syntax and structure
- Execution: Tests the query against the database
- Self-Healing: Retries with error context if execution fails
- Result Delivery: Returns results or detailed error messages
Code Walkthrough
1. Core Components
The implementation consists of several key components:
# Models for type safety
class Success(BaseModel):
type: str = Field("Success", pattern="^Success$")
sql_query: Annotated[str, MinLen(1)]
explanation: str
class InvalidRequest(BaseModel):
type: str = Field("InvalidRequest", pattern="^InvalidRequest$")
error_message: str
Defining dependencies
@dataclass
class Deps:
conn: aiosqlite.Connection
db_schema: str = DB_SCHEMA
2. Agent Configuration
sqlagent = Agent(
openai_model,
deps_type=Deps,
retries=3,
result_type=Response,
system_prompt=("""You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.
You must respond with a Success object containing a sql_query and an explanation.
Database schema:
{DB_SCHEMA}
Format your response exactly like this, with no additional text or formatting:
{{
"type": "Success",
"sql_query": "<your SQL query here>",
"explanation": "<your explanation here>"
}}
Examples:
User: show me all users who have published posts
{{
"type": "Success",
"sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",
"explanation": "This query finds all users who have at least one published post by joining the users and posts tables."
}}
User: count posts by user
{{
"type": "Success",
"sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",
"explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."
}}
If you receive an error message about a previous query, analyze the error and fix the issues in your new query.
Common fixes include:
- Correcting column names
- Fixing JOIN conditions
- Adjusting GROUP BY clauses
- Handling NULL values properly
If you cannot generate a valid query, respond with:
{{
"type": "InvalidRequest",
"error_message": "<explanation of why the request cannot be processed>"
}}
Important:
1. Respond with ONLY the JSON object, no additional text
2. Always include the "type" field as either "Success" or "InvalidRequest"
3. All queries must be SELECT statements
4. Provide clear explanations
5. Use proper JOIN conditions and WHERE clauses as needed
""")
)
3. Query Execution and Retry Logic
async def query_database(prompt: str, conn: aiosqlite.Connection) -> Response:
max_retries = 3
last_error: Optional[str] = None
for attempt in range(max_retries):
try:
result = await agent.run(prompt, deps=deps)
success, error = await execute_query(conn, result.sql_query)
if success:
return result
last_error = error
prompt = f"""
Previous query failed with error: {error}
Please generate a corrected SQL query for the original request: {prompt}
"""
except Exception as e:
last_error = str(e)
continue
Code Implementation
- Install required dependencies
%pip install 'pydantic-ai-slim[openai,groq,logfire]'
%pip install aiosqlite
2. Setup API Keys
from google.colab import userdata
import os
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["GROQ_API_KEY"] = userdata.get('GROQ_API_KEY')
3. Import required dependencies
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.models.groq import GroqModel
import aiosqlite
import asyncio
from typing import Union, TypeAlias, Annotated,Optional,Tuple
from dataclasses import dataclass
from pydantic import BaseModel, Field
from annotated_types import MinLen
from pydantic_ai import Agent, ModelRetry, RunContext
Google Colab runs a Jupyter notebook backend, which inherently has an active event loop to manage interactions and outputs. Therefore, any attempt to initiate another event loop without patching will lead to conflicts and errors.
Role of nest_asyncio
- Patching the Event Loop: The
nest_asyncio
library modifies the behavior ofasyncio
to allow for nested calls to its functions. By callingnest_asyncio.apply()
, we essentially instruct your code to use the already running event loop instead of trying to create a new one. This enables us to execute asynchronous tasks seamlessly within the Colab environment - Usage: To implement this in our Colab notebook, we simply need to import
nest_asyncio
and apply it right after your imports:
import nest_asyncio
nest_asyncio.apply()
4. Instantiate LLM
openai_model = OpenAIModel('gpt-4o-mini')
groq_model = GroqModel("llama3-groq-70b-8192-tool-use-preview")
5. Define DB_SCHEMA
# Define the schema for our example database
DB_SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS posts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_id INTEGER,
title TEXT NOT NULL,
content TEXT,
published BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(id)
);
"""
6. Define Agents
Agents are PydanticAI’s primary interface for interacting with LLMs.
In some use cases a single Agent will control an entire application or component, but multiple agents can also interact to embody more complex workflows.
The Agent class has full API documentation, but conceptually you can think of an agent as a container for:
- A system prompt — a set of instructions for the LLM written by the developer
- One or more function tool — functions that the LLM may call to get information while generating a response
- An optional structured result type — the structured datatype the LLM must return at the end of a run
- A dependency type constraint — system prompt functions, tools and result validators may all use dependencies when they’re run
- Agents may optionally also have a default LLM model associated with them; the model to use can also be specified when running the agent
sqlagent = Agent(
openai_model,
deps_type=Deps,
retries=3,
result_type=Response,
system_prompt=("""You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.
You must respond with a Success object containing a sql_query and an explanation.
Database schema:
{DB_SCHEMA}
Format your response exactly like this, with no additional text or formatting:
{{
"type": "Success",
"sql_query": "<your SQL query here>",
"explanation": "<your explanation here>"
}}
Examples:
User: show me all users who have published posts
{{
"type": "Success",
"sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",
"explanation": "This query finds all users who have at least one published post by joining the users and posts tables."
}}
User: count posts by user
{{
"type": "Success",
"sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",
"explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."
}}
If you receive an error message about a previous query, analyze the error and fix the issues in your new query.
Common fixes include:
- Correcting column names
- Fixing JOIN conditions
- Adjusting GROUP BY clauses
- Handling NULL values properly
If you cannot generate a valid query, respond with:
{{
"type": "InvalidRequest",
"error_message": "<explanation of why the request cannot be processed>"
}}
Important:
1. Respond with ONLY the JSON object, no additional text
2. Always include the "type" field as either "Success" or "InvalidRequest"
3. All queries must be SELECT statements
4. Provide clear explanations
5. Use proper JOIN conditions and WHERE clauses as needed
""")
)
7. Initiate the DB connection
async def init_database(db_path: str = "test.db") -> aiosqlite.Connection:
"""Initialize the database with schema"""
conn = await aiosqlite.connect(db_path)
# Enable foreign keys
await conn.execute("PRAGMA foreign_keys = ON")
# Create schema
await conn.executescript(DB_SCHEMA)
# Add some sample data if the tables are empty
async with conn.execute("SELECT COUNT(*) FROM users") as cursor:
count = await cursor.fetchone()
if count[0] == 0:
sample_data = """
INSERT INTO users (name, email) VALUES
('John Doe', 'john@example.com'),
('Jane Smith', 'jane@example.com');
INSERT INTO posts (user_id, title, content, published) VALUES
(1, 'First Post', 'Hello World', TRUE),
(1, 'Draft Post', 'Work in Progress', FALSE),
(2, 'Jane''s Post', 'Hello from Jane', TRUE);
"""
await conn.executescript(sample_data)
await conn.commit()
return conn
8. Helper function to execute the query
from typing import Tuple, Optional
async def execute_query(conn: aiosqlite.Connection, query: str) -> Tuple[bool, Optional[str]]:
"""
Execute a SQL query and return success status and error message if any.
Returns: (success: bool, error_message: Optional[str])
"""
try:
async with conn.execute(query) as cursor:
await cursor.fetchone()
return True, None
except Exception as e:
return False, str(e)
9. Helper function to invoke SQL Agent and process the query
async def main():
# Ensure GROQ API key is set
if not os.getenv("GROQ_API_KEY"):
raise ValueError("Please set GROQ_API_KEY environment variable")
# Initialize database
conn = await init_database("test.db")
try:
# Example queries to test
test_queries = [
"show me all users and the number of posts posted",
"find users who have published posts",
"show me all draft posts with their authors",
"what is the count of users table",
"show me the title of the posts published",
"show me the structure of the posts",
"show me the names of all the users"
]
for query in test_queries:
print(f"\nProcessing query: {query}")
result = await query_database(query, conn)
print(f"\nProcessing query result: {result}")
if isinstance(result, InvalidRequest):
print(f"Error: {result.error_message}")
else:
print("\n✅ Generated SQL:")
print(result.data.sql_query)
print("\n✅ Explanation:")
print(result.data.explanation)
print("\n✅ Cost:")
print(result._usage)
# Execute the query to show results
try:
async with conn.execute(result.data.sql_query) as cursor:
rows = await cursor.fetchall()
print("\n📊 Results:")
for row in rows:
print(row)
except Exception as e:
print(f"Error executing query: {query}")
continue
print("\n" + "="*50)
finally:
await conn.close()
10. Invoke the SQL Agent
asyncio.run(main())
##################################RESPONSE####################################
Processing query: show me all users and the number of posts posted
Processing query result: RunResult(_all_messages=[ModelRequest(parts=[SystemPromptPart(content='You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.\nYou must respond with a Success object containing a sql_query and an explanation.\n\nDatabase schema:\n{DB_SCHEMA}\n\nFormat your response exactly like this, with no additional text or formatting:\n{{\n "type": "Success",\n "sql_query": "<your SQL query here>",\n "explanation": "<your explanation here>"\n}}\n\nExamples:\n User: show me all users who have published posts\n {{\n "type": "Success",\n "sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",\n "explanation": "This query finds all users who have at least one published post by joining the users and posts tables."\n }}\n\n User: count posts by user\n {{\n "type": "Success",\n "sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",\n "explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."\n }}\n\n If you receive an error message about a previous query, analyze the error and fix the issues in your new query.\n Common fixes include:\n - Correcting column names\n - Fixing JOIN conditions\n - Adjusting GROUP BY clauses\n - Handling NULL values properly\n\nIf you cannot generate a valid query, respond with:\n{{\n "type": "InvalidRequest",\n "error_message": "<explanation of why the request cannot be processed>"\n}}\n\nImportant:\n1. Respond with ONLY the JSON object, no additional text\n2. Always include the "type" field as either "Success" or "InvalidRequest"\n3. All queries must be SELECT statements\n4. Provide clear explanations\n5. Use proper JOIN conditions and WHERE clauses as needed\n', part_kind='system-prompt'), UserPromptPart(content='show me all users and the number of posts posted', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 29, 420161, tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request'), ModelResponse(parts=[ToolCallPart(tool_name='final_result_Success', args=ArgsJson(args_json='{"type":"Success","sql_query":"SELECT users.id, users.name, COUNT(posts.id) AS post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name","explanation":"This query retrieves all users along with a count of the posts they have made. It uses a LEFT JOIN to include users even if they have not posted anything, ensuring that users with a post count of zero are still displayed."}'), tool_call_id='call_hFetDeIfPDo92NQNtyRJBuAS', part_kind='tool-call')], timestamp=datetime.datetime(2024, 12, 21, 16, 25, 29, tzinfo=datetime.timezone.utc), kind='response'), ModelRequest(parts=[ToolReturnPart(tool_name='final_result_Success', content='Final result processed.', tool_call_id='call_hFetDeIfPDo92NQNtyRJBuAS', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 31, 17975, tzinfo=datetime.timezone.utc), part_kind='tool-return')], kind='request')], _new_message_index=0, data=Success(type='Success', sql_query='SELECT users.id, users.name, COUNT(posts.id) AS post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name', explanation='This query retrieves all users along with a count of the posts they have made. It uses a LEFT JOIN to include users even if they have not posted anything, ensuring that users with a post count of zero are still displayed.'), _usage=Usage(requests=1, request_tokens=630, response_tokens=103, total_tokens=733, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}))
✅ Generated SQL:
SELECT users.id, users.name, COUNT(posts.id) AS post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name
✅ Explanation:
This query retrieves all users along with a count of the posts they have made. It uses a LEFT JOIN to include users even if they have not posted anything, ensuring that users with a post count of zero are still displayed.
✅ Cost:
Usage(requests=1, request_tokens=630, response_tokens=103, total_tokens=733, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})
📊 Results:
(1, 'John Doe', 2)
(2, 'Jane Smith', 1)
==================================================
Processing query: find users who have published posts
Processing query result: RunResult(_all_messages=[ModelRequest(parts=[SystemPromptPart(content='You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.\nYou must respond with a Success object containing a sql_query and an explanation.\n\nDatabase schema:\n{DB_SCHEMA}\n\nFormat your response exactly like this, with no additional text or formatting:\n{{\n "type": "Success",\n "sql_query": "<your SQL query here>",\n "explanation": "<your explanation here>"\n}}\n\nExamples:\n User: show me all users who have published posts\n {{\n "type": "Success",\n "sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",\n "explanation": "This query finds all users who have at least one published post by joining the users and posts tables."\n }}\n\n User: count posts by user\n {{\n "type": "Success",\n "sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",\n "explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."\n }}\n\n If you receive an error message about a previous query, analyze the error and fix the issues in your new query.\n Common fixes include:\n - Correcting column names\n - Fixing JOIN conditions\n - Adjusting GROUP BY clauses\n - Handling NULL values properly\n\nIf you cannot generate a valid query, respond with:\n{{\n "type": "InvalidRequest",\n "error_message": "<explanation of why the request cannot be processed>"\n}}\n\nImportant:\n1. Respond with ONLY the JSON object, no additional text\n2. Always include the "type" field as either "Success" or "InvalidRequest"\n3. All queries must be SELECT statements\n4. Provide clear explanations\n5. Use proper JOIN conditions and WHERE clauses as needed\n', part_kind='system-prompt'), UserPromptPart(content='find users who have published posts', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 31, 26011, tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request'), ModelResponse(parts=[ToolCallPart(tool_name='final_result_Success', args=ArgsJson(args_json='{"sql_query":"SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE","explanation":"This query identifies all users who have at least one post marked as published by joining the users and posts tables on their respective IDs."}'), tool_call_id='call_1ecYqTquJhhYEiBSRELqQAuT', part_kind='tool-call')], timestamp=datetime.datetime(2024, 12, 21, 16, 25, 31, tzinfo=datetime.timezone.utc), kind='response'), ModelRequest(parts=[ToolReturnPart(tool_name='final_result_Success', content='Final result processed.', tool_call_id='call_1ecYqTquJhhYEiBSRELqQAuT', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 32, 229726, tzinfo=datetime.timezone.utc), part_kind='tool-return')], kind='request')], _new_message_index=0, data=Success(type='Success', sql_query='SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE', explanation='This query identifies all users who have at least one post marked as published by joining the users and posts tables on their respective IDs.'), _usage=Usage(requests=1, request_tokens=626, response_tokens=68, total_tokens=694, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}))
✅ Generated SQL:
SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE
✅ Explanation:
This query identifies all users who have at least one post marked as published by joining the users and posts tables on their respective IDs.
✅ Cost:
Usage(requests=1, request_tokens=626, response_tokens=68, total_tokens=694, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})
📊 Results:
(1, 'John Doe', 'john@example.com', '2024-12-21 14:05:32')
(2, 'Jane Smith', 'jane@example.com', '2024-12-21 14:05:32')
==================================================
Processing query: show me all draft posts with their authors
Processing query result: RunResult(_all_messages=[ModelRequest(parts=[SystemPromptPart(content='You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.\nYou must respond with a Success object containing a sql_query and an explanation.\n\nDatabase schema:\n{DB_SCHEMA}\n\nFormat your response exactly like this, with no additional text or formatting:\n{{\n "type": "Success",\n "sql_query": "<your SQL query here>",\n "explanation": "<your explanation here>"\n}}\n\nExamples:\n User: show me all users who have published posts\n {{\n "type": "Success",\n "sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",\n "explanation": "This query finds all users who have at least one published post by joining the users and posts tables."\n }}\n\n User: count posts by user\n {{\n "type": "Success",\n "sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",\n "explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."\n }}\n\n If you receive an error message about a previous query, analyze the error and fix the issues in your new query.\n Common fixes include:\n - Correcting column names\n - Fixing JOIN conditions\n - Adjusting GROUP BY clauses\n - Handling NULL values properly\n\nIf you cannot generate a valid query, respond with:\n{{\n "type": "InvalidRequest",\n "error_message": "<explanation of why the request cannot be processed>"\n}}\n\nImportant:\n1. Respond with ONLY the JSON object, no additional text\n2. Always include the "type" field as either "Success" or "InvalidRequest"\n3. All queries must be SELECT statements\n4. Provide clear explanations\n5. Use proper JOIN conditions and WHERE clauses as needed\n', part_kind='system-prompt'), UserPromptPart(content='\nPrevious query failed with error: no such column: posts.status\nPlease generate a corrected SQL query for the original request: show me all draft posts with their authors\n', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 33, 648254, tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request'), ModelResponse(parts=[ToolCallPart(tool_name='final_result_Success', args=ArgsJson(args_json='{"sql_query": "SELECT posts.*, users.name as author_name FROM posts JOIN users ON posts.user_id = users.id WHERE posts.published = FALSE", "explanation": "This query selects all draft posts (those not published) along with their authors by joining the posts and users tables based on the user ID."}'), tool_call_id='call_PFcdShmLLrI4ATOqh4MplMAt', part_kind='tool-call')], timestamp=datetime.datetime(2024, 12, 21, 16, 25, 33, tzinfo=datetime.timezone.utc), kind='response'), ModelRequest(parts=[ToolReturnPart(tool_name='final_result_Success', content='Final result processed.', tool_call_id='call_PFcdShmLLrI4ATOqh4MplMAt', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 35, 126566, tzinfo=datetime.timezone.utc), part_kind='tool-return')], kind='request')], _new_message_index=0, data=Success(type='Success', sql_query='SELECT posts.*, users.name as author_name FROM posts JOIN users ON posts.user_id = users.id WHERE posts.published = FALSE', explanation='This query selects all draft posts (those not published) along with their authors by joining the posts and users tables based on the user ID.'), _usage=Usage(requests=1, request_tokens=654, response_tokens=90, total_tokens=744, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}))
✅ Generated SQL:
SELECT posts.*, users.name as author_name FROM posts JOIN users ON posts.user_id = users.id WHERE posts.published = FALSE
✅ Explanation:
This query selects all draft posts (those not published) along with their authors by joining the posts and users tables based on the user ID.
✅ Cost:
Usage(requests=1, request_tokens=654, response_tokens=90, total_tokens=744, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})
📊 Results:
(2, 1, 'Draft Post', 'Work in Progress', 0, '2024-12-21 14:05:32', 'John Doe')
==================================================
Processing query: what is the count of users table
Processing query result: RunResult(_all_messages=[ModelRequest(parts=[SystemPromptPart(content='You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.\nYou must respond with a Success object containing a sql_query and an explanation.\n\nDatabase schema:\n{DB_SCHEMA}\n\nFormat your response exactly like this, with no additional text or formatting:\n{{\n "type": "Success",\n "sql_query": "<your SQL query here>",\n "explanation": "<your explanation here>"\n}}\n\nExamples:\n User: show me all users who have published posts\n {{\n "type": "Success",\n "sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",\n "explanation": "This query finds all users who have at least one published post by joining the users and posts tables."\n }}\n\n User: count posts by user\n {{\n "type": "Success",\n "sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",\n "explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."\n }}\n\n If you receive an error message about a previous query, analyze the error and fix the issues in your new query.\n Common fixes include:\n - Correcting column names\n - Fixing JOIN conditions\n - Adjusting GROUP BY clauses\n - Handling NULL values properly\n\nIf you cannot generate a valid query, respond with:\n{{\n "type": "InvalidRequest",\n "error_message": "<explanation of why the request cannot be processed>"\n}}\n\nImportant:\n1. Respond with ONLY the JSON object, no additional text\n2. Always include the "type" field as either "Success" or "InvalidRequest"\n3. All queries must be SELECT statements\n4. Provide clear explanations\n5. Use proper JOIN conditions and WHERE clauses as needed\n', part_kind='system-prompt'), UserPromptPart(content='what is the count of users table', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 35, 135611, tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request'), ModelResponse(parts=[ToolCallPart(tool_name='final_result_Success', args=ArgsJson(args_json='{"sql_query": "SELECT COUNT(*) as user_count FROM users", "explanation": "This query counts the total number of rows in the users table, providing the total number of users."}'), tool_call_id='call_vvBQ3QO4UwLHvpEpn0Gam5zt', part_kind='tool-call')], timestamp=datetime.datetime(2024, 12, 21, 16, 25, 35, tzinfo=datetime.timezone.utc), kind='response'), ModelRequest(parts=[ToolReturnPart(tool_name='final_result_Success', content='Final result processed.', tool_call_id='call_vvBQ3QO4UwLHvpEpn0Gam5zt', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 36, 421186, tzinfo=datetime.timezone.utc), part_kind='tool-return')], kind='request')], _new_message_index=0, data=Success(type='Success', sql_query='SELECT COUNT(*) as user_count FROM users', explanation='This query counts the total number of rows in the users table, providing the total number of users.'), _usage=Usage(requests=1, request_tokens=627, response_tokens=65, total_tokens=692, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}))
✅ Generated SQL:
SELECT COUNT(*) as user_count FROM users
✅ Explanation:
This query counts the total number of rows in the users table, providing the total number of users.
✅ Cost:
Usage(requests=1, request_tokens=627, response_tokens=65, total_tokens=692, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})
📊 Results:
(2,)
==================================================
Processing query: show me the title of the posts published
Processing query result: RunResult(_all_messages=[ModelRequest(parts=[SystemPromptPart(content='You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.\nYou must respond with a Success object containing a sql_query and an explanation.\n\nDatabase schema:\n{DB_SCHEMA}\n\nFormat your response exactly like this, with no additional text or formatting:\n{{\n "type": "Success",\n "sql_query": "<your SQL query here>",\n "explanation": "<your explanation here>"\n}}\n\nExamples:\n User: show me all users who have published posts\n {{\n "type": "Success",\n "sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",\n "explanation": "This query finds all users who have at least one published post by joining the users and posts tables."\n }}\n\n User: count posts by user\n {{\n "type": "Success",\n "sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",\n "explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."\n }}\n\n If you receive an error message about a previous query, analyze the error and fix the issues in your new query.\n Common fixes include:\n - Correcting column names\n - Fixing JOIN conditions\n - Adjusting GROUP BY clauses\n - Handling NULL values properly\n\nIf you cannot generate a valid query, respond with:\n{{\n "type": "InvalidRequest",\n "error_message": "<explanation of why the request cannot be processed>"\n}}\n\nImportant:\n1. Respond with ONLY the JSON object, no additional text\n2. Always include the "type" field as either "Success" or "InvalidRequest"\n3. All queries must be SELECT statements\n4. Provide clear explanations\n5. Use proper JOIN conditions and WHERE clauses as needed\n', part_kind='system-prompt'), UserPromptPart(content='show me the title of the posts published', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 36, 428434, tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request'), ModelResponse(parts=[ToolCallPart(tool_name='final_result_Success', args=ArgsJson(args_json='{"sql_query":"SELECT title FROM posts WHERE published = TRUE","explanation":"This query retrieves the titles of all posts that have been marked as published."}'), tool_call_id='call_wkb9qISadI545XilG1RXjr9f', part_kind='tool-call')], timestamp=datetime.datetime(2024, 12, 21, 16, 25, 36, tzinfo=datetime.timezone.utc), kind='response'), ModelRequest(parts=[ToolReturnPart(tool_name='final_result_Success', content='Final result processed.', tool_call_id='call_wkb9qISadI545XilG1RXjr9f', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 37, 566200, tzinfo=datetime.timezone.utc), part_kind='tool-return')], kind='request')], _new_message_index=0, data=Success(type='Success', sql_query='SELECT title FROM posts WHERE published = TRUE', explanation='This query retrieves the titles of all posts that have been marked as published.'), _usage=Usage(requests=1, request_tokens=628, response_tokens=44, total_tokens=672, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}))
✅ Generated SQL:
SELECT title FROM posts WHERE published = TRUE
✅ Explanation:
This query retrieves the titles of all posts that have been marked as published.
✅ Cost:
Usage(requests=1, request_tokens=628, response_tokens=44, total_tokens=672, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})
📊 Results:
('First Post',)
("Jane's Post",)
==================================================
Processing query: show me the structure of the posts
Processing query result: type='InvalidRequest' error_message="Failed to generate valid query after 3 attempts. Last error: 'InvalidRequest' object has no attribute 'sql_query'"
Error: Failed to generate valid query after 3 attempts. Last error: 'InvalidRequest' object has no attribute 'sql_query'
==================================================
Processing query: show me the names of all the users
Processing query result: RunResult(_all_messages=[ModelRequest(parts=[SystemPromptPart(content='You are a proficient Database Administrator having expertise in generating SQL queries. Your task is to convert natural language requests into SQL queries for a SQLite database.\nYou must respond with a Success object containing a sql_query and an explanation.\n\nDatabase schema:\n{DB_SCHEMA}\n\nFormat your response exactly like this, with no additional text or formatting:\n{{\n "type": "Success",\n "sql_query": "<your SQL query here>",\n "explanation": "<your explanation here>"\n}}\n\nExamples:\n User: show me all users who have published posts\n {{\n "type": "Success",\n "sql_query": "SELECT DISTINCT users.* FROM users JOIN posts ON users.id = posts.user_id WHERE posts.published = TRUE",\n "explanation": "This query finds all users who have at least one published post by joining the users and posts tables."\n }}\n\n User: count posts by user\n {{\n "type": "Success",\n "sql_query": "SELECT users.name, COUNT(posts.id) as post_count FROM users LEFT JOIN posts ON users.id = posts.user_id GROUP BY users.id, users.name",\n "explanation": "This query counts the number of posts for each user, including users with no posts using LEFT JOIN."\n }}\n\n If you receive an error message about a previous query, analyze the error and fix the issues in your new query.\n Common fixes include:\n - Correcting column names\n - Fixing JOIN conditions\n - Adjusting GROUP BY clauses\n - Handling NULL values properly\n\nIf you cannot generate a valid query, respond with:\n{{\n "type": "InvalidRequest",\n "error_message": "<explanation of why the request cannot be processed>"\n}}\n\nImportant:\n1. Respond with ONLY the JSON object, no additional text\n2. Always include the "type" field as either "Success" or "InvalidRequest"\n3. All queries must be SELECT statements\n4. Provide clear explanations\n5. Use proper JOIN conditions and WHERE clauses as needed\n', part_kind='system-prompt'), UserPromptPart(content='show me the names of all the users', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 40, 312604, tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request'), ModelResponse(parts=[ToolCallPart(tool_name='final_result_Success', args=ArgsJson(args_json='{"sql_query":"SELECT name FROM users","explanation":"This query retrieves the names of all users from the users table."}'), tool_call_id='call_c06zk9SA5TyoqBG0sjVEhpdJ', part_kind='tool-call')], timestamp=datetime.datetime(2024, 12, 21, 16, 25, 40, tzinfo=datetime.timezone.utc), kind='response'), ModelRequest(parts=[ToolReturnPart(tool_name='final_result_Success', content='Final result processed.', tool_call_id='call_c06zk9SA5TyoqBG0sjVEhpdJ', timestamp=datetime.datetime(2024, 12, 21, 16, 25, 41, 194161, tzinfo=datetime.timezone.utc), part_kind='tool-return')], kind='request')], _new_message_index=0, data=Success(type='Success', sql_query='SELECT name FROM users', explanation='This query retrieves the names of all users from the users table.'), _usage=Usage(requests=1, request_tokens=628, response_tokens=38, total_tokens=666, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}))
✅ Generated SQL:
SELECT name FROM users
✅ Explanation:
This query retrieves the names of all users from the users table.
✅ Cost:
Usage(requests=1, request_tokens=628, response_tokens=38, total_tokens=666, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0})
📊 Results:
('John Doe',)
('Jane Smith',)
==================================================
Future Scope
The system could be enhanced with:
- Query Optimization: Analyzing and optimizing generated queries
- Schema Learning: Dynamically adapting to different database schemas
- Multi-database Support: Extending beyond SQLite to other databases
- Query History: Learning from successful queries
- Performance Metrics: Adding query execution time analysis
Pros and Cons
Pros:
- Natural language interface
- Self-healing capabilities
- Strong type safety
- Async operation
- Detailed error reporting
Cons:
- Limited to SELECT queries
- Dependency on external LLM service
- Potential cost implications
- Limited to predefined schema
- May require multiple attempts for complex queries
Conclusion
The SQL Agent demonstrates how modern AI technologies can make database interactions more accessible while maintaining robustness through self-healing mechanisms. While there are limitations, the system provides a solid foundation for natural language database interactions.
This article provides a comprehensive overview of the SQL Agent implementation, its architecture, and potential improvements. Feel free to adapt and expand upon this structure based on your specific needs and audience.