Turning Words into SQL: Leveraging Streamlit and OpenAI’s Structured Output for Accurate Query Generation
Understanding OpenAI’s Structured Output
The launch of OpenAI’s Structured Output marks a significant step forward in AI-assisted development. Unlike conventional language models that return text as free-form responses, the Structured Output feature generates responses in a predictable and structured format, which can be validated and parsed using libraries like Pydantic. This ensures that the data the model returns aligns with specific schemas, reducing the need for post-processing and increasing reliability when integrating AI into applications.
Why Pydantic?
Pydantic is a data validation library that enforces type-checking and data integrity. By using Pydantic models, developers can define schemas that represent expected data structures. In this application, Pydantic ensures that the data received from OpenAI’s Structured Output conforms to predefined types and constraints, which is crucial for generating SQL queries that are syntactically and semantically correct.
Building the SQL Query Generator in Streamlit
Let’s dive into the code that brings all these components together.
Setting Up the Models
To start, we define several enums and Pydantic models to represent the elements of an SQL query. These include the table name, column names, operators (e.g., =
, >
, <
), and other SQL components like aggregate functions and ranking types. Here's the code snippet for the Pydantic models:
from enum import Enum
from pydantic import BaseModel
class Table(str, Enum):
support_engineer_metrics = "ASSISTANT_METRICS"
class Column(str, Enum):
id = "USER_ID"
assistant_name = "ASSISTANT_NAME"
assistant_id = "ASSISTANT_ID"
question = "QUESTION"
answer = "ANSWER"
feedback = "FEEDBACK"
date = "DATE"
class Operator(str, Enum):
eq = "="
gt = ">"
lt = "<"
le = "<="
ge = ">="
ne = "!="
class OrderBy(str, Enum):
asc = "asc"
desc = "desc"
class Aggregate(str, Enum):
count = "count"
sum = "sum"
avg = "avg"
min = "min"
max = "max"
class RankType(str, Enum):
rank = "rank"
dense_rank = "dense_rank"
class Condition(BaseModel):
column: str
operator: Operator
value: Union[str, int]
class Query(BaseModel):
table_name: Table
columns: list[Column]
conditions: list[Condition]
aggregate: Aggregate
rank_type: RankType
order_by: OrderBy
These models represent the building blocks of an SQL query, ensuring that only valid data is used to construct the query.
Generating the SQL Query
The heart of the application is the generate_query
function. This function sends a request to OpenAI's API with the user's input, expecting a structured output that conforms to the Query
model. The structured response is then parsed to extract the necessary details for generating an SQL query. Here's the code snippet:
import openai
from openai import OpenAI
client = OpenAI()
def generate_query(user_input):
completion = client.beta.chat.completions.parse(
model="gpt-4o-2024-08-06",
messages=[
{"role": "system", "content": "You are a helpful assistant that can help generate SQL queries based on a user request."},
{"role": "user", "content": user_input},
],
tools=[openai.pydantic_function_tool(Query)],
)
parsed_args = completion.choices[0].message.tool_calls[0].function.parsed_arguments
table_name = parsed_args.table_name.value
column_names = [column.value for column in parsed_args.columns]
operators_and_conditions = [[condition.column, condition.operator.value, condition.value] for condition in parsed_args.conditions]
aggregate_function = parsed_args.aggregate.value
order_by = parsed_args.order_by.value
rank_type = parsed_args.rank_type.value
sql_completion = client.chat.completions.create(
model='gpt-4o-2024-08-06',
messages=[{"role": "system", "content": query_gen_prompt}]
)
return sql_completion.choices[0].message.content
Integrating with Streamlit
Streamlit is a powerful tool for building interactive applications quickly. In this example, we use Streamlit to create a simple UI that allows users to input a natural language query, which is then processed to generate the corresponding SQL.
import streamlit as st
st.title('Query Generator App')
user_query = st.text_area("Enter your query:", "Can you rank the top users by the number of questions they have answered in July 2024?")
if st.button('Generate Query'):
if user_query:
with st.spinner('Generating query...'):
generated_sql = generate_query(user_query)
st.subheader('Generated SQL Query:')
st.code(generated_sql, language='sql')
else:
st.warning('Please enter a query.')
This interface allows users to easily interact with the AI model, providing instant SQL query generation based on natural language input.
Benefits of This Approach
- Efficiency: The integration of OpenAI’s Structured Output with Pydantic ensures that queries are generated quickly and correctly, without the need for manual input validation or correction.
- Accuracy: By defining a strict schema using Pydantic, we ensure that the generated queries conform to expected formats, reducing the likelihood of SQL syntax errors.
- Scalability: This method can be extended to support more complex SQL operations, additional databases, and even custom business logic, making it highly adaptable to various use cases.
- Ease of Use: The Streamlit interface makes it accessible to users with limited SQL knowledge, allowing them to generate queries with natural language, thus lowering the barrier to entry for complex data analysis.
Conclusion
By leveraging OpenAI’s Structured Output, Pydantic, and Streamlit, we can build powerful tools that simplify SQL query generation, making it more accessible and less error-prone. This approach not only saves time but also enhances the accuracy and reliability of SQL queries. Whether you’re a data analyst, a developer, or a business user, this tool can help streamline your workflow and improve your productivity.
Get in Touch / Code Repo
Linkedin — https://www.linkedin.com/in/ryan-k-ba8274122/
Github Repository — https://github.com/klapp101/openai_structured_output