How to connect OpenAI’s Assistant API to your SQL database

Mo Pourreza
Dataherald
Published in
5 min readNov 10, 2023
Generated BY DALLE-3

Introduction

A vast treasure of critical business information resides in relational databases, yet extracting insights from this data requires knowledge of SQL. However, most people do not know SQL. With Large language models getting better at language comprehension and reasoning every day, there is a big push to use them to craft SQL queries, unlocking valuable insights from relational databases.

OpenAI just released a lot of new features in their annual developer conference. Alongside their newest model (GPT-4-turbo) 🚀, they released the Assistant API which allows developers to create powerful chatbots and agents. The Assistants API includes two key features:

  1. The retrieval tool — which allows developers to chunk and retrieve relevant sections of documents through Retrieval Augmented Generation (RAG).
  2. Code interpreter — allows agents to execute code in a secure sandbox, giving assistants the ability to create files, illustrate charts, and run commands.

In this blog post, we’ll explore how to use of the OpenAI Assistant API to build NL-to-SQL assistants capable of generating SQL queries and extracting insights from any database.

We will use the Chinook.db sample database for this tutorial. By the end you will have a CLI agent that can answer questions from this database, as below:

Installation

Working with the Assistant API involves several steps: setting up threads, crafting messages, initiating runs, and managing various run statuses. We developed a lightweight wrapper around the API that reduces this complexity, which you can access in this repository. Our wrapper depends on just three Python packages and uses pure Python for ease of use.

To start building your own assistants with our wrapper first clone our repository and install the requirements:

git clone https://github.com/Dataherald/Assistant
cd Assistant
pip install -r requirements.txt

Then create an .env file and put your OPENAI_API_KEY there.

Building SQL Assistants

Now, let’s get hands-on with building our assistants. We will construct two distinct types:

  1. Simple Assistant — This assistant will use the code interpreter to create CSV files and implement only two Python functions. The first function retrieves the schema of the Chinook database and pass it to assistant, while the second executes SQL queries within it. This will work well for simple questions
  2. Dataherald Assistant for Complex Queries— Will integrate the open source Dataherald NL-to-SQL engine API, enabling it to field complex questions on a given database for real-world enterprise scenarios.

Simple SQL Assistant

Building assistants with our wrapper is very simple, All you need to do is define your custom functions, instantiate an assistant object, and then simply invoke the chat() function.

First, let’s implement two python functions, one for providing the schema of the database to the assistant and the other one to run the SQL queries over the database. you must create new classes corresponding to each of those two functions that inherit from the Function class. Then set the name, description, and the properties of these functions. Finally, you have to implement the function method, which will be called when the function is invoked. The function method must accept the same number of parameters as the number of parameters defined in the parameters attribute:

from function import Function, Property

class GetDBSchema(Function):
def __init__(self):
super().__init__(
name="get_db_schema",
description="Get the schema of the Chinook database",
)
def function(self):
conn = sqlite3.connect('Chinook.sqlite')
cursor = conn.cursor()
cursor.execute("SELECT sql FROM sqlite_master WHERE type='table';")
create_statements = cursor.fetchall()
conn.close()
return '\n\n'.join([statement[0] for statement in create_statements])

class RunSQLQuery(Function):
def __init__(self):
super().__init__(
name="run_sql_query",
description="Run a SQL query on the Chinook database",
parameters=[
Property(
name="query",
description="The SQL query to run",
type="string",
required=True,
),
]
)
def function(self, query):
conn = sqlite3.connect('Chinook.sqlite')
cursor = conn.cursor()
cursor.execute(query)
results = cursor.fetchall()
conn.close()
return '\n'.join([str(result) for result in results])

Now you have all of the functions you need to build your first SQL assistant. You can build the assistant simply by creating an object from the AIAssistant class as follows:

assistant = AIAssistant(
instruction="""
You are a SQL expert. User asks you questions about the Chinook database.
First obtain the schema of the database to check the tables and columns, then generate SQL queries to answer the questions.
""",
model="gpt-3.5-turbo-1106",
functions=[GetDBSchema(), RunSQLQuery()],
use_code_interpreter=True, #Using the code interperter to generate CSV files
)

Now to start chatting with the assistant simply call the chat method:

assistant.chat()

You’re now ready to query the Chinook database using your assistant. As you experiment, you might notice its limitations with complex questions. To enhance overcome this, we will now integrate the Dataherald engine for robust question-answering and employ the code interpreter for CSV file generation.

Dataherald Assistant

Generated by DALLE-3

Now let’s build a more capable SQL assistant using the open source Dataherald engine. We are going to use a hosted version of the Dataherald engine that is connected to two datasets of US real estate data and Senate Stock watcher. We will create a function to call Dataherald API to ask the questions and obtain a response. The function to call the API is as follows:

import requests
import json
HOST = "http://streamlit_engine.dataherald.ai/api/v1/questions"
DATABASES_IDS = {
'RealEstate': '6537c3dc4cec532eccb7d6cc',
'SenateStock': '65424c694cec532eccb7d766',
}
def answer_question(
question: str,
db_name: enumerate(DATABASES_IDS.keys()) = 'RealEstate'
) -> str:
payload = {
"db_connection_id": DATABASES_IDS[db_name],
"question": question,
}
json_data = json.dumps(payload)
response = requests.post(HOST, data=json_data)
if response.status_code == 201:
engine_response = response.json()['response'] + '\n' + json.dumps(response.json()['sql_query_result'])
return engine_response
else:
return "Sorry, I don't know the answer to that question."

Then in order to let the assistant use this API we will create a new Function class:

from function import Function, Property

class DataheraldFunction(Function):
def __init__(self):
super().__init__(
name="dataherald",
description="Answer questions on a given database",
parameters=[
Property(
name="db_name",
description="The database to query, possible values are: RealEstate, SenateStock",
type="string",
required=False,
),
Property(
name="question",
description="The question to answer",
type="string",
required=True,
),
]
)
def function(self, db_name, question):
return answer_question(question, db_name)

Now let’s create our assistant enabling code interpreter and also providing this function to it:

assistant = AIAssistant(
instruction="",
model="gpt-3.5-turbo-1106",
functions=[DataheraldFunction()],
use_code_interpreter=True,
)

Now let’s ask question by calling the chat() method:

assistant.chat()

About Dataherald

  • Sign up for free and use the hosted version of Dataherald
  • Our open-source engine is available on Github.
  • Join our Discord server to learn more about the project.

--

--