How to build a custom GPT from your relational database
Deploying an NL-to-SQL GPT with Dataherald
OpenAI recently unveiled the GPT Store, a platform that allows developers to publish their own tailored versions of GPT on OpenAI’s storefront. So far, approximately 3 million custom GPTs have been created, each featuring unique capabilities. While it may sound daunting, with the right backend set-up the process of crafting a custom GPT is very straightforward.
In this article we will show how you can build a custom GPT that is capable of retrieving data and answering questions from a structured database. For this example we will be using a Postgres database containing US real-estate data, but the same approach can be used with any relational database. We will use OpenAI’s custom GPT editor as well as Dataherald AI, a hosted NL-to-SQL API.
Prerequisites
To develop a custom GPT with OpenAI’s editor, ensure you have:
- A subscription to OpenAI’s ChatGPT Plus plan.
- Access to the Dataherald NL-to-SQL API and an API key.
- A place to host the application. For this tutorial we opted for a simple serverless approach using AWS Lambda; if you would also like to use Lambda to host your server, check out this how-to article.
We will be using Dataherald’s Python SDK. For guidance on setting up your database with Dataherald AI, refer to Getting Started with Dataherald AI Guide. You can also find the full Dataherald documentation here.
NL-to-SQL Server
For this tutorial we will create and deploy a FastAPI application which will implement a single API endpoint. This API endpoint will:
- Receive the Natural Language question from our GPT
- Interact with Dataherald engine to generate the necessary SQL
- Execute it on the database and return the data
The following diagram illustrates the high level architecture of the application we will build:
Create the FastAPI App
Let’s create a simple FastAPI App that allows users to get the query result of a given question to a structured database. Assuming you have finished setting up your Dataherald AI organization and database connection, you should have the following environment variables:
DATAHERALD_API_KEY # your Dataherald AI API Key
DATABASE_CONNECTION_ID # database connection id on Dataherald AI
FINETUNING_ID # optional, but this really helps improve performance
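A minimal sketch of setting these variables in your shell before running the app; the values below are placeholders, so substitute your own credentials from the Dataherald AI console:

```shell
# Example environment setup (the values are placeholders, not real credentials):
export DATAHERALD_API_KEY="your-dataherald-api-key"
export DATABASE_CONNECTION_ID="your-connection-id"
export FINETUNING_ID="your-finetuning-id"   # optional, omit if you have no fine-tuned model
```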
First, install the Python dependencies:
pip install fastapi
pip install dataherald
pip install mangum # optional, wrapper for AWS Lambda
pip install uvicorn
Then create a file app.py. You can find the full code in my public repository:
import os

from fastapi import FastAPI, HTTPException
from dataherald import AsyncDataherald
from dataherald.types import SqlGenerationResponse
from mangum import Mangum  # optional
import uvicorn

API_KEY = os.environ.get("DATAHERALD_API_KEY")
database_connection_id = os.environ.get("DATABASE_CONNECTION_ID")
finetuning_id = os.environ.get("FINETUNING_ID")

app = FastAPI()
handler = Mangum(app)  # optional, wrapper for AWS Lambda


@app.get("/")
async def root():
    return {"message": "Hello World"}


@app.post("/questions", status_code=201)
async def answer_question(text: str):
    # use the Dataherald client
    dh_client = AsyncDataherald(api_key=API_KEY)
    try:
        # generate the SQL query from the natural language question
        response: SqlGenerationResponse = await dh_client.sql_generations.create(
            prompt={"text": text, "db_connection_id": database_connection_id},
            finetuning_id=finetuning_id,  # exclude finetuning_id if you don't have one
        )
        # execute the query against the database
        sql_results = await dh_client.sql_generations.execute(response.id, max_rows=10)
        return sql_results
    except Exception as e:
        print(e)
        raise HTTPException(status_code=500, detail="Internal Server Error") from e


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)
This creates a /questions endpoint that takes a query parameter text, which is passed to Dataherald AI; the endpoint returns the SQL result that answers the question.
To test out the server locally, run python3 app.py
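Since text is passed as a URL query parameter, it needs to be URL-encoded when you call the endpoint by hand. A quick sketch of building such a request (the host and question below are placeholders; uncomment the last lines to actually send it once the server is running):

```python
# Build a request URL for the /questions endpoint.
from urllib.parse import urlencode

base_url = "http://localhost:8080"  # placeholder: replace with your deployed server's URL
question = "How many homes were sold in Miami in 2023?"
url = f"{base_url}/questions?{urlencode({'text': question})}"
print(url)

# To actually send the POST request (the server must be running):
# import urllib.request
# req = urllib.request.Request(url, method="POST")
# print(urllib.request.urlopen(req).read().decode())
```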
Set up a custom GPT on OpenAI
Once we have our hosted NL-to-SQL server, navigate to the OpenAI GPT editor and go to the Configure tab to build your custom GPT. You will need to be on OpenAI’s ChatGPT Plus plan to access the editor.
First, enter the name and description of your custom GPT. Next, click “Create new action” under Actions to add querying your NL-to-SQL server as an action. Either import your own OpenAPI schema, or take an example schema from OpenAI and replace the url with your hosted server’s details, like this:
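For reference, a minimal sketch of what such a schema might look like for the /questions endpoint built above. The server url, titles, and the operationId (answerQuestion) are illustrative placeholders; substitute your own deployed URL and naming:

```yaml
openapi: 3.1.0
info:
  title: NL-to-SQL API
  version: "1.0.0"
servers:
  - url: https://your-server-url.example.com   # replace with your hosted server's URL
paths:
  /questions:
    post:
      operationId: answerQuestion   # illustrative name; referenced later in the GPT instructions
      summary: Answer a natural language question from the database
      parameters:
        - name: text
          in: query
          required: true
          schema:
            type: string
      responses:
        "201":
          description: The SQL execution result for the question
```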
Once the schema is complete, create an action that uses the path of your querying endpoint. We will reference the action’s operationId in the instructions.
Now we can go back to the Configure tab and instruct the GPT to use the action whenever a user asks a question:
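As an illustration, the instructions might read something like the following; this wording is only a sketch, and it assumes your action’s operationId is answerQuestion (substitute whatever you named yours):

```
When the user asks a question about the data, call the answerQuestion
action with the user's question as the `text` parameter. Present the
returned rows in a clear, readable format, and tell the user if the
query returned no results.
```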
Let us test out our custom GPT on the Preview page:
It works! Our GPT detected that the question is related to the custom action we had built and called our API endpoint to retrieve the data to answer the question. You can check out the custom GPT we created in this tutorial here.
Now let us try a few harder questions that require more complex SQL:
Improve NL-to-SQL (Optional)
Once you have asked your questions on the custom GPT, you can head over to the Dataherald AI console and check out the generated queries yourself. To improve the accuracy of your NL-to-SQL results, verify your queries and use them to fine-tune your LLM models in Dataherald AI.
Final Thoughts
We are now ready to launch and release our GPT! All we need to do is add a privacy policy URL to the action you created and hit publish. Before publishing, make sure your database does not contain any sensitive data, as anyone will be able to query it.
About Dataherald
- Sign up for free and use the hosted version of Dataherald
- Our open-source engine is available on GitHub.
- Join our Discord server to learn more about the project.