How to build a custom GPT from your relational database

Dishen Wang
Published in Dataherald · 5 min read · Jan 25, 2024

Deploying an NL-to-SQL GPT with Dataherald


OpenAI recently unveiled the GPT Store, a platform allowing developers to publish their own tailored versions of GPT on OpenAI’s storefront. So far, approximately 3 million custom versions of GPT have been created, each featuring unique capabilities. While it may sound daunting, with the right backend setup the process of crafting a custom GPT is very straightforward.

In this article we will show how you can build a custom GPT that is capable of retrieving data and answering questions from a structured database. For this example we will be using a Postgres database containing US real-estate data, but the same approach can be used with any relational database. We will use OpenAI’s custom GPT editor as well as Dataherald AI, a hosted NL-to-SQL API.

Real Estate Database on Dataherald AI Console

Prerequisites

To develop a custom GPT with OpenAI’s editor, ensure you have:

  1. A ChatGPT Plus subscription from OpenAI.
  2. Access to the Dataherald NL-to-SQL API and an API key.
  3. A place to host the application. For this tutorial we opted for a simple serverless approach using AWS Lambda; if you would also like to use Lambda to host your server, check out this how-to article.

We will be using Dataherald’s Python SDK. For guidance on setting up your database with Dataherald AI, refer to Getting Started with Dataherald AI Guide. You can also find the full Dataherald documentation here.

NL-to-SQL Server

For this tutorial we will create and deploy a FastAPI application which will implement a single API endpoint. This API endpoint will:

  • Receive the Natural Language question from our GPT
  • Interact with Dataherald engine to generate the necessary SQL
  • Execute it on the database and return the data

The following diagram illustrates the high level architecture of the application we will build:

Diagram of our custom GPT: the FastAPI app calls the Dataherald hosted NL-to-SQL API, which generates high-accuracy SQL from our relational database

Create the FastAPI App

Let’s create a simple FastAPI App that allows users to get the query result of a given question to a structured database. Assuming you have finished setting up your Dataherald AI organization and database connection, you should have the following environment variables:

DATAHERALD_API_KEY # your Dataherald AI API Key

DATABASE_CONNECTION_ID # database connection id on Dataherald AI

FINETUNING_ID # optional, but fine-tuning significantly improves SQL accuracy
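Before running the app, these variables need to be present in the environment. A minimal sketch for a local shell session (all three values below are placeholders; substitute your own from the Dataherald AI console):

```shell
# Placeholder values -- replace with your own credentials
export DATAHERALD_API_KEY="dh-your-api-key"
export DATABASE_CONNECTION_ID="your-connection-id"
export FINETUNING_ID="your-finetuning-id"  # optional
```

When deploying to AWS Lambda, the same variables can instead be set in the function's environment-variable configuration.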

First, install the Python dependencies:

pip install fastapi
pip install dataherald
pip install mangum # optional, wrapper for AWS Lambda
pip install uvicorn

Then create a file app.py; you can find all the code in my public repository:

import os

from fastapi import FastAPI, HTTPException
from dataherald import AsyncDataherald
from dataherald.types import SqlGenerationResponse
from mangum import Mangum  # optional
import uvicorn

API_KEY = os.environ.get("DATAHERALD_API_KEY")
database_connection_id = os.environ.get("DATABASE_CONNECTION_ID")
finetuning_id = os.environ.get("FINETUNING_ID")

app = FastAPI()
handler = Mangum(app)  # optional, AWS Lambda entry point


@app.get("/")
async def root():
    return {"message": "Hello World"}


@app.post("/questions", status_code=201)
async def answer_question(text: str):
    # use the Dataherald async client
    dh_client = AsyncDataherald(api_key=API_KEY)
    try:
        # generate the SQL query from the natural language question
        response: SqlGenerationResponse = await dh_client.sql_generations.create(
            prompt={"text": text, "db_connection_id": database_connection_id},
            finetuning_id=finetuning_id,  # exclude finetuning_id if you don't have one
        )
        # execute the query and return up to 10 rows
        sql_results = await dh_client.sql_generations.execute(response.id, max_rows=10)
        return sql_results
    except Exception as e:
        print(e)
        raise HTTPException(status_code=500, detail="Internal Server Error") from e


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)

This creates a /questions endpoint that takes a query parameter text, passes it to Dataherald AI, and returns the results of executing the generated SQL.
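Because text is a query parameter rather than a request body, the question must be URL-encoded when calling the endpoint. A quick sketch of building the request URL (the base URL is a placeholder for wherever you deploy the server):

```python
from urllib.parse import urlencode

# Placeholder: replace with your deployed server's URL
base_url = "https://example.com"

question = "How many homes were sold in Austin in 2023?"
url = f"{base_url}/questions?{urlencode({'text': question})}"
print(url)
```

Any HTTP client (curl, requests, or the custom GPT action itself) can then POST to this URL.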

To test out the server, run python3 app.py

Set up a custom GPT on OpenAI

Once we have our hosted NL-to-SQL server, navigate to the OpenAI GPT editor and go to the Configure tab to build your custom GPT. You will need to be on OpenAI’s ChatGPT Plus plan to access the editor.

Custom GPT Editor on OpenAI

First, enter a name and description for your custom GPT. Next, click “Create new action” under Actions to add querying your NL-to-SQL server as an action. Either import your OpenAPI schema or take an example schema from OpenAI and replace the URL and endpoint details with those of your hosted server, like this:
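FastAPI generates an OpenAPI schema automatically, served at /openapi.json on your running server, which you can adapt directly. As a minimal sketch of what the action schema looks like for the /questions endpoint (the server URL is a placeholder, and answerQuestion is an operationId we chose for illustration):

```json
{
  "openapi": "3.1.0",
  "info": { "title": "NL-to-SQL Server", "version": "1.0.0" },
  "servers": [
    { "url": "https://example.com" }
  ],
  "paths": {
    "/questions": {
      "post": {
        "operationId": "answerQuestion",
        "summary": "Answer a natural language question against the database",
        "parameters": [
          {
            "name": "text",
            "in": "query",
            "required": true,
            "schema": { "type": "string" }
          }
        ],
        "responses": {
          "201": { "description": "SQL execution results" }
        }
      }
    }
  }
}
```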

Once the schema has been completed, create an action that uses the path for your querying endpoint. We will reference the operationId of the action in the instructions.

Now we can go back to the Configure tab and instruct the GPT to use the action when a user asks a question:
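As an illustration, the instructions could read something like the following (the exact wording is an assumption, and answerQuestion is a placeholder for whatever operationId your action schema defines):

```
When the user asks a question about the real-estate database, call the
answerQuestion action, passing the user's question verbatim as the "text"
query parameter. Present the returned rows in a readable table, and note
that results are limited to 10 rows.
```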

Let us test out our custom GPT on the Preview page:

It works! Our GPT detected that the question is related to the custom action we had built and called our API endpoint to retrieve the data to answer the question. You can check out the custom GPT we created in this tutorial here.

Now let us try a few harder questions that require more complex SQL:

Improve NL-to-SQL (Optional)

Once you ask your questions on the custom GPT, you can head over to the Dataherald AI console and inspect the generated queries yourself. To improve the accuracy of your NL-to-SQL results, verify the queries and use them to fine-tune your LLM models in Dataherald AI.

Dataherald AI console query editor page

Final Thoughts

We are now ready to launch and release our GPT! All we need to do is add a privacy policy URL to the action we created and hit publish. Before publishing, make sure your database does not contain any sensitive data, since anyone in the world will be able to query it.

About Dataherald

  • Sign up for free and use the hosted version of Dataherald
  • Our open-source engine is available on GitHub.
  • Join our Discord server to learn more about the project.
