Natural Language to SQL using ChatGPT
ChatGPT is undoubtedly going to help everyone become smarter.
It can be a helpful tool to assist you in achieving your objectives, it helps in creating blog posts, writing code for you, and whatnot.
In this blog post, we will look at an exciting task: translating natural language to SQL.
The goal of NLP to SQL is to make it easier for users to interact with databases by allowing them to use natural language queries rather than having to write complex SQL code.
This helps a lot of non-technical users to get meaningful insights from their dataset using plain English language.
Problem Statement: Translating plain English text into SQL Queries
Answer to Natural Language question, such as “What countries had the top sales?” and convert that into a SQL query.
Step 1. Data Acquisition
We’ll be taking Sales Data from: https://www.kaggle.com/datasets/kyanyoga/sample-sales-data
We’ll read the above dataset using pandas and then push the same data into a temporary database using the SQL Alchemy library
Step 2. SQL Database setup and creating the connection
Setting up a temporary in-memory (RAM) database and then exporting this CSV →Pandas DF → SQL DB
- Setting up the DB
- Creating the connection to SQL and querying the database
Step 3. Setting up Open AI account and informing GPT about our data
It starts with setting up your own OPEN AI API key from the account section of OPEN AI, please refer to the below link for set up
https://platform.openai.com/account/api-keys
This should get followed by informing Chat GPT about your initial table structure, this step might require some trial and error based on the kind of database you are dealing with, but the below steps should give a fair idea.
We can see above how we created the SQL table structure using all columns from the pandas' data frame.
Step 4. Taking natural language requests from the end user and combining prompts
We’ll be creating an input function to grab the user request
In the above prompt user has asked to get the “Sum of sales per Postal Code” using plain English text.
Now we’ll be creating a function for combining the user prompt with the table structure with the additional string “A query to answer” followed by the “Select” Keyword so that GPT understands the ask correctly
Step 5. Calling the Da-Vinci model and parsing the response
Now we’ll be calling Text-DaVinci-Model for giving us the results, the completion API used below has various parameters such as temperature, max_tokens, etc., which are explained in the below detailed documentation link from the official homepage.
Now building a function to parse the response section and pass the same into our database.
Step 6. Fetching the results from DB
Now all we need to do is fetch the result object from the database to verify our results.
Voila, congratulations on fetching the result set accurately using the GPT model DaVinci
Thank you for reading till here. In this post, we learned how to convert plain English text into meaningful SQL queries to ease the task of non-technical and Business Users.
Code on Github: https://github.com/soumyansh/NLP-To-SQL.git
I hope you found this tutorial useful. I’m curious about what you think so hit me with some comments. You can also get in touch with me directly through email or connect with me on LinkedIn.