Natural Language to SQL using an Open Source LLM

Indira KriGan
Brillio Data Science
4 min read · Apr 25, 2023

Among the many explorations I've been running with my fellow LLM experimenters (a.k.a. my team), here is an interesting one: an LLM-based approach for translating natural language questions into SQL queries.

As a use case, this was being attempted well before LLMs took over. A typical scenario where it helps is when someone who doesn't know SQL needs to query a database. The same idea extends to other underlying data sources, such as graphs, where we could generate Cypher queries, for example, to query the graph and return results.

A very common ask from our customers around LLMs has been the use of open-source models in place of paid APIs (OpenAI, most of the time), the pros and cons of each approach, and the architectural and design considerations that need to be worked out before the gigantic question of scaling the solution to production!

One step at a time: this first blog is about using an open-source model to generate SQL queries at inference time with zero-shot prompting. A little bit of prompt engineering and, I must say, I was pretty impressed with the results!

The main tools, packages, and LLM that I used (a quick environment setup sketch follows the list):

  1. chavinlo/alpaca-native from Hugging Face, a replica of Stanford's Alpaca
  2. LangChain, a framework for developing applications powered by language models
  3. bitsandbytes, a lightweight wrapper around custom CUDA functions, in particular 8-bit optimizers, matrix multiplication, and quantization functions
  4. An AWS SageMaker instance (ml.g5.8xlarge)
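
For reference, this is roughly what the environment setup looked like in the SageMaker notebook. The exact package list is my assumption (accelerate and sentencepiece are needed for device_map='auto' and the Llama tokenizer); pin versions to match your setup.

# Rough environment setup (package list is illustrative; adjust versions as needed)
!pip install torch transformers accelerate bitsandbytes sentencepiece langchain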

I used the above-mentioned model as it has been giving me good results for similar experiments on in-context learning for healthcare NER, assertion models, etc.

Well, time to take a quick peek at the code (please find the Git link at the bottom):

Step 1: Import everything

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM, pipeline
from langchain.llms import HuggingFacePipeline
from langchain import PromptTemplate, LLMChain

Step 2: Let’s load the model and the tokenizer

base_model = LlamaForCausalLM.from_pretrained(
    "chavinlo/alpaca-native",
    load_in_8bit=True,   # quantize the weights to 8-bit via bitsandbytes
    device_map='auto',   # let accelerate place the layers on the available GPU
)

tokenizer = LlamaTokenizer.from_pretrained("chavinlo/alpaca-native")

Step 3: Define the pipeline and the prompt template. (I experimented a bit with the temperature parameter as well as the prompt here; have fun.)

pipe = pipeline(
    "text-generation",
    model=base_model,
    tokenizer=tokenizer,
    max_length=500,
    temperature=0.3,
    top_p=0.95,
    repetition_penalty=1.2
)

local_llm = HuggingFacePipeline(pipeline=pipe)
# the LLMChain is created inside get_llm_response below, once the prompt is defined


template = """
Write a SQL Query given the table name {Table} and columns as a list {Columns} for the given question :
{question}.
"""

prompt = PromptTemplate(template=template, input_variables=["Table","question","Columns"])

def get_llm_response(tble, question, cols):
    llm_chain = LLMChain(prompt=prompt, llm=local_llm)
    response = llm_chain.run({"Table": tble, "question": question, "Columns": cols})
    print(response)
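
Before firing questions at it, it can help to peek at what the model actually sees. PromptTemplate.format is plain string substitution, so the columns list is rendered with its Python repr:

print(prompt.format(
    Table="employee",
    Columns=["id", "name", "date_of_birth", "band", "manager_id"],
    question="Query the count of employees in band L6 with 239045 as the manager ID",
))
# prints the template with the table name, columns list, and question filled in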

Step 4: Is this working …

tble = "employee"
cols = ["id","name","date_of_birth","band","manager_id"]
question = "Query the count of employees in band L6 with 239045 as the manager ID"
get_llm_response(tble,question,cols)

## Answer: SELECT COUNT(*) FROM employee WHERE band='L6' AND manager_id=239045;

Another one —

tble = "employee"
cols = ["id","name","date_of_birth","band","manager_id"]
question = "Query the count of employees in band L6 and over 40 years of age"
get_llm_response(tble,question,cols)

## Answer: SELECT COUNT(*) FROM employee WHERE band='L6' AND date_of_birth>=(CURDATE() - INTERVAL 40 YEAR);
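
A quick way to sanity-check the generated SQL is to run it against a throwaway table. Here is a minimal sketch with SQLite (note the second query uses MySQL's CURDATE(), so it would need a MySQL-compatible engine to execute as-is):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER, name TEXT, date_of_birth TEXT, band TEXT, manager_id INTEGER)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', '1990-01-01', 'L6', 239045)")

# run the first generated query against the toy table
print(conn.execute(
    "SELECT COUNT(*) FROM employee WHERE band='L6' AND manager_id=239045;"
).fetchone())   # -> (1,)
conn.close()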

The quantization is definitely working —

Tue Apr 25 19:31:21 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
| 0% 24C P0 57W / 300W | 9845MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1659 C ...3/envs/python3/bin/python 9843MiB |
+-----------------------------------------------------------------------------+
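
For an in-Python check of the same thing, transformers also exposes a memory footprint helper and the device map that accelerate used (availability depends on your transformers version):

# approximate memory taken by the 8-bit weights, in GiB
print(base_model.get_memory_footprint() / 1024**3)

# how the layers were placed by device_map='auto'
print(base_model.hf_device_map)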

A few questions need to be answered as we move this from a PoC to a bigger solution:

  1. How do we accommodate more than one table, which will be the case in most scenarios? A few answers come to mind: bigger, smarter prompts, few-shot learning, or fine-tuning (a rough few-shot sketch follows this list).
  2. What about more complex queries?
  3. How would you evaluate the performance of an approach like this? Or compare with other LLM options?
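
On the first question, here is a rough sketch of what a multi-table, few-shot prompt could look like. The schema description, the example pairs, and the variable names are made up for illustration, and the template would certainly need tuning and evaluation:

multi_table_template = """
You are given the following tables and their columns:
{schema}

Here are some example questions and the SQL queries that answer them:
{examples}

Write a SQL query for the question:
{question}
"""

multi_table_prompt = PromptTemplate(
    template=multi_table_template,
    input_variables=["schema", "examples", "question"],
)

schema = (
    "employee(id, name, date_of_birth, band, manager_id)\n"
    "department(id, name, head_id)"
)
examples = (
    "Q: Query the count of employees in band L6\n"
    "A: SELECT COUNT(*) FROM employee WHERE band='L6';"
)

# reuses local_llm from above; an untested sketch, not a benchmarked prompt
few_shot_chain = LLMChain(prompt=multi_table_prompt, llm=local_llm)
print(few_shot_chain.run({
    "schema": schema,
    "examples": examples,
    "question": "Query the names of employees in the department headed by employee 239045",
}))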

Many things to ponder, read, and try! I will come back with more and try to answer some of these, along with any other fun stuff I get to work on.

If you are working on something interesting in the field of LLMs or other generative AI models and would like to share and discuss it, please feel free to reach out.

And here is the Git link.
