Bridging Natural Language to SQL (Locally!)

Amy Marks
4 min read · Feb 6, 2024


What if you could ask your database a question just as you would ask a person? That’s precisely what I’ve built with a new application that pairs a local LLM (in this case, TheBloke/Llama-2-7B-Chat-GGML) with an SQLite database. I decided to work on this after searching for something similar and finding only code examples that relied on the OpenAI API.

With this application, you can simply type a question in natural language, and it will query the database for the answer. For example, the database I used tracked movie budgets and grosses, so you could ask: “Which year had the highest gross?” or “What was the average movie budget in 2009?” The application then returns the answer to your question (although, since I’m running the model locally, it can take a long time to respond).
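To make those example questions concrete, here is a minimal, self-contained sketch using Python’s built-in sqlite3 module. The movies table schema and the figures in it are invented for illustration; they stand in for the budget/gross database, and the two SQL statements show the kind of queries the example questions could translate to.

```python
import sqlite3

# Toy, in-memory stand-in for the movie budget/gross database; the schema
# and numbers here are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (title TEXT, year INTEGER, budget REAL, gross REAL)"
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    [
        ("Avatar", 2009, 237e6, 2784e6),
        ("Up", 2009, 175e6, 735e6),
        ("Inception", 2010, 160e6, 826e6),
    ],
)

# "Which year had the highest gross?" might translate to:
year = conn.execute(
    "SELECT year FROM movies GROUP BY year ORDER BY SUM(gross) DESC LIMIT 1"
).fetchone()[0]

# "What was the average movie budget in 2009?" might translate to:
avg_budget = conn.execute(
    "SELECT AVG(budget) FROM movies WHERE year = 2009"
).fetchone()[0]
```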

The Concept: Natural Language to SQL

The core idea is simple: allow users to input a natural language question and then translate that question into a SQL query to retrieve the answer from a database.
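One way to picture the translation step: the user’s question is embedded in a prompt alongside the table schema, and the LLM completes the prompt with a SQL query. The template below is a simplified illustration written for this post, not LangChain’s actual prompt.

```python
# Simplified illustration of how a question plus table schema can be framed
# for the LLM; the real chain's prompt is more elaborate than this.
def make_prompt(question: str, table_info: str) -> str:
    return (
        "Given the following SQLite table, write one SQL query that answers "
        "the question.\n\n"
        f"{table_info}\n\n"
        f"Question: {question}\n"
        "SQLQuery:"
    )

prompt = make_prompt(
    "Which year had the highest gross?",
    "CREATE TABLE movies (title TEXT, year INTEGER, budget REAL, gross REAL)",
)
```

The model’s completion after “SQLQuery:” is then extracted and executed against the database.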

This process has a number of benefits. First, it makes it possible for people who are not experts in SQL to query data. This can be helpful for businesses, researchers, and anyone else who needs to analyze data. Second, it makes data more accessible and engaging. When people can ask questions about data in their own language, they are more likely to be interested in the results. This can lead to better decision-making and understanding of the world around us.

The Code: How It Works

Here’s a breakdown of the key components:

  • User Interface with Streamlit: Using Streamlit, I’ve created a user-friendly interface where users can input their questions in natural language. The application’s layout is easily configurable.
  • Llama2 Model: The heart of the application is the integration with Llama2, a language model capable of understanding and translating natural language queries. I load the model using the path to the pre-trained binary file and configure it with specific parameters like the number of GPU layers and context size.
  • Initializing the App: The application connects to an SQLite database using the SQLDatabase.from_uri method; this connection lets the translated SQL queries run against the existing database. It also collects the user’s question via a Streamlit text input. The SQLDatabaseChain from langchain-experimental is a chain for interacting with a SQL database: it takes the natural language question, passes it to the LLM, and generates the corresponding SQL query.
  • Result Parsing: A function that takes the chain’s result dictionary and returns a new dictionary. It iterates over the intermediate_steps in the result, checking the type of each step to set the appropriate answer_key, and extracting and storing the table_info from the input dictionaries along the way. Finally, it returns the assembled example dictionary. Before I added this step, Llama-2-7B was very prone to outputting direct hallucinations.
  • Exception Handling: The code includes exception handling to notify users if their query fails.
  • Displaying the Result: Finally, the result is displayed on the Streamlit interface, allowing users to see the answer to their questions.
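The result-parsing step can be sketched roughly as follows. The exact shape of intermediate_steps depends on the LangChain version; here I assume the common pattern of input dictionaries interleaved with string steps (the generated SQL, then the query result), so treat the field names as assumptions rather than a guaranteed API.

```python
def parse_result(result: dict) -> dict:
    """Flatten SQLDatabaseChain's intermediate steps into one dictionary."""
    example = {}
    answer_key = "sql_cmd"  # the first string step is the generated SQL
    for step in result.get("intermediate_steps", []):
        if isinstance(step, dict):
            # Input dictionaries carry the table schema the model was shown.
            if "table_info" in step:
                example["table_info"] = step["table_info"]
        else:
            # String steps alternate: the generated SQL, then its result.
            example[answer_key] = step
            answer_key = "answer"
    return example


# Fabricated example of what the chain might hand back:
out = parse_result({
    "intermediate_steps": [
        {"input": "Which year had the highest gross?",
         "table_info": "CREATE TABLE movies (title, year, budget, gross)"},
        "SELECT year FROM movies GROUP BY year ORDER BY SUM(gross) DESC LIMIT 1",
        "[(2009,)]",
    ]
})
```

Pinning the model’s answer back to the table_info and generated SQL like this is what made hallucinated answers easy to catch.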

The application is designed to be easy to use and provide a seamless user experience.
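Putting the pieces together, the wiring looks roughly like this. The imports and parameter values reflect the LangChain 0.0.x era that the resources below point to and may differ in newer releases; the model path and the layer/context numbers are illustrative placeholders rather than the exact values used.

```python
def build_chain(model_path: str, db_uri: str):
    """Load the local GGML model and attach it to the SQLite database."""
    # Imports are kept inside the function so the sketch can be read without
    # the libraries installed; requires langchain, langchain-experimental,
    # and llama-cpp-python (a GGML-era version for GGML binaries).
    from langchain.llms import LlamaCpp
    from langchain.utilities import SQLDatabase
    from langchain_experimental.sql import SQLDatabaseChain

    llm = LlamaCpp(
        model_path=model_path,  # e.g. a llama-2-7b-chat GGML binary
        n_gpu_layers=32,        # layers to offload to the GPU (0 = CPU only)
        n_ctx=2048,             # context window size
        temperature=0.0,        # deterministic output helps SQL generation
    )
    db = SQLDatabase.from_uri(db_uri)  # e.g. "sqlite:///movies.db"
    return SQLDatabaseChain.from_llm(
        llm, db, return_intermediate_steps=True, verbose=True
    )
```

The Streamlit side then reduces to collecting a question with a text input, calling the chain, and rendering the parsed answer.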

The key resources I used in building this:

SQL Database Chain | 🦜️🔗 Langchain

SQL | 🦜️🔗 Langchain

Llama.cpp | 🦜️🔗 Langchain

langchain_experimental.sql.base.SQLDatabaseChain — 🦜🔗 LangChain 0.0.266

TheBloke/Llama-2-7B-GGML · Hugging Face
