QueryCraft: Improving NL2SQL Accuracy with Query Correction Service:

Step 5. Exploring different ways to improve the Text2SQL pipeline accuracy.

Abhilasha Mangal
Towards Generative AI
4 min readJun 10, 2024

--

Query Correction serves as a vital component in the Text-To-SQL pipeline. The LLM model encounters challenges in generating correct and valid SQL queries. In such instances, a query correction service becomes essential, as it addresses these errors and ensures the generation of accurate SQL queries.

Highlight - Query correction service in Query craft’s pipeline

In the first phase of creating this pipeline, we conducted many experiments to assess the accuracy of the text-to-SQL pipeline. After analyzing the results, we can see that LLM generates many errors. These errors can be divided into the categories below.

Syntax errors: Typos, missing keywords, or incorrect punctuation can render the query invalid.

Logical errors: The query might be grammatically correct but not achieve the desired outcome because of flaws in how it translates to the user’s intent.

Schema mismatches: The query might reference tables or columns that don’t exist in the database or use incorrect data types.

Error analysis before query correction

After looking at these errors, we created the Query Correction service inside QueryCraft Framework, which helps reduce these types of errors. To understand the accuracy role more, we stored another field in our output file, ‘model_op1’.

This service mainly checks syntax-level errors, such as:

  1. In the first case, we observed that the LLM-generated queries were missing semicolons. Such queries resulted in syntax errors. We implemented a check to verify whether the query contained a semicolon to rectify this issue. Depending on the outcome, we appended a semicolon ‘;’ at the end of the query to eliminate the error.
  2. In the second case, we encountered instances where the LLM failed to generate certain basic SQL syntax elements. Specifically, we observed occurrences where the LLM generated “ILIKE,” whereas the correct SQL syntax should be “LIKE.” To rectify this error, we implemented a correction mechanism to replace instances of “ILIKE” with the appropriate “LIKE” syntax, ensuring adherence to standard SQL conventions and compatibility with database querying.
  3. In the third case, we observed instances where queries have some special char along with the SELECT statement. To rectify this type of error, we replaced these special chars with the blank and we also removed extra spaces from the queries.
  4. In the fourth case, we identified queries containing significant elements such as [“t1.”, “t2.”, “t3.”, “T1.”, “T2.”, “T3.”] to select columns, where proper column specifications were lacking. This significance occasionally led to errors such as “column not found” or syntax-level errors. To address this issue, we replaced these characters with ‘*’, resolving the error.
  5. In the fifth case, we noticed that the LLM-generated SQL queries included extraneous elements, such as explanations, answers, and other significant words that were not intended to be part of the query. Upon identifying these cases, we took the necessary steps to remove or separate these values from the SQL queries, resulting in the extraction of only the SQL query itself.

By using this query correction service, we can decrease errors, which you’ll notice has a big effect, as shown below.

Error analysis after query correction

To understand the query correction service’s significance more, we calculated the two types of execution accuracy (without query correction) and post-processing execution accuracy (with query correction). The table below shows a significant improvement using the query correction service.

Evaluation score analysis

Conclusion

Query correction service translates to a more reliable Text-To-SQL process, ultimately delivering trustworthy results, a smoother user experience, and better compatibility with real-world databases. Through continuous improvement, our service remains instrumental in bridging the gap between natural language questions and powerful SQL databases.

Ready to take your NL2SQL models to the next level? Explore QueryCraft’s pipeline.

Follow Towards Generative AI for more on the latest advancements in AI.

--

--