Building Your Own Text-to-SQL: Steps And Requirements

Wangda Tan
Waii
Dec 13, 2023

The landscape of large language models (LLMs) is rapidly evolving, and access to extremely powerful models has become commonplace. Initial demos of text-to-SQL built on LLMs have made it look tantalizingly simple to build production systems containing this and similar features.

Many are beginning to realize, however, that transforming an LLM into a fully-fledged, production-ready text-to-SQL system is still a substantial engineering feat that requires dedicated long-term investment.

For businesses where SQL generation is not a core competency, partnering with a specialized provider (such as Waii) is a strategic and efficient approach. This blog explores why DIY might be a lot more difficult than expected and why it might be better to go a different route.

LLM as a starting point

While LLMs offer a solid base for text-to-SQL systems, they represent just the beginning. The journey from an LLM to a production-ready system involves extensive development and refinement.

Validation & benchmarking. Recognition is the first step to change. Underpinning all of the work that goes into building and maintaining a text-to-SQL system is a reliable way to measure how well the system captures your users' intent. Without it, it's very easy to chase your tail making changes that seem right or show anecdotal improvement. Getting this right requires substantial effort in curating datasets, defining metrics, and building test infrastructure.
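To make the benchmarking idea concrete, here is a minimal sketch of an execution-accuracy harness: run the generated SQL and a hand-curated gold query against a test database and compare result sets. All table names and test cases are illustrative.

```python
import sqlite3


def execution_match(db: sqlite3.Connection, generated_sql: str, gold_sql: str) -> bool:
    """Compare result sets of generated vs. gold SQL, order-insensitively."""
    try:
        got = sorted(db.execute(generated_sql).fetchall())
    except sqlite3.Error:
        return False  # generated SQL failed to run at all
    want = sorted(db.execute(gold_sql).fetchall())
    return got == want


def run_benchmark(db: sqlite3.Connection, cases) -> float:
    """cases: list of (question, generated_sql, gold_sql). Returns accuracy in [0, 1]."""
    hits = sum(execution_match(db, gen, gold) for _, gen, gold in cases)
    return hits / len(cases)
```

Execution match is only one possible metric; real suites also track exact-match, partial credit for near-misses, and per-category breakdowns so a regression in one SQL feature is visible.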

Knowledge-graph: Retrieval-augmented generation (RAG) is widely accepted as the gold-standard architecture for building a text-to-SQL system based on LLMs. However, what do you retrieve? There is a ton of experimentation that goes into deciding what information is critical to query generation and how to represent it: join graphs, table relationships, functional dependencies between columns, comments, statistics, samples, etc.

Once you have decided on what you need to use, you have to build a system that stores, updates and maintains this information. That in itself is a daunting task.
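The shape such a store might take can be sketched with a couple of data structures; the fields shown (columns, comments, join conditions) are a small illustrative subset of what a real knowledge graph would track.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: dict          # column name -> type
    comment: str = ""


@dataclass
class KnowledgeGraph:
    tables: dict = field(default_factory=dict)   # table name -> Table
    joins: list = field(default_factory=list)    # (left, right, join condition)

    def add_table(self, table: Table) -> None:
        self.tables[table.name] = table

    def add_join(self, left: str, right: str, condition: str) -> None:
        self.joins.append((left, right, condition))

    def joinable_with(self, table: str) -> list:
        """Tables reachable from `table` via one declared join."""
        return sorted({r if l == table else l
                       for l, r, _ in self.joins if table in (l, r)})
```

The hard part is not the data structure but keeping it fresh: schema crawlers, statistics refresh, and human-curated annotations all feed into it on different cadences.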

Discovery: Except for the smallest data sets, the knowledge graph won't fit into the context window. Even if it does, too much information will likely lead to poor retention and instruction following. That means you have to choose the information used for each query: which tables, columns, examples, documentation, etc. All this requires additional subsystems that need their own benchmarking infrastructure and information model to be successful.
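A minimal sketch of the selection step: rank tables by relevance to the question and keep only the top few. Production systems typically use embedding similarity; plain token overlap is used here as a stand-in so the example stays self-contained.

```python
import re


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(question: str, table_doc: str) -> float:
    """Fraction of question tokens that also appear in the table's documentation."""
    q = _tokens(question)
    return len(q & _tokens(table_doc)) / (len(q) or 1)


def discover_tables(question: str, table_docs: dict, top_k: int = 3) -> list:
    """table_docs: table name -> description. Returns the top_k most relevant tables."""
    ranked = sorted(table_docs,
                    key=lambda name: score(question, table_docs[name]),
                    reverse=True)
    return ranked[:top_k]
```

Whatever the scoring mechanism, the key point from the text stands: this subsystem needs its own benchmarks, because a wrong table selection dooms the generation no matter how good the LLM is.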

SQL coverage. People have been adding functionality to the SQL standard for 40 years. And while your application might not require all the dark corners of the standard, you will likely need to develop a large base of functioning SQL capabilities: user-defined functions (UDFs), stored procedures, table functions, semi-structured data, temp tables, spatial data, and more. Nearly all of it requires special treatment in the application layer in addition to the LLM. This requires managing information about these objects in the knowledge graph, specific discovery methods, and a syntax checker/compiler to validate that these objects have been applied correctly.
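One piece of that validation layer can be sketched cheaply: compile (but don't execute) the generated query against the known schema before returning it to the user. SQLite's EXPLAIN is used here as a stand-in for your warehouse's own dry-run or compile facility; the schema is illustrative.

```python
import sqlite3


def validate_sql(schema_ddl: list, sql: str):
    """Compile a query against the schema without executing it.

    Returns (ok, error_message)."""
    db = sqlite3.connect(":memory:")
    for ddl in schema_ddl:
        db.execute(ddl)
    try:
        # EXPLAIN forces the statement through the planner but runs no data access.
        db.execute("EXPLAIN " + sql)
        return True, ""
    except sqlite3.Error as exc:
        return False, str(exc)
```

A real checker also has to understand the target dialect's UDFs, table functions, and semi-structured accessors, which is exactly why the knowledge graph needs to carry metadata about those objects.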

Customization: LLMs will give you the most likely response, but that’s not necessarily the right response for your environment. Real-world systems require additional information about the business to be integrated into the knowledge graph. Business definitions, common expressions and metrics (semantic layer), documentation, etc. This information again requires the implementation of a data life-cycle as well as specific validation methods.
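A tiny sketch of how a semantic layer can feed generation: keep business definitions in a store and inject only the ones a question actually mentions into the prompt. The terms and SQL fragments below are invented for illustration.

```python
# Illustrative business definitions; in practice these live in a governed store
# with their own review and life-cycle process.
SEMANTIC_LAYER = {
    "active_user": "a user with at least one login in the last 30 days",
    "net_revenue": "SUM(orders.amount) - SUM(orders.refund_amount)",
}


def definitions_for(question: str, layer: dict) -> str:
    """Build a prompt section containing only the definitions the question mentions."""
    q = question.lower()
    hits = {term: d for term, d in layer.items() if term.replace("_", " ") in q}
    if not hits:
        return ""
    lines = "\n".join(f"- {term}: {d}" for term, d in sorted(hits.items()))
    return "Business definitions to respect:\n" + lines
```

Selective injection matters: dumping every definition into every prompt wastes context and, as noted above under Discovery, degrades instruction following.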

Feedback and continued improvement: Everyone has a plan until they get punched in the mouth. How do you handle problems reported by users? You need to be able to determine what worked and what didn't and decide how to best incorporate the feedback. Few-shot learning, fine-tuning, and post-processing are just a few of the options you need in order to respond in time.
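The few-shot route can be sketched as a store of user-corrected (question, SQL) pairs from which the most similar examples are retrieved for new prompts. Similarity here is naive token overlap; a real system would use embeddings.

```python
class FeedbackStore:
    """Stores user-corrected (question, SQL) pairs for few-shot retrieval."""

    def __init__(self):
        self.examples = []  # list of (question, corrected_sql)

    def record(self, question: str, corrected_sql: str) -> None:
        self.examples.append((question, corrected_sql))

    def few_shot(self, question: str, k: int = 2) -> list:
        """Return the k stored examples most similar to the new question."""
        target = set(question.lower().split())

        def overlap(example):
            return len(set(example[0].lower().split()) & target)

        return sorted(self.examples, key=overlap, reverse=True)[:k]
```

The harder organizational question is triage: deciding whether a reported failure should become a few-shot example, a knowledge-graph fix, a post-processing rule, or fine-tuning data.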

This is an incomplete list of the core pieces needed to provide a usable text-to-SQL system. You will likely need a minimum of 4–6 engineers to build this out over 9–12 months, and unless your database/warehouse is extremely static, at least half of this team is required to continually maintain it.

Ok, so that's it, right? Well, good news / bad news. The bad news is that you will need more engineering; the good news is that the following items are more traditional distributed/backend engineering work and require less specialized engineering knowledge.

System engineering

Security and Compliance: Implementing data encryption, user management, and access controls is a critical part of getting the system right. Role-based access and detailed audit trails are also crucial for meeting compliance standards. Since you are building and maintaining a knowledge graph to support query generation, you have to define and enforce access policies for it: which user is generating the query, and what information are they allowed to access during generation?
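A minimal sketch of that last point: restrict the knowledge graph to the requester's allow-list before any retrieval happens, so tables the user cannot see never reach the prompt. The roles and tables are hypothetical; a real system would derive the policy from the warehouse's own grants.

```python
# Hypothetical role -> table allow-list; in practice this is synced from the
# database's grants or an external policy store, not hard-coded.
ACCESS_POLICY = {
    "analyst": {"orders", "customers"},
    "support": {"tickets"},
}


def visible_tables(all_tables: list, role: str, policy: dict) -> list:
    """Filter the knowledge graph down to what this role may see before retrieval."""
    allowed = policy.get(role, set())
    return sorted(t for t in all_tables if t in allowed)
```

Filtering before retrieval (rather than after generation) is the safer design: a table name or sample value leaking into the prompt is already a disclosure, even if the final query is blocked.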

Scalability: The goal is to serve as many users as possible with your architecture, and that means you have to be prepared. You will need a scale-out architecture that's easily and repeatably deployed. You will need guardrails to protect your database from excessive, malformed, or resource-intensive queries. You will need to carefully manage quotas and handle rate limits for LLMs, including building a caching layer to avoid wasted generation requests.
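The caching layer can be sketched as a store keyed on the normalized question plus a fingerprint of the schema (so cached SQL is invalidated when the schema changes). The key scheme here is an assumption, not a prescribed design.

```python
import hashlib


class GenerationCache:
    """Caches generated SQL keyed on (normalized question, schema fingerprint)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(question: str, schema_fingerprint: str) -> str:
        # Normalize whitespace and case so trivially different phrasings hit the cache.
        norm = " ".join(question.lower().split())
        return hashlib.sha256(f"{norm}|{schema_fingerprint}".encode()).hexdigest()

    def get_or_generate(self, question: str, schema_fingerprint: str, generate):
        k = self.key(question, schema_fingerprint)
        if k not in self._store:
            self._store[k] = generate(question)  # only call the LLM on a miss
        return self._store[k]
```

Even a modest hit rate pays for itself twice: fewer LLM tokens billed, and fewer requests counted against provider rate limits.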

Redundancy and Reliability: High availability and failover capabilities are essential for maintaining continuous business operations. That, of course, touches all components of your system as well as its dependencies. You have to handle outages in the query generation layer and the data storage layer, but also deal with availability issues for the LLMs themselves. The latter are unfortunately quite common and require a robust failover/fallback strategy.
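The fallback strategy can be sketched as an ordered list of providers, each tried a bounded number of times before moving on. The provider names and retry counts are illustrative; production code would add backoff, timeouts, and circuit breaking.

```python
def generate_with_fallback(providers, prompt, retries_per_provider: int = 2):
    """providers: ordered list of (name, callable). Returns (name, completion).

    Tries each provider in order; raises only if every provider fails."""
    last_error = None
    for name, call in providers:
        for _ in range(retries_per_provider):
            try:
                return name, call(prompt)
            except Exception as exc:  # real code would catch provider-specific errors
                last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

One subtlety the text hints at: different models generate different SQL for the same prompt, so failover also needs its own benchmarked prompt variants per provider, not just a retry loop.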

Proactive Monitoring and Maintenance: Detecting and addressing issues proactively is key to maintaining high performance and accuracy. LLMs and query generation are non-deterministic by nature and detecting service degradation can be extremely hard. Knowing that your model accuracy has dropped slightly because of something going wrong in the knowledge graph, for instance, is critical but very difficult to implement.
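A small sketch of one such signal: track rolling accuracy over a window of recent generation outcomes (e.g. from user feedback or automated checks) and alert when it falls below a threshold. Window size and threshold are illustrative.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks rolling accuracy of generation outcomes and flags degradation."""

    def __init__(self, window: int = 100, threshold: float = 0.8):
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, success: bool) -> None:
        self.results.append(success)

    @property
    def accuracy(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 1.0

    def degraded(self) -> bool:
        # Only alert once the window is full, to avoid noisy early readings.
        return (len(self.results) == self.results.maxlen
                and self.accuracy < self.threshold)
```

The genuinely hard part is attribution: knowing the metric dropped is easy; tracing it to a stale knowledge graph entry versus a model-side change is what requires real investment.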

Adapting to Evolving Data Landscapes: Your database schema is likely changing constantly, and as a business grows you typically end up with multiple different DBMS that have to work together. You have to stay on top of that as well.
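Staying on top of it usually starts with detecting what changed between schema snapshots so the knowledge graph can be resynced. A minimal diff sketch, with snapshots represented as table-to-columns mappings:

```python
def schema_diff(old: dict, new: dict) -> dict:
    """old/new: table name -> set of column names.

    Returns the changes that need to be propagated into the knowledge graph."""
    added = sorted(set(new) - set(old))
    dropped = sorted(set(old) - set(new))
    altered = sorted(t for t in set(old) & set(new) if old[t] != new[t])
    return {"added": added, "dropped": dropped, "altered": altered}
```

Dropped and altered tables are the dangerous cases: cached queries, few-shot examples, and semantic-layer definitions that reference them all need review, not just the graph itself.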

Again, this is not an exhaustive list and will require at least 2–3 engineers to handle well on a continuous basis. This is in addition to the core system.

Conclusion

Just as in "Hidden Technical Debt in Machine Learning Systems," the core of text-to-SQL is only a small fraction of a complete system.

Building a comprehensive text-to-SQL system is a significant engineering endeavor that extends well beyond the capabilities of a basic LLM. It requires a deep commitment to system development, operational excellence, security, and continuous improvement. The investment in year one is likely going to be well over $1 million, with subsequent years at best seeing a slight decrease in the required continued headcount and investment.

What's worse is that this area is not going to be a differentiating factor for most. The S in SQL stands for standard. Ok, that's not true, but SQL is (almost) a standard language. Because of this, it likely makes sense to rely on a partner, such as Waii. Their SQL is the same as yours.

Then, instead of building text-to-SQL, you can set your sights on using GenAI to change the things that are bespoke to your business and make a much more impactful investment for your customers!
