Benchmark Report: The Role of Large Language Models in Enterprise SQL Database Queries

The data.world team · Published in data.world · Nov 21, 2023

Introducing the data.world AI Lab and our research on Knowledge Graphs and LLMs

By Dean Allemang and Juan Sequeda

The Knowledge Graph Difference

The year 2023 will go down in history as the time when a new generation of AI changed nearly every aspect of our lives. While generative AI has people rethinking everything from software engineering to UX design, we at data.world see particular opportunities in the areas of enterprise data management, data accessibility, and governance. As a leader in enterprise data management, in February of 2023, we established the data.world AI lab, a first for the data catalog industry.

Dr. Juan Sequeda is the Head of the data.world AI Lab, known for his pioneering work on designing and building knowledge graphs from relational databases. He is joined by Dr. Dean Allemang, author of Semantic Web for the Working Ontologist and long-time evangelist for Semantic Web technology and large-scale data management. data.world’s CTO Bryon Jacob rounds out the core of the AI Lab.

The AI Lab has a simple mission: to explore how AI impacts the productivity and efficiency of everyone who works with data. This mission encompasses data professionals (stewards, data managers, data scientists, analysts, application developers) as well as people who might not think of themselves as data professionals: sales executives, healthcare professionals, HR, marketing, and anyone else who relies on data to do their job better.

The focus of the AI Lab is innovation: what functions can AI provide in data management that have been problematic or impossible so far? There are many examples of innovations that AI promises: automation of knowledge engineering, data documentation, data mapping, quality control, question answering, data oversight, and many others. But the AI Lab also focuses on quality of results. The earliest applications of generative AI were notoriously fraught with quality issues; most of these have been wrapped up in the catch-all complaint of “hallucinations”: occasions where generative AI provides, with complete confidence, erroneous information. The AI Lab combines research on innovation and quality to produce AI-assisted applications that will have real enterprise impact.

Our strategy hinges on fostering a collaborative environment with our customers and partners: co-innovation. Our goal is to stimulate innovation, cultivate new ideas, and validate hypotheses through prototyping and experimentation. Furthermore, we aim to ensure these concepts flourish through open dialogue and consultation. Through this, we strive to accomplish two main objectives. First, we endeavor to showcase the vast possibilities of incorporating Artificial Intelligence with the existing data.world platform in the present day. Second, we are committed to taking substantial, calculated risks that contribute towards transcending what is currently feasible, always aiming to stretch the limits of the data.world platform. This ongoing push to challenge boundaries is always conducted with an unwavering focus on maximizing value for our customers and partners.

Since our launch, we have been focusing on the particular problem of question answering, or, as we prefer to think of it, “Chat with your data” that is stored in SQL databases: allowing a user from any part of the business to engage with data in interactive and dynamic ways. This research involves several aspects, including user experience (how should a user interact with the system while they “Chat with their data”?), data access (how do we go from natural language to an appropriate query against the data source?), and accuracy (how do we gain sufficient confidence that the query we are submitting responds appropriately to the intent of the user?).
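The “data access” step above is commonly approached with zero-shot prompting: hand the LLM the database schema and the user’s question and ask for a query back. A minimal, hypothetical sketch of the prompt-construction part (the function name and toy schema are illustrative, not data.world’s implementation):

```python
# Illustrative sketch of zero-shot text-to-SQL prompt construction.
# Not data.world's actual implementation; names and schema are toy examples.

def build_text_to_sql_prompt(schema_ddl: str, question: str) -> str:
    """Assemble a zero-shot prompt: the schema DDL is the model's only
    context, and the question carries the user's intent."""
    return (
        "Given the following SQL database schema:\n\n"
        f"{schema_ddl}\n\n"
        "Write a SQL query that answers this question:\n"
        f"{question}\n\n"
        "Return only the SQL query."
    )

# Example with a toy insurance-style table.
schema = "CREATE TABLE Claim (claim_id INT, policy_id INT, amount DECIMAL);"
prompt = build_text_to_sql_prompt(
    schema, "What is the total claim amount per policy?"
)
```

The resulting string would then be sent to an LLM; the accuracy question is whether the SQL that comes back actually matches the user’s intent.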

We’ll be reporting milestone results on the data.world blog and through other channels, but we’ll use this blog to track the technical progress of the research; the goals, the progress, as well as the challenges and setbacks. This is where we get down and dirty with the data.

We recently published our first research result about the accuracy of Chat with your data on SQL databases with and without knowledge graphs: A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model’s Accuracy for Question Answering on Enterprise SQL Databases

Is this really going to work, or are we barking up the wrong tree? In the rest of this post, we will provide an overview of the benchmark, and subsequent posts will dive into the details.

Benchmark Report: The Role of Knowledge Graphs for LLM Accuracy in the Enterprise

The recent surge in AI development means enterprises are exploring the potential for Large Language Models (“LLMs”) to drive business value. A popular application is question answering systems, also known as “chat with your data.” Being able to chat with data residing in SQL databases has tremendous potential to transform the way data-driven decision making is executed within enterprises. But challenges persist, notably in the form of “hallucinations”: instances where LLMs generate false information that is perceived as truth.

Academically, question answering systems, namely Text-to-SQL approaches, have shown remarkable performance on several benchmarks such as Spider, WikiSQL, and KaggleDBQA. However, these benchmarks are disconnected from enterprise reality because they lack: 1) complex enterprise database schemas, 2) enterprise questions crucial for operational and strategic planning, and 3) a context layer connecting the data with the business.

Knowledge Graphs (KGs) have been identified as a promising solution to fill the business context gaps in order to reduce hallucinations. The integration of LLMs and KGs has already started gaining traction in academia and industrial research. Even Gartner states, “Knowledge graphs provide the perfect complement to LLM-based solutions where high thresholds of accuracy and correctness need to be attained.”

The goal of our research is to understand the accuracy of LLM-powered question answering systems on enterprise questions over enterprise SQL databases, and the role knowledge graphs play in improving that accuracy. We investigate the following two research questions:

RQ1: To what extent can Large Language Models (LLMs) accurately answer enterprise natural language questions over enterprise SQL databases?

RQ2: To what extent can Knowledge Graphs improve the accuracy of Large Language Models (LLMs) in answering enterprise natural language questions over enterprise SQL databases?

The hypothesis is the following: an LLM-powered question answering system that answers a natural language question over a knowledge graph representation of a SQL database returns more accurate results than an LLM-powered question answering system that answers the same question over the SQL database without a knowledge graph. We first created a benchmark consisting of:

  1. Enterprise SQL Schema: using the OMG Property and Casualty Data Model in the insurance domain
  2. Enterprise Question-Answer Pairs: 43 natural language enterprise questions falling on two spectrums:
     A) low to high question complexity, ranging from business reporting use cases to metrics and Key Performance Indicator (KPI) questions
     B) low to high schema complexity, requiring a smaller to larger number of tables to answer the questions

The two spectrums formed a quadrant in which questions could be classified: Low Question/Low Schema, High Question/Low Schema, Low Question/High Schema, and High Question/High Schema.

  3. Context Layer: the ontology describing the business concepts, attributes, and relationships of the insurance domain, and the mappings from the SQL schema to the ontology. The ontology and mappings were used to create a Knowledge Graph representation of the SQL database.

  4. Accuracy Scoring: based on the Yale Spider benchmark
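Spider-style scoring is often implemented as execution matching: a generated query counts as correct when it returns the same rows as the hand-written gold query. A minimal sketch of that idea, using SQLite for illustration (the benchmark’s actual scorer may differ in its details):

```python
import sqlite3
from collections import Counter

# Sketch of execution-based accuracy scoring in the spirit of the Spider
# benchmark: a generated query is correct when it returns the same multiset
# of rows as the gold query. Illustrative only, not the benchmark's code.

def execution_match(conn: sqlite3.Connection,
                    generated_sql: str, gold_sql: str) -> bool:
    """Return True when generated_sql and gold_sql produce identical rows."""
    try:
        generated_rows = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run scores as incorrect
    gold_rows = conn.execute(gold_sql).fetchall()
    # Compare as multisets so row order does not matter.
    return Counter(generated_rows) == Counter(gold_rows)

def accuracy(conn: sqlite3.Connection, pairs) -> float:
    """Fraction of (generated, gold) query pairs that match on execution."""
    return sum(execution_match(conn, gen, gold) for gen, gold in pairs) / len(pairs)
```

With a scorer like this, each of the 43 questions contributes a pass/fail, and per-quadrant accuracy is just the pass rate within that quadrant.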

Main Results

Using GPT-4 and zero-shot prompting, the overall Knowledge Graph accuracy was 3x the SQL accuracy. Questions over the SQL database achieved 16.7% accuracy, which increased to 54.2% with a Knowledge Graph representation of the SQL database. For each quadrant:

  • Low Question/Low Schema: knowledge graph accuracy was 71.1%, while SQL accuracy was 25.5%
  • High Question/Low Schema: knowledge graph accuracy was 66.9%, while SQL accuracy was 37.4%
  • Low Question/High Schema: knowledge graph accuracy was 35.7%, while SQL accuracy was 0%
  • High Question/High Schema: knowledge graph accuracy was 38.7%, while SQL accuracy was 0%
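Restating the quadrant numbers above in code makes the lift easy to see; these figures are taken directly from the results, and where SQL accuracy is 0% a lift ratio is undefined (the KG answers questions SQL answered not at all):

```python
# Quadrant results from the benchmark, tabulated. Numbers come straight
# from the post; only the lift computation is added here.
results = {
    "Low Question/Low Schema":   {"kg": 71.1, "sql": 25.5},
    "High Question/Low Schema":  {"kg": 66.9, "sql": 37.4},
    "Low Question/High Schema":  {"kg": 35.7, "sql": 0.0},
    "High Question/High Schema": {"kg": 38.7, "sql": 0.0},
}

for quadrant, r in results.items():
    lift = r["kg"] / r["sql"] if r["sql"] else None
    note = f"{lift:.1f}x lift" if lift else "SQL answered none"
    print(f"{quadrant}: KG {r['kg']}% vs SQL {r['sql']}% ({note})")
```

For example, in the easiest quadrant the KG path is roughly a 2.8x improvement, while in both high-schema quadrants the SQL-only path answered nothing correctly.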

Conclusion

These experimental results are evidence supporting the main conclusion of this research: investing in a Knowledge Graph provides higher accuracy for LLM-powered question answering systems.

If enterprises want more accurate results from an LLM-powered question answering system, they must treat business context and semantics as first-class citizens and invest in a data catalog platform with a knowledge graph architecture.

Next Blog Series

In the upcoming posts of this series, we will cover:

  • Developing a QA System: LLM, Knowledge Graph and RAG
  • WHY we need a benchmark
  • Details behind the benchmark and results
  • The AI ready playbook: knowledge engineering in the era of LLMs

We look forward to engaging with all our readers!

-data.world AI Lab
