SALI: Turning Natural Language Questions to Insights

Nimrodbusany · Published in Labs Notebook · 6 min read · Mar 21, 2024

By: Dr. Nimrod Busany, Gil Rosenblum, Prof. Dr. Ethan Hadar

A screenshot from SALI, showcasing its intuitive interface: a visualized data model of the underlying information system alongside a chat-bot for generating queries, reports, and data visualizations using text.

Companies have long leveraged data for insights and decision-making. Now, the surge of tools like ChatGPT, AI co-pilots, and development frameworks lets companies take their use of data a step further, applying AI to fundamentally reshape how they work.

However, the very opportunity presented by this wealth of data also creates a challenge. Companies are generating and collecting unprecedented amounts of data, leading to increased complexity. This data is often distributed across various sources and exists in diverse formats. As a result, finding, extracting, and processing the necessary information becomes difficult, and often requires different query languages depending on the underlying systems.

This data complexity creates a significant barrier to effective AI application development within the realm of structured and semi-structured information systems. Without a way to easily navigate this landscape, the process of generating new insights or developing production-ready AI applications can become slow and unreliable. This can lead to missed opportunities, suboptimal decisions, and a failure to fully harness the power of AI-driven insights.

Accenture Labs created SALI to address these challenges specifically within structured and semi-structured information systems. SALI empowers users by extending the semantic data catalog, enabling them to transform natural language questions into actionable data reports and insights. By seamlessly navigating the complexities of diverse data sources and formats, SALI provides a critical foundation for AI application development.

What is SALI?

SALI stands for Semantic Abstraction Language Interface. It is a powerful business insights co-pilot that lets users write questions in natural language, translates them into executable queries, validates those queries, and executes them to deliver immediate value. SALI removes the need for users to know query languages or to understand the available data in depth. It gives non-technical users access to insights and increases the productivity of experienced data professionals. Before we dive deeper into SALI, let’s briefly discuss the semantic data catalog technology upon which SALI is built.

The Semantic Data Catalog

Traditional data catalogs often do not include a semantic representation of the cataloged data. As a result, the data, although readable to technical users, is not cataloged in a way that’s meaningful to non-technical users. Such catalogs offer their audience limited capabilities for discovering and understanding data concepts. Search is mostly restricted to simple string matching, which can miss potentially useful data assets: a search for “earnings” will not find a table described as “revenue”.

A semantic data catalog, on the other hand, understands the meanings of concepts and the relationships between them to provide more accurate, contextual, and relevant search results. It does so by leveraging the power of ontologies, ontology embeddings, and vector search to improve data discovery and management. The semantic data catalog also assists in the organization and classification of data assets, leading to a more structured and consistent data ecosystem.
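To make the contrast with string matching concrete, here is a minimal sketch of ontology-aware search over catalog descriptions. It is not SALI's implementation: the `ONTOLOGY` synonym map, the `CATALOG` entries, and the bag-of-words vectors are all hypothetical stand-ins for a real ontology, real catalog metadata, and a learned embedding model. The idea it shows is that expanding both the question and each description with related ontology concepts lets cosine similarity rank the `sales` table first for a question about “earnings”, even though that word never appears in its description.

```python
import math
import re
from collections import Counter

# Hypothetical ontology fragment: concept -> related labels (synonyms, near terms).
ONTOLOGY = {
    "revenue": {"earnings", "income", "sales"},
    "product": {"item", "sku", "goods"},
    "customer": {"client", "buyer"},
}

# Hypothetical catalog: data asset name -> short description.
CATALOG = {
    "sales": "revenue amount per product in euro",
    "products": "product name and category",
    "customers": "customer name and country",
}


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def expand(words):
    """Append every ontology concept (and its labels) linked to a word in the text."""
    out = list(words)
    for concept, labels in ONTOLOGY.items():
        if concept in words or labels & set(words):
            out.append(concept)
            out.extend(sorted(labels))
    return out


def embed(text):
    # Toy stand-in for an embedding: counts of ontology-expanded tokens.
    return Counter(expand(tokens(text)))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(question, k=2):
    """Return up to k catalog assets with a positive similarity, best first."""
    q = embed(question)
    scored = sorted(((cosine(q, embed(desc)), name) for name, desc in CATALOG.items()), reverse=True)
    return [name for score, name in scored[:k] if score > 0]


print(search("total earnings per item"))
```

With these toy inputs, `search("total earnings per item")` ranks `sales` ahead of `products` and leaves out `customers`; a production catalog would swap in learned ontology embeddings and a vector index rather than exhaustive scoring.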

How Does SALI Work?

SALI leverages the power of a semantic data catalog to provide seamless data access and advanced analysis capabilities. Users simply write questions in plain text, and SALI generates executable queries that answer them. SALI can also combine information from multiple data sources, even when each source has its own query language.

SALI is particularly useful when users are unfamiliar with specific query languages or need to work with multiple, disparate data sources. By abstracting away complexities, SALI provides a unified natural language interface for data querying across various platforms.

Behind the scenes, SALI takes the following steps to generate a valid query that can be executed on the underlying information system:

  • Understands the user’s question and searches the semantic data catalog for relevant data assets, such as database tables.
  • Prompts an LLM to generate an executable query in the desired query language (SQL, SPARQL, etc.) that fetches the data matching the user’s intent.
  • Validates the executable query syntactically, and semantically against the data model, to catch hallucinations (for example, references to tables or columns that do not exist) and increase the success rate. If validation fails, SALI repeats the query generation step with the validation feedback.
  • Checks that the query adheres to data governance policies (e.g., access permissions, read-only access) and returns it to the user.
High-Level Query Generation Flow

The result of this process is an executable query tailored to a specific data model and information system, which users can run and get results from at the click of a button.
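The generate, validate, and retry steps above can be sketched as a small loop. This is a minimal illustration under assumptions, not SALI's code: the schema, the `fake_llm` stub (which deliberately hallucinates a column on its first attempt), and the crude `is_read_only` check are all hypothetical. Validation uses SQLite's `EXPLAIN`, which compiles a statement against the schema without running it, so both syntax errors and unknown tables or columns surface as error messages that can be fed back to the model.

```python
import sqlite3
from typing import Callable, Optional, Tuple

# Hypothetical data model, standing in for what the catalog search returned.
SCHEMA = """
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales (product_id INTEGER, amount_eur REAL);
"""

# Any callable (question, schema, feedback) -> query text plays the LLM's role.
LLM = Callable[[str, str, Optional[str]], str]


def validate(conn: sqlite3.Connection, query: str) -> Optional[str]:
    """Compile the query against the schema without executing it.

    Returns None on success, otherwise SQLite's error message.
    """
    try:
        conn.execute("EXPLAIN " + query)
        return None
    except sqlite3.Error as exc:
        return str(exc)


def is_read_only(query: str) -> bool:
    # Crude governance stand-in: accept a single SELECT statement only.
    body = query.strip().rstrip(";")
    return body.upper().startswith("SELECT") and ";" not in body


def generate_query(question: str, llm: LLM, conn: sqlite3.Connection,
                   schema: str, max_attempts: int = 3) -> Tuple[str, int]:
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        query = llm(question, schema, feedback)
        feedback = validate(conn, query)
        if feedback is None:
            if not is_read_only(query):
                raise PermissionError("generated query violates the read-only policy")
            return query, attempt
        # Validation failed: the error text is passed to the next attempt.
    raise RuntimeError(f"no valid query after {max_attempts} attempts: {feedback}")


def fake_llm(question: str, schema: str, feedback: Optional[str]) -> str:
    # Hypothetical stub: the first draft invents a "revenue" column; once it
    # receives validation feedback, it uses the real column instead.
    column = "revenue" if feedback is None else "amount_eur"
    return (f"SELECT p.name, SUM(s.{column}) AS total_eur FROM sales s "
            "JOIN products p ON p.id = s.product_id GROUP BY p.name")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    query, attempts = generate_query(
        "Please provide the total amount of earnings per product sold in Euro",
        fake_llm, conn, SCHEMA)
    print(attempts, query)
```

Here the first draft fails with SQLite's “no such column” error, and the second attempt compiles, so `generate_query` returns after two attempts. Validating against the live schema, rather than trusting the model's output, is what makes the retry loop useful.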

Beyond “JUST” Query Generation

[Explain] As queries are not always comprehensible to end users, it is often helpful to generate and display a natural language explanation of the generated query. To this end, SALI instructs a generative language model to explain the query in natural language. The style of the response can be easily customized via prompt engineering, ranging from short, friendly answers suited to chatbots to longer, more detailed responses for business analysis in co-pilot systems.

The figure below shows an example of a generated query and the natural language explanation of that query produced by SALI. In the example, the user asks “Please provide the total amount of earnings per product sold in Euro”, and SALI returns the generated query with an explanation just below it.

A query and its explanation generated by SALI
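The style customization in the explanation step can be illustrated with a small prompt builder. This is a hedged sketch, not SALI's actual prompt: the `STYLES` presets, their wording, and the `build_explanation_prompt` helper are hypothetical, and the resulting string would be sent to whichever LLM client you use. It shows how one query can yield a chatbot-style or an analyst-style explanation simply by swapping the instruction.

```python
# Hypothetical style presets; the instruction wording is an assumption.
STYLES = {
    "chat": "Explain in two short, friendly sentences for a non-technical user.",
    "analyst": ("Explain step by step which tables are joined, how rows are "
                "filtered and aggregated, and any assumptions the query makes."),
}


def build_explanation_prompt(question: str, query: str, style: str = "chat") -> str:
    """Assemble an LLM prompt asking for a natural language explanation of a query."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    return (
        "You are given a user's question and the database query generated for it.\n"
        f"{STYLES[style]}\n"
        "Only mention tables and columns that appear in the query.\n\n"
        f"Question: {question}\n"
        f"Query:\n{query}\n"
    )


print(build_explanation_prompt(
    "Please provide the total amount of earnings per product sold in Euro",
    "SELECT name, SUM(amount_eur) FROM sales GROUP BY name",
    style="analyst",
))
```

Keeping the style as a separate, named instruction makes it easy to offer both presets from the same pipeline without duplicating the rest of the prompt.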

[Visualize] Finally, SALI can use AI to produce the data visualization that best fits the generated report. Below is a snapshot of a data visualization produced by SALI.

Here, SALI prompts the LLM with a user description of the desired analysis, a snippet of the query execution results, the generated query, and the corresponding data model. This again shows how SALI enriches the prompt with useful context that helps the LLM choose the right visualization.

A visualization generated by SALI
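The four ingredients listed for the visualization prompt can be assembled as in the sketch below. The helper name, the `max_rows` cut-off, the section labels, and the requested JSON answer format are all assumptions for illustration, not SALI's actual prompt. Sending only a snippet of the results (here, the first five rows) is one plausible way to keep the prompt short while still showing the model the shape of the data.

```python
import json


def build_visualization_prompt(request, columns, rows, query, data_model, max_rows=5):
    """Combine the user's request, data model, query, and a result snippet into one prompt."""
    snippet = [dict(zip(columns, row)) for row in rows[:max_rows]]
    return "\n\n".join([
        f"User request: {request}",
        f"Data model:\n{data_model}",
        f"Query:\n{query}",
        f"Result snippet ({len(snippet)} of {len(rows)} rows):\n{json.dumps(snippet)}",
        "Suggest the chart type that best fits this result and which columns map "
        'to each axis. Answer as JSON: {"chart": ..., "x": ..., "y": ...}.',
    ])


# Hypothetical inputs, for illustration only.
rows = [(f"p{i}", float(i)) for i in range(10)]
prompt = build_visualization_prompt(
    "Compare earnings across products",
    ["product", "total_eur"],
    rows,
    "SELECT name AS product, SUM(amount_eur) AS total_eur FROM sales GROUP BY name",
    "sales(product_id, amount_eur); products(id, name)",
)
print(prompt)
```

The model's JSON answer could then be mapped onto a charting library of your choice; that last step is outside this sketch.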

Benefits of SALI

SALI offers several benefits for users and organizations:

  1. Ease of Use: With SALI, users can write queries in natural language, eliminating the need to learn complex query languages. This makes it easier for non-technical users to interact with databases and retrieve the information they need.
  2. Cross-Platform Querying: Because SALI can generate queries in several query languages, it abstracts away the differences between them and provides users a unified interface for querying data across different platforms.
  3. Improved Query Accuracy and Speed: By leveraging the semantic data catalog’s knowledge graph, SALI can quickly find relevant data from across the catalog, and generate accurate queries that consider the meanings and relationships of concepts. This leads to more relevant data reports and reduces the chances of missing important data assets.
  4. Beyond Queries: SALI uses AI to explain queries and even come up with data visualizations that provide further insights.
  5. Governance Compliance: SALI ensures that generated queries adhere to data governance policies. It enforces the user’s role and access privileges to restrict access to sensitive data and ensure compliance with data privacy regulations.
  6. Federated Queries: When integrated with a knowledge graph platform such as Stardog, SALI can use established connections to federated data sources (tabular, semi-structured, and even unstructured) and generate SPARQL queries over virtual knowledge graphs. This lets companies access insights without duplicating data, maintaining copies, or paying for extra storage.
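The post does not describe how SALI represents roles and privileges, so the following is only a minimal sketch of role-based enforcement under stated assumptions: a hypothetical `ROLE_TABLES` mapping, a toy SQLite database, and SQLite's authorizer hook (`sqlite3.Connection.set_authorizer`), which is consulted while each statement is compiled. The authorizer denies all writes (a read-only policy) and denies reads from tables the role has not been granted, so a forbidden query fails before it touches any data.

```python
import sqlite3

# Hypothetical policy: role -> tables that role may read.
ROLE_TABLES = {
    "analyst": {"products", "sales"},
    "hr": {"products", "sales", "salaries"},
}

WRITE_ACTIONS = {sqlite3.SQLITE_INSERT, sqlite3.SQLITE_UPDATE, sqlite3.SQLITE_DELETE}


def make_authorizer(role):
    allowed = ROLE_TABLES.get(role, set())

    def authorizer(action, arg1, arg2, db_name, trigger_or_view):
        if action in WRITE_ACTIONS:
            return sqlite3.SQLITE_DENY  # read-only policy: no data changes
        if action == sqlite3.SQLITE_READ and arg1 not in allowed:
            return sqlite3.SQLITE_DENY  # table not granted to this role
        return sqlite3.SQLITE_OK

    return authorizer


def connect_as(role):
    """Open a toy in-memory database and apply the role's policy to it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sales (product_id INTEGER, amount_eur REAL);
        CREATE TABLE salaries (employee TEXT, amount REAL);
        INSERT INTO products VALUES (1, 'widget');
        INSERT INTO sales VALUES (1, 10.0), (1, 5.5);
    """)
    conn.set_authorizer(make_authorizer(role))
    return conn


def run_checked(conn, query):
    """Execute a generated query, turning an authorization failure into PermissionError."""
    try:
        return conn.execute(query).fetchall()
    except sqlite3.DatabaseError as exc:
        raise PermissionError(str(exc)) from exc


analyst = connect_as("analyst")
print(run_checked(analyst, "SELECT p.name, SUM(s.amount_eur) FROM sales s "
                           "JOIN products p ON p.id = s.product_id GROUP BY p.name"))
```

Enforcing the policy inside the database engine, rather than by inspecting query text, means the same check applies no matter how the LLM phrases the query. Platforms such as Stardog have their own security models, which this SQLite sketch does not attempt to represent.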

Conclusion

In this blog post, we explored SALI, the semantic abstraction language interface that turns natural language into data reports. We discussed how we built query generation on top of the semantic data catalog to generate queries across platforms.

SALI offers a user-friendly and efficient way to interact with databases and retrieve the information you need. By leveraging the power of ontologies and vector search, SALI provides accurate and relevant reports to business questions, while ensuring data governance and compliance.

If you are looking to improve your data management strategy and simplify the querying process, consider SALI. It can help you unlock the full potential of your data assets and drive success in your organization.

Many thanks to:

Lead Developers: Hananel Hadad, Zofia Maszlanka

Tech-Vision: Dan Klein

Editing: Eslin Cemal

And the extended Data Team at Accenture Labs!
