Leveraging GenAI to Query Data in Enterprises
The Need
The imagination of business folks in enterprises has run wild since ChatGPT was made available to the world. They have long wanted freedom from the extended wait times quoted by business analysts, data analysts, and data engineers to provide them with insights. “If only I could get a tool to query data in natural language and put it into a chart to present my insight!” said one such marketing executive.
In a new-age organization, such a marketing executive might simply learn SQL to attain independence; enterprises, however, are populated with ‘specialists’. A natural-language querying tool would spare them from ever having to learn SQL as an interface.
The Industry Trends
However, CIOs have become extremely cautious about using tools like ChatGPT to let their employees query enterprise data in natural language. Organizations such as Samsung banned its usage after data leaks. To allay these fears, OpenAI recently launched an enterprise edition, but the industry remains skeptical about hidden terms and conditions.
Meanwhile, companies like Meta have open-sourced foundational LLMs such as LLaMA, giving a lot of momentum to on-premise deployment. No wonder this led to a leaked Google memo titled “We Have No Moat, And Neither Does OpenAI”. With easily available computational power (read: that of a laptop), models can now be fine-tuned, something that was previously the preserve of Google, OpenAI, and the like. Fine-tuning is the process of taking a pre-trained LLM and further training it on a smaller, task-specific dataset to adapt it to a particular task or improve its performance.
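To make this concrete, here is a minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries. The gpt2 checkpoint stands in for a LLaMA-class model, and the two-row corpus is a hypothetical placeholder for a curated enterprise dataset; none of this reflects a specific production setup.

```python
# Minimal fine-tuning sketch with Hugging Face transformers.
# "gpt2" stands in for a LLaMA-class model; the corpus is hypothetical.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Toy domain-specific corpus; in practice this would be curated
# question/SQL pairs drawn from the enterprise.
corpus = Dataset.from_dict({"text": [
    "Question: total sales by region? SQL: SELECT region, SUM(sales) FROM orders GROUP BY region;",
    "Question: monthly active users? SQL: SELECT month, COUNT(DISTINCT user_id) FROM events GROUP BY month;",
]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    # mlm=False gives causal-LM labels (next-token prediction)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```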
While fine-tuning a purpose-built foundational model is now within the reach of any organization, it is not the only option. Other techniques, like Retrieval-Augmented Generation (RAG), can improve model performance by passing “context” into the LLM prompt. RAG combines a retriever system, which fetches relevant document snippets from a large corpus, with an LLM, which produces answers using the information in those snippets.
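The pattern itself is simple, as the sketch below shows. Here `search_index` and `llm_complete` are hypothetical stand-ins for whatever retrieval backend and LLM client are available:

```python
# Minimal RAG sketch: retrieve snippets, stuff them into the prompt,
# and ask the LLM to answer only from that context.
# `search_index` and `llm_complete` are hypothetical stand-ins for a
# real retrieval backend and LLM client.

def retrieve(question: str, search_index, k: int = 3) -> list[str]:
    """Fetch the k most relevant document snippets for the question."""
    return search_index.top_k(question, k)  # hypothetical search API

def answer_with_rag(question: str, search_index, llm_complete) -> str:
    snippets = retrieve(question, search_index)
    context = "\n\n".join(snippets)
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm_complete(prompt)
```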
Ideally, the chosen technique should match the application’s goals. Fine-tuning fits closed domains with fixed data; RAG suits open domains with evolving knowledge. RAG also provides explainable retrieval for transparency and grounds responses in evidence, minimizing hallucination risks. Choosing the right fine-tuning versus RAG balance unlocks the full potential of large language models in an application context, and hybrid approaches can optimize for cost, speed, and robustness. Ultimately, we should choose the approach that leads to better system performance and user experience in the application.
Our Organizational Context
- Our organization has an on-premise data organization due to strategic and regulatory imperatives.
- We are in the process of introducing a modern data stack-based data platform in our organization. With it, we are capturing a lot of metadata in our metadata management platform (catalog).
- We also have an on-prem deployment of a licensed platform offering both vector and relevancy search, along with an out-of-domain model for semantic search-based retrieval (see the hybrid-scoring sketch after this list).
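To illustrate how vector and relevancy search can be blended at query time, here is a minimal hybrid-scoring sketch. The `lexical_scores` and `vector_scores` functions are hypothetical stand-ins for the licensed platform’s APIs, each returning a mapping of document IDs to scores:

```python
# Minimal hybrid-retrieval sketch: blend lexical relevancy scores with
# vector-similarity scores. `lexical_scores` and `vector_scores` are
# hypothetical stand-ins for the platform's APIs; each returns
# {doc_id: score}.

def hybrid_search(query: str, lexical_scores, vector_scores,
                  alpha: float = 0.5, k: int = 5) -> list[str]:
    lex = lexical_scores(query)
    vec = vector_scores(query)

    def normalize(scores: dict) -> dict:
        # Scale scores into [0, 1] so the two backends are comparable.
        top = max(scores.values(), default=1.0) or 1.0
        return {doc: s / top for doc, s in scores.items()}

    lex, vec = normalize(lex), normalize(vec)
    blended = {doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * lex.get(doc, 0.0)
               for doc in set(lex) | set(vec)}
    return sorted(blended, key=blended.get, reverse=True)[:k]
```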
Querying Data with GenAI: Our Approach
There can be multiple approaches to building such an application; we are following the RAG-based approach. The choice was made to reduce (in the best case, remove) hallucination risks and to gain some explainability so that we can customize the responses.
In addition to the metadata management and semantic retrieval platforms, we are using a language model integration framework to create our workflows, along with an application creation framework.
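Put together, the workflow looks roughly like the sketch below: retrieve relevant table metadata from the catalog, build a prompt, and have the LLM generate SQL. The function names (`fetch_table_metadata`, `generate_sql`), the client objects, and the prompt wording are illustrative assumptions, not our actual implementation:

```python
# Illustrative text-to-SQL RAG workflow. `catalog` and `llm` are
# hypothetical clients for the metadata catalog and the LLM.

def fetch_table_metadata(question: str, catalog, k: int = 3) -> list[str]:
    """Semantic search over the catalog for the k most relevant table schemas."""
    return catalog.semantic_search(question, k)  # hypothetical catalog API

def generate_sql(question: str, catalog, llm) -> str:
    schemas = "\n".join(fetch_table_metadata(question, catalog))
    prompt = (
        "You are a SQL assistant. Using ONLY the tables described below, "
        "write a single SQL query that answers the question. "
        "If the tables are insufficient, reply with 'CANNOT ANSWER'.\n\n"
        f"Tables:\n{schemas}\n\nQuestion: {question}\nSQL:"
    )
    return llm.complete(prompt)  # hypothetical LLM client call
```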
The only choice left was that of the LLM. Given that we are only sending metadata, and that we wanted to get started quickly, we were able to get the requisite approvals to use ChatGPT. However, we may revisit this choice in the future, given that fine-tuned Llama 7B and 13B models have been reported to outperform the 70B-chat and GPT-4 models on task-specific benchmarks.
Finally, if in the future we have to enable our users to translate numbers into insights as well, we may have to invest further in our on-prem LLM capability, with the help of open source.
Initial Results
While semantic retrieval engines and LLMs are typically evaluated on multiple corpora of data, our evaluation framework is based on a simple ‘Yes’/’No’ judgment of whether the generated query is correct. With major effort going into augmenting retrieval, we are able to get respectable results on simple single-table and join queries. Edge cases have arisen due to data modeling issues (dimensional modeling vs. one big table). While we have been able to restrict hallucination, we believe a business analyst would still be required to validate the queries. The dream of true independence has not yet been achieved.
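A minimal sketch of such a binary evaluation harness follows; the test cases and the `generate_sql` callable are hypothetical, and a normalized exact match stands in for the human judgment used on harder cases:

```python
# Minimal 'Yes'/'No' evaluation harness: run each question through the
# pipeline and record whether the generated SQL was judged correct.
# The cases and `generate_sql` callable are hypothetical.

def evaluate(cases: list[dict], generate_sql) -> float:
    """Return the fraction of questions judged 'Yes' (correct)."""
    passed = 0
    for case in cases:
        sql = generate_sql(case["question"])
        # Simplest possible check: whitespace/case-normalized exact match
        # against a reference query; a human reviewer fills in for harder cases.
        if " ".join(sql.split()).lower() == " ".join(case["expected_sql"].split()).lower():
            passed += 1
    return passed / len(cases)

# Example usage with a single hypothetical case:
cases = [
    {"question": "Total sales by region?",
     "expected_sql": "SELECT region, SUM(sales) FROM orders GROUP BY region"},
]
```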
(This post will be updated as we make progress.)