SuperKnowa: The Simplest Framework Yet to Swiftly Build Enterprise RAG Solutions at Scale
Releasing a framework to quickly build your prod-ready RAG (Retrieval-Augmented Generation) pipeline: a higher-abstraction open framework that glues RAG components together like Lego pieces, letting you tailor the retriever, re-ranker, model eval kit, RLHF, and more for any GenAI application.
Enterprise LLMs Are Different
Ever since LLMs became a rage, everyone has been exploring their unlimited potential for enterprise applications. To apply the power of an LLM to a private dataset, the RAG (Retrieval-Augmented Generation) pipeline is often used. RAG does not create a new pre-trained model; instead, it passes the relevant context as input to a foundation model.
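To make the pattern concrete, here is a minimal sketch of the retrieve-then-generate flow just described. The `search` function is a toy stand-in for a real retriever, and the prompt shape is illustrative; a real pipeline would send the prompt to a hosted foundation model API.

```python
# Minimal sketch of the retrieve-then-generate flow. `search` is a toy
# stand-in for a real retriever; the prompt shape is illustrative.

def search(question, top_k=3):
    # Toy retriever: rank a tiny corpus by word overlap with the question.
    corpus = [
        "SuperKnowa is an open framework for building RAG pipelines.",
        "RAG passes retrieved context to a foundation model instead of retraining it.",
        "watsonx.ai hosts foundation models for enterprise use.",
    ]
    q_words = set(question.lower().split())
    ranked = sorted(corpus, key=lambda p: -len(q_words & set(p.lower().split())))
    return ranked[:top_k]

def build_prompt(question, passages):
    # Grounding: the model is told to answer only from the retrieved context.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

question = "What does RAG pass to the model?"
prompt = build_prompt(question, search(question))
```

The key point is that the model's weights never change; only the prompt carries the private knowledge.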
Creating a RAG PoC on a small dataset by putting together a few boxes and passing an input to an LLM API is no rocket science; even an eighth grader can do it now. However, taking that PoC to production is an altogether different ball game. What is acceptable for ChatGPT will be totally unacceptable for enterprise LLMs. You do not have the option to train your model from scratch (as OpenAI does) on your private data, and getting good results with in-context learning alone is non-trivial. How to create a reliable, accurate, and scalable RAG pipeline is not a problem anyone has fully solved.
What works in a RAG example with a few documents becomes an unmanageable AI engineering challenge with a few million documents. Your RAG pipeline must scale in terms of diversity of sources, accuracy, and response time as it grows.
- Enterprise use cases often deal with a diversity of data formats: books, PDFs, semi-structured documents, relational databases, etc.
- Enterprise GenAI applications must not hallucinate. A RAG pipeline is “grounded”, meaning the context passed to the model comes from the source; however, eliminating and even measuring hallucination is a challenge.
- Even with grounded sources, finding relevant output becomes harder as the scale grows, requiring a variety of indexes, vector search optimizations, or re-ranker algorithms.
- Yet another unsolved problem is the evaluation of LLM applications. There is no single statistical metric that reliably measures accuracy, and tracking experiments is another challenge.
- Enterprise LLMs need a mechanism to measure the alignment of AI models by scientifically capturing human feedback on helpfulness, harmfulness, and rejected answers.
- Lastly, every production-grade application needs a mechanism to debug and monitor the performance of the solution.
No matter the GenAI use case, the above RAG challenges remain the same. Generative AI is all about scale; as scale grows, so do complexity, cost, the skills needed, and the challenge of getting the right accuracy.
SuperKnowa
To solve these challenges, we are releasing SuperKnowa: the simplest framework yet to quickly build your prod-ready pipeline. With a simple YAML config file, snap together the components of RAG like Lego pieces in a completely open framework powered by the watsonx platform.
Scale your GenAI Like Lego Pieces
SuperKnowa is designed to take the ‘engineering’ complexity out of GenAI so that the team can focus on the ‘AI’ part. Built on top of watsonx.ai, this open framework has been tested at scales from 1M to 200M diverse documents to improve retriever searches and evaluate models for hallucination.
Connect to your favorite ecosystem and glue Lego pieces across the RAG lifecycle of model development, data pipelines, and much more. Easily extend to watsonx.data for federated Presto queries, as well as watsonx.governance to measure the trustworthiness of generative AI solutions.
Putting RAG on Steroids
SuperKnowa comes with tooling that offers a higher level of abstraction for building RAG solutions while remaining completely open, so you can go as deep as needed to refine your RAG.
- AI Alignment Tool that captures human feedback about helpfulness, harmfulness, and relevancy ranking of provided answers
- LLM Eval Toolkit with all statistical metrics to compare model outputs, log experiments and visualize results using charts & leaderboard
- Deployed on Red Hat OCP (OpenShift Container Platform) and portable to any Kubernetes environment on any cloud or on-prem
- Retriever component to find the right context using ColBERT re-ranker, blended query indexing to optimize the vector search
- Indexing options using Solr, Elasticsearch, Watson Discovery
- Boilerplate to fine-tune LLaMA-2 and for in-context learning using FALCON, FLAN, & other open-source LLMs
Using SuperKnowa Framework
The SuperKnowa framework, powered by watsonx, can run anywhere: on-premises or on any cloud.
The whole RAG pipeline can be easily configured using a .yaml config file, with minimal effort needed to get into the murky details. At the same time, it is completely open for experts to debug and enhance as needed, delivering faster time to value.
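As an illustration, such a pipeline config might look like the following; every key name here is hypothetical and not SuperKnowa's actual schema:

```yaml
# Illustrative only; key names are hypothetical, not SuperKnowa's actual schema.
retriever:
  backend: elasticsearch
  index: enterprise-docs
  top_k: 10
reranker:
  type: colbert
model:
  provider: watsonx.ai
  name: flan-ul2
eval:
  metrics: [bleu, rouge, meteor]
```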
The SuperKnowa framework offers the following configurable tools:
1. Source Indexing
Index your private documents using Solr, Watson Discovery, or Elasticsearch.
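As a sketch of the indexing step, the snippet below prepares documents for bulk loading into Elasticsearch. The index name and document fields are illustrative; with a live cluster, the actions would be sent via the official client's bulk helper.

```python
# Sketch of preparing documents for bulk indexing into Elasticsearch.
# Index name and fields are illustrative.

def to_bulk_actions(docs, index="enterprise-docs"):
    """Yield actions in the format expected by the Python client's bulk helper."""
    for i, doc in enumerate(docs):
        yield {
            "_index": index,
            "_id": i,
            "_source": {"title": doc["title"], "body": doc["body"]},
        }

docs = [{"title": "RAG intro", "body": "Retrieval-Augmented Generation explained."}]
actions = list(to_bulk_actions(docs))

# With a running cluster (not required here):
# from elasticsearch import Elasticsearch, helpers
# helpers.bulk(Elasticsearch("http://localhost:9200"), actions)
```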
2. Neural Retriever
Improve the quality of answers by supplying the right context to LLMs, using tailored queries on the Elasticsearch retriever and the ColBERT re-ranker.
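ColBERT scores a document by "late interaction": each query token embedding is matched against its best-matching document token embedding (MaxSim), and those per-token maxima are summed. The toy 2-d embeddings below are illustrative only; real ColBERT uses learned BERT-based token vectors.

```python
# MaxSim late-interaction scoring, the core idea behind ColBERT re-ranking.
# Toy 2-d "embeddings" stand in for real learned token vectors.

def maxsim_score(query_vecs, doc_vecs):
    total = 0.0
    for q in query_vecs:
        # Best dot-product match for this query token among all doc tokens.
        total += max(sum(qi * di for qi, di in zip(q, d)) for d in doc_vecs)
    return total

def rerank(query_vecs, candidates):
    # candidates: (doc_id, doc_vecs) pairs from the first-stage retriever.
    return sorted(candidates, key=lambda c: maxsim_score(query_vecs, c[1]), reverse=True)

query = [[1.0, 0.0], [0.0, 1.0]]
candidates = [
    ("doc_a", [[0.9, 0.1], [0.1, 0.9]]),  # matches both query tokens well
    ("doc_b", [[0.5, 0.5]]),              # weaker match
]
ranked = rerank(query, candidates)
```

The re-ranker runs only on the first-stage candidates, so the expensive token-level matching stays cheap at query time.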
3. In-Context Learning using LLMs (LLAMA, SLATE, FALCON, etc)
In-context learning is at the heart of the RAG architecture, and this boilerplate can call open-source models. Here is a detailed blog to understand in-context learning.
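In-context learning places worked examples and the retrieved context directly in the prompt; no model weights are updated. A minimal sketch, with a made-up example pair:

```python
# Few-shot in-context prompt: worked examples precede the real question.
# The example pair below is made up for illustration.

def icl_prompt(examples, context, question):
    shots = "\n\n".join(
        f"Context: {c}\nQuestion: {q}\nAnswer: {a}" for c, q, a in examples
    )
    return f"{shots}\n\nContext: {context}\nQuestion: {question}\nAnswer:"

examples = [
    ("Solr is an indexing engine.", "What is Solr?", "An open-source indexing engine."),
]
prompt = icl_prompt(
    examples,
    "ColBERT re-ranks passages returned by the retriever.",
    "What does ColBERT do?",
)
```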
4. Fine-Tuning FALCON & LLaMA-2 using QLoRA
Notebooks to prepare the instruct DB and fine-tune FALCON-7B and LLaMA-2-7B. This detailed blog explains the steps.
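The instruct-DB step usually means converting raw Q&A records into instruction-tuning rows. A sketch in the "### Instruction / ### Response" style often used when fine-tuning FALCON or LLaMA-2; the field names are illustrative, not the notebooks' actual schema:

```python
# Convert raw Q&A records into instruction-tuning rows.
# Field names are illustrative, not the notebooks' actual schema.

def to_instruct_row(record):
    return {
        "text": (
            "### Instruction:\n" + record["question"] + "\n\n"
            "### Response:\n" + record["answer"]
        )
    }

rows = [
    to_instruct_row(
        {"question": "What is RAG?", "answer": "Retrieval-Augmented Generation."}
    )
]
```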
5. AI Alignment with Human Feedback Tool
While answers can be generated using steps 1-4, the biggest challenge is to measure and improve the quality of those answers for humans. An AI model should follow instructions (be helpful) while not generating false (hallucinated) or unethical (harmful) answers. It is also important for humans to assess the quality of candidate answers and rank them, so an AI alignment tool is often used as part of the process.
Analyse the results using an interactive dashboard.
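As a sketch, the kind of record such a feedback tool might store combines helpfulness and harmfulness judgments with a preference ranking over candidate answers. The schema below is hypothetical, not SuperKnowa's actual one:

```python
# Hypothetical human-feedback record: helpfulness/harmfulness judgments plus
# a preference ranking over candidate answers.

from dataclasses import dataclass

@dataclass
class FeedbackRecord:
    question: str
    answers: list   # candidate answers shown to the annotator
    ranking: list   # indices into `answers`, best first
    helpful: bool
    harmful: bool

rec = FeedbackRecord(
    question="What is watsonx.ai?",
    answers=["A platform for foundation models.", "A relational database."],
    ranking=[0, 1],
    helpful=True,
    harmful=False,
)
best_answer = rec.answers[rec.ranking[0]]
```

Ranked records like this are exactly what the RLHF step below consumes as preference pairs.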
6. RLHF (Reinforcement Learning with Human Feedback)
There is no replacement for human feedback when aligning an LLM to human instructions. The way to make the model learn from it is RLHF.
This detailed blog describes the RLHF implementation as well as reward model training.
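The reward model at the center of RLHF is commonly trained on preference pairs: it should score the human-chosen answer above the rejected one. A standard objective is the Bradley-Terry loss, -log(sigmoid(r_chosen - r_rejected)), sketched here on plain floats rather than a real model's outputs:

```python
# Bradley-Terry preference loss used for reward-model training in RLHF,
# sketched on plain floats instead of real model scores.

import math

def preference_loss(r_chosen, r_rejected):
    # Approaches 0 as the chosen answer out-scores the rejected one;
    # grows when the rejected answer scores higher.
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Minimizing this loss over many ranked pairs teaches the reward model the annotators' preferences, which then steer the policy model during reinforcement learning.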
7. LLM Eval Toolkit
Trying various models with the various dials of the RAG pipeline means hundreds of experiments across dozens of statistical metrics, and managing them can become a nightmare.
Thus, the SuperKnowa framework has an LLM Eval Toolkit integrated with MLflow to automatically log experiments from notebooks and generate a leaderboard across all the statistical metrics like BLEU, ROUGE, METEOR, SimHash, etc.
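As a sketch of one such overlap metric, here is ROUGE-1 recall (the fraction of reference unigrams that appear in the candidate), with the MLflow logging call shown as a comment; the metric implementation here is a simplification, not the toolkit's actual code:

```python
# Simplified ROUGE-1 recall: fraction of reference unigrams found in the
# candidate. Real eval toolkits use more careful tokenization.

def rouge1_recall(reference, candidate):
    ref_tokens = reference.lower().split()
    cand_tokens = set(candidate.lower().split())
    overlap = sum(1 for tok in ref_tokens if tok in cand_tokens)
    return overlap / len(ref_tokens)

score = rouge1_recall("rag grounds the model", "the model is grounded in rag")

# With MLflow installed, each experiment's scores would be recorded, e.g.:
# import mlflow
# with mlflow.start_run():
#     mlflow.log_metric("rouge1_recall", score)
```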
8. Debugging the RAG Pipeline
You can always expect that the first iteration won't give the best results. The job is then to find out whether it was the retriever, the re-ranker, the model, or the prompt that went wrong. Another reason is to log response times in order to deliver faster responses. The debugging pipeline is designed for this.
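One way to get this visibility, sketched below, is to wrap each pipeline stage so its output and latency are recorded; the stage names and trace format are illustrative, not SuperKnowa's actual instrumentation:

```python
# Per-stage tracing sketch: record each stage's output and latency so a bad
# answer can be attributed to the retriever, re-ranker, model, or prompt.

import time

TRACE = []

def traced(stage):
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            out = fn(*args, **kwargs)
            TRACE.append({
                "stage": stage,
                "seconds": time.perf_counter() - start,
                "output": out,
            })
            return out
        return inner
    return wrap

@traced("retriever")
def retrieve(query):
    # Stand-in for the real retriever call.
    return ["passage about " + query]

retrieve("colbert")
```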
9. Deploy & Infer
Deploy this RAG pipeline locally or on Red Hat OpenShift, and take it to any cloud provider or Kubernetes environment.
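As a sketch, a minimal Kubernetes Deployment for the pipeline's API service could look like this; the image name, labels, and port are placeholders, not SuperKnowa's actual manifests:

```yaml
# Illustrative only; image, labels, and port are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: superknowa-rag
spec:
  replicas: 2
  selector:
    matchLabels:
      app: superknowa-rag
  template:
    metadata:
      labels:
        app: superknowa-rag
    spec:
      containers:
        - name: rag-api
          image: registry.example.com/superknowa-rag:latest
          ports:
            - containerPort: 8080
```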
The GenAI Use Cases Built using SuperKnowa
Using SuperKnowa, my team was able to quickly create new LLM applications in a matter of days and take them to enterprise customers for in-field evaluation on the way to production. The code shelf for these is available on GitHub to plug and play.
1. Conversational Q&A on Private Knowledge Base
Engage in natural language conversations with SuperKnowa’s conversational Question & Answer (Q&A) system. Ask questions based on the private enterprise knowledge base, and receive detailed, context-aware responses.
2. Ask Your Pdf/Documents
Leverage SuperKnowa’s “Ask your documents” feature to unlock the potential of your PDFs and text documents. SuperKnowa can help you extract relevant information, answer specific questions, and assist in information retrieval.
3. Summarization
Effortlessly generate coherent and informative summaries with SuperKnowa’s summarization feature across large text corpora using FlanT5 and UL2. Extract the main points and essential details from articles, reports, and other texts, allowing for efficient content comprehension.
4. Key Points from your PDF
SuperKnowa’s abstractive summarization feature goes beyond simple extraction using FlanUL2 and LLAMA2. It can analyze lengthy PDF documents and generate concise abstractive summaries, capturing the essence of the content. Additionally, SuperKnowa identifies key points, making it easier to comprehend and communicate complex information.
5. Text to SQL
Experience the power of SuperKnowa’s Text-to-SQL capability, which transforms natural language queries into structured SQL queries. Interact with databases using plain language, eliminating the need for expertise in SQL.
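The usual recipe for text-to-SQL is to put the table schema in the prompt so the model emits SQL rather than prose. A sketch, with a hypothetical schema and question:

```python
# Text-to-SQL prompt sketch: schema in context, SQL as the expected output.
# Schema and question are made up for illustration.

def text_to_sql_prompt(schema, question):
    return (
        f"Given the table schema:\n{schema}\n"
        f"Write a SQL query that answers: {question}\n"
        "SQL:"
    )

prompt = text_to_sql_prompt(
    "orders(id INT, customer TEXT, total DECIMAL)",
    "What is the total spend per customer?",
)
```

In production the generated SQL would be validated (and ideally run read-only) before touching the database.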
Future Enhancements
We will continue to add more enterprise use cases to the boilerplate shelf. The biggest enhancement we are working on is a recommendation system for the retriever that can pick the right querying mechanism in the Elastic index (like ELSER with a blended query) to give better results quickly.
Also, we will add modules to integrate with federated data systems and a component to govern the RAG pipelines using AI Factsheets & OpenPages.
Follow Towards Generative AI for more on the latest advancement in AI.