SuperKnowa: The Simplest Framework Yet to Swiftly Build Enterprise RAG Solutions at Scale
Releasing a framework to quickly build your prod-ready RAG (Retrieval-Augmented Generation) pipeline: a higher-abstraction open framework that glues RAG components together like Lego pieces, letting you tailor the retriever, re-ranker, model eval kit, RLHF, and more for any GenAI application.
Enterprise LLMs Are Different
Ever since LLMs became a rage, everyone has been exploring their unlimited potential for enterprise applications. To apply the power of an LLM to a private dataset, the RAG (Retrieval-Augmented Generation) pipeline is often used. RAG does not create a new pre-trained model; instead, it passes the relevant context as input to a foundation model.
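To make the pattern concrete, here is a minimal sketch of the retrieve-then-generate flow just described. The `search` function is a toy stand-in for a real retriever, and the prompt shape is illustrative; a real pipeline would send the prompt to a hosted foundation model API.

```python
# Minimal sketch of the retrieve-then-generate flow. `search` is a toy
# stand-in for a real retriever; the prompt shape is illustrative.

def search(question, top_k=3):
    # Toy retriever: rank a tiny corpus by word overlap with the question.
    corpus = [
        "SuperKnowa is an open framework for building RAG pipelines.",
        "RAG passes retrieved context to a foundation model instead of retraining it.",
        "watsonx.ai hosts foundation models for enterprise use.",
    ]
    q_words = set(question.lower().split())
    ranked = sorted(corpus, key=lambda p: -len(q_words & set(p.lower().split())))
    return ranked[:top_k]

def build_prompt(question, passages):
    # Grounding: the model is told to answer only from the retrieved context.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

question = "What does RAG pass to the model?"
prompt = build_prompt(question, search(question))
```

The key point is that the model's weights never change; only the prompt carries the private knowledge.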
Creating a RAG PoC on a small dataset by putting together a few boxes and passing an input to an LLM API is no rocket science; even an eighth grader can do it now. However, taking that PoC to production is an altogether different ball game. What is acceptable for ChatGPT will be totally unacceptable for enterprise LLMs. You do not have the option to train your model from scratch (as OpenAI does) on your private data, and getting good results with in-context learning alone is non-trivial. How to create a reliable, accurate, and scalable RAG pipeline is not a problem anyone has fully solved.
What works in a RAG example with a few documents becomes an unmanageable AI engineering challenge with a few million documents. Your RAG pipeline must scale in terms of diversity of sources, accuracy, and response time as it grows.
- Enterprise use cases often deal with a diversity of data formats: books, PDFs, semi-structured documents, relational databases, etc.
- Enterprise GenAI applications must not hallucinate. A RAG pipeline is “grounded”, meaning the context passed to the model comes from the source; however, eliminating and even measuring hallucination is a challenge.
- Even with grounded sources, finding relevant output becomes harder as the scale grows, requiring a variety of indexes, vector search optimizations, or re-ranker algorithms.
- Yet another unsolved problem is the evaluation of LLM applications. There is no single statistical metric that reliably measures accuracy, and tracking experiments is another challenge.
- Enterprise LLMs need a mechanism to measure the alignment of AI models by scientifically capturing human feedback on helpfulness, harmfulness, and rejected answers.
- Lastly, every production-grade application needs a mechanism to debug and monitor the performance of the solution.
No matter the GenAI use case, the above RAG challenges remain the same. Generative AI is all about scale; as scale grows, so do complexity, cost, the skills needed, and the challenge of getting the right accuracy.
SuperKnowa
To solve these challenges, we are releasing SuperKnowa: the simplest framework yet to quickly build your prod-ready pipeline. With a simple YAML config file, snap together the components of RAG like Lego pieces in a completely open framework powered by the watsonx platform.
Scale your GenAI Like Lego Pieces
SuperKnowa is designed to take the ‘engineering’ complexity out of GenAI so that the team can focus on the ‘AI’ part. Built on top of watsonx.ai, this open framework has been tested at scales from 1M to 200M diverse documents to improve retriever searches and evaluate models for hallucination.
Connect to your favorite ecosystem and glue Lego pieces across the RAG lifecycle of model development, data pipelines, and much more. Easily extend to watsonx.data for federated Presto queries, as well as watsonx.governance to measure the trustworthiness of generative AI solutions.
Putting RAG on Steroids
SuperKnowa comes with tooling that offers a higher level of abstraction for building RAG solutions while remaining completely open, so you can go as deep as needed to refine your RAG.
- AI Alignment Tool that captures human feedback about helpfulness, harmfulness, and relevancy ranking of provided answers
- LLM Eval Toolkit with all statistical metrics to compare model outputs, log experiments and visualize results using charts & leaderboard
- Deployed on Red Hat OCP (OpenShift Container Platform) and portable to any Kubernetes environment on any cloud or on-prem
- Retriever component to find the right context using ColBERT re-ranker, blended query indexing to optimize the vector search
- Indexing options using Solr, Elasticsearch, Watson Discovery
- Boilerplate to fine-tune LLaMA-2 and for in-context learning using FALCON, FLAN, & other open-source LLMs
Using SuperKnowa Framework
The SuperKnowa framework, powered by watsonx, can run anywhere: on-premises or on any cloud.
The whole RAG pipeline can be easily configured using a .yaml config file, with minimal effort needed to get into the murky details. At the same time, it is completely open for experts to debug and enhance as needed, delivering faster time to value.
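As an illustration, such a pipeline config might look like the following; every key name here is hypothetical and not SuperKnowa's actual schema:

```yaml
# Illustrative only; key names are hypothetical, not SuperKnowa's actual schema.
retriever:
  backend: elasticsearch
  index: enterprise-docs
  top_k: 10
reranker:
  type: colbert
model:
  provider: watsonx.ai
  name: flan-ul2
eval:
  metrics: [bleu, rouge, meteor]
```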
The SuperKnowa framework offers the following configurable tools:
1. Source Indexing
Index your private documents using Solr, Watson Discovery, or Elasticsearch.
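As a sketch of the indexing step, the snippet below prepares documents for bulk loading into Elasticsearch. The index name and document fields are illustrative; with a live cluster, the actions would be sent via the official client's bulk helper.

```python
# Sketch of preparing documents for bulk indexing into Elasticsearch.
# Index name and fields are illustrative.

def to_bulk_actions(docs, index="enterprise-docs"):
    """Yield actions in the format expected by the Python client's bulk helper."""
    for i, doc in enumerate(docs):
        yield {
            "_index": index,
            "_id": i,
            "_source": {"title": doc["title"], "body": doc["body"]},
        }

docs = [{"title": "RAG intro", "body": "Retrieval-Augmented Generation explained."}]
actions = list(to_bulk_actions(docs))

# With a running cluster (not required here):
# from elasticsearch import Elasticsearch, helpers
# helpers.bulk(Elasticsearch("http://localhost:9200"), actions)
```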
2. Neural Retriever
Improve the quality of answers by supplying the right context to LLMs, using tailored queries on the Elasticsearch retriever and the ColBERT re-ranker.
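ColBERT scores a document by "late interaction": each query token embedding is matched against its best-matching document token embedding (MaxSim), and those per-token maxima are summed. The toy 2-d embeddings below are illustrative only; real ColBERT uses learned BERT-based token vectors.

```python
# MaxSim late-interaction scoring, the core idea behind ColBERT re-ranking.
# Toy 2-d "embeddings" stand in for real learned token vectors.

def maxsim_score(query_vecs, doc_vecs):
    total = 0.0
    for q in query_vecs:
        # Best dot-product match for this query token among all doc tokens.
        total += max(sum(qi * di for qi, di in zip(q, d)) for d in doc_vecs)
    return total

def rerank(query_vecs, candidates):
    # candidates: (doc_id, doc_vecs) pairs from the first-stage retriever.
    return sorted(candidates, key=lambda c: maxsim_score(query_vecs, c[1]), reverse=True)

query = [[1.0, 0.0], [0.0, 1.0]]
candidates = [
    ("doc_a", [[0.9, 0.1], [0.1, 0.9]]),  # matches both query tokens well
    ("doc_b", [[0.5, 0.5]]),              # weaker match
]
ranked = rerank(query, candidates)
```

The re-ranker runs only on the first-stage candidates, so the expensive token-level matching stays cheap at query time.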
3. In-Context Learning using LLMs (LLAMA, SLATE, FALCON, etc)
In-context learning is at the heart of the RAG architecture, and this boilerplate can call open-source models. Here is a detailed blog to understand in-context learning.
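In-context learning places worked examples and the retrieved context directly in the prompt; no model weights are updated. A minimal sketch, with a made-up example pair:

```python
# Few-shot in-context prompt: worked examples precede the real question.
# The example pair below is made up for illustration.

def icl_prompt(examples, context, question):
    shots = "\n\n".join(
        f"Context: {c}\nQuestion: {q}\nAnswer: {a}" for c, q, a in examples
    )
    return f"{shots}\n\nContext: {context}\nQuestion: {question}\nAnswer:"

examples = [
    ("Solr is an indexing engine.", "What is Solr?", "An open-source indexing engine."),
]
prompt = icl_prompt(
    examples,
    "ColBERT re-ranks passages returned by the retriever.",
    "What does ColBERT do?",
)
```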
4. Fine-Tuning FALCON & LLaMA-2 using QLoRA
Notebooks to prepare the instruct DB and fine-tune FALCON-7B and LLaMA-2-7B. This detailed blog explains the steps.
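The instruct-DB step usually means converting raw Q&A records into instruction-tuning rows. A sketch in the "### Instruction / ### Response" style often used when fine-tuning FALCON or LLaMA-2; the field names are illustrative, not the notebooks' actual schema:

```python
# Convert raw Q&A records into instruction-tuning rows.
# Field names are illustrative, not the notebooks' actual schema.

def to_instruct_row(record):
    return {
        "text": (
            "### Instruction:\n" + record["question"] + "\n\n"
            "### Response:\n" + record["answer"]
        )
    }

rows = [
    to_instruct_row(
        {"question": "What is RAG?", "answer": "Retrieval-Augmented Generation."}
    )
]
```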
5. AI Alignment with Human Feedback Tool
While answers can be generated using steps 1-4, the biggest challenge is to measure and improve the quality of those answers for humans. An AI model should follow instructions (be helpful) while not generating false (hallucinated) or unethical (harmful) answers. It is also important for humans to assess the quality of candidate answers and rank them, so an AI alignment tool is often used as part of the process.
Analyse the results using an interactive dashboard.
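As a sketch, the kind of record such a feedback tool might store combines helpfulness and harmfulness judgments with a preference ranking over candidate answers. The schema below is hypothetical, not SuperKnowa's actual one:

```python
# Hypothetical human-feedback record: helpfulness/harmfulness judgments plus
# a preference ranking over candidate answers.

from dataclasses import dataclass

@dataclass
class FeedbackRecord:
    question: str
    answers: list   # candidate answers shown to the annotator
    ranking: list   # indices into `answers`, best first
    helpful: bool
    harmful: bool

rec = FeedbackRecord(
    question="What is watsonx.ai?",
    answers=["A platform for foundation models.", "A relational database."],
    ranking=[0, 1],
    helpful=True,
    harmful=False,
)
best_answer = rec.answers[rec.ranking[0]]
```

Ranked records like this are exactly what the RLHF step below consumes as preference pairs.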
6. RLHF (Reinforcement Learning with Human Feedback)
There is no replacement for human feedback when aligning an LLM to human instructions. The way to make the model learn from it is RLHF.
This detailed blog describes the RLHF implementation as well as reward model training.
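The reward model at the center of RLHF is commonly trained on preference pairs: it should score the human-chosen answer above the rejected one. A standard objective is the Bradley-Terry loss, -log(sigmoid(r_chosen - r_rejected)), sketched here on plain floats rather than a real model's outputs:

```python
# Bradley-Terry preference loss used for reward-model training in RLHF,
# sketched on plain floats instead of real model scores.

import math

def preference_loss(r_chosen, r_rejected):
    # Approaches 0 as the chosen answer out-scores the rejected one;
    # grows when the rejected answer scores higher.
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Minimizing this loss over many ranked pairs teaches the reward model the annotators' preferences, which then steer the policy model during reinforcement learning.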
7. LLM Eval Toolkit
Trying various models with the various dials of the RAG pipeline means hundreds of experiments across dozens of statistical metrics, and managing them can become a nightmare.
Thus, the SuperKnowa framework has an LLM Eval Toolkit integrated with MLflow to automatically log experiments from notebooks and generate a leaderboard across all the statistical metrics like BLEU, ROUGE, METEOR, SimHash, etc.
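As a sketch of one such overlap metric, here is ROUGE-1 recall (the fraction of reference unigrams that appear in the candidate), with the MLflow logging call shown as a comment; the metric implementation here is a simplification, not the toolkit's actual code:

```python
# Simplified ROUGE-1 recall: fraction of reference unigrams found in the
# candidate. Real eval toolkits use more careful tokenization.

def rouge1_recall(reference, candidate):
    ref_tokens = reference.lower().split()
    cand_tokens = set(candidate.lower().split())
    overlap = sum(1 for tok in ref_tokens if tok in cand_tokens)
    return overlap / len(ref_tokens)

score = rouge1_recall("rag grounds the model", "the model is grounded in rag")

# With MLflow installed, each experiment's scores would be recorded, e.g.:
# import mlflow
# with mlflow.start_run():
#     mlflow.log_metric("rouge1_recall", score)
```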
8. Debugging the RAG Pipeline
You can always expect that the first iteration won't give the best results. The job is then to find out whether it was the retriever, the re-ranker, the model, or the prompt that went wrong. Another reason is to log response times in order to deliver faster responses. The debugging pipeline is designed for this.
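One way to get this visibility, sketched below, is to wrap each pipeline stage so its output and latency are recorded; the stage names and trace format are illustrative, not SuperKnowa's actual instrumentation:

```python
# Per-stage tracing sketch: record each stage's output and latency so a bad
# answer can be attributed to the retriever, re-ranker, model, or prompt.

import time

TRACE = []

def traced(stage):
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            out = fn(*args, **kwargs)
            TRACE.append({
                "stage": stage,
                "seconds": time.perf_counter() - start,
                "output": out,
            })
            return out
        return inner
    return wrap

@traced("retriever")
def retrieve(query):
    # Stand-in for the real retriever call.
    return ["passage about " + query]

retrieve("colbert")
```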
9. Deploy & Infer
Deploy this RAG pipeline locally or on Red Hat OpenShift, and take it to any cloud provider or Kubernetes environment.
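As a sketch, a minimal Kubernetes Deployment for the pipeline's API service could look like this; the image name, labels, and port are placeholders, not SuperKnowa's actual manifests:

```yaml
# Illustrative only; image, labels, and port are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: superknowa-rag
spec:
  replicas: 2
  selector:
    matchLabels:
      app: superknowa-rag
  template:
    metadata:
      labels:
        app: superknowa-rag
    spec:
      containers:
        - name: rag-api
          image: registry.example.com/superknowa-rag:latest
          ports:
            - containerPort: 8080
```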
The GenAI Use Cases Built using SuperKnowa
Using SuperKnowa, my team was able to quickly create new LLM applications in a matter of days and take them to enterprise customers for in-field evaluation on the way to production. The code shelf for these is available on GitHub to plug and play.
1. Conversational Q&A on Private Knowledge Base
Engage in natural language conversations with SuperKnowa’s conversational Question & Answer (Q&A) system. Ask questions based on the private enterprise knowledge base, and receive detailed, context-aware responses.
2. Ask Your Pdf/Documents
Leverage SuperKnowa’s “Ask your documents” feature to unlock the potential of your PDFs and text documents. SuperKnowa can help you extract relevant information, answer specific questions, and assist in information retrieval.
3. Summarization
Effortlessly generate coherent and informative summaries with SuperKnowa’s summarization feature across large text corpora using FlanT5 and UL2. Extract the main points and essential details from articles, reports, and other texts, allowing for efficient content comprehension.
4. Key Points from your PDF
SuperKnowa’s abstractive summarization feature goes beyond simple extraction using FlanUL2 and LLAMA2. It can analyze lengthy PDF documents and generate concise abstractive summaries, capturing the essence of the content. Additionally, SuperKnowa identifies key points, making it easier to comprehend and communicate complex information.
5. Text to SQL
Experience the power of SuperKnowa’s Text-to-SQL capability, which transforms natural language queries into structured SQL queries. Interact with databases using plain language, eliminating the need for expertise in SQL.
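The usual recipe for text-to-SQL is to put the table schema in the prompt so the model emits SQL rather than prose. A sketch, with a hypothetical schema and question:

```python
# Text-to-SQL prompt sketch: schema in context, SQL as the expected output.
# Schema and question are made up for illustration.

def text_to_sql_prompt(schema, question):
    return (
        f"Given the table schema:\n{schema}\n"
        f"Write a SQL query that answers: {question}\n"
        "SQL:"
    )

prompt = text_to_sql_prompt(
    "orders(id INT, customer TEXT, total DECIMAL)",
    "What is the total spend per customer?",
)
```

In production the generated SQL would be validated (and ideally run read-only) before touching the database.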
Future Enhancements
We will continue to add more enterprise use cases to the boilerplate shelf. The biggest enhancement we are working on is a recommendation system for the retriever that can pick the right querying mechanism in the Elastic index (like ELSER with a blended query) to give better results quickly.
Also, we will add modules to integrate with federated data systems and a component to govern the RAG pipelines using AI Factsheets & OpenPages.
Follow Towards Generative AI for more on the latest advancement in AI.