Modular RAG as a Service

Eudald
Nuclia
Published Jun 25, 2024 · 4 min read

Set up your own RAG pipeline and deploy it at scale

In the realm of data processing and retrieval, there is no one-size-fits-all solution. Building a robust Retrieval-Augmented Generation (RAG) system is inherently complex, requiring expertise across various domains. The challenge amplifies when using open-source components, as deploying a RAG system in production demands stringent data protection while ensuring scalability and maintenance.

Nuclia’s modular RAG aims to address these challenges by allowing you to fine-tune every aspect of the RAG process — from data ingestion to chunking and retrieval strategies — without the hassle of piecing everything together. This seamless integration facilitates bringing your tailored RAG pipeline to production swiftly and securely.

What is Retrieval Augmented Generation?

RAG, or Retrieval Augmented Generation, is a technique in natural language processing that combines information retrieval with text generation. Here’s a breakdown of how it works:

1) Retrieval: In this step, the system retrieves relevant information from a large dataset or knowledge base. This dataset can include documents, articles, or any other form of structured or unstructured data. The retrieval process uses search algorithms to find the most relevant pieces of information based on a query or prompt.

2) Augmentation: The retrieved information is then used to augment the input query. This means that the system combines the original input with the retrieved information to provide a richer context. This augmented input helps in generating more accurate and informative responses.

3) Generation: Finally, a text generation model, typically a large language model (LLM), uses the augmented input to generate a coherent and contextually relevant response. The generation step produces a natural language output that answers the query or fulfils the prompt.
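The three steps above can be sketched in a few lines of Python. This is a minimal, illustrative toy, not Nuclia's implementation: the keyword-overlap retriever and the `generate()` stub (which would be an LLM API call in practice) are stand-ins for the real components.

```python
def retrieve(query, documents, top_k=2):
    """Step 1 (toy): rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def augment(query, passages):
    """Step 2: combine the original query with the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"

def generate(prompt):
    """Step 3 (stub): in a real pipeline this would call an LLM."""
    return f"[LLM answer based on]\n{prompt}"

docs = [
    "RAG combines retrieval with text generation.",
    "Vector indexes support similarity search.",
    "Bananas are rich in potassium.",
]
query = "How does RAG combine retrieval and text generation?"
answer = generate(augment(query, retrieve(query, docs)))
```

A production system swaps the retriever for index-backed search and the stub for a model call, but the retrieve → augment → generate shape stays the same.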

What is modular RAG?

Modular Retrieval Augmented Generation (Modular RAG) is an advanced approach within the Retrieval Augmented Generation framework that focuses on a modular architecture for better flexibility, scalability, and customization.

Why choose Nuclia modular RAG as a Service?

Nuclia’s modular RAG-as-a-Service framework provides flexibility and control, enabling you to customize each process of the RAG pipeline to suit your specific use case. Unlike traditional methods, where integration and deployment are fraught with complexities, Nuclia offers a streamlined approach that simplifies the entire setup.

How Does It Work?

The Nuclia modular RAG framework is divided into four main components, each addressing a specific part of the end-to-end RAG process:

1. Data Processing Component
2. Data Storage Component
3. Retrieval and Ranking Component
4. LLM Component

1. Data Processing Component

This component is responsible for extracting data from various file types, including videos and tables. It performs multiple automated tasks:
– Optical Character Recognition (OCR) for extracting text from images.
– Automatic speech-to-text conversion.
– Table detection and extraction.
– Content chunking.
– Named entity recognition for knowledge graphs.

Customizable Options:
– Chunk Size: Define the size of content chunks.
– Anonymization: Anonymize data based on named entities.
– Embeddings: Choose the embedding model to use.
– Resource Summarization: Summarize resources as needed.

2. Data Storage Component

Processed data is stored in NucliaDB, utilizing a robust data framework with four different indexes:
– Full Text Index: For comprehensive text searches.
– Chunk Index: For efficient retrieval of text chunks.
– Vector Index: For similarity searches using vector embeddings.
– Knowledge Graph: For structured data and entity relationships.
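Of the four indexes, the vector index is the least familiar to newcomers, so here is a minimal sketch of what it does: rank stored embeddings by cosine similarity to a query embedding. This illustrates the idea only; NucliaDB uses optimized data structures rather than a linear scan like this.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def vector_search(query_vec, index, top_k=1):
    """index: list of (doc_id, embedding) pairs; return the top_k closest ids."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

index = [("doc_a", [1.0, 0.0]), ("doc_b", [0.0, 1.0])]
```

At scale, exact scans are replaced by approximate nearest-neighbor search, which is why the choice of embedding model in the processing component matters: query and document embeddings must come from the same model for the distances to be meaningful.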

3. Retrieval and Ranking Component

This component allows you to decide on various retrieval strategies and ranking mechanisms, ensuring the retrieved data meets your requirements.

Customizable Options:
– Context Size: Define the size of the context to be retrieved.
– Resource Context: Determine the number of resources used as context.
– Textual Hierarchy: Establish a hierarchy for the generated data.
– Query Types: Choose between semantic query, exact match query, or a combination of both.
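The "combination of both" query type is often implemented as a weighted blend of the two score lists, along these lines. The weighting scheme below is one common approach (a hypothetical sketch, not necessarily how Nuclia fuses results):

```python
def hybrid_score(semantic_scores, keyword_scores, alpha=0.5):
    """Blend per-document semantic and exact-match scores.
    alpha weights the semantic side; (1 - alpha) the keyword side."""
    ids = set(semantic_scores) | set(keyword_scores)
    return {
        doc_id: alpha * semantic_scores.get(doc_id, 0.0)
        + (1 - alpha) * keyword_scores.get(doc_id, 0.0)
        for doc_id in ids
    }
```

A document that scores moderately on both signals can outrank one that scores highly on only one, which is the usual motivation for hybrid retrieval over either query type alone.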

4. LLM Component

Customize the Language Model (LLM) to suit your application’s needs.

Customizable Options:
– LLM Choice: Select the LLM to use (ChatGPT, Anthropic, Mistral, Gemini) and its version.
– Prompt Definition: Define the prompts and the behavior of the LLM.
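A typical prompt definition for a RAG system constrains the LLM to answer only from the retrieved context. The structure below is a generic chat-message sketch (the message format and prompt wording are illustrative, not Nuclia's configuration syntax):

```python
SYSTEM_PROMPT = (
    "You are a support assistant. Answer ONLY from the provided context. "
    "If the context is insufficient, say you don't know."
)

def build_prompt(context_chunks, question):
    """Assemble chat messages: a behavior-setting system prompt plus
    a user message carrying the retrieved context and the question."""
    context = "\n---\n".join(context_chunks)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
```

Tightening the system prompt like this is the main lever for reducing hallucinations, since it tells the model what to do when retrieval comes back empty or off-topic.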

Nuclia’s modular RAG framework offers a flexible and powerful solution to the complexities of building and deploying RAG systems. By allowing you to fine-tune every aspect of the RAG pipeline, Nuclia ensures that you can create a system tailored to your specific needs and bring it to production with ease.

Set up your own RAG pipeline today with Nuclia, and experience the power of a fully modular and scalable data retrieval solution.
