Production Ready Advanced RAG Optimization with Llama-Index and Qdrant Vector Database

Happy LLM

(λx.x)eranga
Effectz.AI
15 min read · Jul 18, 2024


1. Background

In my previous post series, I discussed building RAG applications using tools such as LlamaIndex, LangChain, GPT4All, and Ollama to leverage LLMs for specific use cases. In this post, I explore how to optimize the accuracy and performance of a RAG system using three techniques: sentence window retrieval, hybrid search, and re-ranking. All source code related to this post has been published on GitLab. Please clone the repo to follow along with the post.

2. RAG Types

2.1. Naive RAG

Retrieval-Augmented Generation (RAG) enhances the performance of Large Language Models (LLMs) by incorporating external, authoritative knowledge bases into the response generation process. RAG applications can be grouped into three categories: Naive RAG, Advanced RAG, and Modular RAG. Read more about these RAG types here. The earliest and most basic RAG methodology is referred to as Naive RAG. Naive RAG faces challenges across all phases. In the retrieval phase, the system may fail to retrieve all relevant chunks, or may retrieve irrelevant ones. In the augmentation phase, the challenge is integrating context from retrieved chunks that may be disjointed or contain repetitive information. In the generation phase, the LLM may generate answers that are not grounded in the provided context (the retrieved chunks), or may generate answers based on irrelevant retrieved context.

2.2 Advanced RAG

Advanced RAG has evolved as a new paradigm with targeted enhancements that address some limitations of the Naive RAG paradigm. Advanced RAG optimization techniques can be categorized into pre-retrieval, retrieval, and post-retrieval optimizations. Pre-retrieval optimizations cover both data indexing and query optimizations. Data indexing optimization techniques aim to store the data in a way that improves retrieval efficiency; sliding window, enhancing data granularity, adding metadata, and optimizing index structures are some pre-retrieval optimization techniques. The retrieval stage aims to identify the most relevant context. Retrieval is usually based on vector search, which calculates the semantic similarity between the query and the indexed data, so the majority of retrieval optimization techniques revolve around the embedding models; fine-tuning embedding models and dynamic embedding are two examples. Additional processing of the retrieved context can help address issues such as exceeding the context window limit or introducing noise that hinders the focus on crucial information; prompt compression and re-ranking are some post-retrieval optimization techniques. If you need more information about these advanced RAG optimization techniques, please refer to the post here. In this post, I discuss how to use these pre-retrieval, retrieval, and post-retrieval optimization techniques in a real-world application scenario: I use sentence window retrieval (pre-retrieval), hybrid search (retrieval), and re-ranking (post-retrieval) to build an advanced RAG application. The following figure illustrates the difference between Naive and Advanced RAG with these respective techniques.

2.3. Sentence Window Retrieval

The Sentence Window Retrieval technique fetches a single sentence during retrieval and returns a window of text around that sentence. The key idea behind Sentence Window Retrieval is to separate the embedding and synthesis processes, allowing for more granular and targeted information retrieval. Instead of embedding and retrieving entire text chunks, this method focuses on individual sentences or smaller units of text. By embedding these smaller units and storing them in a vector database, we can perform more precise similarity searches to find the most relevant sentences for a given query. In addition to retrieving the relevant sentence, Sentence Window Retrieval also includes the surrounding context, that is, the sentences that come before and after the target sentence. This expanded context window is then fed into the language model for synthesis, ensuring that the generated answer has the necessary context for coherence and completeness. Read more about Sentence Window Retrieval here.
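
To make the mechanics concrete, below is a minimal sketch of what LlamaIndex's SentenceWindowNodeParser produces; the sample text and printed fields are illustrative:

from llama_index.core import Document
from llama_index.core.node_parser import SentenceWindowNodeParser

# split text into per-sentence nodes, storing the surrounding window as metadata
parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

doc = Document(text=(
    "First sentence. Second sentence. Third sentence. "
    "Fourth sentence. Fifth sentence."
))
nodes = parser.get_nodes_from_documents([doc])

# each node embeds a single sentence, while its metadata carries the window
# that will replace the sentence before synthesis
print(nodes[2].text)                  # the embedded sentence
print(nodes[2].metadata["window"])    # the sentence plus its neighbours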

2.4. Hybrid Search

Hybrid search is a search technique that combines two or more search algorithms to improve the relevance of search results. Although it is not defined which algorithms are combined, hybrid search most commonly refers to the combination of traditional keyword-based search and modern vector search. Traditionally, keyword-based search was the obvious choice for search engines. But with the advent of Machine Learning algorithms, vector embeddings enabled a new search technique — called vector or semantic search — that allowed us to search across data semantically. However, both search techniques have essential tradeoffs to consider.

  1. Keyword-based search: While its exact keyword-matching capabilities are beneficial for specific terms, such as product names or industry jargon, it is sensitive to typos and synonyms, which can lead it to miss important context.
  2. Vector or semantic search: While its semantic search capabilities allow multi-lingual and multi-modal search based on the data’s semantic meaning and make it robust to typos, it can miss essential keywords. Additionally, it depends on the quality of the generated vector embeddings and is sensitive to out-of-domain terms.

Combining keyword-based and vector search into a hybrid search allows you to leverage the advantages of both techniques to improve the relevance of search results, especially for text-search use cases. Read more about hybrid search here. There are various vector databases that support hybrid search. In this post I have used the Qdrant vector database, which supports hybrid search by combining search results from sparse and dense vectors.
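
Below is a minimal sketch of hybrid search with Qdrant in LlamaIndex, assuming a local Qdrant instance, an OPENAI_API_KEY for the dense embeddings, and the fastembed package for the sparse vectors; the collection name, sample document, and top-k values are illustrative:

from qdrant_client import QdrantClient
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = QdrantClient("http://localhost:6333")

# enable_hybrid stores a sparse (keyword-style) vector alongside the dense one
vector_store = QdrantVectorStore(
    collection_name="hybrid_demo",
    client=client,
    enable_hybrid=True,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [Document(text="Qdrant fuses results from sparse and dense vectors.")],
    storage_context=storage_context,
)

# hybrid retrieval has to be requested explicitly at query time
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",
    similarity_top_k=2,   # dense candidates to keep
    sparse_top_k=12,      # sparse candidates to consider before fusion
)
print(query_engine.query("how does hybrid search work?"))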

2.5. Re-ranking

In RAG, a large number of contexts may be retrieved, but not all of them are necessarily relevant to the question. Reranking allows for the reordering and filtering of documents, placing the relevant ones at the forefront, thereby enhancing the effectiveness of RAG.

In RAG, we perform a semantic search across many text documents and then use a language model to generate a response. To ensure fast search times at scale, we typically use vector search. However, there is some information loss, because we compress the text’s information into a single vector. While semantic search retrieves context based on its semantic similarity to the search query, most similar does not necessarily mean most relevant. Because of this information loss, the top few vector search results often miss relevant information; the retrieval may return relevant documents below our top_k cutoff. Re-ranking is one of the simplest methods for dramatically improving recall performance in RAG or any other retrieval-based pipeline: it reorders the retrieved items so that the most relevant ones end up at the top (using a better retrieval model is, of course, another option).

The re-ranking is done by re-ranking models (re-rankers). In summary, re-rankers are cross-encoder models that take a document-query pair as input and emit a combined relevance score for that pair. Using re-rankers, users can sort documents from most to least relevant for a given query. There are various re-ranking models available. In this post I have used the BAAI/bge-reranker-base model with SentenceTransformerRerank, which uses the cross-encoders from the sentence-transformers package to re-order nodes. Read more about these re-ranking types in this post.
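
Below is a minimal sketch of the cross-encoder scoring that underlies such re-rankers, using the sentence-transformers package directly (the query and documents are illustrative; the model is downloaded on first use):

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-base")

query = "what is red team"
documents = [
    "A red team simulates real-world attacks to probe an organization's defenses.",
    "Qdrant is a vector database for similarity search.",
]

# score each (query, document) pair; a higher score means more relevant
scores = reranker.predict([(query, doc) for doc in documents])

# sort documents from most to least relevant
for score, doc in sorted(zip(scores, documents), reverse=True):
    print(f"{score:.4f}  {doc}")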

3. Implementation

The complete implementation of the RAG application is detailed below. The full source code of the application is available for access and review on GitLab.

3.1. Configurations

In the config.py file, I have defined various configurations used in the RAG application. These configurations are read through environment variables in adherence to the principles of 12-factor apps.

import os

# define init index
INIT_INDEX = os.getenv('INIT_INDEX', 'false').lower() == 'true'

# http api port (environment variables are strings, so cast to int)
HTTP_PORT = int(os.getenv('HTTP_PORT', '7654'))

# qdrant vector store config
QDRANT_HOST = os.getenv('QDRANT_HOST', 'localhost')
QDRANT_PORT = int(os.getenv('QDRANT_PORT', '6333'))
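
Since the values are read from the environment, they can be overridden at deployment time without any code change, for example (the values shown are illustrative and match the defaults):

# override configurations via environment variables
❯❯ export HTTP_PORT=7654
❯❯ export QDRANT_HOST=localhost
❯❯ export QDRANT_PORT=6333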

3.2 HTTP API

The RAG application exposes an HTTP API, implemented in api.py. This API includes an HTTP POST endpoint /api/question, which accepts a JSON object containing a question and a user_id. The user_id is included for demonstration purposes; in a real-world application, it could be managed with an Authorization header (e.g., a JWT Bearer token) in the HTTP request. When a question request is received from the user, it is forwarded to the chat function in model.py.

from flask import Flask
from flask import jsonify
from flask import request
from flask_cors import CORS
import logging
import sys
from model import *
from config import *

app = Flask(__name__)
CORS(app)

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

@app.route('/api/question', methods=['POST'])
def post_question():
    json = request.get_json(silent=True)
    question = json['question']
    user_id = json['user_id']
    logging.info("post question `%s` for user `%s`", question, user_id)

    resp = chat(question, user_id)
    data = {'answer': resp}

    return jsonify(data), 200

if __name__ == '__main__':
    init_llm()
    init_index()
    init_query_engine()

    app.run(host='0.0.0.0', port=HTTP_PORT, debug=True)

3.3 Model

Below is the implementation of the model, in model.py. It includes a function, init_index, which reads documents from a given directory path and creates the vector index in the Qdrant vector store. It performs the pre-retrieval optimization with SentenceWindowNodeParser, which splits the documents into single sentences that will be embedded and creates a context window for each sentence. Here window_size is 3 (window_size=3), which means the window stored as metadata captures the three sentences before and the three sentences after the embedded sentence. During retrieval, the sentence that most closely matches the query is returned. After retrieval, the sentence needs to be replaced with the entire window from the metadata, which is done by defining a MetadataReplacementPostProcessor and using it in the list of node_postprocessors. Hybrid search is enabled in the QdrantVectorStore. The init_query_engine function initializes the LlamaIndex query engine with the previously created vector store index; it handles the post-retrieval re-ranking with SentenceTransformerRerank and the BAAI/bge-reranker-base model. The chat function is responsible for posting questions to the LLM.

import logging
import sys

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import (Settings, VectorStoreIndex, SimpleDirectoryReader, PromptTemplate, Document)
from llama_index.core import StorageContext, ServiceContext
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.core.indices.postprocessor import SentenceTransformerRerank
from qdrant_client import QdrantClient
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.llms.openai import OpenAI

from config import *

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')


global query_engine
query_engine = None

global index
index = None

def init_llm():
    llm = OpenAI(model="gpt-4")
    embed_model = OpenAIEmbedding(model_name="text-embedding-3-large")

    Settings.llm = llm
    Settings.embed_model = embed_model


def init_index():
    global index

    # read documents in docs directory
    # the directory contains a data set related to red team and blue team cyber security strategy
    reader = SimpleDirectoryReader(input_dir="./docs", recursive=True)
    documents = reader.load_data()

    logging.info("index creating with `%d` documents", len(documents))

    # combine all documents into one large document for better text balancing
    document = Document(text="\n\n".join([doc.text for doc in documents]))

    # sentence window node parser
    # window_size=3 captures three sentences on each side of the embedded sentence
    node_parser = SentenceWindowNodeParser.from_defaults(
        window_size=3,
        window_metadata_key="window",
        original_text_metadata_key="original_text",
    )

    # create qdrant client
    qdrant_client = QdrantClient(f"http://{QDRANT_HOST}:{QDRANT_PORT}")

    # delete the collection if it exists
    # in a production application, the collection needs to be handled without deleting
    qdrant_client.delete_collection("rahasak")

    # qdrant vector store with hybrid search enabled
    vector_store = QdrantVectorStore(
        collection_name="rahasak",
        client=qdrant_client,
        enable_hybrid=True,
        batch_size=20
    )

    # storage context and service context
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    service_context = ServiceContext.from_defaults(
        llm=Settings.llm,
        embed_model=Settings.embed_model,
        node_parser=node_parser,
    )

    # initialize vector store index with qdrant
    index = VectorStoreIndex.from_documents(
        [document],
        service_context=service_context,
        storage_context=storage_context,
        embed_model=Settings.embed_model
    )


def init_query_engine():
    global query_engine
    global index

    # after retrieval, replace each sentence with the entire window from its metadata
    # by defining a MetadataReplacementPostProcessor and using it in the list of node_postprocessors
    postproc = MetadataReplacementPostProcessor(target_metadata_key="window")

    # re-ranker with the BAAI/bge-reranker-base model
    rerank = SentenceTransformerRerank(
        top_n=2,
        model="BAAI/bge-reranker-base"
    )

    # similarity_top_k configures the retriever to return the top 3 most similar documents
    # (the default value of similarity_top_k is 2)
    # vector_store_query_mode="hybrid" explicitly requests dense + sparse retrieval from qdrant;
    # without it, the hybrid-enabled store falls back to dense-only search
    # use the metadata post processor and re-ranker as node post processors
    query_engine = index.as_query_engine(
        similarity_top_k=3,
        vector_store_query_mode="hybrid",
        node_postprocessors=[postproc, rerank],
    )


def chat(input_question, user):
    global query_engine

    response = query_engine.query(input_question)
    logging.info("response from llm - %s", response)

    # view sentence window retrieval window and original text
    logging.info("sentence window retrieval window - %s", response.source_nodes[0].node.metadata["window"])
    logging.info("sentence window retrieval original_text - %s", response.source_nodes[0].node.metadata["original_text"])

    return response.response

4. Run Application

Below are the main steps to operate the RAG application and interact with it. Questions can be submitted using the HTTP API, and responses will be received accordingly.

4.1. Install Dependencies

In this application, I have utilized a number of Python packages that need to be installed using Python’s pip package manager before running the application. The requirements.txt file lists all the necessary packages.

huggingface-hub
sentence-transformers

Flask==2.0.1
Werkzeug==2.2.2
flask-cors

tiktoken
unstructured
llama-index
llama-index-llms-openai
llama-index-embeddings-openai
llama-index-embeddings-huggingface
llama-index-vector-stores-qdrant

torch
qdrant-client
fastembed

I have used a Python virtual environment to set up these dependencies. These packages can be easily installed by executing the command pip install -r requirements.txt.

# create virtual environment in the project source directory
❯❯ cd iollama
❯❯ python -m venv .venv

# enable virtual environment
❯❯ source .venv/bin/activate

# install dependencies
❯❯ pip install -r requirements.txt

# if you get dependency conflict issues when installing the packages through requirements.txt, manually install them via pip
❯❯ pip install huggingface-hub
❯❯ pip install sentence-transformers
❯❯ pip install Flask==2.0.1
❯❯ pip install Werkzeug==2.2.2
❯❯ pip install flask-cors
❯❯ pip install tiktoken
❯❯ pip install unstructured
❯❯ pip install llama-index
❯❯ pip install llama-index-llms-openai
❯❯ pip install llama-index-embeddings-openai
❯❯ pip install llama-index-embeddings-huggingface
❯❯ pip install llama-index-vector-stores-qdrant
❯❯ pip install torch
❯❯ pip install qdrant-client
❯❯ pip install fastembed

4.2. Run Qdrant Vector Store

I have used the Qdrant vector database as the vector storage in the RAG application. I run Qdrant with Docker as follows.

# run qdrant with docker
❯❯ docker run -d -p 6333:6333 qdrant/qdrant

❯❯ docker ps
CONTAINER ID   IMAGE                            COMMAND           CREATED        STATUS        PORTS                    NAMES
8a80f7907273   docker.io/qdrant/qdrant:latest   ./entrypoint.sh   26 hours ago   Up 26 hours   0.0.0.0:6333->6333/tcp   priceless_ishizaka
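
Before starting the application, you can verify that Qdrant is reachable through its REST API; GET /collections returns a JSON list of the existing collections.

# verify qdrant is reachable
❯❯ curl http://localhost:6333/collections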

4.3. Run RAG Application

The RAG application can be initiated through api.py as outlined below. Prior to running it, it's necessary to set a few configurations via environment variables. Once api.py is executed, it will start the HTTP API, enabling users to post their questions.

# set openai API key
❯❯ export OPENAI_API_KEY=<add openai api key here>

# run the rag application
❯❯ python api.py
2024-07-18 13:55:23,265 - INFO - index creating with `46` documents
2024-07-18 13:55:23,274 - INFO - HTTP Request: DELETE http://localhost:6333/collections/rahasak "HTTP/1.1 200 OK"
2024-07-18 13:55:23,275 - INFO - HTTP Request: GET http://localhost:6333/collections/rahasak/exists "HTTP/1.1 200 OK"
2024-07-18 13:55:23,277 - INFO - HTTP Request: GET http://localhost:6333/collections/rahasak/exists "HTTP/1.1 200 OK"
Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 67216.41it/s]
Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 51025.60it/s]
2024-07-18 13:55:24,253 - INFO - HTTP Request: GET http://localhost:6333/collections/rahasak/exists "HTTP/1.1 200 OK"
Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 54330.36it/s]
Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 51025.60it/s]
2024-07-18 13:55:25,949 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-07-18 13:55:27,382 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-07-18 13:55:28,700 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-07-18 13:55:30,145 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-07-18 13:55:30,171 - INFO - Retrying request to /embeddings in 0.913349 seconds
2024-07-18 13:55:38,234 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-07-18 13:55:39,565 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-07-18 13:55:41,006 - INFO - HTTP Request: PUT http://localhost:6333/collections/rahasak "HTTP/1.1 200 OK"
2024-07-18 13:55:41,017 - INFO - HTTP Request: PUT http://localhost:6333/collections/rahasak/index?wait=true "HTTP/1.1 200 OK"
2024-07-18 13:55:41,019 - INFO - HTTP Request: GET http://localhost:6333/collections/rahasak/exists "HTTP/1.1 200 OK"
2024-07-18 13:55:41,020 - INFO - HTTP Request: GET http://localhost:6333/collections/rahasak "HTTP/1.1 200 OK"
2024-07-18 13:56:09,043 - INFO - HTTP Request: PUT http://localhost:6333/collections/rahasak/points?wait=true "HTTP/1.1 200 OK"
2024-07-18 13:56:09,078 - INFO - HTTP Request: PUT http://localhost:6333/collections/rahasak/points?wait=true "HTTP/1.1 200 OK"
2024-07-18 13:56:09,101 - INFO - HTTP Request: PUT http://localhost:6333/collections/rahasak/points?wait=true "HTTP/1.1 200 OK"
# ... repeated points upsert logs trimmed for brevity ...
2024-07-18 13:56:09,559 - INFO - HTTP Request: PUT http://localhost:6333/collections/rahasak/points?wait=true "HTTP/1.1 200 OK"
* Serving Flask app 'api' (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: on
2024-07-18 13:56:10,576 - INFO - WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:7654
* Running on http://10.13.40.61:7654
2024-07-18 13:56:10,576 - INFO - Press CTRL+C to quit
2024-07-18 13:56:10,577 - INFO - * Restarting with stat

4.4 Post Question

Once the RAG application is running, I can submit questions related to the red team use case via the HTTP API. Here we can examine the original sentence retrieved for each node, along with the surrounding window of sentences that was forwarded to the LLM.

# post question
❯❯ curl -XPOST "http://localhost:7654/api/question" \
--header "Content-Type: application/json" \
--data '
{
"question": "what is red team",
"user_id": "eranga"
}
'


# debug output of rag, original sentence retrieved for each node,
2024-07-18 13:57:21,582 - INFO - sentence window retrieval original_text - A red team consists of cybersecurity experts who play
the role of adversaries, probing for weaknesses and vulnerabilities within an
organization’s security systems and procedures.

# debug output of rag, surrounding window of sentences that were forwarded to the LLM
2024-07-18 13:57:21,582 - INFO - sentence window retrieval window - Understanding Red Teaming
Red teaming is a structured and strategic approach to evaluating an
organization's security measures. Unlike traditional security assessments,
red teaming goes beyond simple vulnerability scanning or penetration
testing. It simulates real-world cyber threats and assesses an organization's
overall preparedness. A red team consists of cybersecurity experts who play
the role of adversaries, probing for weaknesses and vulnerabilities within an
organization's security systems and procedures.
Key Red Teaming Services
Red team services offer a multifaceted approach to cybersecurity, making
them a vital component of regulatory compliance and risk management.
These services encompass various practices, including:
Penetration Testing:
Penetration testing, often referred to as pen-testing, is a foundational
element of red teaming services. It involves simulating cyberattacks to
3/23/24, 1:17 PM The Role of Red Teaming in Regulatory Compliance and Risk Management | by John Nathan | Medium
https://medium.com/@johnnathans/the-role-of-red-teaming-in-regulatory-compliance-and-risk-management-3b1e4e3f320d 3/15identify vulnerabilities in an organization's networks, systems, and
applications.


# response from llm
{
"answer": "A red team is a group of cybersecurity experts who simulate real-world cyber threats to assess an organization's overall preparedness. They play the role of adversaries, probing for weaknesses and vulnerabilities within an organization\u2019s security systems and procedures. Their approach is target-driven, seeking to gain access to predetermined objectives by exploiting relevant weaknesses anywhere within an organization. This process goes beyond simple vulnerability scanning or penetration testing. The red team often operates as an independent cybersecurity provider."
}



---



# post question
❯❯ curl -XPOST "http://localhost:7654/api/question" \
--header "Content-Type: application/json" \
--data '
{
"question": "what is blue team",
"user_id": "eranga"
}
'


# debug output of rag, original sentence retrieved for each node,
2024-07-18 14:04:50,840 - INFO - sentence window retrieval original_text - The blue team focuses on analyzing such attacks and developing
methodologies for their mitigation and prevention.
2024-07-18 14:04:50,840 - INFO - 127.0.0.1 - - [18/Jul/2024 14:04:50] "POST /api/question HTTP/1.1" 200 -

# debug output of rag, surrounding window of sentences that were forwarded to the LLM
2024-07-18 14:04:50,839 - INFO - sentence window retrieval window - View GDPR Compliance Commitment
✓ I agree
3/23/24, 1:18 PM The Role of Red Team and Blue Team in Cybersecurity
https://maddevs.io/blog/red-team-vs-blue-team-in-cybersecurity/ 2/24are the red and blue teams. In this article, we will explore the fundamental principles of
their operation, methods of interaction, and their value for the cybersecurity of
organizations.
The red team specializes in simulating real threats and attacks to identify vulnerabilities
in defense systems. The blue team focuses on analyzing such attacks and developing
methodologies for their mitigation and prevention. The purple team aims to facilitate
effective interaction between offensive and defensive elements, analyze the results,
and suggest measures for optimizing mutual strategies and tactics.
The collaborative work of the red and blue teams transforms cybersecurity approaches
from static measures into a dynamic, continuously updated system. The purple team
coordinates these efforts, and ensures effective communication and knowledge transfer
between the red and blue teams, thereby enhancing the overall effectiveness of a
cybersecurity strategy.


# response from llm
{
"answer": "The blue team in cybersecurity focuses on analyzing attacks and developing methodologies for their mitigation and prevention. They are responsible for installing, configuring, and monitoring antivirus software, intrusion detection systems, and other protective mechanisms on devices that connect to the corporate network."
}

References

  1. https://medium.com/@drjulija/what-are-naive-rag-advanced-rag-modular-rag-paradigms-edff410c202e
  2. https://www.thecloudgirl.dev/blog/three-paradigms-of-retrieval-augmented-generation-rag-for-llms
  3. https://towardsdatascience.com/advanced-retrieval-augmented-generation-from-theory-to-llamaindex-implementation-4de1464a9930#3275
  4. https://towardsdatascience.com/advanced-rag-01-small-to-big-retrieval-172181b396d4
  5. https://www.linkedin.com/pulse/sentence-window-retrieval-optimizing-llm-performance-rutam-bhagat-v24of
  6. https://towardsdatascience.com/improving-retrieval-performance-in-rag-pipelines-with-hybrid-search-c75203c2f2f5
  7. https://medium.com/@samvardhan777/enhancing-rag-with-hybrid-search-using-qdrant-and-llamaindex-e190f93e4864
  8. https://medium.com/@abul.aala.fareh/different-reranking-techniques-in-llamaindex-6a56ed1f30a3
  9. https://medium.aiplanet.com/advanced-rag-cohere-re-ranker-99acc941601c
  10. https://medium.com/@bavalpreetsinghh/llamaindex-enhancing-context-with-metadata-replacement-and-sentence-window-node-parser-94e5ed8cdd6a
  11. https://towardsdatascience.com/advanced-retrieval-augmented-generation-from-theory-to-llamaindex-implementation-4de1464a9930#c1e2
  12. https://medium.aiplanet.com/setting-up-query-pipeline-for-advanced-rag-workflow-using-llamaindex-666ddd7d0d41
  13. https://medium.aiplanet.com/advanced-rag-using-llama-index-e06b00dc0ed8
