Harnessing the Power of Corrective RAG (CRAG): Building High-Precision Recommendation Systems with Qdrant & Llama 3

Anvesha Mishra
21 min read · Aug 25, 2024


What are recommendation systems?

Scrolling through your favourite social media app and seeing content tailored to your preferences, or picking a movie that Netflix surfaced on your home screen: both are the work of recommendation systems. In the simplest of terms, a recommendation system is a technology that helps users find items or content that might interest them. Not only do these systems play a huge role in enhancing user experience, they also provide tons of benefits to business owners. From “you might also like” nudges to sharing “what other customers are buying”, these systems are often credited with lifting sales by 10–50%!

Types of Recommendation Systems:

  • Collaborative Filtering: This type suggests items based on what similar users liked. Imagine it as a buddy system — if people with tastes like yours loved a certain book or movie, it’s likely you’ll enjoy it too!
  • Content-Based Filtering: This approach recommends items similar to the ones you’ve shown an interest in before.

If you only know which interactions happened in the past, collaborative filtering is your go-to method. However, if you have additional data about users and items, you can enhance your predictions by using content and context filtering to gauge the likelihood of new interactions. A hybrid approach is often used, depending on the type of data you have access to.
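To make collaborative filtering concrete, here is a minimal, self-contained sketch (toy data, not from this tutorial's dataset) that scores users by cosine similarity over a user-item interaction matrix and recommends items a similar user bought:

import numpy as np

# Toy user-item interaction matrix: rows = users, columns = items (1 = purchased)
interactions = np.array([
    [1, 0, 1, 0],   # user A
    [1, 0, 1, 1],   # user B (very similar to A)
    [0, 1, 0, 1],   # user C
])

def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

target = 0  # recommend for user A
others = [u for u in range(len(interactions)) if u != target]
scores = [cosine_sim(interactions[target], interactions[u]) for u in others]
most_similar = others[int(np.argmax(scores))]

# Recommend items the similar user bought that the target user has not
recommendations = np.where((interactions[most_similar] == 1) & (interactions[target] == 0))[0]
print(f"Recommend item indices: {recommendations.tolist()}")  # -> [3]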

What Is Corrective RAG?

RAG boosts language models by using retrieved documents, but it can falter if those documents aren’t relevant. To tackle this, Corrective RAG (CRAG) was developed, which makes these systems more reliable. CRAG has a simple evaluator that checks the quality of the documents and decides the next steps based on confidence levels. To ensure we’re getting the best responses, CRAG also pulls data from the web, not just static databases.

Diagrammatic outline for CRAG

We will understand CRAG in depth in the following sections but, before that, let us quickly define the scope and prerequisites for this tutorial.

Purpose: This guide aims to demonstrate how to build high-precision recommendation systems by leveraging the Corrective RAG (CRAG) framework, using Qdrant for vector similarity search over the stored data points and Llama 3 as the LLM for generating responses.

Prerequisites: To follow this guide, you should have a basic understanding of machine learning concepts, particularly in natural language processing (NLP) and recommendation systems. Familiarity with vector databases, Python programming, and tools like Qdrant, Llama 3, and Tavily is also recommended.

Understanding Corrective RAG

Corrective Retrieval-Augmented Generation (CRAG) is a framework designed to enhance the robustness and accuracy of language model-generated responses. Unlike standard RAG systems, CRAG introduces a corrective layer to ensure the relevance and reliability of the retrieved information.

Components Involved in Corrective RAG:

1. Retrieval

The retrieval component involves fetching relevant documents from both static databases and large-scale web searches. This stage aims to gather a broad range of potential information sources that may answer the input query. It leverages existing retrieval mechanisms and also rewrites queries to perform web searches, thereby extending the scope and diversity of the retrieved content.

2. Knowledge Correction

This component refines and verifies the relevance and accuracy of the retrieved documents. It comprises three sub-components:

2.a. Retrieval Evaluator

The retrieval evaluator assesses the relevance of each document or knowledge strip against the input query. Using a fine-tuned model, it assigns confidence scores to determine if the content is Correct, Incorrect, or Ambiguous. This scoring helps decide whether the documents should be used directly for response generation, discarded, or supplemented with additional data.
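As a rough sketch (the thresholds below are illustrative placeholders, not values from the CRAG paper, which fine-tunes a lightweight evaluator model), the evaluator's decision logic can be expressed as a simple mapping from a relevance score to one of the three actions:

def evaluate_retrieval(relevance_score: float,
                       upper: float = 0.7,
                       lower: float = 0.3) -> str:
    """Map a retrieval relevance score to a CRAG action.

    'correct'   -> use the retrieved documents (after refinement)
    'incorrect' -> discard them and fall back to web search
    'ambiguous' -> combine refined documents with web results
    Thresholds here are illustrative, not taken from the paper.
    """
    if relevance_score >= upper:
        return "correct"
    if relevance_score <= lower:
        return "incorrect"
    return "ambiguous"

print(evaluate_retrieval(0.82))  # correct
print(evaluate_retrieval(0.15))  # incorrect
print(evaluate_retrieval(0.5))   # ambiguous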

2.b. Knowledge Refinement

In the knowledge refinement step, each relevant document is broken down into fine-grained knowledge strips. These strips are individually evaluated for relevance, with irrelevant strips filtered out. The remaining relevant strips are then recomposed into coherent internal knowledge, which provides precise and concise information to the generator.

2.c. Knowledge Searching (Web Search)

When static sources fail to provide relevant information, the knowledge searching component comes into play. It uses web searches to find external knowledge. The process involves rewriting the query into keyword-rich prompts, retrieving relevant web pages, and extracting their content. This content is then refined similarly to the internal knowledge, ensuring only the most relevant information is retained.

3. Generator

The generator is the final stage, where an arbitrary generative model utilizes the refined knowledge (both internal and external) to produce the final response. It synthesizes the processed information into coherent, contextually appropriate outputs, ensuring that the responses are based on the most relevant and accurate data available.

Together, these components ensure that the CRAG system not only retrieves relevant information but also critically evaluates and refines it to enhance the overall quality and reliability of the generated responses.

Feel free to browse through the research paper on Corrective RAG (CRAG) to study this in depth. The paper also demonstrates how CRAG can be seamlessly integrated with existing models, offering a lightweight yet powerful solution for improving language generation systems.

Advantages of Corrective RAG Over Traditional Methods

  1. Higher Accuracy: The use of a retrieval evaluator and corrective actions leads to a more accurate selection and utilization of information. For instance, in evaluations, CRAG showed substantial improvements over standard RAG and other advanced methods like Self-RAG across various datasets, with accuracy improvements of up to 36.6% on certain tasks.
  2. Consistent Performance Across Tasks: CRAG’s ability to refine and correct information ensures that it performs well across different types of tasks, whether short-form or long-form content generation. This consistency is crucial for applications requiring precise and contextually accurate responses.
  3. Enhanced Adaptability: CRAG is designed to work with various underlying LLMs without needing specific instruction tuning for critic tokens, unlike some advanced methods. This adaptability allows it to maintain high precision even when switching between different models, providing a plug-and-play solution that can leverage the best available language models.

Let’s now go over the tools needed for this guide.

1. Qdrant: Vector Search Engine

Qdrant is a highly efficient vector search engine designed for high-dimensional vector searches, commonly used in applications such as recommendation systems, semantic search, and machine learning model retrievals. Key features of Qdrant include:

  • High-Speed Vector Searches: Qdrant excels in performing rapid and accurate searches across large datasets of high-dimensional vectors.
  • Scalability: It can scale horizontally, handling growing datasets and increased query loads efficiently.
  • Integration and API: Qdrant offers user-friendly APIs for seamless integration with various data pipelines and applications.
  • Real-Time Updates: The engine supports real-time updates, ensuring that the search index remains current with the latest data.

In the context of Corrective RAG, Qdrant plays a crucial role by:

  • Efficient Retrieval: Qdrant quickly retrieves relevant documents or knowledge strips based on vector similarity, ensuring that the most pertinent information is available for further evaluation.
  • Enhanced Precision: By facilitating high-speed searches, Qdrant ensures that the retrieval process is not a bottleneck, allowing the system to handle large volumes of data and complex queries with ease.
  • Support for Corrective Actions: The efficiency of Qdrant allows CRAG to rapidly perform corrective actions, such as discarding irrelevant documents and conducting supplementary web searches, without significant delays. Because search results come back with similarity scores, we can use them directly in our retrieval evaluation pipeline.

2. Llama 3: Advanced LLM

Llama 3 is a state-of-the-art large language model (LLM) known for its advanced natural language processing capabilities. It is strong at understanding and maintaining context over long passages, making it highly effective at generating coherent and contextually accurate responses. The main reason to use it here, however, is that it is reported to match or exceed GPT-4 on grade-school reasoning benchmarks.

How Qdrant and Llama 3 Work Together to Create a Seamless Recommendation System

You can use any vector database and LLM of your choice, as CRAG is simply a framework for building pipelines and the individual tools are interchangeable. Still, this particular pair has a few advantages. Qdrant's efficient vector storage and very fast similarity search put retrieval evaluation for CRAG within reach in just a few lines of code. Llama 3 is fast, open source, and performs well on content-generation tasks, which works in our favour since recommendation systems do not require complex capabilities like solving mathematical or coding problems. Groq lets us use the 70B-parameter model, with its full 8K context window, free of cost.

The combined efficiency of these two tools ensures that the system can handle large-scale data and complex queries without compromising on speed or accuracy.

Now that we have discussed the key concepts a fair bit, let us go over the implementation.

Setting Up the Environment

Create a Python virtual environment.

python -m venv env

Activate the virtual environment.

source env/bin/activate

Check the Python version that is installed.

python --version

Use pip list to inspect the installed packages and pip freeze > requirements.txt to save them, with their versions, to a requirements file.

Installing Qdrant

This can be done locally or using the Qdrant Cloud. For development and testing, we will be deploying it locally.

Pull the image:

docker pull qdrant/qdrant

Also install qdrant-client using the following command; it will let us connect to the container where Qdrant is running:

pip install qdrant-client

In the following command, revise $(pwd)/path/to/data for your Docker configuration. Then use the updated command to run the container:

docker run -p 6333:6333 \
-v $(pwd)/path/to/data:/qdrant/storage \
qdrant/qdrant

With this command, you start a Qdrant instance with the default configuration. It stores all data in the ./path/to/data directory. Here I am creating qdrant_storage as the directory name to store all the data.

By default, Qdrant uses port 6333, so at localhost:6333 you should see the welcome message ensuring that you are connected to the database.

Qdrant welcome message
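If you would rather verify the connection from Python than from the browser, a minimal check with qdrant-client (installed above) looks like this:

from qdrant_client import QdrantClient

# Connect to the local Qdrant instance started with Docker
client = QdrantClient(host="localhost", port=6333)

# A fresh instance should report an empty list of collections
print(client.get_collections())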

Installing Llama 3

Groq provides fast LLM inference through its cloud API, and groq is its official Python client. It requires an API key, which you can obtain from the GroqCloud console.

pip install groq

First, sign up or sign in with your account, then go to GroqCloud -> Create API Key, and copy and save the key. Browse through the documentation to choose from the different Llama models available for use.

Test the connection by initializing the Groq client and using the 'llama-3.1-8b-instant' model to generate a response to a simple query.

# Testing the Groq API connection by running a simple test
import os
from groq import Groq

os.environ["GROQ_API_KEY"] = "API_KEY"  # paste your Groq API key here
client = Groq()

response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Give me the names of the continents of the world"}
    ],
    model="llama-3.1-8b-instant"
)
print(response)

Other libraries and packages to be installed

You will also need to install tavily-python to perform web searches. Fetch the API key by clicking Get API Key in the top-right corner of the Tavily dashboard.
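The install command mirrors the earlier ones:

pip install tavily-python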

I also used basic Python libraries and packages for data preprocessing, such as NumPy, pandas, and re. This step depends on the dataset you are working with.

Building the Recommendation System

Data Preparation

To build the recommendation system, I will use the E-Commerce dataset and derive a user-item interaction file from it. To keep things simple, we will use collaborative filtering to recommend articles to users.

Load the CSV data file.

# Load the CSV data into a pandas DataFrame
import pandas as pd

df = pd.read_csv('/Users/anveshamishra/Downloads/data.csv', encoding='ISO-8859-1')

Perform data cleaning and handle missing values.

# Convert CustomerID to integer, handling missing values if necessary
df = df.dropna()
df['CustomerID'] = df['CustomerID'].astype(int)

Check the number of unique stock codes. As stock codes are alphanumeric, we map them to integers so that items can be searched and retrieved easily. I have also reduced the number of items to 250 to cut down the computation time, but I encourage you to try the entire dataset.

# Create a unique list of StockCodes after filtering
unique_stock_codes = df['StockCode'].unique()
print(f"Unique Stock Codes: {unique_stock_codes}")
# Create a positional mapping for each StockCode
stock_code_mapping = {code: idx+1 for idx, code in enumerate(unique_stock_codes)}
# Replace StockCodes with positional mapping
df['StockCode'] = df['StockCode'].map(stock_code_mapping)
# Drop rows where StockCode values are greater than 250
df = df[df['StockCode'] <= 250]

We now have 3,995 customers and 250 items. I have grouped the transaction records by CustomerID so that each customer's transaction history can later be stored in the database. You can save the resulting dictionary, keyed by customer ID with the transactions as values, in a CSV or JSON file. I chose the latter.

import json
import numpy as np

# Group the data by CustomerID
grouped = df.groupby('CustomerID')
# Initialize a dictionary to hold the grouped data
data_by_customer = {}
# Set the random seed for reproducibility
np.random.seed(42)
# Iterate over each group
for customer_id, group in grouped:
    # Drop the 'CustomerID' column and convert the rest of the group to a list of records
    data_by_customer[int(customer_id)] = group.drop(columns=['CustomerID']).to_dict(orient='records')
# Convert the dictionary to a JSON string
json_data = json.dumps(data_by_customer, indent=4)
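The snippet above builds the JSON string but does not write it to disk; a small addition (using the same filename that is loaded in the next section) saves it:

# Save the grouped transactions so they can be loaded again later
with open('data_grouped_by_customer_reduced.json', 'w') as f:
    f.write(json_data)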

Lastly, create a user-interaction file that holds, for each customer, a vector representation of the items they have bought.

num_items = 250  # items were capped at 250 above

df2 = df[['CustomerID', 'StockCode']]
df2 = df2.groupby(['CustomerID'])['StockCode'].agg(list).reset_index()
# Expand each customer's purchase list into a fixed-length vector of item IDs (0 = not bought)
df2['StockCode'] = df2['StockCode'].transform(lambda x: [0 if y + 1 not in x else y + 1 for y in range(num_items)])

filename = 'user_interaction_data.csv'
df2.to_csv(filename, index=False)
print(f"User interaction CSV file saved successfully as {filename}.")

Converting Data into Vector Embeddings and Storing It in the Qdrant Database

Load the two files created above and initialize the Qdrant Client.

import json
from qdrant_client import QdrantClient
from qdrant_client import models
import pandas as pd
import ast

# Load the JSON data
with open('/Users/anveshamishra/Documents/GitHub/crag_based_recommender_system/data_preprocessing/data_grouped_by_customer_reduced.json', 'r') as f:
    data_by_customer = json.load(f)

user_interaction = pd.read_csv('/Users/anveshamishra/Documents/GitHub/crag_based_recommender_system/user_interaction_data.csv')
# The CSV stores each interaction vector as a string, so parse it back into a list
user_interaction['StockCode'] = user_interaction['StockCode'].apply(ast.literal_eval)
index = user_interaction['CustomerID'].tolist()

# Initialize the Qdrant client (adjust host and port as needed)
client = QdrantClient(host='localhost', port=6333)

Create a collection to store the vector embeddings and payload.

# Define the collection name
collection_name = "customer_recommendations"

# Check if the collection exists, delete it if it does, and create a new one
if client.collection_exists(collection_name):
    client.delete_collection(collection_name)

# Create a collection with 250-dimensional vectors and cosine distance
first_collection = client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(size=250, distance=models.Distance.COSINE)
)

Insert the customer IDs as point IDs and the transactions as payloads into the collection.

# Index the interaction vectors into Qdrant, using CustomerID as the point ID
vector_insert = client.upsert(
    collection_name=collection_name,
    points=models.Batch(
        ids=index,
        vectors=user_interaction['StockCode'].tolist()
    )
)
print(vector_insert)

# Initialize an empty list to store aggregated payloads
payloads = []

# Populate the payload list with aggregated data for each customer
for customer_id, transactions in data_by_customer.items():
    aggregated_payload = {
        "CustomerID": customer_id,
        "Transactions": transactions
    }
    payloads.append(aggregated_payload)

# Ensure ids and payloads match in number
assert len(index) == len(payloads), "Number of IDs and payloads must match."

# Upsert again, this time attaching the payloads to the same points
payloads_insert = client.upsert(
    collection_name=collection_name,
    points=models.Batch(
        ids=index,
        vectors=user_interaction['StockCode'].tolist(),
        payloads=payloads
    )
)

print("Data successfully indexed with payloads.")

Implementing the Retrieval Mechanism with Qdrant

To search for similar vectors using a stored vector for a specific CustomerID (e.g., 12357), you can first retrieve the vector corresponding to that CustomerID from the Qdrant storage. Then, use this vector as the query vector for the search operation. Here’s how you can do it:

  1. Use the Qdrant client.scroll API to get the vector associated with the customer ID.
from qdrant_client import QdrantClient
from qdrant_client import models

# Initialize the Qdrant client (adjust host and port as needed)
client = QdrantClient(host='localhost', port=6333)

# Define the collection name
collection_name = "customer_recommendations"

target_customer_id = "12357"
result = client.scroll(
    collection_name=collection_name,
    scroll_filter=models.Filter(
        must=[
            models.FieldCondition(key="CustomerID", match=models.MatchValue(value=target_customer_id)),
        ]
    ),
    limit=1,
    with_payload=True,
    with_vectors=True,
)

# Extract the vector from the result
if result[0] and result[0][0].vector is not None:
    target_vector = result[0][0].vector
    print(f"Vector for CustomerID {target_customer_id}: {target_vector}")
else:
    print(f"No vector found for CustomerID {target_customer_id}")
    exit()

2. Perform a search operation using the retrieved vector as the query vector to find similar vectors.

# Now you can use this vector to search for similar vectors
similar_customers = client.search(
    collection_name=collection_name,
    query_vector=target_vector,
    limit=2  # You can set the limit to any number you prefer
)

# Print the results
for res in similar_customers:
    print(f"CustomerID: {res.id}, Score: {res.score}")

Output:

Notice that, of the two similar customers returned, one is the query vector itself (a vector is always a perfect match for itself). The raw IDs and scores are also not very user-readable, which is why we bring in an LLM to turn them into useful responses.
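If you want to drop the query customer from its own results, one simple option (a small tweak, not part of the original code) is to request one extra hit and skip any point whose ID matches the target:

# Ask for one extra result, then filter out the target customer itself
hits = client.search(
    collection_name=collection_name,
    query_vector=target_vector,
    limit=3,
)
similar_customers = [h for h in hits if str(h.id) != str(target_customer_id)][:2]

for res in similar_customers:
    print(f"CustomerID: {res.id}, Score: {res.score}")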

3. You can even fetch the payload information for all the vectors that were similar to the queried vector. This is important because it gives us access to the transaction history of those similar customers, i.e., more information to feed into the generation step.

# Print the results and their payloads
for res in similar_customers:
    customer_id = res.id
    score = res.score
    payload = res.payload
    print(f"CustomerID: {customer_id}, Score: {score}")
    print("Payload:")
    for transaction in payload.get('Transactions', []):
        print(f" - InvoiceNo: {transaction['InvoiceNo']}, "
              f"StockCode: {transaction['StockCode']}, "
              f"Description: {transaction['Description']}, "
              f"Quantity: {transaction['Quantity']}, "
              f"InvoiceDate: {transaction['InvoiceDate']}, "
              f"UnitPrice: {transaction['UnitPrice']}, "
              f"Country: {transaction['Country']}")

Output:

Similar documents retrieved using Qdrant

Enhancing Recommendations with Llama 3

To drive this through a natural-language prompt, the steps are much the same as above, with one extra function that extracts the customer ID from the prompt.

import re

# Function to parse the query and extract the customer ID
def extract_customer_id(query):
    # Look for keywords associated with customer IDs followed by a number
    pattern = r"(?:customer\s*id|customer\s*number|account\s*id)\s*[:#-]?\s*(\d+)"

    # Search for the pattern in the query
    match = re.search(pattern, query, re.IGNORECASE)
    if match:
        return match.group(1)

    # As a fallback, check for any number without specific keywords.
    # Avoid numbers that are unlikely to be customer IDs (e.g., very small numbers).
    fallback_match = re.search(r"\b\d{5,}\b", query)  # Assuming IDs are at least 5 digits long
    if fallback_match:
        return fallback_match.group(0)

    return None
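A quick check of the extractor on a couple of phrasings:

print(extract_customer_id("recommend me articles for customer id 12357"))  # -> '12357'
print(extract_customer_id("what should account id 18228 buy next?"))       # -> '18228'
print(extract_customer_id("recommend me some popular items"))              # -> None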

Once we have vectors similar to the query vector, we can access their payloads and use Llama to generate responses. I have defined a prompt template to give more personalized responses.

# Function to generate recommendations based on the customer ID
def get_customer_recommendations(query):
    customer_id = extract_customer_id(query)
    if not customer_id:
        print("Customer ID not found in the query.")
        return

    print(f"Extracted Customer ID: {customer_id}")

    # Step 1: Find the vector for the target customer
    try:
        result = client.scroll(
            collection_name=collection_name,
            scroll_filter=models.Filter(
                must=[
                    models.FieldCondition(key="CustomerID", match=models.MatchValue(value=customer_id)),
                ]
            ),
            limit=1,
            with_payload=True,
            with_vectors=True,
        )
    except Exception as e:
        print(f"Error retrieving customer data: {e}")
        return

    # Extract the vector for the target customer
    if result[0] and result[0][0].vector is not None:
        target_vector = result[0][0].vector
        print(f"Found vector for CustomerID {customer_id}")
    else:
        print(f"No vector found for CustomerID {customer_id}")
        return

    # Step 2: Find similar customers
    try:
        similar_customers = client.search(
            collection_name=collection_name,
            query_vector=target_vector,
            limit=5
        )
        print(f"Found {len(similar_customers)} similar customers.")
    except Exception as e:
        print(f"Error searching for similar customers: {e}")
        return

    # Get articles already bought by the target customer
    target_customer_transactions = result[0][0].payload.get('Transactions', [])
    target_articles = {trans['StockCode'] for trans in target_customer_transactions}

    # Collect articles from similar customers that the target customer has not bought
    new_articles = []
    for res in similar_customers:
        payload = res.payload
        for transaction in payload.get('Transactions', []):
            stock_code = transaction['StockCode']
            if stock_code not in target_articles:
                new_articles.append(transaction)

    # Define a prompt template
    prompt_template = """
    User {customer_id} has interacted with the following articles: {existing_articles}.
    Based on similar users, suggest 2 new articles that User {customer_id} has not interacted with yet.
    Also, provide a reason for each recommendation.

    Here are some articles that similar users have interacted with:
    {similar_user_articles}
    Please provide your recommendations and reasons.
    """

    # Prepare data for the prompt
    existing_articles = ", ".join(f"{trans['Description']} (StockCode: {trans['StockCode']})" for trans in target_customer_transactions)
    similar_user_articles = "\n".join(
        f"Article: {article['Description']}, StockCode: {article['StockCode']}, Details: {article}"
        for article in new_articles[:10]  # Limit to a subset for the prompt
    )

    # Fill the prompt template with data
    prompt = prompt_template.format(
        customer_id=customer_id,
        existing_articles=existing_articles,
        similar_user_articles=similar_user_articles
    )

    # Use the Groq API to get a chat completion
    # (groq_client is a Groq() client instance, distinct from the Qdrant `client`)
    try:
        response = groq_client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model="llama3-70b-8192"
        )
        # Print the response
        print(response.choices[0].message.content)
    except Exception as e:
        print(f"Error getting response from Groq: {e}")

Sample output:

# Example usage
query = "recommend me articles for customer id 12357"
get_customer_recommendations(query)
Output using LLM

It’s pretty brilliant how the LLM is also able to describe the personal style of the customer and offer recommendations based on that.

Retrieval Evaluation

The final leg of this build is to implement the retrieval evaluation step.

I implemented the CRAG pipeline by evaluating the similarity scores of the retrievals. There are three cases:
Case 1: the retrieval score is greater than 0.6.
Case 2: the retrieval score is 0.6 or lower.
Case 3: the customer ID is not present in the prompt, or not found in the database.

Code for implementing the retrieval evaluation:

# Function to generate recommendations based on the query
def get_customer_recommendations(query):
    customer_id = extract_customer_id(query)

    # Case 3: no customer ID in the prompt -- fall back to web search only
    if not customer_id:
        print("No customer ID found in the query. Fetching general recommendations.")
        external_context = fetch_external_data(query)
        generate_final_response_from_external(query, external_context)
        return

    print(f"Extracted Customer ID: {customer_id}")

    # Step 1: Find the vector for the target customer
    try:
        result = client.scroll(
            collection_name=collection_name,
            scroll_filter=models.Filter(
                must=[
                    models.FieldCondition(key="CustomerID", match=models.MatchValue(value=customer_id)),
                ]
            ),
            limit=1,
            with_payload=True,
            with_vectors=True,
        )
    except Exception as e:
        print(f"Error retrieving customer data: {e}")
        return

    # Case 3 (continued): customer ID not found in the database -- use external data
    if result[0]:
        print(f"Found vector and data for CustomerID {customer_id}")
    else:
        print(f"No vector found for CustomerID {customer_id}. Using external data for recommendations.")
        external_context = fetch_external_data(query)
        generate_final_response_from_external(query, external_context)
        return

    # Step 2: Extract the target customer's transactions
    target_customer_transactions = result[0][0].payload.get('Transactions', [])

    # Step 3: Find similar customers
    try:
        similar_customers = client.search(
            collection_name=collection_name,
            query_vector=result[0][0].vector,
            limit=5
        )
        print(f"Found {len(similar_customers)} similar customers.")
    except Exception as e:
        print(f"Error searching for similar customers: {e}")
        return

    for res in similar_customers:
        print(res.score)

    # Retrieval evaluation: check whether all similarity scores
    # (excluding the customer's own vector) fall below the 0.6 threshold
    low_similarity = all(res.score < 0.6 for res in similar_customers[1:])
    print(low_similarity)

    if low_similarity:
        # Case 2: low similarity -- augment the context with a web search
        print("Low similarity scores found. Using external data to augment context.")
        external_context = fetch_external_data(query)
        generate_final_response(query, external_context, target_customer_transactions, similar_customers)
    else:
        # Case 1: high similarity -- use the retrieved data directly
        generate_final_response(query, None, target_customer_transactions, similar_customers)

The two functions generate_final_response and generate_final_response_from_external contain the prompt templates and chat-completion requests, using the different contexts described above.
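Neither function is shown in full in this post; below is a minimal sketch of what generate_final_response might look like, following the same prompt-and-completion pattern as the earlier recommendation function (the prompt wording and structure here are illustrative, not the exact code from the repository):

def generate_final_response(query, external_context, target_customer_transactions, similar_customers):
    # Articles the target customer has already bought
    existing_articles = ", ".join(
        f"{t['Description']} (StockCode: {t['StockCode']})" for t in target_customer_transactions
    )

    # Articles bought by similar customers (their payloads were retrieved above)
    similar_user_articles = "\n".join(
        f"Article: {t['Description']}, StockCode: {t['StockCode']}"
        for res in similar_customers
        for t in res.payload.get('Transactions', [])
    )

    # Only include web results when external context was fetched (Case 2)
    web_context_block = f"Additional web context: {external_context}" if external_context else ""

    prompt = f"""
    Query: {query}
    The customer has interacted with: {existing_articles}.
    Similar customers have interacted with:
    {similar_user_articles}
    {web_context_block}
    Suggest 2 new articles that the customer has not interacted with yet, with a short reason for each.
    """

    response = groq_client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="llama3-70b-8192"
    )
    print(response.choices[0].message.content)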

Case 1:

For Case 1, if the similarity score is high, a relevant document has been retrieved and we can use it to generate a response.

Sample Output:

# Example usage
query = "recommend me articles for customer id 12535"
get_customer_recommendations(query)

Case 2:

If it is Case 2, then we do a web search and add its result as additional context in the prompt before generating a response.

Sample Output:

# Example usage
query = "recommend me articles for customer id 12556 if the customer is interested in home decor"
get_customer_recommendations(query)

Case 3:

Finally, for Case 3, we just use the search result as context and use Llama3 to generate the final response.

Define a function to collect data from the web as context.

from tavily import TavilyClient

# Function to fetch external data using the Tavily API
def fetch_external_data(query):
    # Step 1. Instantiate the TavilyClient (tavily_api_key holds your Tavily API key)
    tavily_client = TavilyClient(api_key=tavily_api_key)
    # Step 2. Execute a context search query
    external_context = tavily_client.get_search_context(query=query)
    return external_context

Add it to your prompt template as context.

# Define a prompt template for external data
prompt_template = """
Based on external data sources, suggest 2 top recommendations for the following query: "{query}".
Also, provide a reason for each recommendation.

External data sources provide the following context:
{external_context}
Please provide your recommendations and reasons.
"""
#query_embedding = query_to_embedding(query)
# Fill the prompt template with data
prompt = prompt_template.format(
query=query,
external_context=external_context
)

Generate the final response using Llama 3:

# Use Groq API to get chat completion
try:
response = groq_client.chat.completions.create(
messages=[{"role": "user", "content": prompt}],
model="llama3-70b-8192"
)
# Print the response
print(response.choices[0].message.content)
except Exception as e:
print(f"Error getting response from Groq: {e}")

Sample Output:

# Example usage
query = "recommend me some items to buy if I have recently purchased a SET OF 3 BUTTERFLY COOKIE CUTTERS"
get_customer_recommendations(query)
External search results for recommendation

Results

  • I used a simple Streamlit application to test the results; a minimal sketch follows below.
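A minimal version of such an app could look like this, assuming get_customer_recommendations has been adapted to return the generated text (and the context it used) rather than printing it:

import streamlit as st

st.title("CRAG Recommendation System")

query = st.text_input("Enter your query", "recommend me articles for customer id 12357")

if st.button("Get recommendations"):
    with st.spinner("Generating recommendations..."):
        # Assumes the function returns (response_text, context) instead of printing
        response, _context = get_customer_recommendations(query)
    st.markdown(response if response else "No recommendations found.")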

Testing and Evaluation

Evaluation Metrics — using Ragas

Ragas is a tool designed to evaluate retrieval-augmented generation (RAG) pipelines. We will use this framework to evaluate our CRAG pipeline results. Here are the key metrics used by Ragas:

  • Faithfulness: Faithful responses are crucial for maintaining the credibility and reliability of the system. Users must trust that the information provided by the system is based on actual data and not fabricated. It typically involves comparing the response against the source context to ensure all claims are supported and the response does not introduce hallucinations.
  • Answer Relevancy: A response that is not relevant to the query is not useful to the user. Relevancy is key to user satisfaction and system effectiveness. It is measured using automated scoring against ground truth to determine how well the response answers the query.
  • Context Recall: This measures the extent to which the relevant information from the retrieved context is used in the generated response. It involves comparing the content of the response with the relevant portions of the context to see how much of the pertinent information is included.
  • Context Precision: High precision ensures that the response is not only using the context but doing so accurately, without introducing irrelevant information. It checks the response against the context to ensure that the information used is relevant and directly supports the response.

Combining Metrics for Holistic Evaluation

These metrics collectively provide a comprehensive evaluation of a CRAG pipeline. Here’s how they interact:

  • Faithfulness ensures that the response is trustworthy.
  • Answer Relevancy ensures that the response is useful.
  • Context Recall ensures that the response is comprehensive.
  • Context Precision ensures that the response is accurate.

Testing

  • Create a test set with queries and ground_truth:

# Example usage and evaluation
test_queries = [
    "recommend me articles for customer id 12357 that it has not yet bought",
    "recommend me similar articles for customer id 18228"
]

ground_truth = [
    "Based on the user's history and external data, I recommend the PIGGY BANK RETROSPOT (StockCode: 77). This item matches the user's interest in RetroSpot-themed products and fits their preference for decorative home decor. Additionally, similar users have shown interest in this item, making it a great choice for User 12357.",
    "Based on the user's history and external data, I recommend the JUMBO SHOPPER VINTAGE RED PAISLEY (StockCode: 70). This item suits the user's preference for home decor and novelty items, as evidenced by their interest in hot water bottles and metal signs. The bag's vintage and decorative style, along with its vibrant and whimsical design, aligns with the user's taste for quirky and humorous items, making it a likely favorite."
]

answers = []
contexts = []
  • Call the recommendation function on the test queries and prepare a data dictionary to be converted into a Hugging Face dataset:

# Collect the answer and the context used for each test query
# (assumes get_customer_recommendations has been adapted to return the response and its context)
for query in test_queries:
    response, context = get_customer_recommendations(query)
    if response is not None:
        answers.append(clean_text(response))
        contexts.append([clean_text(context)])

# Prepare data for evaluation
data = {
    "question": test_queries,
    "answer": answers,
    "contexts": contexts,
    "ground_truth": ground_truth
}

# Convert dict to dataset
from datasets import Dataset
dataset = Dataset.from_dict(data)
  • Evaluate the test-set recommendations:

from ragas import evaluate
from ragas.metrics import context_precision, context_recall, faithfulness, answer_relevancy
from ragas.run_config import RunConfig

# Evaluate the recommendations
result = evaluate(
    dataset=dataset,
    metrics=[
        context_precision,
        context_recall,
        faithfulness,
        answer_relevancy,
    ],
    llm=langchain_llm,            # LLM wrapper (e.g., a LangChain chat model) used as the judge
    embeddings=fast_embeddings,   # embedding model used by the metrics
    raise_exceptions=False,
    run_config=RunConfig(timeout=120.0)
)

# Convert the result to a pandas DataFrame for easier analysis
df = result.to_pandas()
df.to_csv('results.csv')
print(df)
  • Since we are using a real-world dataset, let us have a look at the pipeline's performance.

Summary of Ragas Results

  • By evaluating a CRAG pipeline against these metrics, Ragas provides a nuanced assessment of its performance, highlighting strengths and identifying areas for improvement.

Overall Analysis

  • High Precision and Recall for Query 2: Query 2 performs exceptionally well across all metrics, particularly in context recall, indicating that the response is both comprehensive and precise.
  • Balanced Performance for Query 1: While Query 1 has high scores, it scores slightly lower on context recall. This suggests that while the response is precise and relevant, it could be improved by including more of the relevant information from the context.
  • Faithfulness and Relevancy: Both queries show strong performance in terms of faithfulness and relevancy, which are critical for user trust and satisfaction.

Areas for Improvement

  • Improve Recall for Query 1: Efforts should be made to enhance the context recall for Query 1 to ensure all relevant information is captured in the response.
  • Maintain High Precision: The high precision scores for both queries are commendable and should be maintained to ensure that the responses remain accurate.
  • Overall High Performance: The scores suggest that the CRAG pipeline is performing well, particularly for Query 2, and similar strategies can be employed to improve performance where needed.

Conclusion

Summary of Key Points

  • Recommender Systems: Systems that suggest items or content to users based on their preferences, improving user experience and boosting sales for businesses.
  • Types of Recommender Systems:

    - Collaborative Filtering: Recommends items liked by similar users.
    - Content-Based Filtering: Recommends items similar to those a user has shown interest in.
  • Corrective RAG (CRAG): Enhances language model responses by evaluating and refining retrieved documents to ensure relevance and accuracy.
  • Components of CRAG:

    - Retrieval: Fetches relevant documents from static databases and web searches.
    - Knowledge Correction: Evaluates and refines the relevance and accuracy of documents.
    - Generator: Produces the final response using refined knowledge.
  • Tooling: We used Qdrant, a vector search engine, for efficient retrieval, and Llama 3 (via Groq) for generating responses.
  • Evaluation with Ragas: Metrics like faithfulness, answer relevancy, context recall, and context precision are used to assess the performance of the CRAG pipeline.

Recap of What’s Covered in the Tutorial

In this tutorial, we started by exploring the magic behind recommender systems 🪄
- We then dived into Corrective RAG (CRAG), an advanced framework that refines these recommendations to make them even more accurate.
- We set up the necessary tools, Qdrant and Llama3, and walked through building a recommendation system from scratch. This involved data preparation, retrieval with Qdrant, and using Llama3 for generating better recommendations.
- Finally, we evaluated our system with Ragas, discussed its performance, and highlighted areas for improvement.

Final Thoughts on the Benefits of Using Corrective RAG with Qdrant and Llama 3

Using Corrective RAG with Qdrant and Llama 3 improves the performance of LLM-driven recommendation systems by increasing the precision of the output. The web-search component further improves both user interaction and answer relevancy.

Suggestions for Further Reading and Advanced Topics

📚 Deep Dive into Recommender Systems: Explore more advanced recommendation algorithms and hybrid approaches.

📚 Advanced CRAG Techniques: Research on enhancing CRAG with LangGraph.

📚 Scaling Vector Databases: Learn about best practices for scaling vector databases to handle massive datasets.

Don’t hesitate to experiment with different datasets, configurations, and parameters to see how they impact the performance of your recommendation system.

  • Customization: Tailor the system to your specific use cases and requirements, leveraging the flexibility of CRAG, Qdrant, and Llama 3.
  • Continuous Improvement:
    - Right now, only the customer's purchase preferences are taken into account; adding a similar item-catalogue vector (content-based signals) would make the results more robust.
    - Try to add more relevant context in the query to improve the performance.
    - Regularly evaluate and refine your system based on feedback and performance metrics to ensure it meets user needs effectively.

Appendix

  • Additional Resources

🔗 Qdrant documentation https://qdrant.tech/documentation/

🔗 Groq cloud for Llama3 documentation https://console.groq.com/docs/quickstart

🔗 Tavily API https://docs.tavily.com/docs/tavily-api/introduction

🔗 https://blog.lancedb.com/implementing-corrective-rag-in-the-easiest-way-2/

🔗 https://www.pinecone.io/learn/advanced-rag-techniques/

🔗 https://arxiv.org/abs/2401.15884

  • Code Repository

🔗https://github.com/AnveshaM/crag_based_recommender_system
