Advanced RAG Techniques in AI Retrieval: A Deep Dive into the Chroma Course

Lakshmi narayana .U
13 min read · Jan 13, 2024
image generated by author with DALL.E-3

Overview of the “Advanced Retrieval for AI with Chroma” course by Anton Troynikov and Andrew Ng

The “Advanced Retrieval for AI with Chroma” course is taught by Anton Troynikov, a co-founder of Chroma, an open-source vector database. The course focuses on techniques that improve the accuracy of Retrieval Augmented Generation (RAG).

These techniques include Query expansion, which utilizes a Large Language Model (LLM) to rewrite or generate queries that accurately reflect the user’s intent. Another technique highlighted in the course is Document selection, where an LLM is used to select the most relevant sections from the retrieved documents for the RAG context.

The course also explores fine-tuning embeddings, which here means adapting query embeddings based on user feedback. Together, these techniques can substantially improve RAG accuracy and reduce a common retrieval pitfall: results that are semantically similar to the query but do not actually contain the answer.
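To make the idea of adapting query embeddings more concrete, here is a minimal pure-Python sketch. It assumes feedback arrives as (query embedding, document embedding, label) triples, with +1 for relevant and -1 for irrelevant, and it learns a square matrix that is applied to the query embedding only. The function names, the plain dot-product score, and the learning rate are illustrative assumptions, and the course notebooks may set this up differently.

```python
# Illustrative sketch: learn a linear "adapter" for query embeddings from feedback.
# All names here are hypothetical, not taken from the course notebooks.

def adapt(matrix, query):
    """Apply the adapter matrix to a query embedding (matrix-vector product)."""
    return [sum(w * q for w, q in zip(row, query)) for row in matrix]


def score(matrix, query, document):
    """Dot product between the adapted query and the document embedding."""
    return sum(a * d for a, d in zip(adapt(matrix, query), document))


def train_adapter(feedback, dim, lr=0.1, epochs=50):
    """Fit the adapter with per-example gradient steps on squared error."""
    # Start from the identity so the adapter initially leaves queries unchanged.
    matrix = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for _ in range(epochs):
        for query, document, label in feedback:
            error = score(matrix, query, document) - label
            for i in range(dim):
                for j in range(dim):
                    # d(error^2)/d(matrix[i][j]) = 2 * error * document[i] * query[j]
                    matrix[i][j] -= lr * 2 * error * document[i] * query[j]
    return matrix


feedback = [
    ([1.0, 0.0], [0.0, 1.0], 1.0),   # user marked this document as relevant
    ([1.0, 0.0], [1.0, 0.0], -1.0),  # user marked this document as irrelevant
]
adapter = train_adapter(feedback, dim=2)
relevant_score = score(adapter, [1.0, 0.0], [0.0, 1.0])
irrelevant_score = score(adapter, [1.0, 0.0], [1.0, 0.0])
```

After training, the adapted query scores the document marked relevant higher than the one marked irrelevant, which is the behavior the feedback asked for.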

The course content is comprehensive, covering common RAG retrieval failure modes, LLM-based query expansion for improved retrieval, text model scoring and ranking of retrieved results, and fine-tuning query embeddings with user feedback.

In contrast to the course, which uses LangChain, Ivan Ilin has written an insightful and comprehensive article on implementing RAG.

That article analyzes the following topics in depth, focusing primarily on implementation with LlamaIndex.

Naive RAG Approach:

  • Describes text chunking, vectorization, and prompt engineering processes.

Advanced RAG Techniques:

  • Elaborates on complex chunking and vectorization strategies.
  • Discusses search index optimization.
  • Explores context enrichment methods.

Fusion Retrieval:

  • Combines keyword-based and semantic search.
  • Addresses refining retrieval results.
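One common way to combine a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). The sketch below is a generic illustration of that idea, not necessarily the exact implementation discussed in the article. The function name, the sample document ids, and the constant k=60 are assumptions.

```python
# Illustrative sketch of reciprocal rank fusion (RRF); names are hypothetical.

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (highest first); ties break by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_ranking = ["doc_a", "doc_b", "doc_c"]   # e.g. from a keyword search
semantic_ranking = ["doc_b", "doc_d", "doc_a"]  # e.g. from a vector search
fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
```

Documents that appear near the top of both lists (here `doc_b` and `doc_a`) end up ahead of documents that only one retriever found.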

Query Transformations and Routing:

  • Explores their roles in enhancing retrieval quality.

Use of Agents in RAG:

  • Explains their function in complex retrieval tasks.
  • Covers multi-document handling.

After completing the course, it’s a good idea to read the mentioned article. This will help you see a broader view beyond what you learned about the Chroma vector store and consider other options.

Tips for running the course notebooks on your computer.

One of the challenges I encountered while running the notebooks on my computer was conflicts during Python installation.

Even after installing the modules, I would get a ‘ModuleNotFoundError’. I fixed this by using python -m pip install instead of pip install, which ensures the package is installed for the same Python interpreter that runs the notebook.

python -m pip install "your library"

Related article: https://stackoverflow.com/questions/44528638/after-pip-successful-installed-modulenotfounderror
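If the error persists, it can help to check which interpreter the notebook is actually running and whether that interpreter can see the module. The following is a small diagnostic sketch using only the standard library. The helper name `describe_module` is my own, and `chromadb` is just an example module name.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` can be imported by the interpreter running this code."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not importable from {sys.executable}"
    return f"{name}: found at {spec.origin}"


print("Interpreter:", sys.executable)
print(describe_module("json"))      # standard library module, always present
print(describe_module("chromadb"))  # replace with the module that fails to import
```

If the interpreter path printed here differs from the one your `pip` command installs into, that mismatch is the likely cause of the error.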

Additionally, it is better to configure Chroma to save to and load from your local machine. This way, data is persisted automatically and loaded on start (if it exists).

import chromadb

client = chromadb.PersistentClient(path="/path/to/save/to")

The path is where Chroma will store its database files on disk, and load them on start.

Running the Advanced Retrieval for AI with Chroma course notebooks highlighted the importance of effective troubleshooting, cache management, and hardware considerations in AI. Careful dependency management and adjusting the setup process to specific hardware characteristics improved performance and stability.

A Quick and Simple Recap on RAG

Retrieval-Augmented Generation (RAG) is a cool new way to make artificial intelligence (AI) smarter. It improves Large Language Models (LLMs), which are AI systems that can understand and generate text. RAG does this by adding a search function that pulls in the most relevant and up-to-date information from big databases. This means the AI can use the latest information to answer questions or make decisions.

RAG was first introduced in a research paper by Facebook in 2020. The researchers imagined RAG as a mix of two types of memory: one is like the AI’s existing knowledge, and the other is like a search engine. This combination allows the AI to access and use information more effectively, especially when answering complex questions. This idea has since been developed further and is now used in many AI applications.

In the field of Natural Language Processing (NLP), which is all about helping computers understand and generate human language, RAG has been a game-changer. Traditional language models could generate text, but they often couldn’t bring in extra, specific information while they were generating text. RAG solves this problem by combining the search capabilities of retrieval models with the text-generating skills of generative models. This has opened up new possibilities in NLP, making RAG a crucial tool for tasks that require detailed, informed responses.

RAG works by using two main parts: the retrieval model and the generative model. The retrieval model is like a librarian, pulling out relevant information from databases or collections of documents. This information is then given to the generative model, which is like a writer. The generative model uses the retrieved data to write coherent and informative text, ensuring that the responses are accurate and full of context.
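The librarian-and-writer split can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the "embedding" is just a bag-of-words count vector standing in for a real embedding model, the sample sentences are made up, and the generative model is represented only by the prompt that would be sent to it.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, n_results=2):
    """The 'librarian': rank documents by similarity to the query."""
    query_vector = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vector, embed(doc)), reverse=True)
    return ranked[:n_results]


def build_prompt(query, retrieved_documents):
    """The input handed to the 'writer' (the generative model)."""
    information = "\n\n".join(retrieved_documents)
    return f"Question: {query}\n\nInformation: {information}"


documents = [
    "A meeting should have a purpose and an agenda.",
    "Time is valuable.",
    "Share relevant information before the meeting starts.",
]
top_documents = retrieve("How to run a meeting?", documents)
prompt = build_prompt("How to run a meeting?", top_documents)
```

In a real system, `embed` would call an embedding model, `retrieve` would query a vector store such as Chroma, and `prompt` would be sent to an LLM.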

While RAG significantly boosts the capabilities of LLMs, it’s crucial to recognize its dual nature. On one hand, RAG mitigates issues like false information generation and data leakage, enhancing the trustworthiness of AI interactions. However, the quality of RAG’s responses heavily depends on the quality of the retrieved data, highlighting the need for robust and reliable data sources.

Course contents of Advanced Retrieval for AI with Chroma

Source: Deeplearning.ai course featuring Anton Troynikov and Andrew Ng

The course “Advanced Retrieval for AI with Chroma” offers a detailed exploration of key techniques in AI retrieval. It delves into query expansion, which uses a Large Language Model (LLM) to rewrite or generate queries that accurately reflect user intent. It also covers document selection, where another LLM selects the most pertinent sections from the retrieved documents for the Retrieval Augmented Generation (RAG) context. Additionally, the course discusses fine-tuning embeddings, which involves adapting query embeddings according to user feedback. These techniques can significantly improve RAG accuracy and reduce a common retrieval pitfall: results that are semantically similar to the query but lack the answer. The content spans common RAG retrieval failure modes, LLM-based query expansion, scoring and re-ranking of retrieved results with a text model, and fine-tuning query embeddings with user feedback.
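Scoring and re-ranking of retrieved results can be sketched generically as follows. The `rerank` function accepts any scoring function over (query, document) pairs. The word-overlap scorer used here is only a toy stand-in for a learned text model, and all names and sample strings are assumptions made for illustration.

```python
def rerank(query, documents, score_fn, top_k=3):
    """Re-order retrieved documents by a (query, document) relevance score."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    # Stable sort: documents with equal scores keep their retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def overlap_score(query, document):
    """Toy scorer: number of distinct query words that also appear in the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words)


retrieved = [
    "time is valuable",
    "stick to the agenda",
    "run a meeting with a fixed agenda",
]
best = rerank("how to run a meeting", retrieved, overlap_score, top_k=2)
```

Swapping `overlap_score` for a model-based scorer leaves the rest of the pipeline unchanged, which is the main reason to keep the scorer as a parameter.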

Putting Course Knowledge into Practice with My Book as a Reference

Source: Advanced Retrieval for AI with Chroma Course (Deeplearning.ai)

Exercise Workbook 1: Implementing Simple RAG Retrieval

Objective: Conducting a Comparative Analysis of Simple RAG Retrieval Outputs Against Other Document-Based Query-Response Mechanisms

Query used: “How to run a meeting?”

query = "How to run a meeting?"

results = chroma_collection.query(query_texts=[query], n_results=6)
retrieved_documents = results['documents'][0]

for document in retrieved_documents:
    print(word_wrap(document))
    print('\n')
Code execution snapshot: Author’s L1-student.ipynb
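The `word_wrap` helper used in the loop above comes from the course's utility code, which is not shown in this article. If you run the snippet outside the course environment, a stand-in along the following lines should behave similarly. This is an assumption based on the helper's name and usage, not the course's exact implementation.

```python
import textwrap


def word_wrap(text, n_chars=72):
    """Wrap text to at most n_chars per line, breaking only at whitespace."""
    return textwrap.fill(text, width=n_chars)


sample = "Drive the meeting with a fixed agenda and deviate only when necessary."
wrapped = word_wrap(sample, n_chars=30)
print(wrapped)
```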
def rag(query, retrieved_documents, model="gpt-3.5-turbo"):
    information = "\n\n".join(retrieved_documents)

    messages = [
        {
            "role": "system",
            "content": "You are a helpful expert financial research assistant. Your users are asking questions about information contained in an annual report."
            "You will be shown the user's question, and the relevant information from the annual report. Answer the user's question using only this information."
        },
        {"role": "user", "content": f"Question: {query}. \n Information: {information}"}
    ]

    response = openai_client.chat.completions.create(
        model=model,
        messages=messages,
    )
    content = response.choices[0].message.content
    return content
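Because the prompt has a large influence on the final answer, it can be useful to separate prompt assembly from the API call so the messages can be inspected or swapped. The sketch below is one way to do that. The `build_messages` helper and the book-oriented system prompt are hypothetical and are not what produced the outputs in this article.

```python
def build_messages(query, retrieved_documents, system_prompt):
    """Assemble the chat messages for a RAG call without contacting any API."""
    information = "\n\n".join(retrieved_documents)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Question: {query}\n\nInformation: {information}"},
    ]


# Hypothetical system prompt tailored to a book rather than an annual report.
book_prompt = (
    "You are a helpful assistant answering questions about a non-fiction book. "
    "Answer the user's question using only the information provided."
)
messages = build_messages("How to run a meeting?", ["chunk one", "chunk two"], book_prompt)
```

The resulting list can be passed as `messages=` to `openai_client.chat.completions.create`, exactly as in the `rag` function above.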

Final response using OpenAI as above

Based on the information provided, running a meeting involves several
key steps:

1. Build a team and assign related tasks to individuals
within the group.
2. Align individual aspirations to company goals,
such as allowing someone interested in photography to be part of a
related project.
3. Consider organizing a townhall-style meeting to
make a formal meeting more informal, with snacks and music to create a
party-like atmosphere.
4. Make induction a small event, such as a
breakfast meeting, to allow new joiners to interact with the rest of
the team.
5. Encourage one-on-one conversations by keeping your door
open for unannounced visits during the day.

In terms of running the
actual meeting, it is important to:

1. Conduct a site survey and
gather all relevant project information before the meeting.
2. Drive
the meeting with a fixed agenda and deviate only when necessary.
3.
Ensure all relevant information has been shared with the meeting
participants.
4. Aim for a clear outcome from the meeting, rather than
just confirming the next meeting.
5. Stick to a planned schedule and
conclude the meeting to accommodate other activities or
commitments.

Ultimately, a meeting should have a purpose and an
agenda, and the participants should stick to the agenda to ensure that
the meeting is productive and focused.

Using Mistral-7B-v0.1-GGUF and LM Studio

Code using LMStudio’s endpoint

from openai import OpenAI

# Point the OpenAI client at LM Studio's local OpenAI-compatible server
openai_client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

def rag(query, retrieved_documents, model="local-model"):
    information = "\n\n".join(retrieved_documents)

    completion = openai_client.chat.completions.create(
        model="local-model", # this field is currently unused
        messages = [
            {
                "role": "system",
                "content": "You are a helpful expert financial research assistant. Your users are asking questions about information contained in an annual report."
                "You will be shown the user's question, and the relevant information from the annual report. Answer the user's question using only this information."
            },
            {"role": "user", "content": f"Question: {query}. \n Information: {information}"}
        ],
        temperature=0.7
    )
    content = completion.choices[0].message.content
    return content

Output

How to run a meeting? It seems you have provided various ways to
conduct a meeting, including aligning individual aspirations with
company goals, making induction a small event, and keeping one on one
conversations. Additionally, you've mentioned prioritizing the meeting
based on how much the other party is willing to chase them for it and
avoiding deviations from an agenda. It's also important to note that
meetings should not be an end in themselves but rather a way of
achieving goals. To run a meeting effectively, it's crucial to have an
agenda and stick to it, involve key people, and prioritize the meeting
based on its importance.

Response using OpenAI assistant

Code block

# 1. Create an assistant
assistant = client.beta.assistants.create(
    name = "Directing Business Q&A",
    instructions = "You are a AI Personal Assistant, answer questions on the document provided.",
    tools = [{"type": "retrieval"}],
    model = "gpt-4-1106-preview",
    file_ids = ["file-id"]
)

Output

user: How to run a meeting?
assistant: To run a meeting effectively, the document suggests the following steps:

1. Question the necessity: Before scheduling a meeting, determine if it is required as a form of communication in project management.
2. Share relevant information: Ensure that all pertinent information has been distributed to the concerned parties involved in the meeting.
3. Define the purpose and outcome: Understand the purpose of the meeting and have a clear expectation of the outcome other than just planning the next meeting.
4. Prepare thoroughly: If the meeting is important, prepare extensively for it and strive to complete it on time.

Running a meeting effectively includes sticking to an agenda, involving the key people in preparations, coming to the meeting well prepared, and ensuring that the meeting has a defined purpose, with outcomes other than simply confirming the next meeting【9†source】.

Response using MoviSvami GPT

Source: MoviSvamiGPT based on ‘Directing Business’ book

Conclusion: The retrieved context is fairly similar across the tests above. The variation in the final responses comes from the prompts used and from the undisclosed, under-the-hood behavior of the OpenAI Assistant and the GPT builder.

Here’s a sample evaluation with help of ChatGPT-4

Source: Author’s chat with ChatGPT-4

Assumptions and Disclaimers:

  • RAG Effectiveness: The evaluation of RAG effectiveness for the OpenAI API and Mistral-7B model is based on how well these models integrate and utilize the known provided context. For scenarios 3 and 4, the effectiveness of RAG is not verifiable due to lack of transparency.
  • Contextual Limitation: Assumes equal understanding of the provided context across all models where context is known. Differences in model training or context processing capabilities are not considered for unknown scenarios.
  • Subjectivity: Evaluation criteria, especially creativity and engagement, are subjective.
  • Model Specifics: Specific versions or configurations of each model, which could influence responses, are not considered.
  • Nature of the Query: The open-ended nature of the query might affect the responses.
  • Intended Use: Evaluation is based on the text's intended use for practical advice on running meetings.

Summary:

  • OpenAI API shows high relevance, practicality, coherence, clarity, and RAG effectiveness, indicating a strong alignment with the book's content and effective context utilization.
  • Mistral-7B Model offers a coherent response that captures key themes but is less detailed, indicating medium RAG effectiveness.
  • OpenAI Assistant and MoviSvami GPT have unknown RAG effectiveness due to the lack of transparent context handling. They vary in their approach, with the Assistant focusing on direct practical advice and MoviSvami GPT offering a creative, metaphorical angle.

Exercise Workbook 2: Implementing Query Expansion Technique

Objective: Explore the two techniques in Query Expansion — Expansion with Generated Answers and Multiple Queries — to choose the best for a non-fiction book like ‘Directing Business’.

Source: Advanced Retrieval for AI with Chroma Course (Deeplearning.ai)

Expansion with Generated Answer

Original Query: Give a movie example to explain how to drive one’s time?

Joint Query: (with hypothetical answer from gpt-3.5-turbo) Give a movie example to explain how to drive one’s time? One movie example that can be used to explain how to drive one’s time is “The Pursuit of Happyness” (2006) starring Will Smith. Based on a true story, the film showcases a struggling salesman named Chris Gardner who is determined to provide a better life for his son. Despite facing numerous setbacks and hardships, Gardner demonstrates exceptional time management skills throughout the movie. He meticulously plans his day, making the most of each minute by prioritizing tasks and maximizing productivity. His dedication to driving his time is evident as he juggles multiple jobs, studies for exams, and searches for a new job opportunity. Gardner’s unwavering focus on utilizing his time efficiently ultimately leads to his success, highlighting the importance of effective time management in achieving one’s goals.
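The joint query above is simply the original query followed by a model-generated hypothetical answer. A minimal sketch of that construction follows. The function names are my own, and `fake_llm` is a stand-in for a real LLM call so that the example runs offline.

```python
def expand_with_generated_answer(query, generate_answer):
    """Return the original query followed by a hypothetical answer to it."""
    hypothetical_answer = generate_answer(query)
    return f"{query} {hypothetical_answer}"


def fake_llm(query):
    """Stand-in for an LLM call so the sketch runs offline."""
    return "One movie example is a film about careful time management."


joint_query = expand_with_generated_answer(
    "Give a movie example to explain how to drive one's time?", fake_llm
)
```

The joint query is then used in place of the original when querying the collection, for example as `query_texts=[joint_query]` in the `chroma_collection.query` call from Workbook 1.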

Plot: Projected query and retrieved documents in the embedding space

Red: Original Query
Orange: Joint Query

Source: Author’s L3-Student.ipynb code execution

Response from gpt-3.5-turbo:

For the Original Query:

One movie example that can be used to explain how to drive one’s time is “The Pursuit of Happyness” (2006) starring Will Smith. The film showcases the character Chris Gardner, a struggling salesman who demonstrates exceptional time management skills. He meticulously plans his day, prioritizing tasks and maximizing productivity. Despite facing setbacks and hardships, Gardner’s unwavering focus on utilizing his time efficiently leads to his success, highlighting the importance of effective time management in achieving goals.

For the Joint query:

A movie example that demonstrates how to drive one’s time is “Groundhog Day.” In the movie, the main character Phil Connors, played by Bill Murray, relives the same day over and over again. At first, he squanders his time, but after witnessing death, he decides to make the most of each day. He develops a schedule and is motivated to help others, which gives him a sense of purpose. By structuring his time and focusing on meaningful tasks, Phil is able to execute his day with efficiency. This example highlights the importance of prioritizing tasks and using time wisely.

Source: Advanced Retrieval for AI with Chroma Course (Deeplearning.ai)

Expansion with Multiple Queries

Augmented queries:

1. How can movie examples be used to illustrate effective time management?
2. What are some movies that portray characters with strong time management skills?
3. How can the concept of time management be applied to movie production?
4. What lessons can be learned about time management from movies?
5. Can you give an example of a movie where time management played a crucial role in the plot?
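With multiple queries, each query is run against the vector store and the results are merged, dropping chunks that several queries returned. The sketch below shows that merge step under stated assumptions: `retrieve` is any callable that returns a list of chunks for a query, and `fake_retrieve` with its canned chunks is a placeholder so the example runs without a database.

```python
def retrieve_for_queries(queries, retrieve, n_results=5):
    """Run every query through `retrieve` and return unique documents in first-seen order."""
    seen = set()
    unique_documents = []
    for query in queries:
        for document in retrieve(query, n_results):
            if document not in seen:
                seen.add(document)
                unique_documents.append(document)
    return unique_documents


def fake_retrieve(query, n_results):
    """Stand-in retriever returning canned chunks so the sketch runs offline."""
    canned = {
        "original": ["chunk A", "chunk B"],
        "augmented 1": ["chunk B", "chunk C"],
        "augmented 2": ["chunk A", "chunk D"],
    }
    return canned[query][:n_results]


documents = retrieve_for_queries(["original", "augmented 1", "augmented 2"], fake_retrieve)
```

The deduplicated chunks then form the context that is passed to the model, so repeated chunks do not crowd out new information.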

Plot: Projected query and retrieved documents in the embedding space

Red: Original Query
Orange: Joint Query

Source: Author’s L3-Student.ipynb code execution

Response from gpt-3.5-turbo:

For the joint query:

A movie example that illustrates effective time management is “The Good, the Bad and the Ugly.” In this movie, there is a scene where Tuco, a character, shoots an one-armed man with a hidden gun. This scene showcases the importance of not wasting time and taking action when necessary. It emphasizes that when you have to shoot, you should just shoot and not waste time talking. This scene conveys the message that time is a valuable commodity and it is in our hands to use it wisely. The book also mentions the importance of time management and how it is in our control to choose how we spend our time. Time management is crucial not only in movie production but also in various aspects of life.

Source: Author’s L3-Student.ipynb code execution

After testing the above approaches with several other queries, I observed that expansion with multiple queries is the more effective technique for a generic non-fiction book such as mine. That said, simple RAG may be sufficient on its own.

To conclude, the “Advanced Retrieval for AI with Chroma” course is a valuable resource for anyone delving into the nuances of AI retrieval. It offers deep insights into enhancing RAG accuracy and introduces vital techniques such as query expansion and embedding fine-tuning.

Looking forward, exploring other vector databases like Pinecone and applying these concepts in diverse scenarios and various document types are logical next steps. This exploration not only broadens our understanding but also drives innovation in AI retrieval. As we integrate these learnings, sharing experiences and engaging with the AI community becomes essential for collaborative growth and advancement in this exciting field.

Related links:

  1. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks: https://arxiv.org/pdf/2005.11401.pdf
  2. Query Expansion by Prompting Large Language Models: https://arxiv.org/pdf/2305.03653.pdf
  3. LM Studio: https://lmstudio.ai/
  4. Massive Text Embedding Benchmark (MTEB) Leaderboard: https://huggingface.co/spaces/mteb/leaderboard
  5. TheBloke/Mistral-7B-Instruct-v0.1-GGUF: https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF
  6. Advanced retrieval for AI with Chroma: https://learn.deeplearning.ai/advanced-retrieval-for-ai/lesson/1/introduction
  7. Advanced RAG Techniques: an Illustrated Overview: https://pub.towardsai.net/advanced-rag-techniques-an-illustrated-overview-04d193d8fec6
