Implementing Anthropic’s Contextual Retrieval for Powerful RAG Performance
This article will show you how to implement the contextual retrieval idea proposed by Anthropic
Retrieval-augmented generation (RAG) is a powerful technique that combines large language models (LLMs) with vector databases to produce more accurate responses to user queries. By letting LLMs draw on large knowledge bases at query time, RAG improves response quality. However, RAG also has downsides. First, it relies on vector similarity to retrieve context, and embedding-based similarity can miss exact matches, for example when a query contains unique keywords such as product codes or error identifiers. Second, documents are split into small chunks before indexing, so an individual chunk often lacks the surrounding document context the LLM would need to interpret it correctly. Anthropic's article on contextual retrieval addresses both problems: it adds BM25 keyword indexing alongside embeddings, and it prepends a short LLM-generated context to each chunk before indexing.
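To make the idea concrete, here is a minimal sketch of the two ingredients: a small BM25 scorer (implemented from scratch so the example is self-contained; in practice you would use a library such as `rank_bm25` or Elasticsearch) and chunk contextualization. The context strings are hardcoded placeholders standing in for what Anthropic generates with an LLM; the document text and query are invented for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with BM25."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    # Document frequency: in how many documents does each term appear?
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        dl = len(d)
        score = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * dl / avgdl)
            )
        scores.append(score)
    return scores

# Raw chunks, as they come out of a naive splitter. On their own they
# don't say which company or quarter they refer to.
chunks = [
    "The company's revenue grew by 3% over the previous quarter.",
    "Operating expenses were reduced through office consolidation.",
]

# Placeholder contexts; in the real pipeline an LLM generates these
# from the full document ("situate this chunk within the document").
contexts = [
    "This chunk is from ACME Corp's Q2 2023 SEC filing.",
    "This chunk is from Initech's 2021 annual report.",
]

# Contextual retrieval: prepend the context to the chunk before indexing.
contextualized = [f"{c} {ch}" for c, ch in zip(contexts, chunks)]
tokenized = [doc.lower().split() for doc in contextualized]

query = "acme q2 2023 revenue growth".split()
scores = bm25_scores(query, tokenized)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

A plain BM25 index over the raw chunks could not match "ACME" or "Q2 2023" at all, since those terms only exist in the surrounding document; prepending the context makes the keyword match possible.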
Motivation
My motivation for this article is twofold. First, I want to test out the newest models and techniques within machine learning. Keeping up to date with the latest trends is critical for any ML engineer and data scientist to most…