RAG Fundamentals First

Why mastering the basics beats advanced techniques

Paul Iusztin
Published in Decoding ML
8 min read · Aug 21, 2024

Figure: Naive end-to-end RAG flow

To build successful, complex RAG applications, you must first deeply understand the fundamentals behind them. In this article, we will learn why we use RAG and how to design the architecture of a RAG layer.

Retrieval-augmented generation (RAG) improves the accuracy and reliability of generative AI models by supplying them with information fetched from external sources. It complements the LLM’s internal knowledge. Before going into the details, let’s understand what RAG stands for:

  • Retrieval: search for relevant data;
  • Augmented: add the data as context to the prompt;
  • Generation: use the augmented prompt with an LLM for generation.
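The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under loud assumptions, not the article's implementation: the documents are made-up toy data, `retrieve` uses plain keyword overlap instead of the embedding search a real system would likely use, and `llm` is any callable you pass in (a hypothetical stand-in for a real model client).

```python
import re
from typing import Callable

# Toy "external source" (made-up data for illustration only).
DOCUMENTS = [
    "The Acme support line is open from 9:00 to 17:00.",
    "Acme refunds are processed within 14 days.",
    "RAG adds retrieved context to the prompt.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Retrieval: rank documents by how many tokens they share with the query.

    Assumption: keyword overlap stands in for a real similarity search.
    """
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def augment(query: str, context_docs: list[str]) -> str:
    """Augmented: add the retrieved data as context to the prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


def rag_answer(query: str, documents: list[str], llm: Callable[[str], str]) -> str:
    """Generation: send the augmented prompt to the LLM and return its answer."""
    prompt = augment(query, retrieve(query, documents))
    return llm(prompt)


if __name__ == "__main__":
    # Placeholder "LLM" that just echoes the prompt; swap in a real model client.
    print(rag_answer("When is the Acme support line open?", DOCUMENTS, llm=lambda p: p))
```

Keeping the three steps as separate functions mirrors the acronym: you can later replace the retriever or the model without touching the prompt-building logic.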

An LLM only knows the data it was trained on, sometimes called parameterized knowledge. Thus, even if the LLM can perfectly answer questions about the past, it won’t have access to the newest data or to any other external source it wasn’t trained on.

Let’s take the most powerful model from OpenAI as an example, which in the summer of 2024 is GPT-4o. The model was trained on data up to Oct 2023. Thus, if we ask what happened during the 2020 pandemic, the model can answer…

Paul Iusztin
Decoding ML

Senior ML & MLOps Engineer • Founder @ Decoding ML ~ Content about building production-grade ML/AI systems • DML Newsletter: https://decodingml.substack.com