What on Earth is Retrieval-Augmented Generation (RAG)?

How RAG is making LLMs more accurate, reliable, and versatile

Enoch Kan
The ML Practitioner
3 min read · Oct 27, 2023



Imagine you’re a student and you have to write an essay on a topic. You could try to write it entirely from memory, but that would be quite challenging, and the result might not be totally accurate.

A better approach would be to go to the library and find some books and articles about the topic. You could read them through and learn about the topic from facts provided in the books. Then, you could write your essay based on what you learned.

Similarly, retrieval-augmented generation (RAG) is a way for large language models (LLMs) to write reports and answer questions more accurately and reliably.

In essence, this is how it works:

Figure: the RAG architecture, from Lewis et al. (2021)

It seems complicated, right? Not really.

The diagram, taken from the original paper, depicts the RAG architecture. The process begins with a query encoder, q, which maps an input query x to a dense representation q(x). A retriever, pη, then uses q(x) to search a Document Index (the non-parametric memory) for relevant documents via Maximum Inner Product Search (MIPS), an efficient similarity search. The retrieved documents are finally passed to a parametric generator, pθ, which conditions on them to produce a response.
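To make the retrieval step concrete, here is a toy NumPy sketch of MIPS. The document and query embeddings are random placeholders (nothing here comes from the paper), and the softmax over inner products mirrors how the paper defines the retriever's distribution pη(z|x):

```python
import numpy as np

# Toy illustration of the retrieval step: relevance is the inner product
# between the query embedding q(x) and each document embedding d(z);
# MIPS finds the top-k documents. All vectors here are random placeholders.
rng = np.random.default_rng(0)
d_z = rng.normal(size=(1000, 128))     # 1,000 document embeddings
q_x = rng.normal(size=128)             # the encoded query q(x)

scores = d_z @ q_x                     # inner products d(z) · q(x)
top_k = np.argsort(scores)[-5:][::-1]  # exact (brute-force) MIPS, k = 5

# The retriever's distribution over the retrieved documents:
# p_eta(z|x) is proportional to exp(d(z) · q(x)).
p = np.exp(scores[top_k] - scores[top_k].max())
p /= p.sum()
print(top_k, p.round(3))
```

Real systems swap the brute-force scan for approximate nearest-neighbor search so the index scales to millions of documents.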

Still don’t get it? Here is an even simpler summary:

A RAG system involves a model that takes a query, searches a knowledge base for related information, and then generates an answer based on what it retrieved.

So instead of the normal LLM query flow (prompt → LLM → response), you incorporate an external knowledge base, such as a vector database, to retrieve the top-k most relevant pieces of context at inference time (prompt → retrieve context → LLM → response), as sketched below.
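Here is a minimal, self-contained sketch of that query flow. `embed` and `generate` are hypothetical stubs standing in for a real embedding model and LLM, so the example runs on its own:

```python
import hashlib
import numpy as np

# Hypothetical stand-ins: in a real system, `embed` would call an embedding
# model and `generate` would call an LLM; here they are stubs so the flow runs.
def embed(text: str) -> np.ndarray:
    seed = int(hashlib.sha256(text.encode()).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).normal(size=64)

def generate(prompt: str) -> str:
    return f"(an LLM would answer here, conditioned on)\n{prompt}"

# A tiny in-memory "vector database": documents plus their embeddings.
documents = [
    "RAG retrieves relevant documents before generating an answer.",
    "Vector databases store embeddings for fast similarity search.",
    "LLMs can hallucinate when answering from parametric memory alone.",
]
index = np.stack([embed(d) for d in documents])

def rag_answer(question: str, k: int = 2) -> str:
    q = embed(question)                       # encode the query
    top_k = np.argsort(index @ q)[-k:][::-1]  # retrieve top-k by inner product
    context = "\n".join(documents[i] for i in top_k)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)                   # generate grounded in context

print(rag_answer("What is retrieval-augmented generation?"))
```

In practice you would replace the stubs with calls to your embedding model, vector database, and LLM; the shape of the flow stays the same.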

Obviously, this requires that the knowledge base be constructed ahead of time. You can do this by passing your source material, such as documents and other datasets, through an embedding model to generate embeddings, then saving them to a vector database. There are plenty of tutorials, including this one, that detail how to do it.
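As a rough sketch, here is how that indexing step might look with FAISS, a widely used vector search library; the documents and random vectors below are placeholders for your own material and embedding model:

```python
import numpy as np
import faiss  # pip install faiss-cpu

# Placeholder source material and embeddings; in practice the vectors
# would come from an embedding model, not a random generator.
docs = ["doc one ...", "doc two ...", "doc three ..."]
dim = 128
embeddings = np.random.default_rng(0).normal(size=(len(docs), dim)).astype("float32")

index = faiss.IndexFlatIP(dim)  # exact inner-product index (brute-force MIPS)
index.add(embeddings)           # store the embeddings in the index
faiss.write_index(index, "knowledge_base.faiss")  # persist for inference time

# Later, at query time, retrieve the top-2 documents for a query vector:
scores, ids = index.search(embeddings[:1], 2)
print(ids[0])
```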

RAG is a powerful framework for reducing hallucination in LLMs: by grounding generation in a pre-constructed knowledge base, it encourages truthful responses. The field is still developing rapidly, and deploying RAG (and LLMs in general) at scale remains a challenge.

Have you implemented RAG in your work?

Thank you for reading! The ML Practitioner is edited and curated by Livia Whitermore.

Don’t miss out on cutting-edge ML trends, tips, and discussions. Become part of an exclusive community of forward-thinkers. Stay connected and advance your knowledge with our weekly insights.

🚀 [Subscribe to Our Newsletter] 🚀

Connect and contribute! Find us on LinkedIn to submit an article or share your thoughts.

