Foundation Model 101 — Dumb Down Retrieval Augmented Generation (RAG)

Changsha Ma
2 min read · Apr 28, 2023

RAG is a technique that augments prompts to text-to-text foundation models with information retrieved through document search.

RAG is useful when the foundation model was not trained on up-to-date data, or when you want to add specific context to the prompt.

There are two types of foundation models needed for RAG. One is a text-to-text foundation model, which generates text. The other is a text embedding foundation model, which converts your data into embeddings so that it can be matched against prompts by similarity.
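To make the distinction concrete, here is a minimal sketch in Python. The `embed` and `generate` functions are toy stand-ins made up for illustration; a real system would call an actual embedding model and an actual text-to-text model through an API or library.

```python
import hashlib

import numpy as np

# Toy stand-ins for the two model types used in RAG; real systems would
# call actual foundation models through an API or library.

def embed(text: str) -> np.ndarray:
    """Toy text embedding model: maps text to a fixed-size vector."""
    vec = np.zeros(32)
    for word in text.lower().split():
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % 32] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def generate(prompt: str) -> str:
    """Toy text-to-text model: returns a placeholder completion."""
    return f"[completion conditioned on: {prompt[:60]}]"

# Embeddings let us compare texts numerically, e.g. by cosine similarity.
a = embed("who won the 2022 world cup")
b = embed("2022 world cup final report")
print(float(a @ b))        # higher score = more similar texts
print(generate("Hello"))   # the text-to-text model produces text output
```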

To dumb down how RAG works, let’s use a scenario where a sports reporter has a bunch of news reports that she wants to synthesize information from but doesn’t want to read all (or any) of them. ChatGPT won’t help much because its training data is outdated, but RAG can help.

We first use a text embedding foundation model to create embeddings for those reports and store them in a database (typically a vector database such as Pinecone). The RAG flow then takes the text prompt (e.g., "who won the 2022 World Cup"), searches the database, and retrieves the best-matching report (e.g., the 2022 World Cup report); the matching is based on embedding similarity. The content of the matched report is combined with the original text prompt to form a context-enriched prompt, which is then sent to the text-to-text foundation model to generate the text output.
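Here is a minimal, self-contained sketch of that flow. The toy `embed` function, the in-memory list standing in for a vector database such as Pinecone, and the sample report texts are all assumptions made for illustration, not a production setup.

```python
import hashlib

import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding model (stand-in for a real text embedding foundation model)."""
    vec = np.zeros(32)
    for word in text.lower().split():
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % 32] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# 1. Create embeddings for the reports and store them.
#    (A plain list stands in for a vector database such as Pinecone.)
reports = [
    "2022 World Cup report: Argentina beat France on penalties in the final.",
    "2023 Australian Open report: Djokovic won his tenth title in Melbourne.",
]
index = [(embed(report), report) for report in reports]

# 2. Embed the question and retrieve the most similar report.
question = "Who won the 2022 World Cup?"
q_vec = embed(question)
best_report = max(index, key=lambda item: float(q_vec @ item[0]))[1]

# 3. Combine the retrieved report with the original prompt.
enriched_prompt = f"Context:\n{best_report}\n\nQuestion: {question}\nAnswer:"

# 4. Send the enriched prompt to the text-to-text foundation model.
#    (Printed here; a real system would call the model's API.)
print(enriched_prompt)
```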

With this RAG flow, the reporter can ask any question relevant to the provided reports and get a reliable answer from the text-to-text foundation model.

While RAG is a useful technique, it has its limitations. First, its usefulness relies on the accuracy of the embedding model's similarity matching. Second, its latency depends on the performance of the database used to store and retrieve the embeddings. In addition, the amount of information RAG can add is limited by the context window size of the text-to-text foundation model, which means the number of tokens in the prompt and the completion combined can't exceed a certain limit.
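As a rough illustration of the context window constraint, the sketch below trims the retrieved reports to a token budget before building the prompt. The word-count token estimate and the budget value are made-up simplifications; a real system would use the model's own tokenizer and its actual limit.

```python
MAX_CONTEXT_TOKENS = 3000  # made-up budget for retrieved context

def rough_token_count(text: str) -> int:
    """Very rough token estimate: one token per whitespace-separated word."""
    return len(text.split())

def fit_context(retrieved_reports: list[str], budget: int = MAX_CONTEXT_TOKENS) -> str:
    """Keep the top-ranked reports until the token budget would be exceeded."""
    kept, used = [], 0
    for report in retrieved_reports:
        cost = rough_token_count(report)
        if used + cost > budget:
            break
        kept.append(report)
        used += cost
    return "\n\n".join(kept)

# Example: only as many reports as fit in the budget make it into the prompt.
print(fit_context(["short report " * 5, "another report " * 5], budget=12))
```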

Hope this post helps you understand RAG better!
