Darren Wallace
AI and LLMs

3 stories

An overview of the RAG pipeline. For document storage: input documents -> text chunks -> encoder model -> vector database. For LLM prompting: user question -> encoder model -> vector database -> top-k relevant chunks -> generator LLM. The LLM then answers the question using the retrieved context.
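A minimal sketch of that pipeline, using sentence-transformers for the encoder and a NumPy in-memory store in place of a real vector database. The model name, the example documents, and the chunking choice are illustrative assumptions, not a prescribed setup.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # example encoder model

# Document storage: input documents -> text chunks -> encoder -> vector database
documents = [
    "RAG combines retrieval with generation.",
    "A vector database stores embeddings of text chunks.",
    "The generator LLM answers using the retrieved context.",
]
chunks = documents  # in practice, split long documents into overlapping chunks
chunk_vectors = encoder.encode(chunks, normalize_embeddings=True)  # the "vector DB"

# LLM prompting: question -> encoder -> vector database -> top-k chunks -> LLM
def retrieve(question: str, k: int = 2) -> list[str]:
    q_vec = encoder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vectors @ q_vec          # cosine similarity (vectors are normalized)
    top_k = np.argsort(scores)[::-1][:k]    # indices of the k most similar chunks
    return [chunks[i] for i in top_k]

question = "What does a vector database store?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` would then be passed to the generator LLM of your choice
print(prompt)
```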
An advanced RAG pipeline with two-step retrieval: first, a bi-encoder finds candidates with similar embedding vectors; then, a cross-encoder model narrows these candidates down to the top-k most relevant documents.
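A sketch of that two-step retrieval: a cheap bi-encoder pass retrieves a broad candidate pool, then a cross-encoder rescores each (query, document) pair to pick the final top-k. The model names and pool sizes here are assumptions chosen for illustration.

```python
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")                    # example bi-encoder
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")    # example cross-encoder

corpus = [
    "RAG retrieves relevant chunks before generation.",
    "Cross-encoders score a (query, document) pair jointly.",
    "Bi-encoders embed queries and documents independently.",
    "Vector databases support fast nearest-neighbour search.",
]
corpus_vecs = bi_encoder.encode(corpus, normalize_embeddings=True)

def two_step_retrieve(query: str, candidates: int = 3, top_k: int = 1) -> list[str]:
    # Step 1: cheap bi-encoder similarity over the whole corpus
    q_vec = bi_encoder.encode([query], normalize_embeddings=True)[0]
    idx = np.argsort(corpus_vecs @ q_vec)[::-1][:candidates]
    pool = [corpus[i] for i in idx]
    # Step 2: expensive cross-encoder rescoring of the small candidate pool
    scores = cross_encoder.predict([(query, doc) for doc in pool])
    ranked = np.argsort(scores)[::-1][:top_k]
    return [pool[i] for i in ranked]

print(two_step_retrieve("How do cross-encoders score relevance?"))
```

The trade-off is the usual one: the bi-encoder is fast because documents are embedded once and compared by dot product, while the cross-encoder is more accurate because it attends over the query and document together, so it is only run on the small candidate set.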
Darren Wallace
Develop educational web applications by day. Wannabe tortured artist. Survivor - but only just.