The Hands-On LLMs Series

Why you must choose streaming over batch pipelines when doing RAG in LLM applications

Lesson 2: RAG, streaming pipelines, vector DBs, text processing

Paul Iusztin · Published in Decoding ML · 12 min read · Jan 9, 2024


Image by DALL-E

→ the 2nd of 8 lessons in the Hands-On LLMs free course

By finishing the Hands-On LLMs free course, you will learn how to use the 3-pipeline architecture & LLMOps best practices to design, build, and deploy a real-time financial advisor powered by LLMs & vector DBs.

We will primarily focus on the engineering & MLOps aspects. Thus, by the end of this series, you will know how to build & deploy a real ML system, not just isolated code in notebooks (we haven’t used any notebooks at all).

More precisely, these are the 3 components you will learn to build (a short, illustrative code sketch of each follows the list):

  1. a real-time streaming pipeline (deployed on AWS) that listens to financial news, cleans & embeds the documents, and loads them into a vector DB
  2. a fine-tuning pipeline (deployed as a serverless continuous-training job) that fine-tunes an LLM on financial data using QLoRA, monitors the experiments with an experiment tracker, and saves the best model to a model registry
  3. an inference pipeline built in LangChain (deployed as a serverless…
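To make the first component concrete, here is a minimal sketch of the ingest → clean → embed → load steps in Python. It assumes a hypothetical fetch_news_stream() source, sentence-transformers for embedding, and Qdrant (running locally) as the vector DB; the course’s actual tools, models, and collection names may differ.

```python
import re
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

COLLECTION = "financial_news"  # illustrative collection name

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
client = QdrantClient(url="http://localhost:6333")
client.recreate_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

def clean(text: str) -> str:
    """Strip HTML tags and collapse whitespace before embedding."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def fetch_news_stream():
    """Placeholder for a real-time news source (e.g., a WebSocket feed)."""
    yield {"headline": "ACME Corp beats Q4 earnings estimates", "body": "..."}

# As each news item arrives, clean it, embed it, and upsert it into the DB
# so the RAG retriever always sees fresh documents.
for item in fetch_news_stream():
    doc = clean(item["headline"] + " " + item["body"])
    vector = model.encode(doc).tolist()
    client.upsert(
        collection_name=COLLECTION,
        points=[PointStruct(id=str(uuid.uuid4()), vector=vector, payload={"text": doc})],
    )
```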
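The second component pairs QLoRA with experiment tracking. Below is a minimal sketch of the QLoRA setup using transformers, bitsandbytes, and peft; the base model and LoRA hyperparameters are illustrative assumptions, not the course’s exact configuration.

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE_MODEL = "mistralai/Mistral-7B-v0.1"  # illustrative; the course picks its own base LLM

# Load the base model quantized to 4-bit (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters; the frozen 4-bit weights stay untouched.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the weights train

# From here, train with your preferred trainer (e.g., transformers.Trainer),
# log metrics to an experiment tracker, and save the best adapter to a model registry.
```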
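And for the third component, a minimal sketch of a RAG inference chain in LangChain (using the pre-0.1 API current when this lesson was written): it wires the Qdrant collection populated by the streaming pipeline to a retriever and an LLM. The model id and collection name are placeholders; in the course, the LLM would be the QLoRA fine-tuned model from the training pipeline.

```python
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import HuggingFacePipeline
from langchain.vectorstores import Qdrant
from qdrant_client import QdrantClient

# Same embedding model the ingestion pipeline used, so query and document
# vectors live in the same space.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

client = QdrantClient(url="http://localhost:6333")
vectorstore = Qdrant(client=client, collection_name="financial_news", embeddings=embeddings)

# Placeholder LLM for illustration only.
llm = HuggingFacePipeline.from_model_id(
    model_id="mistralai/Mistral-7B-Instruct-v0.1",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 256},
)

# Retrieve the 3 most relevant news chunks and stuff them into the prompt.
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)
print(qa_chain.run("What is the latest news on tech stocks?"))
```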
