
Evaluating RAG Pipelines with Ragas

Leveraging the Ragas framework to determine the performance of your retrieval augmented generation (RAG) pipeline

David Hundley · Published in TDS Archive · 21 min read · Jun 30, 2024


Title card created by the author

Artificial intelligence is really cool, but for better or worse, the outputs of all AI models are inferences. In other words, these outputs are educated guesses, and we can never be truly certain that the output is correct. In traditional machine learning contexts, we can often calculate metrics like ROC AUC, RMSE, and more to ensure that a model remains performant over time.
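To make that concrete, here's a minimal sketch of computing those two traditional metrics with scikit-learn; the labels and predictions below are illustrative placeholders, not outputs of a real model:

```python
from sklearn.metrics import roc_auc_score, mean_squared_error
import numpy as np

# Binary classification: ROC AUC compares predicted probabilities
# against ground-truth labels.
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.2, 0.8, 0.6, 0.3, 0.9])
print(f"ROC AUC: {roc_auc_score(y_true, y_prob):.3f}")

# Regression: RMSE is the square root of the mean squared error
# between actual and predicted values.
y_actual = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.5])
rmse = np.sqrt(mean_squared_error(y_actual, y_pred))
print(f"RMSE: {rmse:.3f}")
```

Both metrics reduce to a closed-form calculation over predictions and ground truth, which is exactly what we lose once we move to free-form LLM outputs.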

Unfortunately, no such mathematical metrics exist for the deep learning context, which includes the outputs of LLMs. More specifically, we might be interested in determining how to assess the effectiveness of retrieval augmented generation (RAG) use cases. Given that we can't apply a typical mathematical formula to derive a metric, what options does that leave us with?

The first option that is always available is human evaluation. While this is certainly an effective route, it's neither efficient nor always the most reliable. First, the challenge with human evaluators is that they come with their own biases, meaning that you can't expect one evaluator to be consistent with another. Additionally, it can…
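This is where Ragas comes in. Before digging into its individual metrics, here's a minimal sketch of what a Ragas evaluation run looks like, assuming the ragas and datasets packages are installed and an OpenAI API key is available for the default judge LLM; the sample row is illustrative, not from a real pipeline:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Each row pairs a question with the RAG pipeline's generated answer
# and the retrieved contexts that the answer was produced from.
eval_data = {
    "question": ["What is retrieval augmented generation?"],
    "answer": ["RAG augments an LLM's prompt with retrieved documents."],
    "contexts": [[
        "Retrieval augmented generation (RAG) retrieves relevant "
        "documents and passes them to an LLM alongside the question."
    ]],
}

# Score the dataset on two of Ragas's core metrics.
results = evaluate(
    Dataset.from_dict(eval_data),
    metrics=[faithfulness, answer_relevancy],
)
print(results)  # e.g. {'faithfulness': ..., 'answer_relevancy': ...}
```

Under the hood, each Ragas metric uses an LLM as a judge, which is how the framework sidesteps the lack of a closed-form formula for evaluating generated text.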
