Hong T., Jing W. and Jing Z.
Previously, we shared a reranker-based RAG pipeline, query augmentation and reranker filtering, RAGAS-based model evaluation, and ground truth dataset generation. In this article, we put all of these parts together into an end-to-end RAG pipeline. It includes both the RAG pipelines themselves and a model monitoring pipeline (ground truth generation plus LLM metrics computed with RAGAS). The pipeline evaluation is modular and simplified; the next step is MLOps cloud deployment and automation of pipeline monitoring. Project teams can use the resulting metrics to decide whether to fine-tune, retrain, or decommission a model.
In the architecture diagram of the pipeline, user queries and documents flow into the RAG pipeline. A subset of the documents can be used to automatically generate a ground truth dataset for model monitoring, and human labels can also be included in the curated ground truth dataset.
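As a rough illustration of that monitoring flow, the sketch below samples a document subset for automatic ground truth generation and merges in human labels. The function names and record schema here are assumptions for this example, not the pipeline's actual API.

```python
import random

def sample_for_ground_truth(documents, fraction=0.1, seed=42):
    """Pick a reproducible subset of documents to send to the
    automatic question/answer (ground truth) generator."""
    rng = random.Random(seed)
    k = max(1, int(len(documents) * fraction))
    return rng.sample(documents, k)

def merge_ground_truth(auto_records, human_records):
    """Combine auto-generated and human-labeled records into one curated
    dataset; a human label overrides an auto label for the same question."""
    merged = {r["question"]: r for r in auto_records}
    merged.update({r["question"]: r for r in human_records})
    return list(merged.values())

docs = [f"doc-{i}" for i in range(50)]
subset = sample_for_ground_truth(docs, fraction=0.2)
print(len(subset))  # number of documents selected for ground-truth generation
```

Sampling with a fixed seed keeps the monitored subset stable across runs, so metric drift reflects the model rather than the evaluation data.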
In this article, we use the metrics provided by RAGAS to compare the performance of a reranker-based RAG pipeline against a regular embedding-based RAG pipeline. Our current use…
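To make the comparison concrete, here is a minimal sketch of an evaluation record in the schema RAGAS expects (question, answer, contexts, ground_truth) plus a per-metric score comparison. All text values and scores below are hypothetical placeholders, not measured results from the pipelines.

```python
def make_record(question, answer, contexts, ground_truth):
    """One evaluation row; RAGAS consumes lists of such rows as a dataset."""
    return {"question": question, "answer": answer,
            "contexts": contexts, "ground_truth": ground_truth}

def compare_scores(a, b):
    """Per metric shared by both result dicts, report which pipeline scored
    higher; RAGAS metrics are on a 0-1 scale, higher is better."""
    return {m: ("reranker" if a[m] > b[m]
                else "embedding" if a[m] < b[m] else "tie")
            for m in a.keys() & b.keys()}

record = make_record(
    question="What is retrieval-augmented generation?",
    answer="RAG combines document retrieval with LLM generation.",
    contexts=["RAG retrieves relevant passages and feeds them to an LLM."],
    ground_truth="RAG augments LLM answers with retrieved documents.",
)

# Hypothetical RAGAS scores for the two pipelines being compared.
reranker = {"faithfulness": 0.92, "context_precision": 0.90, "context_recall": 0.85}
embedding = {"faithfulness": 0.89, "context_precision": 0.81, "context_recall": 0.86}
print(compare_scores(reranker, embedding))
```

In a real run, the score dicts would come from `ragas.evaluate` over the full evaluation dataset; the comparison logic stays the same.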