Detect LLM Hallucinations in CI / CD: Evaluate your RAG pipelines using GitHub Actions + Athina / Ragas

Himanshu Bamoria · Athina AI · Apr 9, 2024

If you’ve ever worked on coding projects, you know how important it is to make sure your code is solid before showing it to the world.

That’s where CI/CD pipelines come into play. They’re like your coding safety net, catching bugs and problems automatically.

So why not have the same process for your LLM pipeline?

The best teams implement an evaluation step as part of their CI/CD pipeline for their RAG applications.

This makes a lot of sense — LLMs are unpredictable at best, and tiny changes in your prompt or retrieval system can throw your whole application out of whack.

Athina can help you detect mistakes and hallucinations in your RAG pipeline with a really simple integration. We’re going to walk you through how to set this up using GitHub Actions.

You can use Athina evals in your CI/CD pipeline to catch regressions before they get to production.

Here is a guide for setting up athina-evals in your CI/CD pipeline.

All code described here is also present in our GitHub repository.

GitHub Actions

We’re going to use GitHub Actions to create our CI/CD pipelines. GitHub Actions allows us to define workflows that are triggered by events (pull request, push, etc.) and execute a series of actions.

Our GitHub Actions workflows are defined under our repository’s .github/workflows directory.

We have defined a workflow for the evals, too. The workflow file is named athina_ci.yml.

The workflow is triggered on every push to the main branch.
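A minimal version of that workflow could look like the sketch below. The runner, Python version, and dependency installation step are assumptions for illustration; match them to whatever the repository actually pins.

```yaml
# .github/workflows/athina_ci.yml
name: Athina Evals

on:
  push:
    branches: [main]

jobs:
  run-evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install athina   # the athina-evals SDK
      - name: Run Athina evals
        run: python run_athina_evals.py
```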

Athina Evals Script

The run_athina_evals.py script is the entry point for our Athina evals. It is a simple script that uses the Athina Evals SDK to evaluate and validate the RAG application.

For example, we test whether the response from the RAG application answers the query, using the DoesResponseAnswerQuery evaluation from the athina-evals SDK.
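A single-datapoint check might look something like this minimal sketch. The query/response pair is hard-coded here to keep the example self-contained (in the real script it would come from your RAG application), and the exact result fields should be verified against the athina-evals version pinned in the repository.

```python
# run_athina_evals.py — a minimal sketch of a single-datapoint eval
import os

from athina.evals import DoesResponseAnswerQuery
from athina.keys import AthinaApiKey, OpenAiApiKey

# API keys come from the environment (set via GitHub Secrets in CI)
OpenAiApiKey.set_key(os.environ["OPENAI_API_KEY"])
AthinaApiKey.set_key(os.environ["ATHINA_API_KEY"])

# In practice the response would come from your RAG application;
# a hard-coded pair is shown here for illustration.
query = "What is the capital of France?"
response = "The capital of France is Paris."

# Run the DoesResponseAnswerQuery evaluator and collect the result
result_df = DoesResponseAnswerQuery().run(query=query, response=response).to_df()
print(result_df)

# Fail the CI job if the evaluation did not pass
# (the result column names depend on the athina-evals version; check the repo)
if result_df["failed"].any():
    raise SystemExit("DoesResponseAnswerQuery failed: response does not answer the query")
```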

You can also load a golden dataset and run the evaluation on it.
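For instance, a golden dataset stored as JSON in the repository could be loaded with the SDK’s RagLoader. This is a sketch; the loader name, expected record fields, and file path are assumptions to verify against the athina-evals docs.

```python
from athina.evals import DoesResponseAnswerQuery
from athina.loaders import RagLoader

# Each record in the JSON file is expected to carry fields such as
# query, context, and response.
dataset = RagLoader().load_json("golden_dataset.json")

# Run the evaluator across every row of the golden dataset
results_df = DoesResponseAnswerQuery().run_batch(data=dataset).to_df()
print(results_df)
```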

You can also run a suite of evaluations on the dataset.
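Running a suite follows the same pattern. The sketch below assumes the SDK’s EvalRunner interface and a handful of its built-in RAG evaluators; confirm the names against the installed version.

```python
from athina.evals import (
    ContextContainsEnoughInformation,
    DoesResponseAnswerQuery,
    Faithfulness,
)
from athina.loaders import RagLoader
from athina.runner.run import EvalRunner

dataset = RagLoader().load_json("golden_dataset.json")

# Run several evaluators over the same dataset in one pass
EvalRunner.run_suite(
    evals=[
        DoesResponseAnswerQuery(),
        ContextContainsEnoughInformation(),
        Faithfulness(),
    ],
    data=dataset,
    max_parallel_evals=2,
)
```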

Secrets

We are using GitHub Secrets to store our API keys.

We have two secrets, OPENAI_API_KEY and ATHINA_API_KEY.

You can add these secrets to your repository by navigating to Settings > Secrets > New repository secret.
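Once the secrets are added, the workflow can expose them to the eval step as environment variables, for example:

```yaml
# In athina_ci.yml, pass the repository secrets to the eval step
- name: Run Athina evals
  run: python run_athina_evals.py
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    ATHINA_API_KEY: ${{ secrets.ATHINA_API_KEY }}
```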

Further reading

We have more examples and details in our GitHub repository.

Alright, we’ve covered how to add Athina to your CI/CD pipeline with GitHub Actions. With this simple addition, you can catch regressions and hallucinations before your AI application goes live.

If you’re interested in continuous monitoring and evaluation of your AI in production, we can help.

Watch this demo video of Athina’s platform, and feel free to schedule a call with us if you’re interested in setting up safety nets for your LLM.
