A gentle journey to LLMOps: Zilliz Advent of Code

Michael Romagne
13 min read · Jan 3, 2024


December is the month of great presents and chocolate, but also the month of the Advent of Code. The Advent of Code is a good opportunity to sharpen your coding skills or start learning a new programming language in a playful setting. The difficulty increases each day throughout the month, and only a few contestants make it to the end.

While the classical Advent of Code is great, I wanted to adapt this 24-day challenge to enhance my skills in the MLOps / LLMOps field. It turns out that I stumbled upon Zilliz’s creation on December 3rd, at a time when my hopes for this challenge were at their lowest. The goal of Zilliz’s Advent of Code was to encourage developers to get more involved in the open-source community by testing one project per day… and they had already made the selection! The task was to complete at least the “getting started” of each project and, for additional points, contribute to the project. Great, I had no excuse not to do it, so let’s discover new lands!

I will spare you the details of the competition and go straight to the point: the best tools I discovered this month. Then I would like to raise a question to explore in the future: would I replace some of the industry-standard tools with emerging ones? That is the whole point of this Advent of Code: to spark curiosity, encourage critical questioning, and drive deeper engagement with projects through practical use or contribution.

A Glimpse into the Best of Open Source LLMOps

This section highlights some of the most notable tools in LLMOps, at least among the ones I have explored so far. Almost all of them are open source, and the ones that are not (Mendable, Prodigy, Nvidia TensorRT) serve as references for other open-source tools.

It’s challenging to categorize each tool, as some of them, such as ClearML, go far beyond their primary focus. My goal is to draw parallels between well-established industry standards and rising stars in the field, inspiring you to explore innovative alternatives.

Special mentions

Within the LLMOps tool map above, some libraries show innovative functionalities exceptional enough to be detailed in this section.

1. Is Pachyderm better than DVC?
https://github.com/pachyderm/pachyderm

I have been using DVC for quite some time now, so I was eager to discover Pachyderm and see how the two tools differ. In the end, Pachyderm is a more complex solution than DVC. Its biggest advantages over DVC are, without a doubt, autoscaling, data parallelization, and orchestration. The UI is also a big plus.

But complexity also means setup issues. I had problems installing it on minikube, and I saw several posts stating that it’s pretty complicated to maintain, but worth it given the additional features it provides.

So in the end, I think that as a data science team grows and workload complexity increases, Pachyderm is the way to go. DVC, on the other hand, is much lighter and dedicated to versioning data, but not much more. It is more streamlined and suitable for simple or medium-scale projects, even extending into some production scenarios. The choice between them should be based on the specific requirements and scale of the data project: do you need autoscaling, scheduling, or parallelized processing?
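
To illustrate how lightweight DVC is, here is a minimal sketch using its Python API to read a versioned file as it existed at a given Git revision; the file path and revision name are hypothetical placeholders.

```python
import dvc.api

# Open a DVC-tracked file at a specific Git revision.
# "data/train.csv" and "v1.0" are hypothetical placeholders.
with dvc.api.open("data/train.csv", rev="v1.0") as f:
    print(f.readline())  # peek at the first line of that version
```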

2. BentoML: The go-to library to deploy machine learning models
https://github.com/bentoml/BentoML

If you want to standardize your model deployments and go to production in a blink, BentoML is here for you. Its concepts are clear: you define a service and your runners, and your API is up and running with good performance. It supports adaptive batching and multithreading out of the box, with integrations for Triton and vLLM. I already use it on a day-to-day basis; it saves a lot of time and allows small teams to deploy their services without needing more developers.
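
To give an idea of the service/runner pattern, here is a minimal sketch assuming a scikit-learn model was previously saved to the local BentoML store under the hypothetical tag iris_clf:latest.

```python
import bentoml
from bentoml.io import JSON

# Load a saved model from the local store and wrap it in a runner.
# "iris_clf:latest" is a hypothetical tag; save a model first with
# bentoml.sklearn.save_model("iris_clf", model).
runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

svc = bentoml.Service("iris_classifier", runners=[runner])

@svc.api(input=JSON(), output=JSON())
async def classify(payload: dict) -> dict:
    # The runner transparently applies adaptive batching across requests.
    result = await runner.predict.async_run([payload["features"]])
    return {"prediction": int(result[0])}  # assumes integer class labels
```

You can then serve it locally with `bentoml serve service:svc` and get an HTTP API with OpenAPI docs out of the box.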

Now, with the OpenLLM library, you can pick trending LLMs on Hugging Face and deploy them as you would any other model. The vLLM integration is there to enhance inference performance. However, the biggest drawback for me right now is the lack of support for fine-tuned model deployments. I think ONNX serving is still not supported by vLLM and is not a clear point on the roadmap, although it would be highly beneficial.

This library is very quick to test and will save you a lot of time, especially in small teams.

3. Phoenix: UMAP visualization off the shelf
https://github.com/Arize-ai/phoenix

I was amazed by Phoenix’s embedding analysis feature. Based on UMAP, it helps you identify clusters and dig into them to interpret drift or compare different model versions.
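
It takes surprisingly little code to get there. Here is a hedged sketch of launching Phoenix on a dataframe of inferences with precomputed embeddings; the file path and column names are hypothetical placeholders for your own data.

```python
import pandas as pd
import phoenix as px

# Hypothetical production inferences with precomputed embedding vectors.
df = pd.read_parquet("inferences.parquet")

schema = px.Schema(
    prediction_label_column_name="prediction",
    actual_label_column_name="label",
    embedding_feature_column_names={
        "text_embedding": px.EmbeddingColumnNames(
            vector_column_name="embedding",  # array column of floats
            raw_data_column_name="text",     # raw text behind each point
        )
    },
)

# Launch the local Phoenix app; the UMAP projection is computed for you.
session = px.launch_app(primary=px.Dataset(df, schema, name="production"))
print(session.url)
```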

It’s also possible to debug your RAG application with traces, which help you understand the internals of your application (search and retrieval in vector stores, embedding generation, external tools…).

It’s possible to go further by visualizing your vector store. You just have to upload a corpus of your knowledge base along with your LLM application’s inferences, which helps you troubleshoot hard-to-find retrieval bugs: gaps in your documentation, queries that produced bad responses, or failures to retrieve relevant context. It’s a perfect tool for anyone leveraging RAG.

It helps so much to understand what is going on with such complex models!

4. Ollama & Quivr: Build your personal assistant in total privacy
https://github.com/StanGirard/quivr

The experience with these two tools was so smooth, it was truly impressive. Have a 7B Llama 2 model running on your laptop in 5 minutes, sign up in Quivr, and query your own data in another 5 minutes, all while keeping your data in your local environment. Like all productivity tools, if you take the time to set it up well, it can bring you a lot. Create a few brains and be convinced!
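
To give a feel for how simple the local setup is, here is a minimal sketch querying Ollama’s local REST API from Python, assuming the server is running on its default port and the llama2 model has been pulled.

```python
import requests

# Assumes `ollama pull llama2` was run and the Ollama server is up
# (it listens on localhost:11434 by default).
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
)
print(response.json()["response"])  # the model's full completion
```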

5. ClearML: The most advanced experiment tracking tool, and much more
https://github.com/allegroai/clearml

Every data scientist has heard about MLflow. It was probably the first standard for experiment tracking and model management, and it is used in many training courses. This tool definitely has a lot of advantages, especially its simplicity, but I would recommend ClearML over MLflow any day.

ClearML automates and streamlines ML workflows, offering experiment tracking with a great UI, MLOps orchestration, data management, and optimized model serving. It integrates easily into existing scripts, supports scalable and reproducible workflows, and features a cloud-ready model serving solution, optimized for GPU and backed by Nvidia Triton. Like MLflow, but on steroids.
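
Getting started really is a two-line change to an existing script. A minimal sketch (the project and task names are hypothetical):

```python
from clearml import Task

# One call is enough: ClearML auto-logs framework calls, console output,
# installed packages, and metrics for this run.
task = Task.init(project_name="demo-project", task_name="baseline-training")

# ... your usual training code goes here ...
task.get_logger().report_scalar("accuracy", "val", value=0.91, iteration=1)
```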

6. Use DVC with Streamlit for quick and robust experimentation
https://www.sicara.fr/blog-technique/dvc-streamlit-webui-ml

If you want a solid experiment setup very quickly and don’t want to bother with the ClearML setup, have a look at the great article from Sicara above: DVC pipelines to run reproducible, tracked experiments, with Streamlit for visualization.
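
In that spirit, here is a hedged sketch of a tiny Streamlit page comparing a metric across Git revisions tracked with DVC; the revision names and the metrics.json schema are hypothetical.

```python
import json

import dvc.api
import streamlit as st

# Compare a metric across two Git revisions, assuming each revision's
# DVC pipeline wrote a metrics.json file at the repo root.
st.title("Experiment comparison")
for rev in ["main", "exp-lr-0.01"]:  # hypothetical revision names
    metrics = json.loads(dvc.api.read("metrics.json", rev=rev))
    st.metric(label=f"accuracy @ {rev}", value=metrics["accuracy"])
```

Run it with `streamlit run compare.py` from inside the repository.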

An example of an efficient Open Source MLOps stack

After seeing this tool map, you may be curious about how these tools can be integrated into a robust open-source MLOps stack. This illustration showcases a potential setup for a Data Scientist / Machine Learning Engineer, utilizing an optimal combination of these tools in their working environment.

In data science projects, the workflow typically divides into two parts: experimentation and model deployment. These phases operate in tandem, alternating and informing each other continuously, rather than following a linear, step-by-step progression.

  1. Experimentation
  • Provision instances on AWS (or any other cloud provider) using Skypilot
  • Run your model training pipeline using DVC
  • Visualize results with Streamlit
  2. Model deployment
  • Optimize your deep learning models (if applicable) with ONNX Runtime in a DVC pipeline stage (with data versioning); see the sketch after this list
  • Package your model with BentoML, again in another DVC pipeline stage
  • Containerize your application within a GitLab CI pipeline using DVC and BentoML
  • Release your BentoML Docker image using Helm and Argo CD on an AWS EKS cluster (or any other environment)
  • Monitor your deployment with Grafana and Streamlit
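
For the optimization step, here is a minimal sketch of loading an exported model with ONNX Runtime and running an inference; the model path and input shape are hypothetical placeholders.

```python
import numpy as np
import onnxruntime as ort

# Load a model previously exported to ONNX ("model.onnx" is a placeholder)
# and run it on the CPU execution provider.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed input shape
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```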

In an upcoming article, I will go into more detail about this Open Source Stack, with a focus on technical aspects. For more information on all the libraries in the MLOps tool map mentioned above, please refer to the Appendix section.

Conclusion

As we conclude this journey through the landscape of LLMOps, I hope this article broadened your technical horizons and made you want to contribute to the open-source community, just as the Advent of Code event did for me. This exploration sparked my curiosity to further investigate how emerging technologies may replace or complement established industry standards.
A huge thanks to Zilliz for organizing this event!

For those who have tools or practices that are standard in production but weren’t covered here, I am always open to suggestions and learning more. Please feel free to share your thoughts or recommend tools that you think are essential. You can find a compilation of my explorations in my GitHub repo at https://github.com/michaelromagne/advent-of-code-submissions-2023.

Appendix: Details on libraries

Performance testing

LLMs raise more evaluation questions than any other kind of model in the past, especially regarding performance rankings and fairness. Performance testing can mean a lot of things: you can check that the model is not too biased, verify that predictions are accurate on some data samples (and relabel them if not), visualize the latent space to understand it better, or simply test your deployed service under increasing load.

Fairness

  • Giskard AI: This library automatically detects vulnerabilities in your ML models. The team has developed numerous test suites for all kinds of models, from classical ones on tabular data to cutting-edge LLMs. With these tests, you can tackle issues from robustness to ethical bias, and let end users send feedback (see the sketch after this list).
    https://github.com/Giskard-AI/giskard
  • Trulens: Less popular than Giskard, TruLens is still worth a try if you are working with LLMs, as it specializes in this area.
    https://github.com/truera/trulens
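
Here is a hedged sketch of Giskard’s scan on a small scikit-learn classifier; the toy dataset stands in for your own, and the exact wrapper arguments may vary across Giskard versions.

```python
import giskard
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a toy classifier so there is something to scan.
X, y = load_iris(return_X_y=True, as_frame=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Wrap the data and the prediction function for Giskard's test suites.
dataset = giskard.Dataset(X.assign(target=y), target="target")
model = giskard.Model(
    model=clf.predict_proba,
    model_type="classification",
    classification_labels=list(clf.classes_),
    feature_names=list(X.columns),
)

report = giskard.scan(model, dataset)  # robustness, bias, leakage checks...
report.to_html("scan_report.html")
```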

Load testing

  • Grafana k6: An open-source load testing tool that specializes in simulating high-load scenarios and performance benchmarking, using JavaScript for scripting and offering native support for various protocols.
    https://github.com/grafana/k6
  • Locust: Also an open-source load testing tool, renowned for its user-behavior simulation capabilities and scripting flexibility in Python, though it may demand more customization for complex load testing scenarios compared to k6 (see the sketch after this list).
    https://github.com/locustio/locust
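
Since Locust scenarios are plain Python, here is a minimal sketch that load-tests a hypothetical /predict endpoint of a deployed model service.

```python
from locust import HttpUser, between, task

class ModelUser(HttpUser):
    # Each simulated user waits 1 to 3 seconds between requests.
    wait_time = between(1, 3)

    @task
    def predict(self):
        # "/predict" and the payload are hypothetical; adapt to your API.
        self.client.post("/predict", json={"features": [5.1, 3.5, 1.4, 0.2]})
```

Run it with `locust -f locustfile.py --host http://localhost:3000` and ramp up users from the web UI.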

Labeling tool

  • Prodigy: Prodigy is a data annotation tool designed for efficient and flexible annotation workflows, emphasizing active learning and custom scriptability, making it suitable for rapidly creating high-quality labeled datasets for machine learning models. However, it’s not open source.
    https://prodi.gy/
  • Label Studio: Label Studio is an open-source data labeling platform with a user-friendly interface, designed for scalable and collaborative annotation tasks across various data types like text and images.
    https://github.com/HumanSignal/label-studio

Explainability

  • Arize AI — Phoenix: Phoenix by Arize AI is a comprehensive machine learning monitoring and observability platform that offers advanced features such as embedding visualization, empowering users to effectively track, interpret, and optimize the performance of their production LLMs in real time.
    https://github.com/Arize-ai/phoenix

Build LLM Applications

RAG

  • Quivr: This open-source library brings the power of generative AI to a local machine in less than 10 minutes. Based on RAG, the user can create multiple “Brains”, or personal productivity assistants, to query their own data. Multiple integrations are available, and you can couple it with Ollama to get a fully private personal assistant.
    https://github.com/StanGirard/quivr
  • Mendable: A platform designed to enhance customer and employee support by deploying RAG pipelines that retrieve technical knowledge to answer questions, thereby reducing the need for direct team involvement. This efficient solution streamlines information access and improves overall response times, but it is not open source.
    https://www.mendable.ai/

Create LLM Chains

  • Haystack by Deepset: Haystack is an LLM orchestration framework to quickly build custom LLM applications. You can connect components (models, vector DBs, file converters) into pipelines or agents that interact with your data, which is stored in DocumentStores. Haystack already proposes 31 integrations as of December 2023, including Hugging Face, a Notion extractor, Milvus, FAISS, Chroma DB…
    https://github.com/deepset-ai/haystack
  • Langchain: Langchain is designed to cut down on the heavy lifting when integrating AI into your projects. It handles a lot of the common tasks with AI models, like managing conversation memory and connecting various AI tasks seamlessly. Plus, it offers a range of ready-to-use prompts, which is a huge time-saver. In a few minutes, you are able to build powerful RAG pipelines or conversational bots, which is awesome (see the sketch after this list).
    https://github.com/langchain-ai/langchain
  • Llama Index: Llama Index is an open-source data framework that combines Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs), providing the indexing and retrieval primitives needed to build context-aware applications over your own data.
    https://github.com/run-llama/llama_index
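
As an example of the boilerplate these frameworks remove, here is a minimal LangChain chain in the LCEL style of late 2023; it assumes an OPENAI_API_KEY environment variable is set.

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

# Prompt -> model -> parser, composed with the | operator (LCEL).
prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chain = prompt | ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser()

print(chain.invoke({"text": "LLMOps adapts MLOps practices to LLM systems."}))
```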

Ollama: Run Llama locally

  • Ollama is an open-source framework that enables users to run Llama 2 and other large language models locally on various platforms such as macOS, Windows, Linux, and Docker. It offers a comprehensive library of open-source models, allowing users to easily customize and manage them. This library is so good!
    https://github.com/jmorganca/ollama

Vector Databases

The vector database ecosystem is very active. Here are four open-source vector DBs I tested so far. They are built to power embedding similarity search and AI applications, particularly RAG. These projects have different ages, Milvus being one of the oldest and most robust vector databases. It’s resilient, battle-tested by the thousands of companies relying on it, and its architecture is distributed and fine-tuned for high throughput.
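
Since Milvus is the one I spent the most time with, here is a hedged sketch using pymilvus to create a collection with a vector index; the collection name, embedding dimension, and index parameters are illustrative assumptions, and a standalone Milvus instance is assumed on its default port.

```python
from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections,
)

# Connect to a Milvus standalone instance (default port 19530).
connections.connect(host="localhost", port="19530")

# Define a collection of embeddings; dim=384 is an assumed embedding size.
fields = [
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=384),
]
collection = Collection("docs", CollectionSchema(fields))

# Build an index so similarity search stays fast at scale.
collection.create_index(
    "embedding",
    {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}},
)
```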

Orchestrators

An orchestrator is an essential component of any MLOps stack, as it is responsible for running your machine learning pipelines. To do so, the orchestrator provides an environment set up to execute the steps of your pipeline. It also makes sure that each step only gets executed once all of its inputs (which are outputs of previous steps of your pipeline) are available. A good ML orchestrator should provide DAGs (Directed Acyclic Graphs) describing workflows, scheduling to run workflows on time, and monitoring of these workflow executions.

The orchestrator game is highly competitive, and my reference for now is still Apache Airflow. It stands out for its robust scheduling and execution of complex data processing workflows. Airflow’s use of Python for pipeline construction offers unparalleled flexibility and simplicity, making it particularly appealing in scenarios requiring dynamic generation of workflows based on changing data or external events. Moreover, its active community and frequent updates ensure it remains a top choice for scalable and maintainable data engineering solutions.
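
To make the DAG idea concrete, here is a minimal sketch of a daily Airflow DAG with a single Python task; the DAG id and callable are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def train() -> None:
    # Placeholder for the real training step of the pipeline.
    print("training the model...")

# A daily workflow with one task; Airflow handles scheduling,
# retries, and monitoring of each run.
with DAG(
    dag_id="train_model",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
):
    PythonOperator(task_id="train", python_callable=train)
```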

Model deployment

Model optimization

  • ONNX Runtime: ONNX Runtime is a production-grade AI engine that optimizes training and inference for latency, throughput, and memory across CPU, GPU, and NPU, with different techniques including quantization. It supports cross-platform deployment, including cloud, edge, and mobile.
    https://github.com/microsoft/onnxruntime
  • Nvidia TensorRT: In short, very similar to ONNX Runtime but specialized in fast inference exclusively on NVIDIA GPUs. It beats ONNX Runtime’s performance on these machines.
    https://github.com/NVIDIA/TensorRT

Inference servers

Model Serving

  • BentoML & OpenLLM: BentoML is a framework for building reliable, scalable, and cost-efficient AI applications. It comes with everything you need for model serving, application packaging, and production deployment. The BentoML team also developed OpenLLM, an open-source platform designed to facilitate the deployment and operation of large language models (LLMs).
    https://github.com/bentoml/BentoML
    https://github.com/bentoml/OpenLLM

Others

Data versioning

  • DVC: This open-source library integrates with Git for ML experiment management, allowing versioning of data and models in cloud storage. It enables fast pipeline iterations, local experiment tracking, and supports data comparison and sharing. DVC works with multiple remote storage platforms, maintaining a seamless user experience.
    https://github.com/iterative/dvc
  • Pachyderm: Pachyderm is a robust data engineering tool operating within a Kubernetes cluster, designed for automating complex pipelines and handling sophisticated data transformations across various data types.
    https://github.com/pachyderm/pachyderm

Infrastructure provisioning

  • Skypilot: SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, the highest GPU availability, and managed execution. It abstracts away the cloud infrastructure burden for Data Scientists, ML Engineers, and more, and brings a lot of productivity (see the sketch after this list)!
    https://github.com/skypilot-org/skypilot
  • Okteto: Okteto enhances Kubernetes application development by letting you write code locally with instant updates to your Kubernetes applications. It replaces traditional deployments with a development container, maintaining original configurations and offering features like bidirectional file synchronization, automatic port forwarding, and an interactive terminal, thereby accelerating the development workflow. A tool that should be tested by any Kubernetes user.
    https://github.com/okteto/okteto
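
For SkyPilot, the usual entry point is a YAML task file plus `sky launch`, but there is also a Python API; here is a hedged sketch of it, assuming cloud credentials are configured and using a hypothetical training script.

```python
import sky

# Define a task: setup commands plus the command to run on the cluster.
# "train.py" is a hypothetical script; the resources are illustrative.
task = sky.Task(
    setup="pip install -r requirements.txt",
    run="python train.py",
)
task.set_resources(sky.Resources(accelerators="V100:1"))

# Provision the cheapest matching instance and run the task on it.
sky.launch(task, cluster_name="train-cluster")
```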

Data Visualization

Experiment tracking

  • ClearML: ClearML automates and streamlines ML workflows, offering experiment tracking, MLOps orchestration, data management, and optimized model serving. It integrates easily into existing scripts, supports scalable and reproducible workflows, and features a cloud-ready model serving solution, optimized for GPU and backed by Nvidia Triton. Like MLflow, but on steroids.
    https://github.com/allegroai/clearml
  • MLflow: Maybe the first standard for experiment tracking and model management. This tool definitely has a lot of advantages, especially its simplicity, but I would highly recommend ClearML instead because it’s much more advanced.
    https://github.com/mlflow/mlflow
