The best orchestration tool for MLOps: a real story about difficult choices

Exness Tech Blog · Mar 10, 2023

by Danil Kuznetsov and Natalia Tsarkova.

Introduction

Machine learning has a wide range of possible applications in almost every industry. Model architecture, metric improvement, and computational optimization have always been at the center of attention. At the same time, machine learning has not yet gone through the standardization process that software development has over the past decades. To this day, the field of machine learning has no single, generally accepted approach to the practical use of models.

Sooner or later, every company that develops models and deploys them to production faces the question of automating its machine learning processes and ensuring the reproducibility of results. Pipelines can be used to automate regular model prediction, retraining, monitoring, and more. An ML pipeline usually has several stages, such as data preprocessing, model training, metrics calculation, and prediction. All components of a machine learning pipeline must be executed, or orchestrated, in the correct order. A solution that provides these capabilities is called an orchestrator.

The general idea of an orchestrator is to provide a centralized place for repeatable and reproducible pipelines, with features like scheduling, run monitoring, and relaunching of failed workflows. Workflows can be modeled as directed acyclic graphs, or DAGs, where nodes represent stages and edges represent the dependencies between them. An orchestration tool usually doesn't do the hard work of transforming and processing data itself; it tells other systems and frameworks what to do and monitors the status of the execution.

Choosing the right tools for orchestrating pipeline steps is difficult due to their sheer variety. We’ve narrowed our choice down to three: Kubeflow Pipelines, Airflow, and Prefect.

In this article, we'll share our overview of each tool, explain our selection process, and try to answer the questions that inevitably arose in our practice. We'll focus on the issues we haven't found solutions for and avoid those already covered in depth elsewhere.

The problem

Our team has been using Kubeflow Pipelines. We chose Kubeflow because we already had Kubernetes in place and colleagues with experience setting up both Kubeflow and Kubernetes. As often happens, those employees moved on to other projects, and at one point we began to wonder: is there a tool that would suit our needs better than Kubeflow?

We didn't like several things about Kubeflow. First, every pipeline component requires a .yaml file, and the pattern for writing them was not clear to us. Second, it is impossible to launch a Kubeflow pipeline locally without first installing Kubernetes and building an image with your code, which significantly slowed down debugging. Third, the CI/CD process that built Docker images and deployed the pipeline to the Kubeflow cluster took up to 10 minutes, depending on the GitLab workload. Since CI/CD was needed to apply even the slightest change to model code, this could add up to hours of waiting time.

In our case, we only need to handle machine learning workflows. At the same time, our models differ in architectural complexity and need to work with large amounts of data.

The obvious alternative was Airflow, one of the most popular and widely used orchestration tools. We also considered Prefect as a competitor, since it positions itself as a replacement for Airflow: it tries to avoid many of Airflow's shortcomings and aims to be more flexible and easier to use.

Let’s have a look at each of the tools and their key features.

Orchestrators overview

As a measure of each tool's popularity, we can use the number of GitHub stars shown below:

[Chart: GitHub star history for Airflow, Kubeflow, and Prefect. Image source: star-history.com]

Airflow has been one of the most popular orchestrating tools for several years.

Airflow

Airflow is a platform for creating, monitoring, and orchestrating pipelines. This open-source Python project was created at Airbnb in 2014 and has been developed by the Apache Software Foundation since 2016.

Apache Airflow lets you design, schedule, and monitor complex workflows. It can run Python code, run jobs on a Hadoop cluster, execute Bash commands, launch Docker containers and Kubernetes pods, query your database, and much more. Airflow is used to orchestrate pipelines in many fields, not just ML. The most common example is a data pipeline that extracts data from one source, transforms it, and saves it to another destination, such as a database.
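To make this concrete, here is a minimal DAG sketch in Airflow 2.x style; the task names, commands, and schedule are illustrative rather than taken from a real pipeline:

```python
# A minimal Airflow DAG sketch (Airflow 2.x style); names and schedule are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def train_model():
    print("training the model...")


with DAG(
    dag_id="demo_ml_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo 'extracting data'")
    train = PythonOperator(task_id="train", python_callable=train_model)
    extract >> train  # ">>" declares a dependency edge of the DAG
```

Dropping a file like this into the configured DAGs folder is enough for Airflow to discover and schedule it.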

As of 2023, Airflow is the most popular orchestrator on the market.

Kubeflow Pipelines

Kubeflow is a free and open-source machine learning platform that allows pipelines to be deployed to Kubernetes. Kubeflow was released in 2017 by developers from Google, Cisco, IBM, Red Hat, and other tech companies. It runs on top of Kubernetes, is fully integrated with it, and takes advantage of cluster features like scalability and flexible resource control. The "Kube" in Kubeflow refers to Kubernetes, the orchestration tool for containerized applications; the "Flow" places Kubeflow alongside other workflow schedulers such as MLflow and Airflow. Kubeflow positions itself primarily as a tool for organizing ML pipelines.
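For a sense of what defining a pipeline looks like in code, here is a minimal sketch using the v1-style kfp SDK; the component, names, and base image are illustrative. The compiled .yaml package is what gets uploaded to the cluster:

```python
# A minimal Kubeflow Pipelines sketch (kfp v1-style SDK); names are illustrative.
import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def train(epochs: int) -> str:
    return f"trained for {epochs} epochs"


# Wrap the plain function as a containerized pipeline component
train_op = create_component_from_func(train, base_image="python:3.9")


@dsl.pipeline(name="demo-pipeline", description="A minimal example pipeline")
def demo_pipeline(epochs: int = 10):
    train_op(epochs)


# Compile to a .yaml package that can be uploaded to a Kubeflow cluster
kfp.compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")
```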

Prefect

Prefect is a comparatively new but promising orchestration tool that appeared in 2018. It positions itself as a replacement for Airflow, featuring greater flexibility and simplicity. It is an open-source project; however, there is also a paid cloud version for tracking workflows.

What attracted us to Prefect was the ability to easily build a pipeline from plain functions and to debug and launch it locally without any complicated environment setup. The "hello world" pipeline took us only a few lines of code, and the setup required nothing more than installing the Prefect Python library with pip.
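For reference, a "hello world" flow in Prefect 1.x-style syntax (the major version current when we ran this evaluation) looks roughly like this:

```python
# A minimal "hello world" flow in Prefect 1.x style; names are illustrative.
from prefect import Flow, task


@task
def say_hello():
    print("Hello, world!")


with Flow("hello-flow") as flow:
    say_hello()

flow.run()  # runs locally: no server, cluster, or container required
```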

Our approach

To identify the positives and negatives, we tried to implement one of our pipelines with each tool, both locally and on the server.

The results and conclusions of this experiment are described below.

Advantages

First of all, let's consider the strengths of each tool from a practical-usage perspective.

Airflow:

  • The interface is very intuitive and easy to use, with various visualizations and logging of the process.
  • Organization of experiments, runs, and pipelines in the web UI: all launches and activities live inside their pipeline (less visual congestion compared to Kubeflow).
  • Operates in the given environment and doesn't require Docker images for pipelines, so there is no need for complex CI/CD. Pipelines are also registered automatically.
  • Airflow is a mature and popular instrument, and its large community provides documentation and support; answers to most questions can easily be found online.
  • Airflow integrates with many tools through Operators, such as MySqlOperator, BashOperator, and SparkSubmitOperator (illustrated in the sketch below).
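As a hedged illustration of the Operator pattern, here is what a MySQL task might look like inside a DAG (it requires the apache-airflow-providers-mysql package; the connection ID, table names, and SQL are hypothetical):

```python
# Hedged sketch of an Airflow provider operator; connection and SQL are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.mysql.operators.mysql import MySqlOperator

with DAG(
    dag_id="features_dag",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_features = MySqlOperator(
        task_id="load_features",
        mysql_conn_id="analytics_db",  # a connection configured in Airflow
        sql="INSERT INTO features SELECT * FROM staging_features;",
    )
```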

Kubeflow:

  • Kubeflow provides a lot of flexibility in organizing pipelines and their versions: there can be several versions of the same pipeline, and all of them are stored, not only the latest one. It is easy to roll back to any previous version.
  • Support for custom environments for different pipelines thanks to containerization. This is very helpful when several ML models use different Python or package versions.
  • Each machine learning project can be a separate Git repository with CI/CD that registers pipelines on Kubeflow.
  • And last but not least: in our case, there was no need to migrate to another system.

Prefect:

  • Easy to set up locally and launch the “hello world” pipeline. Overall local debugging and execution are very convenient.
  • Smooth deployment: it is easy to create a new pipeline, as well as a new version of an existing one, and it requires only one line of code (see the sketch after this list).
  • Further interaction with the pipeline is also intuitive regarding running, scheduling, or changing the pipelines. Pipelines are grouped together according to the version. In general, the web interface is easy to use.
  • SSO login via email.
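To illustrate, in Prefect 1.x that single line is the flow's register call, continuing the "hello world" sketch from the overview above (the project name here is hypothetical):

```python
# Creates the flow on the Prefect backend, or a new version if it already exists.
flow.register(project_name="ml-pipelines")  # project name is hypothetical
```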

Weaknesses

Airflow:

  • The Airflow environment must contain all the libraries imported by all DAGs: without containerization, every Airflow pipeline runs in the same environment. This limits the use of exotic libraries or conflicting module versions across projects.
  • There is no versioning of pipelines, and the latest code is always used.
  • There is a strong limitation on code structure and organization: all DAG files must be in the same folder for Airflow to pick them up, which means all models must live in the same repository.

Kubeflow:

  • Local pipeline launch is only possible if you have Kubernetes installed on your machine. Setting up Kubernetes correctly, as well as maintaining it, is hard. Installing Kubeflow on top of it… Oh well.
  • Lack of structure in launches and pipelines: every run is presented as a line in one overall list of all runs, without any grouping, while recurring runs and pipelines are shown separately.
  • In general, the GUI is not intuitive, and simple operations require far too many actions. For example, updating a scheduled pipeline to the latest version requires creating a new pipeline, putting it on the schedule, and then deleting the old one.
  • Each pipeline update requires CI/CD, including image rebuilding and pipeline re-registration, which can be time-consuming.
  • Sometimes the user wants to run workflows for a specified historical period, for example, when the pipeline failed several times last month and the failed runs need to be relaunched. In Airflow this process is known as backfill (available through the airflow dags backfill CLI command), while Kubeflow doesn't support automated pipeline runs for past periods.

Prefect:

  • If the flow is not packed into a single file, it is not possible to register it on one machine and launch it on another. All auxiliary files have to be placed into a Docker container, which usually requires configuring a CI/CD pipeline. All this makes Prefect difficult to use for complex pipelines involving auxiliary files.
  • It is a comparatively new tool without a developed, mature community yet. In fact, there are only two real sources of information: the Slack channel and the official documentation. Little else is available on the internet.
  • Prefect is currently moving from version 1 to version 2, and the timeline is not clear. As part of this transition, it may be necessary to migrate flows from version 1 to version 2.
  • The on-premise version has limited functionality: the full version requires a connection to the cloud. Using the cloud version could raise concerns within the security department; sometimes it is undesirable for the server to be connected to the internet or to pass production data (pipeline metadata, in this case) to a third party.

Summary

So, let's sum up the pros and cons of each tool:

  • Airflow: mature, popular, and well documented, with an intuitive UI and no need for Docker images or complex CI/CD; but all pipelines share one environment, there is no pipeline versioning, and all DAGs must live in one repository.
  • Kubeflow: pipeline versioning, isolated environments through containerization, and one Git repository per project; but hard to run locally, slow CI/CD for every change, an unintuitive UI, and no backfill.
  • Prefect: easy local setup, debugging, and deployment; but awkward handling of multi-file flows, a small community, an unclear version 1 to 2 transition, and a limited on-premise version.

Conclusion. What to use?

Prefect. If you would like an easy setup for a small project, Prefect is perfect. It is suitable if you work within a small team or have the budget for the cloud version.

Airflow. If you want to run a variety of tasks using a mature, well-developed tool, choose Airflow, even at the price of investing time in learning it.

Kubeflow. If you already use Kubernetes and want to run ML pipelines, Kubeflow is the right choice.

Post scriptum

We hoped that there would be a clear winner. But as we dove deeper into each orchestrator, we realized the choice wouldn't be easy. Ultimately, after researching how to overcome some of Kubeflow's inconveniences, we decided to continue using it. Even though the UI could use some improvements in clarity, we didn't want to give up the advantages of configured CI/CD and containerization, which allow us to use different environments. It is also convenient for us to develop each ML pipeline in a separate Git repository.

We hope that this comparison will help the reader: if not to make a choice, then at least to better understand the practical features of each of the considered orchestrators.

Please feel free to leave a comment if you have questions, stories to share about these orchestrators, or recommendations for other tools to try. Thank you very much!
