What to consider before choosing Argo Workflows?
To go full Kubernetes-native or not?
The recent explosion of task and data orchestration tools should make you wonder whether you’re still doing the right thing. Purely based on GitHub stars of the open-source frameworks, Airflow is still the most popular one. This does not take into account the popularity of closed-source or cloud vendor tools. Where these tools overlap or differ has been described fairly well by others (this one, or that one).
This article focuses on the least overlapping one: Argo Workflows.
Task orchestration
Imagine you are tasked with four items: cleaning data, training a model, evaluating the model, and using the model to make inferences on unseen data. In the beginning this happens ad hoc, because you’re the only one in the organization performing these tasks. This is fine, as you manage to deliver what the downstream consumer expects from you, and it required almost zero initial effort. Victim of your own success,
- your team grows,
- more similar use-cases are being worked out,
- and more teams and products will depend on you performing these tasks in a stable manner.
This is when a task orchestrator comes into play. You leverage this tool to model each task as a vertex (node) in a graph of tasks. An edge (arrow) represents an execution dependency. This type of graph is called a directed acyclic graph (DAG). You rely on your orchestrator to trigger and monitor these flows reliably.
Secondly, your orchestrator should be language and framework agnostic. You might start off with a Python-specific orchestrator (like Luigi) because it seems easy to get started with. Eventually there will be another “hot thing” that your orchestrator will need to support. Back to the drawing board… Running each task in a container on k8s offers this flexibility.
Thirdly, defining a workflow should be as light as possible for end-users. The cognitive load of reading (let alone maintaining) a 250-line DAG definition is not to be underestimated. Look for easy templating options to remove boilerplate configuration.
Using Argo Workflows
Argo Workflows is an open-source, container-native workflow engine for orchestrating parallel jobs on Kubernetes (k8s). Argo Workflows is implemented as a k8s custom resource definition (CRD). CRDs are used to define custom API objects, and allow for extending the vanilla k8s experience in a k8s-compliant fashion.
Argo Workflows is part of the Argo project, which offers a range of, as they like to call it, Kubernetes-native get-stuff-done tools (Workflows, CD, Events, Rollouts).
Users can interact with it through the Argo CLI, the UI, or via kubectl.
To get a better feel of what the end-user will be dealing with, let’s go over a few key concepts.
Core concepts
A Workflow is the most fundamental object. It both defines the workflow to be executed and stores the state of its execution. Consider it a dynamic, live object rather than a static definition.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-  # Name of this Workflow
spec:
  entrypoint: whalesay        # Will run "whalesay" step first
  templates:
    - name: whalesay
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["hello world"]
Although the basis will always be to run a container, there are other template types, which are divided into two groups.
Template definitions
These define actual work to be done in a step.
container — most popular type
script — templatable convenience wrapper for container

- name: gen-random-int
  script:
    image: python:alpine3.6
    command: [python]
    source: |
      import random
      i = random.randint(1, 100)
      print(i)
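A useful property of script templates is that whatever the script prints to standard output is captured in the template’s outputs.result, so a later step can consume it. A minimal sketch of consuming the random number above (the print-message template is hypothetical and assumed to exist elsewhere in the Workflow):

```yaml
- name: generate-and-print
  steps:
    - - name: generate
        template: gen-random-int
    - - name: print
        template: print-message          # hypothetical template taking a "message" parameter
        arguments:
          parameters:
            - name: message
              value: "{{steps.generate.outputs.result}}"  # stdout of gen-random-int
```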
resource — any operation on Kubernetes resources
The example here creates a ConfigMap.

- name: k8s-owner-reference
  resource:
    action: create
    manifest: |
      apiVersion: v1
      kind: ConfigMap
      metadata:
        generateName: owned-eg-
      data:
        some: value
suspend — more useful than you think

- name: delay
  suspend:
    duration: "20s"
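Why “more useful than you think”? If you omit the duration, the workflow suspends indefinitely until someone resumes it (for example with argo resume from the CLI), which gives you a simple manual approval gate:

```yaml
- name: approval-gate
  suspend: {}   # no duration: waits until resumed manually, e.g. `argo resume <workflow-name>`
```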
Template invocations
These invoke other templates, and typically define the structure of a workflow.
steps — define steps in a “list of lists” way
The example here runs step1 first, then step2a and step2b in parallel.

- name: hello-hello-hello
  steps:
    - - name: step1
        template: prepare-data
    - - name: step2a
        template: run-data-first-half
      - name: step2b
        template: run-data-second-half
dag — define steps as a dependency graph
The example here runs A first, then B and C in parallel, and finally D.

- name: diamond
  dag:
    tasks:
      - name: A
        template: echo
      - name: B
        dependencies: [A]
        template: echo
      - name: C
        dependencies: [A]
        template: echo
      - name: D
        dependencies: [B, C]
        template: echo
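The diamond example invokes an echo template that must be defined elsewhere in the same Workflow. A minimal version could look like this (image and command are illustrative):

```yaml
- name: echo
  container:
    image: alpine:3.7          # illustrative choice of image
    command: [echo, "hello"]   # each of A, B, C, D runs this container
```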
WorkflowTemplate to the rescue!
To reduce the number of lines in your Workflow YAML files, use a WorkflowTemplate. It allows for the reuse of common components, similar in spirit to the k8s-native podTemplate. The basic hello-world example then becomes:
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: workflow-template-submittable
spec:
  arguments:
    parameters:
      - name: message
        value: hello world
  templates:
    - name: whalesay-template
      inputs:
        parameters:
          - name: message
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["{{inputs.parameters.message}}"]
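Once the WorkflowTemplate is registered in the cluster, a Workflow can reference it instead of repeating the template definitions. In recent Argo versions this can be done with workflowTemplateRef, roughly like so:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-from-template-
spec:
  # Reuse the spec of the WorkflowTemplate defined above
  workflowTemplateRef:
    name: workflow-template-submittable
```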
More concepts and examples can be found in the documentation.
Cool, but why Argo Workflows and not just Airflow or something else?
Argo is designed to run on top of k8s. Not a VM, not AWS ECS, not Container Instances on Azure, not Google Cloud Run or App Engine. This means you get all the good of k8s, but also the bad.
If you are already quite invested in k8s, then it makes sense to first look at Argo. You will recognise all of the mechanisms known in vanilla k8s.
The good
- Resilience to container crashes and failures, inherited from k8s.
- Autoscaling and options to configure this. Simultaneously triggering hundreds or thousands of Argo Workflows is not a problem with minimal tuning (setting cpu and memory requirements per task correctly, etc.).
- Possibility for endless configurability.
- Full support for RBAC, inherited from k8s. Their RBAC model also integrates nicely with SSO. For full isolation requirements (each project has its own k8s namespace and own privileges), common in enterprises, this is a big plus compared to Airflow.
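The per-task cpu and memory tuning mentioned above maps directly onto standard k8s resource requests and limits, set on a template’s container. A minimal sketch (the template name, image, and values are hypothetical):

```yaml
- name: train-model            # hypothetical template name
  container:
    image: python:3.9          # hypothetical image
    command: [python, train.py]
    resources:
      requests:                # what the scheduler reserves for this task
        cpu: "500m"
        memory: 1Gi
      limits:                  # hard ceiling for the container
        memory: 2Gi
```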
The bad
The relevance of the following three considerations will depend on your situation at hand.
#1. Everyone will write and maintain YAML files
A short YAML file for a single project is maintainable. Once the number of workflows starts increasing and the requirements become more complex, Argo offers you tricks and templating features to keep it manageable.
If your organization is used to this way of working (thanks to the use of other k8s-native tools), then you might find it acceptable. Otherwise, don’t jump for it yet.
Just look at the official examples to get a feel for what your repo will look like.
#2. Users will need to be Kubernetes experts
If your team consists of seasoned k8s experts, using Argo will feel like second nature. A novice user will first need to understand containers and k8s, and that burden might be a huge initial slowdown. On the other hand, this cost is fair if IT management is betting on the full k8s way of working.
For maintainers of the Argo setup it is even more important to know your way around k8s, and most probably also to be very knowledgeable about AWS, GCP or Azure.
#3. Maintenance of a full-fledged enterprise setup is heavy
Installing Argo Workflows on an existing k8s cluster is relatively easy.
kubectl create ns argo
kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo-workflows/stable/manifests/namespace-install.yaml
Maintaining all the YAML files needed for an enterprise-IT-security-compliant setup is not something to take lightly. Have a look at the number of configuration options in the community-maintained Helm chart to get a feel for the number of moving parts.
To put this more into perspective, configuring and supporting Airflow (and others) to the highest security compliance levels is equally non-trivial. This could explain the high number of “fully-managed” orchestrators out there. For example, AWS recently released Amazon Managed Workflows for Apache Airflow. The industry’s cry has been heard.
Conclusion
If you are already heavily invested in Kubernetes, then yes, look into Argo Workflows (and its brothers and sisters from the parent project).
The broader and harder question you should ask yourself is: to go full k8s-native or not? Look at your team’s cloud and k8s experience, its size, and your growth targets. Most probably you will first land somewhere in the middle, as there is no free lunch.