Workflow using ARGO & Kubernetes

argo — workflow engine for Kubernetes

Before we start, you’ll need the following packages installed on your computer (I’ve been using a Mac running macOS High Sierra 10.13.6).

Requirements

  • Install a hypervisor (I’ve been using HyperKit)
  • Install kubectl
  • Install minikube, a great tool for running Kubernetes (K8s) locally.
  • To make sure everything was installed correctly, open a new terminal window and run: hyperkit -v , kubectl version --client , minikube version
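If you’d rather check all three tools in one go, here is a minimal sketch. The function name check_tools is my own invention, and it only tests that each command is on your PATH, not that the installed version is the right one.

```shell
#!/bin/sh
# Report which of the given command-line tools are missing from PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage for this tutorial's requirements:
#   check_tools hyperkit kubectl minikube
```

Anything reported as missing needs to be installed before moving on.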

Running a Kubernetes cluster

To run Argo you’ll need a running Kubernetes cluster. Since we’re running it locally, we’re going to use minikube.

  • minikube start --vm-driver=hyperkit --kubernetes-version v1.10.0
    This starts the cluster. The --vm-driver flag tells minikube to use hyperkit as the virtual machine driver.
    Another option is to make hyperkit the default virtual machine driver with: minikube config set vm-driver hyperkit
  • To check whether the cluster is running, type: minikube status
  • Once the cluster is up and running, type: minikube dashboard. This opens your browser and takes you to the Kubernetes dashboard.
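The start and status commands can be combined so that the cluster is only started when it isn’t already up. This is a rough sketch: ensure_cluster is a name I made up, and it assumes that minikube status exits with a non-zero code when the cluster is down, which may not hold for every minikube release.

```shell
# Start the local cluster only if it is not already running.
# ASSUMPTION: `minikube status` exits non-zero when the cluster is down.
ensure_cluster() {
  if minikube status >/dev/null 2>&1; then
    echo "minikube is already running"
  else
    # Same flags as the start command used in this tutorial.
    minikube start --vm-driver=hyperkit --kubernetes-version v1.10.0
  fi
}

# Usage:
#   ensure_cluster
```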

Adding Argo to our Kubernetes cluster

After loading the Argo configuration into the cluster, you should be able to find Argo in the Kubernetes dashboard.
  • This step is optional: give Argo admin permission by binding the admin role to the default service account in the default namespace:
    kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default
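The commands for loading the Argo configuration aren’t shown in this section, so here is a sketch of what that step probably looks like. It is an assumption based on the layout of the Argo GitHub repository (an install manifest under manifests/install.yaml for each release tag), so check the project’s own install instructions for the release you want. The function name install_argo is mine, and the release tag is left for you to supply. The argo namespace matches the one used by the UI command later on. You will also need the argo command-line tool itself, which is installed separately.

```shell
# Install Argo into its own namespace from the project's install manifest.
# ASSUMPTION: the manifest path follows the Argo repository's layout; it is
# not taken from this article. Pass the release tag you want to install.
install_argo() {
  version="$1"   # a release tag from the Argo GitHub releases page
  if [ -z "$version" ]; then
    echo "usage: install_argo <release-tag>" >&2
    return 1
  fi
  kubectl create namespace argo
  kubectl apply -n argo -f \
    "https://raw.githubusercontent.com/argoproj/argo/${version}/manifests/install.yaml"
  # List the Argo pods so you can see when they are ready.
  kubectl get pods -n argo
}

# Usage:
#   install_argo <release-tag>
```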

Running your first Argo task

argo submit --watch https://raw.githubusercontent.com/argoproj/argo/master/examples/hello-world.yaml

The hello-world.yaml file describes the task:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]

We can print a list of Argo tasks and their status using: argo list

NAME                STATUS      AGE   DURATION
hello-world-j6bpn   Succeeded   3m    1m

To see the details of a specific task, run argo get with its name. In my case: argo get hello-world-j6bpn

Name: hello-world-j6bpn
Namespace: default
ServiceAccount: default
Status: Succeeded
Created: Sat Oct 27 17:46:24 -0700 (4 minutes ago)
Started: Sat Oct 27 17:46:24 -0700 (4 minutes ago)
Finished: Sat Oct 27 17:47:46 -0700 (3 minutes ago)
Duration: 1 minute 22 seconds
STEP                  PODNAME            DURATION  MESSAGE
 ✔ hello-world-j6bpn  hello-world-j6bpn  1m

To view the logs from the task, type: argo logs hello-world-j6bpn
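Once you have more than a handful of tasks, copying names out of argo list by hand gets tedious. Here is a small sketch that filters the list by status. The helper name workflows_with_status is hypothetical, and it assumes the four-column layout shown above (NAME, STATUS, AGE, DURATION), which may differ in other Argo versions.

```shell
# Print the names of workflows whose STATUS column matches the argument.
# ASSUMPTION: `argo list` prints a header line followed by rows in the
# column order NAME STATUS AGE DURATION, as in the output above.
workflows_with_status() {
  argo list | awk -v status="$1" 'NR > 1 && $2 == status { print $1 }'
}

# Usage: show the details of every workflow that succeeded.
#   for wf in $(workflows_with_status Succeeded); do argo get "$wf"; done
```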

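Running your second Argo task

Look at you, you’re already a pro! The second example loops over a list of maps with withItems: each item supplies an image and a tag, and the cat-os-release template prints /etc/os-release from that image.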

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: loops-maps-
spec:
  entrypoint: loop-map-example
  templates:
  - name: loop-map-example
    steps:
    - - name: test-linux
        template: cat-os-release
        arguments:
          parameters:
          - name: image
            value: "{{item.image}}"
          - name: tag
            value: "{{item.tag}}"
        withItems:
        - { image: 'debian', tag: '9.1' }
        - { image: 'debian', tag: '8.9' }
        - { image: 'alpine', tag: '3.6' }
        - { image: 'ubuntu', tag: '17.10' }
  - name: cat-os-release
    inputs:
      parameters:
      - name: image
      - name: tag
    container:
      image: "{{inputs.parameters.image}}:{{inputs.parameters.tag}}"
      command: [cat]
      args: [/etc/os-release]
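To make the loop a bit more concrete, here is a rough sketch of what the four withItems entries expand to: one container per item, with image and tag joined as image:tag, each running cat /etc/os-release. The docker run lines are only an analogy of mine (Argo schedules Kubernetes pods and runs the expanded steps in parallel), and expand_items is a made-up name.

```shell
# Print the command each expanded step effectively runs: one container per
# withItems entry, with image and tag joined as "image:tag".
# The docker invocation is only an analogy; Argo creates pods instead.
expand_items() {
  for item in debian:9.1 debian:8.9 alpine:3.6 ubuntu:17.10; do
    echo "docker run --rm $item cat /etc/os-release"
  done
}

# To run the real workflow, save the YAML above to a file
# (loops-maps.yaml is a name chosen here for illustration) and submit it:
#   argo submit --watch loops-maps.yaml
```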

Argo UI

Argo comes with a basic UI. To access it, forward the UI port using: kubectl -n argo port-forward deployment/argo-ui 8001:8001 and then open http://127.0.0.1:8001 in your browser.

[Screenshot: list of all Argo tasks]
[Screenshot: a specific Argo task (our second example)]

What’s Next?

Argo is a great framework for processing data, running ETL processes and so on. It’s very similar to Airflow and other workflow frameworks. A good analogy: if Airflow is Django, Argo is Flask (for non-Python developers, this means Airflow comes with batteries included, while Argo is more minimal).

Argo has a simple UI and integrates very easily with Kubernetes, and I highly recommend using it.

Argo supports a number of features (taken from Argo’s GitHub page):

  • DAG or Steps based declaration of workflows
  • Artifact support (S3, Artifactory, HTTP, Git, raw)
  • Step level input & outputs (artifacts/parameters)
  • Timeouts (step & workflow level)
  • Retry (step & workflow level) and resubmit (memoized)
  • Suspend & Resume
  • Cancellation
  • K8s resource orchestration
  • Exit Hooks (notifications, cleanup)
  • Garbage collection of completed workflow
  • Scheduling (affinity/toleration/node selectors)
  • Volumes (ephemeral/existing)
  • Parallelism limits
  • Daemoned steps
  • DinD (docker-in-docker)
  • Script steps

Argo is still fairly new, but I’m sure that over the next year we’re going to see a whole bunch of data-processing containers that run with Argo.


Errors:

  • Minikube can’t start a cluster: 
    Waiting for SSH to be available…
    When trying to create a new cluster or start an existing one, I got this error:
    minikube start --logtostderr --v=3 --vm-driver=hyperkit
    Starting local Kubernetes v1.10.0 cluster…
    Starting VM…
    I1221 14:34:17.937881 29522 utils.go:100] retry loop 0
    I1221 14:34:17.937982 29522 cluster.go:74] Skipping create…Using existing machine configuration
     cluster.go:82] Machine state: Stopped
    (minikube) Using UUID ....
    (minikube) Generated MAC ....
    (minikube) Starting with cmdline: loglevel=3 user=docker console=ttyS0 console=tty0 noembed nomodeset norestore waitusb=10 systemd.legacy_systemd_cgroup_controller=yes base host=minikube
    Waiting for SSH to be available…
  • Solution:
    Delete the hyperkit PID file, stop minikube and delete the cluster:
    rm ~/.minikube/machines/minikube/hyperkit.pid
    minikube stop
    minikube delete
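If you hit this error more than once, the three cleanup commands can be wrapped in a function. This is just a sketch: reset_minikube is a name I picked, rm -f is used so that a missing PID file isn’t treated as an error, and each minikube step is allowed to fail so that the next one still runs.

```shell
# Remove a stale hyperkit PID file, then stop and delete the minikube VM.
# Each step tolerates failure so that the later steps still run.
reset_minikube() {
  rm -f "$HOME/.minikube/machines/minikube/hyperkit.pid"
  minikube stop || true
  minikube delete || true
}

# Usage: reset, then start a fresh cluster with the original start command.
#   reset_minikube
#   minikube start --vm-driver=hyperkit --kubernetes-version v1.10.0
```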

I hope you found this useful. Leave your comments below, and I encourage you to read more about Argo on its GitHub page and come up with your own workflow.