How to Use Docker to Run Workflows in Digdag

There is a good chance that you want to allow multiple developers to run their workflows/tasks on your Digdag server. It's painful to maintain the packages and dependencies they need to run their tasks, and it becomes especially difficult when those dependencies conflict with each other. One way to solve this is to give each task some kind of independent virtual environment to run in. Even then, you need to manage those environments on the server.

Another important consideration is isolating the underlying Digdag server environment from the tasks (workflows) themselves. This matters for both security and stability.

The best way to achieve both is to run the tasks/workflows inside a container, and Digdag supports Docker containers for exactly this. The how-to below walks through an example. Start by adding the docker option inside your .dig file. This tells Digdag to use the given image to run your workflows.

docker_digdag_test.dig

timezone: UTC

_export:
  docker:
    image: docker_alpine_python:latest
    pull_always: false

+setup:
  echo>: start ${session_time}

+status_check:
  py>: status_check.run

+teardown:
  echo>: finish ${session_time}

This assumes that you have installed Docker on the server (or on your local machine if you are testing locally) and that the Docker daemon is running. It also assumes the image you are using to run the workflows is available to Digdag (it can also be pulled from any accessible Docker container registry). We won't get into installing Docker here, but you can list the locally available images by running

docker images
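If the image lives in a remote registry instead, you can point Digdag at the fully qualified image name and let it pull the image on demand. A minimal sketch (the registry host here is illustrative, not part of the original setup):

_export:
  docker:
    # hypothetical registry host; substitute your own
    image: registry.example.com/docker_alpine_python:latest
    pull_always: true

With pull_always: true Digdag pulls the image before every run, which keeps the tag fresh at the cost of a registry round trip. Leave it false, as in the .dig file above, if the image is already present locally.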

It should list the image you are planning to use. As you can see in the .dig file, I am using the **docker_alpine_python** image. It's a custom image built on top of the base **alpine** image by adding Python and a few Python packages that my workflow needs. Note that the workflow code itself is not part of the image; the image only carries the runtime and libraries required to run it. Check my Dockerfile below, then build the image with the docker build command that follows it.

FROM alpine:latest

RUN apk add --update \
    python \
    python-dev \
    py-pip \
    build-base \
    && pip install plumbum requests \
    && rm -rf /var/cache/apk/*

Build the image:

docker build -t docker_alpine_python .
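To sanity-check the image before wiring it into Digdag, you can run a quick one-off container (this check is my suggestion, not part of the original workflow):

docker run --rm docker_alpine_python python --version

If that prints a Python version, the image has the interpreter that the py> operator needs.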

My workflow task is called **status_check** and it runs a Python script. The script simply prints, and posts to a webhook, the OS details of the environment it is running in. So regardless of your server's OS, it should report the Docker container's OS details. This is just to assure you that the workflow code really is running inside the Docker container. Since the task is declared as py>: status_check.run, Digdag looks for a run() function in status_check.py:

import sys
import json
import digdag
from plumbum import local
import requests

def run():
    # collect the OS details from /etc/os-release
    cat = local["cat"]
    os_release = cat("/etc/os-release")
    os_properties = {}
    for part in os_release.split("\n"):
        key_value = part.split("=")
        if len(key_value) > 1:
            os_properties[key_value[0]] = key_value[1]

    # current date inside the container
    date = local["date"]
    os_properties["os_date"] = date()

    # installed python packages
    pip = local["pip"]
    packages = json.loads(pip("list", "--format", "json"))

    # python version, e.g. "2.7"
    version = str(sys.version_info[0]) + "." + str(sys.version_info[1])

    # session time passed in by Digdag
    session_time = digdag.env.params["session_time"]

    payload = {"os": os_properties,
               "digdag": {"session_time": session_time},
               "python": {"packages": packages, "version": version}}
    print str(payload)
    requests.post('http://webhook.site/2856291a-acd7-4669-8223-2ff349186668', json=payload)
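The payload posted to the webhook has roughly the following shape. All values below are illustrative, not real output:

{
  "os": {"NAME": "\"Alpine Linux\"", "ID": "alpine", "os_date": "..."},
  "digdag": {"session_time": "2020-01-01T00:00:00+00:00"},
  "python": {"packages": [{"name": "requests", "version": "..."}], "version": "2.7"}
}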

To run the workflow locally:

$ digdag run docker_digdag_test.dig --rerun

Now you can push the same workflow to your Digdag server and run it there, as long as the server has the Docker image locally or can pull it from a registry (for example, Docker Hub).
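A push typically looks like this (the project name docker_digdag_test is illustrative; use whatever you named your project):

$ digdag push docker_digdag_test
$ digdag start docker_digdag_test docker_digdag_test --session now

The first command uploads the workflows in the current directory as a new project revision; the second starts a session of the workflow immediately.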

If your script is simple and doesn't depend on any external library, you can use one of the pre-built images (such as the official alpine or python images) instead, as in the sketch below. This removes the step of building and maintaining a custom Docker image.
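For instance, a workflow that only needs shell commands can run inside the stock alpine image directly. A small sketch (the task is illustrative):

timezone: UTC
_export:
  docker:
    image: alpine:latest

+hello:
  sh>: echo "hello from ${session_time}"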

All of the code is on GitHub for you to explore. I hope you found this interesting and useful.
