Conducto for CI/CD

Data Stores

Matt Jachowski
Published in Conducto · Apr 8, 2020

Many CI/CD pipelines generate artifacts like binaries, caches, or intermediate results that need to be stored for some amount of time. You cannot simply write these artifacts to the local filesystem, because each command runs in a container with its own filesystem that disappears when the container exits. And in cloud mode, containers run on different machines, so there is no shared filesystem to mount. Conducto therefore supports a few different approaches that work in a containerized world.

Explore our live demo, view the source code for this tutorial, or clone the demo and run it for yourself.

git clone https://github.com/conducto/demo.git
cd demo/cicd
python data_stores.py --local

Alternatively, download the zip archive here.

Your Own Data Store

There are many standard ways to store persistent data: databases, AWS S3, and in-memory caches like redis, just to name a few. An exec node can run any shell command, so it is easy to use any of these approaches. Here we populate environment variables pointing to our redis service, allowing us to write to and read from redis in a python script.

import conducto as co

redis_write_cmd = "python redis_example.py --write"
redis_read_cmd = "python redis_example.py --read"
env = {"REDIS_HOST": "...", "REDIS_PORT": "6379"}
image = co.Image(
    "python:3.8-alpine",
    copy_dir="./code",
    reqs_py=["redis", "Click"],
)
with co.Serial(image=image, env=env) as redis_store:
    co.Exec(redis_write_cmd, name="redis_write")
    co.Exec(redis_read_cmd, name="redis_read")
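
For reference, here is a minimal sketch of what redis_example.py could look like, assuming the redis and Click packages named in reqs_py; the real script ships with the demo repo and may differ.

# redis_example.py (hypothetical sketch; see the demo repo for the real one)
import os

import click
import redis

@click.command()
@click.option("--write", "mode", flag_value="write")
@click.option("--read", "mode", flag_value="read")
def main(mode):
    # REDIS_HOST and REDIS_PORT are set in the env dict on the node.
    client = redis.Redis(
        host=os.environ["REDIS_HOST"], port=int(os.environ["REDIS_PORT"])
    )
    if mode == "write":
        client.set("greeting", "hello from the write node")
    else:
        print(client.get("greeting"))

if __name__ == "__main__":
    main()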

Use conducto-temp-data

Conducto’s conducto-temp-data command provides access to a pipeline-local key-value store. This data is only visible to your pipeline and persists until your pipeline is archived.

Here is the condensed interface. See the full interface here.

usage: conducto-temp-data [-h] <method> [< --arg1 val1 --arg2 val2 ...>]

methods:
delete (name)
exists (name)
get (name, file)
gets (name, byte_range:List[int]=None)
list (prefix)
put (name, file)
puts (name)
url (name)
cache-exists (name, checksum)
clear-cache (name, checksum=None)
save-cache (name, checksum, save_dir)
restore-cache (name, checksum, restore_dir)
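
As a quick illustration of the basic verbs (the key name and paths here are ours, not from the demo):

echo "hello" > /tmp/greeting
conducto-temp-data put --name greeting --file /tmp/greeting
conducto-temp-data exists --name greeting
conducto-temp-data list --prefix greet
conducto-temp-data get --name greeting --file /tmp/greeting_copy
conducto-temp-data delete --name greeting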

One useful application is storing binaries in a build node and retrieving them in a later test node. We exercise the put and get commands to do this.

build_cmd = """set -ex
go build -o bin/app src/app.go
conducto-temp-data put --name my_app_binary --file bin/app
"""
test_cmd = """set -ex
conducto-temp-data get --name my_app_binary --file /tmp/app
/tmp/app --test
"""
# Dockerfile installs golang and conducto.
dockerfile = "./docker/Dockerfile.temp_data"
image = co.Image(
    dockerfile=dockerfile, context=".", copy_dir="./code"
)
with co.Serial(image=image) as build_and_test:
    co.Exec(build_cmd, name="build")
    co.Exec(test_cmd, name="test")

In local mode, temp data lives on your local filesystem. In cloud mode, temp data lives in AWS S3.

Use conducto-perm-data

Conducto’s conducto-perm-data command provides access to a global persistent key-value store. This works just like conducto-temp-data, except that the data is visible to all pipelines and persists beyond the lifetime of your pipeline. You are responsible for manually clearing your data when you no longer need it.

Here is the condensed interface (the same as conducto-temp-data). See the full interface here.

usage: conducto-perm-data [-h] <method> [< --arg1 val1 --arg2 val2 ...>]

methods:
delete (name)
exists (name)
get (name, file)
gets (name, byte_range:List[int]=None)
list (prefix)
put (name, file)
puts (name)
url (name)
cache-exists (name, checksum)
clear-cache (name, checksum=None)
save-cache (name, checksum, save_dir)
restore-cache (name, checksum, restore_dir)

One useful application is restoring a python virtual environment to avoid repeatedly installing the same requirements across nodes and pipelines. We exercise the various cache commands to do this.

restore_and_test_cmd = """set -ex
checksum=$(md5sum requirements.txt | cut -d" " -f1)
conducto-perm-data restore-cache \
    --name my_env --checksum $checksum --restore_dir venv
. venv/bin/activate
pip list
"""
image = co.Image(
    "python:3.8-alpine", copy_dir="./code", reqs_py=["conducto"]
)
test = co.Exec(restore_and_test_cmd, image=image)
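
The snippet above only handles the restore side. On a cache miss, an earlier node has to build the virtual environment and save it. Here is a sketch of that companion command, assuming cache-exists signals a hit through its exit status (the node and variable names are ours):

save_env_cmd = """set -ex
checksum=$(md5sum requirements.txt | cut -d" " -f1)
# Assumes cache-exists exits nonzero on a miss.
if ! conducto-perm-data cache-exists \
        --name my_env --checksum $checksum; then
    python -m venv venv
    . venv/bin/activate
    pip install -r requirements.txt
    conducto-perm-data save-cache \
        --name my_env --checksum $checksum --save_dir venv
fi
"""
save_env = co.Exec(save_env_cmd, image=image)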

Perm data is stored just like temp data. In local mode, perm data lives on your local filesystem. In cloud mode, perm data lives in AWS S3.

That’s it! Now, with the information you learned in Your First Pipeline, Execution Environment, Environment Variables and Secrets, Node Parameters, CI/CD Extras, and here, you can create arbitrarily complex pipelines.
