Distributed CI/CD with werf

Flant staff
Jun 2, 2020 · 8 min read

PLEASE NOTE: our blog has MOVED to https://blog.flant.com/! New articles from Flant’s engineers will be posted there only. Check it out and subscribe to keep updated!

werf is our Open Source tool to build your applications and deploy them to Kubernetes — continuously & consistently. Today we are excited to announce that werf has learned to operate in a distributed mode!

This new mode is available starting with werf v1.1.10 (in the v1.1 alpha, beta, ea, and stable channels), and minimal effort is required to benefit from it. Here are the notable features implemented:

  • Storing build cache layers (stages) in the Docker Registry (stages storage);
  • Advanced distributed caching of build cache layers (stages) for stapel builder*;
  • Support for an arbitrary number of runners (persistent or ephemeral) running werf;
  • Efficient and optimized stages selection and build algorithm;
  • The only external dependency of this feature is a connection to any Kubernetes instance (it is used to synchronize multiple werf processes running on different hosts and to store internal caches).

* In werf, either a Dockerfile or a custom builder can be used to build an application’s images. The custom builder is called stapel and boasts Ansible support and incremental rebuilds based on Git history.

Note that the Dockerfile builder can also be safely used in the distributed mode, though without the advanced distributed caching of layers at the moment.

We would like to start with an overview of what the distributed mode is and which components it needs. Then we will explain how to enable the distributed mode and how existing projects that use werf can be switched to it from the local mode. Finally, we will review a demonstration project that uses distributed werf builds.

Overview

There are several main concepts in werf related to the build process such as stages, images and stages storage. According to the documentation, stages and images are described as follows:

We propose to divide the assembly process into steps. Every step corresponds to the intermediate image (like layers in Docker) with specific functions and assignments. In werf, we call every such step a stage. So the final image consists of a set of built stages. All stages are kept in a stages storage.

The final image consists of stages, and stages are stored in a stages storage. So what is a stages storage? Prior to version v1.1.10, the only answer was the local Docker server. From now on, werf allows storing stages in the Docker Registry. Furthermore, werf supports most container registry implementations available today.

By using the Docker Registry as a stages storage, it is possible to build images from multiple hosts. Advanced caching of layers is available for the stapel builder, an alternative to Dockerfile with many useful features. For the stapel builder, werf offers an efficient and optimized stages selection and build algorithm:

  • Stages that are already built and exist in the stages storage will be reused when building a new stage.
  • A stage is only pulled from the stages storage when it is needed to build the next stage.
  • A stage that has been pulled from the stages storage remains in the local Docker images cache (an automatic garbage collector removes the least used images).
  • Publishing of newly built images is much faster because, at the time of publication, the Docker Registry (which is also the stages storage) already contains all stages of the image (these stages are the base layers of the image being published).
  • The build algorithm relies on optimistic locking when publishing newly built stages into the stages storage: it is guaranteed that only one builder publishes a newly built stage, making it available to the other builder processes.

The Dockerfile builder can also be used to build images from multiple hosts. However, advanced distributed caching of layers is not available for this builder yet (we have plans to implement it as well).

With distributed mode, werf introduces two levels of caching of Docker images:

  1. Stages which are stored in the stages storage (Docker images in the Docker registry).
  2. Local stages (Docker images) which exist in the local Docker server of each build host.

There are also images that are published into the images repo; let’s refer to these as (3).

Caches have to be cleared from time to time, so werf provides the werf cleanup command, which cleans up (1) and (3) Docker images.

Local Docker images (2) have to be cleared manually for now (only when using the distributed mode). It is safe to remove these images with any tool you like (e.g., docker rmi). In future versions, werf will be able to remove these images automatically in build-related commands using an LRU algorithm, keeping storage usage on the build host at around 80%.
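For illustration, here is how both kinds of cleanup might look on a build host. This is only a sketch: in CI, werf ci-env has already exported the repo addresses that werf cleanup needs, and the docker commands below are merely one way to prune the local cache:

# Registry-side cleanup of stages storage (1) and images repo (3);
# in CI, werf ci-env has already exported the required addresses:
werf cleanup

# Local Docker images (2) are removed manually for now,
# e.g. with plain docker commands (not a werf feature):
docker images                  # inspect what the local cache holds
docker rmi <image-id>          # drop a specific cached stage image
docker image prune -f          # or remove all dangling images at once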

Also, note that the werf distributed mode uses stages from the stages storage (Docker Registry) efficiently. It only pulls the images needed to build a new stage layer (i.e., only the base image) and does not pull any images at all during idle builds (when no images need to be rebuilt). So, in the idle case, werf will not pull all the stages storage images to the build host.

More info about the builder algorithm and architecture is available in the documentation.

To use the distributed mode, werf requires a connection to some Kubernetes cluster. This Kubernetes instance will be used to coordinate multiple werf processes when:

  • selecting and saving stages into the stages storage;
  • publishing images into the images repo;
  • deploying applications from multiple hosts at the same time.

It does not have to be the same Kubernetes instance the application is deployed to. The only requirement is that the same K8s instance is used for all werf processes of a single project.

How is it used? werf creates a cm/werf-PROJECT_NAME ConfigMap in the werf-synchronization namespace for each project. This ConfigMap is used to store the so-called stages storage cache and to hold distributed locks. Our lockgate library is used to implement distributed locking over a Kubernetes cluster.
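If you are curious, you can inspect this object with plain kubectl. A quick look for our demo project (the ConfigMap name follows the werf-PROJECT_NAME pattern described above):

# List the synchronization ConfigMaps created by werf
kubectl -n werf-synchronization get configmaps

# Inspect the one belonging to the symfony-demo project
kubectl -n werf-synchronization get configmap werf-symfony-demo -o yaml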

Multiple werf processes working with the same project should use the same stages storage and Kubernetes cluster instance.

More info about synchronization can be found in the documentation.

The new version of werf offers new commands to work with stages:

  1. werf stages sync copies stages between storages.
  2. werf stages switch-from-local helps migrate an existing project to the distributed mode (more on such a migration below).

Enabling the distributed mode

To use the distributed mode, you just need to specify the --stages-storage=DOCKER_REPO_ADDRESS param for all werf commands.

Note that DOCKER_REPO_ADDRESS should be a unique Docker repository for each project and cannot be used by multiple projects at the same time (though the same Docker registry of course can be used by multiple projects).

The werf ci-env command, which is used to plug werf into a CI/CD system, exports the WERF_STAGES_STORAGE variable. It contains the address of the Docker repository used to store stages, and this stages storage is used by default for all werf invocations. An example of this variable for GitLab CI/CD: WERF_STAGES_STORAGE=CI_REGISTRY_IMAGE/stages

When DOCKER_REPO_ADDRESS has been specified, werf automatically uses the werf-synchronization namespace in K8s and the current context of the default kubeconfig to connect to the cluster. The user can specify an arbitrary namespace with an explicit option (--synchronization=kubernetes://NAMESPACE).
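Outside of werf ci-env, the same settings can be spelled out explicitly. Here is a minimal sketch with a made-up registry address (the flags shown are those of werf v1.1):

# Hypothetical Docker repo; the synchronization namespace shown is the default one
werf build-and-publish \
  --stages-storage=registry.example.com/myproject/stages \
  --images-repo=registry.example.com/myproject \
  --tag-custom=my-feature \
  --synchronization=kubernetes://werf-synchronization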

More info can be found in the documentation.

If your project already uses werf with the local stages storage, you can migrate it to the distributed mode. The new version of werf comes with specialized tools to perform this task easily; please check out our instruction guide for details.
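In short, the migration revolves around the two commands listed above. A rough sketch with a hypothetical repo address follows (the exact flag spellings here are an assumption; consult the migration guide or werf stages sync --help for your version):

# Copy the locally built stages into the Docker registry (hypothetical address)
werf stages sync --from=:local --to=registry.example.com/myproject/stages

# Switch the project from the local stages storage to the new one
werf stages switch-from-local --to=registry.example.com/myproject/stages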

We have also prepared a demonstration project to show how the distributed mode in werf can be used to build the symfony-demo application. We will do it via the public GitLab repo: https://gitlab.com/distorhead/symfony-demo.

Here are the steps to set up the distributed werf mode for your project:

  1. Prepare your werf.yaml:
project: symfony-demo
configVersion: 1
---
image: ~
from: ubuntu:16.04
docker:
  WORKDIR: /app
  # Non-root user
  USER: app
  EXPOSE: "80"
  ENV:
    LC_ALL: en_US.UTF-8
ansible:
  beforeInstall:
  - name: "Install additional packages"
    apt:
      state: present
      update_cache: yes
      pkg:
      - locales
      - ca-certificates
  - name: "Generate en_US.UTF-8 default locale"
    locale_gen:
      name: en_US.UTF-8
      state: present
  - name: "Create non-root group for the main application"
    group:
      name: app
      state: present
      gid: 242
  - name: "Create non-root user for the main application"
    user:
      name: app
      comment: "Create non-root user for the main application"
      uid: 242
      group: app
      shell: /bin/bash
      home: /app
  - name: Add repository key
    apt_key:
      keyserver: keyserver.ubuntu.com
      id: E5267A6C
  - name: "Add PHP apt repository"
    apt_repository:
      repo: 'deb http://ppa.launchpad.net/ondrej/php/ubuntu xenial main'
      update_cache: yes
  - name: "Install PHP and modules"
    apt:
      name: "{{`{{packages}}`}}"
      state: present
      update_cache: yes
    vars:
      packages:
      - php7.2
      - php7.2-sqlite3
      - php7.2-xml
      - php7.2-zip
      - php7.2-mbstring
      - php7.2-intl
  - name: Install composer
    get_url:
      url: https://getcomposer.org/download/1.6.5/composer.phar
      dest: /usr/local/bin/composer
      mode: a+x
  install:
  - name: "Install app deps"
    # NOTICE: Always use `composer install` command in real world environment!
    shell: composer update
    become: yes
    become_user: app
    args:
      creates: /app/vendor/
      chdir: /app/
  setup:
  - name: "Create start script"
    copy:
      content: |
        #!/bin/bash
        php -S 0.0.0.0:8000 -t public/
      dest: /app/start.sh
      owner: app
      group: app
      mode: 0755
  - raw: echo `date` > /app/version.txt
  - raw: chown app:app /app/version.txt
git:
- add: /
  to: /app
  owner: app
  group: app

(source in Git)

2. Since a running Kubernetes cluster is required for distributed werf, we will use a Kubernetes cluster provided by GKE. Prepare the kubeconfig for your cluster and set the BASE64_KUBECONFIG secret variable:

cat .kube/config | base64 -w0 > /tmp/base64_kubeconfig
# copy /tmp/base64_kubeconfig content and set BASE64_KUBECONFIG variable in CI/CD
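
Before plugging it into the pipeline, it may be worth checking that the encoded kubeconfig decodes back to something usable. This is just a sanity check on the machine where /tmp/base64_kubeconfig was created (not part of the demo project):

# Decode the prepared file and probe the cluster with it
export KUBECONFIG=$(mktemp -d)/kubeconfig
base64 -d /tmp/base64_kubeconfig > "$KUBECONFIG"
kubectl cluster-info
kubectl auth can-i create configmaps -n werf-synchronization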

3. Prepare the .gitlab-ci.yml (build stage):

stages:
- build

Build:
  stage: build
  script:
  - export KUBECONFIG=$(mktemp -d)/kubeconfig
  - echo $BASE64_KUBECONFIG | base64 -d -w0 > $KUBECONFIG
  - type multiwerf && source $(multiwerf use 1.1 ea --as-file)
  - type werf && source $(werf ci-env gitlab --as-file)
  - werf build-and-publish
  tags:
  - werf-demo-runner

(source in Git)
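
To see the distributed cache in action, the very same build can be started from any other host that has Docker, is logged into the registry, and has KUBECONFIG pointing at the same cluster. A sketch using the demo’s registry path and an arbitrary custom tag:

# Run from a clone of the repository; stages already present
# in the registry are reused instead of being rebuilt
type multiwerf && source $(multiwerf use 1.1 ea --as-file)
werf build-and-publish \
  --stages-storage=registry.gitlab.com/distorhead/symfony-demo/stages \
  --images-repo=registry.gitlab.com/distorhead/symfony-demo \
  --tag-custom=manual-check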

In this demo, we’ve implemented the build stage only. To get a full-featured CI/CD pipeline, you will also need the deploy, cleanup, and dismiss stages; we will leave them out of the scope of this article.

If you need a complete example, please check our GitLab guide.

4. Make sure your project uses neither an explicit --stages-storage param nor the WERF_STAGES_STORAGE environment variable. The werf ci-env command will set WERF_STAGES_STORAGE=CI_REGISTRY_IMAGE/stages (which is registry.gitlab.com/distorhead/symfony-demo/stages in our example).

5. Check out the container registry page of the project: it shows the project stages that have been built into the Docker Registry stages storage.

6. Now you can try to modify your application source files (src/Kernel.php via merge_requests/2) and rebuild it. The build job takes the existing stages from the Docker Registry (stages storage) and rebuilds only the gitLatestPatch stage.

The build log confirms that everything works as intended!

Conclusion

The distributed mode is the next big step for the werf tool, making it more scalable (while requiring minimal effort on the user’s side). This feature is available starting with werf v1.1.10.

We’ve described how to enable this mode for your project and how to migrate an existing project that uses the local werf mode. The distributed mode is recommended for CI/CD systems, so it is enabled by default there.

Try out werf and stay tuned! Our guide to fully integrate werf with GitHub Actions is coming soon.


This article has been written by our system developer Timofey Kirillov.

