How We Built Our Own CI/CD Framework From Scratch (Pt. II)

Jorge Claro · Pipedrive R&D Blog · Nov 24, 2020

In Pt. I, I gave some context on the reasons behind creating Fregatt. Now let’s take a look at how it all actually works.

How were the Fregatt engine foundations built?

To make the process of running the Fregatt engine easier and more future-proof, we knew that it should be capable of running inside a container. We started by building the Fregatt docker image and mounting the corresponding Docker daemon socket so it could spin up sibling containers.

/var/run/docker.sock:/var/run/docker.sock

On our laptops this worked as expected, but the process became really slow once we started adding complexity. For example, deciding to spin up another Fregatt container from the previous one added noticeable slowness (more details on this later).

The root cause was that the docker client was no longer built exclusively on top of static libraries, making it prone to data corruption.

To overcome this, and to be able to run the engine on our infrastructure, we had to mount the correct SSH keys to access the Docker daemon REST API, instead of passing commands through the socket.

Being able to spin up sibling containers solved one of the major requirements specified in our previous article: the ability to execute sequential and parallel steps, where each one runs in an isolated way.
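As an illustration of how this works in practice, here is a minimal sketch of spinning up a sibling step container over the mounted socket, assuming the dockerode Node.js client (the article does not state which Docker client library Fregatt actually uses):

import Docker from 'dockerode';

// Talk to the same daemon that started us, through the mounted socket.
// A remote daemon (e.g. reached over SSH) would only change this constructor call.
const docker = new Docker({ socketPath: '/var/run/docker.sock' });

// Spin up a sibling container for a single step and wait for it to finish.
async function runSiblingStep(image: string, cmd: string[]): Promise<number> {
  const container = await docker.createContainer({ Image: image, Cmd: cmd });
  await container.start();
  const { StatusCode } = await container.wait(); // blocks until the step exits
  await container.remove();
  return StatusCode; // non-zero means the step failed
}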

How to specify pipeline flow in a way that the engine can interpret it?

Regarding configuration, our main objective was to keep it as simple as possible. We wanted to configure everything in a way that would allow us to store it in a file that is easily shared and versioned, just like we do with regular code.

Any of the most common configuration formats would be able to support it, but since we started with a declarative approach, the YAML format seemed the most appropriate one.

We started with the concepts of pipeline, stage and step. Pipelines and stages would be used to control the flow of execution, and this flow would be the first thing written in the file.

The steps, on the other hand, would represent the actual executors. They are specified by a reference name under one or more stages as part of the flow, and their details are written at the bottom of the file, resulting in the following structure:

pipeline:
  pipeline-name-1:
    stages:
      - name: stage-name-1
        steps: ['step-name-1', 'step-name-2']
  pipeline-name-2:
    stages:
      - name: stage-name-2
        steps:
          - step-name-1
          - step-name-2
          - step-name-3
          - step-name-4
      - name: stage-name-3
        steps: ['step-name-5', 'step-name-6']
steps:
  step-name-1:
    ...
  step-name-2:
    ...
  step-name-3:
    ...
  step-name-4:
    ...

The configuration of “step-name-1” and “step-name-2” is shared between “pipeline-name-1” and “pipeline-name-2” (for example) to minimize duplication of code. Applying this concept to the earlier assumption of sibling containers, we can represent the flow of execution over time as shown in the following diagram:

The container running the Fregatt engine would be the one consuming and parsing the configuration from the file and then controlling the flow of execution. Stages would be executed sequentially, while the steps in each one would be executed in parallel.
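As a rough sketch of that control flow (hypothetical types and file names, not the actual Fregatt source), the engine could parse the YAML and then iterate over the stages, awaiting all the steps of each stage in parallel:

import { readFileSync } from 'fs';
import { load } from 'js-yaml';

// Hypothetical shapes matching the configuration file structure above.
interface Stage { name: string; steps: string[]; }
interface PipelineConfig {
  pipeline: Record<string, { stages: Stage[] }>;
  steps: Record<string, unknown>;
}

async function runPipeline(name: string, runStep: (stepName: string) => Promise<void>) {
  const config = load(readFileSync('pipeline.yml', 'utf8')) as PipelineConfig;

  for (const stage of config.pipeline[name].stages) {
    // Stages run sequentially; the steps of a single stage run in parallel.
    await Promise.all(stage.steps.map((stepName) => runStep(stepName)));
  }
}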

What capabilities should be implemented for each step block?

Considering that every step block is supposed to execute inside a container, the most obvious solution was to specify:
  • which container should be spun up;
  • which base image should be pulled;
  • which command should be executed;
  • which volumes should be mounted;
  • or even which environment variables should be injected.

At the end of the day, every step should be stateless, based on small images, and easily configurable. Something similar to:

steps:
  step-name-1:
    container:
      image: image-name
      command: cmd
      binds:
        - '/home:/home'
      environment:
        - EXAMPLE_ENV_VAR=example
    console:
      stdout: true
      stderr: true
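In the engine's TypeScript code, such a step block could map onto a shape roughly like the following (a hypothetical typing, not the actual Fregatt definitions):

// Hypothetical types mirroring the step configuration above.
interface StepConfig {
  container: {
    image: string;          // base image to pull
    command?: string;       // command executed inside the container
    binds?: string[];       // volumes to mount, e.g. '/home:/home'
    environment?: string[]; // env vars to inject, e.g. 'KEY=value'
  };
  console?: {
    stdout?: boolean;       // forward the container's stdout to the engine log
    stderr?: boolean;       // forward the container's stderr to the engine log
  };
}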

From this point on, we were able to get some simple flows working with static values, which was an important milestone.

How could we inject dynamic values into part of the pipeline flow?

Now that we had pipelines with static values working, the next challenge was to make these yaml files behave as templates, allowing the injection of dynamic values. Many other CI/CD platforms, and even docker-compose.yml files, use ${} delimiters to reference variables as sources of information, so we did the same.

At this point we could use the following pattern whenever needed:

environment:
  - EXAMPLE_ENV_VAR=${EXAMPLE_ENV_VAR}
  - NEW_ENV_VAR=${EXAMPLE_ENV_VAR_2}

To avoid repetition, we decided to also add a “pass_environment” field for cases where no environment variable renaming is needed, resulting in:

pass_environment:
  - EXAMPLE_ENV_VAR
environment:
  - NEW_ENV_VAR=${EXAMPLE_ENV_VAR_2}
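A minimal sketch of how that substitution could work, assuming the values are resolved from the engine's own process environment (the real resolution rules may differ):

// Replace ${VAR} placeholders in the raw YAML text before parsing it.
function substituteVariables(rawYaml: string, env: NodeJS.ProcessEnv): string {
  return rawYaml.replace(/\$\{(\w+)\}/g, (_, name) => env[name] ?? '');
}

// Expand pass_environment entries into regular KEY=value pairs,
// e.g. ['EXAMPLE_ENV_VAR'] -> ['EXAMPLE_ENV_VAR=<current value>'].
function expandPassEnvironment(names: string[], env: NodeJS.ProcessEnv): string[] {
  return names.map((name) => `${name}=${env[name] ?? ''}`);
}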

With variable replacement in place, we solved the initial parameterization with dynamic values, fulfilling another milestone.

How can we produce & share artifacts across different steps at runtime?

At this point, the steps were only acting as consumers. The usage of environment variables proved to be simple and efficient, but some new use cases still required communication between steps.

We knew that we could use files with some predefined structure, archive them and store them in a known location that could be mounted by the pipeline containers, but that was too complex and difficult to implement.

It would be great if we could adapt our engine so that every step could store and consume those environment variables easily from somewhere. These artifacts should be kept available during the pipeline flow execution, and then destroyed at the end of the process.

We thought about different storage and interfacing solutions, and the result was one of the easiest to implement: a REST API storing values in a bucket, running as a sidecar container to the main execution. This approach achieved the requirements, although it was recently revised: the API was moved into the main engine source and integrated natively with the control of the flow.
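As an illustration of that sidecar idea, a key-value store behind a tiny REST API could look something like the sketch below (using Express and an in-memory map; the actual Fregatt API routes are not published):

import express from 'express';

// In-memory "bucket"; it only lives for the duration of the pipeline execution.
const bucket = new Map<string, string>();

const app = express();
app.use(express.text());

// A step stores an artifact value: PUT /artifacts/<key>
app.put('/artifacts/:key', (req, res) => {
  bucket.set(req.params.key, req.body);
  res.sendStatus(204);
});

// Another step consumes it later: GET /artifacts/<key>
app.get('/artifacts/:key', (req, res) => {
  const value = bucket.get(req.params.key);
  if (value === undefined) {
    res.sendStatus(404);
  } else {
    res.send(value);
  }
});

app.listen(3000); // reachable by sibling step containers over the docker network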

Some other types of execution step blocks were added to make the transition from our old pipeline to the new one easier. The execution of simple shell commands was an obvious one, but we also needed to trigger Jenkins jobs as part of the pipeline flows. The draft below shows how the required values are collected to trigger the Jenkins build and how its result affects the pipeline result.

jenkins:
  name: ${JENKINS_JOB_NAME}
  parameters:
    param-1: value-1
    param-2: ${ENV_VAR2}
  wait_for_finish: true
  allow_failure: false
  ignore_if_not_found: true
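Under the hood, a block like this boils down to calling the Jenkins REST API and polling for the result. A rough sketch, assuming Node 18's global fetch and credentials injected as environment variables (JENKINS_URL, JENKINS_USER and JENKINS_TOKEN are illustrative names, not Fregatt's actual configuration):

// Trigger a parameterized Jenkins job and wait for it to finish.
async function triggerJenkinsJob(name: string, parameters: Record<string, string>): Promise<boolean> {
  const auth = 'Basic ' +
    Buffer.from(`${process.env.JENKINS_USER}:${process.env.JENKINS_TOKEN}`).toString('base64');

  // Jenkins exposes /job/<name>/buildWithParameters for parameterized builds.
  const query = new URLSearchParams(parameters).toString();
  await fetch(`${process.env.JENKINS_URL}/job/${name}/buildWithParameters?${query}`, {
    method: 'POST',
    headers: { Authorization: auth },
  });

  // Naive polling of the last build until Jenkins reports a result
  // (SUCCESS, FAILURE, ABORTED, ...); 'result' stays null while the build runs.
  while (true) {
    const res = await fetch(`${process.env.JENKINS_URL}/job/${name}/lastBuild/api/json`, {
      headers: { Authorization: auth },
    });
    const build = (await res.json()) as { result: string | null };
    if (build.result) {
      return build.result === 'SUCCESS'; // feeds into allow_failure handling
    }
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
}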

How to make the configuration easier for developers while maintaining control of some mandatory steps?

Even though we provided some templates, we still wanted to allow developers to implement their own pipelines if needed. The most straightforward solution was to consume a pipeline.yml file from the service's repository, but this raised some questions.

The simple process of cloning the repository has to happen before we can even consume the pipeline configuration file. We tried executing our own pipeline in advance, and it worked, but now we had to handle the execution of two pipelines.

What would happen if we decided to implement some reporting at the end of the flow? It would still have to be configured in the developer's configuration file, which is exactly what we wanted to avoid. We quickly realized that this solution was not scalable.

What if we could spin-up the Fregatt engine as part of the pipeline.yml file?

Is it even possible? Well, it turns out the answer is yes. After doing some research and refactoring our engine code, we were able to spin up a new Fregatt engine container that consumes all the environment variables, provides the REST API to its sibling containers, and executes a new pipeline configuration, behaving just like a simple step from the perspective of the parent one. Parallel pipelines could now be configured.
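Conceptually, the nested engine is just another sibling container that happens to run the Fregatt image itself, with the docker socket and the developer's pipeline.yml handed to it. A hypothetical parent-side sketch, reusing the dockerode client from earlier (the image name, command and mount paths are made up for illustration):

import Docker from 'dockerode';

// Spin up a child Fregatt engine that executes the developer-provided pipeline.yml.
async function runNestedPipeline(docker: Docker, env: string[]): Promise<number> {
  const container = await docker.createContainer({
    Image: 'fregatt:latest',                              // illustrative image name
    Cmd: ['run', '--config', '/workspace/pipeline.yml'],  // illustrative command
    Env: env,                                             // forward the parent's environment variables
    HostConfig: {
      Binds: [
        '/var/run/docker.sock:/var/run/docker.sock', // so the child can spin up its own siblings
        '/tmp/workspace:/workspace',                 // cloned repository containing pipeline.yml
      ],
    },
  });
  await container.start();
  const { StatusCode } = await container.wait(); // the whole child pipeline behaves as one step
  await container.remove();
  return StatusCode;
}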

With this solution, we kept control of the main pipelines, executing common tasks like cloning the repository or the reporting steps at the end of the process, while allowing developers to provide their own custom configurations.

Was it possible to define whole pipeline flows in a declarative way?

Regarding this, we actually adopted a mixed solution. We prefer the declarative definition, since it is easier to understand and debug, but it is also more limiting than a solution implemented in regular code such as TypeScript (the language that we used to implement the Fregatt engine, by the way).

Some of these more complex but commonly used execution blocks are therefore provided as part of the Fregatt docker base image and can be executed as regular NPM scripts.

How can we test changes on the Fregatt engine without breaking production deployments?

Considering that the Fregatt engine itself runs as a container, its own docker image has to be available in the docker registry so it can be pulled on production machines whenever needed.

This led us to create a pipeline.yaml file that takes care of all the necessary steps: building the Fregatt engine, building its docker image, publishing it to the docker registry, and validating the new version of the engine against a predefined set of workflows.

If everything works correctly, we can tag the new version as the production one, while keeping track of previous ones in case we need to roll back.

Conclusion and future ideas

In the end, there are some highlighted differences from our previous solution:

  • Pipelines became more stable, due to being implemented natively without any kind of middleware, and tests were implemented at the unit and functional level
  • Pipeline code is modular and reusable, meaning that we were able to reduce our code stack and write new flows by calling previously implemented code blocks
  • New pipelines are also easier to write, considering that in most cases the configuration is independent of the engine. When they need to share paths, a custom docker image with the new source code of the engine can also be built and tested in a controlled environment before releasing it to production.

For the future, native Kubernetes support or event emitter capabilities to provide real-time information would be a great addition. Fregatt is currently Pipedrive closed source; nevertheless, we had open source in mind while writing it, and we might release it publicly one day.

Also, check out how we run functional tests for Docker microservices.
