The Nuvo Group CI/CD journey

Maor Friedman
Published in Nuvo Tech
Jul 18, 2018

In the beginning God created the heaven and the earth ... And God said, Let there be light … and God divided the light from the darkness…

That’s a lot of manual work. Did God ever stop to think about automation? We sure did. In this post we’ll discuss our journey towards automating our CI/CD processes, and the tool that was developed along the way: Orca.

Why should Site Reliability Engineers care about CI/CD

Working on the Cloud Engineering team at NuvoCares, we take reliability seriously. And when talking about reliability, one important topic is traceability. We want to be able to determine the exact origin of, well, everything, and we want to be able to do it quickly. For example, if a bug is found in the production environment, we want to be able to trace it back all the way to the source code.

That is why we, the SRE team, decided to take on the task of designing the CI/CD processes ourselves. Don’t get me wrong, we are not at all fans of centralization, and our development teams are responsible for much more than just the code, but we consider CI/CD traceability to be a huge part of reliability.

To achieve traceability, we need to link together the application’s versioning and the CI/CD platform’s versioning, so that we can move through the links swiftly should an issue arise. As a Site Reliability Engineer, you are the first on the scene, and are responsible for involving the relevant people. It is important that you can navigate through the traces with ease, to reduce the time until the issue is being worked on and, as a result, the time it takes to repair it.

To better explain our process, let’s start by describing our work environment.

Our work environment

Our production environment is a Kubernetes cluster in AWS (created using kops). We use Helm charts to deploy all our application components (microservices), as well as the cluster infrastructure (cluster-autoscaler, ingress-controller, etc.) and middleware (Kafka, Cassandra, etc.).
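
To give a sense of what that looks like in practice, each of those infrastructure components is just another Helm release. A minimal sketch, assuming the community cluster-autoscaler chart and an illustrative values file name:

$ helm repo update
$ helm upgrade -i cluster-autoscaler stable/cluster-autoscaler \
    --namespace kube-system \
    -f cluster-autoscaler-values.yaml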

Our source code is in Gitlab (omnibus version, installed on a different Kubernetes cluster using a Helm chart). Each microservice is located in a separate project, and each repository includes everything from source code to Helm chart. All of this is written by the development teams. This can be done when the developers are not afraid to learn and take responsibility, and we are not afraid to teach and help.

As for the CI/CD process: it is also part of each repository (as a .gitlab-ci.yml file), and it is the meeting point between the developers and the SRE. The developers are responsible for what goes to the environments, and the SRE is responsible for how it gets there. That was our motivation to design the process ourselves.

CI/CD considerations

When designing a CI/CD process, the first thing to take into consideration is the user requirements. For us, it was pretty straightforward: the developers should write code (code in this context is everything in the repository), and everything else should be seamless.

The second thing is the branching strategy. The process should be designed around it in order to uphold the seamlessness requirement. In our case, we use trunk based development. Each feature is developed on a short-lived branch and merged to the trunk via a merge request. When we release a new version, we create a release branch from the trunk, so development can continue without interfering with the release, and vice versa.
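
In day-to-day git terms this looks roughly as follows (branch and tag names are illustrative, and we assume master is the trunk):

$ git checkout -b short-lived-feature master    # develop a feature, then merge it via a merge request
$ git checkout -b release-2.1 master            # cut a release branch from the trunk
$ git tag 2.1.2                                 # only tagged commits on the release branch get deployed
$ git push origin release-2.1 --tags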

The third thing to take into consideration is the structure of the repositories. For us, each repository contains the source code of a microservice, along with the relevant Kubernetes manifests (as a Helm chart) in a separate directory.
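
A single microservice repository therefore looks more or less like this (file and directory names are illustrative):

service-1/
├── .gitlab-ci.yml      # the CI/CD process for this microservice
├── Dockerfile
├── src/                # application source code
└── chart/              # Helm chart with the Kubernetes manifests
    ├── Chart.yaml
    ├── values.yaml
    └── templates/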

CI/CD process design

We soon came up with a very basic design for the process. It is the same for all repositories so we’ll only go through it once, but feel free to read the next paragraph multiple times if you feel the need. 😉

Every time a developer pushes new code, the code is compiled and a Docker image is built and pushed to the registry. Once the new image is pushed, the Helm chart is updated with the new image tag, the chart is packaged, uploaded to our ChartMuseum instance, and deployed to the development, staging and production environments. We did not mention testing, but it is there, really.
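
Expressed as a heavily trimmed .gitlab-ci.yml, the skeleton looks something like the following; the stage names match the ones described later, but variables such as $REGISTRY and $IMAGE_TAG are placeholders:

$ cat .gitlab-ci.yml
stages:
  - build_image     # compile the code, build and push the Docker image
  - upload_chart    # update the image tag in the chart, package it and upload to ChartMuseum
  - deploy          # helm upgrade against development, staging and production

build_image:
  stage: build_image
  script:
    - docker build -t $REGISTRY/$CI_PROJECT_NAME:$IMAGE_TAG .
    - docker push $REGISTRY/$CI_PROJECT_NAME:$IMAGE_TAG
# upload_chart and deploy are defined in the same way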

A process such as the one described shouldn’t be too difficult to implement, and yet, there is much to think about. Here are a few questions we were asking ourselves:

  1. What if the new code is just a configuration change, which only affects the Kubernetes manifests? We shouldn’t re-build the Docker image. We needed to figure out where the change occurred (source code or configuration) and execute only the relevant stages, meaning we needed path filters. This has been an open issue in Gitlab for quite some time now.
  2. What if we want to have a different process on different types of branches? For example, deploy only from a release branch, and only if the current commit is tagged (which is actually a requirement in our branching strategy).
  3. What if this is a merge request? We may want to run some additional tests in that case. This is also an open issue in Gitlab.
  4. How will we create the desired trace: Environment → Helm chart version → Source code → Docker image version → Source code

If you are an active Gitlab user, you probably understand the pain behind questions 1, 2 and 3.

CI/CD process description

We solved our first question using an initial determination stage. The process begins by checking which files were changed since the previous pipeline (using the Gitlab API), and determines which kind of pipeline this one should be. For example, if the only changes are to configuration, it should skip the build stages. This stage yields a TYPE for the current pipeline. Each stage in the pipeline has a TYPE of its own, and begins by checking whether it should actually run according to the first determination stage: if the pipeline’s TYPE and the stage’s TYPE match, the stage is executed.
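
The idea itself boils down to one call to Gitlab’s repository compare API. A minimal sketch, where $GITLAB_URL, $API_TOKEN and $PREVIOUS_SHA (the commit of the previous pipeline) are placeholders for values we keep track of ourselves:

$ curl --silent --header "PRIVATE-TOKEN: $API_TOKEN" \
    "$GITLAB_URL/api/v4/projects/$CI_PROJECT_ID/repository/compare?from=$PREVIOUS_SHA&to=$CI_COMMIT_SHA" \
    | jq -r '.diffs[].new_path'
# if every changed path is under the chart directory, the pipeline TYPE is "config only"
# and the build stages are skipped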

To handle question number two, we created stages for each of the actions we wanted in the process: build_image, upload_chart, deploy, etc. Execution criteria were defined for each stage using Gitlab parameters such as only and except, to have different stages running on different branches. To top it off, we even found a cool way to allow deployment only if the current commit is tagged:

$ git checkout $CI_BUILD_REF_NAME
$ git pull --tags
$ git describe --exact-match --tags HEAD
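
Putting the two together, a deploy stage ends up restricted to release branches with only, and the tag check simply fails the job when the commit is not tagged. A rough sketch (the branch pattern is illustrative):

deploy:
  stage: deploy
  only:
    - /^release-.*$/
  script:
    - git checkout $CI_BUILD_REF_NAME
    - git pull --tags
    - git describe --exact-match --tags HEAD    # fails the job if HEAD is not tagged
    # ...followed by the actual deployment commands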

Question number three was a bit trickier: we wrote a Gitlab plugin (shooter) to trigger pipelines when a merge request is created or updated, and used the Gitlab parameter triggers to have additional stages run only for merge requests.
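
On the .gitlab-ci.yml side, these merge-request-only stages use triggers as an only value, so they run just in the pipelines that shooter starts. For example (the job and stage names, and the test script, are illustrative):

integration_test:
  stage: test
  only:
    - triggers    # run only in pipelines started via the trigger API, i.e. by shooter
  script:
    - ./run-integration-tests.sh    # hypothetical test entry point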

The solution to the fourth question is a fairly common practice in which you link each action to a unique identifier in your CI/CD platform. We chose the pipeline ID, and appended it to everything that is versioned. A Docker image that was created in a pipeline with ID 4345 would be tagged as 2.1.2-4345 (with 2.1.2 being the version of the application). We did the same with Helm chart versions. So if we run helm ls, the version column will lead us to the creating pipeline, which will lead us to the chart’s source code. The source code, in turn, will lead us to a Docker image tag… You get the point.
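
In the pipeline this is nothing more than appending Gitlab’s built-in $CI_PIPELINE_ID to the tags and versions; something like this, with 2.1.2 standing in for the application version and $REGISTRY as a placeholder:

$ docker push $REGISTRY/$CI_PROJECT_NAME:2.1.2-$CI_PIPELINE_ID    # the image tag carries the pipeline ID
$ helm package chart --version 2.1.2-$CI_PIPELINE_ID              # so does the chart version
$ helm ls                                                         # ...and that is what later shows up in the version column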

Interim

So far, so good. And when things are good, requests start to arrive. One such request was dynamic environments: why not have each merge request create a new environment with all stable components (except for the one in the current context), and run integration tests?

This is where we said: no more. Each .gitlab-ci.yml was already ~400 lines long, and adding more was difficult as it was. It was time to simplify. We decided to do so by searching for repeating patterns and encapsulating them in a dedicated tool: Orca.

You can find it here: https://github.com/nuvo/orca

Introducing Orca

You can think of Orca as a CI/CD simplifier, the glue behind the process. Instead of writing scripts on top of scripts, Orca holds all the logic of these scripts, and has a command line interface. One example which really brings out the simplicity provided by Orca is deploying a Helm chart from a ChartMuseum. Each chart package contains values files for each environment. As Helm does not handle packaged values files, the deployment stage would look like this:

$ helm repo add museum $CHART_MUSEUM_URL
$ helm fetch museum/$NAME --version $VERSION --untar
$ cd $NAME
$ helm dependency update
$ helm upgrade -i $NAME \
-f $ENV-values.yaml \
--kube-context $CONTEXT \
--namespace $NS .

Whereas with Orca:

$ orca deploy chart --name $NAME \
--version $VERSION \
-f $ENV-values.yaml \
--kube-context $CONTEXT \
--namespace $NS

That’s it! One command to rule them all.

Orca for the win

We see Orca as many things: it is a CI/CD simplifier, the glue of the process. It empowers Helm, it enables us to have dynamic development environments with ease, and it is a huge time saver.

Another thing Orca already does well: it allows us to have Environments as Code. Almost everything today is immutable and declarative, with the exception of environments. Why are these still incremental? Why not have a repository with files that describe the state of the environment? With Orca, this is already possible and simple:

$ cat charts.yaml
charts:
- name: service-1
  version: 2.1.2-4345 # <application_version>-<pipeline_id>
- name: service-2
  version: 1.2.4-5633 # <application_version>-<pipeline_id>
$ orca deploy env --name $NS \
--charts-file charts.yaml \
--kube-context $CONTEXT \
-f $ENV-values.yaml # packaged environment values files

Simple. It is also a way to create development environments using Kubernetes namespaces, much like the new Azure Dev Spaces, but in your own cluster.

Conclusion

Orca is currently in a prototype stage and written in Go using Cobra. The more we think about Orca, the more we realize the potential it holds.

We are really excited about Orca and we intend to continue working on it. You can check out the Github repository to better understand the use cases and usage.

Thank you for reading!
