Questions along the path to Continuous Deployment

An honest account of a startup’s experience with GitOps.

Alteos Tech Blog · Jul 27, 2020

by Tom Burton

A step in the right direction

The what

What started as Infrastructure as Code (IaC) has now morphed into something even greater. IaC gave us a simple way of maintaining a single source of truth for our infrastructure in Git. Combine this with leveraging Git to perform deployments and you get GitOps: a way of using Git as the arbiter of our releases.

The term “GitOps” was first coined by Weaveworks. Since then, they have gone on to release one of the more popular tools in the trade, FluxCD. I will come back to this particular software later on.

The why

Towards the end of 2019, we weren’t anywhere close to implementing a robust CI/CD process. It was paramount that we achieve this as soon as possible. The bigger you get, the harder it is to put in place.

At the time we had a GitOps-esque system in place: patch deployments driven by GitLab CI. While functional, they didn't feel like a future-proof solution. As many DevOps engineers at an early-stage startup will understand, the initial infrastructure setup can feel like the Wild Wild West. Some of the other issues we faced included:

  1. Engineers/DevOps releasing infrastructure changes from their computers.
  2. Engineers/DevOps not committing changes to Git.
  3. No official documented processes for making changes.
  4. State of the cluster inconsistent with what was in Git.

The list goes on. It was evident that there was a problem with how we managed infrastructure state. So, on a cold, rainy November evening, we decided to begin our journey with GitOps. Along the way, we hit many hurdles and asked many questions. The intention of this article is to answer those questions based on our experience.

Question 1, November 2019 — What is GitOps and how can it help us?

GitOps is the practice of leveraging Git's core functionality to deploy application and infrastructure code, with the repository acting as the single source of truth.

The first step on this journey was to gather as much information as possible on the topic. This included case studies, white papers and the source code of example projects. Some of the more useful material I was able to find is listed here:

It is important to establish at this stage how GitOps will improve what you already have. As we were to find out later on, doing so can save you a lot of time.

Take, for example, our patch deployments: to release a new image on the cluster, we would create a GitLab CI job that ran kubectl patch against the deployment.
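For illustration, such a job looked roughly like the sketch below. The service name, namespace and kubectl image are placeholders rather than our actual configuration, and the job assumes the runner is already authenticated against the cluster.

    patch_deployment:
      stage: deploy
      image:
        name: bitnami/kubectl:latest
        entrypoint: [""]    # override the image entrypoint so GitLab CI can run the script shell
      script:
        # Swap in the image that was built and pushed earlier in the pipeline.
        - >
          kubectl patch deployment my-service -n stage --type merge
          -p '{"spec":{"template":{"spec":{"containers":[{"name":"my-service","image":"'"$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"'"}]}}}}'
      only:
        - develop

It works, but the desired image now lives only in the pipeline history, not in Git.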

With GitOps, we knew that state could be stored and managed declaratively, so the state of the cluster would always reflect the state in Git. Problem solved.

Set out your requirements. Marry them up to the offerings of GitOps and only then proceed.

Question 2, December 2019 — What tooling is available to us?

This is always by far the most interesting part of beginning a new journey: the hunt for the right tooling. The options described here are the ones we considered. They are each backed by an active community or a great team, giving you peace of mind.

Proprietary

CodeFresh & Harness

These tools only get a short mention here. Both connect to your Git repos and can apply changes to your cluster. Not only that, they have CI/testing capabilities as well. At present, we use GitLab CI to run our build/test jobs. We felt there was no need to buy a subscription to a package and only use half of it. Both have very clean user experiences and dedicated support teams to help you on the way. Check them out if you would like more of a managed solution.

Open Source

ArgoCD

ArgoCD, built by the team at Intuit, is a fantastic tool. It comes with its own UI for managing your cluster, which you can extend to use any kind of SSO for your organisation.

Argo runs a collection of pods that manage everything from caching to authentication. You can configure it to watch any number of repositories, branches and clusters. It comes with built-in support for the major templating tools (Helm, Kustomize, Jsonnet, Ksonnet, etc.), and you can also plug in your own custom templating tool, for example a hybrid of Helm and Kustomize.
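To give a flavour of the configuration, here is a minimal Argo CD Application that watches a single path of a single repo and syncs it to one cluster. The repo URL, path, revision and namespaces are placeholders, not our setup.

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: my-service
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://gitlab.com/example/cluster-state.git   # placeholder repository
        targetRevision: develop
        path: overlays/stage/my-service                         # directory Argo renders and applies
      destination:
        server: https://kubernetes.default.svc
        namespace: stage
      syncPolicy:
        automated:
          prune: true       # delete resources that were removed from Git
          selfHeal: true    # revert manual changes made directly on the cluster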

At the time of testing this, it seemed like it would be the perfect option. Unfortunately, we ran into several bugs (that they have since fixed) which made us look elsewhere. Looking back, this was a missed opportunity.

FluxCD

Built by the team at Weaveworks, FluxCD is a much simpler tool in its functionality. It runs as a single pod (backed by a memcached store) and will apply any Kubernetes manifests you choose to point it at. It also allows you to customise the templating before running kubectl apply. A key difference between Flux and Argo is Flux's ability to commit changes back to the repo. This means that the state of your cluster repo is always in sync with the state of the cluster. To pick up the latest image, FluxCD only needs access to your registry, be it GitLab's registry or ECR. Flux caches the image metadata in memcached so rollbacks are quick if necessary. From there it can update the image on the cluster.
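As a rough sketch of how that image automation is wired up in Flux v1 (the version we evaluated at the time), you opt a workload in with annotations on the manifest itself. The names and tag filter below are illustrative.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-service
      annotations:
        fluxcd.io/automated: "true"                # let Flux bump this workload's image automatically
        fluxcd.io/tag.my-service: "semver:~1.0"    # only follow 1.x releases of the my-service container
    spec:
      selector:
        matchLabels:
          app: my-service
      template:
        metadata:
          labels:
            app: my-service
        spec:
          containers:
            - name: my-service
              image: registry.gitlab.com/example/my-service:1.0.3

When a newer matching tag appears in the registry, Flux edits the image field, commits the change back to the repo and applies it to the cluster.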

Once we found out that Flux not only deploys the code but can manage it as well, it was the clear winner for us. It solved all the problems outlined above. It seemed simple enough to put in place and, of course, it's free.

Question 3, December 2019 — How does this tooling fit into my current setup?

Kustomize

Last year, we introduced Kustomize as our primary templating tool. I won’t go into detail about how Kustomize works, as you can read all about it here.

We realised that we had been using an outdated approach with Kustomize. The team there had rethought the best way to organise the directories. The good news was that this provided the perfect platform to try out Flux. We could have one Flux instance per environment that would manage the deployment and infrastructure changes. The bad news was that we had to refactor more or less the entire repo.
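For context, the structure we refactored towards was the conventional base-plus-overlays layout, repeated per service. The names here are illustrative, not our actual repo.

    my-service/
    ├── base/
    │   ├── deployment.yaml
    │   ├── service.yaml
    │   └── kustomization.yaml
    └── overlays/
        ├── preview/
        │   └── kustomization.yaml   # patches ingress hosts, database variables, replicas
        ├── stage/
        │   └── kustomization.yaml
        ├── sandbox/
        │   └── kustomization.yaml
        └── prod/
            └── kustomization.yaml

Each overlay pulls in ../../base and layers environment-specific patches on top. Multiply that by every microservice and every environment and the number of near-identical folders grows quickly.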

Having finished the refactoring, we sat back to take a look at our handiwork. What we had was nothing short of directory hell. The duplication of folders was a nightmare. It is my opinion that Kustomize does not provide a scalable solution for a microservice architecture. But that is for another post.

It was clear that we had tried to fit a square peg into a round hole. At this point we should have reconsidered whether this was the right approach. To any startup DevOps engineer reading this: if you get to this stage, I implore you to re-evaluate.

Question 4, January 2020 — How can I create a viable Proof-of-Concept?

To give some context, here is a brief description of the deployment strategy at Alteos.

When a developer starts a new feature, they create a branch and push it to GitLab. We run a collection of pipelines that create a new namespace in our staging cluster. We then deploy an exact replica of our stage environment there, a.k.a. a preview environment. All we change are the variables for ingress rules, databases etc., and away they go. Upon merging to develop, we build and deploy the changes to our stage environment. After testing, we release to our sandbox environment. We complete a final round of testing before finally promoting all the way to production.

We came up with a plan for tackling preview environments and stage. Instead of running complex bash scripts to build and apply the manifests, we would simply deploy Flux to a new namespace. Flux would fire up and deploy all the application infrastructure for us. It felt like a beautiful reduction in complexity. Our stage environment was even easier. Deploy Flux, point it at the relevant directories and let it manage all deployment releases.
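A minimal sketch of that setup with the Flux v1 CLI might look like the following, with one Flux instance per environment watching its own directory. The repo URL, branch, paths and namespace are placeholders.

    # Generate the Flux manifests for one environment and apply them to its namespace.
    # The SSH public key printed by `fluxctl identity` then has to be added to the repo as a deploy key.
    fluxctl install \
      --git-url=git@gitlab.com:example/cluster-state.git \
      --git-branch=develop \
      --git-path=overlays/stage \
      --git-user=flux-bot \
      --git-email=flux-bot@example.com \
      --namespace=flux-stage | kubectl apply -f -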

This was going to be our PoC.

Big mistake.

We had not considered what sandbox/production releases might look like in enough detail. As I will explain shortly, it is vital you don’t rush. The eagerness to adopt GitOps tooling can easily lead to poor planning and execution.

Question 5, March 2020 — Has the Proof-of-Concept been a success and is GitOps the best approach for us?

By the end of March we had FluxCD running in both our preview and stage environments. The road there was both challenging and interesting. However, implementing Flux for preview environments deserves its own blog post.

By all metrics the PoC had been a success:

  • Preview environments

Flux was working exactly as we had planned. Our pipelines had been simplified. Developers were happy with their functionality.

  • Stage environment

Upon merging changes from a preview branch to develop, our old build jobs would containerise and push the latest image to the registry. Flux would pick up the latest image, update the repository and then apply the changes to the cluster. Any infrastructure changes were also applied at the same time. Perfection.

The next logical step was to push Flux all the way to production. This is when the penny dropped. As many startups know, implementing a microservice architecture comes with a multitude of challenges. One of them: how do you create completely autonomous, independent applications? The all too common compromise is to end up with a hybrid monolithic/microservice system, which meant we had to release all applications at the same time.

Our first attempt at an answer yielded a rather convoluted solution. We would have one branch per environment: preview -> stage -> sandbox -> prod. Promoting changes to each environment would be a case of merging one branch into the next.
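In that model, each environment would have its own Flux instance tracking its own branch, and a promotion would have amounted to little more than a merge, something like:

    # Promote whatever is currently on the stage branch to the sandbox environment.
    # The Flux instance watching the sandbox branch then applies the change.
    git checkout sandbox
    git merge --no-ff stage
    git push origin sandbox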

One reason we disregarded this was the lack of control we had if something failed. If one microservice that underpinned the functionality of another failed to build, Flux wouldn't wait for it. It would apply the changes anyway and the system would likely crash.

Then, what if we wanted to release a hotfix? We would have to branch off prod and then merge to stage and sandbox branches. The complexity and margin for error kept increasing.

GitOps was becoming a massive headache rather than a silver bullet…

[continues in Part 2 — follow ALTEOS Tech and be part of the story!]

Tom Burton is a DevOps engineer at Alteos.

Originally a Chemical Engineering graduate, Tom decided to switch to a coding career after attending a blockchain party in a penthouse suite of the Sheraton New York. Upon completing a 9-week bootcamp, Tom explored many avenues of software development before choosing a career as a DevOps engineer. He currently lives in Berlin and is working on creating a fully automated CI/CD pipeline at Alteos.

