How MPB uses GitOps, Helm and Kubernetes to deliver features at the speed of light

Mathew Robinson
MPB Tech
5 min read · Apr 9, 2021

In this post I’ll be looking at how we’ve implemented GitOps at MPB, and close with some thoughts on what we’d do differently if we had our time again.

MPB is a reseller of photographic equipment with warehouses in Britain, Germany and the USA. Our Product & Engineering team supports customer-facing websites, the product catalogue, stock-control systems and more.

By moving over to a Service-Oriented Architecture (SOA) we hope to both simplify and accelerate all our systems. Core to this acceleration are our automated environment management and continuous deployment pipelines.

We try to stay at the cutting edge of Site Reliability Engineering (SRE) techniques and technologies. To that end, our entire SOA is hosted and deployed on Google Kubernetes Engine (GKE). We use Helm to package the Kubernetes resources for each service and to manage our global deployments.

Continuous Delivery and Deployment

Core to our model are the ideas of Continuous Delivery and Continuous Deployment. While both these terms abbreviate to CD, they are two different concepts.

Continuous Delivery is the process of automatically building artefacts, often software packages, and pushing them to a repository once all automated tests pass. Continuous Deployment is the logical next step: deploying that package to an environment.

At MPB, when a pull request is merged to master, our regular Continuous Integration (CI) process fires, runs automated testing, and on success proceeds to a final “Release” step which is common to all repositories.

This step runs our internal system tool’s release command. It’s a simple script that checks out the Git repo for the software that has just passed testing and creates a Git tag with the new version number. We use SemVer version numbers but don’t follow the breaking-change rules.
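The release command itself isn’t published, but its tagging logic can be sketched in Python. This is a hypothetical reconstruction: the function names, tag format, and bump policy are assumptions (the post only says SemVer numbers are used without the breaking-change rules).

```python
import re
import subprocess

SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")

def next_version(existing_tags, bump="minor"):
    """Return the next SemVer tag given the repo's existing tags.

    A hypothetical helper; MPB's real release command may compute
    the new version differently.
    """
    versions = sorted(
        tuple(int(g) for g in m.groups())
        for t in existing_tags
        if (m := SEMVER.match(t))
    )
    major, minor, patch = versions[-1] if versions else (0, 0, 0)
    if bump == "major":
        return f"v{major + 1}.0.0"
    if bump == "minor":
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"

def tag_release(repo_dir, tag):
    # Create and push an annotated tag; pushing the tag is what
    # activates the CloudBuild deployment trigger.
    subprocess.run(["git", "-C", repo_dir, "tag", "-a", tag, "-m", tag], check=True)
    subprocess.run(["git", "-C", repo_dir, "push", "origin", tag], check=True)
```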

Once this Git tag is pushed back to GitHub by the Release step, it activates our CloudBuild trigger which executes the deployment process:

  • Run Helm Dependency Update to gather and validate the chart dependencies before putting them into the pipeline
  • Build and upload any software packages / SDKs for the given service
  • Use the helm-push plugin to push the built Helm package version to our chart museum
  • Run the system tool “deploy” command, which checks the environments repository, updates the manifest file for this service with the new version, regenerates the requirements file, and pushes those changes back to the environments repository.
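The four steps above map naturally onto CloudBuild steps. The following cloudbuild.yaml is an illustrative sketch, not MPB’s actual config — the image names, chart path, and system tool invocation are all assumptions:

```yaml
steps:
  # 1. Gather and validate the chart dependencies
  - name: alpine/helm
    args: ["dependency", "update", "./chart"]
  # 2. Build and upload packages / SDKs for this service
  - name: gcr.io/cloud-builders/docker
    args: ["build", "-t", "gcr.io/$PROJECT_ID/my-service:$TAG_NAME", "."]
  # 3. Push the packaged chart to the chart museum (helm-push plugin)
  - name: alpine/helm
    args: ["push", "./chart", "chartmuseum"]
  # 4. Update the environments repository with the new version
  - name: gcr.io/$PROJECT_ID/system-tool
    args: ["deploy", "--service", "my-service", "--version", "$TAG_NAME"]
```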

Here is the end-to-end workflow:

The environments repository

All of our deployed environments map directly to Git branches on the environments repository. This mapping of branches to environments is the central idea of GitOps: Git becomes the source of truth for what is running where.

The main branch is our development environment. We also maintain two long-lived environment branches that correspond to our long-lived environments: release/staging and release/production.

Whenever a commit is pushed to any branch on the environments repository, this will trigger deployment of the whole system. The environments repository contains a (mostly) empty Helm package, with a requirements.yaml file defining all of the MPB Helm charts needed to deploy the SOA as a whole. This requirements.yaml file is updated by the independent deploy steps when versions are released. Pushing that change triggers a full system deployment.
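A Helm 2 requirements.yaml for an umbrella package like this might look as follows — the service names, versions and repository URL are invented for illustration:

```yaml
dependencies:
  - name: catalogue-service
    version: 1.4.2
    repository: https://charts.mpb.example
  - name: stock-service
    version: 2.0.7
    repository: https://charts.mpb.example
```

Each service’s deploy step rewrites only its own version pin, so a single-service release produces a one-line diff in this file.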

This full-system package works quite well because Helm is idempotent and does not change resources in Kubernetes unless necessary. Having a centralised repository for deployments like this also gives us the benefit of a single source of truth about when a deployment happened.

All that’s necessary to do the deployment then is to run Helm Upgrade on the environments package. To get the necessary flags, we have a Python script called “generate-release-file”, which determines which Git branch is currently deploying, reads the release-config.yaml, and creates a flags file for that deployment which we use to feed flags to Helm.

An example of the release-config.yaml looks like this:

---
master:
  release_name: rel5
  cluster_name: testing
  config_vars: vars/testing.yaml
release/staging:
  release_name: staging
  cluster_name: testing
  config_vars: vars/staging.yaml
release/production:
  release_name: production
  cluster_name: production
  config_vars: vars/production.yaml
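The generate-release-file script isn’t published, but its core logic amounts to mapping the deploying branch through this config to a set of helm upgrade flags. A minimal Python sketch, with the function name and exact flags assumed:

```python
def helm_flags(branch, release_config):
    """Map the deploying Git branch to helm upgrade flags.

    A hypothetical reimplementation of generate-release-file;
    release_config is the parsed release-config.yaml.
    """
    try:
        cfg = release_config[branch]
    except KeyError:
        raise SystemExit(f"no release config for branch {branch!r}")
    return [
        "upgrade", cfg["release_name"], ".",
        "--kube-context", cfg["cluster_name"],
        "--values", cfg["config_vars"],
    ]

# The master branch deploys the development environment:
release_config = {
    "master": {
        "release_name": "rel5",
        "cluster_name": "testing",
        "config_vars": "vars/testing.yaml",
    },
}
print(" ".join(["helm"] + helm_flags("master", release_config)))
```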

Handling promotions

Since we treat the entire SOA as a single package, we tag each of these deployment commits using the same release-tagging automation we use in each service. These tags allow us to reference ‘snapshots’ of the SOA and promote them to higher environments. To do a promotion, it’s as simple as git checkout release/staging and git merge <tag-name>. When we push that merge up, it’s automatically deployed to the corresponding environment.

We have one additional task that runs on release/* branches called “reverse-promotion”. We like to have a view in each service’s repository of what version of the code is deployed in a given environment. To this end, each service has a release/staging and release/production branch. Humans don’t interact with these branches — they exist purely for this reverse-promotion step.

Once a promotion deployment has finished, the reverse-promotion script will clone each service repository and merge the corresponding version from the requirements.yaml into that service’s release/<environment> branch.
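The per-service logic of reverse-promotion can be sketched like this. The repository URL scheme and tag format are assumptions based on the description above, and the real script runs the git commands rather than returning them:

```python
def reverse_promotion_commands(dependencies, environment):
    """For each service pinned in the environments package, produce
    the git commands that sync its release/<environment> branch to
    the deployed version. A hypothetical sketch of MPB's script.
    """
    commands = []
    for dep in dependencies:
        repo = f"git@github.com:mpb/{dep['name']}.git"  # assumed URL scheme
        tag = f"v{dep['version']}"                      # assumed tag format
        commands.append([
            f"git clone {repo} {dep['name']}",
            f"git -C {dep['name']} checkout release/{environment}",
            f"git -C {dep['name']} merge {tag}",
            f"git -C {dep['name']} push origin release/{environment}",
        ])
    return commands
```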

The end-to-end process looks like this:

Real-world performance

This process has proven to be reliable and fast. It’s also very extensible, since it’s made of many simple parts. We have added multiple integrations and it’s easy to keep adding more. GitOps has also given us an easy view into our deployed environments and centralised management of them.

The biggest downside by far has been Helm. We currently use Helm 2, and while Helm 3 resolves many of the issues we’ve experienced, if we had our time again I would recommend simple YAML templating plus kubectl apply instead.

Helm 2 has multiple reliability issues and does not handle concurrent deploys very well. It can lead to deadlocks, and we occasionally have to manually change the state of those Helm releases. We expect Helm 3 will be better in this regard, but it would still be simpler to just use kubectl apply.

Mathew Robinson is Lead Software Engineer at MPB, the UK’s leading reseller of photographic equipment with operations in Britain, Europe and North America. https://www.mpb.com
