Thinking GitOps — Pingdom Controller

Ido Braunstain
Published in Yad2 Tech · Jun 13, 2021 · 4 min read

One of my first tasks at Yad2 was to redesign our CI/CD pipelines, which were pretty straightforward at the time.
We had a single on-premises GitLab server that was responsible for:

  • Source code
  • Docker image registry
  • CI pipelines
  • CD pipelines

Not a high-availability setup, to say the least.
This task was my chance to try something new and exciting, which is why I decided to learn about GitOps.

GitOps is a way to do Kubernetes cluster management and application delivery. It works by using Git as a single source of truth for declarative infrastructure and applications.

We decided to try out Flux by Weaveworks for the implementation.
Let's go over the four pillars of GitOps from Yad2's point of view:

1. The entire system is described declaratively

We started by recreating all of our Kubernetes manifests and storing them in a simple-to-navigate layout inside Git.

.
├── releases
│   ├── dev
│   ├── dr
│   └── prod
└── workloads
    ├── dev
    ├── dr
    └── prod

For the first time, we can easily see which version is running and what the current configuration is.
Because communication with the Kubernetes API is declarative, we don't do things more than once. For example, after we tested the single HelmRelease file for the complex Prometheus Operator chart in our dev environment, releasing it to production was a simple copy operation.
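To make that concrete, here is a minimal sketch of what such a HelmRelease can look like with the Flux v1 Helm Operator; the namespace, chart version, and values below are illustrative assumptions, not our actual configuration:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: prometheus-operator
  namespace: monitoring
spec:
  releaseName: prometheus-operator
  chart:
    # chart source and version are placeholders
    repository: https://prometheus-community.github.io/helm-charts
    name: kube-prometheus-stack
    version: 16.0.0
  values:
    grafana:
      enabled: true

Promoting it is then literally copying the file from releases/dev to releases/prod and adjusting the environment-specific values.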

2. The canonical desired system state versioned in Git

Without a doubt, Git is the most widely used modern version control system in the world today. Storing our infrastructure in Git equips us with some useful features, such as:

  • Audit
  • Pull Request
  • History
  • Flexibility
  • Rollback

3. Approved changes that can be automatically applied to the system

After storing the state of our entire cluster in Git, the next step was to ensure that state changes in Git are automatically rolled out to the cluster. We don't need to further complicate our CI process or hand out any more credentials: Flux runs inside our cluster and pulls the changes from Git, deploying them straight to Kubernetes.

4. Software agents ensure correctness and alert on divergence

Because every change must go through Git, we were able to implement checks that catch errors at an early stage, or even prevent them from being applied to the cluster at all.
A simple example is YAML linting.
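For illustration, such a check can run as a small GitLab CI job; the job name, image, and paths below are assumptions:

# Hypothetical lint job in .gitlab-ci.yml
yaml-lint:
  stage: test
  image: python:3.9-slim
  script:
    - pip install yamllint
    - yamllint releases/ workloads/

If the linter fails, the pipeline fails, and the broken manifest never reaches the cluster.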

After the “why” and “how” of our move to the GitOps methodology, I'd like to introduce a simple shift in mindset that took place during our migration.

The problem

We are using Pingdom to monitor our external endpoints.
For each external endpoint, we need to create a Check within the Pingdom website.
Creating a Check is a manual, click-through process in the Pingdom web UI.

As the number of endpoints grows, creating a new Check and managing the existing ones becomes toil, and toil should be eliminated.
The first step towards solving this is writing code that automates our manual actions. We still had to think about how to solve it in a way that fits our existing work methodology: GitOps.
We considered that this might be a great opportunity to create a CRD, but the overhead of managing a CRD is not justified in this case.

The solution

As a result, we decided to create a simple controller called Pingdom-controller. It makes use of the Ingress resource that already exists for each endpoint.
Every Ingress event triggers an immediate Pingdom-controller reconcile loop cycle, and the controller acts according to a specific set of custom annotations.

For example, adding just two annotations to an existing Ingress will cause Pingdom-controller to create a matching Check:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    pingdom.controller.yad2/apply: "true"
    pingdom.controller.yad2/resolution: "1"
  name: my-service # Check name
spec:
  rules:
  - host: my-service.com # Check host target
    http:
      paths:
      - backend:
          serviceName: service
          servicePort: http
        path: /

To avoid placing additional load on the Kubernetes API server (control plane), we opted for informers and caching instead of issuing raw watch calls ourselves.
A scaffold for the reconcile loop is sketched below.
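This is a minimal sketch of what such a scaffold can look like with client-go informers and a work queue. It is an illustration, not the actual Pingdom-controller code: it targets the networking.k8s.io/v1 Ingress API and uses print statements as placeholders for the real Pingdom API calls.

package main

import (
	"fmt"
	"time"

	networkingv1 "k8s.io/api/networking/v1"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

func main() {
	// In-cluster config: the controller runs as a Pod inside the cluster.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Shared informer factory: one watch per resource, backed by an in-memory cache.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	ingressInformer := factory.Networking().V1().Ingresses().Informer()

	// The work queue decouples event delivery from reconciliation.
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	// Every Ingress event only enqueues a namespace/name key; the work happens in the loop below.
	ingressInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
		UpdateFunc: func(_, newObj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
				queue.Add(key)
			}
		},
		DeleteFunc: func(obj interface{}) {
			if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	if !cache.WaitForCacheSync(stop, ingressInformer.HasSynced) {
		panic("failed to sync informer cache")
	}

	// Reconcile loop: pop keys, read the Ingress from the local cache, sync the Pingdom Check.
	wait.Until(func() {
		for processNextItem(queue, ingressInformer.GetIndexer()) {
		}
	}, time.Second, stop)
}

func processNextItem(queue workqueue.RateLimitingInterface, indexer cache.Indexer) bool {
	key, shutdown := queue.Get()
	if shutdown {
		return false
	}
	defer queue.Done(key)

	obj, exists, err := indexer.GetByKey(key.(string))
	if err != nil {
		utilruntime.HandleError(err)
		queue.AddRateLimited(key)
		return true
	}
	if !exists {
		// The Ingress was deleted: the matching Pingdom Check should be removed (placeholder).
		fmt.Printf("delete Check for %s\n", key)
	} else {
		ing := obj.(*networkingv1.Ingress)
		if ing.Annotations["pingdom.controller.yad2/apply"] == "true" {
			// Create or update the matching Check via the Pingdom API (placeholder).
			fmt.Printf("sync Check for %s/%s\n", ing.Namespace, ing.Name)
		}
	}
	queue.Forget(key)
	return true
}

The important detail is that processNextItem reads the Ingress from the informer's local indexer rather than calling the API server directly.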

We use an informer for almost every use case that requires receiving events from Kubernetes. It gives us in-memory caching and fast, indexed lookups of objects by name or other properties.
A controller that queries the API server with a watch request every time it needs an object puts a high load on the system; in-memory caching via informers solves this. Moreover, informers react to object changes in near real time instead of relying on polling.
