Using Configuration as Data in your Cloud Operating model

Stuart Barnett
Appsbroker CTS Google Cloud Tech Blog
7 min read · Mar 9, 2022

(Or… “How I learned to stop worrying and love the KRM. And order pizza with YAML”)

A still from the war-room scene in the 1963 film “Dr Strangelove” (Image: Sony Pictures)

Keen cinephiles may recognise where I’ve borrowed the above from — it’s Stanley Kubrick’s 1963 satirical Cold War masterpiece Dr Strangelove. If you haven’t seen it (and I thoroughly recommend it), it portrays a horribly realistic scenario of just how out of control things can get when lines of communication don’t work; when nobody can tell whether what was intended to happen has actually happened; and when you have a whole bunch of people who don’t necessarily understand each other and see and do things differently.

At this stage you would be justified in wondering what the fudge any of this has got to do with operating on the Cloud. Admittedly, wrapping together some of the operating challenges facing organizations as they move to the cloud with a darkly comic view of nuclear armageddon is stretching things a little (and it is partly just a shameless ruse to mention one of my favourite films) — but bear with me.

In the film, you see that the people who are meant to be in charge and making the decisions don’t really know or understand much about the guys on the ground (or in the air, in this case) and how they operate or what they’re currently doing. They know what should be happening, but they have no reliable way of knowing whether it’s actually taken place or not.

We have the same issues in cloud land. We have stakeholders in organizations, such as security teams, who want certain behaviours and policies to be in place on infrastructure — they may well write policy documents with detailed specifications and recommendations, but these then typically get handed over to someone else to implement. Similarly, the people who provision that infra want to put consistent configurations in place across their estate — they may well use Infrastructure as Code (IaC) to do this — but how do they ensure that the configuration doesn’t change over time (i.e. “configuration drift”), that their IaC runs actually succeeded, or that similar infra being provisioned by different teams is built to the same standards? Given that one of the intrinsic benefits of cloud is scale, how do you manage all of this across an ever-growing number of clusters and environments? If there’s a breaking change or issue, how do you roll that change out across your estate, and how do you know it got applied correctly?

Declaration of Interest

For application deployments on Kubernetes, this becomes an easier task to handle thanks to the declarative management model and resource-based APIs of Kubernetes. Every object in Kubernetes can be represented as YAML — we manage the state of objects by passing these resource declarations to the Kubernetes APIs, from which point it is the responsibility of Kubernetes to ensure that the objects that manifest in the cluster match precisely the specification we supplied. Note that we declare what we want the world to look like; we do not specify how this has to happen — that is the job of Kubernetes. This is what is known as “Configuration as Data”, and it is what differentiates the Kubernetes Resource Model (KRM) and its associated APIs from more traditional IaC approaches (such as Terraform, which applies changes at plan/apply time rather than continuously reconciling). Importantly, because the format is declarative, we can store our intended state in a git repo, and look to share and sync this view of the world across many clusters from a single source of truth.
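As a minimal sketch of what this looks like in practice (the object, names and image are my own illustration, not from the original post), here is a native Kubernetes declaration — we state the desired world, and the cluster’s controllers make reality match it:

```yaml
# We declare WHAT we want: three replicas of an nginx container.
# The Deployment controller and scheduler decide HOW to get there,
# and keep reconciling if reality drifts from this specification.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web          # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
```

Stored in git and applied with `kubectl apply -f`, this file *is* the configuration — data that can be versioned, reviewed and synced, rather than a script that runs once.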

It’s this decoupling of the declaration of desired state and the process of reconciliation within the cluster that helps Kubernetes to be so scalable, robust and easily extensible. In fact, it does such a good job of managing the state of native Kubernetes objects in a cluster, it’s attractive to use those same mechanisms to manage other resources — either inside the cluster, or across another set of clusters, or even outside of Kubernetes altogether. This approach can have some far reaching consequences, which we’ll see later.

To illustrate this, we can look at the example of Google’s Kubernetes Config Connector (KCC). This allows you to manage a whole host of GCP resources declaratively via a GKE cluster. Say we wish to provision a MemoryStore (i.e. GCP managed Redis) instance in our project for use by our apps. With Config Connector installed as a k8s controller in our cluster, we can declare the details of the Redis instance we want in YAML (see below), and submit it to the Kubernetes API in the same manner we create k8s native objects (i.e. kubectl apply… ) — effectively we are creating a custom object in the cluster based on a Custom Resource Definition (CRD).

A custom resource representing a GCP resource (a MemoryStore instance)
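The original post showed this manifest as an image; a representative Config Connector manifest looks something like the following (the `RedisInstance` kind and `redis.cnrm.cloud.google.com` API group come from Config Connector’s published CRDs, but the names and values here are illustrative — check the schema of your installed version):

```yaml
# A Config Connector custom resource: declaring this object in the
# cluster causes KCC to create and manage a real MemoryStore (Redis)
# instance in the corresponding GCP project.
apiVersion: redis.cnrm.cloud.google.com/v1beta1
kind: RedisInstance
metadata:
  name: my-redis            # illustrative instance name
  namespace: my-project     # KCC typically maps namespaces to GCP projects
spec:
  region: europe-west2
  tier: BASIC
  memorySizeGb: 1
```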

Having received the details of the desired resource, KCC is then responsible for calling the appropriate Google Cloud APIs to create the resource and (importantly) for ensuring that it attains and remains in that desired state, until such time as someone updates or deletes the object in the cluster. Say we needed to resize the MemoryStore instance — we just update the appropriate property in the YAML definition, re-apply, and KCC (and behind the scenes, the GCP API) does the rest. If something changes the configuration of, or deletes, the MemoryStore instance, KCC will automatically detect the difference and re-apply the configuration until it matches what is declared in the custom resource.
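A resize, then, is nothing more than an edit to the same manifest (a sketch, continuing the illustrative example above):

```yaml
# Resizing the instance: change only the relevant field in the
# manifest and re-apply it — KCC reconciles everything else.
spec:
  memorySizeGb: 4   # previously 1
```

Followed by re-running `kubectl apply -f` on the file, there is no separate “resize” command to learn: the desired state changed, so the controller changes the world to match.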

The relation between KCC and controlled resources

This forms a key component of Anthos Config Management, Google’s solution for managing configuration, infra and policies across fleets of clusters in a GitOps driven fashion. You can install this in your own Anthos/GKE clusters, or you can even get Google to provision you a dedicated, managed cluster for this — check out the Config Controller.

“Everybody was Config Writing…”*

With such a neat idea, it’s no surprise that it’s not just Google Cloud doing this — Azure Service Operator and AWS Controllers for Kubernetes do similar things on their respective cloud platforms. Crossplane (open-sourced by Upbound) takes this a stage further by using this approach to provide a platform for provisioning composable cloud infrastructure across various cloud providers. To prove just how flexible this approach is, they have even used their controller/resource model to implement a provider for the Domino’s API — so you can even order pizza via the KRM! (I mentioned this to an Italian DevOps friend of mine the other day — he said that this should only ever be implemented with a policy blocking the addition of pineapple…)

All Together Now

It’s worth noting that this approach to leveraging the KRM is more than just a nice party trick, however — it’s about modernizing the way cloud resources are managed, and ultimately the way teams work. It’s Google’s belief that, given the success of Kubernetes (and its lineage from Borg, the system Google uses internally to manage its own containerized workloads), the advantages are such that the KRM should be the common API “surface” used by both end users and software vendors. It’s a bold claim, but consider the way this could work in a modern enterprise operating on the cloud:

  1. Our app developers define their apps using immutable container images and deployment configurations, templated using Kustomize or Helm, so that they follow best practices by default. These apps can be deployed in a GitOps flow, using tools like ArgoCD running on k8s to manage deployments across multiple clusters
  2. Our platform engineers can use tools such as Google’s Config Sync, Config Connector or Crossplane to declare and manage infrastructure and normalized cluster configurations from definitions stored in git repos, across fleets of clusters
  3. Our DevSecOps engineers can author policies directly in YAML to enforce required behaviours via k8s-based Policy Engines such as Anthos Policy Controller, Kyverno, or OPA Gatekeeper
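As a sketch of point 3, a policy authored directly in YAML might look like the following Kyverno example (the policy name and rule are illustrative, not from the original post — field names follow Kyverno’s `ClusterPolicy` schema, which may vary by version):

```yaml
# A cluster-wide policy, expressed as just another KRM object:
# reject any Deployment that doesn't carry a 'team' label.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label   # illustrative policy name
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds:
                - Deployment
      validate:
        message: "All Deployments must carry a 'team' label."
        pattern:
          metadata:
            labels:
              team: "?*"    # any non-empty value
```

Because the policy itself is Configuration as Data, it lives in the same git repos and GitOps flows as everything else — reviewed, versioned and synced to every cluster in the fleet.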

You now have multiple teams in traditionally separate disciplines (dev, security, infra) in your enterprise working in a very similar fashion — declaratively, and using GitOps flows to manage their changes to code/infra/configs etc. Not only that, but we see the lines of responsibility become blurred and require greater collaboration — DevSecOps can work with platform engineers to develop secure configurations for clusters; app devs can understand how their cluster environments are configured and what policies are in place. Git is our single audited source of truth, and the KRM is working to keep everything the way we want it to be, reconciling state with our source of truth (git) and eliminating config drift.

It’s a powerful concept, and one that has a significant amount of support amongst the GitOps and Kubernetes communities (understandably), and beyond. It’s at the heart of Anthos, as well as powering everything from multicluster app deployments to provisioning bare metal.

So the next time you’re looking to normalize your cluster configurations, or provision your infra from git, or apply policies across your clusters and you’d like your teams to start working in a more collaborative and cross functional fashion — maybe you could look at using Configuration as Data and the KRM — it might not be the end of the world…

(Many thanks to Ed Adkins @ Google for providing the original inspiration for this piece)

* to be sung to the tune of Carl Douglas’ 1974 smash hit “Kung Fu Fighting” … err, I’ll get my coat..

Like what you’ve read here? We’re hiring across all GCP roles at CTS — talk to us!



GCP Cloud Architect Lead at CTS. Thoughts here are my own and don’t necessarily represent my employer.