Open-Sourcing Isopod: An Expressive DSL Framework for Kubernetes Configuration

With Isopod, we achieved strongly typed Kubernetes objects, code reuse, and test coverage that was not possible before.

Charles Xu
Cruise
5 min read · Sep 10, 2019


Authors: Charles Xu and Dmitry Ilyevskiy

In a previous Cruise blog, Karl Isenberg described how the PaaS team built a multi-tenant compute platform on Kubernetes to support hundreds of engineers and the versatile and increasing demands on computing, networking, and storage resources by 3D maps, navigation services, driving simulations, machine learning, data processing, and much more.

In this blog, we explore the challenge of configuration management in Kubernetes and present our open-source Isopod as a distinct solution from existing offerings in the community. With Isopod, we achieved strongly typed Kubernetes objects, code reuse, and test coverage that was not possible before.

Today, the workloads at Cruise span several Kubernetes clusters totaling tens of thousands of cores and hundreds of TB of memory. Such a scale is possible in part thanks to the declarative abstraction of Kubernetes, which allows users to specify desired states in YAML manifests. Composing YAML, however, is cumbersome when targeting multiple similar environments. It is equivalent to filling a shared template with cluster-specific values, as illustrated in Figure 1.

Figure 1: NGINX Pod Specification for Kubernetes
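
As a rough illustration of the templating approach described above, the Python sketch below fills one shared Pod template with per-cluster values. The cluster names and image tags are made up for this example, and `string.Template` merely stands in for whichever templating tool a team uses.

```python
from string import Template

# Shared manifest template; $name and $image are the cluster-specific holes.
POD_TEMPLATE = Template("""\
apiVersion: v1
kind: Pod
metadata:
  name: $name
spec:
  containers:
  - name: nginx
    image: $image
""")

# Hypothetical per-cluster values, for illustration only.
CLUSTER_VALUES = {
    "dev": {"name": "nginx-dev", "image": "nginx:1.17"},
    "prod": {"name": "nginx-prod", "image": "nginx:1.16"},
}


def render(cluster):
    """Return the manifest text for one cluster."""
    return POD_TEMPLATE.substitute(CLUSTER_VALUES[cluster])


if __name__ == "__main__":
    print(render("dev"))
```

Note that the output is plain text: nothing checks that the result is a valid Pod until it reaches the cluster.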

Existing templating tools (Helm, Kustomize, and the like) assume that values are statically known and rely on CLIs to fetch dynamic ones, such as secrets from HashiCorp Vault. Such a scheme is not ideal for several reasons:

  • It is hard to test, since side effects escape through CLIs.
  • It is highly dependent on the execution environment, since CLI versions vary across machines and the CLIs might not be installed at all.
  • Wrong indentation and typos are not detected until the manifest is applied.
  • YAML manifests prescribe the eventual state but not how existing workloads will be affected. Blindly applying a manifest might cause outages.
  • It is difficult to build YAML with complex control logic, such as loops and branches, as demonstrated in Figure 2. Although this example is written in Bash, the challenges of YAML fragmentation and indentation tracking remain even when other YAML-templating tools or languages are used.

Figure 2: Composing YAML with loops and branches while tracking indentation

How Isopod marks a new paradigm of configuration management in Kubernetes

Isopod approaches Kubernetes configuration differently by treating Kubernetes objects as first-class citizens. Without intermediate YAML artifacts, Isopod renders Kubernetes objects as Protocol Buffers (Protobufs), so they are strongly typed and consumed directly by the Kubernetes API.

With Isopod, Kubernetes objects and cluster targets are scripted in Starlark, a Python dialect by Google that is also used by the Bazel and Buck distributed build systems. To replace CLI dependencies, Isopod extends Starlark with runtime built-ins to access services and utilities such as Vault secrets management, the Kubernetes apiserver, an HTTP requester, a Base64 encoder, and a UUID generator. Isopod uses a separate runtime for unit tests that mocks all built-ins, providing test coverage that was not possible before.

The snippet in Figure 3 offers a peek into the expressive power of Isopod. It loads the Kubernetes API schemas using proto.package(). Code reuse is simple: the script loads the helper function bindingSubjects(members) from another file, and that helper constructs a list of typed Kubernetes objects in a loop. The Starlark built-in kube communicates with the Kubernetes apiserver, and its put attribute sends the Kubernetes objects over.

Figure 3: Constructing Kubernetes Objects with Isopod
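
The idea of building typed objects in a loop can be approximated in plain Python. This is a sketch only: the `Subject` dataclass and the `binding_subjects` helper are hypothetical stand-ins for a Protobuf message and for the bindingSubjects(members) helper named above, and none of it is Isopod's actual API.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Subject:
    """Hypothetical stand-in for a typed RBAC subject message."""
    kind: str
    name: str

    def __post_init__(self):
        # A typed object rejects bad values at construction time,
        # rather than when a manifest is applied to the cluster.
        if self.kind not in ("User", "Group", "ServiceAccount"):
            raise ValueError(f"unknown subject kind: {self.kind!r}")


def binding_subjects(members):
    """Build one typed Subject per member name, in a loop."""
    return [Subject(kind="User", name=m) for m in members]


if __name__ == "__main__":
    for s in binding_subjects(["alice@example.com", "bob@example.com"]):
        print(asdict(s))
```

Because each element is an object rather than a text fragment, there is no indentation to track and a misspelled kind fails immediately.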

Users can verify the behavior of a configuration script with unit tests, such as the one in Figure 4. Built-in modules that allow external access (kube and vault, for example) are stubbed out in unit test mode, so tests are hermetic.

Figure 4: Unit-testing Kubernetes Configuration
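
The stubbing pattern can be sketched in Python as follows. `FakeKube`, `install`, and the object names are hypothetical and exist only for this illustration; Isopod's own test runtime and built-in signatures may differ.

```python
class FakeKube:
    """Hypothetical in-memory stand-in for a cluster-facing built-in."""

    def __init__(self):
        self.objects = {}

    def put(self, name, namespace, data):
        # Record the object instead of calling a real API server.
        self.objects[(namespace, name)] = data


def install(kube, members):
    """Configuration logic under test; it only talks to the injected kube."""
    kube.put(
        name="team-binding",
        namespace="default",
        data={"subjects": [{"kind": "User", "name": m} for m in members]},
    )


def test_install_creates_binding():
    kube = FakeKube()
    install(kube, ["alice", "bob"])
    stored = kube.objects[("default", "team-binding")]
    assert [s["name"] for s in stored["subjects"]] == ["alice", "bob"]


if __name__ == "__main__":
    test_install_creates_binding()
    print("ok")
```

Since the configuration logic reaches the outside world only through the injected object, the test needs no cluster, network, or CLI.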

Going beyond testing

The hermetic property of Isopod extends beyond testing. Application secrets are stored in Vault and queried at runtime using a built-in, so no secrets escape to disk. In fact, Isopod prohibits disk IO except for loading Starlark modules from other scripts. No external libraries can be loaded unless explicitly implemented as an Isopod built-in.

In addition to Kubernetes object construction, Isopod can also manage cluster target selection, with a main Starlark script such as the one in Figure 5. First, Isopod calls the function clusters(ctx), whose argument is supplied by the user through the command line. For each chosen cluster, Isopod then installs the addons returned by addons(ctx).

Figure 5: Main Isopod script defines addons and clusters
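
The control flow described above can be modeled in a few lines of Python. The cluster names, addon names, and the shape of `ctx` are assumptions made for this sketch; only the function names clusters(ctx) and addons(ctx) come from the text.

```python
def clusters(ctx):
    """Return the clusters selected by the user-supplied context."""
    all_clusters = ["dev-us", "dev-eu", "prod-us"]  # made-up names
    return [c for c in all_clusters if c.startswith(ctx["env"])]


def addons(ctx):
    """Return the addons to install on each selected cluster."""
    return ["ingress", "monitoring"]  # made-up names


def rollout(ctx):
    """Pair every selected cluster with every addon, in order."""
    return [(c, a) for c in clusters(ctx) for a in addons(ctx)]


if __name__ == "__main__":
    for cluster, addon in rollout({"env": "dev"}):
        print(f"install {addon} on {cluster}")
```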

In addition, Isopod offers many other features, such as rolling out to multiple clusters in parallel and reclaiming dangling Kubernetes objects. For each rollout, Isopod creates a tombstone ConfigMap to store the entire configuration applied. Isopod updates the ownerReferences field of every object constructed in this rollout to point to that tombstone ConfigMap. If an object ceases to be referenced (for example, because the new rollout does not include it), its ownerReferences field still points to the previous ConfigMap. By deleting the previous ConfigMap, Isopod triggers the Kubernetes garbage collector to automatically delete all objects whose owner no longer exists.
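
The tombstone mechanism can be illustrated with a deliberately simplified Python model. It keeps a single owner per object and uses an in-memory dictionary in place of the cluster; the function names are hypothetical, and the real Kubernetes garbage collector is considerably more involved.

```python
def apply_rollout(store, rollout_id, objects):
    """Apply `objects` and make the new tombstone their owner."""
    tombstone = f"tombstone-{rollout_id}"
    store[tombstone] = {"owner": None, "spec": sorted(objects)}
    for name, spec in objects.items():
        # Re-applied objects are re-pointed at the newest tombstone.
        store[name] = {"owner": tombstone, "spec": spec}
    return tombstone


def delete_and_collect(store, tombstone):
    """Delete a tombstone, then drop objects whose owner no longer exists."""
    del store[tombstone]
    orphans = [name for name, obj in store.items()
               if obj["owner"] is not None and obj["owner"] not in store]
    for name in orphans:
        del store[name]
    return sorted(orphans)


if __name__ == "__main__":
    store = {}
    first = apply_rollout(store, 1, {"svc-a": "v1", "svc-b": "v1"})
    apply_rollout(store, 2, {"svc-a": "v2"})   # svc-b is no longer included
    print(delete_and_collect(store, first))    # ['svc-b']
```

In this model, svc-a survives because the second rollout re-pointed it at the new tombstone, while svc-b is collected because its only owner was deleted.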

In dry-run mode, Isopod reports the actions that a code change would trigger as a YAML diff against the live objects. For example, if an NGINX Service object is changed from the ClusterIP type to the NodePort type, Isopod will display the following diff.

Figure 6: Intended actions as YAML diff
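
A diff of this kind can be approximated with Python's standard `difflib` module. The manifest below is an assumed minimal Service, and the output format is that of `difflib.unified_diff`, not necessarily what Isopod prints.

```python
import difflib

# Assumed live state of the Service, written out as YAML text.
LIVE = """\
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: ClusterIP
  ports:
  - port: 80
"""

# Desired state after the code change: only the Service type differs.
DESIRED = LIVE.replace("type: ClusterIP", "type: NodePort")


def yaml_diff(live, desired):
    """Return a unified diff between the live and desired manifests."""
    lines = difflib.unified_diff(
        live.splitlines(), desired.splitlines(),
        fromfile="live", tofile="desired", lineterm="",
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(yaml_diff(LIVE, DESIRED))
```

An empty diff means that applying the change would leave the live object untouched.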

Results from Isopod

Since the adoption of Isopod, the PaaS team at Cruise has seen the following results:

  • We migrated 14 cluster add-ons from Bash scripts and added another 16 without any outage or regression, totaling around 10,000 lines of Starlark.
  • The migration resulted in up to a 60% reduction in code size thanks to code reuse, and 80% faster rollouts thanks to cluster parallelism and the removal of YAML intermediaries.
  • Unit tests take less than 10 seconds to finish.
  • Tests, live YAML diff, and proto message validation prevent virtually all regressions.

Use Isopod for your team

If you are interested, Isopod is open source today. If you would like to create your own tools or work with the PaaS team, check out our open positions and join us.

We would like to thank Stephen Day and Karl Isenberg for reviewing both the design and implementation of Isopod and offering valuable comments. We are grateful for the tremendous support from Vu Pham and Adrian Macneil for the development of Isopod and for this blog. We were lucky to run into John Millikin on Caltrain, who has been the primary contributor to the Skycfg project and introduced us to that project.
