Open-Sourcing Isopod: An Expressive DSL Framework for Kubernetes Configuration
With Isopod, we achieved strongly typed Kubernetes objects, code reuse, and test coverage that was not possible before.
In a previous Cruise blog, Karl Isenberg described how the PaaS team built a multi-tenant compute platform on Kubernetes to support hundreds of engineers and the versatile and increasing demands on computing, networking, and storage resources by 3D maps, navigation services, driving simulations, machine learning, data processing, and much more.
In this blog, we explore the challenge of configuration management in Kubernetes and present our open-source Isopod as a distinct solution from existing offerings in the community. With Isopod, we achieved strongly typed Kubernetes objects, code reuse, and test coverage that was not possible before.
Today, the workloads at Cruise span several Kubernetes clusters totaling tens of thousands of cores and hundreds of TB of memory. Such a scale is possible in part thanks to the declarative abstraction of Kubernetes, which allows users to specify desired states in YAML manifests. Composing YAML, however, is cumbersome when targeting multiple similar environments. It is equivalent to filling a shared template with cluster-specific values, as illustrated in Figure 1.
Existing templating tools (Helm, Kustomize, and the like) assume values are statically known and rely on CLIs to fetch dynamic ones, such as secrets from HashiCorp Vault. Such a scheme is not ideal for several reasons:
- It is hard to test, since side effects escape through CLIs.
- It depends heavily on the execution environment, since CLI versions vary across machines and a CLI might not be installed at all.
- It is error-prone: wrong indentation and typos are not detected until the manifest is applied.
- It is risky to apply: YAML manifests prescribe the eventual state but not how existing workloads will be affected, so blindly applying a manifest might cause outages.
- It makes complex control logic, such as loops and branches, difficult to express, as demonstrated in Figure 2. Although this example is written in Bash, the challenges of YAML fragmentation and indentation tracking remain even when other YAML-templating tools or languages are used.
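To make the typo problem concrete, here is a minimal Python sketch that contrasts text templating with typed object construction. It is an illustration only, not Isopod code: `ServiceSpec` and `build_spec` are hypothetical stand-ins for a typed Kubernetes message, and the misspelled key `tpye` is deliberate.

```python
from dataclasses import dataclass, field
from string import Template

# Text templating: the misspelled key "tpye" is just more text, so nothing
# complains until the manifest reaches the cluster.
TEMPLATE = Template(
    "kind: Service\n"
    "metadata:\n"
    "  name: $name\n"
    "spec:\n"
    "  tpye: $type\n"
)
rendered = TEMPLATE.substitute(name="nginx", type="ClusterIP")


@dataclass
class ServiceSpec:
    """Hypothetical typed stand-in for a Kubernetes Service spec."""
    type: str = "ClusterIP"
    ports: list = field(default_factory=list)


def build_spec(**fields):
    """Construct a typed spec; unknown field names fail at construction time."""
    return ServiceSpec(**fields)


try:
    build_spec(tpye="NodePort")  # same typo, now rejected immediately
    typo_caught = False
except TypeError:
    typo_caught = True
```

The templated string renders without complaint, while the typed constructor rejects the unknown field before anything is sent to a cluster.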
How Isopod marks a new paradigm of configuration management in Kubernetes
Isopod approaches Kubernetes configuration differently by treating Kubernetes objects as first-class citizens. Without intermediate YAML artifacts, Isopod renders Kubernetes objects as Protocol Buffers (Protobufs), so they are strongly typed and consumed directly by the Kubernetes API.
With Isopod, Kubernetes objects and cluster targets are scripted in Starlark, a Python dialect by Google, which is also used by the Bazel and Buck build systems. To replace CLI dependencies, Isopod extends Starlark with runtime built-ins to access services and utilities such as Vault secrets management, the Kubernetes apiserver, an HTTP requester, a Base64 encoder, and a UUID generator. Isopod uses a separate runtime for unit tests that mocks all built-ins, providing test coverage that was not possible before.
The snippet in Figure 3 offers a peek into the expressive power of Isopod. It loads the Kubernetes API schemas using proto.package(). Reusing code is simple, as it loads from another file the helper function bindingSubjects(members), which constructs a list of typed Kubernetes objects in a loop. The Starlark built-in kube communicates with the Kubernetes apiserver, and its put attribute sends the Kubernetes objects over.
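As a rough Python analogue of such a helper (not the Starlark from Figure 3), the sketch below builds a list of typed objects in a loop. The `Subject` dataclass, its fields, and the snake_case name `binding_subjects` are assumptions made for illustration; the real helper constructs Protobuf messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Hypothetical typed stand-in for an RBAC binding subject."""
    kind: str
    name: str
    api_group: str = "rbac.authorization.k8s.io"


def binding_subjects(members):
    """Build one typed Subject per member name, preserving input order."""
    subjects = []
    for member in members:
        subjects.append(Subject(kind="User", name=member))
    return subjects


subjects = binding_subjects(["alice@example.com", "bob@example.com"])
```

Because the helper is an ordinary function, any script that needs the same subjects can import and call it instead of copying a YAML fragment.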
The user can verify the behavior of the configuration script with unit tests, such as the one in Figure 4. Built-in modules that allow external access, vault for example, are stubbed out in unit test mode, so tests are hermetic.
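The stubbing idea can be sketched in Python as dependency injection with a test double. This is not Isopod's test runtime: `FakeVault`, `render_secret`, and the path `secret/app` are hypothetical names chosen for the example.

```python
import base64


class FakeVault:
    """Test double: serves secrets from an in-memory dict, with no network access."""

    def __init__(self, secrets):
        self._secrets = secrets
        self.reads = []  # records every path requested, for assertions

    def read(self, path):
        self.reads.append(path)
        return self._secrets[path]


def render_secret(vault, name, path):
    """Build a Secret-like dict from values fetched through the injected vault."""
    data = vault.read(path)
    return {
        "kind": "Secret",
        "metadata": {"name": name},
        "data": {
            key: base64.b64encode(value.encode()).decode()
            for key, value in data.items()
        },
    }


def test_render_secret():
    vault = FakeVault({"secret/app": {"token": "s3cr3t"}})
    secret = render_secret(vault, "app-token", "secret/app")
    assert secret["data"]["token"] == base64.b64encode(b"s3cr3t").decode()
    assert vault.reads == ["secret/app"]
```

Since the only route to the outside world is the injected object, swapping it for a fake makes the test deterministic and runnable on any machine.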
Going beyond testing
The hermetic property of Isopod extends beyond testing. Application secrets are stored in Vault and queried at runtime using a built-in, so no secrets escape to disk. In fact, Isopod prohibits disk I/O except for loading Starlark modules from other scripts. No external libraries can be loaded unless explicitly implemented as an Isopod built-in.
In addition to Kubernetes object construction, Isopod can also manage cluster target selection, with a main Starlark script such as the one in Figure 5. First, Isopod calls the function clusters(ctx), whose argument is supplied by the user through the command line. For each chosen cluster, Isopod will install chosen addons returned by
In addition, Isopod offers many other features, such as rolling out to multiple clusters in parallel and reclaiming dangling Kubernetes objects. For each rollout, Isopod creates a tombstone ConfigMap to store the entire configuration applied. Isopod updates the ownerReferences field of every object constructed in this rollout to point to that tombstone ConfigMap. If an object ceases to be referenced (for example, because the new rollout does not include it), its ownerReferences field still points to the previous ConfigMap. By deleting the previous ConfigMap, Isopod triggers the Kubernetes garbage collector to automatically delete all objects that once had an owner but no longer do.
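The tombstone mechanism can be modeled with a small in-memory simulation. The Python sketch below is only a model of the idea: `FakeCluster` and the object names are hypothetical, and the `delete_tombstone` method stands in for what the Kubernetes garbage collector does on its own.

```python
class FakeCluster:
    """In-memory model of objects owned by per-rollout tombstones."""

    def __init__(self):
        self.owner_of = {}      # object name -> name of the owning tombstone
        self.tombstones = set()

    def rollout(self, rollout_id, object_names):
        """Create a tombstone and point every applied object at it."""
        tombstone = f"tombstone-{rollout_id}"
        self.tombstones.add(tombstone)
        for name in object_names:
            self.owner_of[name] = tombstone
        return tombstone

    def delete_tombstone(self, tombstone):
        """Delete a tombstone and collect the objects that still point to it."""
        self.tombstones.discard(tombstone)
        orphaned = sorted(
            name for name, owner in self.owner_of.items() if owner == tombstone
        )
        for name in orphaned:
            del self.owner_of[name]
        return orphaned


cluster = FakeCluster()
first = cluster.rollout(1, ["nginx-svc", "nginx-deploy", "legacy-cm"])
cluster.rollout(2, ["nginx-svc", "nginx-deploy"])  # legacy-cm was dropped
removed = cluster.delete_tombstone(first)
```

Only the object left out of the second rollout is collected, because every other object was re-pointed to the newer tombstone before the old one was deleted.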
In dry-run mode, Isopod reports the actions that a code change would trigger as a YAML diff against live objects. For example, if an NGINX Service object is changed to NodePort type instead of ClusterIP type, Isopod will display the following diff.
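As a rough illustration of what such a diff conveys, the Python sketch below compares a live manifest with a desired one using the standard library's `difflib`. Isopod's actual output format may differ; the manifest contents and the `yaml_diff` helper are assumptions made for the example.

```python
import difflib

# Hypothetical live manifest for the NGINX Service.
LIVE = """\
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: ClusterIP
  ports:
  - port: 80
"""

# The desired state differs only in the Service type.
DESIRED = LIVE.replace("type: ClusterIP", "type: NodePort")


def yaml_diff(live, desired):
    """Return unified-diff lines between two manifests given as text."""
    return list(
        difflib.unified_diff(
            live.splitlines(),
            desired.splitlines(),
            fromfile="live",
            tofile="desired",
            lineterm="",
        )
    )


diff_lines = yaml_diff(LIVE, DESIRED)
print("\n".join(diff_lines))
```

A reviewer sees exactly one removed line and one added line for the type field, and an empty diff when the live object already matches the desired state.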
Results from Isopod
Since the adoption of Isopod, the PaaS team at Cruise has seen the following results:
- We migrated 14 cluster add-ons from Bash scripts, and added another 16 without outage or regression, totaling around 10,000 lines of Starlark.
- The migration resulted in a reduction of up to 60% in code size due to code reuse, and 80% faster rollouts thanks to cluster parallelism and the removal of YAML intermediaries.
- Unit tests take less than 10 seconds to finish.
- Tests, live YAML diff, and proto message validation prevent virtually all regressions.
Use Isopod for your team
We would like to thank Stephen Day and Karl Isenberg for reviewing both the design and implementation of Isopod and offering valuable comments. We are grateful for the tremendous support from Vu Pham and Adrian Macneil for the development of Isopod and for this blog. We were lucky to run into John Millikin on Caltrain, who has been the primary contributor to the Skycfg project and introduced us to that project.