Buzz surrounding Kubernetes is on the rise. Like Docker, its popularity is growing, and the number of folks breathless to tell you all about it is only going to grow with it. Hi, I’m Alex, I’m super into Kubernetes, and I’d like to tell you about how I’m trying to make it easier to get started.
As with Docker, my deep and abiding love for Kubernetes comes from its ability to solve very difficult, chore-ridden, and painful infrastructure challenges common amongst modern application architectures. One capable Kubernetes initiate can do in short order what I’ve seen seasoned operations teams struggle to accomplish.
Unlike Docker, Kubernetes addresses a much broader scope and so it requires a bigger investment of initial time and attention to reach a baseline of capabilities and start forming internal standards and practices. It’s a technology that comes with its own vocabulary. Here’s their glossary in case you start to feel lost or wonder if I’m making things up.
I’ve gotten so much from learning and applying Kubernetes that I want to help reduce the effort to adopt and use it in the context of running small-to-medium scale application clusters. There are lots of different ways that I am trying to do this, but I thought I’d start by introducing the tool that addresses the most elementary aspect: Kubernetes manifests. If you’ve taken a peek at any example, you’ve likely noted that workloads are defined in YAML manifests.
Let’s take a look at the two manifests required to deploy a simplistic, but working, ElasticSearch instance. The Service manifest makes the Pod discoverable behind a networking abstraction, while the StatefulSet manifest defines a controller that provisions storage and keeps it attached to the ElasticSearch Pod’s containers wherever they land in the cluster.
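A trimmed sketch of what that pair looks like (the name, namespace, image tag, port, and storage size here are illustrative, not the exact values from our cluster):

```yaml
# Service: a stable network identity for the ElasticSearch Pod
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: data
  labels:
    app: elasticsearch
spec:
  clusterIP: None            # headless; StatefulSet Pods get stable DNS names
  selector:
    app: elasticsearch
  ports:
    - name: http
      port: 9200
---
# StatefulSet: a controller that provisions storage and keeps it
# attached to the Pod's containers across reschedules
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: data
  labels:
    app: elasticsearch
spec:
  serviceName: elasticsearch
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:5.6.4
          ports:
            - containerPort: 9200
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```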
Manifests provide a powerful way to define infrastructure, but the complex structure, nuance between similar kinds, changes between API versions, and repetition of metadata can make the format daunting to read and work with. Looking over a ~100 LoC listing across 2 files to deploy a single service can be a disheartening introduction to a new technology.
One of the things that struck me about the format early on was the repetition of certain metadata. For average use cases, where a great deal of specialization isn’t necessary, that repetition is where most of the opportunities for mistakes get introduced.
You: “I changed that value!”
Me: “Did you change it in all 4 places?”
You: “I thought there were only 3 …”
Us: stifling sobs as we watch the crash loop counter increment
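To make the pain concrete: even in a plain Deployment-plus-Service pair, the same label value has to line up in at least four places, and any one of them drifting quietly breaks routing (names here are illustrative, and the snippet is trimmed to just the relevant fields):

```yaml
kind: Deployment
metadata:
  labels:
    app: my-api          # 1. the Deployment's own labels
spec:
  selector:
    matchLabels:
      app: my-api        # 2. must match the Pod template below
  template:
    metadata:
      labels:
        app: my-api      # 3. the labels stamped onto each Pod
---
kind: Service
spec:
  selector:
    app: my-api          # 4. must match the Pod labels to route traffic
```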
There’s Got To Be A Better Way
At npm, we already make heavy use of TOML as the format of choice for encoding metadata for our operational tooling. When I first built a proof-of-concept cluster using Kubernetes YAML manifests, I easily got to 5,000 lines before covering all of the workloads, and that was while ignoring concerns like resource limits, probes, network policies, and RBAC.
Even if I had plenty of time on my hands, I’d still want a terse format for expressing clusters. Based on the year I’d spent with Kubernetes up to that point, I began work on an internal RFC for a cluster specification based on TOML that addressed the following concerns:
- eliminate repetition of metadata
- reduce effort and improve readability
- support templating and promote reuse
- provide a means for ordered provisioning
- protect specifications from structural changes due to versioning
- scale definitions by cluster size (one definition can work for multiple initial cluster sizes)
- auto-generate NGiNX location blocks for specifications with a subdomain specified
- merge and inject NGiNX location blocks into a supplied config to produce a final config in the output
The end result is mcgonagall, a Node.js library and CLI tool that can translate a cluster specification into a set of YAML manifests. It’s worth noting at this point that we have tools that work in conjunction with mcgonagall to perform the actual deployment, but I’ll save discussion of those for another post. For now, let’s look at the mcgonagall specification for the same ElasticSearch workload and service above:
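To give a feel for it, the spirit of the format is roughly the following. The field names below are illustrative of a TOML-based workload spec, not necessarily mcgonagall’s exact schema; the project README documents the real one:

```toml
# Illustrative sketch only — see the mcgonagall README for the actual schema
name = "elasticsearch.data"          # workload name and namespace in one place
image = "docker.elastic.co/elasticsearch/elasticsearch:5.6.4"
stateful = true                      # emit a StatefulSet rather than a Deployment

[ports]
  http = "9200"                      # also drives the generated Service

[mounts]
  data = "/usr/share/elasticsearch/data"

[storage]
  data = "10Gi"                      # provisions the volume claim template
```

Everything the two YAML manifests repeated (names, namespaces, labels, selectors) is stated once here and expanded by the tool.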
Aside from the 3x line reduction (which tends to be more dramatic the more features you make use of) and the ability to put everything into a single file, the big win is the clarity that comes from focusing on the data that describes the workload. Even if you don’t understand Kubernetes, you can likely follow this format and guess how the information is used. We’ve been much happier and more effective since switching. Our experience has been that those with only passing Kubernetes familiarity can easily define and deploy new services as part of an existing cluster, as well as participate in PR reviews of changes to the cluster definition.
Eventually I hope to share some generalized patterns for deploying specific bits of infrastructure via mcgonagall. Since it’s able to transfigure from GitHub URLs, and will prompt you for missing token values in the template, this is actually a nice way to define, share and re-use specifications. A generalized ELK stack for log aggregation that gathers log entries from every container in the cluster automatically or a TICK stack for automatically pulling in all telemetry for the cluster and its containers are things nearly any cluster can benefit from.
I hope others will get as much enjoyment and utility out of mcgonagall as we’ve gotten. It’s worth noting that mcgonagall also supports raw YAML files for cases where it doesn’t yet support new or evolving kinds and features, or for third party manifest kinds.
As with any young OSS project, mcgonagall is going to have its fair share of uncovered edge cases and bugs we haven’t found yet. In addition to that, there are some potentially unpopular opinions/limitations designed into the tool on purpose that might infuriate the occasional seasoned Kubernetes operator who doesn’t appreciate guide rails.
The decision to limit Pods to a single container has the highest potential to cause irritation. This has a lot to do with my early experiences managing and upgrading controllers, where I found that limiting their scope to a single container keeps things simple, predictable and low impact. The only exception we’re likely to make in the future is the ability to add initializer/side-car containers to help with stand-up tasks that are difficult or impossible to manage otherwise.
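For context, here’s what that kind of stand-up task looks like in raw Kubernetes terms. A common real-world case for ElasticSearch is raising `vm.max_map_count` on the node before the main container starts (the image and values here are the conventional ones, shown for illustration):

```yaml
spec:
  initContainers:
    - name: sysctl                    # runs to completion before the main container
      image: busybox
      command: ["sysctl", "-w", "vm.max_map_count=262144"]
      securityContext:
        privileged: true              # needed to change a node-level sysctl
  containers:
    - name: elasticsearch
      image: docker.elastic.co/elasticsearch/elasticsearch:5.6.4
```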
Have You Heard Of …
As always, when introducing some new project, the most frequent comment I get is, “why didn’t you just use Thing from Big Company?” I looked at and evaluated the available alternatives before going the route of building something new. Here are my notes on the two most commonly referenced potential alternatives.
Helm makes a lot of sense for existing clusters where you might need to deploy a set of related workloads as a package. When I started considering how I might define our system as a set of Helm packages, I made the following notes:
- the format is no easier to read or maintain
- there are very few existing packages in place that I could use; I’d be rolling almost all my own
- I’d still need an orchestration tool a level above Helm to tie all the packages together and handle some of the cluster-level concerns
I also looked at KSonnet. I don’t know the priorities or drivers behind it, but the impression I got from its documentation and tutorials was that its appeal is likely due to familiarity with JSonnet and other toolsets with a similar flow and design ethos. As someone who’s written a lot of Kubernetes YAML manifests by hand, I found KSonnet disorienting; it felt like it added a layer of complexity to Kubernetes by introducing more CLI steps, additional system prerequisites and another API to learn. Based on our stated goals, it was moving away from where we wanted to go without offsetting benefits. I mention it here because there’s a good chance folks will look at mcgonagall and hate the TOML, the design, or the fact that I used Node, but fall in love with KSonnet. More tooling increases the likelihood that there’s a solution for everyone. You can tell from the site and documentation that Heptio and the community have put a lot of care and effort into KSonnet.
There were a few other tools I looked at, but they were focused on deployment, didn’t address our concerns with the manifests, and wouldn’t have removed the need for mcgonagall. I’ll talk about those later when I write about deployment tooling.
I just spent about half a week adding network policies to our cluster and it was quite the experience. It didn’t take long before I started thinking, “the specs all have metadata about the pods they connect to, we should add a feature to infer policies.” There’d be limitations of course, but allowing mcgonagall to scan the cluster’s specs and generate recommended network policies for you would be a great starting point.
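As a sketch of what such inference might emit: since the specs already declare which workloads connect to which, the tool could in principle generate policies like the following (hand-written here; mcgonagall does not do this today, and the names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: elasticsearch-ingress
  namespace: data
spec:
  podSelector:
    matchLabels:
      app: elasticsearch
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: kibana      # only workloads whose specs reference
      ports:                   # elasticsearch would be allowed in
        - port: 9200
```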
We also anticipate that we’ll eventually want to add support for initializer containers, since there are real limitations to the alternative approaches.
I hope mcgonagall proves useful to teams getting started with Kubernetes. I invite anyone using it to share ideas for new features or improvements with us via GitHub issues on the project.