Platform Engineering: Learning From The K8s API

Shift Down, Not Left. Simplify Hard Things For Your Developers. Learn From The Popularity Of K8s And Its API

Sven Hans Knecht
Oct 2, 2023 · 5 min read

Richard Seroter recently published an article called The Modernization Imperative: Shifting left is for suckers. Shift down instead. It’s a great article and discusses many of the problems developers have experienced over the last ten years as DevOps and cloud have taken the tech world by storm. Developers are being asked to know more and more, raising the cognitive complexity of interacting with the developer platforms they are deploying to. There’s a reason developers love platforms like Heroku! They do a great job at making deployment easy and shifting down.

With the rise of the Kubernetes (K8s) platform, teams have a wonderful opportunity and tool to eliminate a lot of cognitive complexity for developers by putting everything a service needs behind the K8s API. As discussed in Building a Successful SRE Team, focusing on self-service is critical to scaling a platform team. If you don’t, you end up staffing linearly for every development team you add, instead of letting teams onboard themselves while you support a significantly higher number of engineers. Focusing on self-service brings other benefits; it is also critical to making developers happy and unlocking the platform's potential. Why is the K8s API model the key to self-service? It provides the following key features:

  • It is idempotent. Submitting the same object twice results in one object, not two.
  • It is declarative. Engineers don’t need to write long imperative steps to achieve a particular result. They describe what they want and allow all the other concerns to be taken care of by the orchestrator.
  • It incentivizes fault tolerance. This doesn’t mean every app deployed on K8s is fault-tolerant; anyone who has lifted and shifted applications knows that isn’t true. Rather, the K8s API and model simply tell you to fail and retry until an operation succeeds. If a pod fails to start, the kubelet doesn’t stop trying; it continues until it succeeds, even if that condition will never be true without intervention.
  • It manages reconciliation. Look, I love Terraform. I’ve written a lot of Terraform. I’ve written a lot of articles on Terraform. The big downfall of Terraform is drift. With Terraform, managing drift, especially if you cannot lock down changes made manually in a cloud environment, is nigh impossible. The same is not true in the K8s world. If someone manually deletes a pod managed by a Deployment, K8s will put it back.
  • It incentivizes GitOps. Once you’ve managed more than 2–3 applications in K8s, you’ll see the value of GitOps. Especially if more than one person helps manage them. Someone will forget to commit a values.yaml for a helm chart or a modification of a deployment or service. And then someone else will come along and edit it and break your resource because the changes were never committed or managed. Instead of manually making changes, using a tool like ArgoCD or Flux enables a single source of truth to describe your infrastructure.
  • It allows for the building of operators. If all you’ve ever worked with as a container orchestration engine is K8s, you may not realize the power of the operator pattern. Allowing for registering custom resources, watching changes to any resources, and acting on those resources through a fan-out queue built into the core product enables more customization than anyone could have imagined. Managing Elasticsearch on K8s versus ECS or EC2 represents such wildly different levels of commitment and support that it’s insane. On K8s, you can use the Elastic Operator, which handles ~90% of all the pain of managing ES. Anywhere else, you must write all the automation yourself, host it somewhere, subscribe to events, and so on.
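The declarative, idempotent model behind the first two bullets boils down to manifests like this (a minimal sketch; the service name and image are placeholders):

```yaml
# Declarative: you describe the desired end state, not the steps to reach it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service                # placeholder name
spec:
  replicas: 3                     # K8s reconciles toward 3 running pods
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: app
          image: registry.example.com/my-service:1.2.3  # placeholder image
```

Running `kubectl apply -f deployment.yaml` twice produces one Deployment, not two, and deleting one of its pods by hand just prompts the controller to recreate it.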

Extending all of these benefits, tools like Config Connector (GCP-focused) and Crossplane (cloud-agnostic) let us provision all the other pieces of service infrastructure that exist outside the K8s cluster. These tools allow the platform team to give developers a single API for interacting with all the infrastructure their services need. Does the service need a database? Use CNRM to stand up a Cloud SQL instance. Does the team need a PagerDuty service connected to their K8s service? Use Crossplane’s Terraform provider. Letting teams interact with a single API to provision all of their resources is extremely powerful and sets your developers up for success.
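As a sketch of what that single-API provisioning looks like, here is roughly how a Cloud SQL instance is declared through Config Connector (field names follow the CNRM SQLInstance CRD; the instance name, region, and tier are illustrative):

```yaml
# A CNRM resource: the in-cluster controller reconciles this object
# into a real Cloud SQL instance in the bound GCP project.
apiVersion: sql.cnrm.cloud.google.com/v1beta1
kind: SQLInstance
metadata:
  name: my-service-db            # placeholder name
spec:
  databaseVersion: POSTGRES_14
  region: us-central1
  settings:
    tier: db-custom-1-3840       # machine size for the instance
```

Developers apply this through the same K8s API they already use for Deployments, and the reconciliation loop handles drift the same way it does for pods.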

The benefits don’t end there, however. From the platform team's perspective, requiring service resources to be created through the K8s API allows you to build consistent tooling for managing the creation and approval process.

Do you want to apply policies to prevent certain resources from being created, require certain metadata, restrict where things can be created, or require a certain naming schema? Use an admission controller like Kyverno or OPA Gatekeeper. With CEL-based validating admission policies built into recent K8s versions, you may not even need one. The benefit is that you don’t have to write one pipeline for K8s resources and another for Terraform/CloudFormation/CDK. As a platform team, you write one consistent set of tooling for one API, which allows you to build effective RBAC and tests for any policies, limit the scope and blast radius of changes, and take advantage of all the benefits of the K8s API.
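A metadata-requirement policy of the kind described above might look like this in Kyverno (a minimal sketch; the policy name, label key, and message are assumptions, not from the article):

```yaml
# Rejects any Deployment that lacks a 'team' label at admission time.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label       # placeholder policy name
spec:
  validationFailureAction: Enforce   # block, rather than just audit
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds: [Deployment]
      validate:
        message: "Every Deployment must carry a 'team' label."
        pattern:
          metadata:
            labels:
              team: "?*"          # any non-empty value
```

Because the policy sits at the API server's admission step, it applies uniformly whether the Deployment came from kubectl, Helm, or a GitOps controller.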

Do you, as a platform team, want to write abstractions to ensure consistency in the resources being created by service teams? Do you want to provide sane and opinionated defaults? Do you want to manage a single set of dependency upgrades? Then pick your favorite K8s package manager (helm, jsonnet, kustomize) and run wild! Write a set of composable charts that allow developers to turn on and off the infrastructure their service needs easily. They get it securely configured and set up from the start. And to get new features, they simply upgrade the chart version — there’s even automated tooling for that!
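In Helm terms, composability usually means an umbrella chart whose subcharts are gated by conditions. A hypothetical Chart.yaml along those lines (chart names, versions, and the repository URL are all placeholders):

```yaml
# Chart.yaml of a hypothetical umbrella "service" chart.
apiVersion: v2
name: company-service            # placeholder chart name
version: 1.4.0
dependencies:
  - name: database
    version: 0.3.x
    repository: https://charts.example.com   # placeholder repo
    condition: database.enabled              # only rendered when enabled
  - name: redis
    version: 0.2.x
    repository: https://charts.example.com
    condition: redis.enabled
```

Each team turns features on or off in its values file, and dependency upgrades happen once, in the umbrella chart, for everyone.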

This isn’t simply me pushing the glory of K8s. I’ve seen this be incredibly successful. At Mission Lane, we had 250+ microservices using a Mission Lane Service Helm Chart, and we supported 200+ developers with a very small infrastructure team. The chart let you stand up a simple deployment, service, and virtual service. But if you needed a database, it used CNRM to create a Cloud SQL instance in your project, stand up a Cloud SQL proxy, configure IAM, and create a GCP and a K8s service account, all with three lines of yaml. There were plenty of opportunities for customization; developers could override almost any setting; however, most didn’t need to. They got a securely configured database out of the box. The same was true for GCS buckets, Redis instances, canary releases using Flagger, Istio configuration, OpenTelemetry sidecars, etc. All came from the helm chart and allowed teams to quickly go from POC to fully productionized service.
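The actual chart interface isn’t public, but the developer-facing experience described here would look something like this hypothetical values.yaml fragment (key names and the tier value are invented for illustration):

```yaml
# Hypothetical values.yaml: enabling one block causes the chart to render
# the Cloud SQL instance, proxy sidecar, IAM bindings, and both service
# accounts on the developer's behalf.
database:
  enabled: true
  tier: db-custom-1-3840
```

That is the whole point of shifting down: the three lines a developer writes hide dozens of securely configured resources the platform team maintains once.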

Spending your time abstracting away the parts of the stack developers don’t need to touch, being helpfully opinionated about the infrastructure their services need, and giving developers a single API and mental model to work with will elevate your platform team from good to excellent, from a useful, contributing team to a force multiplier for the organization.

You don’t even have to use the K8s API. You could use Nomad or a homegrown API. But you should at least learn from what the K8s API has done so incredibly well because its usage isn’t just cargo-culting. It delivers incredible automation.


Written by Sven Hans Knecht

SRE/Platform Engineer Professional @Anomalo. Amateur Analytics and Sports Enthusiast
