Fixing Kubernetes App Deployment Solves ‘Day 2’ Problems

Chase Roberts
Vertex Ventures US
May 6, 2024
Source: Midjourney

In a thoughtful post about “Kubernetes Day 2 Operations,” Justin Yue of Geodesic Capital highlights the tooling needed to streamline the notably complex work after deploying application code in containerized environments. Building on his thoughts, I’d argue that Day 2 complexity very often results from broken Day 1 operations — specifically deploying application code to Kubernetes (AKA “K8s”) infrastructure.

Talk to any DevOps person, and they’ll describe app deployment (“Day 1”) as configuration hell. These folks operate in a sea of YAML files for Deployments, Services, and Ingress resources. Most apps are architected as microservices, each with its own dependencies and configuration. Wiring up the interactions between those services is a nightmare, so DevOps teams introduce message queues, API gateways, and service meshes. So, more complexity. Automation? Not without custom scripting.
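To make the sprawl concrete, here’s a minimal sketch of what a single hypothetical microservice (call it “orders”) needs just to run and be reachable inside a cluster. The names, image, and values are illustrative, not taken from any real setup, and real environments layer on ConfigMaps, Secrets, Ingress rules, and mesh annotations per service:

```yaml
# Deployment: runs the "orders" containers (illustrative names and values)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      labels:
        app: orders
    spec:
      containers:
        - name: orders
          image: registry.example.com/orders:1.4.2  # hypothetical image
          ports:
            - containerPort: 8080
---
# Service: gives those pods a stable in-cluster address
apiVersion: v1
kind: Service
metadata:
  name: orders
spec:
  selector:
    app: orders
  ports:
    - port: 80
      targetPort: 8080
```

Multiply that by every microservice, then add the Ingress resources, queues, gateways, and mesh config that stitch them together, and “configuration hell” starts to look literal.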

And, as Justin points out, engineering teams purchase additional software to address over-provisioning. Datadog released a report earlier this year stating that 65% of containers use less than half of the requested CPU. That’s crazy. Smart infra folks at public companies have shared utilization figures below 20% with me. But why? Because rightsizing infrastructure is complex!
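A hedged sketch of where that waste lives: in each container’s resource stanza, the request is what the scheduler reserves on a node whether or not the app ever uses it. The numbers below are illustrative, not a recommendation:

```yaml
# Excerpt from a Deployment's pod spec (illustrative values)
spec:
  containers:
    - name: orders
      image: registry.example.com/orders:1.4.2  # hypothetical image
      resources:
        requests:
          cpu: "1000m"    # reserve a full core per replica
          memory: "2Gi"
        limits:
          cpu: "2000m"
          memory: "4Gi"
# If this container averages ~200m of CPU in production, utilization against
# its request is ~20%: the cluster holds capacity the app never touches.
```

Rightsizing means continuously reconciling those requests with observed usage across hundreds of workloads, which is exactly the tedious work teams end up buying Day 2 tools to do.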

But Day 1 operations aren’t only broken for DevOps teams; they’re broken for developers, too. How often have you heard something like: “My application worked in the development environment, but then broke in staging and production.” (Realistically, it’s more like, “WTF? My app worked in dev and broke in prod again!!!”) Keeping dev, staging, and prod consistent is hard, and even slight variations between environments cause apps to break.
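Here’s a hedged example of the kind of drift that causes it. The ConfigMaps and keys are made up, but the pattern, a value that exists in dev and quietly never made it to prod, is the classic “worked in dev, broke in prod” culprit:

```yaml
# Illustrative config for the same app in two environments
apiVersion: v1
kind: ConfigMap
metadata:
  name: orders-config
  namespace: dev
data:
  PAYMENTS_URL: "http://payments.dev.svc.cluster.local"
  FEATURE_FLAGS: "new-checkout=on"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: orders-config
  namespace: prod
data:
  PAYMENTS_URL: "http://payments.prod.svc.cluster.local"
  # FEATURE_FLAGS never got copied here, so the code path that
  # worked in dev fails only after promotion
```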

In larger companies, developers wait in resource breadlines. It can take days or weeks for DevOps teams to provision an infrastructure resource. Self-service infrastructure is more of a dream than a reality. Troubleshooting using observability (aka “o11y”) tooling is complex when dealing with third-party infra providers, heterogeneous environments, and human-configured infrastructure — we make mistakes, after all. Cloud-specific APIs and extensions differ from provider to provider, and engineering orgs struggle to shield developers from that complexity.

Justin’s post mentions Day 2 security requirements. In the 2024 Kubernetes Benchmark Report, the Cloud Native Computing Foundation wrote:

Kubernetes is famously not secure by default, requiring a review of configurations to identify potential security issues. The latest report shows that 28% of organizations have more than 90% of workloads running with insecure capabilities, down from 33% in 2023.

This next statement is undoubtedly controversial, but many of the problems that cloud security posture management (CSPM) tools resolve are self-imposed. Infra configuration is error-prone (the quote above indicates as much), and bad actors can exploit those errors! Cloud security tools are still important, since they address more than just infrastructure misconfigurations. However, those misconfigurations should be preventable with better guardrails on Day 1.
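As one example of the guardrails I mean, here’s a hedged sketch of container-level settings that shut down the “insecure capabilities” problem the report describes. None of these are Kubernetes defaults, which is precisely the point, and the workload name and image are again illustrative:

```yaml
# Excerpt from a pod spec: security settings a Day 1 golden path could
# stamp onto every workload instead of leaving them to per-team YAML
spec:
  containers:
    - name: orders
      image: registry.example.com/orders:1.4.2  # hypothetical image
      securityContext:
        runAsNonRoot: true
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```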

I’ll restate the punchline: Day 1 app deployment problems produce the need for Day 2 operational tooling. We’re spilling the milk and then looking for the best solutions to clean it up. What if we stopped spilling the milk altogether? Imagine a world where app deployment isn’t unnecessarily complex for DevOps teams and doesn’t inflict a terrible developer experience. The Day 1 problems are solvable. Engineering teams deserve a solution that automates golden paths to production, minimizing or eliminating infrastructure configuration and hiding infrastructure behind a consistent, self-service developer experience. That’s the world I want to live in.
