The Hands-On Benefits of Kubernetes

Marshall Todt
Hitachi Solutions Braintrust
Nov 11, 2020

When I started as a software developer, the biggest collective pain points of my team were manning the On-Call phone and handling software deployments. They were such large pain points because they ate into the development team’s already thin “off work” hours, and it was commonplace to beg, barter, and steal favors from one another to ensure coverage for any important events in your life (and heaven help you if you drew the short straw and had the phone over a major holiday, like Christmas). Deployments were just as bad: they required a four-hour window, typically late at night or over a weekend, of sitting on the phone waiting for something to go wrong. The net of both of these things was a lot of highly paid developer time spent accomplishing very little.

If you think about why these two things are needed — developers sitting on a deployment and a developer being available whenever there is an issue — the business need comes from the same place. There’s an expectation that the code will fail, either during the deployment or at a critical time that will cost the company money. Additionally, there’s the expectation that the failure will be severe enough and difficult enough to diagnose that a developer needs to be the first point of damage control. While the first expectation is spot on — code will always fail eventually — the second is more suspect, and easily mitigated.

DevOps as a philosophy was designed to counter these very things. Developers have long realized that the biggest detractor from the quality of their software wasn’t the codebase itself; it was the method by which it was deployed and maintained. I’ve spent countless hours of my career writing out deployment instructions for fairly trivial things. When a developer’s code is deployed by an infrastructure guy who has never spoken to the developer, things can go wrong regularly (and usually do). Companies should not expect the infrastructure guy to be familiar with the developer’s code, nor should they expect the developer to be intimate with the ins and outs of infrastructure and how deployments are run, at least so long as the two are heavily siloed. However, both of these people get punished for this lack of communication, because they both end up on the four-hour deployment call and both get called when anything goes wrong.

So What Does This Have To Do With Kubernetes?

My personal journey with Kubernetes started after working for two companies back to back that had the same kind of mindset that I outlined above: Developers sitting on long deployment calls, infrastructure teams siloed away from developers, and the On-Call phone ringing constantly. I had heard buzzwords like DevOps, but at the time it just sounded like a way to have the infrastructure guys hoist work onto already overworked developers. Those aspects of my job made it miserable at times, and I know I’m far from the only developer who had to work over Christmas because a configuration wasn’t put in right, or who had to miss an important event because something stopped working.

Then I started working on a team that was trying out Docker and Kubernetes as a proof of concept for migrating services to the cloud for a manufacturing company, and that completely changed my outlook on how deployments could be done. By the time I left that team, the norm was for deployments to be essentially unmanned, with our product owner checking things out a couple of hours later. Over almost three years, we got exactly two middle-of-the-night emergency calls around deployments: one due to a Kubernetes update getting stuck, the other due to a migration not applying correctly. Given that we’d done dozens of deployments over that timeframe (our cadence was two weeks), and each of those issues took less than an hour to fix, the net time saved by our development team was easily just shy of 500 hours.

Not all of that benefit can be awarded to Kubernetes itself, as we also used Azure DevOps and its Build and Release systems for CI/CD, but Azure DevOps and Azure Kubernetes Service (Azure’s managed version of Kubernetes) work together so well that the combination is a great addition to any Agile team. So what benefits are directly attributable to Kubernetes itself?

Container Orchestration

Kubernetes works on the idea of a desired state. You tell Kubernetes (through a .yaml file) what you want, and then Kubernetes tries to accomplish it. For example, let’s say I have a time and temperature service that gives out the current time and temperature outside in my home city when I call it. This might be a simple service that only I use, and I don’t care if it’s up or not, so I would give Kubernetes a .yaml file like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: time-and-temp-deployment
spec:
  selector:
    matchLabels:
      app: time-and-temp
  replicas: 1
  template:
    metadata:
      labels:
        app: time-and-temp
    spec:
      containers:
      - name: time-and-temp
        image: time-and-temp:1.0.0
        ports:
        - containerPort: 80

This .yaml file tells Kubernetes that my desired state is a deployment (the Kubernetes object that manages pods, which are small groupings of containers) called time-and-temp-deployment, running one copy (replica) of a pod built from image time-and-temp, version 1.0.0. Kubernetes takes this .yaml file and, if there is no deployment called time-and-temp-deployment, creates it. That deployment then checks whether at least one pod is running time-and-temp version 1.0.0. If that pod does not exist, it creates it.

Expanding on that example, let’s say that instead of an application for personal use, my company now needs this service, and it has to have high availability and uptime. I can increase the number of replicas the deployment creates and maintains in the .yaml to three or more, and Kubernetes will spin up additional pods. I could also use pod topology spread constraints to ensure that the pods in my time and temperature service run on different nodes within the Kubernetes cluster, so that if there was a critical failure on one node, my service would still be available.
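As a minimal sketch (assuming a cluster version where topology spread constraints are available, roughly 1.19 and later), the changed portion of the deployment might look like this; everything not shown stays the same as above:

spec:
  replicas: 3                               # maintain three copies of the pod
  template:
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                          # tolerate at most one pod of imbalance
        topologyKey: kubernetes.io/hostname # treat each node as its own domain
        whenUnsatisfiable: DoNotSchedule    # hard rule rather than best effort
        labelSelector:
          matchLabels:
            app: time-and-temp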

What’s even cooler is that I can dynamically scale this deployment based on demand, so if this service has large spikes in use, Kubernetes can spin up additional pods to meet that demand, and spin them down when they are no longer needed. This barely scratches the surface of what you can do.
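This kind of demand-based scaling is handled by a separate object, the HorizontalPodAutoscaler. Here is a minimal sketch, assuming the cluster runs a metrics server so CPU usage is visible (the thresholds are illustrative):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: time-and-temp-hpa
spec:
  scaleTargetRef:                      # the deployment to scale
    apiVersion: apps/v1
    kind: Deployment
    name: time-and-temp-deployment
  minReplicas: 3                       # never drop below the high-availability baseline
  maxReplicas: 10                      # cap how far a spike can scale us out
  targetCPUUtilizationPercentage: 70   # add pods when average CPU passes 70%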

The real benefit and power behind this concept, though, is that I, as a developer, can package the deployment information with the container I built my code in, without having to write out pages of instructions for the infrastructure team. Instead, through tools like Helm, I can define exactly how I want my code to deploy, and push it. Sure, there’s some up-front cost in getting the CI/CD pipeline right, but that cost is minuscule compared with the hours of writing deployment documents (not to mention time spent on deployment calls).
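To give a flavor of that, here is a minimal, hypothetical values.yaml for a Helm chart wrapping this service; the key names are illustrative, defined by whatever templates the team writes, and a single helm upgrade --install then deploys the whole package:

# Hypothetical values.yaml for a time-and-temp chart
image:
  repository: time-and-temp
  tag: "1.0.0"
replicaCount: 3
service:
  port: 80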

Deployments As a Package

While this is not unique to Kubernetes, as Docker and other container technologies are the driving force behind it, using Kubernetes allows you to move one image from environment to environment and be reasonably confident that it has every configuration value it needs and will work when deployed to a production environment. Minor differences between environments can be a major obstacle to a successful deployment. Packaging everything into one image, and deploying that identical image up the chain through your environments, reduces the number of things that can go wrong. What Kubernetes adds is the ability to deploy ConfigMaps, so you can use one tested image in every environment and have it backed by different connection strings based on the environment it is in.
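As a sketch, each environment might carry a ConfigMap under the same name with its own values (the key and connection string below are placeholders), and the container pulls them in as environment variables:

apiVersion: v1
kind: ConfigMap
metadata:
  name: time-and-temp-config                              # same name in every environment
data:
  ConnectionString: "Server=uat-sql;Database=Weather;..." # value differs per environment

---
# In the deployment's container spec, reference the map by name:
        envFrom:
        - configMapRef:
            name: time-and-temp-config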

The diagnostic benefit of this is huge. You know you have an identical image running in your UAT and production environments, which means you can easily do A-to-B testing between those environments to figure out what is different. You can look at the ConfigMaps to see if anything is wrong or malformed. If you are using a CI/CD tool like Azure DevOps Build and Release, you can even compare these values directly in the tool.

Additionally, the ability to seamlessly roll back is great. Anything that is within the container itself can be swapped out for the older image by updating the .yaml to target that image instead. This is miles ahead of where things used to be, when it was standard practice to copy the entire code folder to a backup before pushing new code.
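For example, rolling back can be as small as re-declaring the previous desired state by pointing the deployment at the prior image tag (the tag below is illustrative), or by running kubectl rollout undo against the deployment:

spec:
  template:
    spec:
      containers:
      - name: time-and-temp
        image: time-and-temp:0.9.0   # previous, known-good tag (illustrative)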

Deployments, Heal Thyself

One final thing I am going to touch on is the self-healing nature of Kubernetes. As I mentioned above, Kubernetes works based on the concept of a desired state. It then compares its current state to the desired state, and makes corrections based on that comparison. For example, if the desired state is two pods, and there’s only one, it will start another pod. This is extremely powerful.

Imagine you have three servers running your app, and one of them goes down. Traditionally, this would mean your On-Call phone ringing and a call where the infrastructure team restarts IIS on the machine, or reboots the server outright. Getting all of that to happen can sometimes take over an hour, especially if restarting a server requires approvals and other red tape. Kubernetes, on the other hand, can be configured so that if a service isn’t responding, it restarts the pod automatically. If that pod is unavailable long enough, Kubernetes can also spin up a new pod and kill the old one.

A real-life example of why this automatic behavior might be useful happened to me recently. We had a service running that had a memory leak. We did not control the code, and really couldn’t do much to fix the issue, but we also couldn’t just let the memory use balloon out of control. We used two of Kubernetes’ built-in health checks: a readiness probe (should this pod receive traffic or not) and a liveness probe (is this pod strictly healthy or not). We used the readiness probe to cordon off a pod when it started getting close to its memory threshold, so that we wouldn’t accidentally interfere with API calls to the service; then, if the memory usage remained high, the liveness probe would fail, causing a restart of that pod. In this way, we were doing the equivalent of a server restart any time memory usage became an issue, but in a completely automated and seamless way.
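Here is a sketch of how those two probes might be declared on the container; the endpoint paths and thresholds are assumptions, since the real values depend on what health endpoints the service exposes:

        readinessProbe:        # failing this takes the pod out of the traffic rotation
          httpGet:
            path: /ready       # hypothetical endpoint; returns non-200 near the memory threshold
            port: 80
          periodSeconds: 10
          failureThreshold: 3
        livenessProbe:         # failing this gets the pod restarted
          httpGet:
            path: /healthz     # hypothetical endpoint; fails if memory stays high
            port: 80
          periodSeconds: 30
          failureThreshold: 4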

Final Thoughts

Kubernetes is a very powerful DevOps tool that can remove many of the pain points of deploying and maintaining code. Additionally, its self-healing nature means the amount of human intervention needed (and potential human error introduced) is greatly reduced. All of this ensures that developers spend their time writing or improving code, instead of babysitting deployments or sitting on lengthy after-hours calls.
