Kubernetes is a big project — it has a wide range of users and a large number of people working on it at any given time. For simple bug-fixes and small changes, issues/pull requests are great. However, when trying to convey a much larger change, where a lot of different people may need to be involved, you need something more. This is where Kubernetes Enhancement Proposals (KEPs) come in. In this blog post I’ll talk about my experience, as a relative community outsider, trying to contribute one.

Background

Sidecars are a relatively common design pattern when running applications in pods

We hit problems with our Vault sidecar container when we tried to use it as part of a Job: when the primary container process completed, the sidecar didn't recognise that it should exit, so the Job never completed. It turned out this was a fairly generic problem. Imagine a Job with two containers: one is doing a task (e.g. a database migration) and the other is doing something to assist (e.g. a MySQL proxy). When the container doing the task has completed, the other container doesn't know to exit, so the Job will never finish. …
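To make the failure mode concrete, here is a simplified model in Go. The `ContainerState` type and `podSucceeded` function are hypothetical and are not the Kubernetes API or controller code. They only encode the rule that, with `restartPolicy: Never`, a pod reaches the Succeeded phase once every one of its containers has terminated successfully, and the Job is waiting on that pod.

```go
package main

import "fmt"

// ContainerState is a hypothetical, simplified view of a container's status;
// it is not the Kubernetes API type.
type ContainerState struct {
	Name       string
	Terminated bool
	ExitCode   int
}

// podSucceeded models the rule that a pod (restartPolicy: Never) only reaches
// the Succeeded phase once every container has terminated with exit code 0.
func podSucceeded(containers []ContainerState) bool {
	for _, c := range containers {
		if !c.Terminated || c.ExitCode != 0 {
			return false
		}
	}
	return true
}

func main() {
	pod := []ContainerState{
		{Name: "migration", Terminated: true, ExitCode: 0},
		{Name: "mysql-proxy", Terminated: false}, // the sidecar keeps running
	}
	fmt.Println(podSucceeded(pod)) // false: the sidecar never exits, so the Job never completes

	pod[1].Terminated = true       // only once the sidecar also exits...
	fmt.Println(podSucceeded(pod)) // true: ...can the pod, and so the Job, finish
}
```

The task container finishing is not enough on its own; the sidecar has to be told (somehow) to exit as well.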


No, it doesn’t delete half your metrics.

Standalone Prometheus is pretty great: it provides a powerful query language along with a simple, unified way of collecting and exposing metrics. Making Prometheus highly available and scalable, however, can often be a challenge.

The key features we needed were:

  • Highly available Prometheus
  • Single place to query all of your metrics
  • Easily back up and archive data

This is where Improbable’s Thanos comes in.

Making Prometheus HA

At its most basic, Thanos allows you to query multiple Prometheus instances at once and can deduplicate the same metric collected by multiple instances. …
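A naive sketch of the deduplication idea in Go, assuming each Prometheus replica attaches a distinguishing label (called `replica` here, a hypothetical name) to otherwise identical series. Series that differ only by that label are merged into one, and for each timestamp the first sample seen wins. As I understand it, Thanos' querier is smarter than this (it sticks with one replica and only switches when it sees gaps), so treat this as an illustration of the concept rather than its implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a single (timestamp, value) point.
type Sample struct {
	T int64
	V float64
}

// Series is a simplified time series: a label set plus its samples.
type Series struct {
	Labels  map[string]string
	Samples []Sample
}

// seriesKey builds a stable identity for a series, ignoring replicaLabel.
func seriesKey(labels map[string]string, replicaLabel string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		if name != replicaLabel {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		b.WriteString(name)
		b.WriteByte(0xff) // separator byte unlikely to appear in labels
		b.WriteString(labels[name])
		b.WriteByte(0xff)
	}
	return b.String()
}

// dedup merges series that differ only by replicaLabel. For each timestamp
// the first sample seen wins; output samples are sorted by timestamp.
func dedup(input []Series, replicaLabel string) []Series {
	merged := map[string]map[int64]float64{}
	labelsByKey := map[string]map[string]string{}
	var order []string
	for _, s := range input {
		key := seriesKey(s.Labels, replicaLabel)
		if _, ok := merged[key]; !ok {
			merged[key] = map[int64]float64{}
			ls := map[string]string{} // labels minus the replica label
			for k, v := range s.Labels {
				if k != replicaLabel {
					ls[k] = v
				}
			}
			labelsByKey[key] = ls
			order = append(order, key)
		}
		for _, smp := range s.Samples {
			if _, seen := merged[key][smp.T]; !seen {
				merged[key][smp.T] = smp.V
			}
		}
	}
	out := make([]Series, 0, len(order))
	for _, key := range order {
		var samples []Sample
		for t, v := range merged[key] {
			samples = append(samples, Sample{T: t, V: v})
		}
		sort.Slice(samples, func(i, j int) bool { return samples[i].T < samples[j].T })
		out = append(out, Series{Labels: labelsByKey[key], Samples: samples})
	}
	return out
}

func main() {
	a := Series{Labels: map[string]string{"__name__": "up", "job": "api", "replica": "a"},
		Samples: []Sample{{T: 1, V: 1}, {T: 2, V: 1}}}
	b := Series{Labels: map[string]string{"__name__": "up", "job": "api", "replica": "b"},
		Samples: []Sample{{T: 2, V: 1}, {T: 3, V: 1}}}
	out := dedup([]Series{a, b}, "replica")
	fmt.Println(len(out), len(out[0].Samples)) // 1 3
}
```

Two replicas scraping the same target collapse into one series, and a gap in one replica is filled by the other.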


How we at uSwitch managed to get all our applications to use short-lived database credentials without changing (almost) any of their code.

Static database credentials tend to slowly accumulate and get spread around in most organisations and, over time, they become a security liability and need rotating. uSwitch is no different.

This often causes untold pain when you have 50 different services all using the same password, each getting it from a different place, because you then have to change each one individually. Not only is this a time sink, you also have no way of knowing which service is interacting with your database at a given time. …


Yggdrasil, from Norse mythology, is the world tree that links the nine realms.

Yggdrasil is a tool we wrote to load balance our services across multiple Kubernetes clusters running in AWS. It acts as an Envoy control plane, generating configuration from Kubernetes Ingress resources. Yggdrasil is agnostic to the Ingress controller, so it works with existing Ingress resources.
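The core idea of generating load-balancer configuration from Ingress resources in several clusters can be sketched as a simple aggregation. This is a hypothetical illustration in Go, not Yggdrasil's code: `Ingress` is a pared-down stand-in for the Kubernetes type, `buildUpstreams` is an invented name, and the output is a plain map rather than real Envoy configuration. Each hostname ends up with one upstream address per cluster that serves it.

```go
package main

import (
	"fmt"
	"sort"
)

// Ingress is a hypothetical, pared-down view of a Kubernetes Ingress:
// the hostnames it routes and the address of the load balancer in front
// of that cluster's Ingress controller.
type Ingress struct {
	Hosts        []string
	LoadBalancer string
}

// buildUpstreams maps each hostname to the set of load balancer addresses,
// across all clusters, that can serve it.
func buildUpstreams(clusters map[string][]Ingress) map[string][]string {
	seen := map[string]map[string]bool{}
	for _, ingresses := range clusters {
		for _, ing := range ingresses {
			for _, host := range ing.Hosts {
				if seen[host] == nil {
					seen[host] = map[string]bool{}
				}
				seen[host][ing.LoadBalancer] = true
			}
		}
	}
	out := map[string][]string{}
	for host, addrs := range seen {
		for addr := range addrs {
			out[host] = append(out[host], addr)
		}
		sort.Strings(out[host]) // deterministic order so generated config is stable
	}
	return out
}

func main() {
	clusters := map[string][]Ingress{
		"cluster-a": {{Hosts: []string{"www.example.com"}, LoadBalancer: "lb-a.internal:443"}},
		"cluster-b": {{Hosts: []string{"www.example.com", "api.example.com"}, LoadBalancer: "lb-b.internal:443"}},
	}
	up := buildUpstreams(clusters)
	fmt.Println(up["www.example.com"]) // [lb-a.internal:443 lb-b.internal:443]
	fmt.Println(up["api.example.com"]) // [lb-b.internal:443]
}
```

Because a host served by several clusters gets several upstreams, losing one cluster still leaves Envoy somewhere to send that host's traffic.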

At uSwitch we’re running almost everything on Kubernetes (you can read more about that here). It’s brought us a lot of benefits, but people were concerned that a single cluster would become a single point of failure.

In an ideal world nothing would ever break, but we all know that occasionally an upgrade can go awry or some unanticipated scenario can take out your cluster. For some of our most important applications this was a deal breaker because going down even for a few minutes could lose us a lot of money. …

About

Joseph Irving

DevOps/Platform Engineer at uSwitch.com, mainly focused on Kubernetes and Go
