How we run Kubernetes Engine at Travix

Jorrit Salverda
Published in Travix Engineering
5 min read · Jan 17, 2018

At Travix — an Online Travel Agency — we’ve been early adopters of Kubernetes, running applications on Kubernetes Engine since its alpha release. In the early days running applications on top of it was quite adventurous and exciting — we kept some hot standby VMs just in case — but these days it’s fortunately quite boring: it does what it needs to do.

What’s very useful about Kubernetes is that it has made a lot of operational actions a thing of the past. For example, we no longer have to handpick a host to run a newly built application. It also allows us to automatically scale both our Kubernetes Engine clusters as a whole and the applications they orchestrate, making use of the cost elasticity of the cloud.

But probably even better is the programmability of Kubernetes, which lets us automate all the things. We use it to configure DNS records, renew TLS certificates and more. This is really what makes it so powerful.

With all of these benefits combined we’ve seen the rate at which new applications are built and released within Travix increase massively since we started using Kubernetes. We’re currently at 252 Deployments, 12 StatefulSets and 6 DaemonSets in our production cluster.

Our clusters

We run 11 Kubernetes Engine clusters in total: a separate cluster for each environment, a split between our regular applications and PCI-related ones, a CI/CD cluster, and then some. Although running these as separate clusters loses us some economies of scale, it allows us to test new versions of Kubernetes before we upgrade the production cluster.

Each cluster runs an autoscaling node pool of preemptible VMs and a node pool of regular VMs. The regular VM node pool is kept at 0 instances as long as we can get our hands on preemptibles, but acts as a safety net for when preemptibles are preempted en masse. More on those preemptibles later.

In a couple of rare scenarios we have node pools with taints to keep all but one application off of them. These are either applications we want to keep off the autoscaling preemptible nodes because they have only 1 replica — not what you would call cloud native — or applications with special hardware requirements.
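As an illustration of how such a dedicated pool works — the names below are hypothetical, not our actual configuration — the pool's nodes carry a taint, and only the one application tolerates it:

```yaml
# Taint every node in the dedicated pool, e.g.:
#   kubectl taint nodes <node> dedicated=special-app:NoSchedule
# Pods without a matching toleration are then kept off these nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: special-app        # hypothetical single-replica application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: special-app
  template:
    metadata:
      labels:
        app: special-app
    spec:
      # Allow this pod onto the tainted pool.
      tolerations:
      - key: dedicated
        operator: Equal
        value: special-app
        effect: NoSchedule
      # And pin it there via a node label, so it can't land elsewhere either.
      nodeSelector:
        dedicated: special-app
      containers:
      - name: special-app
        image: example/special-app:1.0.0
```

The taint keeps everything else out; the nodeSelector keeps the application in.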

In the future it would be interesting to run production applications across multiple clusters, so we can test drive new Kubernetes versions with a small portion of production load and take the risk out of cluster upgrades. We’ve seen too often that the first versions of a minor release have regressions in parts essential to us, like cluster autoscaling or affinity rules.

Preemptibles

Anyone at Google — and any newcomer at Travix — you tell that you’ve been running production on preemptibles for over 2 years thinks you’re mad, as they’re not necessarily intended for always-on applications. But in reality the machines have been preempted all at once on only a single occasion. Meanwhile the cost savings are enormous. And in essence a preemptible is just a regular VM that gets replaced a bit more frequently. Just like the cluster autoscaler does when scaling down a cluster.

To lower the risk of losing all machines at once we’ve built the estafette-gke-preemptible-killer Kubernetes controller to spread the deletion of preemptibles over time, so they won’t all hit their 24-hour limit at once. When a new node is added, the controller sets an expiry time in the node’s metadata; once that time has passed the node is deleted. This spreads node lifetimes between 11 and 23 hours.

The cluster autoscaler doesn’t take the lower price of preemptibles into account and randomly scales up either the preemptible pool or the regular VM pool. We’ve written the estafette-gke-node-pool-shifter controller to ‘move’ regular VMs over to the preemptible node pool, adding a preemptible and deleting a regular VM whenever the cluster autoscaler adds one. This lets us run the cluster at very low cost, while still allowing the cluster autoscaler to quickly add regular VMs when preemptibles are in short supply.

Resiliency

An added benefit of preemptibles — besides the lower cost — is that some edge cases are hit more frequently. This forces you to make sure all of your applications handle sudden shutdown well, making your platform more resilient to unexpected disruptions. You do this by handling SIGTERM — the signal Kubernetes sends to stop a pod — in your container and making sure you’re done within 30 seconds, the maximum time you get when a preemptible is preempted.

This is particularly hard for 3rd party applications since their code is out of your control, but we’re running the following ones quite successfully: Prometheus, Grafana, ElasticSearch, CockroachDB. We have had less success with Couchbase and Redis so far. For us an application isn’t cloud-native until we can run it successfully in Kubernetes Engine on top of preemptibles without too much operational effort.

To make sure our application pods don’t all end up on the same preemptible host and get killed off at the same time, we use podAntiAffinity rules to distribute them across hosts. Combined with PodDisruptionBudgets this makes cluster upgrades and autoscaling safe to do at any time.
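Together those two look roughly like the fragment below — an illustrative sketch with hypothetical names, not one of our actual manifests:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app             # hypothetical application
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # Prefer not to co-locate replicas on the same node, so one
          # preempted host never takes out all replicas at once.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname
      containers:
      - name: my-app
        image: example/my-app:1.0.0
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app
spec:
  # Keep at least 2 of the 3 replicas up during voluntary disruptions,
  # such as node drains during cluster upgrades or autoscaler scale-down.
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```

The anti-affinity rule spreads the replicas; the disruption budget stops a drain from evicting too many of them at once.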

Self-service

To lower the need for manual tasks even further and to let development teams take their application from the cradle all the way to production without depending on other teams, we’ve automated creating and updating public DNS records: Services and Ingresses annotated accordingly get picked up by the estafette-cloudflare-dns controller.
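An annotated Service looks roughly like this — the hostname is illustrative, and the annotation keys are as we recall them from the controller’s readme, so check the estafette-cloudflare-dns repository for the authoritative names:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    # Picked up by the estafette-cloudflare-dns controller, which creates
    # or updates the Cloudflare DNS records for the listed hostnames.
    estafette.io/cloudflare-dns: "true"
    estafette.io/cloudflare-hostnames: "my-app.example.com"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 443
    targetPort: 443
```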

For generating and renewing TLS certificates we process annotated Secrets with the estafette-letsencrypt-certificate controller. Our Nginx sidecar container automatically picks up renewed certificates, so expired certificates are a thing of the past and require no manual intervention.
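An annotated Secret looks roughly like this — again an illustrative sketch, with annotation keys as we recall them, so check the estafette-letsencrypt-certificate repository for the authoritative names:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-app-letsencrypt
  annotations:
    # Picked up by the estafette-letsencrypt-certificate controller, which
    # obtains a certificate for the listed hostnames, writes the key and
    # certificate into this Secret, and renews them before expiry.
    estafette.io/letsencrypt-certificate: "true"
    estafette.io/letsencrypt-certificate-hostnames: "my-app.example.com"
type: Opaque
```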

All in all the level of self-service has increased enormously, taking the operations team fully out of the loop for all of our stateless applications. When it comes to storing state in a database or cloud storage, the operations team is still involved in creating those and setting up roles and permissions for the applications that use them.

What’s next?

All of our clusters still run in a single zone. With regional clusters now in beta we’re looking forward to converting existing clusters into regional ones, to be resilient against zonal outages (we’ve never seen one so far) and to get zero-downtime master upgrades.

We’re also looking forward to Kubernetes on Windows, to see whether we can move our classic .NET applications into Kubernetes without having to rewrite them all to .NET Core at once. This would make it much easier to bin-pack multiple applications onto the same host, and would let them make use of the service discovery, log shipping and other automation we already have in Kubernetes. Let’s hope Google will support this in Kubernetes Engine in the future.
