Misunderstanding the behaviour of one templating line — and the pain it caused our k8s clusters

How an assumption about the behaviour of a single line led to problems for all of the services receiving traffic through the ingress layer of our Kubernetes clusters

Kubernetes (often shortened to k8s) is Greek for ‘helmsman’; the k8s logo is a ship’s wheel. Pictured: the sun setting behind an actual ship’s wheel in Marseille, France

tl;dr

Progress with ingress

Backends/Pods sorted alphabetically by Pod IP as seen in HAProxy’s stats screen in a staging environment

Hot-podding, a problem, and a patch

What the uneven request distribution looked like from the perspective of one service

“A spiral of failures”

Can you guess where we switched to Service VIPs?

The mother of all mistakes

Request distribution for the same service post patching
Randomly sorted backends in the same staging environment as above, perfect for an even request distribution

Lessons learned

We’re hiring

About the author

Guy Templeton (author)

We are the engineers at Skyscanner, the company changing how the world travels. Visit skyscanner.net to see how we walk the talk!
