Published in Data Querying

Performing Web/API Service upgrades without Downtime

By leveraging Kubernetes rollouts.

This is part of a series of posts describing how the Hue Query Service is being built.

Automation done well frees the team from repetitive manual tasks while also documenting the process: team members become more productive, focus on adding value, and keep their momentum.

Now, how can the refresh of the project websites be automated, without any downtime or manual steps? Not to forget, they all run in small containers in a main Kubernetes cluster. Containers might be a bit heavyweight for this type of static website, but they allow the helpful pattern of being driven automatically via source code changes, and they harmonize all the services to follow the exact same flow.

i.e. the demo website reuses the same deployment logic, as do the other database engines offered in the demo website. Those websites are also driven via code changes in GitHub, not via any UI.

For example, here are the running websites:

kubectl get pods -ngethue
docs-55bf874485-vjnlf 1/1 Running 1 8h
website-5c579d4dd-kqlvt 1/1 Running 0 60m
website-jp-964f9cc57-h97gz 1/1 Running 0 6h38m

Until recently we were performing daily restarts the “hard way”:

kubectl delete pods -ngethue `kubectl get pods -ngethue | egrep "^website" | cut -d" " -f1`

This “works”, but it induces unnecessary downtime and “noise”:

Getting hammered by “website is down” notifications
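For reference, the pod-name selection in that pipeline can be exercised on captured output, with no cluster needed (a sketch reusing the pod listing shown earlier):

```shell
# Output captured from `kubectl get pods -ngethue` (see the listing above)
pods='docs-55bf874485-vjnlf 1/1 Running 1 8h
website-5c579d4dd-kqlvt 1/1 Running 0 60m
website-jp-964f9cc57-h97gz 1/1 Running 0 6h38m'

# Keep only the website pods and extract the first column (the pod name),
# which is exactly what the egrep/cut pipeline feeds to `kubectl delete pods`
names=$(printf '%s\n' "$pods" | grep '^website' | cut -d' ' -f1)
printf '%s\n' "$names"
# website-5c579d4dd-kqlvt
# website-jp-964f9cc57-h97gz
```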

Now, the standard Kubernetes rollout command is being used, and the transition is transparent for the admins and public users!

kubectl rollout restart -ngethue deployment/website
First diagram from the Kubernetes documentation demonstrating a rollout

The new website instance/pod starts and is swapped with the old one when ready:

kubectl get pods -ngethue
docs-55bf874485-vjnlf 1/1 Running 1 13h
website-75c7446d4c-z5p6g 0/1 Running 0 6s
website-bb6fc6b6-nkzqh 1/1 Running 0 18m
website-jp-964f9cc57-h97gz 1/1 Running 0 11h
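The restart-and-swap above can also be gated from a script with the standard status command. A minimal sketch, assuming the same namespace and deployment names as in the listing (`KUBECTL=echo` gives a dry run without a cluster):

```shell
# Restart the website deployment and wait until the rollout completes.
# KUBECTL can be overridden (e.g. KUBECTL=echo) for a dry run without a cluster.
restart_website() {
  KUBECTL=${KUBECTL:-kubectl}
  $KUBECTL rollout restart -ngethue deployment/website &&
  $KUBECTL rollout status -ngethue deployment/website --timeout=120s
}
```

`rollout status` blocks until the new pod is Ready and the old one is gone, so a daily cron job can fail loudly if the swap does not complete in time.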

Note that the latest tag is being used here, and a new image gets built daily when the repository mirror gets synced. The image build of the static websites is very simple and has a very low chance of failing or shipping an incorrect image. By leveraging proper tagging, all the states would be versioned, and failing upgrades would automatically roll back to a previously valid state.
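As a hedged sketch of what proper tagging would enable (the dated tag is an assumption for illustration, not an existing image; `KUBECTL=echo` gives a dry run):

```shell
# Pin the deployment to a versioned tag instead of :latest
pin_website_image() {
  KUBECTL=${KUBECTL:-kubectl}
  $KUBECTL set image -ngethue deployment/website website="gethue/website:$1"
}

# Return to the previously recorded, valid state if the new image misbehaves
undo_website_rollout() {
  KUBECTL=${KUBECTL:-kubectl}
  $KUBECTL rollout undo -ngethue deployment/website
}
```

With versioned tags, `kubectl rollout history` keeps the trail of revisions, and `rollout undo` makes the roll back a one-liner instead of a rebuild.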

Current requirements are “100% automated, as simple as possible, with daily frequency”. But what if we would like a more “real time” rollout (e.g. after each commit or pull request, or hourly)? This is in the plan and will be detailed in a follow-up blog post.
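A hypothetical per-commit flow (every name here is an assumption for illustration) could derive the image tag from the commit SHA and let the same rollout machinery do the swap:

```shell
# Map a full git SHA to a short, unique image tag (pure string work)
image_tag() {
  printf 'gethue/website:%s' "$(printf '%s' "$1" | cut -c1-12)"
}

# Deploy that tag with the same zero-downtime rollout as above;
# KUBECTL=echo gives a dry run without touching a cluster.
deploy_commit() {
  KUBECTL=${KUBECTL:-kubectl}
  $KUBECTL set image -ngethue deployment/website website="$(image_tag "$1")"
  $KUBECTL rollout status -ngethue deployment/website
}
```

A CI webhook would then only need to build the image, push the tag, and call `deploy_commit "$GIT_SHA"`.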

Any feedback or advice? Feel free to comment!
