We recently moved MicroBadger from Docker Cloud to a Kubernetes cluster on AWS. It’s early days, but this post is a write-up of what we’ve learnt so far, including some mistakes we made along the way!
First some background on MicroBadger, our tool for managing container metadata. Metadata can be a key tool for operating containers at scale, providing context on what your containers are actually doing. We’re also working with the community to develop Label-Schema.org as a shared namespace for container labels.
We started using Docker Cloud back when it was still Tutum for our Microscaling-in-a-Box site. We liked its ease of use and not having to run our own master node. So we also used it for launching the MVP version of MicroBadger.
However, over time we became less happy with it. Partly this was due to cost and the per-node pricing. And although the web UI is useful, it can be a bit clunky: it was quite hard to navigate to a service, and the terminal for connecting to a container is fairly basic. A new version of the UI is being developed, but it didn’t seem to be a major improvement.
However, the main reason for switching was that we wanted an automated rolling deploy process. It’s possible to do rolling deploys with Docker Cloud, but we felt it was time to move to a more powerful orchestrator.
We’ve used DC/OS before and our Microscaling Engine integrates with the Marathon scheduler. However we felt Mesos was too heavyweight for our needs. We also considered Nomad as we’re big fans of the Nomad team and Hashicorp’s tools.
But in the end it came down to a straight choice between Docker Swarm and Kubernetes. We liked the Swarm mode enhancements in Docker 1.12, particularly having service discovery as a built-in capability. But the key feature we wanted was rolling deploys, and that’s a core feature of Kubernetes.
We’re also very fortunate to have Kelsey Hightower as one of our advisors at Microscaling. Although Kelsey didn’t directly recommend Kubernetes to us, we’ve seen a lot of his talks! ;-) We like how Kubernetes is being developed and the energy of the k8s community. So in the end we chose to go with Kubernetes.
Bootstrapping the cluster
To bootstrap the cluster on AWS we used kops (Kubernetes Operations). This worked well, although there are some restrictions, such as having to create a new VPC for each cluster. It also has some nice features, like integration with Terraform, which we might use in future.
We’re running Kubernetes 1.3, but 1.4 is now out and includes kubeadm, which looks like a great alternative for bootstrapping clusters.
Running kubectl as a container
Wherever possible we like to run software as containers, so for the kubectl client we’re using a great Docker image from Lachlan Evenson. We use two shell aliases, kubectl-stage and kubectl-prod. The aliases start the container and connect to the correct cluster. They also mount the current directory, so we can use kubectl create -f to create new objects such as services or deployments.
alias kubectl-stage='docker run -it --rm -v ~/.kube:/root/.kube -v `pwd`:`pwd` -w `pwd` microscaling/k8s-kubectl:v1.3.6 --context=staging.cluster.name'
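The production alias follows the same pattern, differing only in the kubeconfig context it selects (the context name below is illustrative):

```shell
# Hypothetical production alias -- identical to the staging one apart from --context
alias kubectl-prod='docker run -it --rm -v ~/.kube:/root/.kube -v `pwd`:`pwd` -w `pwd` microscaling/k8s-kubectl:v1.3.6 --context=production.cluster.name'
```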
Lightweight deploy process
We’re still a small team so we don’t yet need a complex CI/CD process. Instead we’ve extended our existing Makefile to trigger the rolling deploy in Kubernetes.
$ make deploy CLUSTER=staging DEPLOY=microbadger-api
- Builds Go binary.
- Builds Docker image and tags it as [version]-[git commit sha].
- Pushes the image to Quay.io.
- Calls kubectl patch to update the image version for the deploy.
- Kubernetes does a rolling deploy for the deployment.
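The patch step above can be a single kubectl patch that changes only the image field of the deployment; Kubernetes then rolls the pods over automatically. This is a sketch — the deployment name, container name, and image tag are hypothetical examples:

```shell
# Update just the container image on the deployment; the deployment's
# rolling update strategy replaces the pods one by one.
# All names and the tag below are hypothetical.
kubectl --context=staging.cluster.name patch deployment microbadger-api \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"microbadger-api","image":"quay.io/microscaling/microbadger-api:0.9.1-abc1234"}]}}}}'
```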
$ make deploy CLUSTER=production DEPLOY=microbadger-api
The production deploy has basically the same steps but does some extra validation.
- Checks there is a git tag matching the version.
- Checks the version matches the k8s deployment file.
- Image is tagged with just the version.
- Calls kubectl apply to update the entire deployment from its manifest file.
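Put together, the production steps might look something like the sketch below. The file paths, image name, and context are all illustrative assumptions, not our actual Makefile:

```shell
#!/bin/sh
# Sketch of the production deploy steps; names and paths are hypothetical.
set -e
VERSION=$(cat VERSION)   # e.g. 0.9.1

# 1. Check there is a git tag matching the version
git rev-parse "v${VERSION}" >/dev/null 2>&1 || { echo "no tag v${VERSION}"; exit 1; }

# 2. Check the version matches the k8s deployment file
grep -q ":${VERSION}" k8s/microbadger-api.yml || { echo "version mismatch"; exit 1; }

# 3. Build and push, tagging with just the version
docker build -t quay.io/microscaling/microbadger-api:${VERSION} .
docker push quay.io/microscaling/microbadger-api:${VERSION}

# 4. Update the entire deployment from its manifest
kubectl --context=production.cluster.name apply -f k8s/microbadger-api.yml
```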
Aargh badger down!
Deploying the MicroBadger API went smoothly but for the website deploy we were also moving from a static Angular.js site on CloudFront to a simple Rails app running in a container.
After the deploy there was a config bug, so the new app wouldn’t start. While trying to fix the problem I made a rookie error: instead of deleting the pods and letting Kubernetes recreate them, I decided to delete and recreate the service. But deleting the service hung, which meant the site was down :(
The problem was that when you run Kubernetes on AWS, each k8s service is mapped to an ELB (Elastic Load Balancer). We also use AWS Route 53 for our DNS, with an alias record that points at the ELB. But you can’t delete an ELB while it’s the target of an alias record.
So to fix the problem I had to delete the alias A record for microbadger.com! That let me delete the service in Kubernetes and then recreate it. But I should have deleted the pods rather than the service, as recreating pods is a much faster process.
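With hindsight, the safer fix is to delete just the pods: the service, and with it the ELB and DNS alias, stays untouched while the deployment recreates them. A sketch, assuming the pods carry a label like the one below:

```shell
# Deleting pods leaves the service -- and therefore the ELB and the
# Route 53 alias -- in place. The label selector is a hypothetical example.
kubectl --context=production.cluster.name delete pods -l app=microbadger-web
```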
It’s early days for us with Kubernetes, but so far we’re happy users. There is a lot to learn and some complexity, but I’ve found the design to be solid and powerful. For example, I like the secrets support and keeping secrets separate from regular config, which can be kept in a ConfigMap.
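For instance, sensitive values and plain config can be created side by side and then referenced separately from pod specs. The names and values here are purely illustrative:

```shell
# Secrets hold sensitive values; ConfigMaps hold everything else.
# All names and values below are hypothetical examples.
kubectl create secret generic microbadger-secrets \
  --from-literal=database-password=s3cret
kubectl create configmap microbadger-config \
  --from-literal=log-level=info
```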
We’d love to hear your thoughts in the comments on whether we chose the right orchestrator! Also if you’re using Kubernetes, are we doing it right? How could we be using k8s more effectively?