The move to Kubernetes

Towards the end of 2016, we were at a point where the team was expanding and I needed to review our web hosting infrastructure & CI/CD pipelines.

We were running ~40 EC2 instances on AWS, configured and managed with Ansible. I was pretty happy with our setup and with AWS in general, but as we were a small team, everyone had to be trained in Ansible, we had to manage SSH keys, and dependencies were becoming a pain.

I started to think about how I could optimise our processes and our costs. I would look at our monitoring and be frustrated to see that our server CPU utilisation sat at around 10%, but we weren’t in a position to remove servers due to limitations with the setup (dependencies & resiliency). That meant we were spending a significant amount of money on unused resources, which was a clear opportunity for improvement.
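To put a rough number on that headroom, here is a back-of-the-envelope sketch. It assumes CPU is the only constraint and that a 50% target utilisation is acceptable; both are illustrative assumptions rather than figures from our setup, and the function name is hypothetical. It ignores memory, dependencies and resiliency, which were the real blockers.

```python
import math


def instances_needed(current_instances, current_utilisation_pct, target_utilisation_pct):
    """Estimate how many equally sized instances would carry the same CPU load.

    Assumes load spreads evenly and CPU is the only limit (a simplification).
    """
    # Total work expressed in "instance-percent" units.
    busy = current_instances * current_utilisation_pct
    return math.ceil(busy / target_utilisation_pct)


# ~40 instances at ~10% CPU, packed to an assumed 50% target.
print(instances_needed(40, 10, 50))  # prints 8
```

Even allowing generous margins for failover, the gap between 40 servers and a single-digit estimate shows why better bin-packing of workloads looked attractive.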

I had been pretty happy running on AWS and had never had any major problems, so I was by no means looking to move away from AWS or EC2. I put together a list of what the infrastructure needed to do and came up with the following:

  • Be fast, efficient, secure & reliable
  • Be able to run containers
  • Auto-scaling
  • Self-healing
  • Allow developers to launch sites and deploy updates without any knowledge of servers

After researching the different options, I narrowed it down to Apache Mesos / DC/OS, Amazon Elastic Container Service (ECS) and Kubernetes.

My initial view was that whichever tool I chose would run on AWS EC2 servers, as we had the knowledge and experience to run EC2 in production.

I started off by trying ECS as it is already built into AWS. I found it to be slow, expensive and honestly pretty clunky. At the time of writing (this may have changed), every site required an Elastic Load Balancer to route traffic, and with the number of sites we had, the price of running ECS just didn’t make sense for us.

I then installed Mesos & Kubernetes onto a cluster of EC2 servers. The Mesos UI is visually great, but the documentation was badly lacking. I could launch a container, but I couldn’t get any traffic to it, which was not ideal! Kubernetes, on the other hand, had an awesome getting-started guide and many examples of getting a site up and running.

I used a tool called kops to set up Kubernetes on EC2, and it worked fine. I was able to get sites up and running and set up scaling for the cluster, and everything was working well. A few issues were still outstanding, such as access control and managing updates, but I was sure I’d find solutions for these reasonably easily.

My team and I headed down to the Google Cloud conference in London to learn a bit more about Google Cloud and Kubernetes in general. I attended a session on running containers on Google Cloud, where they did a demo of Kubernetes. It was so simple to create a cluster, and Google had solved my remaining issues: access control & cluster updates. I logged in to the Google Cloud console, launched a small test cluster and, within an hour, knew this was where we would be moving. Access to the cluster is managed by Google Cloud access controls and you can upgrade your cluster by literally clicking a link. I decided I would rather have Google manage the underlying servers than manage them myself.

As of October 2017, I’m delighted to say we now have the majority of our traffic routed to services running on Kubernetes hosted on Google Cloud Platform. All of these sites are defined by really simple YAML files and are deployed by simply pushing/merging to Git branches, which triggers our build pipelines (we use Bitbucket Pipelines). All passwords, keys and general configuration are set up using secrets injected as environment variables, so we don’t need to worry about config files containing credentials being checked in.
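To illustrate the application side of that approach, here is a minimal sketch of reading injected secrets from environment variables at startup. The variable names (DATABASE_PASSWORD, API_KEY) and the load_config helper are hypothetical and not taken from our actual setup; the real names depend on how each secret is mapped into the container.

```python
import os

# Hypothetical variable names; the real ones depend on how the
# Kubernetes secrets are mapped into the container's environment.
REQUIRED_VARS = ("DATABASE_PASSWORD", "API_KEY")


def load_config(env=None):
    """Read required settings from environment variables, failing fast if any are missing."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing at startup when a value is missing means a misconfigured deployment shows up immediately rather than at the first request that needs the secret, and nothing sensitive has to live in the repository.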

I would love to hear your Kubernetes stories, and would be happy to answer any questions you have. I will be doing a follow-up post with more detail about our setup and things learned along the way — let me know if there’s anything specific you would like me to cover.

I’d like to acknowledge the following people for their help in getting set up: