Migrating our Kubernetes cluster

Guiomar Valderrama
Our developer stories
4 min read · Apr 22, 2019

How we moved our K8s cluster from StackPoint to DigitalOcean.


Preface

After the Docker Cloud orchestration disaster, we managed to move everything to Kubernetes last year, before the service shut down. At the time we chose to set up a managed Kubernetes cluster with stackpoint.io, one of the few services we found that offered DigitalOcean as a provider.

Since then, DigitalOcean’s own managed Kubernetes service has come out of beta and is currently offered in limited availability.

A week ago we received a notification that the SSL certificates for our StackPoint cluster were going to expire, along with a recommendation to upgrade our Kubernetes version. On StackPoint we were running Kubernetes server version 1.10; at the time of writing, the latest stable version is 1.14.

A new cluster had to be created, and while we were at it, why not do it directly with DigitalOcean?

Moving Parts

Our cluster contains several namespaces, each with its own services, secrets, and deployments, as well as some DaemonSets and RBAC resources. The databases live outside the cluster and are exposed to it as Endpoints.

All of it is defined in YAML files.

Creating the new cluster

Creating a Kubernetes cluster with DigitalOcean is very straightforward: simply select the server version (1.13 is the latest available) and the region, and create node pools with your chosen Droplet types.

To avoid extra cost while setting up, we created a node pool with 3 Droplets. Make sure you choose the Droplet type you will actually use, since it’s the only kind of Droplet that can be created in that node pool; to use a different type you would need a new pool.

The tag k8s and another with the cluster hash will be added automatically. Additionally, the Droplet nodes for the cluster will be tagged k8s:worker.

The kubeconfig.yaml can be downloaded from the cluster view. Make sure your kubectl version is close to the chosen server version; the Kubernetes documentation puts it this way:

You must use a kubectl version that is within one minor version difference of your cluster. For example, a v1.2 client should work with v1.1, v1.2, and v1.3 master. Using the latest version of kubectl helps avoid unforeseen issues.
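
A quick sanity check with the downloaded file looks like this (a minimal sketch; the kubeconfig path is wherever you saved the download):

# Point kubectl at the new cluster for this shell session
export KUBECONFIG=~/Downloads/kubeconfig.yaml

# Client and server versions should be within one minor version of each other
kubectl version --short

# The new nodes should be listed and report Ready
kubectl get nodes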

Migrating

Setup

Before applying our project YAML files we installed Keel (our CD tool) and the Scalyr DaemonSet (log aggregation, and more).

With this in place we simply went project by project applying namespaces, secrets, services, and deployments.
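
Per project the sequence looked roughly like this (a sketch; the file names are hypothetical and depend on how your manifests are laid out):

# The namespace first, so the namespaced resources below have somewhere to live
kubectl apply -f namespace.yaml

# Then the resources the deployments depend on, then the deployments themselves
kubectl apply -f secrets.yaml
kubectl apply -f services.yaml
kubectl apply -f deployments.yaml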

To avoid overloading our 3-node cluster, we lowered the replicas to 1 for the nginx and wsgi deployments and to 0 for celery and celery beat. A deployment with 0 replicas is still applied and can later be scaled to the desired number with ease.
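
Scaling a deployment down (or back up later) is a one-liner; a sketch, assuming a namespace called my-project and our deployment names:

# Run a single replica of the web-facing deployments for now
kubectl -n my-project scale deployment/nginx --replicas=1
kubectl -n my-project scale deployment/wsgi --replicas=1

# Keep the workers registered but idle until the cluster grows
kubectl -n my-project scale deployment/celery --replicas=0
kubectl -n my-project scale deployment/celery-beat --replicas=0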

Next we made sure:

  • Keel was updating the pods
  • Logs from the new pods were reaching Scalyr
  • Load balancers were created and allowed access to our services
  • The databases could be reached
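
Most of that list can be spot-checked from the command line (a sketch; the namespace, deployment name, and database host are placeholders):

# Pods are up, and Keel is rolling out new image tags
kubectl -n my-project get pods

# Logs are flowing from a new pod (then confirm they arrive in Scalyr)
kubectl -n my-project logs deploy/nginx

# Load balancers are provisioned: each exposed Service gets an EXTERNAL-IP
kubectl -n my-project get svc

# The database answers from inside the cluster
# (nc flags vary between images; adjust as needed)
kubectl run dbcheck --rm -it --image=busybox --restart=Never -- nc -zv db.example.com 5432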

Kubernetes version change

As I mentioned, we had been running v1.10 and were now updating to v1.13. This shouldn’t have been a problem; however, from v1.11 onwards the default DNS-based service discovery changed from Kube-DNS to CoreDNS, which fixed some existing bugs.

It just so happens Kube-DNS tolerated an improper configuration for accessing external services: an ExternalName Service pointing at an IP. This doesn’t work with CoreDNS.
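
Our old, improper configuration looked roughly like this (hypothetical names, example IP):

# ExternalName is meant to hold a DNS name.
# Kube-DNS tolerated an IP here; CoreDNS does not resolve it.
apiVersion: v1
kind: Service
metadata:
  name: external-db
  namespace: my-project
spec:
  type: ExternalName
  externalName: 203.0.113.10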

This meant our external databases could not be reached from within the cluster with our old configuration.

The correct way to do this is a headless ClusterIP Service (one with no selector and no cluster IP) together with an Endpoints object that carries the external IP. Depending on how you access your external service, other solutions may be appropriate.
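
For us that meant a selector-less headless Service paired with a manually maintained Endpoints object. A sketch, with the name, namespace, IP, and port as placeholders:

apiVersion: v1
kind: Service
metadata:
  name: external-db
  namespace: my-project
spec:
  clusterIP: None  # headless: DNS resolves straight to the endpoint IPs
  ports:
    - port: 5432
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-db  # must match the Service name
  namespace: my-project
subsets:
  - addresses:
      - ip: 203.0.113.10  # the external database
    ports:
      - port: 5432

Pods keep using the Service name as before (external-db.my-project.svc.cluster.local), and DNS now returns the external IP.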

After deleting the old Services we applied the new configuration and, sure enough, our databases were accessible again!


Becoming prod-ready

We were sure everything was working, so we created more nodes (each node pool can have up to 10 Droplets) and scaled our deployments:

kubectl scale --replicas=10 deployment/{deployment_name} -n {namespace_name}

Creating the nodes can take a bit; it’s better to wait for all nodes to be ready before scaling up, or your pods might end up crammed onto the nodes that are already available.
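
Readiness is easy to watch:

# Wait until every node reports STATUS "Ready" before scaling up
kubectl get nodes --watch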

We manage our DNS records with Cloudflare, where we changed the A records to point to the new load balancers.

Cleanup

Now that everything is in DigitalOcean, the old cluster can be destroyed.

Or that’s what we would have liked. Sadly, some legacy clients were using a fixed IP to reach one of our load balancers. A notice has been issued to these clients to properly configure their services, with the intention of shutting down the previous cluster in a month’s time.

Additional notes

StackPoint let us know of the coming change a month in advance, offered to rotate the SSL certificates themselves, and even provided tips and suggestions for migrating the cluster to a newer version. Their management of the service has been excellent.

Moving to DigitalOcean was already part of our plans, though, so we took the chance to migrate the cluster.

More importantly, it turns out StackPoint has ended its DigitalOcean support and now recommends using DigitalOcean’s managed service. We would not have been able to create a new cluster with DigitalOcean as the provider.
