Upgrading Kubernetes VMs on Azure With No Downtime!

Maxim Vorobjov
Published in Comae Technologies
3 min read · Jul 3, 2017

After a Kubernetes cluster is set up in Azure, the VM sizes cannot be changed via the ACS interface (neither the portal nor the CLI), while setting up a whole new cluster will most likely involve some downtime (at least when a database is hosted in Kubernetes too). The procedure below is a guide to manually upgrading the Kubernetes agent VM sizes within the same ACS cluster.

1. First of all, make sure kubectl is working in the right context:
$ kubectl config current-context
acs-dev

Please note: the output of the commands shown here will depend on your setup.
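If the steps are ever scripted, a guard on the context is one way to avoid draining nodes in the wrong cluster. The following is a minimal sketch: `require_context` is a hypothetical helper name, and `acs-dev` is simply the context from the example above.

```shell
# Abort unless kubectl points at the expected context.
# require_context is a hypothetical helper, not part of kubectl.
require_context() {
    expected="$1"
    current="$(kubectl config current-context)" || return 1
    if [ "$current" != "$expected" ]; then
        echo "kubectl context is '$current', expected '$expected'" >&2
        return 1
    fi
}
```

A script could then start with `require_context acs-dev || exit 1`.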

2. Find out which agents the critical pods are running on and list all the nodes. In this example, the mongo-0 pod runs the MongoDB primary:

$ kubectl describe pod mongo-0 | grep agent
Node: k8s-agent-101add22-2
$ kubectl get nodes
NAME STATUS AGE VERSION
k8s-agent-101add22-0 Ready 52d v1.5.7
k8s-agent-101add22-1 Ready 52d v1.5.7
k8s-agent-101add22-2 Ready 52d v1.5.7
k8s-master-101add22-0 Ready,SchedulingDisabled 52d v1.5.7

We will relocate the MongoDB primary manually. The other nodes, k8s-agent-101add22-0 and k8s-agent-101add22-1, can be drained forcefully one by one.
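As an alternative to grepping the `describe` output, the node name can be read directly from the pod spec with a jsonpath query. This is a small sketch; `pod_node` is a hypothetical helper name.

```shell
# Print the name of the node a pod is scheduled on.
# pod_node is a hypothetical helper; it reads .spec.nodeName from the pod.
pod_node() {
    kubectl get pod "$1" -o jsonpath='{.spec.nodeName}'
}
```

For the example above, `pod_node mongo-0` would be the call to make. To see the node of every pod at once, `kubectl get pods -o wide` adds a NODE column to the listing.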

3. Start with k8s-agent-101add22-0:

$ kubectl drain k8s-agent-101add22-0
node "k8s-agent-101add22-0" cordoned
error: pods with local storage (use --delete-local-data to override): antares-646410695-26hhp, elasticsearch-logging-v1-vpcfs, monitoring-influxdb-grafana-v4-x2lt6; DaemonSet-managed pods (use --ignore-daemonsets to ignore): fluentd-es-v1.22-wpf84, kube-proxy-vqln3

In the output above, the cluster reports that evicting the pods would result in some data loss. Most of these pods are monitoring services and can be forcefully relocated; antares keeps a local disk cache in an emptyDir volume, which can be purged too. After making sure no clients will be affected, we forcefully drain the node:

$ kubectl drain k8s-agent-101add22-0 --delete-local-data --ignore-daemonsets
node "k8s-agent-101add22-0" already cordoned
WARNING: Deleting pods with local storage: antares-646410695-26hhp, elasticsearch-logging-v1-vpcfs, monitoring-influxdb-grafana-v4-x2lt6; Ignoring DaemonSet-managed pods: fluentd-es-v1.22-wpf84, kube-proxy-vqln3
pod "http-backend-404-2860667897-094jb" evicted
...
pod "supernova-2130880644-qtm90" evicted
node "k8s-agent-101add22-0" drained

Wait until all pods have been recreated, checking with:

$ kubectl get pods
NAME READY STATUS RESTARTS AGE
antares-646410695-97v9w 1/1 Running 2 1h
...
galaxy-86719696-pkpw3 1/1 Running 1 1h
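Instead of re-running the command by hand, the wait could be polled. The sketch below assumes the default column layout shown above (READY in column 2, STATUS in column 3); `all_pods_running`, `wait_for_pods` and the `POLL_SECONDS` variable are hypothetical names, and the 10-second interval is arbitrary.

```shell
# Succeed when every pod in the current namespace is Running with all of its
# containers ready. all_pods_running is a hypothetical helper.
all_pods_running() {
    # --no-headers drops the title row so only pod rows are parsed.
    pods="$(kubectl get pods --no-headers)" || return 1
    pending="$(printf '%s\n' "$pods" | awk '
        NF { split($2, r, "/")                     # READY looks like "1/1"
             if ($3 != "Running" || r[1] != r[2]) print $1 }')"
    [ -z "$pending" ]
}

# Poll until all pods are up again.
wait_for_pods() {
    until all_pods_running; do
        sleep "${POLL_SECONDS:-10}"
    done
}
```

Note that a pod which is meant to finish (for example one in the Completed state) would keep this loop waiting, so the check would need adjusting for such workloads.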

Now you can safely change the size of the VM k8s-agent-101add22-0, for instance via the Azure Portal. Then wait for the agent to come back online:

$ kubectl get nodes | grep k8s-agent-101add22-0
k8s-agent-101add22-0 Ready,SchedulingDisabled 52d v1.5.7

If the node has not reached “Ready” status after 5 minutes, you might need to ssh to the agent VM through the master VM and check that the kubelet container is running, using docker ps | grep kubelet.

Note the Ready,SchedulingDisabled status: it means the agent will not be assigned new pods. To put it back to work, uncordon the agent node:

$ kubectl uncordon k8s-agent-101add22-0
node "k8s-agent-101add22-0" uncordoned
$ kubectl get nodes | grep k8s-agent-101add22-0
k8s-agent-101add22-0 Ready 52d v1.5.7

Success, agent-0 is ready! Time to repeat the same procedure with agent-1.
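Since the procedure is repeated per node, step 3 could be wrapped in a function. The sketch below is only an illustration under several assumptions: the resize is done with the Azure CLI (`az vm resize`) rather than the portal, the Azure VM name is assumed to match the Kubernetes node name, the resource group and VM size are placeholders you must supply, and `resize_agent` and `POLL_SECONDS` are hypothetical names. It also skips the manual check that all pods have been recreated before the resize.

```shell
# Drain, resize and uncordon one agent. resize_agent is a hypothetical helper.
# Assumption: the Azure VM is named exactly like the Kubernetes node.
resize_agent() {
    node="$1"; group="$2"; size="$3"

    # Evict pods; this deletes emptyDir data, as discussed in step 3.
    kubectl drain "$node" --delete-local-data --ignore-daemonsets || return 1

    # Change the VM size through the Azure CLI instead of the portal.
    az vm resize --resource-group "$group" --name "$node" --size "$size" || return 1

    # Wait until the node reports Ready again (STATUS is column 2 and still
    # carries ",SchedulingDisabled" at this point).
    until kubectl get nodes --no-headers |
            awk -v n="$node" '$1 == n && $2 ~ /^Ready/ { ok = 1 } END { exit !ok }'; do
        sleep "${POLL_SECONDS:-10}"
    done

    # Allow new pods to be scheduled on the node again.
    kubectl uncordon "$node"
}
```

A call would look like `resize_agent k8s-agent-101add22-1 <resource-group> <vm-size>`. The loop has no timeout, so if the node stays NotReady, the kubelet check described earlier still has to be done by hand.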

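4. For agent-2, first relocate the mongo-0 pod manually: cordon the node so that nothing new is scheduled there, then delete the pod so that it is recreated on another node: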

$ kubectl cordon k8s-agent-101add22-2
node "k8s-agent-101add22-2" cordoned
$ kubectl delete pod mongo-0
pod "mongo-0" deleted
$ kubectl get pods | grep mongo-0
mongo-0 0/2 ContainerCreating 0 13s

After mongo-0 comes back online, repeat the same drain/uncordon procedure used for the other nodes in the previous step.
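Before draining agent-2, it may be worth confirming that mongo-0 really landed on a different node. A minimal sketch, with `pod_moved_off` as a hypothetical helper name:

```shell
# Succeed once the pod is Running on a node other than the one being vacated.
# pod_moved_off is a hypothetical helper.
pod_moved_off() {
    pod="$1"; old_node="$2"
    # Prints "<phase> <node>", e.g. "Running k8s-agent-101add22-0".
    state="$(kubectl get pod "$pod" \
        -o jsonpath='{.status.phase}{" "}{.spec.nodeName}')" || return 1
    phase="${state%% *}"
    node="${state#* }"
    [ "$phase" = "Running" ] && [ "$node" != "$old_node" ]
}
```

For example, `pod_moved_off mongo-0 k8s-agent-101add22-2`. The Running phase only says the containers have started; whether MongoDB itself is healthy again still needs to be checked separately.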

Your upgraded cluster is ready!

If you liked this blog post, you may also find the following useful: https://blog.comae.io/azure-search-with-mongodb-to-build-scalable-full-text-search-af0b28f4e0e7
