Extending your Kubernetes Cluster into a new AWS Availability Zone with kops

Kashif Saadat
Appvia
Jan 29, 2018

If you are not familiar with kops, it is a tool that allows you to build and manage your Kubernetes infrastructure in the cloud. It is something we aligned to, having used Kubernetes since 0.6. We have been running Kubernetes in production for 2.5 years and, like most people, had created our own tool to manage the build-out of Kubernetes Clusters.

We have since favoured aligning with the industry where we can and contributing to a tool that is part of the Kubernetes ecosystem. We have been running kops happily ever since, and this article is to help others in a similar position leverage an additional AZ once they have a running cluster.

AWS recently announced availability of a 3rd Availability Zone in the London Region; below is the process we took to leverage this, with links to PRs we have made to help make the process easier.

This guide assumes you have a running kops-managed Kubernetes Cluster in AWS, with kops and kubectl configured against it, and SSH access to the Master instances.

Firstly, let's run a validate on the Cluster to make sure everything is healthy before we start:

➜ kops validate cluster
Using cluster from kubectl context: test.k8s.appvia.io
Validating cluster test.k8s.appvia.io
INSTANCE GROUPS
NAME ROLE MACHINETYPE MIN MAX SUBNETS
eu-west-2a-master1 Master t2.medium 1 1 eu-west-2a
eu-west-2a-master2 Master t2.medium 1 1 eu-west-2a
eu-west-2a-master3 Master t2.medium 1 1 eu-west-2a
eu-west-2b-master1 Master t2.medium 1 1 eu-west-2b
eu-west-2b-master2 Master t2.medium 1 1 eu-west-2b
nodes Node t2.medium 2 2 eu-west-2a,eu-west-2b
NODE STATUS
NAME ROLE READY
ip-10-100-0-13.eu-west-2.compute.internal master True
ip-10-100-0-170.eu-west-2.compute.internal node True
ip-10-100-0-21.eu-west-2.compute.internal master True
ip-10-100-0-56.eu-west-2.compute.internal master True
ip-10-100-1-170.eu-west-2.compute.internal master True
ip-10-100-1-23.eu-west-2.compute.internal node True
ip-10-100-1-238.eu-west-2.compute.internal master True
Your cluster test.k8s.appvia.io is ready

Now we need to edit the kops ClusterSpec, defining a new subnet that would be located within the 3rd Availability Zone: kops edit cluster test.k8s.appvia.io

Jump down to the ‘subnets’ section, adding the new subnet into the list as below:
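For example, the subnets list might end up looking something like this (the CIDR ranges and subnet type are illustrative and should match your existing VPC layout; based on the node IPs above, 10.100.2.0/24 would be a natural choice for the new zone):

  subnets:
  - cidr: 10.100.0.0/24
    name: eu-west-2a
    type: Public      # match the type of your existing subnets (Public or Private)
    zone: eu-west-2a
  - cidr: 10.100.1.0/24
    name: eu-west-2b
    type: Public
    zone: eu-west-2b
  - cidr: 10.100.2.0/24
    name: eu-west-2c
    type: Public
    zone: eu-west-2c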

At the same time you can define new Master instances to be provisioned on the AZ-C Subnet that will be created. To maintain quorum for etcd, you must provision and persist an odd number of Master instances (3, 5, 7 etc) at all times; the kops utility will prevent you from creating an even number of Master instances.

Note: There is a current bug in kops around the way masters and volumes work, which causes problems if you want to replace the master nodes with ones in the new AZ as opposed to extending. This bug has been raised here. For simplicity, we will focus on extending rather than replacing.

For this example, I will create 2 additional Master instances to reside within AZ-C, bringing the total to 7. I have chosen a CoreOS AMI to provision all the nodes; however, feel free to change this to whatever is more suitable for your environment.

➜ cat euw2c-master1.yaml
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.k8s.appvia.io
  name: eu-west-2c-master1
spec:
  image: coreos.com/CoreOS-stable-1576.5.0-hvm
  machineType: t2.medium
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-west-2c
➜ kops create -f euw2c-master1.yaml
Created instancegroup/eu-west-2c-master1
To deploy these resources, run: kops update cluster test.k8s.appvia.io --yes
➜ cat euw2c-master2.yaml
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.k8s.appvia.io
  name: eu-west-2c-master2
spec:
  image: coreos.com/CoreOS-stable-1576.5.0-hvm
  machineType: t2.medium
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-west-2c
➜ kops create -f euw2c-master2.yaml
Created instancegroup/eu-west-2c-master2
To deploy these resources, run: kops update cluster test.k8s.appvia.io --yes

Before deploying these new Instance Groups (IGs), one more update must take place within the ClusterSpec to define new associated etcd members: kops edit cluster test.k8s.appvia.io

Jump down to the etcdClusters section, adding etcd-main and etcd-events EBS volumes for the two new IGs:
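A sketch of what the additions might look like (the existing members for the eu-west-2a and eu-west-2b Masters are omitted here, and the new member names c1 and c2 line up with the etcd-c1/etcd-c2 hostnames used further down):

  etcdClusters:
  - etcdMembers:
    # ...existing members for the eu-west-2a / eu-west-2b Masters...
    - instanceGroup: eu-west-2c-master1
      name: c1
    - instanceGroup: eu-west-2c-master2
      name: c2
    name: main
  - etcdMembers:
    # ...existing members for the eu-west-2a / eu-west-2b Masters...
    - instanceGroup: eu-west-2c-master1
      name: c1
    - instanceGroup: eu-west-2c-master2
      name: c2
    name: events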

You can now run a kops update to interact with the AWS API, creating the new AWS resources that have been defined in the ClusterSpec:

➜ kops update cluster --yes
Using cluster from kubectl context: test.k8s.appvia.io
I0123 15:27:15.413444   11637 executor.go:91] Tasks: 0 done / 118 total; 46 can run
I0123 15:27:15.863181 11637 executor.go:91] Tasks: 46 done / 118 total; 29 can run
I0123 15:27:16.374682 11637 executor.go:91] Tasks: 75 done / 118 total; 27 can run
I0123 15:27:17.905691 11637 executor.go:91] Tasks: 102 done / 118 total; 9 can run
I0123 15:27:18.035842 11637 dnsname.go:111] AliasTarget for "api.test.k8s.appvia.io." is "api-test-k8s-fvii3v-1272530458.eu-west-2.elb.amazonaws.com."
I0123 15:27:18.153972 11637 executor.go:91] Tasks: 111 done / 118 total; 7 can run
I0123 15:27:18.856931 11637 executor.go:91] Tasks: 118 done / 118 total; 0 can run
I0123 15:27:18.856995 11637 dns.go:153] Pre-creating DNS records
I0123 15:27:19.232648 11637 update_cluster.go:253] Exporting kubecfg for cluster
kops has set your kubectl context to test.k8s.appvia.io
Cluster changes have been applied to the cloud.
Changes may require instances to restart: kops rolling-update cluster

At this point, you will have:

  • Created 1 new Subnet within your VPC
  • Created 2 new Master Instance Groups, residing within the subnet named ‘eu-west-2c’
  • Created 4 new EBS volumes: an ‘etcd-main’ and an ‘etcd-events’ volume for each of the new Instance Groups

Unfortunately the 2 new Master instances will not automatically join the etcd cluster without some manual intervention. The etcd member list will need updating to include hosts for the two new instances, and then these new instances will require some reconfiguration to join the existing cluster.

Firstly, SSH into an existing Master instance. Once in the instance, run the following commands:
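Something along the following lines sets up etcdctl shortcuts for the two clusters; the client ports and certificate paths below are assumptions and should be adjusted to match your environment (ETCD_MAIN and ETCD_EVENTS are the variables used in the commands that follow):

# Convenience variables for talking to the 'main' and 'events' etcd clusters.
# The ports and certificate paths are assumptions; adjust them to your cluster.
export ETCDCTL_API=3
ETCD_CERTS="--cacert=/path/to/etcd-ca.crt --cert=/path/to/etcd-client.crt --key=/path/to/etcd-client.key"
ETCD_MAIN="etcdctl ${ETCD_CERTS} --endpoints=https://127.0.0.1:4001"
ETCD_EVENTS="etcdctl ${ETCD_CERTS} --endpoints=https://127.0.0.1:4002"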

If etcdctl is not already present on the Host OS, you can download it from here (grab the same version that you’re running): https://github.com/coreos/etcd/releases

Validate that you can communicate with both endpoints:
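One way to do this, using the variables defined above, is to run a health check against each cluster:

➜ ${ETCD_MAIN} endpoint health
➜ ${ETCD_EVENTS} endpoint health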

Add the two new members to both the ‘etcd-main’ and ‘etcd-events’ clusters:

➜ ${ETCD_MAIN} member add etcd-c1 --peer-urls="https://etcd-c1.internal.test.k8s.appvia.io:2380"
Member 356cb0ee3ba9d1f6 added to cluster 5024708754869ab3
➜ ${ETCD_MAIN} member add etcd-c2 --peer-urls="https://etcd-c2.internal.test.k8s.appvia.io:2380"
Member 4e65c79f2332a2ae added to cluster 5024708754869ab3
➜ ${ETCD_EVENTS} member add etcd-events-c1 --peer-urls="https://etcd-events-c1.internal.test.k8s.appvia.io:2381"
Member fa29d33b3e164931 added to cluster df1655d3f512ef29
➜ ${ETCD_EVENTS} member add etcd-events-c2 --peer-urls="https://etcd-events-c2.internal.test.k8s.appvia.io:2381"
Member de67c45c59dc7476 added to cluster df1655d3f512ef29

Now that the new members have been added, they should initially be listed as unstarted:
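Listing the members of each cluster (again with the variables from earlier) should show the new c1/c2 entries with an unstarted status until the new instances have joined:

➜ ${ETCD_MAIN} member list
➜ ${ETCD_EVENTS} member list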

At this point, the etcd Clusters have been configured to register the new members once they become available on their respective addresses. The newly provisioned Master instances now require amendments to their etcd configuration to be able to join the existing Clusters.

Next, log in via SSH to the new Master instances and perform the following commands:
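The exact steps depend on how your Masters are configured; the rough idea, assuming protokube writes the etcd manifests to /etc/kubernetes/manifests (verify the paths and settings on your own hosts before changing anything), is:

# Stop protokube so it doesn't overwrite the manual changes below
systemctl stop protokube

# In the static pod manifests for both etcd clusters, e.g.
#   /etc/kubernetes/manifests/etcd.manifest
#   /etc/kubernetes/manifests/etcd-events.manifest
# change the etcd configuration so this member joins the existing cluster
# rather than bootstrapping a new one:
#   ETCD_INITIAL_CLUSTER_STATE=existing
#   ETCD_INITIAL_CLUSTER=<all members: the existing a/b members plus this new c member>
# If etcd has already started and bootstrapped its own single-member cluster,
# clear out its data directory on the new EBS volume before it restarts;
# kubelet will then restart the etcd pods with the updated configuration.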

Validate that the members have started successfully and joined the cluster by running a ‘member list’ command on any of the master nodes that you are connected to:

Once the etcd clusters have recovered as above, you can start the protokube service back up on the Master instances within the new Availability Zone: systemctl start protokube

Your cluster should eventually validate successfully via kops after a few minutes once the new Master instances have caught up:

➜ kops validate cluster
Using cluster from kubectl context: test.k8s.appvia.io
Validating cluster test.k8s.appvia.io
INSTANCE GROUPS
NAME ROLE MACHINETYPE MIN MAX SUBNETS
eu-west-2a-master1 Master t2.medium 1 1 eu-west-2a
eu-west-2a-master2 Master t2.medium 1 1 eu-west-2a
eu-west-2a-master3 Master t2.medium 1 1 eu-west-2a
eu-west-2b-master1 Master t2.medium 1 1 eu-west-2b
eu-west-2b-master2 Master t2.medium 1 1 eu-west-2b
eu-west-2c-master1 Master t2.medium 1 1 eu-west-2c
eu-west-2c-master2 Master t2.medium 1 1 eu-west-2c
nodes Node t2.medium 2 2 eu-west-2a,eu-west-2b
NODE STATUS
NAME ROLE READY
ip-10-100-0-13.eu-west-2.compute.internal master True
ip-10-100-0-170.eu-west-2.compute.internal node True
ip-10-100-0-21.eu-west-2.compute.internal master True
ip-10-100-0-56.eu-west-2.compute.internal master True
ip-10-100-1-170.eu-west-2.compute.internal master True
ip-10-100-1-23.eu-west-2.compute.internal node True
ip-10-100-1-238.eu-west-2.compute.internal master True
ip-10-100-2-118.eu-west-2.compute.internal master True
ip-10-100-2-140.eu-west-2.compute.internal master True
Your cluster test.k8s.appvia.io is ready

Finally, you can now configure your node pools to make use of the new subnet. Run kops edit ig <nodes-ig> and add the new Availability Zone in:
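For the ‘nodes’ instance group in this example, the subnets list simply gains the new zone:

  subnets:
  - eu-west-2a
  - eu-west-2b
  - eu-west-2c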

One more kops update cluster test.k8s.appvia.io --yes and you’re good to go!
