Building Availability Zones on VMware with Kubernetes

Abhishek Mitra
Jan 13 · 10 min read
High Availability
Kubernetes on vSphere

The term availability zones normally refers to a high-availability offering from public clouds such as Azure and Amazon Web Services. Simply put, they are unique physical locations within a region that are self-sufficient: independent datacentres with their own power supply, cooling, and networking.

Kubernetes supports the concept of availability zones as part of its multi-zone support (related to the broader Cluster Federation / "Ubernetes" effort). This allows a Kubernetes cluster to tolerate the loss of one or more zones.

A point to note: this also implies that control-plane traffic may cross availability zones, so routing and latency need to be factored in at design time.


In the following sections we will walk through the elements needed to build availability zones for Kubernetes clusters on VMware.

Kubernetes cluster stretched across two vSphere datacentres

Now, there are a lot of enterprise solutions available to set up a VMware datacenter: HyperFlex, Nutanix, VMware's own solutions, and FlexPod, to name a few.

Irrespective of the networking and setup complexities and nuances (and, coming from a datacenter and networking background myself, I am by no means discounting the extensive complexity there), the end goal is to have a datacenter infrastructure similar to what public clouds provide as an offering, i.e. hiding all the complexities and abstracting only what end users need, as depicted in the diagram above.

As per the sample diagram above, we want to build a Kubernetes infrastructure spread across VMware vCenters managing multiple datacenters, and then deploy our applications (StatefulSets, Deployments, individual Pods, PVCs) to whichever zones our applications require. However, the Kubernetes engine does not understand what a datacenter, host cluster, host, or resource pool is. This is where the cloud provider integration steps in.

The concept of distributing pods/applications across datacenters should sound familiar: labels and node selectors coupled with affinity and anti-affinity rules. Yes, and zoning is a broader term for the same idea!

When we use zoning together with cloud provider support (as with AWS/GCE/Azure), in this case VMware, Kubernetes labels the nodes according to the tags and tag categories we create on vCenter. This gives us better control over our applications and deployments than adding node selectors to individual charts/YAMLs. Let's see how in the next section.
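For contrast, the manual alternative would be to label nodes by hand and pin each workload with a node selector, roughly as sketched below (the dc label key and value are hypothetical, not from this setup); zone support removes this per-manifest bookkeeping:

# Label a node by hand...
kubectl label node kube-worker-1 dc=sjc-dc01

# ...and pin every workload to it explicitly.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pinned-nginx
spec:
  nodeSelector:
    dc: sjc-dc01        # has to be kept in sync by hand in every chart/YAML
  containers:
  - name: nginx
    image: nginx:alpine
EOF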


Let's start by building a Kubernetes cluster spread out as follows.

We will not be focusing on how to create a Kubernetes cluster, as there are many tools available. You can refer to my article on Cluster API or the tool I frequently use, blackbolt.

Cluster Api https://medium.com/@abhishek.amjeet/clusterapi-for-kubernetes-a-detailed-look-on-vmware-cluster-api-2ddd541bafa9

BlackBolt https://github.com/theabmitra/blackbolt

vcenter01.example.com
|
|-- SJC19 (tags: k8s-region=vc1-region)
    |
    |-- Cluster-1 (tags: k8s-zone=zone-a)
    |     |-- Host-1
    |           |-- kube-master-1
    |-- Cluster-2 (tags: k8s-zone=zone-b)
    |     |-- Host-2
    |     |     |-- kube-worker-1
    |     |-- Host-3
    |           |-- kube-worker-2
    |-- datastore-nimble (shared datastore in vcenter01, shared across the 2 clusters)

vcenter02.example.com
|
|-- SJC19 (tags: k8s-region=vc1-region)
    |
    |-- Cluster-1 (tags: k8s-zone=zone-c)
    |     |-- Host-4
    |           |-- kube-worker-3
    |     |-- Host-5
    |           |-- kube-worker-4
    |-- nfs-datastore-0 (shared datastore in vcenter02)

In the interest of time, I have already prepared a cluster with the above distribution, as shown below:

# Master: part of vcenter01 / Cluster-1, sitting on datastore datastore-nimble
kube-master-1   Ready   master   18h   v1.17.0
# kube-worker-1 and 2: part of vcenter01 / Cluster-2, sitting on datastore datastore-nimble
kube-worker-1   Ready   <none>   18h   v1.17.0
kube-worker-2   Ready   <none>   18h   v1.17.0
# kube-worker-3 and 4: part of vcenter02 / Cluster-1, sitting on datastore nfs-datastore-0
kube-worker-3   Ready   <none>   18h   v1.17.0
kube-worker-4   Ready   <none>   18h   v1.17.0

Note: For VMware, make sure the disk.EnableUUID parameter is set to TRUE on each node VM.
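One way to set this, if the govc CLI is available, is via each VM's extra config. The sketch below assumes vcenter01's credentials from the configuration shown later and hypothetical VM inventory paths; the VMs typically need a power cycle for the change to take effect.

export GOVC_URL='https://10.10.1.1' GOVC_USERNAME='administrator@vsphere.local' GOVC_PASSWORD='password1' GOVC_INSECURE=1
# disk.enableUUID corresponds to the disk.EnableUUID setting seen in the vSphere UI
govc vm.change -vm /DC01/vm/kube-master-1 -e disk.enableUUID=TRUE
govc vm.change -vm /DC02/vm/kube-worker-1 -e disk.enableUUID=TRUE
govc vm.change -vm /DC02/vm/kube-worker-2 -e disk.enableUUID=TRUE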

You can use any CNI; in my case I have used Calico. The choice of CNI is not a concern here, since the focus is on the zone distribution support in Kubernetes.

Objects to create on vSphere:

The next thing to do is to create and assign tags and tag categories on vCenter (a govc sketch of the same steps follows the list below):

  1. Create two tag categories, say k8s-zone and k8s-region
  2. Create zone tags, say zone-a and zone-b in the k8s-zone tag category on vcenter01.
  3. Create a zone tag, say zone-c, in the k8s-zone tag category on vcenter02.
  4. Create a region tag, say vc1-region, in the k8s-region tag category (on both vCenters).
  5. Apply the "vc1-region" tag to the datacenters in both vCenters: vcenter01 and vcenter02.
  6. Apply the “zone-a” tag to Cluster-1 on vcenter01.
  7. Apply the “zone-b” tag to Cluster-2 on vcenter01.
  8. Apply the “zone-c” tag to Cluster-1 on vcenter02.
Tag categories created on vSphere
Tags created on vSphere
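If you prefer the CLI over the vSphere UI, the same tagging can be scripted with govc roughly as follows. This is a sketch: credentials follow the vsphere.conf shown later, and the inventory paths assume Cluster-1/Cluster-2 live in DC01/DC02 on vcenter01 and Cluster-1 in DC03 on vcenter02.

# vcenter01
export GOVC_URL='https://10.10.1.1' GOVC_USERNAME='administrator@vsphere.local' GOVC_PASSWORD='password1' GOVC_INSECURE=1
govc tags.category.create k8s-region
govc tags.category.create k8s-zone
govc tags.create -c k8s-region vc1-region
govc tags.create -c k8s-zone zone-a
govc tags.create -c k8s-zone zone-b
govc tags.attach vc1-region /DC01                 # datacenter objects
govc tags.attach vc1-region /DC02
govc tags.attach zone-a /DC01/host/Cluster-1      # host cluster objects
govc tags.attach zone-b /DC02/host/Cluster-2

# vcenter02
export GOVC_URL='https://10.10.1.2' GOVC_USERNAME='administrator@vsphere.local' GOVC_PASSWORD='password2' GOVC_INSECURE=1
govc tags.category.create k8s-region
govc tags.category.create k8s-zone
govc tags.create -c k8s-region vc1-region
govc tags.create -c k8s-zone zone-c
govc tags.attach vc1-region /DC03
govc tags.attach zone-c /DC03/host/Cluster-1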

An example of how to check whether the tags are applied to the clusters:

Cluster tagged as “zone-c” with category “k8s-zone”

Configurations to be done on Master and Workers:

The next thing we need to do is register the cloud provider with the Kubernetes cluster. In this case we have multiple vCenters.

The following configuration file needs to be placed on both the master and the workers:

/etc/kubernetes/pki/vsphere.conf

For the master the configuration looks something like this. Note that the master is in zone-a (cluster Cluster-1) and region vc1-region on vcenter01.

[Global]
port = "443"
insecure-flag = "1"
datacenters = "DC01,DC02,DC03"
[VirtualCenter "10.10.1.1"]
user = "administrator@vsphere.local"
password = "password1"
datacenters = "DC01,DC02"
[VirtualCenter "10.10.1.2"]
user = "administrator@vsphere.local"
password = "password2"
datacenters = "DC03"
[Workspace] #This is the workspace from the master's POV
server = "10.10.1.1"
datacenter = "DC01"
default-datastore = "datastore-nimble"
resourcepool-path = "Cluster1/Resources" #Cluster where master resides
folder = "<FolderNameWhereMasterResides>"
[Labels]
region = "k8s-region"
zone = "k8s-zone"
[Disk]
scsicontrollertype = pvscsi

For the workers in zone-b the configuration looks something like this. Note that these workers are in zone-b (cluster Cluster-2) and region vc1-region on vcenter01.

[Global]
port = "443"
insecure-flag = "1"
datacenters = "DC01,DC02,DC03"
[VirtualCenter "10.10.1.1"]
user = "administrator@vsphere.local"
password = "password1"
datacenters = "DC01,DC02"
[VirtualCenter "10.10.1.2"]
user = "administrator@vsphere.local"
password = "password2"
datacenters = "DC03"
[Workspace] #This is the workspace from the zone-b workers' POV
server = "10.10.1.1"
datacenter = "DC02"
default-datastore = "datastore-nimble"
resourcepool-path = "Cluster2/Resources" #Cluster where set of workers in zone-b resides
folder = "<FolderNameWhereWorkersOfZone-bResides>"
[Labels]
region = "k8s-region"
zone = "k8s-zone"
[Disk]
scsicontrollertype = pvscsi

For the workers in zone-c the configuration looks something like this. Note that these workers are in zone-c (cluster Cluster-1) and region vc1-region on vcenter02.

[Global]
port = "443"
insecure-flag = "1"
datacenters = "DC01,DC02,DC03"
[VirtualCenter "10.10.1.1"]
user = "administrator@vsphere.local"
password = "password1"
datacenters = "DC01,DC02"
[VirtualCenter "10.10.1.2"]
user = "administrator@vsphere.local"
password = "password2"
datacenters = "DC03"
[Workspace] #This is the workspace from the zone-c workers' POV
server = "10.10.1.2"
datacenter = "DC03"
default-datastore = "nfs-datastore"
resourcepool-path = "Cluster1/Resources" #Cluster where set of workers in zone-c resides
folder = "<FolderNameWhereWorkersOfZone-cResides>"
[Labels]
region = "k8s-region"
zone = "k8s-zone"
[Disk]
scsicontrollertype = pvscsi

The following changes need to be made to register the Kubernetes cluster with both vSphere environments.

On Master :-

Add the following flags to the kubelet service configuration (/etc/systemd/system/kubelet.service.d/10-kubeadm.conf, if using kubeadm), as well as to the controller-manager and API server static pod manifests on the master node (usually under /etc/kubernetes/manifests), then reload and restart the kubelet:

--cloud-provider=vsphere
--cloud-config=/etc/kubernetes/pki/vsphere.conf
systemctl daemon-reload && systemctl restart kubelet
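To make this concrete, here is a hedged sketch of where the flags end up. File names and layout vary by kubeadm version; the drop-in file name below is hypothetical, and the conf file is readable inside the control plane pods in typical kubeadm layouts because /etc/kubernetes/pki is already mounted into them.

# (a) kubelet: an extra-args drop-in, e.g. /etc/systemd/system/kubelet.service.d/20-cloud-provider.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--cloud-provider=vsphere --cloud-config=/etc/kubernetes/pki/vsphere.conf"

# (b) control plane: excerpt of /etc/kubernetes/manifests/kube-controller-manager.yaml
#     (the same two flags also go into kube-apiserver.yaml)
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    # ...existing flags...
    - --cloud-provider=vsphere
    - --cloud-config=/etc/kubernetes/pki/vsphere.conf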

On Worker :-

The same flags need to be added on the workers as well (kubelet only), followed by a daemon-reload and restart of the kubelet.

If everything has been configured correctly, you should see the VMware provider ID of each node, along with its region and zone labels, on the Kubernetes cluster:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\tregion: "}{.metadata.labels.failure-domain\.beta\.kubernetes\.io/region}{"\tzone: "}{.metadata.labels.failure-domain\.beta\.kubernetes\.io/zone}{"\tproviderId: "}{.spec.providerID}{"\tSystemUUID: "}{.status.nodeInfo.systemUUID}{"\n"}{end}'
Zone Distribution
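A quicker way to eyeball the same information is kubectl's label-column flag (this assumes the cluster still uses the failure-domain.beta.kubernetes.io labels, as in the jsonpath query above):

kubectl get nodes \
  -L failure-domain.beta.kubernetes.io/region \
  -L failure-domain.beta.kubernetes.io/zone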

Zone Distribution in Action:

  1. We will now demonstrate dynamic volume provisioning in different zones by creating zone-restricted storage classes:
#### Storage class for zone-b
cat <<EOF | kubectl apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: sc-zone-b
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - zone-b
EOF
#### Storage class for zone-c
cat <<EOF | kubectl apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: sc-zone-c
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - zone-c
EOF
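The zoned storage classes are not tied to StatefulSets; for example, a minimal standalone PVC against sc-zone-c (the claim name and size here are made up for illustration) would also get its volume provisioned in zone-c:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: standalone-claim-zone-c
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: sc-zone-c
  resources:
    requests:
      storage: 5Gi
EOF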

2. Create a StatefulSet with volume claim templates pointing to the zone you want. The example below will create the StatefulSet, and it will also create the PersistentVolumeClaims and PersistentVolumes using the storage class mentioned. (A matching headless Service is sketched after the manifest.)

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-zone-b
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 3 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "sc-zone-b"
      resources:
        requests:
          storage: 10Gi
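Note that the StatefulSet references serviceName: "nginx", so it expects a governing headless Service to exist; the original setup does not show one, but a minimal sketch would be:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  clusterIP: None   # headless Service for the StatefulSet
  selector:
    app: nginx
  ports:
  - port: 80
    name: web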

3. Since we specified sc-zone-b in the above YAML, the PVCs (PersistentVolumeClaims) and PVs (PersistentVolumes) will be created only in zone-b, i.e. on vcenter01:

NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
www-web-zone-b-0   Bound    pvc-d0de26c4-30dc-4c53-b72d-fba94e635f71   10Gi       RWO            sc-zone-b      18h
www-web-zone-b-1   Bound    pvc-aa9bbcd4-e45e-4add-a08d-7d6aa17cfe12   10Gi       RWO            sc-zone-b      18h
www-web-zone-b-2   Bound    pvc-482fcd4c-88a8-4900-ae13-78d7fb2d5cbb   10Gi       RWO            sc-zone-b      18h

4. If we describe the PV now, it reveals the exact path of the volume:

Name:              pvc-d0de26c4-30dc-4c53-b72d-fba94e635f71
Labels:            failure-domain.beta.kubernetes.io/region=vc1-region
                   failure-domain.beta.kubernetes.io/zone=zone-a__zone-b
Annotations:       kubernetes.io/createdby: vsphere-volume-dynamic-provisioner
                   pv.kubernetes.io/bound-by-controller: yes
                   pv.kubernetes.io/provisioned-by: kubernetes.io/vsphere-volume
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      sc-zone-b
Status:            Bound
Claim:             default/www-web-zone-b-0
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          10Gi
Node Affinity:
  Required Terms:
    Term 0:        failure-domain.beta.kubernetes.io/region in [vc1-region]
                   failure-domain.beta.kubernetes.io/zone in [zone-a, zone-b]
Message:
Source:
    Type:               vSphereVolume (a Persistent Disk resource in vSphere)
    VolumePath:         [datastore-nimble] kubevols/kubernetes-dynamic-pvc-d0de26c4-30dc-4c53-b72d-fba94e635f71.vmdk
    FSType:             ext4
    StoragePolicyName:
Events:            <none>

Notice that the zone label in the output reads zone-a__zone-b. That's because zone-a and zone-b share the same datastore, datastore-nimble, so the volume is visible from both zones.
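To check the zone placement of all PVs at a glance, a quick jsonpath query over the same labels works (a small helper, not part of the original walkthrough):

kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\tzone: "}{.metadata.labels.failure-domain\.beta\.kubernetes\.io/zone}{"\n"}{end}'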

5. This also implies that the Pod locations will be confined to zone-b only:

root@kube-master-1:~/k8-zone-vmware# kubectl get pod -o wide | grep web-zone-b
web-zone-b-0   1/1   Running   0   19h   192.168.180.1   kube-worker-1   <none>   <none>
web-zone-b-1   1/1   Running   0   19h   192.168.127.2   kube-worker-2   <none>   <none>
web-zone-b-2   1/1   Running   0   19h   192.168.180.2   kube-worker-1   <none>   <none>

Note that the Pods are only distributed between kube-worker-1 and kube-worker-2, which are in zone-b. We didn't need to explicitly use node selectors here!

6. What about zone-c? What happens when we repeat the same steps as above for zone-c?

NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
www-web-zone-c-0   Bound    pvc-82c27d60-a35f-4256-99c8-f9390062b7dc   10Gi       RWO            sc-zone-c      19h
www-web-zone-c-1   Bound    pvc-dd90bdeb-1d30-4b12-aad7-c13bb48ed145   10Gi       RWO            sc-zone-c      19h
www-web-zone-c-2   Bound    pvc-2c1bd6d4-3a3d-40bd-8765-3b0f81186785   10Gi       RWO            sc-zone-c      19h

7. The PV info confirms this. The missing label information is currently a bug in the vSphere cloud provider integration with Kubernetes and should be fixed upstream:

Name:              pvc-82c27d60-a35f-4256-99c8-f9390062b7dc
Labels:            <none>
Annotations:       kubernetes.io/createdby: vsphere-volume-dynamic-provisioner
                   pv.kubernetes.io/bound-by-controller: yes
                   pv.kubernetes.io/provisioned-by: kubernetes.io/vsphere-volume
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      sc-zone-c
Status:            Bound
Claim:             default/www-web-zone-c-0
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          10Gi
Node Affinity:     <none>
Message:
Source:
    Type:               vSphereVolume (a Persistent Disk resource in vSphere)
    VolumePath:         [nfs-datastore-0] kubevols/kubernetes-dynamic-pvc-82c27d60-a35f-4256-99c8-f9390062b7dc.vmdk
    FSType:             ext4
    StoragePolicyName:
Events:            <none>

8. The StatefulSet for zone-c creates all its Pods in zone-c only, i.e. on kube-worker-3 and kube-worker-4:

web-zone-c-0   1/1   Running   0   19h   192.168.20.129   kube-worker-3   <none>   <none>
web-zone-c-1   1/1   Running   0   19h   192.168.37.193   kube-worker-4   <none>   <none>
web-zone-c-2   1/1   Running   0   19h   192.168.20.130   kube-worker-3   <none>   <none>

This shows that we are able to deploy applications to different availability zones on VMware using the cloud provider integration. Such a setup can also tolerate the loss of one or more AZs.

Pros:-

  1. The zone configuration is done on vSphere itself, which makes it easier to manage.
  2. A good way to segregate vSphere datacenters, clusters, and datastores into Kubernetes AZs!

Cons:-

  1. The vSphere cloud provider integration still needs improvement, as it has some bugs (as indicated earlier).

References:-

  1. https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/zones.html
  2. https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html#systemd-services
  3. https://kubernetes.io/docs/setup/best-practices/multiple-zones/
