AKS Business Continuity Best Practices

Anoop Srivastava
Sep 12, 2022 · 15 min read


Overview

As you manage clusters in Azure Kubernetes Service (AKS), application uptime becomes important. By default, AKS provides high availability by using multiple nodes in a Virtual Machine Scale Set (VMSS). But these multiple nodes don’t protect your system from a region failure. To maximize your uptime, plan ahead to maintain business continuity and prepare for disaster recovery.

Recommended Strategy:

· Plan for AKS clusters in multiple regions.

· Route traffic across multiple clusters by using Azure Front Door.

· Use geo-replication for your container image registries (a brief sketch follows this list).

· Plan for application state across multiple clusters.

· Replicate storage across multiple regions.
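For the registry item above, geo-replication keeps image pulls local to each cluster and removes the registry as a single-region dependency. A minimal sketch with Azure Container Registry, assuming a hypothetical registry name and resource group; geo-replication requires the Premium SKU:

# Geo-replication requires the Premium SKU
$ az acr create -g <resource-group> -n <registry-name> --sku Premium -l eastus2

# Add a replica in the paired region so each cluster pulls images locally
$ az acr replication create -r <registry-name> -l westus2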

When you deploy multiple AKS clusters, choose regions where AKS is available. Use paired regions. An AKS cluster is deployed into a single region. To protect your system from region failure, deploy your application into multiple AKS clusters across different regions. When planning where to deploy your AKS cluster, consider:

AKS region availability

· Choose regions close to your users. AKS continually expands into new regions.

Azure paired regions

· For your geographic area, choose two regions paired together.

· AKS platform updates (planned maintenance) are serialized with a delay of at least 24 hours between paired regions.

· Recovery efforts for paired regions are prioritized where needed.

Service availability

· Decide whether your paired regions should be hot/hot, hot/warm, or hot/cold.

· Do you want both regions serving traffic at the same time (hot/hot), or one region deployed and ready to start serving traffic immediately (hot/warm)? Or,

· Do you want to give the secondary region time to get ready to serve traffic (hot/cold)?

· AKS region availability and paired regions are a joint consideration. Deploy your AKS clusters into paired regions designed to manage region disaster recovery together. For example, AKS is available in East US and West US. These regions are paired. Choose these two regions when you’re creating an AKS BC/DR strategy.
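Before settling on a pair, it's worth confirming that AKS (and the Kubernetes version you want) is offered in both regions. A quick check with the Azure CLI, using the region pair from this walkthrough:

# List the regions available to your subscription
$ az account list-locations -o table

# Confirm the AKS versions offered in each candidate region
$ az aks get-versions --location eastus2 -o table
$ az aks get-versions --location westus2 -o table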

When you deploy your application, add another step to your CI/CD pipeline to deploy to these multiple AKS clusters. Updating your deployment pipelines prevents applications from deploying into only one of your regions and AKS clusters. In that scenario, customer traffic directed to a secondary region won’t receive the latest code updates.
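The exact pipeline step depends on your CI/CD tooling, but conceptually it is just a fan-out of the same manifests to every cluster. A minimal shell sketch, assuming both kubeconfig contexts already exist and a hypothetical manifests/ directory:

# Hypothetical deploy step: apply the same manifests to every regional cluster
for CTX in aks-eastus2-cluster aks-westus2-cluster; do
  echo "Deploying to ${CTX}..."
  kubectl --context "${CTX}" apply -f manifests/
done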

Prerequisites:

· An Azure subscription (Already provided as part of this workshop)

· Visual Studio Code or Visual Studio Enterprise installed on your local machine, macOS or Windows (Installation Link)

· Docker CLI (Installation Link; Docker Desktop is optional)

· Azure CLI (Installation Link)

Route Traffic with Azure Traffic Manager

Route Traffic with Azure Front Door

Route Traffic with Azure Traffic Manager

As you can tell, using Azure Front Door has already streamlined the architecture. A typical pattern would place an Azure Application Gateway deployment in each region, downstream from a global DNS traffic service such as Azure Traffic Manager. Remember that Azure Application Gateway is a regional service, so you would have to deploy it once per region hosting your application. By using Azure Front Door, you can eliminate Azure Traffic Manager for global DNS-based traffic routing and push the functionality provided by Azure Application Gateway upstream, since Front Door provides WAF, layer 7 path/URL routing, and session affinity configuration.

Route Traffic with Azure Front Door

1. Deploy an AKS Cluster in two regions:

The first thing we'll do is create two service principals, one for each AKS cluster.

$ az ad sp create-for-rbac -n aks-eastus2-cluster

$ az ad sp create-for-rbac -n aks-westus2-cluster

Take note of the output for each service principal created. We will be using both the appId and password properties in later commands. You should see output similar to the following for each command.

{
  "appId": "8cea7e76XXXXXXXXXXXXXXXXXXXXXXX",
  "displayName": "demo-XXXXXXXXXXXXXXXXXXXX",
  "name": "http://demo-XXXXXXXXXXXXXX",
  "password": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "tenant": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}

{
  "appId": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "displayName": "demo-aks-westus2-cluster",
  "name": "http://demo-XXXXXXX",
  "password": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "tenant": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
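If you'd rather script this than copy values out of the JSON by hand, the same creation command can feed appId and password straight into shell variables. A small alternative sketch using the CLI's --query support (shown for the east cluster; the west cluster works the same way):

# Capture the east cluster service principal credentials in one shot
$ read -r EAST_SP_APPID EAST_SP_PASSWORD <<< \
    "$(az ad sp create-for-rbac -n aks-eastus2-cluster --query '[appId,password]' -o tsv)"

# Reuse these later for the role assignment and the az aks create parameters
$ echo $EAST_SP_APPID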

2. Create two resource groups, one for each Azure region.

$ az group create -l eastus2 -n aks-eastus2-cluster
$ az group create -l westus2 -n aks-westus2-cluster

3. We will now create the Azure Virtual Networks for each Azure Region

For the Azure East US 2 Region

$ az network vnet create \
    -g aks-eastus2-cluster \
    -n aks-eastus2-cluster-vnet \
    --address-prefixes 10.50.0.0/16 \
    --subnet-name aks-eastus2-cluster-aks-subnet \
    --subnet-prefix 10.50.XXX/24

We're going to create two additional subnets in each VNet. I like to create a subnet specific to the AKS service range; that way, if I try to use that range somewhere else in Azure, Azure will tell me it is already in use, so I won't have any conflict with other devices. We are also going to create an Azure Firewall subnet. This will be used later to show an additional configuration that uses the Azure Firewall's public IP to NAT back to an AKS service exposed through an internal load balancer.

4. Create the Additional Subnet for the AKS Service Range

$ az network vnet subnet create \
    -g aks-eastus2-cluster \
    --vnet-name aks-eastus2-cluster-vnet \
    --name aks-eastus2-cluster-vnet-akssvc-subnet \
    --address-prefix 10.50.XXX/24

5. Create the Additional Subnet for the Azure Firewall

$ az network vnet subnet create \
    -g aks-eastus2-cluster \
    --vnet-name aks-eastus2-cluster-vnet \
    --name AzureFirewallSubnet \
    --address-prefix 10.50.XXX/24

Before we can deploy the AKS cluster, we need to assign the Contributor role on the Azure VNet we created to the service principal. Make sure you are using the appId from the service principal as the assignee parameter when assigning the role.

$ VNETID=$(az network vnet show -g aks-eastus2-cluster --name aks-eastus2-cluster-vnet --query id -o tsv)
$ az role assignment create --assignee 8cXXXXXXXXXXXXXXXXXXXXXX --scope $VNETID --role Contributor

We also need to identify the specific subnet in the VNet where the AKS cluster will be deployed.

$ SUBNET_ID=$(az network vnet subnet show --resource-group aks-eastus2-cluster --vnet-name aks-eastus2-cluster-vnet --name aks-eastus2-cluster-aks-subnet --query id -o tsv)

6. Deploy the AKS cluster in the Azure East US 2 region

$ az aks create \
    --resource-group aks-eastus2-cluster \
    --name aks-eastus2-cluster \
    --kubernetes-version 1.22.11 \
    --node-count 1 \
    --node-vm-size Standard_B2s \
    --generate-ssh-keys \
    --network-plugin azure \
    --network-policy azure \
    --service-cidr 10.50.XXX/24 \
    --dns-service-ip 10.50.XXX \
    --docker-bridge-address 172.XXX/16 \
    --vnet-subnet-id $SUBNET_ID \
    --service-principal <AppID of east region service principal> \
    --client-secret <Password secret of east region service principal> \
    --no-wait

7. We'll build the exact same AKS cluster in the Azure West US 2 region. We already have the resource group and service principal created.

# Create the WestUS 2 AKS Cluster VNet
$ az network vnet create \
    -g aks-westus2-cluster \
    -n aks-westus2-cluster-vnet \
    --address-prefixes 10.XXXX/16 \
    --subnet-name aks-westus2-cluster-aks-subnet \
    --subnet-prefix 10.XXXX/24

# Create the additional VNet subnets for the AKS service range and Azure Firewall
$ az network vnet subnet create \
    -g aks-westus2-cluster \
    --vnet-name aks-westus2-cluster-vnet \
    --name aks-westus2-cluster-vnet-akssvc-subnet \
    --address-prefix 10.XXXX/24

$ az network vnet subnet create \
    -g aks-westus2-cluster \
    --vnet-name aks-westus2-cluster-vnet \
    --name AzureFirewallSubnet \
    --address-prefix 10.XXXX/24

# Assign the West US 2 service principal the Contributor role on the AKS VNet
$ VNETID=$(az network vnet show -g aks-westus2-cluster --name aks-westus2-cluster-vnet --query id -o tsv)

$ az role assignment create --assignee ec1XXXXXXXXXXXXXXXXXXXXX --scope $VNETID --role Contributor

# Identify the AKS cluster subnet for deployment
$ SUBNET_ID=$(az network vnet subnet show --resource-group aks-westus2-cluster --vnet-name aks-westus2-cluster-vnet --name aks-westus2-cluster-aks-subnet --query id -o tsv)

# Deploy the West US 2 AKS cluster
$ az aks create \
    --resource-group aks-westus2-cluster \
    --name aks-westus2-cluster \
    --kubernetes-version 1.22.11 \
    --node-count 1 \
    --node-vm-size Standard_B2s \
    --generate-ssh-keys \
    --network-plugin azure \
    --network-policy azure \
    --service-cidr 10.XXXX/24 \
    --dns-service-ip 10.XXXX \
    --docker-bridge-address 172.XXXX/16 \
    --vnet-subnet-id $SUBNET_ID \
    --service-principal <AppID of west region service principal> \
    --client-secret <Password secret of west region service principal> \
    --no-wait

We now have two AKS clusters up and running in both the East US 2 and West US 2 Azure Regions. You can verify the AKS clusters are ready by getting the credentials and ensuring the nodes are in a ready status.

$ az aks get-credentials -n aks-eastus2-cluster -g aks-eastus2-cluster
$ kubectl get nodes
$ az aks get-credentials -n aks-westus2-cluster -g aks-westus2-cluster
$ kubectl get nodes

8. Deploy the Sample application on both AKS clusters in the two regions

$ kubectl config use-context aks-eastus2-cluster

$ kubectl create -f https://github.com/anoopsr1/AKS-with-AFD/blob/main/azure-frontdoor-eastus2-elb-app.yaml.txt

9. Verify the deployed app has received a public IP and then browse to that endpoint.

$ kubectl get svc

Get the External IP of the Service
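If you prefer to stay on the command line, you can capture the external IP and test the endpoint directly. A small sketch; replace <service-name> with the name shown by kubectl get svc:

# Grab the public IP assigned to the demo service
$ EASTUS2_SVC_IP=$(kubectl get svc <service-name> -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Confirm the app responds on that endpoint
$ curl http://$EASTUS2_SVC_IP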

10. Repeat the same deployment for the West US 2 AKS cluster and verify you can browse to the endpoint.

$ kubectl config use-context aks-westus2-cluster
$ kubectl create -f https://github.com/anoopsr1/AKS-with-AFD/blob/main/azure-frontdoor-westus2-elb-app.yaml.txt
$ kubectl get svc

Get the External IP of the Service

11. Deploy the Azure Front Door Service

Now we'll tie it all together and use Azure Front Door as the global endpoint for the two AKS services running in separate Azure regions (East US 2 and West US 2).

Create a resource group for the Azure Front Door instance. Azure Front Door is a global service, just like Azure Traffic Manager; we only associate it with a region for the ARM deployment.

$ az group create -l eastus2 -n aks-global

If this is your first time using Azure Front Door from the CLI, you will most likely need to install the front-door Azure CLI extension, even when using Azure Cloud Shell.

$ az extension add --name front-door

Azure Front Door has a lot of configuration options for the backend pools and routing rules. Since we're just hosting a simple web application, we'll keep this deployment configuration simple. I'll create additional posts to show off some of the URL rewrite features and how you can take advantage of them with AKS for building out microservices routing.

Now we'll deploy Azure Front Door with the initial backend set to the AKS service located in the East US 2 Azure datacenter.

$ az network front-door create \
    -n demoaks \
    -g aks-global \
    --backend-address 40.XXXXXXX \
    --backend-host-header 40.XXXXXXX \
    --protocol Http \
    --forwarding-protocol HttpOnly

Add the AKS service endpoint from the West US 2 Azure datacenter to the Azure Front Door backend pool.

$ az network front-door backend-pool backend add \
    --resource-group aks-global \
    --front-door-name demoaks \
    --pool-name DefaultBackendPool \
    --address 52.183.68.177 \
    --backend-host-header 52.XXXXXXXXX

You can now check that your backends have been configured

$ az network front-door backend-pool list --front-door-name demoaks --resource-group aks-global --query '[].backends' -o json

Before we call this complete, there's one last thing we need to do from a hardening perspective to ensure all internet traffic is funneled to our AKS service endpoints through Azure Front Door. Right now the configuration still allows someone to access the AKS service endpoints by going directly to their external public IPs, essentially bypassing any WAF or other configuration we may have set up in Azure Front Door. To ensure that the AKS service endpoints only receive traffic from Azure Front Door, we need to modify the default NSGs created for those services to only accept traffic from the Azure Front Door service.

Locate the NSG of the AKS service. It will be in the MC_* resource group for the AKS cluster. You will know you're working on the correct NSG rule because the destination IP address should match the external IP address of the AKS service.

Change the Source option to Service Tag, enter the Azure Front Door backend service tag (AzureFrontDoor.Backend) in the Source service tag property, and save the configuration.

12. Repeat those steps for every AKS cluster service you have configured in the backend pool. You should no longer be able to reach the AKS service endpoints directly; they should only be reachable through Azure Front Door.
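If you'd rather script the NSG change than click through the portal, the same restriction can be applied with the Azure CLI. A hedged sketch, where the resource group, NSG, and rule names are placeholders you'd read from the cluster's MC_* resource group:

# Find the NSG and the inbound rule whose destination matches the AKS service's external IP
$ az network nsg list -g <MC_resource_group> -o table
$ az network nsg rule list -g <MC_resource_group> --nsg-name <nsg-name> -o table

# Restrict that rule so only the Azure Front Door backend ranges can reach the service
$ az network nsg rule update -g <MC_resource_group> --nsg-name <nsg-name> -n <rule-name> \
    --source-address-prefixes AzureFrontDoor.Backend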

13. The only thing left to do now is to test Azure Front Door's global load balancing. You can do this easily by stopping the East US 2 cluster node VM(s) to simulate an outage in that region. Browsing the Front Door endpoint, you should quickly see the browser pick up the West US 2 AKS service endpoint. You can then test it in reverse by starting the East US 2 AKS cluster node back up and stopping the West US 2 AKS cluster node.
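If you'd rather not stop individual node VMs, one alternative is to stop the whole cluster with az aks stop (available in recent Azure CLI versions). A sketch, with the default Front Door frontend host as a placeholder:

# Simulate an East US 2 outage
$ az aks stop -n aks-eastus2-cluster -g aks-eastus2-cluster

# Traffic to the Front Door endpoint should now be served from West US 2
$ curl http://<front-door-name>.azurefd.net

# Bring East US 2 back and repeat in the other direction
$ az aks start -n aks-eastus2-cluster -g aks-eastus2-cluster
$ az aks stop -n aks-westus2-cluster -g aks-westus2-cluster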

14. Using Azure Front Door with Azure Firewall to Expose an Internally (Private VNet IP) Load Balanced AKS Service

We have created two AKS services in the East US 2 and West US 2 regions using the default load balancer option, which exposes our AKS services with a public IP address, and we added those endpoints as Azure Front Door backend pool hosts. In the following pattern, illustrated below, we will deploy the same AKS services in both regions again, but this time we will only expose them to the VNet using the internal load balancer annotation. To expose each internal AKS service to the internet, we will use Azure Firewall to NAT its public IP back to the internal AKS service IP. From there the configuration is like the previous pattern, except that the Azure Front Door backend pool is configured with the public IP addresses of the Azure Firewall in each region.

Azure Firewall to Expose an Internally (Private VNet IP) Load Balanced AKS Service

15. If you're continuing from the previous deployment, let's go ahead and remove both regions' AKS services and the Azure Front Door backend pool.

$ kubectl config use-context aks-eastus2-cluster

$ kubectl delete -f https://raw.githubusercontent.com/phillipgibson/Cloud-Azure-AKS-Using- -with-AKS/master/yaml/phillipgibson-azure-frontdoor-eastus2-elb-app.yaml

$ kubectl get svc # The demo service for the EastUS 2 region should now be deleted

$ kubectl config use-context aks-westus2-cluster

$ kubectl delete -f https://raw.githubusercontent.com/phillipgibson/Cloud-Azure-AKS-Using- -with-AKS/master/yaml/phillipgibson-azure-frontdoor-westus2-elb-app.yaml

$ kubectl get svc # The demo service for the WestUS 2 region should now be deleted

16. Remove the Azure Front Door backend pool and routing rules.

17. Now we’ll redeploy the same application in both the East US 2 and West US 2 Azure regions, but this time they will use the Azure internal load balancer.
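The manifests referenced in the next steps are assumed to expose the service with the internal load balancer annotation. You don't need to run the snippet below if you use those manifests; it's just a minimal sketch of the key piece, with an illustrative service name, label, and port:

$ kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: demo-ilb-app
  annotations:
    # Tells AKS to provision an internal (private VNet IP) Azure load balancer
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: demo-ilb-app
EOF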

18. For the Azure East US 2 deployment use the following:

$ kubectl config use-context aks-eastus2-cluster

$ kubectl create -f https://raw.githubusercontent.com/phillipgibson/Cloud-Azure-AKS-Using- -with-AKS/master/yaml/phillipgibson-azure-frontdoor-eastus2-ilb-app.yaml

19. Verify the deployed app has received an internal IP

$ kubectl get svc

20. Repeat the same deployment for the West US 2 AKS cluster.

21. Verify you have an internal IP as the endpoint of the West US 2 service.

$ kubectl config use-context aks-westus2-cluster
$ kubectl create -f https://raw.githubusercontent.com/phillipgibson/Cloud-Azure-AKS-Using- -with-AKS/master/yaml/phillipgibson-azure-frontdoor-westus2-ilb-app.yaml
$ kubectl get svc

Now we need to deploy an instance of Azure Firewall in each region's AKS VNet. To expose the internal AKS service to the internet, we'll configure a NAT rule that translates the Azure Firewall's public IP back to the AKS service's internal IP. We will also need to create a route table for the AKS cluster subnet that points its default route to the internal IP address of the Azure Firewall; this keeps the route symmetrical (a sketch of the route table commands follows the firewall IP configuration below).

22. Make sure you have the Azure Firewall extension for the CLI

$ az extension add --name azure-firewall

23. Create a public IP for each Azure Firewall we will deploy, one per Azure region.

$ az network public-ip create -g aks-eastus2-cluster -n aks-eastus2-fw-pip -l eastus2 --sku "Standard"

$ az network public-ip create -g aks-westus2-cluster -n aks-westus2-fw-pip -l westus2 --sku "Standard"

24. Create the two Azure Firewalls for each Azure Region

$ az network firewall create -g aks-eastus2-cluster -n aks-eastus2-firewall -l eastus2

$ az network firewall create -g aks-westus2-cluster -n aks-westus2-firewall -l westus2

25. Configure the IP configuration for each region's Azure Firewall. This adds the public IP address to the Azure Firewall and associates the firewall with the VNet in its respective Azure region, where it will receive its private IP address.

# Azure EastUS 2 Region Command
$ az network firewall ip-config create -g aks-eastus2-cluster -f aks-eastus2-firewall -n aks-eastus2-fw-config --public-ip-address aks-eastus2-fw-pip --vnet-name aks-eastus2-cluster-vnet

# Azure WestUS 2 Region Command
$ az network firewall ip-config create -g aks-westus2-cluster -f aks-westus2-firewall -n aks-westus2-fw-config --public-ip-address aks-westus2-fw-pip --vnet-name aks-westus2-cluster-vnet
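The route table mentioned earlier isn't shown elsewhere in this walkthrough, so here is a hedged sketch for the East US 2 side; repeat the same steps for West US 2. Note that once the AKS subnet's default route points at the firewall, outbound cluster traffic must also be allowed by your firewall rules.

# Read the firewall's private IP from its IP configuration
$ EASTUS2_FWPRIVATE_IP=$(az network firewall show -g aks-eastus2-cluster -n aks-eastus2-firewall \
    --query "ipConfigurations[0].privateIpAddress" -o tsv)

# Create a route table and a default route pointing at the firewall's private IP
$ az network route-table create -g aks-eastus2-cluster -n aks-eastus2-fw-rt

$ az network route-table route create -g aks-eastus2-cluster \
    --route-table-name aks-eastus2-fw-rt -n default-to-firewall \
    --address-prefix 0.0.0.0/0 \
    --next-hop-type VirtualAppliance \
    --next-hop-ip-address $EASTUS2_FWPRIVATE_IP

# Associate the route table with the AKS cluster subnet
$ az network vnet subnet update -g aks-eastus2-cluster \
    --vnet-name aks-eastus2-cluster-vnet \
    -n aks-eastus2-cluster-aks-subnet \
    --route-table aks-eastus2-fw-rt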

26. Create the necessary Firewall NAT Rule to map the Azure Firewall Public IP address to the internal AKS Service IP address for the East US 2 Region

# Azure EastUS 2 Region Command(s)

# Get the public IP address of the East US 2 Azure Firewall

$ EASTUS2_FWPUBLIC_IP=$(az network public-ip show -g aks-eastus2-cluster -n aks-eastus2-fw-pip --query "ipAddress" -o tsv)

# Make note of the internal AKS service IP (the column title says EXTERNAL-IP)

$ kubectl config use-context aks-eastus2-cluster

$ kubectl get svc

# Create the East US 2 Azure Firewall NAT rule to expose the internal AKS service

# Please remember to put the AKS service internal IP address as the translated-address parameter value

$ az network firewall nat-rule create -g aks-eastus2-cluster \
    -f aks-eastus2-firewall --collection-name 'AKS-NAT-Coll-Rule' \
    -n 'DemoInternalAKSSvcRule' --protocols 'TCP' --source-addresses '*' \
    --destination-addresses $EASTUS2_FWPUBLIC_IP --destination-ports 80 \
    --translated-address 10.50.1.35 --translated-port 80 \
    --action Dnat --priority 100

You should now be able to access the internal AKS Service in the East US 2 Region by using a browser or a simple curl command to the East US 2 Azure Firewall public IP address

$ curl $EASTUS2_FWPUBLIC_IP

Create the necessary Firewall NAT rule to map the Azure Firewall public IP address to the internal AKS service IP address for the West US 2 region.

# Azure WestUS 2 Region Command(s)
# Get the public IP address of the West US 2 Azure Firewall
$ WESTUS2_FWPUBLIC_IP=$(az network public-ip show -g aks-westus2-cluster -n aks-westus2-fw-pip --query "ipAddress" -o tsv)

# Make note of the internal AKS service IP (the column title says EXTERNAL-IP)
$ kubectl config use-context aks-westus2-cluster
$ kubectl get svc

# Create the West US 2 Azure Firewall NAT rule to expose the internal AKS service
# Remember to put the AKS service internal IP address as the translated-address parameter value
$ az network firewall nat-rule create -g aks-westus2-cluster \
    -f aks-westus2-firewall --collection-name 'AKS-NAT-Coll-Rule' \
    -n 'DemoInternalAKSSvcRule' --protocols 'TCP' --source-addresses '*' \
    --destination-addresses $WESTUS2_FWPUBLIC_IP --destination-ports 80 \
    --translated-address 10.60.1.35 --translated-port 80 \
    --action Dnat --priority 100

You should now be able to access the internal AKS Service in the West US 2 Region by using a browser or a simple curl command to the West US 2 Azure Firewall public IP address

$ curl $WESTUS2_FWPUBLIC_IP

27. Update Azure Front Door so that the initial backend host is the Azure Firewall public IP address for the East US 2 Azure datacenter.

$ az network front-door create \
    -n demoaks \
    -g aks-global \
    --backend-address $EASTUS2_FWPUBLIC_IP \
    --backend-host-header $EASTUS2_FWPUBLIC_IP \
    --protocol Http \
    --forwarding-protocol HttpOnly

28. Next, add the Azure Firewall public IP address from the West US 2 Azure datacenter to the Azure Front Door backend pool.

$ az network front-door backend-pool backend add \
    --resource-group aks-global \
    --front-door-name demoaks \
    --pool-name DefaultBackendPool \
    --address $WESTUS2_FWPUBLIC_IP \
    --backend-host-header $WESTUS2_FWPUBLIC_IP

You can now check that your backends have been configured

$ az network front-door backend-pool list  --front-door-name demoaks  --resource-group aks-global --query '[].backends' -o json

The last thing to do is to harden the Azure Firewalls in each region by restricting each NAT rule's source addresses to the Azure Front Door backend CIDR ranges.
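NAT rule collections can't reference service tags directly, so the Azure Front Door backend ranges have to be passed as literal CIDRs. A hedged sketch for the East US 2 firewall; pull the current ranges first, then recreate the NAT rule with those ranges as the allowed sources (after removing the original rule), and repeat for West US 2:

# List the current Azure Front Door backend IP ranges
$ az network list-service-tags --location eastus2 \
    --query "values[?name=='AzureFrontDoor.Backend'].properties.addressPrefixes | [0]" -o json

# Recreate the NAT rule with those prefixes as --source-addresses instead of '*'
$ az network firewall nat-rule create -g aks-eastus2-cluster \
    -f aks-eastus2-firewall --collection-name 'AKS-NAT-Coll-Rule' \
    -n 'DemoInternalAKSSvcRule' --protocols 'TCP' \
    --source-addresses <Front Door CIDR ranges from the query above> \
    --destination-addresses $EASTUS2_FWPUBLIC_IP --destination-ports 80 \
    --translated-address 10.50.1.35 --translated-port 80 \
    --action Dnat --priority 100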

You are all set!
