So far, the other two articles in this series have covered:
- The basics — Deploying a Spring Boot Microservice on AKS using Terraform and Azure DevOps
- Identity & Governance — Using Azure AD Pod Managed Identities to access Azure resources without having to manage credentials, and using Azure Policy to enforce governance at scale
In the third part, we focus on Networking and look into the following aspects:
a) Enabling Private Link for PAAS Services like Key Vault & SQL database
b) Restricting Egress traffic from the AKS cluster using Azure Firewall
c) Choosing the right type of Load Balancer for your ingress traffic
d) Securing pod traffic using Kubernetes Network Policies like denying inbound traffic to a particular namespace, or allowing outbound traffic only to a certain namespace etc.
In short, Private Link gives you the capability to communicate securely with your Azure PAAS services without routing your traffic over the internet, using the Microsoft backbone network instead.
Private Link, a quintessential service for most use cases, is in my personal/humble opinion the most misunderstood Azure networking concept! Couple that with how DNS works in Azure (auto-registration, Private Endpoints with Private DNS Zones, split-brain or split-horizon DNS resolution for the same Azure resource depending on where the client sits) and it all becomes very complex very quickly. If you find yourself sailing in the same boat, I would recommend checking out the following resources:
1. Private Link MicroHack — an extremely good, hands-on lab where you set up a simple Hub/Spoke Azure network and a simulated on-premises environment with a S2S VPN, then try to access a SQL Server from both Azure and on-premises VMs, working through different scenarios with a focus on how DNS resolution works for a PAAS service behind Private Link. Reaching the end state of the MicroHack will probably take you at least a couple of hours, but you will be happy at the end that you spent them!
2. Azure Private Link & other VNET integration patterns — another great piece of documentation
Enabling Private Link for Key Vault & SQL Database
For the purposes of this article, we will focus on creating Private Links for our Key Vault and SQL Server via Terraform. Remember, when you create a Private Endpoint from the portal and configure it for a particular PAAS service, a lot of magic happens in the backend that we will do explicitly in Terraform:
- Creating a Private DNS Zone and linking it to the Virtual Network
- Creating a Private Endpoint representing the Azure resource
- Creating an A record for the private endpoint in the Private DNS Zone
- Creating a subnet in the same Virtual Network to hold our Private Endpoints
Code snippets for the SQL Server changes can be found below, with a succinct explanation:
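A minimal sketch of what those Terraform resources could look like for SQL Server. Resource and identifier names here (`azurerm_resource_group.rg`, `azurerm_virtual_network.vnet`, `azurerm_subnet.endpoints`, `azurerm_sql_server.sqlserver`) are placeholders, not necessarily the names used in the repository:

```hcl
# Private DNS zone for SQL -- the name must follow the documented schema
resource "azurerm_private_dns_zone" "sql" {
  name                = "privatelink.database.windows.net"
  resource_group_name = azurerm_resource_group.rg.name
}

# Link the zone to the VNet so workloads inside it can resolve the zone
resource "azurerm_private_dns_zone_virtual_network_link" "sql" {
  name                  = "sql-dns-link"
  resource_group_name   = azurerm_resource_group.rg.name
  private_dns_zone_name = azurerm_private_dns_zone.sql.name
  virtual_network_id    = azurerm_virtual_network.vnet.id
}

# Private Endpoint for the SQL server, placed in a dedicated subnet
resource "azurerm_private_endpoint" "sql" {
  name                = "sql-private-endpoint"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  subnet_id           = azurerm_subnet.endpoints.id

  private_service_connection {
    name                           = "sql-privateserviceconnection"
    private_connection_resource_id = azurerm_sql_server.sqlserver.id
    subresource_names              = ["sqlServer"]
    is_manual_connection           = false
  }
}

# Data source to fetch the private IP allocated to the endpoint
data "azurerm_private_endpoint_connection" "sql" {
  name                = azurerm_private_endpoint.sql.name
  resource_group_name = azurerm_resource_group.rg.name
}

# A record so the server's public name resolves to the private IP
resource "azurerm_private_dns_a_record" "sql" {
  name                = azurerm_sql_server.sqlserver.name
  zone_name           = azurerm_private_dns_zone.sql.name
  resource_group_name = azurerm_resource_group.rg.name
  ttl                 = 300
  records = [
    data.azurerm_private_endpoint_connection.sql.private_service_connection[0].private_ip_address
  ]
}
```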
Things to note:
- Apart from enabling the Private Endpoint for SQL, we also disable public access on the SQL Server and remove the Service Endpoint created in the last article. The complete diff is available here. Unfortunately, setting `public_network_access_enabled = false` isn't supported yet on `azurerm_sql_server`, so either switch to `azurerm_mssql_server` or do it another way (CLI or Portal).
- The name of the private DNS zone, as specified in `azurerm_private_dns_zone`, must follow the Private DNS Zone name schema in the product documentation in order for the two resources to be connected successfully.
- The Terraform data source `azurerm_private_endpoint_connection` comes in very handy to create an A record for the SQL Server with the private IP allocated to the Private Endpoint. This is done automagically when using the Portal, but with Terraform we need to update our private DNS zone with the A record for the Azure resource ourselves.
That’s it. Now with terraform do
$ terraform plan
$ terraform apply --auto-approve
Now we have a working Private Endpoint for SQL Server. All the code changes for the above are located in the `privatelink` branch of our repository. The high-level architecture of the currently deployed infrastructure looks like this:
Using the Same Endpoint
One of the most important things to remember with Private Link is that you must NOT change the database URI or Key Vault endpoint that your application originally uses to access SQL Server or Key Vault — no adding `privatelink` into the hostname. Yes, as confusing as it may sound, you must not replace your database endpoint with `myfunkydb.privatelink.database.windows.net`. Simply don't do that. Continue using the original endpoint; if you have configured the Private Endpoint correctly with Azure Private DNS, the Azure DNS server will resolve the right private endpoint for you when you look up `myfunkydb.database.windows.net`. Don't believe me? Do an `nslookup` for your database from within your Kubernetes cluster. You can use the following DNS util test file to deploy DNS tools:
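A minimal DNS utility pod along the lines of the upstream Kubernetes debugging example (pod name and namespace are arbitrary choices here):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    # image from the Kubernetes "Debugging DNS Resolution" docs
    image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
    command: ["sleep", "infinity"]
  restartPolicy: Always
```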
Once this pod is deployed, you can `exec` into it and perform an `nslookup` as below:
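The session would look roughly like this (`myfunkydb` is the placeholder name used above; the exact addresses will differ in your environment):

```shell
kubectl exec -it dnsutils -- nslookup myfunkydb.database.windows.net
# From inside the VNet, the public name is a CNAME to the privatelink zone,
# which in turn resolves to the Private Endpoint's IP, e.g.:
#   myfunkydb.database.windows.net  canonical name = myfunkydb.privatelink.database.windows.net
#   Name:    myfunkydb.privatelink.database.windows.net
#   Address: 10.x.x.x
```

From outside the VNet, the same name resolves to a public IP instead — that is the split-horizon behaviour discussed earlier.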
Again if this is not clear, I would recommend you look at the microhack that is referenced above in this article.
As a side note, for the purposes of this article I have intentionally not enabled Private Link for Azure Container Registry; otherwise I would need to use self-hosted DevOps agents to be able to build and deploy images to ACR. But YMMV!
Egress Traffic with Azure Firewall
This topic is already covered in detail in the official documentation, so I won't repeat the theory and will go right into the changes we need to make:
1. Create a subnet — Azure Firewall lives in its own subnet, so we'll be creating one in our virtual network. Ideally the right place for the Firewall is a Hub virtual network in a Hub & Spoke topology, where it can act as a central place to filter traffic, but for the purposes of this article we will deploy it in our only virtual network, which already contains the AKS cluster.
2. Create a UDR in the AKS subnet to force/route all outgoing traffic from AKS to the Firewall.
3. Change the outbound egress traffic routing for the AKS cluster — by default the Standard SKU Load Balancer is used for all outbound traffic, but we want to force all traffic via Azure Firewall, so we need to change the outboundType profile for the AKS cluster to `userDefinedRouting`. Unfortunately, doing this now will result in re-creation of the cluster, and you would need to deploy your workloads again. The complete code changes for firewall and egress can be found in the `egress` branch of our repository.
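The three steps above can be sketched in Terraform roughly as follows (identifiers like `azurerm_firewall.fw` and `azurerm_subnet.aks` are placeholders, not necessarily the repository's actual names):

```hcl
# Route table sending all AKS egress to the firewall's private IP
resource "azurerm_route_table" "aks_egress" {
  name                = "aks-egress-rt"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  route {
    name                   = "to-firewall"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = azurerm_firewall.fw.ip_configuration[0].private_ip_address
  }
}

# Associate the UDR with the AKS subnet
resource "azurerm_subnet_route_table_association" "aks" {
  subnet_id      = azurerm_subnet.aks.id
  route_table_id = azurerm_route_table.aks_egress.id
}

# On the cluster, switch egress to the UDR
# (changing this forces cluster re-creation, as noted above)
resource "azurerm_kubernetes_cluster" "aks" {
  # ...existing cluster configuration...
  network_profile {
    network_plugin = "azure"
    outbound_type  = "userDefinedRouting"
  }
}
```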
You would also need to configure firewall rules that apply to your use case, as per the detailed official documentation here. For our setup the rules are present here. You can now deploy the updated code with `terraform plan` and `terraform apply`. The architecture now looks like this:
Please be mindful of the fact that deploying Azure Firewall can incur significant costs, so use it with caution for your PoCs. You've been warned!
Also note that you could use any other NVA (Network Virtual Appliance) or third-party firewall product like Palo Alto, F5, FortiGate, Cisco etc., but be mindful that with those products you would have to take care of High Availability yourself (probably by deploying multiple instances of the firewall NVA running on VMs behind a load balancer). With Azure Firewall everything is SDN-based, and you don't have to worry about scaling it up or HA.
Ingress: Spoilt for choice!
The Default Option — Azure Load Balancer
By default our AKS cluster (with an outboundType of `loadBalancer`) creates a public Load Balancer (Standard SKU) that operates at Layer 4 and provides:
- Inbound connectivity to services of type `LoadBalancer`
- Outbound connectivity to cluster nodes
The backend pool of the Load Balancer consists of all the nodes in all the node pools. When a request comes into the load balancer, it is directed to any of the nodes in the backend pool, and from there `kube-proxy` is responsible for redirecting the traffic to the appropriate node/pod.
Asymmetric Routing with Public Load Balancer
Asymmetric routing is where a packet takes one path to the destination and a different path when returning to the source. Here we have a public load balancer (with a public IP address) exposing services in the AKS cluster, but the outbound traffic is forced to tunnel via the Firewall. Since Azure Firewall (like most firewalls) is stateful, it drops the returning packet because it is not aware of any such established session. There are two ways to fix this: either use an internal load balancer and access the service from outside via the Firewall's public IP, or create additional rules on the Firewall and more UDRs on the AKS subnet, as described in the official documentation.
Private AKS Cluster
To further secure the AKS cluster, we can deploy it in private mode. This ensures that there is no public IP associated with the control plane of the Kubernetes cluster (which is managed and hosted by Microsoft in another subscription). The network communication between the data plane (which resides in the customer's subscription) and the control plane happens over the private network, thanks to Private Endpoints.
With a private AKS cluster we would deploy a jumpbox VM and Azure Bastion in their own subnets to access the private cluster. The code for deploying the jumpbox VM and Azure Bastion is located here. Another option would be to use the AKS Run Command feature, currently in preview.
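With Run Command, you invoke `kubectl` through the managed control plane instead of needing network line-of-sight to the private API server. A quick sketch using our cluster names from earlier:

```shell
# Run a kubectl command against the private cluster via the Azure control plane
# (no VPN, jumpbox or Bastion required)
az aks command invoke \
  --resource-group SpringBootAppRG \
  --name springboot-aks \
  --command "kubectl get pods -A"
```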
Terraform Gotchas — Lifecycle meta argument !
As I found out rather painfully, sometimes Terraform can turn out to be a PITA and track changes that happen to your resources after provisioning as a side effect of something else. Since I'm doing a terrible job explaining it, check out this blog post for details.
Basically, I had to ask Terraform to ignore changes to `enforce_private_link_endpoint_network_policies` on the subnet containing the AKS cluster. The default value of this property is `false`, and since I'm not deploying any Private Endpoints into that subnet manually, I never touch that value or turn it to `true`. But when you enable a private AKS cluster, it deploys a Private Endpoint for the AKS control plane in your subnet and flips this value to `true`, so the next time you make a change Terraform may complain with something like this:
The solution is pretty simple: use the lifecycle meta-argument to ignore changes to that attribute:
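A sketch of the fix on the AKS subnet resource (the resource name `azurerm_subnet.aks` is a placeholder):

```hcl
resource "azurerm_subnet" "aks" {
  # ...existing subnet configuration...

  lifecycle {
    # The private AKS control plane's endpoint flips this to true behind
    # our back, so tell Terraform not to reconcile it on the next apply
    ignore_changes = [enforce_private_link_endpoint_network_policies]
  }
}
```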
Internal Load Balancer
We also stop using a public Azure Load Balancer in favour of an internal load balancer with an internal (private) IP address, by applying the annotation `service.beta.kubernetes.io/azure-load-balancer-internal: "true"` to the service definition. Ours would look like this:
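A minimal sketch of such a service manifest (the service name, ports and selector below are placeholders for our Spring Boot app, not the repository's exact values):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spring-boot-app
  annotations:
    # tells the Azure cloud provider to create an internal LB
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: spring-boot-app
```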
Network Contributor Role for the AKS cluster identity
You may also need to assign the AKS cluster identity the Network Contributor role on the resource group where the VNet resources are deployed. This is required so that, when you create the internal load balancer to expose your services, the AKS cluster is able to create a load balancer in the virtual network. For our use case, the following would suffice:
NODE_GROUP=$(az aks show -g $AKS_RESOURCE_GROUP -n $AKS_CLUSTER_NAME --query nodeResourceGroup -o tsv)
NODES_RESOURCE_ID=$(az group show -n $NODE_GROUP -o tsv --query "id")
AKS_MANAGED_IDENTITY_ID=$(az aks show --resource-group SpringBootAppRG --name springboot-aks --query "identity" | jq -r '.principalId')
az role assignment create --assignee $AKS_MANAGED_IDENTITY_ID --role "Network Contributor" --scope $NODES_RESOURCE_ID
One important caveat to remember: sometimes the role propagation for the cluster identity can take up to 60 minutes to come into effect. Yes, quite annoying, I agree! Once it's there, you will be able to create the internal load balancer when you `kubectl apply` the above YAML file.
You can also use any of the popular open source ingress controllers (Layer 7) such as NGINX, Traefik etc. or look at the managed options (backed with an SLA) such as Application Gateway Ingress Controller to act as Ingress for your services.
You cannot use NSGs or other Azure networking controls to control traffic within the AKS cluster between pods and services (east-west traffic), so we should instead use Kubernetes Network Policies. Since we are using the Azure CNI plugin, Network Policies are supported. Check out the official documentation to secure pod traffic and do things like denying inbound traffic to a certain namespace or restricting traffic from a certain namespace. I personally find Network Policies very useful for limiting network traffic based on namespaces and label selectors.
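As a flavour of what such a policy looks like, here is a minimal sketch that denies all inbound traffic to pods in a namespace except from namespaces carrying a given label (the `backend` namespace and `team: frontend` label are illustrative, not from our repository):

```yaml
# Deny all ingress to pods in the "backend" namespace,
# except from pods in namespaces labelled team=frontend
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-only-frontend
  namespace: backend
spec:
  podSelector: {}          # empty selector = every pod in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: frontend
```

Because the policy selects all pods and lists a single `from` clause, any ingress not matching that clause is dropped by the CNI.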