Secure your Microservices on AKS — Part 3 — The Network!

Agraj Mangal
May 4 · 10 min read

So far, the first two articles in this series have covered:

  1. The basics — Deploying a Spring Boot Microservice on AKS using Terraform and Azure DevOps
  2. Identity & Governance — Using Azure AD Pod Managed Identities to access Azure resources without having to manage credentials and to use Azure Policy for observing governance at scale

In the third part, we focus on Networking and look into the following aspects:

a) Enabling Private Link for PaaS services like Key Vault & SQL Database
b) Restricting egress traffic from the AKS cluster using Azure Firewall
c) Choosing the right type of load balancer for your ingress traffic
d) Securing pod traffic using Kubernetes Network Policies — for example, denying inbound traffic to a particular namespace, or allowing outbound traffic only to a certain namespace

Private Link

In short, Private Link gives you the capability to communicate securely with your Azure PaaS services without routing your traffic over the public internet; it travels over the Microsoft backbone network instead.

Azure Private Link

Private Link — a quintessential service for most use cases — is, in my personal/humble opinion, the most misunderstood Azure networking concept! Couple that with how DNS works in Azure, auto-registration, Private Endpoints with Private DNS Zones, and split-brain (split-horizon) DNS resolution for the same Azure resource depending on where the client sits, and it all becomes very complex very quickly. If you find yourself sailing in the same boat, I recommend checking out the following resources:

  1. Private Link MicroHack — an extremely good, hands-on lab where you set up a simple Hub/Spoke Azure network and a simulated on-premises environment with a S2S VPN, then try to access a SQL Server from both Azure and on-premises VMs, working through different scenarios with a focus on how DNS resolution works for a PaaS service behind Private Link.
Courtesy: Adam Stuart’s Private Link Microhack

Reaching this end state of the microhack will probably take you at least a couple of hours, but at the end you will be glad you spent them!

2. Azure Private Link & other possible VNET integration patterns — another great piece of documentation

Enabling Private Link for Key Vault & SQL Database

For the purposes of this article, we will focus on creating Private Links for our Key Vault and SQL Server via Terraform. Remember, when you create a Private Endpoint from the portal and configure it for a particular PaaS service, a lot of magic happens in the background. In Terraform we have to do all of it explicitly:

  • Creating a subnet to hold our Private Endpoints in the same Virtual Network
  • Creating a Private DNS Zone & linking it to the Virtual Network
  • Creating a Private Endpoint representing the Azure resource
  • Creating an A record for the Private Endpoint in the Private DNS Zone

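The steps above can be sketched in Terraform roughly as follows. Resource names, address ranges, and cross-references (the resource group, VNet, and azurerm_mssql_server names) are illustrative assumptions; adapt them to your own configuration:

```hcl
# Subnet to hold our Private Endpoints (names/ranges are illustrative)
resource "azurerm_subnet" "endpoints" {
  name                 = "private-endpoints-subnet"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.1.5.0/24"]

  # Required so Azure lets us place Private Endpoints in this subnet
  enforce_private_link_endpoint_network_policies = true
}

# Private DNS zone for SQL — the name must follow the documented schema
resource "azurerm_private_dns_zone" "sql" {
  name                = "privatelink.database.windows.net"
  resource_group_name = azurerm_resource_group.rg.name
}

# Link the zone to the VNet so workloads in it can resolve the zone
resource "azurerm_private_dns_zone_virtual_network_link" "sql" {
  name                  = "sql-dns-link"
  resource_group_name   = azurerm_resource_group.rg.name
  private_dns_zone_name = azurerm_private_dns_zone.sql.name
  virtual_network_id    = azurerm_virtual_network.vnet.id
}

# Private Endpoint representing the SQL server
resource "azurerm_private_endpoint" "sql" {
  name                = "sql-private-endpoint"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  subnet_id           = azurerm_subnet.endpoints.id

  private_service_connection {
    name                           = "sql-privateserviceconnection"
    private_connection_resource_id = azurerm_mssql_server.sql.id
    subresource_names              = ["sqlServer"]
    is_manual_connection           = false
  }
}

# Fetch the private IP assigned to the endpoint...
data "azurerm_private_endpoint_connection" "sql" {
  name                = azurerm_private_endpoint.sql.name
  resource_group_name = azurerm_resource_group.rg.name
}

# ...and publish it as an A record in the private zone
resource "azurerm_private_dns_a_record" "sql" {
  name                = azurerm_mssql_server.sql.name
  zone_name           = azurerm_private_dns_zone.sql.name
  resource_group_name = azurerm_resource_group.rg.name
  ttl                 = 300
  records             = [data.azurerm_private_endpoint_connection.sql.private_service_connection.0.private_ip_address]
}
```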
Code snippets for the SQL Server changes can be found below, with a succinct explanation:

Private Link for SQL Server — Full code here

Things to note:

  1. Apart from enabling a Private Endpoint for SQL, we also disable public access on the SQL Server and remove the Service Endpoint created in the last article. The complete diff is available here. Unfortunately, setting public_network_access_enabled = false isn't supported yet on azurerm_sql_server, so either switch to azurerm_mssql_server or do it another way (CLI or portal).
  2. The name of the private DNS zone, as specified in azurerm_private_dns_zone, must follow the Private DNS zone name schema in the product documentation in order for the two resources to be connected successfully.
  3. The Terraform data source azurerm_private_endpoint_connection comes in very handy for creating an A record for the SQL Server with the private IP assigned to the Private Endpoint. This happens automagically when using the portal, but with Terraform we need to update our private DNS zone with the A record for the Azure resource ourselves.

That’s it. Now run:

$ terraform plan
$ terraform apply -auto-approve

Now we have a working Private Endpoint for SQL Server. All the code changes for the above are located in the privatelink branch of our repository. The high-level architecture of the currently deployed infrastructure looks like this:

Architecture So Far!

Using the Same Endpoint

One of the most important things to remember with Private Link is that you must NOT change the database URI or Key Vault endpoint that your application originally uses to access SQL Server or Key Vault. No adding privatelink to it.

Yes, as confusing as it may sound, you MUST NOT replace your database endpoint myfunkydb.database.windows.net with myfunkydb.privatelink.database.windows.net. You simply must not do that. Continue using the original endpoint: if you have configured the Private Endpoint correctly with Azure Private DNS, the Azure DNS server will resolve it to the right private endpoint when you look up myfunkydb.database.windows.net. Don’t believe me? Do an nslookup for your database from within your Kubernetes cluster. You can use the following DNS Util Test file to deploy DNS tools like nslookup.

Once this pod is deployed, you can exec into it and perform an nslookup as below:

nslookup from your k8s cluster
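If you don’t have the linked test file handy, a minimal DNS-utility Pod can be sketched like this. The image is the one used in the upstream Kubernetes DNS-debugging example and, like the names here, is an assumption on my part:

```yaml
# Minimal pod carrying nslookup/dig; names and image are illustrative
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
    - name: dnsutils
      image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
      command: ["sleep", "infinity"]
  restartPolicy: Always
```

Then something like `kubectl exec -it dnsutils -- nslookup myfunkydb.database.windows.net` should return the private IP of your endpoint, not a public one.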

Again, if this is not clear, I recommend working through the microhack referenced above in this article.

As a side note, for the purposes of this article I have intentionally not enabled Private Link for Azure Container Registry; otherwise I would need self-hosted DevOps agents to be able to build & push images to ACR. But YMMV!

Egress Traffic with Azure Firewall

This topic is already covered in detail in the official documentation, so I won’t repeat the theory and will go right into the changes we need to make:

  1. Create a subnet — Azure Firewall lives in its own subnet, so we’ll create one in our virtual network. Ideally the right place for the firewall is a Hub virtual network in a Hub & Spoke topology, where it can act as a central place to filter traffic; but for the purposes of this article, we will deploy it in our only virtual network, which already contains the AKS cluster.
  2. Create a UDR in the AKS subnet to force all outgoing traffic from AKS through the firewall

3. Change the outbound egress routing for the AKS cluster — By default the Standard SKU Load Balancer is used for all outbound traffic, but we want to force all traffic through Azure Firewall, so we need to change the outboundType profile of the AKS cluster to userDefinedRouting. Unfortunately, doing this now will result in re-creation of the cluster, and you will need to deploy your workloads again. The complete code changes for the firewall and egress can be found in the egress branch of our repository.
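Steps 2 and 3 can be sketched in Terraform roughly as below. Names and cross-references (the firewall, subnet, and resource group resources) are illustrative; the full working code lives in the egress branch:

```hcl
# Route table sending all egress from the AKS subnet to the firewall
resource "azurerm_route_table" "aks" {
  name                = "aks-egress-rt"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  route {
    name                   = "egress-via-firewall"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = azurerm_firewall.fw.ip_configuration.0.private_ip_address
  }
}

resource "azurerm_subnet_route_table_association" "aks" {
  subnet_id      = azurerm_subnet.aks.id
  route_table_id = azurerm_route_table.aks.id
}

# On the cluster, switch outbound traffic to user-defined routing.
# Changing this on an existing cluster forces re-creation.
resource "azurerm_kubernetes_cluster" "aks" {
  # ... existing cluster configuration elided ...
  network_profile {
    network_plugin = "azure"
    outbound_type  = "userDefinedRouting"
  }
}
```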

Complete Code for AKS Cluster here

You will also need to configure firewall rules that fit your use case, as per the detailed official documentation here. For our setup the rules are present here. You can now deploy the updated code with terraform plan & terraform apply. The architecture now looks like this:

AKS with Azure Firewall for Egress/Ingress

Please be mindful that deploying Azure Firewall can incur significant costs, so use it with caution in your PoCs. You’ve been warned!

Also note that you could use any other NVA (Network Virtual Appliance) or third-party firewall product from Palo Alto, F5, FortiGate, Cisco etc., but be mindful that with those products you have to take care of high availability yourself (probably by deploying multiple firewall NVA instances on VMs behind a load balancer). With Azure Firewall everything is SDN-based, and you don’t have to worry about scaling it up or HA.

Ingress: Spoilt for choice!

The Default Option — Azure Load Balancer

By default our AKS cluster (with an outboundType of loadBalancer) creates a public Standard SKU Load Balancer that operates at Layer 4 and provides

  • Inbound connectivity to services of type LoadBalancer
  • Outbound connectivity to cluster nodes

The backend pool of the load balancer consists of all the nodes across all node pools. When a request comes into the load balancer it is directed to one of the nodes in the backend pool, and from there kube-proxy is responsible for forwarding the traffic to the appropriate node/pod.

Asymmetric Routing with Public Load Balancer

Asymmetric routing is when a packet takes one path to the destination and a different path back to the source. Here we have a public load balancer (with a public IP address) exposing services in the AKS cluster, while the outbound traffic is forced to tunnel through the firewall. Since Azure Firewall (like most firewalls) is stateful, this breaks things: the firewall drops the returning packet because it is not aware of any established session. There are two ways to fix this — either use an internal load balancer and reach the service from outside via the firewall’s public IP, or create additional firewall rules and UDRs on the AKS subnet, as described in our official documentation.

Courtesy: Microsoft Official Docs on Integrating Firewall with Load Balancer

Private AKS Cluster

To further secure the AKS cluster, we can deploy it in private mode. This ensures there is no public IP associated with the control plane of the Kubernetes cluster (which is managed & hosted by Microsoft in another subscription). The network communication between the data plane (which resides in the customer’s subscription) and the control plane happens over the private network, thanks to Private Endpoints.

Private AKS Cluster

With a private AKS cluster we deploy a jumpbox VM and Azure Bastion in their own subnets to access the cluster. The code for deploying the jumpbox VM & Azure Bastion is located here. Another option is the AKS Run Command feature, currently in preview.
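For reference, enabling private mode in Terraform is essentially a single flag on the cluster resource (surrounding arguments elided; this too forces cluster re-creation if toggled on an existing cluster):

```hcl
resource "azurerm_kubernetes_cluster" "aks" {
  # ... existing cluster configuration elided ...

  # No public IP for the API server; a Private Endpoint is created
  # in our subnet for control-plane traffic instead
  private_cluster_enabled = true
}
```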

Accessing AKS cluster using Bastion & Jumpbox

Terraform Gotchas — Lifecycle meta argument!

As I found out rather painfully, Terraform can sometimes turn out to be a PITA and track changes that happen to your resources after provisioning as a side effect of something else. Since I’m doing a terrible job explaining it — check out this blog post for details.

Basically, I had to ask Terraform to ignore changes to enforce_private_link_endpoint_network_policies on the subnet containing the AKS cluster. The default value of this property is false, and since I’m not deploying any Private Endpoints into that subnet manually, I never touch that value or set it to true. But when you enable a private AKS cluster, Azure deploys a Private Endpoint for the AKS control plane into your subnet and flips this value to true, so the next time you make a change Terraform may complain with something like:

Terraform Plan Complaining about Changing a property on AKS Subnet

The solution is pretty simple — use the lifecycle meta argument for ignoring changes to attributes:

Lifecycle meta argument for AKS subnet
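In Terraform that looks roughly like this (the subnet’s own arguments are illustrative; only the lifecycle block is the point):

```hcl
resource "azurerm_subnet" "aks" {
  name                 = "aks-subnet"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.1.0.0/22"]

  lifecycle {
    # The private AKS control plane flips this flag behind our back;
    # tell Terraform not to treat that drift as a change to apply
    ignore_changes = [enforce_private_link_endpoint_network_policies]
  }
}
```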

Internal Load Balancer

Instead of a public Azure Load Balancer, we use an internal load balancer with an internal (private) IP address by applying the annotation service.beta.kubernetes.io/azure-load-balancer-internal: "true" to the service definition. Ours looks like this:

Internal Load Balancer
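A sketch of such a service definition is below. The service name, selector, and ports are illustrative assumptions; only the annotation and type matter:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: springboot-app
  annotations:
    # Ask the Azure cloud provider for an internal (private-IP) LB
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: springboot-app
```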

Network Contributor Role for the AKS cluster identity

You may also need to assign the AKS cluster identity the Network Contributor role on the resource group where the VNet resources are deployed. This is required so that, when you create the internal load balancer to expose your services, the AKS cluster is able to create a load balancer in the virtual network. For our use case, the following suffices:

NODE_GROUP=$(az aks show -g $AKS_RESOURCE_GROUP -n $AKS_CLUSTER_NAME --query nodeResourceGroup -o tsv)
NODES_RESOURCE_ID=$(az group show -n $NODE_GROUP -o tsv --query "id")
AKS_MANAGED_IDENTITY_ID=$(az aks show --resource-group $AKS_RESOURCE_GROUP --name $AKS_CLUSTER_NAME --query "identity" | jq -r .principalId)
az role assignment create --assignee $AKS_MANAGED_IDENTITY_ID --role "Network Contributor" --scope $NODES_RESOURCE_ID

One important caveat to remember: role assignment propagation for the cluster identity can sometimes take up to 60 minutes to come into effect. Yes, quite annoying, I agree! Once it’s there, you will be able to create the internal load balancer when you kubectl apply the above YAML file.

Able to Create Internal Load Balancer after Assigning Network Contributor Role to AKS Cluster Identity

Ingress Controllers

You can also use any of the popular open-source ingress controllers (Layer 7) such as NGINX, Traefik etc., or look at managed options (backed by an SLA) such as the Application Gateway Ingress Controller to act as ingress for your services.

Network Policies

You cannot use NSGs or other Azure networking controls to control traffic within the AKS cluster between pods & services (east-west traffic), so we use Kubernetes Network Policies instead. Since we are using the Azure CNI plugin, Network Policies are supported. Check out the official documentation to secure pod traffic and do things like deny inbound traffic to a certain namespace or restrict traffic from a certain namespace. I personally find Network Policies very useful for limiting network traffic based on namespaces and label selectors.
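As a concrete sketch, the following policy denies all inbound traffic to pods in a backend namespace except from namespaces carrying a given label. The namespace and label names are illustrative assumptions:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only
  namespace: backend
spec:
  podSelector: {}          # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress              # all ingress not explicitly allowed below is denied
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              team: frontend
```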


Agraj Mangal

Written by

Cloud Solution Architect @ Microsoft | Full Stack Dev | Big Data Enthusiast | Ex-Adobe | Cloud-Native Citizen | https://agrajmangal.in/blog/ | Opinions my own

Microsoft Azure

Any language. Any platform. Our team is focused on making the world more amazing for developers and IT operations communities with the best that Microsoft Azure can provide. If you want to contribute in this journey with us, contact us at medium@microsoft.com
