Multi-tenant and hybrid DNS with Azure Private DNS
This article covers how the Azure platform team handles registration and resolution of Azure Private Endpoints in a multi-tenant and hybrid DNS setup.
If you find yourself needing to handle multi-tenant Domain Name System (DNS) together with an on-premises environment, look no further. In this article I describe how we implemented multi-tenant and hybrid DNS at SpareBank 1, one of Norway’s largest financial institutions.
This article is one of several we are writing about our brand new Azure platform at SpareBank 1, which we call Eunomia. In simple terms, we are building a multi-tenant platform to fit the needs of the alliance.
We gave a presentation at Ignite 2022; watch it here: Spotlight on Norway | CLC08 — YouTube
Short background introduction
SpareBank 1 is an alliance of 13 banks and over 40 product companies. As individual legal entities, they each choose whether to collaborate in key areas such as IT operations and system development.
A large number of these banks and companies share an on-premises Active Directory (AD) environment. The on-premises AD uses AD Connect to synchronise users and groups to the companies’ own Azure AD tenants.
The challenge
I’m not going into why there are 13 tenants, each running its own workloads, but this is what drives our requirement for cross-tenant and hybrid DNS resolution.
The challenge is to support DNS across the whole architecture. DNS resolution needs to work in each tenant and from on-premises, both to Azure workloads (Key Vault, Storage, Web Apps, etc.) running in each tenant and to internal applications on-premises.
This wouldn’t be a challenge if we could leverage public DNS for everything, but we need to keep everything on a private network. Where applicable, developers must use Azure Private Link on Azure PaaS services that support it. This is a big challenge!
Requirements:
- Resolve private endpoint FQDNs in any tenant, from any tenant and from on-premises
- Automate registration of private endpoint FQDNs in Azure Private DNS zones
Take a look at this figure to understand the challenge a bit more.
Single tenant DNS
As you may understand from the figure above, DNS in this setting is a bit challenging. But let’s look at how we would do DNS in a single tenant.
Azure has a PaaS service called Azure Private DNS Zones, which is perfect for our use case. We can create the DNS zones we need and add records that resolve to the IPs of our workloads.
Using Azure Policy, we can automatically register private endpoint Fully Qualified Domain Names (FQDNs). Developers create their private endpoints, and after a couple of minutes the FQDNs are automatically registered in the associated Private DNS zone.
The figure below shows a simple architecture for DNS in a single tenant.
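As a sketch of what such a zone with a record looks like as an ARM template fragment (the zone name, record name, and IP address here are illustrative, not from our environment):

```json
{
  "resources": [
    {
      "type": "Microsoft.Network/privateDnsZones",
      "apiVersion": "2020-06-01",
      "name": "privatelink.blob.core.windows.net",
      "location": "global"
    },
    {
      "type": "Microsoft.Network/privateDnsZones/A",
      "apiVersion": "2020-06-01",
      "name": "privatelink.blob.core.windows.net/mystorageaccount",
      "dependsOn": [
        "[resourceId('Microsoft.Network/privateDnsZones', 'privatelink.blob.core.windows.net')]"
      ],
      "properties": {
        "ttl": 3600,
        "aRecords": [ { "ipv4Address": "10.0.1.4" } ]
      }
    }
  ]
}
```

In practice we don’t write records by hand; the policy-based automation described below does it for us.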
- Custom DNS on the VNets points to the central DNS servers hosted in the HUB VNet.
- The DNS servers (in the HUB VNet) forward all DNS requests to Azure’s own DNS service (the virtual IP 168.63.129.16) in their VNet, where the Azure recursive resolver takes the request and tries to resolve it.
- Since the Azure Private DNS zones are linked to the HUB VNet, the resolver can look up records in those zones.
- The magic sauce here is the Azure recursive resolver, which looks up the record in all available sources.
The automatic registration of a private endpoint FQDN is accomplished using Azure Policy. The policy targets all resources of type Microsoft.Network/privateEndpoints and deploys a resource of type Microsoft.Network/privateEndpoints/privateDnsZoneGroups on each private endpoint.
Microsoft has several resources available to create a deployment like this. See sources here:
Private Link and DNS integration at scale — Cloud Adoption Framework | Microsoft Learn
In the next section this architecture is expanded to work across multiple tenants together with an on-premises environment.
Multi-tenant and hybrid DNS
In this section I will explain in detail how we did multi-tenant and hybrid DNS at SpareBank 1.
HUB and spoke tenants
You have probably heard of hub and spoke topology related to Azure networking. We’re expanding on that where we introduce the concept of hub-tenant and spoke-tenants.
In the maze of all our tenants there is only one HUB-tenant and all other tenants are spoke-tenants. The HUB-tenant is used to centralize some services that can be consumed by the spoke tenants, such as DNS.
Azure Private DNS Zones
We’re using Azure Private DNS Zones to host records for all of our private endpoints, and we deploy a zone for each PaaS service we use.
In the figure below you can see a subscription called core-con. This is where we host all connectivity services, such as Azure Firewall, Azure Virtual WAN, DNS, and VPN to on-premises and third-party tenants. These workloads are only needed in the HUB-tenant; VNets in spoke tenants are peered to the HUB VNet.
We host the Azure Private DNS zones in the resource group hub-core-con-pdns-nea-rg. The acronyms stand for:
hub — core — connectivity — private dns — norway east — resource group
The Private DNS zones are VNet-linked to our virtual network hub-core-con-net-nea-vnet in resource group hub-core-con-net-nea-rg.
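A hedged sketch of what such a VNet link looks like as an ARM resource (the link name is my own invention; registrationEnabled is false because records are written by Azure Policy, not by VM auto-registration):

```json
{
  "type": "Microsoft.Network/privateDnsZones/virtualNetworkLinks",
  "apiVersion": "2020-06-01",
  "name": "privatelink.blob.core.windows.net/hub-vnet-link",
  "location": "global",
  "properties": {
    "registrationEnabled": false,
    "virtualNetwork": {
      "id": "[resourceId('hub-core-con-net-nea-rg', 'Microsoft.Network/virtualNetworks', 'hub-core-con-net-nea-vnet')]"
    }
  }
}
```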
Private Link and DNS registration in a multi-tenant environment
In this section I’ll go through how we manage the lifecycle of DNS records for private endpoints. The lifecycle must ensure that records are automatically created in the matching private DNS zone for the service being created. Since our Azure Private DNS zones live in the HUB-tenant, we need a way to write the spoke tenants’ private endpoint zone configurations to those centralised zones.
Writing a private endpoint zone configuration to a private DNS zone is fairly straightforward in a single tenant setup; we did that in the single tenant section above by leveraging Azure Policy to do the work for us. Take a look at the figure below to get an idea of what we want to accomplish, and keep in mind how we leveraged Azure Policy earlier to write the DNS zone configuration of a private endpoint to a private DNS zone.
In the single tenant design the policy assignment would deploy the zone configuration in the same tenant. In this multi-tenant design we need each spoke tenant to do the same as a single tenant, but instead of deploying to private dns zones in the same tenant, we need it to deploy to our centralised private dns zones in our HUB-tenant.
Reverse Azure Lighthouse concept
You have probably heard about Azure Lighthouse. It allows an identity in a managing tenant to have Azure role-based access control (RBAC) permissions in a delegated tenant. So what if we use this and let every spoke tenant become a managing tenant for our HUB-tenant, but with limited delegated permissions?
For an identity to write zone configuration to a private DNS zone, it needs the Private DNS Zone Contributor RBAC role. We create a managed identity in each spoke tenant and, using our reverse Lighthouse concept, assign it Private DNS Zone Contributor on the resource group in the HUB-tenant where our Private DNS zones live. The figure below shows the reverse Lighthouse concept.
The last, but most important, part is how we can now leverage Azure Policy in each spoke tenant to automatically register all Azure private endpoint FQDNs in the HUB Private DNS zones.
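A sketch of a Lighthouse registration definition that could implement this, deployed in the HUB-tenant (the parameter names and display name are illustrative; the roleDefinitionId shown is, to my knowledge, the built-in ID for Private DNS Zone Contributor, but verify it against the role definitions in your own tenant):

```json
{
  "type": "Microsoft.ManagedServices/registrationDefinitions",
  "apiVersion": "2019-09-01",
  "name": "[parameters('registrationDefinitionGuid')]",
  "properties": {
    "registrationDefinitionName": "spoke-dns-write",
    "description": "Allow a spoke tenant's managed identity to write private endpoint zone groups",
    "managedByTenantId": "[parameters('spokeTenantId')]",
    "authorizations": [
      {
        "principalId": "[parameters('spokeManagedIdentityObjectId')]",
        "principalIdDisplayName": "spoke-dns-policy-identity",
        "roleDefinitionId": "b12aa53e-6015-4669-85d0-8515ebb3ae7f"
      }
    ]
  }
}
```

The matching Microsoft.ManagedServices/registrationAssignments resource is then deployed at the scope being delegated, in our case the resource group holding the Private DNS zones.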
Azure Policy — Deploy if not exist — cross tenant
We deploy our Register private DNS Azure Policy definition to each spoke tenant and create assignments for each PaaS resource/group ID/region combination.
PolicyRule
"policyRule": {
  "if": {
    "allOf": [
      {
        "field": "type",
        "equals": "Microsoft.Network/privateEndpoints"
      },
      {
        "count": {
          "field": "Microsoft.Network/privateEndpoints/privateLinkServiceConnections[*]",
          "where": {
            "allOf": [
              {
                "field": "Microsoft.Network/privateEndpoints/privateLinkServiceConnections[*].privateLinkServiceId",
                "contains": "[parameters('privateLinkServiceId')]"
              },
              {
                "field": "Microsoft.Network/privateEndpoints/privateLinkServiceConnections[*].groupIds[*]",
                "equals": "[parameters('privateEndpointGroupId')]"
              }
            ]
          }
        },
        "greaterOrEquals": 1
      }
    ]
  },
The policy deploys, if it does not already exist (DeployIfNotExists, DINE), a resource of type Microsoft.Network/privateEndpoints/privateDnsZoneGroups.
"resources": [
  {
    "name": "[concat(parameters('privateEndpointName'), '/deployedByPolicy')]",
    "type": "Microsoft.Network/privateEndpoints/privateDnsZoneGroups",
    "apiVersion": "2022-05-01",
    "location": "[parameters('location')]",
    "properties": {
      "privateDnsZoneConfigs": [
        {
          "name": "privateDnsZone",
          "properties": {
            "privateDnsZoneId": "[parameters('privateDnsZoneId')]"
          }
        }
      ]
    }
  }
]
Because the managed identity in each spoke tenant has the Private DNS Zone Contributor RBAC role in the HUB-tenant, we only need to reference the resource ID of the Azure Private DNS zone in the policy assignment.
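For illustration, an assignment targeting blob storage private endpoints might pass parameter values along these lines (the subscription ID is a placeholder, and the values are my example, not our exact assignment):

```json
{
  "parameters": {
    "privateLinkServiceId": { "value": "Microsoft.Storage/storageAccounts" },
    "privateEndpointGroupId": { "value": "blob" },
    "privateDnsZoneId": {
      "value": "/subscriptions/<hub-subscription-id>/resourceGroups/hub-core-con-pdns-nea-rg/providers/Microsoft.Network/privateDnsZones/privatelink.blob.core.windows.net"
    }
  }
}
```

Note that the policy rule uses a contains match on privateLinkServiceId, so passing the resource type is enough to match any storage account’s private endpoint.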
DNS Configuration
The figure below shows an overview of how DNS is configured on-premises, in spoke VNets (cross-tenant), and on the HUB DNS server.
When setting up conditional forwarders from on-premises to the DNS servers in Azure, I recommend starting with just the few zones you are currently using. Don’t configure the whole list of public DNS zones that Microsoft publishes here: Azure Private Endpoint DNS configuration | Microsoft Learn
Closing Notes
With this configuration, the benefits of the cloud are clear. We can set up any PaaS service with Private Link in any of our tenants and get full automation (including lifecycle management) of that private endpoint’s DNS records. Developers don’t need to think about it when building their systems, and the overhead for the Azure platform team is very low. This works brilliantly for us!
When we designed and deployed this, Azure DNS Private Resolver was still in preview. We’re looking into moving from VMs to that PaaS solution, which will contribute greatly to a more resilient setup.
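A rough sketch of what the resolver could look like as ARM resources, assuming our naming convention (the resolver and subnet names are my guesses, not our actual deployment):

```json
{
  "resources": [
    {
      "type": "Microsoft.Network/dnsResolvers",
      "apiVersion": "2022-07-01",
      "name": "hub-core-con-dnsres-nea",
      "location": "norwayeast",
      "properties": {
        "virtualNetwork": {
          "id": "[resourceId('Microsoft.Network/virtualNetworks', 'hub-core-con-net-nea-vnet')]"
        }
      }
    },
    {
      "type": "Microsoft.Network/dnsResolvers/inboundEndpoints",
      "apiVersion": "2022-07-01",
      "name": "hub-core-con-dnsres-nea/inbound",
      "location": "norwayeast",
      "dependsOn": [
        "[resourceId('Microsoft.Network/dnsResolvers', 'hub-core-con-dnsres-nea')]"
      ],
      "properties": {
        "ipConfigurations": [
          {
            "subnet": {
              "id": "[resourceId('Microsoft.Network/virtualNetworks/subnets', 'hub-core-con-net-nea-vnet', 'snet-dns-inbound')]"
            }
          }
        ]
      }
    }
  ]
}
```

On-premises conditional forwarders would then point at the inbound endpoint’s IP instead of the DNS server VMs.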
We’ve had this in production for a couple of months now and we’re experiencing a couple of challenges:
- Azure Static Web Apps has a partition ID in its Private DNS zone name, and it is not documented which partition IDs are possible. This makes it difficult to pre-provision the Private DNS zones and to create policy assignments that target the correct zone. See issues #101133 and #99388
- Azure Machine Learning workspaces create several records utilizing two Private DNS zones. Our Azure Policy only handles one of the zones, leaving us to handle the second manually. With some additional work on the policy I’m sure it’s possible to make it work. We have published a GitHub issue about it here: #99388