Making the case for Amazon Route 53 PHZ Gateway
This post proposes a new AWS resource, potentially called the “Amazon Route 53 PHZ Gateway”, to solve the challenges of associating Private Hosted Zones (PHZs) with Virtual Private Clouds (VPCs) in a large enterprise Domain Name System (DNS) architecture.
This post assumes that readers have a basic understanding of the following AWS services and topics:
- Amazon Route 53 Resolver
The Domain Name System is the bedrock of any IT networking architecture. It lets IT resources discover one another by logical name and translates those names into IP addresses, a process referred to as DNS resolution. Therefore, every AWS customer needs a scalable, resilient, manageable and evolvable DNS architecture.
AWS greatly simplifies DNS by providing the fully managed Route 53 Resolver (previously called the .2 DNS, because it is reachable at the base of the VPC IPv4 CIDR range plus two) in every VPC. It is a multi-tenant service that works out of the box and, in a single-VPC architecture, often requires no configuration or management from customers.
Managing DNS Across a Large Number of VPCs
Large enterprises often need to isolate workloads to satisfy requirements such as reducing the blast radius and enabling granular security or billing. This isolation can be achieved by deploying workloads in multiple AWS accounts and/or VPCs (often tens to hundreds). From a networking perspective, this is usually implemented as a hub-and-spoke architecture, in which a Hub VPC centralizes the common networking resources consumed by multiple Spoke VPCs.
Centralizing networking resources in a Hub VPC also simplifies communication between AWS and on-premises networks when a hybrid network is deployed. The architecture supports the following DNS resolution use cases:
- between customer VPCs (within the same account and across accounts)
- between customer VPCs and the on-premises network (and vice versa)
- between customer VPCs and AWS PrivateLink resources
Selecting the Right DNS Architecture
At AWS re:Invent 2019, Gavin McCullagh presented a session titled “Deep dive on DNS in the hybrid cloud”. The session provides excellent background for architecting a Route 53 based DNS solution. Toward the end of the session, Gavin outlined four possible DNS architectures, the pros and cons of each, and the AWS-recommended architecture for a large number of VPCs.
Share and Associate Architecture
Gavin’s fourth proposed DNS architecture, “Share and associate”, is prescribed as an AWS best practice because it is the most scalable and resilient DNS architecture for a large number of VPCs. Therefore, I only cover this architecture, which the AWS whitepaper “Hybrid Cloud DNS Options for Amazon VPC” also refers to as “Multi-Account Decentralized”.
This architecture requires associating every customer-defined PHZ with every customer VPC so that each VPC has a consistent view of DNS. All DNS queries are sent directly to the Route 53 Resolver in each VPC, which results in:
- local caching (at the .2 endpoint)
- complete Availability Zone isolation (i.e., DNS queries do not cross Availability Zone boundaries)
- low cost (PHZ sharing has no cost)
The cornerstone of this architecture is the customer’s ability to associate every PHZ with every VPC, as a full mesh, at scale.
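To make “a consistent view of DNS” concrete, here is a minimal sketch that checks an estate for gaps in the mesh, assuming the existing associations are available as (PHZ, VPC) pairs. The function name and the zone and VPC identifiers are hypothetical; real PHZs are identified by hosted zone IDs, and names are used here only for readability.

```python
from collections import defaultdict

def missing_associations(associations, phzs, vpcs):
    """Return {vpc: set of PHZs it cannot see} for a "Share and associate" estate.

    `associations` is an iterable of (phz, vpc) pairs that actually exist.
    An empty result means every VPC has the same, complete view of DNS.
    """
    visible = defaultdict(set)
    for phz, vpc in associations:
        visible[vpc].add(phz)
    gaps = {}
    for vpc in vpcs:
        missing = set(phzs) - visible[vpc]
        if missing:
            gaps[vpc] = missing
    return gaps

# Hypothetical estate: two PHZs, two VPCs, one association forgotten.
phzs = {"app1.corp.example", "app2.corp.example"}
vpcs = ["vpc-1", "vpc-2"]
actual = {("app1.corp.example", "vpc-1"), ("app2.corp.example", "vpc-1"),
          ("app1.corp.example", "vpc-2")}
print(missing_associations(actual, phzs, vpcs))
# {'vpc-2': {'app2.corp.example'}}
```

A single missing pair is enough to give one VPC a different view of DNS than the others, which is why the mesh has to be complete.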
Route 53 PHZ usage
AWS customers need to create Route 53 PHZs in the following use cases:
Custom domains
AWS assigns private DNS names to resources such as EC2 instances, for example ip-private-ipv4-address.ec2.internal (in us-east-1). Enterprise customers may instead want to use custom domains for the instances in each VPC. A custom domain can be created as a Route 53 PHZ, with an “A record” for each EC2 instance. Each PHZ is then shared and associated with all other VPCs so that names in the custom domains resolve everywhere, alongside the AWS-provided private DNS names.
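The A records for a custom-domain PHZ can be expressed as a Route 53 change batch. The sketch below only builds the request body, assuming a hypothetical zone `app1.corp.example` and made-up instance addresses; it could then be passed to an SDK call such as boto3's `route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`. The TTL of 300 seconds is an arbitrary choice for illustration.

```python
def a_record_change_batch(zone_name, instances, ttl=300):
    """Build a Route 53 ChangeBatch that UPSERTs one A record per instance.

    `instances` maps a host label (e.g. "web-1") to its private IPv4 address.
    """
    zone = zone_name.rstrip(".")
    changes = []
    for host, ip in sorted(instances.items()):
        changes.append({
            "Action": "UPSERT",  # create the record, or update it if it exists
            "ResourceRecordSet": {
                "Name": f"{host}.{zone}",
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
            },
        })
    return {"Comment": f"A records for {zone}", "Changes": changes}

# Hypothetical instances in a hypothetical custom domain.
batch = a_record_change_batch("app1.corp.example.",
                              {"web-1": "10.0.1.10", "db-1": "10.0.2.20"})
print([c["ResourceRecordSet"]["Name"] for c in batch["Changes"]])
# ['db-1.app1.corp.example', 'web-1.app1.corp.example']
```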
Based on this design, if a customer creates a new VPC with a PHZ (for a custom domain), the new PHZ must be shared and associated with all existing VPCs, and the new VPC must be associated with all existing PHZs. For example, a customer with 100 VPCs, each with its own custom domain, would need to manage 100 VPCs × 100 PHZs = 10,000 PHZ-VPC associations.
Note that 100 custom domains is not a common use case; the number is used only to illustrate the problems caused by a large number of associations.
Sharing PrivateLink Endpoints
Large enterprises with many AWS accounts and VPCs often want to centralize and share PrivateLink endpoints, such as Interface VPC Endpoints, to reduce cost.
The “Share and associate” architecture requires customers to create a PHZ and an “Alias Record” for every Interface VPC Endpoint and to share these PHZs with all VPCs. The steps to set up this design are described in the AWS blog post by James Devine.
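One way to express the alias record is as a Route 53 change batch with an `AliasTarget`. This is a sketch under stated assumptions, not necessarily the exact steps of the blog post referenced above: it assumes the PHZ is named after the AWS service endpoint (here `sqs.us-east-1.amazonaws.com`) and that the alias sits at the zone apex, pointing at the endpoint's DNS name and hosted zone ID as reported for the Interface VPC Endpoint. The endpoint DNS name and zone ID below are placeholders.

```python
def endpoint_alias_change_batch(service_dns_name, endpoint_dns_name, endpoint_zone_id):
    """ChangeBatch for an alias A record at the apex of a PHZ named after the
    AWS service endpoint, pointing at an Interface VPC Endpoint's DNS name."""
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": service_dns_name,
                "Type": "A",
                # Alias records carry no TTL; Route 53 resolves the target itself.
                "AliasTarget": {
                    "HostedZoneId": endpoint_zone_id,
                    "DNSName": endpoint_dns_name,
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    }

# Placeholder values for illustration only.
batch = endpoint_alias_change_batch(
    "sqs.us-east-1.amazonaws.com",
    "vpce-0123456789abcdef0-abcdefgh.sqs.us-east-1.vpce.amazonaws.com",
    "Z0000000EXAMPLE",
)
print(batch["Changes"][0]["ResourceRecordSet"]["AliasTarget"]["DNSName"])
```

Each such PHZ is then one more zone that has to be associated with every VPC.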
Based on this design, a customer using 50 Interface VPC Endpoints with 100 VPCs needs to set up 50 PHZs × 100 VPCs = 5,000 PHZ-VPC associations. Note that these associations are AWS Region-specific, so customers need to repeat the same process in every AWS Region the enterprise uses.
PHZ Sharing is Cumbersome
Most enterprise customers automate the creation of VPCs and Interface VPC Endpoints with tools such as AWS CloudFormation or AWS CLI scripts. PHZ association can also be part of this automation. Even then, adding or removing a single Interface VPC Endpoint or VPC triggers associations or disassociations across a large number of VPCs or PHZs. Therefore, such automation has a large misconfiguration blast radius.
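The blast radius of a single change can be quantified by diffing the full mesh before and after it. The sketch below assumes the estate is described as plain lists of hypothetical PHZ and VPC identifiers; it computes which pairs an automation pipeline would have to create or delete.

```python
from itertools import product

def mesh(phzs, vpcs):
    """All (PHZ, VPC) pairs in a full mesh."""
    return set(product(phzs, vpcs))

def mesh_delta(current_phzs, current_vpcs, target_phzs, target_vpcs):
    """Return (to_associate, to_disassociate) for moving between two meshes."""
    before = mesh(current_phzs, current_vpcs)
    after = mesh(target_phzs, target_vpcs)
    return after - before, before - after

# Hypothetical estate: 100 VPCs, each with its own custom-domain PHZ.
vpcs = [f"vpc-{i}" for i in range(100)]
phzs = [f"phz-{i}" for i in range(100)]

# Adding one VPC that brings its own PHZ.
add, remove = mesh_delta(phzs, vpcs, phzs + ["phz-new"], vpcs + ["vpc-new"])
print(len(add), len(remove))  # 201 0
```

One new VPC with one new PHZ already means 201 API-level associations (the new PHZ with all 101 VPCs, plus the new VPC with the 100 existing PHZs), each of which the automation must get right.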
AWS Multi Account PHZ Sharing
In a multi-account architecture, PHZ association is even more complicated because the API calls must be issued from multiple AWS accounts: the account that owns the PHZ must first authorize the association, and the account that owns the VPC must then associate the VPC with the PHZ, as explained here.
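The cross-account flow can be sketched as a plan of ordered Route 53 API calls, grouped by the account that has to issue them. This only builds the plan and does not call AWS; the account IDs and resource IDs are placeholders, and the VPC Region is omitted for brevity. Executing it would mean using an SDK such as boto3 (`create_vpc_association_authorization` and `associate_vpc_with_hosted_zone`, each taking `HostedZoneId` and a `VPC` dict with `VPCRegion` and `VPCId`) with credentials for each account.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ApiCall:
    account: str      # AWS account that must issue the call
    action: str       # Route 53 API action name
    hosted_zone: str
    vpc: str

def plan_association(phz, phz_owner, vpc, vpc_owner):
    """Ordered Route 53 API calls needed to associate one PHZ with one VPC."""
    calls = []
    if phz_owner != vpc_owner:
        # Cross-account: the PHZ owner must first authorize the VPC.
        calls.append(ApiCall(phz_owner, "CreateVPCAssociationAuthorization", phz, vpc))
    # The VPC owner (or the single owner of both) submits the association.
    calls.append(ApiCall(vpc_owner, "AssociateVPCWithHostedZone", phz, vpc))
    return calls

def plan_mesh(phz_owners, vpc_owners):
    """phz_owners / vpc_owners map a resource ID to its owning account ID."""
    plan = []
    for phz, phz_owner in phz_owners.items():
        for vpc, vpc_owner in vpc_owners.items():
            plan.extend(plan_association(phz, phz_owner, vpc, vpc_owner))
    return plan

phz_owners = {"phz-shared": "111111111111"}
vpc_owners = {"vpc-a": "111111111111", "vpc-b": "222222222222"}
for call in plan_mesh(phz_owners, vpc_owners):
    print(call.account, call.action, call.vpc)
# 111111111111 AssociateVPCWithHostedZone vpc-a
# 111111111111 CreateVPCAssociationAuthorization vpc-b
# 222222222222 AssociateVPCWithHostedZone vpc-b
```

Every cross-account pair doubles the number of calls and spreads them across two sets of credentials, which is what makes this automation hard to test.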
The Onus is on the Customer
Route 53 is a fully managed service, but PHZ-VPC association at scale is an error-prone process with a large misconfiguration blast radius, and the onus is on the customer to develop, test and operate that automation. A bug or misconfiguration in it can disrupt DNS across the customer’s entire AWS landscape.
The “Share and associate” architecture is resilient and scalable, and it checks the boxes for local caching, AZ isolation, minimal forwarding hops and minimal cost, but it is complex to set up and manage.
In any IT architecture, manageability is as important an architectural pillar as scalability, resilience and evolvability. Therefore, this architecture needs to be simplified.
Proposing Route 53 PHZ Gateway
PHZ-VPC association is a many-to-many relationship. This process could be simplified by introducing a logical resource such as a “Route 53 PHZ Gateway”.
With the proposed Route 53 PHZ Gateway, the many-to-many PHZ-VPC association is converted into two one-to-many associations: one between the gateway and its PHZs, and one between the gateway and its VPCs.
A customer can create a Route 53 PHZ Gateway and then attach all PHZs and VPCs to this gateway.
From the customer’s perspective, adding or removing a PHZ or a VPC requires only one association or disassociation: attaching the PHZ or VPC to, or detaching it from, the Route 53 PHZ Gateway. The gateway then performs the underlying many-to-many associations on behalf of the customer.
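The proposed behavior can be modeled in a few lines. This is a toy model of a resource that does not exist today: the class and method names are hypothetical and only illustrate how one attachment per PHZ or VPC could expand into the full mesh that the gateway would manage on the customer's behalf.

```python
from itertools import product

class PhzGateway:
    """Toy model of the proposed (hypothetical) Route 53 PHZ Gateway.

    Customers attach PHZs and VPCs one at a time; the gateway derives the
    underlying PHZ-VPC associations itself.
    """

    def __init__(self):
        self.phzs = set()
        self.vpcs = set()

    def attach_phz(self, phz):
        self.phzs.add(phz)

    def attach_vpc(self, vpc):
        self.vpcs.add(vpc)

    def detach_phz(self, phz):
        self.phzs.discard(phz)

    def detach_vpc(self, vpc):
        self.vpcs.discard(vpc)

    def customer_attachments(self):
        # What the customer manages: one attachment per PHZ or VPC.
        return len(self.phzs) + len(self.vpcs)

    def underlying_associations(self):
        # What the gateway manages on the customer's behalf: the full mesh.
        return set(product(self.phzs, self.vpcs))

gw = PhzGateway()
for i in range(3):
    gw.attach_vpc(f"vpc-{i}")
    gw.attach_phz(f"phz-{i}")
gw.attach_vpc("vpc-new")  # a single customer action...
print(gw.customer_attachments(), len(gw.underlying_associations()))  # 7 12
```

Attaching `vpc-new` is one call for the customer, while the gateway derives the three new underlying associations with every attached PHZ.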
With a Route 53 PHZ Gateway, let us recalculate the total number of associations the customer manages for the two use cases discussed above:
Custom Domains:
A customer using 100 VPCs with custom domains would need to manage only 100 VPCs + 100 PHZs = 200 Route 53 PHZ Gateway associations, as opposed to the 10,000 PHZ-VPC associations described above.
Sharing PrivateLink Endpoints:
A customer using 50 Interface VPC Endpoints with 100 VPCs would need to manage only 50 PHZs + 100 VPCs = 150 Route 53 PHZ Gateway associations, as opposed to the 5,000 PHZ-VPC associations described above.
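The two calculations above reduce to a product versus a sum. The short sketch below reproduces the figures used in this post; the gateway count is what the customer manages, while the gateway itself would still maintain the full product of underlying associations.

```python
def mesh_count(phz_count, vpc_count):
    # Today: every PHZ is associated with every VPC.
    return phz_count * vpc_count

def gateway_count(phz_count, vpc_count):
    # Proposed: each PHZ and each VPC is attached to the gateway once.
    return phz_count + vpc_count

use_cases = [("Custom domains", 100, 100), ("PrivateLink endpoints", 50, 100)]
for name, phzs, vpcs in use_cases:
    print(f"{name}: {mesh_count(phzs, vpcs):,} vs {gateway_count(phzs, vpcs)}")
# Custom domains: 10,000 vs 200
# PrivateLink endpoints: 5,000 vs 150
```

The gap widens as the estate grows: the customer's workload becomes linear instead of quadratic.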
Route 53 PHZ Gateway is a Logical Resource
Just as AWS Transit Gateway replaced the full mesh of VPC Peering connections with a hub-and-spoke approach, the Route 53 PHZ Gateway can simplify PHZ-VPC association. Unlike AWS Transit Gateway, however, the Route 53 PHZ Gateway is only a logical resource and not a compute resource in the query path (i.e., it does not process DNS queries).
For simplicity, the architecture diagram above does not show AWS account boundaries. Each VPC could belong to a separate AWS account, and the Route 53 PHZ Gateway would be shareable with other accounts in AWS Organizations through AWS Resource Access Manager (RAM) to allow associations in a multi-account architecture.
A Managed Service
The proposed Route 53 PHZ Gateway only simplifies PHZ-VPC association from the customer’s perspective. From AWS’s perspective, the underlying PHZ-VPC associations remain the same. With the proposed gateway, responsibility for managing a large number of PHZ-VPC associations shifts from customers to AWS, which makes the Route 53 PHZ Gateway a managed service.
Conclusion
The current process of Route 53 PHZ-VPC association does not scale, and the onus is on the customer to perform the associations correctly. Failure to do so can result in widespread DNS disruption. A managed association solution from AWS would remove the risk of such disruption and encourage customers to adopt the prescribed “Share and associate” architecture without that risk.