AWS Transit Gateway — asymmetric routing, shared services VPC and beyond

Vahe Sahakyan · Published in The Startup · Jul 8, 2019 · 12 min read

In this article I want to discuss the new possibilities opened up by Transit Gateway and Route53 Resolver for multi-VPC environments. We will start with a small example of a multi-VPC architecture and extend it to include a shared NAT service and shared VPC interface endpoints. These examples can be easily extended or adapted to many variations and requirements.

We will begin by defining the requirements for our VPCs. Suppose we want to have a set of private VPCs and a transit VPC (in fact it is more of a “shared services” VPC, but I will use the name “transit” to highlight its relation to the transit gateway). The routing constraints are the following:

  • Since the transit VPC provides shared services for all private VPCs, communication between them should be possible.
  • Private VPCs shouldn’t be able to reach each other by default; we want full control over this, with the possibility of enabling cross-private-VPC access on demand.

Having our initial conditions set, the example setup discussed below consists of 3 private VPCs and a transit VPC. This allows us to cover all possible routing scenarios: the transit VPC is accessible by all private VPCs, VPC1 and VPC2 can access each other, and VPC3 is isolated.

All VPCs can reach the transit VPC, VPC1 and VPC2 can communicate with each other, VPC3 is isolated.

Now let’s have a look at what tools and options AWS Transit Gateway provides us. The resources we will work with are:

  • Transit gateway
  • Transit gateway attachment (VPC, VPN and DirectConnect)
  • Transit gateway route table
  • Attachment association with route table
  • And the routes pointing to attachments

The way all this works together is as follows. Everything starts with the creation of a transit gateway. The transit gateway can be shared with other AWS accounts. VPCs and VPNs can be attached to the transit gateway, which gives each of them a unique attachment ID. Multiple route tables can be created in the transit gateway, but an attachment can be associated with only one route table. When traffic from an attachment (a.k.a. VPC) reaches the transit gateway, the associated route table is used for the routing decision. Finally, we have routes in the route table, which can point to any attachment of the transit gateway, even if that attachment is not associated with the route table. This last feature is what makes asymmetric routing possible with a transit gateway.

Coming back to our example: how can we build this setup? If we use a single route table and connect all VPCs together, the isolation of VPC3 will be violated, so a star configuration is not going to work. The solution that satisfies all our constraints from the transit gateway routing perspective is the following: for each attachment we create a separate transit gateway route table and associate the attachment with it. This effectively gives us the possibility of making distinct, separate routing decisions for traffic received from every attachment (a.k.a. VPC). Now we only need to carefully set up routes to all desired destinations for each VPC. And remember: if, for example, VPC1 and VPC2 communication should be enabled, then the route table of VPC1 must have a route to VPC2 and, vice versa, the route table of VPC2 must have a route to VPC1. On a diagram it looks like this:

  • RED — Attachment association.
  • GREEN — Traffic flow from transit VPC to private VPCs.
  • BLUE — Traffic flow from private VPCs to transit VPC.
  • BLACK — Traffic flow between private VPCs.

The architecture above solves our initial example. By creating an individual route table for each attachment, it is possible to blend any kind of mesh and star-like sub-networks together.

Enough of the theory, let’s get our hands dirty and implement this.

Coding & testing

As usual, Terraform is our automation buddy. The code structure is the following: we will create a VPC module that is used to deploy all 4 VPCs, while the transit gateway and routing are defined at the root level.

First, the vpc module. I will try to keep it as small as possible, but some code will still be needed. The module will include:

  • The VPC, a private subnet and routes to the transit gateway.
  • The transit gateway attachment, route table and association.
  • VPC endpoints to access the test instance via AWS SSM (without internet access).
  • An IAM role for the test instance and the test instance itself.

This module is kind of generic and has some variables that will not be used right now, but we will need them when extending the example. The test instances are going to be accessible via the SSM service and will have an x.y.z.10 IP address assigned.
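
A rough sketch of the module is below; the resource layout, the routes variable, the CIDR math and the hard-coded eu-central-1 region are illustrative, and the test instance with its IAM role is only hinted at in a comment.

# modules/vpc/main.tf (sketch)
variable "name" {}
variable "cidr" {}
variable "transit_gateway_id" {}
variable "attch_subnet" { default = "" }                                 # optional subnet ID to use for the TGW attachment
variable "routes" { default = ["0.0.0.0/0"] }                            # CIDRs routed from the private subnet to the TGW
variable "endpoints" { default = ["ssm", "ssmmessages", "ec2messages"] } # SSM interface endpoints

resource "aws_vpc" "this" {
  cidr_block           = var.cidr
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags                 = { Name = var.name }
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.this.id
  cidr_block = cidrsubnet(var.cidr, 2, 0)
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.this.id
}

resource "aws_route_table_association" "private" {
  subnet_id      = aws_subnet.private.id
  route_table_id = aws_route_table.private.id
}

resource "aws_route" "to_tgw" {
  count                  = length(var.routes)
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = var.routes[count.index]
  transit_gateway_id     = var.transit_gateway_id
  depends_on             = [aws_ec2_transit_gateway_vpc_attachment.this]
}

# Attach the VPC to the transit gateway and give it its own TGW route table.
resource "aws_ec2_transit_gateway_vpc_attachment" "this" {
  transit_gateway_id                              = var.transit_gateway_id
  vpc_id                                          = aws_vpc.this.id
  subnet_ids                                      = [var.attch_subnet != "" ? var.attch_subnet : aws_subnet.private.id]
  transit_gateway_default_route_table_association = false
  transit_gateway_default_route_table_propagation = false
}

resource "aws_ec2_transit_gateway_route_table" "this" {
  transit_gateway_id = var.transit_gateway_id
}

resource "aws_ec2_transit_gateway_route_table_association" "this" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.this.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.this.id
}

# Interface endpoints so the test instance is reachable via SSM without internet access.
# (A security group allowing HTTPS from the VPC CIDR, the instance profile and the
# x.y.z.10 test instance itself are omitted here for brevity.)
resource "aws_vpc_endpoint" "this" {
  count               = length(var.endpoints)
  vpc_id              = aws_vpc.this.id
  service_name        = "com.amazonaws.eu-central-1.${var.endpoints[count.index]}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.private.id]
  private_dns_enabled = true
}

output "vpc_id"                         { value = aws_vpc.this.id }
output "private_subnet_id"              { value = aws_subnet.private.id }
output "attachment_id"                  { value = aws_ec2_transit_gateway_vpc_attachment.this.id }
output "transit_gateway_route_table_id" { value = aws_ec2_transit_gateway_route_table.this.id }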

Now let’s use the module to create VPCs and also create the transit gateway.
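
At the root level, something along these lines will do; the module path and the 10.100.x.0/24 CIDRs are illustrative (chosen to match the IP addresses that show up later).

# main.tf (sketch)
provider "aws" {
  region = "eu-central-1"
}

resource "aws_ec2_transit_gateway" "main" {
  description = "shared transit gateway"

  # Route tables and associations are managed explicitly, per attachment.
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
}

module "transit" {
  source             = "./modules/vpc"
  name               = "transit"
  cidr               = "10.100.0.0/24"
  transit_gateway_id = aws_ec2_transit_gateway.main.id

  # The transit VPC routes the private VPC CIDRs (not a default route) to the TGW.
  routes = ["10.100.1.0/24", "10.100.2.0/24", "10.100.3.0/24"]
}

module "vpc1" {
  source             = "./modules/vpc"
  name               = "vpc1"
  cidr               = "10.100.1.0/24"
  transit_gateway_id = aws_ec2_transit_gateway.main.id
}

module "vpc2" {
  source             = "./modules/vpc"
  name               = "vpc2"
  cidr               = "10.100.2.0/24"
  transit_gateway_id = aws_ec2_transit_gateway.main.id
}

module "vpc3" {
  source             = "./modules/vpc"
  name               = "vpc3"
  cidr               = "10.100.3.0/24"
  transit_gateway_id = aws_ec2_transit_gateway.main.id
}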

At this stage we have the VPCs, the transit gateway, the per-VPC transit gateway route tables, and all required attachments and associations ready. The last missing piece is transit routing. We will use some Terraform sorcery to keep it short.
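
One way to write it, assuming the module exposes its attachment ID and transit gateway route table ID as outputs like in the sketch above, is a flat list of route definitions driving a single counted resource:

# routing.tf (sketch): one entry per (source VPC route table, destination) pair
locals {
  routes = [
    # transit VPC can reach every private VPC
    { rtb = module.transit.transit_gateway_route_table_id, dst = "10.100.1.0/24", att = module.vpc1.attachment_id },
    { rtb = module.transit.transit_gateway_route_table_id, dst = "10.100.2.0/24", att = module.vpc2.attachment_id },
    { rtb = module.transit.transit_gateway_route_table_id, dst = "10.100.3.0/24", att = module.vpc3.attachment_id },

    # VPC1 can reach the transit VPC and VPC2
    { rtb = module.vpc1.transit_gateway_route_table_id, dst = "10.100.0.0/24", att = module.transit.attachment_id },
    { rtb = module.vpc1.transit_gateway_route_table_id, dst = "10.100.2.0/24", att = module.vpc2.attachment_id },

    # VPC2 can reach the transit VPC and VPC1
    { rtb = module.vpc2.transit_gateway_route_table_id, dst = "10.100.0.0/24", att = module.transit.attachment_id },
    { rtb = module.vpc2.transit_gateway_route_table_id, dst = "10.100.1.0/24", att = module.vpc1.attachment_id },

    # VPC3 can reach only the transit VPC
    { rtb = module.vpc3.transit_gateway_route_table_id, dst = "10.100.0.0/24", att = module.transit.attachment_id },
  ]
}

resource "aws_ec2_transit_gateway_route" "this" {
  count                          = length(local.routes)
  transit_gateway_route_table_id = local.routes[count.index].rtb
  destination_cidr_block         = local.routes[count.index].dst
  transit_gateway_attachment_id  = local.routes[count.index].att
}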

So this is it from the terraforming side for now. We can run terraform apply to deploy it and start pinging around to see if the desired setup has been achieved.

First, let’s check if the transit VPC can reach the private VPCs. Let’s go to the AWS SSM console and start a session with the instance in the transit VPC.

All 3 VPCs are reachable from the transit VPC. Now let’s check that VPC1 can reach VPC2 but not VPC3. Start an SSM session with the test instance in VPC1.

This also works as expected: the transit VPC and VPC2 are reachable, but VPC3 is not. VPC2 and VPC3 also behave as expected; you can verify it yourself, I will not make this any longer by adding more screenshots.

Transit VPC as a Default Route

Now let’s go one step further and discuss a situation where we want the transit VPC to act as the default gateway for the private VPCs. The reason for doing this is that we eventually want to set up a shared NAT gateway in the transit VPC and use it to provide internet access to the private VPCs.

Let’s make the relevant changes in the routing.tf file. We will need to change the local.routes variable for that.
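
With the illustrative local.routes list from the sketch above, the updated version looks like this:

locals {
  routes = [
    # transit VPC can reach every private VPC (unchanged)
    { rtb = module.transit.transit_gateway_route_table_id, dst = "10.100.1.0/24", att = module.vpc1.attachment_id },
    { rtb = module.transit.transit_gateway_route_table_id, dst = "10.100.2.0/24", att = module.vpc2.attachment_id },
    { rtb = module.transit.transit_gateway_route_table_id, dst = "10.100.3.0/24", att = module.vpc3.attachment_id },

    # Private VPCs now point a default route at the transit VPC attachment.
    { rtb = module.vpc1.transit_gateway_route_table_id, dst = "0.0.0.0/0",     att = module.transit.attachment_id },
    { rtb = module.vpc1.transit_gateway_route_table_id, dst = "10.100.2.0/24", att = module.vpc2.attachment_id },

    { rtb = module.vpc2.transit_gateway_route_table_id, dst = "0.0.0.0/0",     att = module.transit.attachment_id },
    { rtb = module.vpc2.transit_gateway_route_table_id, dst = "10.100.1.0/24", att = module.vpc1.attachment_id },

    { rtb = module.vpc3.transit_gateway_route_table_id, dst = "0.0.0.0/0",     att = module.transit.attachment_id },
  ]
}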

We have changed the dst attribute of each private VPC’s route toward the transit VPC to 0.0.0.0/0. As soon as we apply these changes, we will notice that suddenly VPC1 and VPC2 can reach VPC3!

Let’s have a look at how this happens. Assume traffic is coming from VPC1 with VPC3 as its destination. Because the transit VPC is now the default gateway, that traffic is forwarded to the transit VPC. When the traffic enters the transit VPC, the route table of the associated subnet is used to decide where it goes next. If that route table contains routes to both VPC1 and VPC3, it will happily send the traffic back to the transit gateway, and since the transit VPC can reach all VPCs, the traffic will be forwarded to VPC3, violating our routing constraints.

To solve this, we need two VPC route tables in the transit VPC, and therefore two subnets. The diagram below shows how it can be constructed:

Traffic from private VPCs (BLUE) ends up in the transit subnet route table, and since that table contains only the local route, it can reach only IPs inside the transit VPC. On the other hand, if traffic is initiated from the private subnet of the transit VPC, it goes through the private subnet route table, is forwarded to the transit gateway (GREEN) and reaches the desired private VPC. This setup ensures that cross-VPC traffic is controlled only by transit gateway routes. In fact, this is another use-case of solving a security problem by introducing asymmetric routing into the setup.

Let’s code this into our example. To keep the code simple, we will add the new subnet and related resources at the root level. The transit subnet code is the following:
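
(Sketch only; the subnet CIDR below is an assumed slice of the transit VPC range.)

# transit.tf (sketch): extra subnet in the transit VPC, used only for the TGW attachment
resource "aws_subnet" "transit" {
  vpc_id     = module.transit.vpc_id
  cidr_block = "10.100.0.192/26" # illustrative
}

# Route table with only the implicit local route: traffic arriving from the
# transit gateway can reach the transit VPC itself, but nothing beyond it.
resource "aws_route_table" "transit" {
  vpc_id = module.transit.vpc_id
}

resource "aws_route_table_association" "transit" {
  subnet_id      = aws_subnet.transit.id
  route_table_id = aws_route_table.transit.id
}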

And also add the attch_subnet attribute to the transit VPC module in order to switch the subnet associated with the transit gateway attachment:
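
(This assumes the attch_subnet variable from the earlier module sketch.)

module "transit" {
  source             = "./modules/vpc"
  name               = "transit"
  cidr               = "10.100.0.0/24"
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  routes             = ["10.100.1.0/24", "10.100.2.0/24", "10.100.3.0/24"]

  # Attach the transit gateway to the dedicated transit subnet instead of the private subnet.
  attch_subnet = aws_subnet.transit.id
}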

We can apply these changes now and make sure that routing works as it did before introducing the default routes.

The traffic routing aspect seems to be covered; now let’s dig into shared services and discuss two AWS services that can be used in this context.

Shared NAT Gateway

One of the nice features of this setup is that we can centrally deploy some common services and make them available to the private VPCs. If your private VPCs are not heavy internet users, it makes sense to have a shared NAT gateway for all of them.

Let’s make a rough estimate of the costs. Depending on the region (eu-central-1 in this case), a NAT gateway will cost you around $37.44 per month plus traffic costs of $0.052 per GB. On the other hand, the transit gateway (under the assumption that you already have it, so only the traffic price matters) will cost $0.02 per GB.

Based on these prices we can calculate the monthly traffic breakpoint up to which the shared NAT is cheaper: every private VPC that shares the NAT saves one NAT gateway’s monthly fee but pays the transit gateway price per GB for its traffic, so the breakpoint is (VPCCount - 1) * NATPricePerMonth / TransitPricePerGB = 2 * 37.44 / 0.02 = 3744 GB.

For our 3-VPC example, if all VPCs together send less than 3744 GB of traffic to the internet per month, it is cheaper to use a shared NAT gateway. In many use-cases your EC2 instances only need internet access for downloading packages or updates, so they are not going to hit this limit any time soon. Moreover, many distributions have S3-backed repositories, so by enabling an S3 VPC endpoint your internet traffic will be drastically reduced. But of course, always do the calculations for your particular setup.

The architecture of the transit VPC will then look like this:

The routes from the public and private subnets are not drawn here to keep the diagram simple. Just keep in mind that the “VPCx CIDR” routes point to the transit gateway.

Now let’s include it in our example setup. This requires some additional resources: we will need to add a public subnet, an internet gateway, a NAT gateway and the related routing to the transit VPC, and set up routes to the private VPCs.
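
Sketched out, the additions could look like this (CIDRs and resource names are again illustrative):

# nat.tf (sketch): shared NAT gateway in the transit VPC
resource "aws_internet_gateway" "main" {
  vpc_id = module.transit.vpc_id
}

resource "aws_subnet" "public" {
  vpc_id     = module.transit.vpc_id
  cidr_block = "10.100.0.64/26" # illustrative
}

resource "aws_route_table" "public" {
  vpc_id = module.transit.vpc_id
}

resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route" "public_internet" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.main.id
}

# Return path: the public subnet must reach the private VPCs via the transit gateway.
resource "aws_route" "public_to_vpcs" {
  count                  = 3
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "10.100.${count.index + 1}.0/24"
  transit_gateway_id     = aws_ec2_transit_gateway.main.id
}

resource "aws_eip" "nat" {
  vpc = true
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id
  depends_on    = [aws_internet_gateway.main]
}

# Traffic arriving from the private VPCs (transit subnet) is sent to the NAT gateway.
resource "aws_route" "transit_default" {
  route_table_id         = aws_route_table.transit.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.main.id
}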

Apply the new changes and boom! We have internet in the private VPCs:

But unfortunately we have more than internet. Since the NAT gateway will unconditionally NAT any traffic from any source to any destination, it creates a security hole in our setup, making it possible to reach VPC3 from VPC1 and VPC2.

Let’s run tcpdump during a ping on the test instance in VPC3:

The source IP is 10.100.0.126, which is in fact the private IP address of the NAT gateway. We need to deal with this somehow, and the best place where we can deny this traffic is the network ACL of the transit VPC, specifically for incoming traffic with source IP 10.100.0.126. This will deny any such internal NATing. The Terraform code for that is:
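
(A minimal sketch: a dedicated network ACL on the TGW attachment subnet that denies anything sourced from the NAT gateway’s private IP and allows everything else; rule numbers are arbitrary.)

resource "aws_network_acl" "transit" {
  vpc_id     = module.transit.vpc_id
  subnet_ids = [aws_subnet.transit.id]
}

# Drop packets re-entering the attachment subnet with the NAT gateway as source.
resource "aws_network_acl_rule" "deny_nat_hairpin" {
  network_acl_id = aws_network_acl.transit.id
  rule_number    = 100
  egress         = false
  protocol       = "-1"
  rule_action    = "deny"
  cidr_block     = "${aws_nat_gateway.main.private_ip}/32"
}

resource "aws_network_acl_rule" "allow_ingress" {
  network_acl_id = aws_network_acl.transit.id
  rule_number    = 200
  egress         = false
  protocol       = "-1"
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
}

resource "aws_network_acl_rule" "allow_egress" {
  network_acl_id = aws_network_acl.transit.id
  rule_number    = 100
  egress         = true
  protocol       = "-1"
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
}

Since network ACLs are stateless, the broad allow rules are needed so that normal request and reply traffic keeps flowing; only packets sourced from the NAT gateway’s private IP are singled out and dropped.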

To recap: we’ve used the transit VPC as the default gateway of the private VPCs, deployed a shared NAT gateway which can provide some cost savings, and still satisfied the isolation and cross-VPC reachability requirements.

Shared VPC Interface Endpoints

VPC interface endpoints are a great way of making AWS services available within private VPCs. The way it works involves two components: an internally deployed NLB and DNS overriding for the VPC. Suppose you want to enable an endpoint for the EC2 service. For your particular region that service is available under the DNS name ec2.{region}.amazonaws.com. When the endpoint for that service is enabled, you need to pick a set of subnets from each availability zone. In each of those subnets AWS creates an elastic network interface, which is internally connected to an NLB that points to the EC2 service. Any HTTPS traffic sent to those ENIs is forwarded to the actual EC2 service, and by that your private VPC is able to access the service using only internal IPs. But that’s only half of the story: in order for this to work transparently, your VPC’s Amazon Provided DNS servers will resolve ec2.{region}.amazonaws.com to one of the endpoint ENI IP addresses.

How can we deploy interface endpoints centrally, and does it make sense at all? Let’s first understand the “how” part. We can deploy all the endpoints we need in the transit VPC, but that alone is not enough, because the private VPCs will still resolve the service names to the public IP addresses of the actual services. That can be solved by using the Route53 Resolver service. We can create an outbound Route53 resolver endpoint in the transit VPC, create forwarding rules for each endpoint’s DNS name with the transit VPC’s DNS resolver as the target IP, and attach those rules to the private VPCs. On a diagram it looks like this:

When an EC2 instance wants to resolve, for example, ec2.eu-central-1.amazonaws.com, it sends a DNS query to the Amazon Provided DNS (10.100.1.2). Because we have a Route53 resolver rule for that domain, the DNS query is forwarded via the Route53 endpoint to the Amazon Provided DNS (10.100.0.2) in the transit VPC. In the transit VPC we have the VPC interface endpoint enabled, so the name resolves to a private IP address belonging to the transit VPC, and the EC2 instance gets that private IP address as the result (via the BLUE path). After that, it’s a matter of transit routing to access the service (the BLACK path).

Before implementing it, let’s understand the pricing aspect and when it makes sense to use (all prices are for the Frankfurt region). AWS charges $0.012 per endpoint ENI per hour, which is approx. $8.64 per month for a single availability zone. On the other hand, the Route53 outbound resolver costs $0.125 per hour per ENI, and AWS requires at least 2 of them, so it is approx. $180 per month. We can estimate for how many private VPCs and endpoints the centralized setup becomes cheaper using the expression k > (180 + n * 8.64) / (n * 8.64), where k is the number of VPCs and n is the number of interface endpoints. For example, for our 3-endpoint case it only becomes cheaper from 8 private VPCs onward. But if we have 3 endpoints in 3 availability zones, it is already cheaper starting from 4 private VPCs. Note that this is only a rough estimate; traffic and DNS query prices are not included in the calculation.

These calculations are only valid if we want to use the Route53 resolver solely for centralizing interface endpoints. In practice, if you have an on-prem internal DNS system, you will most probably end up using a Route53 outbound resolver anyway, so bringing the interface endpoints into a central location will be a nice extra benefit and provide an overall cost reduction.

You need to be careful when centralizing endpoints for services that have resource-based policies, like API Gateway. Usually, private API Gateways have a resource policy allowing traffic from a specific VPC endpoint. By centralizing it, you will need to update those policies to allow traffic from the endpoint in the transit VPC, which might introduce a security hole. So make sure to put additional restrictions in the policy and allow traffic only from the CIDR blocks of your private VPCs.

Let’s migrate our existing setup to use central endpoints for ec2messages, ssm and ssmmessages. We already have those endpoints deployed in our transit VPC. What we need now is to set up the Route53 resolver and rules, attach them to the private VPCs and remove the existing endpoints from the private VPCs.
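
A sketch of that setup, reusing the subnets and module outputs from the earlier snippets (the service names assume eu-central-1, and the security group and resource layout are illustrative):

# dns.tf (sketch): outbound resolver endpoint in the transit VPC plus forwarding rules
resource "aws_security_group" "resolver" {
  vpc_id = module.transit.vpc_id

  # The outbound endpoint only needs to reach the target DNS server.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_route53_resolver_endpoint" "outbound" {
  name               = "transit-outbound"
  direction          = "OUTBOUND"
  security_group_ids = [aws_security_group.resolver.id]

  # AWS requires at least two IP addresses for a resolver endpoint.
  ip_address {
    subnet_id = module.transit.private_subnet_id
  }

  ip_address {
    subnet_id = aws_subnet.transit.id
  }
}

locals {
  endpoint_domains = [
    "ssm.eu-central-1.amazonaws.com",
    "ssmmessages.eu-central-1.amazonaws.com",
    "ec2messages.eu-central-1.amazonaws.com",
  ]

  private_vpc_ids = [module.vpc1.vpc_id, module.vpc2.vpc_id, module.vpc3.vpc_id]

  # Every (domain index, VPC) combination, i.e. n*m rule associations.
  rule_associations = setproduct(range(length(local.endpoint_domains)), local.private_vpc_ids)
}

# One forwarding rule per service domain, pointing at the transit VPC's Amazon Provided DNS.
resource "aws_route53_resolver_rule" "forward" {
  count                = length(local.endpoint_domains)
  domain_name          = local.endpoint_domains[count.index]
  rule_type            = "FORWARD"
  resolver_endpoint_id = aws_route53_resolver_endpoint.outbound.id

  target_ip {
    ip = "10.100.0.2" # transit VPC CIDR base + 2
  }
}

# Attach every rule to every private VPC.
resource "aws_route53_resolver_rule_association" "private" {
  count            = length(local.rule_associations)
  resolver_rule_id = aws_route53_resolver_rule.forward[local.rule_associations[count.index][0]].id
  vpc_id           = local.rule_associations[count.index][1]
}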

With the code above, we create the resolver endpoint and the forwarding rules, and attach the rules to all private VPCs. We can create the n*m rule associations very easily here, thanks to the setproduct function of Terraform 0.12.

And as a final step, let’s remove the existing endpoints from the private VPCs. We do that by setting the endpoints attribute of the vpc module to an empty list.
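
For example, for VPC1 (VPC2 and VPC3 get the same one-line change; this assumes the endpoints variable from the module sketch):

module "vpc1" {
  source             = "./modules/vpc"
  name               = "vpc1"
  cidr               = "10.100.1.0/24"
  transit_gateway_id = aws_ec2_transit_gateway.main.id

  # Per-VPC interface endpoints are no longer needed; DNS now resolves to the
  # shared endpoints in the transit VPC.
  endpoints = []
}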

The endpoints = [] line was added to each private VPC module block. We can apply these changes now and see if it works.

As shown above, all 3 service names resolve to IP addresses from the transit VPC, and the fact that I was still able to access the instances via the SSM service means that it works.

Limits

  • The number of transit gateway route tables is 20, but the limit can be increased.
  • The number of routes in a transit gateway route table is 10000, which should be enough :)
  • The number of routes in a subnet route table is 50, but the limit can be increased to 1000.

Conclusion

The transit gateway introduces a new range of possibilities for constructing networks in AWS that were previously quite hard to accomplish and in some cases impossible. The asymmetry of transit routing is a very flexible tool, but it is also a fairly complex one; even slight mistakes might allow traffic to reach destinations it was not supposed to reach, as we saw in the default route discussion. The two relatively non-trivial examples of shared services give a feeling of how far-reaching the new possibilities are and how, using only AWS services, architectures can be optimized to reduce costs (at least for some use-cases :)).

The examples were built in a single AWS account, but they can easily be extended to multi-account scenarios using AWS RAM (Resource Access Manager).

Thanks for reading.
