I was aiming for managed AWS services that could give API Gateway a secure way to access an API running on EC2, while keeping the API safe from everything else. A NAT Gateway costs too much for the time being. With a private VPC and VPC links, my API instances could access and install software packages, but couldn’t reach remote Docker images. A private subnet with an Egress Only Internet Gateway only supports IPv6, and ECR doesn’t support IPv6 yet, so still no Docker image downloads. The end result: a tiny NAT instance, until there’s enough traffic to warrant a managed NAT Gateway.
Here’s what I set out to achieve:
- My backend API should be secured in either:
  - a private VPC, or at the very least
  - a private subnet
- My API Gateway should be the only way to use my backend API
- Added cost should be < $10/mo
Seems like a fairly simple ask?
The first problem I found was that the API needed to be publicly accessible for API Gateway to HTTP-proxy to it (see Can Amazon API Gateway work within an Amazon VPC?). This is because there’s no CIDR range you can use to lock a subnet down to API Gateway, likely because it can sit behind a CloudFront distribution. I’m not keen on the API being publicly accessible, so I looked into what else I could do to secure it.
A private VPC with VPC Links
The solution that quickly came up was to use a VPC link from API Gateway to the VPC, and what’s more, maybe the VPC could be private, with no public subnets. So I set about getting that done. This requires a Network Load Balancer (NLB) in front of the EC2 instances, and a link between the NLB in the VPC and API Gateway. I’m already paying for a load balancer, so the $18/mo isn’t a factor; in fact, all of the solutions that I discuss use an NLB.
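A minimal CloudFormation sketch of that setup: an internal NLB forwarding TCP traffic to the API instances, and an API Gateway VPC link that targets it. The resource names, port 80 and the subnet/VPC references (MyVPC, PrivateSubnetA, PrivateSubnetB) are hypothetical placeholders.

```yaml
# Hypothetical names: MyVPC, PrivateSubnetA, PrivateSubnetB; port 80 is assumed.
ApiNlb:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    Type: network
    Scheme: internal            # no public endpoint; only reachable inside the VPC
    Subnets:
      - !Ref PrivateSubnetA
      - !Ref PrivateSubnetB

ApiTargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    VpcId: !Ref MyVPC
    Protocol: TCP
    Port: 80                    # assumed port the API listens on
    TargetType: instance

ApiListener:
  Type: AWS::ElasticLoadBalancingV2::Listener
  Properties:
    LoadBalancerArn: !Ref ApiNlb   # Ref on a load balancer returns its ARN
    Protocol: TCP
    Port: 80
    DefaultActions:
      - Type: forward
        TargetGroupArn: !Ref ApiTargetGroup

ApiVpcLink:
  Type: AWS::ApiGateway::VpcLink
  Properties:
    Name: api-vpc-link
    TargetArns:
      - !Ref ApiNlb             # the VPC link points at the NLB, not the instances
```

The API Gateway method’s integration would then use ConnectionType: VPC_LINK with ConnectionId: !Ref ApiVpcLink, and an integration URI pointing at the NLB’s DNS name.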
When my first EC2 instance in the private VPC started, I remembered that on startup the API instances install a bunch of packages from yum, upgrade pip, and download a Docker image from ECR, and all of that requires some kind of Internet access. I looked into what I could do to make that work in a private VPC, and found that it’s possible to set up a free gateway VPC endpoint for the S3 bucket containing the pip repos. That works great, simply by adding the following statement to the endpoint policy:
- Sid: Amazon Linux AMI Repository Access
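For reference, a complete gateway endpoint resource might look like the following sketch. The policy statement follows the example AWS documents for the original Amazon Linux AMI repositories (the packages.* and repo.* buckets); Amazon Linux 2 uses different repository buckets, so check the current documentation. MyVPC and PrivateRouteTable are hypothetical names.

```yaml
# Hypothetical names: MyVPC, PrivateRouteTable.
S3GatewayEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcEndpointType: Gateway     # gateway endpoints for S3 have no hourly charge
    VpcId: !Ref MyVPC
    ServiceName: !Sub com.amazonaws.${AWS::Region}.s3
    RouteTableIds:
      - !Ref PrivateRouteTable   # adds S3 prefix-list routes to this table
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Sid: Amazon Linux AMI Repository Access
          Effect: Allow
          Principal: '*'
          Action:
            - s3:GetObject
          Resource:
            - arn:aws:s3:::packages.*.amazonaws.com/*
            - arn:aws:s3:::repo.*.amazonaws.com/*
```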
However, no such thing exists for accessing ECR, so logging in and downloading Docker images won’t work.
Edit: A colleague pointed out that it would be possible to make this work by downloading the image as an S3 tarball. The process would be:
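- Add an S3 bucket to hold the Docker image
- Add the bucket to the gateway VPC endpoint policy
- As part of the CI, save the latest Docker image as a tarball with docker save and copy it to S3
- On the instance, copy the tarball from S3 and run docker load to get the image (docker import is meant for docker export tarballs, not saved images)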
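A rough sketch of the instance side of that approach, as a CloudFormation EC2 instance whose UserData streams the tarball from S3 into docker load. ImageBucket, InstanceProfile, ApiAmiId, PrivateSubnetA, the my-api.tar.gz key and the my-api image name are all hypothetical, and the endpoint policy would also need an s3:GetObject statement covering the bucket.

```yaml
# Hypothetical names throughout: ImageBucket, InstanceProfile, ApiAmiId,
# PrivateSubnetA, the my-api.tar.gz key and the my-api image.
# CI side (shell), for example:
#   docker save my-api:latest | gzip | aws s3 cp - s3://<ImageBucket>/my-api.tar.gz
ApiInstance:
  Type: AWS::EC2::Instance
  Properties:
    ImageId: !Ref ApiAmiId
    InstanceType: t2.micro
    SubnetId: !Ref PrivateSubnetA
    IamInstanceProfile: !Ref InstanceProfile   # role needs s3:GetObject on the bucket
    UserData:
      Fn::Base64: !Sub |
        #!/bin/bash
        set -euo pipefail
        # Docker comes from the Amazon Linux repos, reached via the S3 endpoint
        yum install -y docker
        service docker start
        # Stream the tarball straight into docker load, which accepts gzip input
        aws s3 cp s3://${ImageBucket}/my-api.tar.gz - --region ${AWS::Region} | docker load
        docker run -d -p 80:80 my-api:latest
```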
This works if you require no other interaction with the outside world. In my case I do need a little more (New Relic, etc.), so…
On to Plan B.
A private subnet in a public VPC
Using this approach, I figured I would be able to easily set up a NAT Gateway so that my API instances would have Internet access. Well, it is easy, but it costs around $35/mo just for the NAT Gateway, so while it’s a great option if you don’t need to be frugal, for me it’s a last resort. The other options I considered are below.
Note that I’m not just routing my private subnet through a standard Internet Gateway. Many forums have “answers” that point to this, and while it will work, that route is in fact the very thing that makes a subnet no longer private.
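For comparison, the managed NAT Gateway really is only a few resources in CloudFormation. A minimal sketch, assuming hypothetical PublicSubnetA and PrivateRouteTable resources defined elsewhere in the template:

```yaml
# Hypothetical names: PublicSubnetA, PrivateRouteTable.
NatEip:
  Type: AWS::EC2::EIP
  Properties:
    Domain: vpc

NatGateway:
  Type: AWS::EC2::NatGateway
  Properties:
    AllocationId: !GetAtt NatEip.AllocationId
    SubnetId: !Ref PublicSubnetA      # the NAT Gateway itself lives in a public subnet

# The private subnet's default IPv4 route goes to the NAT Gateway,
# never directly to the Internet Gateway.
PrivateNatGatewayRoute:
  Type: AWS::EC2::Route
  Properties:
    RouteTableId: !Ref PrivateRouteTable
    DestinationCidrBlock: 0.0.0.0/0
    NatGatewayId: !Ref NatGateway
```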
The Egress Only Internet Gateway
Egress Only Internet Gateways (EOIGs) are free, but whereas NAT Gateways only support IPv4, EOIGs only support IPv6. I decided to see how far I could get with one anyway, since I’d prefer a managed AWS service to an instance that I have to manage myself. Obviously this means that any resources my API instances request will have to come from services that support IPv6.
I added an EOIG and hooked it up to the route table for the private subnet, checked that the subnet had an IPv6 CIDR block, and added ::/0 (all IPv6 destinations) to the outbound ACL. I started a new instance with an IPv6 address from my pool, logged in via a bastion host that I created in my public subnet, and tested the IPv6 egress gateway with ping6 google.com and traceroute6 google.com. BOOM. It works, great.
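The same pieces in CloudFormation might look roughly like the following sketch. Given the CloudFormation routing trouble described below, treat it as a starting point and verify the routes on a running instance. MyVPC, PrivateRouteTable and PrivateNetworkAcl are hypothetical names, and the rule number is arbitrary.

```yaml
# Hypothetical names: MyVPC, PrivateRouteTable, PrivateNetworkAcl.
# The private subnet also needs its own IPv6 /64 block for instances to get IPv6 addresses.
VpcIpv6Block:
  Type: AWS::EC2::VPCCidrBlock
  Properties:
    VpcId: !Ref MyVPC
    AmazonProvidedIpv6CidrBlock: true   # Amazon assigns a /56 to the VPC

EgressOnlyGateway:
  Type: AWS::EC2::EgressOnlyInternetGateway
  Properties:
    VpcId: !Ref MyVPC

# Send all IPv6 traffic from the private subnet out through the EOIG
PrivateIpv6DefaultRoute:
  Type: AWS::EC2::Route
  Properties:
    RouteTableId: !Ref PrivateRouteTable
    DestinationIpv6CidrBlock: '::/0'
    EgressOnlyInternetGatewayId: !Ref EgressOnlyGateway

# Outbound NACL rule for all IPv6 destinations. NACLs are stateless, so return
# traffic also needs a matching inbound rule (not shown).
PrivateAclIpv6Outbound:
  Type: AWS::EC2::NetworkAclEntry
  Properties:
    NetworkAclId: !Ref PrivateNetworkAcl
    RuleNumber: 110
    Protocol: -1          # all protocols
    RuleAction: allow
    Egress: true
    Ipv6CidrBlock: '::/0'
```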
Note: this is how easily it went when I set it up via the console, but see my note on CloudFormation below. I’ll never get that time back.
I ran the startup script for the instance and it successfully installed the required packages from the pip repositories via the S3 endpoint that I had configured earlier. But when it got to the Docker part, it stopped working. BAH! This, I found, was because the service that AWS provide for ECR still doesn’t support IPv6. So the EOIG option was also a dead end. Regardless, it might be useful in the future, so the full CloudFormation template for this VPC is available on Gist.
General support for IPv6 is only relatively new on AWS, so let’s hope it expands to their own services soon.
A note about CloudFormation
All of this took me a lot longer to figure out than it might seem by reading this post, and that’s partly because when I develop infrastructure, I tend to do it in CloudFormation from the beginning, so that I can keep track of all of my changes.
Along the way I usually have to figure a few things out, and this time one of the challenges was creating subnets from the VPC’s IPv6 CIDR block. The console mostly does this for you, but in CloudFormation you need to generate the subnet blocks yourself. I couldn’t find a neat example of it anywhere, so here it is:
Ipv6CidrBlock: !Sub
  - '${VpcPart}${SubnetPart}'
  - SubnetPart: '01::/64'
    VpcPart: !Select [ 0, !Split [ '00::/56', !Select [ 0, !GetAtt MyVPC.Ipv6CidrBlocks ]]]
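To see how the string trick works: for a VPC block such as 2001:db8:1234:5600::/56 (a documentation-range example), splitting on 00::/56 leaves 2001:db8:1234:56, and appending 01::/64 gives 2001:db8:1234:5601::/64. It relies on the block being written with a trailing 00::/56, which would break if the fourth group were zero and compressed away. CloudFormation’s Fn::Cidr function can do the same carve-up without string handling. A sketch, where PrivateSubnetA, MyVPC, VpcIpv6Block (an AWS::EC2::VPCCidrBlock resource) and the IPv4 range are hypothetical:

```yaml
# Hypothetical names: PrivateSubnetA, MyVPC, VpcIpv6Block; 10.0.1.0/24 is illustrative.
PrivateSubnetA:
  Type: AWS::EC2::Subnet
  DependsOn: VpcIpv6Block          # the VPC's IPv6 block must exist before it can be split
  Properties:
    VpcId: !Ref MyVPC
    CidrBlock: 10.0.1.0/24
    AssignIpv6AddressOnCreation: true
    # Split the VPC's /56 into 256 /64s (64 host bits each) and take index 1,
    # the same ...01::/64 block that the '01::/64' string trick produces.
    Ipv6CidrBlock: !Select [ 1, !Cidr [ !Select [ 0, !GetAtt MyVPC.Ipv6CidrBlocks ], 256, 64 ]]
```

Changing the Select index per subnet (0, 1, 2, and so on) gives each subnet its own /64.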
What made this process exceptionally difficult, though, was what seems to be a bug in CloudFormation that doesn’t set up routing to EOIGs properly. I spent ages looking at subnet route tables, ACLs and the route tables on started instances, all of which looked correct but wouldn’t work, only to find that when I clicked through all of the same settings in the console, it worked first go.
New AWS features tend to have incomplete or incorrect documentation sprawled out across the site, with little nuggets of information here, there and everywhere. Aside from that, half of the good stuff is in comments in the AWS forum or on Stack Overflow.
This is the second time in as many months that I’ve been stung by using a new AWS feature that’s not long out of the box, and having CloudFormation bugs ruin my day! Don’t get me started on hooking up Cognito User Pools with an API Gateway Authorizer (it’s fixed now).
Anyway, Plan C it is.
A NAT instance :(
I’m not very happy about it, and to be honest a far better solution is to just use the managed NAT Gateway service, but at $35 a month that almost doubles my monthly AWS bill for this. AWS do provide an AMI ready to go (you can find it here under HVM (NAT) EBS-Backed 64 Bit), so that does make things quite a lot easier. This approach still requires the Network Load Balancer and VPC link, and I’m not sure whether the new API Gateway VPC Link is a “Gateway” type (which is free) or an “interface” type (which costs around $8/mo), but either way, it’s cheaper.
The approach here is to simply start the NAT instance in the public subnet with source/destination checking disabled, and add an entry to the private subnet’s route table that points traffic for 0.0.0.0/0 at the instance. It works and it’s easy, but it’s another instance to maintain and it doesn’t scale well.
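A minimal CloudFormation sketch of the NAT instance setup. NatAmiId would be the AWS NAT AMI ID for your region, and MyVPC, PublicSubnetA, PrivateRouteTable, PrivateSubnetCidr and the t2.nano instance type are hypothetical choices. The key details are disabling the source/destination check and a route that targets the instance itself.

```yaml
# Hypothetical names: NatAmiId, MyVPC, PublicSubnetA, PrivateRouteTable, PrivateSubnetCidr.
NatSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: Allow traffic from the private subnet to be forwarded by the NAT instance
    VpcId: !Ref MyVPC
    SecurityGroupIngress:
      - IpProtocol: '-1'          # all traffic, but only from the private subnet
        CidrIp: !Ref PrivateSubnetCidr

NatInstance:
  Type: AWS::EC2::Instance
  Properties:
    ImageId: !Ref NatAmiId
    InstanceType: t2.nano
    SourceDestCheck: false        # required: the instance forwards traffic it didn't originate
    NetworkInterfaces:
      - DeviceIndex: '0'
        SubnetId: !Ref PublicSubnetA
        AssociatePublicIpAddress: true
        GroupSet:
          - !Ref NatSecurityGroup

PrivateNatInstanceRoute:
  Type: AWS::EC2::Route
  Properties:
    RouteTableId: !Ref PrivateRouteTable
    DestinationCidrBlock: 0.0.0.0/0
    InstanceId: !Ref NatInstance  # the route targets the instance, not an IP address
```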
Other options I considered:
- Bake an AMI with the Docker image already on it: Yeah… nah.
- Use a client certificate: This would only let the API verify that requests come from API Gateway; the API would still be publicly reachable, so it wouldn’t prevent unwelcome activity such as DDoS attacks.
- Use Lambdas instead of the EC2 API instances: The plan is to migrate to this, but it won’t happen overnight. It would make this a much easier problem to solve, given a lot of the pain here was around getting access to the Docker repository.
- Put the instances in a public subnet, and use a security group to prevent access from outside: This is flimsy when it comes to security, since there’s no CIDR range that matches only API Gateway.
It’s still a challenge to have instances in a private subnet get egress access to resources such as Docker images and software packages. While NAT Gateways are great for this and easy to set up, they are costly. As the world moves ever closer to supporting IPv6, I’m sure that AWS services such as the Elastic Container Registry will start supporting it too, which will make Egress Only Internet Gateways more usable.
Like everything on AWS, every decision is a trade-off between how resilient your systems are and how much they cost. This effort helped me understand the early days of IPv6 support on AWS, and I really look forward to the near future when it’s supported more widely.
For now, I’m sticking with a tiny NAT instance, but I’ll switch over to a NAT Gateway when I need to support more traffic.