Securing an EC2 service behind API Gateway on AWS

I wanted managed AWS services that would give API Gateway a secure way to reach an API running on EC2, while keeping that API safe from everything else. A NAT Gateway costs too much for the time being. In a private VPC with a VPC link, my API instances could install software packages, but could not reach remote docker images. A private subnet with an Egress Only Internet Gateway only supports ipv6, and ECR doesn’t support ipv6 yet, so still no docker image downloads. The end result: a tiny NAT instance, until there’s enough traffic to warrant a managed NAT Gateway.

The goal

Here’s what I set out to achieve:

  1. My backend API should be secured in either:
    - a private VPC, or at the very least;
    - a private subnet
  2. My API Gateway should be the only way to use my backend API
  3. Added cost should be < $10/mo

Seems like a fairly simple ask?

It begins

The first problem I found was that the API needed to be publicly accessible for API Gateway to HTTP-proxy to it (see “Can Amazon API Gateway work within an Amazon VPC?”). This is because you can’t use a CIDR to lock a subnet down to API Gateway, likely because requests can arrive via a CloudFront distribution. I’m not keen on it being publicly accessible, so I looked into what else I could do to secure it.

A private VPC with VPC Links

The solution that quickly came up was to use a VPC link from API Gateway to the VPC, and what’s more, maybe the VPC could be private with no public subnets. So I set about getting that done. This requires a Network Load Balancer (NLB) in front of the EC2 instances, and a link between the NLB in the VPC and API Gateway. I’m already paying for a load balancer, so the $18/mo isn’t a factor, and in fact all of the solutions that I discuss use an NLB.
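
A rough sketch of that wiring in CloudFormation might look like the following. The resource names and the PrivateSubnetA/PrivateSubnetB references are placeholders for illustration, and listeners and target groups are left out for brevity.

```yaml
# Sketch only: names and subnet references are hypothetical.
Resources:
  ApiLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Type: network        # the VPC link targets a Network Load Balancer
      Scheme: internal     # the NLB gets no public address
      Subnets:
        - !Ref PrivateSubnetA
        - !Ref PrivateSubnetB

  ApiVpcLink:
    Type: AWS::ApiGateway::VpcLink
    Properties:
      Name: backend-api-link
      TargetArns:
        - !Ref ApiLoadBalancer   # Ref on the load balancer returns its ARN
```

An API Gateway method would then point at the link by setting ConnectionType: VPC_LINK and ConnectionId: !Ref ApiVpcLink on its integration.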

When my first EC2 instance in the private VPC started, I remembered that when the API instances start they install a bunch of packages from yum, upgrade the awscli using pip, and download a docker image from ECR, and all of that requires some kind of Internet access. I looked into what I could do to make that work in a private VPC, and found that it’s possible to set up a free, gateway-type VPC endpoint to the S3 buckets containing the yum and pip repos. That works great, simply by adding the following policy to the VPC endpoint:

    Statement:
      - Sid: Amazon Linux AMI Repository Access
        Principal: "*"
        Action:
          - s3:GetObject
        Effect: Allow
        Resource:
          - arn:aws:s3:::packages.*.amazonaws.com/*
          - arn:aws:s3:::repo.*.amazonaws.com/*
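
For context, here is a minimal sketch of where that policy might sit in a template, assuming a gateway endpoint resource attached to the private subnet’s route table. S3Endpoint and PrivateRouteTable are hypothetical names; MyVPC matches the VPC name used later in this post.

```yaml
# Sketch only: S3Endpoint and PrivateRouteTable are placeholder names.
S3Endpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcEndpointType: Gateway
    ServiceName: !Sub com.amazonaws.${AWS::Region}.s3
    VpcId: !Ref MyVPC
    RouteTableIds:
      - !Ref PrivateRouteTable   # S3 traffic from this route table uses the endpoint
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Sid: Amazon Linux AMI Repository Access
          Principal: "*"
          Action:
            - s3:GetObject
          Effect: Allow
          Resource:
            - arn:aws:s3:::packages.*.amazonaws.com/*
            - arn:aws:s3:::repo.*.amazonaws.com/*
```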

However, no such thing exists for accessing ECR, so logging in and downloading docker images won’t work.


Edit: A colleague pointed out that it would be possible to make this work by downloading the image as an S3 tarball. The process would be:

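  • Add an S3 bucket to hold the docker image
  • Add the bucket to the VPC gateway endpoint policy
  • As a part of the CI, save the latest docker image as a tarball (docker save) and copy it to S3
  • Use docker load to get the image (docker import only fits a tarball made with docker export)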
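
The endpoint policy change in the second step might look something like this sketch. The bucket name is made up for illustration, and the instance role would presumably also need s3:GetObject on the same bucket, since an endpoint policy only limits access and doesn’t grant it.

```yaml
# Sketch only: an extra entry for the endpoint policy's Statement list.
Statement:
  - Sid: DockerImageBucketAccess
    Principal: "*"
    Action:
      - s3:GetObject
    Effect: Allow
    Resource:
      - arn:aws:s3:::my-docker-image-bucket/*   # hypothetical bucket name
```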

This works if you require no other interaction with the outside world. In my case I do need a little more (NewRelic etc), so…

On to plan B.

A private subnet in a public VPC

Using this approach, I figured I would be able to easily set up a NAT Gateway so that my API instances would have internet access. Well, it is easy, but it costs around $35/mo just for the NAT Gateway, so while it’s a great option if you don’t need to be frugal, for me it’s a last resort. The other options I considered were an Egress Only Internet Gateway and a NAT instance.

Note that I’m not just adding a standard Internet Gateway to my private subnet. Many forums have “answers” that point to this, and while it will work, that is in fact the very thing that makes your subnet no longer private.

The Egress Only Internet Gateway

Egress Only Internet Gateways are free, but whereas NAT Gateways only support ipv4, EOIGs only support ipv6. I decided to see how far I could get with that anyway, since I’d prefer a managed AWS service to an instance that I have to manage myself. Obviously this will mean that any resources that I request from my API instance will have to come from services that support ipv6.


I added an EOIG and hooked it up to the route table for the private subnet, checked that I had ipv6 in there and added ::/0 (all ipv6 destinations) to the outbound ACL. I started a new instance with an ipv6 address from my pool, logged in via a bastion host that I created in my public subnet, and tested the ipv6 egress gateway with ping6 google.com and traceroute6 google.com. BOOM. It works, great.
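
Expressed in CloudFormation, those three pieces might look roughly like the sketch below. PrivateRouteTable, PrivateNetworkAcl and the rule number are placeholders, and since this post describes trouble getting EOIG routing to work through CloudFormation, treat it as an untested outline rather than a known-good template.

```yaml
# Sketch only: PrivateRouteTable, PrivateNetworkAcl and RuleNumber are hypothetical.
EgressOnlyGateway:
  Type: AWS::EC2::EgressOnlyInternetGateway
  Properties:
    VpcId: !Ref MyVPC

PrivateIpv6DefaultRoute:
  Type: AWS::EC2::Route
  Properties:
    RouteTableId: !Ref PrivateRouteTable
    DestinationIpv6CidrBlock: "::/0"     # all ipv6 destinations
    EgressOnlyInternetGatewayId: !Ref EgressOnlyGateway

PrivateIpv6EgressAclEntry:
  Type: AWS::EC2::NetworkAclEntry
  Properties:
    NetworkAclId: !Ref PrivateNetworkAcl
    RuleNumber: 110
    Egress: true                         # outbound rule
    Protocol: -1                         # all protocols
    RuleAction: allow
    Ipv6CidrBlock: "::/0"
```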

Note: this is how easily it happened when I set it up via the console, but see my note on CloudFormation below. I’ll never get that time back.

I ran the startup script for the instance and it successfully installed the required packages from the yum and pip repositories via the S3 endpoint that I had configured earlier. But when it got to the docker part, it stopped working. BAH! This, I found, was because ECR still doesn’t support ipv6. So the EOIG option was also a dead end. Regardless, it might be useful in the future, so the full CloudFormation template for this VPC is available on Gist.

General support for ipv6 is only relatively new on AWS, so let’s hope it expands to their own services soon.

A note about CloudFormation

All of this took me a lot longer to figure out than it might seem by reading this post, and that’s partly because when I develop infrastructure, I tend to do it in CloudFormation from the beginning, so that I can keep track of all of my changes.

Along the way I usually have to figure a few things out, and this time one of the challenges was creating subnets from the VPC’s ipv6 CIDR block, which mostly gets done for you via the console but in CloudFormation you need to generate them yourself. Again, a neat example of it couldn’t be found anywhere, so here it is:

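    MySubnet:
      Type: AWS::EC2::Subnet
      Properties:
        VpcId: !Ref MyVPC   # required by AWS::EC2::Subnet; assumes the VPC resource is named MyVPC
        CidrBlock: 172.31.0.0/20
        # Build a /64 from the VPC's /56: strip the trailing "00::/56" and append
        # "01::/64", e.g. 2001:db8:1234:1a00::/56 becomes 2001:db8:1234:1a01::/64
        Ipv6CidrBlock:
          Fn::Sub:
            - "${VpcPart}${SubnetPart}"
            - SubnetPart: '01::/64'
              VpcPart: !Select [ 0, !Split [ '00::/56', !Select [ 0, !GetAtt MyVPC.Ipv6CidrBlocks ]]]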
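
As an alternative to the string splitting, CloudFormation’s Fn::Cidr intrinsic function can carve the subnets out of the VPC block. This is a different technique from the one above, sketched here under the assumption that the VPC has an Amazon-provided /56 attached; index 1 should correspond to the “01” subnet part.

```yaml
# Sketch only: same hypothetical MyVPC resource as above.
MySubnet:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref MyVPC
    CidrBlock: 172.31.0.0/20
    # Split the VPC's /56 into 256 blocks with 64 subnet bits (/64 each)
    # and take the second one (index 1).
    Ipv6CidrBlock: !Select [ 1, !Cidr [ !Select [ 0, !GetAtt MyVPC.Ipv6CidrBlocks ], 256, 64 ]]
```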

What made this process exceptionally difficult, though, was what seems to be a bug in CloudFormation that doesn’t set up routing to EOIGs properly. I spent ages looking at subnet route tables, ACLs, and route tables on started instances, all of which looked correct but wouldn’t work, only to find that when I clicked through all of the same settings in the console, it worked first go.

New AWS features tend to have incomplete or incorrect documentation sprawled out across the site, with little nuggets of information here, there and everywhere. Aside from that, half of the good stuff is in comments in the AWS forum or on Stack Overflow.

This is the second time in as many months that I’ve been stung by using a new AWS feature that’s not long out of the box, and having CloudFormation bugs ruin my day! Don’t get me started on hooking up Cognito User Pools with an API Gateway Authorizer (it’s fixed now).

Anyway, Plan C it is.

A NAT instance :(

I’m not very happy about it and to be honest a far better solution is to just use the managed NAT Gateway service, but at $35 a month that’s almost doubling my monthly AWS bill for this. AWS do provide an AMI ready to go (you can find it here under HVM (NAT) EBS-Backed 64 Bit), so that does make things quite a lot easier. Using this approach still requires the Network Load Balancer and VPC link, and I’m not sure if the new API Gateway VPC Link is a “Gateway” type (which is free), or an “interface” type (which costs around $8/mo), but either way, it’s cheaper.

The approach here is simply to start the NAT instance in the public subnet, and add an entry to the private subnet’s route table that sends traffic for 0.0.0.0/0 to that instance. It works and it’s easy, but it’s another instance to maintain and it doesn’t scale well.
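
In CloudFormation, that setup might be sketched as follows. The instance type, the NatAmiId parameter, and the PublicSubnet, NatSecurityGroup and PrivateRouteTable references are all assumptions for illustration. The instance also needs a public IP in the public subnet, and its source/destination check has to be disabled so it can forward traffic that isn’t addressed to itself.

```yaml
# Sketch only: parameter and resource names are hypothetical.
Parameters:
  NatAmiId:
    Type: AWS::EC2::Image::Id    # the NAT AMI for your region

Resources:
  NatInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref NatAmiId
      InstanceType: t2.nano      # "tiny" is an assumption; size to your traffic
      SubnetId: !Ref PublicSubnet
      SourceDestCheck: false     # required for an instance that forwards traffic
      SecurityGroupIds:
        - !Ref NatSecurityGroup

  PrivateDefaultRoute:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      InstanceId: !Ref NatInstance   # send all ipv4 egress through the NAT instance
```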


Other alternatives

  1. Bake an AMI with the docker image already on it: Yeah… nah.
  2. Use a client certificate: This would only provide service authorisation to the API from API Gateway, but wouldn’t prevent unwelcome activities, such as DDoS attacks.
  3. Use Lambdas instead of the EC2 API instances: The plan is to migrate to this, but it won’t happen overnight. It would make this a much easier problem to solve, given a lot of the pain here was around getting access to the docker repository.
  4. Put the instances in a public subnet, and use a security group to prevent access from outside: This is flimsy when it comes to security.

Wrap

It’s still a challenge to have instances in a private subnet get egress access to resources such as docker images and software packages. While NAT gateways are great for this and easy to set up, they are costly. As the world moves ever closer to supporting ipv6, I’m sure that AWS services such as the Elastic Container Registry will start supporting it too, which will make using Egress Only Internet Gateways more useable.

Like everything on AWS, every decision is a trade-off between how resilient your systems are, and how much they cost. This effort helped me understand the early days of ipv6 support on AWS, and I really look forward to the near future when it’s supported more widely.

For now, I’m sticking with a tiny NAT instance, but I’ll switch over to a NAT Gateway when I need to support more traffic.

