How To Talk To Yourself Without Charging By The Hour

Jeff Burka
Published in Singularity · 4 min read · Sep 26, 2023

The Problem

At Singularity, we run a microservice architecture in AWS — many small dedicated services running in isolated containers on ECS, unaware of each other's existence. All of these services run in private subnets with no direct access to the internet, as required by SOC 2 and standard security best practices.

We host a public API (documented here if you’re interested), which of course needs to be available over the public internet so users can access it. Many of our internal services use this API as well. So, these isolated internal services, in a private subnet with no internet access, need to communicate with our equally isolated API servers. We did the straightforward thing that AWS recommends and set up a NAT Gateway, so that our private subnets could communicate with our public API at api.singularity.energy.

Unfortunately, AWS’s NAT Gateway pricing is famously borderline extortionate. Not only do you pay $0.045 per hour (~$33/month) for the privilege of having the gateway, but you pay an additional $0.045 per GB of data processed. At 1 TB of monthly traffic, that’s roughly $45 in processing charges alone, on top of the hourly fee. This adds up more quickly than you’d imagine, and it scales linearly with the amount of traffic — not ideal for a growing software company looking for economies of scale.

In addition to this pricing issue, taking a round trip through the internet in this scenario adds significant unnecessary complexity. Requests have to be encrypted and decrypted for HTTPS, and potentially hop through numerous network layers just to arrive back at the very same VPC (maybe even the same exact data center).

The Solution

Easy: don’t use the NAT Gateway.

We just want two different services inside the same private network to be able to talk to each other. To do that, we created a private hosted zone in Route 53, like this:

(Screenshot: the Route 53 console, creating a hosted zone named my-private-zone.example.com with “Private hosted zone” selected.)
resource "aws_route53_zone" "private_hosted_zone" {
  name = "my-private-zone.example.com"

  vpc {
    vpc_id = ...
  }
}

In this example, my-private-zone.example.com is a domain name that can be resolved by services within your VPC (and only within; that’s what makes it private). Right now nothing is hosted at this name, but we can fix that with an internal load balancer:

resource "aws_lb" "ecs-internal-load-balancer" {
  name               = "my-internal-load-balancer"
  internal           = true
  load_balancer_type = "application"
  ...
}
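
For the private zone to actually resolve to this load balancer, the zone needs a record pointing at it. A minimal sketch, reusing the resource names from the examples above (the record name and `evaluate_target_health` setting are assumptions):

```hcl
# Alias record in the private zone, pointing at the internal ALB.
# Alias queries to AWS resources are answered within Route 53 itself.
resource "aws_route53_record" "internal_alias" {
  zone_id = aws_route53_zone.private_hosted_zone.zone_id
  name    = "my-private-zone.example.com"
  type    = "A"

  alias {
    name                   = aws_lb.ecs-internal-load-balancer.dns_name
    zone_id                = aws_lb.ecs-internal-load-balancer.zone_id
    evaluate_target_health = true
  }
}
```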

We can associate any of our services with this internal load balancer. Here’s how you might do that with some abridged Terraform:


resource "aws_ecs_service" "my_ecs_service" {
  name            = "my-ecs-service"
  cluster         = var.cluster
  task_definition = aws_ecs_task_definition.my_ecs_service.arn

  network_configuration {
    subnets          = var.subnets
    security_groups  = [var.security_group]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.my_ecs_service_target_group.arn
    ...
  }
  ...
}


resource "aws_lb_target_group" "my_ecs_service_target_group" {
  name             = "my-ecs-service-internal"
  port             = 80
  protocol         = "HTTP"
  protocol_version = "HTTP1"
  vpc_id           = var.vpc_id
  target_type      = "ip"
  ...
}

resource "aws_lb_listener_rule" "internal_listener_rule" {
  listener_arn = var.internal_load_balancer_listener_arn
  priority     = 10

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.my_ecs_service_target_group.arn
  }

  condition {
    host_header {
      values = ...
    }
  }
}
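
The var.internal_load_balancer_listener_arn referenced above comes from a listener attached to the internal load balancer. A minimal sketch; the port and fixed-response default action are assumptions, not part of the original setup:

```hcl
# HTTP listener on the internal ALB. Listener rules like the one above
# attach to this listener and route by host header; anything unmatched
# falls through to the default action.
resource "aws_lb_listener" "internal_http" {
  load_balancer_arn = aws_lb.ecs-internal-load-balancer.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "fixed-response"

    fixed_response {
      content_type = "text/plain"
      status_code  = "404"
    }
  }
}
```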

Now, any service in our VPC can make requests to endpoints hosted on my-private-zone.example.com, and they’ll be routed through this internal load balancer straight to the desired ECS service. Requests won’t use the public internet or public IP addresses; everything stays within the VPC. That means nothing goes through the NAT Gateway, saving money and time on every request.

Bonus Gotcha

We’ve got our microservices talking to each other as efficiently as possible, so we’re good, right? No, we missed one small detail: how does our code get to these isolated private containers in the first place?

We use a common pattern — build a Docker image, upload it to ECR, and deploy ECS task definitions that pull that image from ECR and run it. Unfortunately, those images are hosted at something like 123456789.dkr.ecr.us-east-1.amazonaws.com/my-image, which eagle-eyed readers may notice is on the public internet. That means every single time we deploy an ECS service, and periodically thereafter, each task is pulling a sizeable image from amazonaws.com, through the NAT Gateway, and into a private subnet. In our case these are half-gigabyte images, but for some common use cases they could be much larger.

The easiest way to alleviate this backdoor charge is with a VPC Endpoint, which allows VPC services to privately connect to AWS services like ECR for less than a quarter the price of a NAT Gateway. Less than a quarter is still more than zero though — more zealous cost-cutters may opt to host their own Docker registry within the VPC, or slim down their images, or something else.
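
A sketch of what that might look like in Terraform, assuming us-east-1 and reusing the variable names from earlier examples. ECR pulls need two interface endpoints plus an S3 gateway endpoint, since image layers are stored in S3:

```hcl
# Interface endpoint for the ECR API (auth, metadata).
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.us-east-1.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.subnets
  security_group_ids  = [var.security_group]
  private_dns_enabled = true
}

# Interface endpoint for the Docker registry itself.
resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.us-east-1.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.subnets
  security_group_ids  = [var.security_group]
  private_dns_enabled = true
}

# Gateway endpoint for S3 — gateway endpoints are free of charge.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.us-east-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.route_table_ids
}
```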

Thoughts? Ideas? Did I say something dumb? We’re hiring! Send a resume over to jobs@singularity.energy and come work here so you can fix it yourself.
