Setting up cron tasks on ECS & Fargate with VPC endpoints

Published in Legalstart · 8 min read · Jul 5, 2022

At Legalstart, we maintain a lot of cron tasks in order to provide the best service to our customers. Moreover, as most of our applications are deployed as Docker containers, we decided to run these cron jobs as ECS tasks, scheduled with AWS EventBridge, a serverless event bus that can trigger ECS tasks.
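To make the scheduling part concrete, here is a minimal Terraform sketch of an EventBridge rule that starts an ECS task on a fixed schedule. It is not taken from our repository: the resource names, the schedule, and every variable are hypothetical, and the IAM role is assumed to already allow EventBridge to run the task.

```hcl
# Hypothetical inputs: none of these names come from the article.
variable "cluster_arn" { type = string }         # ECS cluster that runs the job
variable "task_definition_arn" { type = string } # task definition of the cron job
variable "events_role_arn" { type = string }     # role EventBridge assumes to run the task
variable "private_subnet_ids" { type = list(string) }
variable "task_security_group_ids" { type = list(string) }

# Fire every 10 minutes (example schedule).
resource "aws_cloudwatch_event_rule" "cron_job" {
  name                = "example-cron-job"
  schedule_expression = "rate(10 minutes)"
}

# Start one ECS task each time the rule fires.
resource "aws_cloudwatch_event_target" "cron_job" {
  rule     = aws_cloudwatch_event_rule.cron_job.name
  arn      = var.cluster_arn
  role_arn = var.events_role_arn

  ecs_target {
    task_definition_arn = var.task_definition_arn
    task_count          = 1
    launch_type         = "FARGATE"

    network_configuration {
      subnets          = var.private_subnet_ids
      security_groups  = var.task_security_group_ids
      assign_public_ip = false # tasks stay in private subnets
    }
  }
}
```

The sketch uses the Fargate launch type because that is where this article ends up. The same rule and target work with EC2-backed tasks by changing launch_type.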

EC2 versus Fargate for running our cron tasks

An AWS ECS cluster can draw its capacity from several capacity providers: EC2 On-Demand instances, EC2 Spot instances, Fargate, and Fargate Spot.
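As an illustration, attaching the two Fargate capacity providers to a cluster could look like the sketch below. The cluster name and the strategy values are hypothetical, not our production settings.

```hcl
resource "aws_ecs_cluster" "cron" {
  name = "example-cron-cluster" # hypothetical name
}

# Make both Fargate flavours available to the cluster.
resource "aws_ecs_cluster_capacity_providers" "cron" {
  cluster_name       = aws_ecs_cluster.cron.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  # Tasks launched without an explicit strategy use on-demand Fargate.
  default_capacity_provider_strategy {
    capacity_provider = "FARGATE"
    weight            = 1
    base              = 1
  }
}
```

EC2 capacity works differently: it is backed by an Auto Scaling group that you declare and manage yourself, which is exactly the management overhead discussed in the next section.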

Limitations of ECS with EC2

Originally, we were using EC2 instances to run hundreds of jobs, but faced some issues:

  • Memory issues:
    - We run around 40 jobs every 5 to 10 minutes, but ECS was sometimes unable to clean up stopped containers and unused images. This caused several memory issues, which could lead to many tasks failing or not even starting.
    - Many tasks did not start on time, mostly because of the memory issues above.
  • Cost issues: We were paying for instances 100% of the time, while some tasks only ran at specific periods.
  • Scaling issues: We had a fixed number of instances. We could have scaled them with Auto Scaling groups, but we did not do so at the time.

Solving those issues with Fargate

For those reasons, we decided to switch to AWS Fargate as our ECS capacity provider. Fargate is a serverless compute engine for containers. It:

  • Removes the need to manage and configure virtual machines.
  • Scales easily, since ECS launches each task in its own Fargate environment instead of packing all tasks onto a fixed set of EC2 instances.
  • Is billed on usage time. As our tasks do not run 100% of the time, we only pay for the time they actually run.

Very quickly, we were able to see major improvements in our infrastructure:

  • ✅ Lower computing costs
  • ✅ All our tasks are now consistent: the only errors we have on our cron tasks are related to application code issues.
  • ⚠️ However, the switch came with a price: we realized that our NAT Gateway costs increased drastically after the change.

The cons of Fargate when using a NAT Gateway

All our cron tasks run in private subnets and thus reach the internet through a NAT Gateway. A NAT Gateway is a Network Address Translation (NAT) service that allows servers in a private subnet inside a VPC to connect to services outside the VPC, without making them publicly accessible from the Internet. On top of an hourly charge, a NAT Gateway is billed for every gigabyte it processes, in either direction. However, as Fargate is serverless and each task launched is independent from the others, it needs to pull a fresh Docker image from our ECR registry for each cron task.

How ECS instances on a private subnet reach AWS Services by default

It means Fargate pulls around 40 Docker images of 1 GB every 5 minutes, for at least 10 hours per day. See below the exploding NAT Gateway costs (EC2-Other category in AWS Cost Explorer) in April 2022: +150%!

EC2-other billing with default networking
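To get an order of magnitude for that traffic, the figures above can be plugged into a few Terraform locals. This is only a rough upper bound: it assumes every task pulls the full 1 GB through the NAT Gateway, a run every 5 minutes, and the per-GB price quoted in the cost section of this article.

```hcl
locals {
  tasks_per_run      = 40   # tasks started at each run
  image_size_gb      = 1    # image size in GB
  runs_per_hour      = 12   # one run every 5 minutes
  active_hours_a_day = 10   # "at least 10 hours per day"
  nat_usd_per_gb     = 0.05 # USD per GB processed by the NAT Gateway

  nat_gb_per_day  = local.tasks_per_run * local.image_size_gb * local.runs_per_hour * local.active_hours_a_day
  nat_usd_per_day = local.nat_gb_per_day * local.nat_usd_per_gb
}

output "nat_gb_per_day" {
  value = local.nat_gb_per_day # 40 * 1 * 12 * 10 = 4800
}

output "nat_usd_per_day" {
  value = local.nat_usd_per_day # 4800 * 0.05 = 240
}
```

Real numbers are lower whenever runs are less frequent or pulls transfer less than the full image size, but the shape of the problem is clear: the cost grows with every task launch, not with the amount of useful work done.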

Reducing NAT costs using VPC endpoints

Fortunately, there is a solution to enjoy Fargate without spending a lot of money on NAT gateway within ECS: VPC endpoints.

Introduction to VPC endpoints

VPC endpoints are AWS components that allow a VPC to connect to AWS services through private connections. In other words, VPC endpoints let us stop using public endpoints to call AWS services. Thanks to this network isolation, we can pull Docker images from ECR without going through the NAT Gateway.

Reaching AWS Services with VPC endpoints

Sometimes, a picture is worth a thousand words, so here is the positive impact it had on our AWS bill for the EC2-Other category, which is the pricing category that includes NAT Gateway billing (April 2022 vs June 2022):

NAT Gateway Costs

We fixed the extra costs related to NAT Gateway.

In addition, we can see that we pay less computing costs:

EC2 Computing Costs

However, we must not neglect the price of VPC endpoints.

The cost of VPC endpoints vs the cost of NAT Gateway

Note that VPC endpoints also have a considerable price. As of June 2022:

  • One VPC endpoint costs $0.011/hour, and as you will see in the next part, you need several VPC endpoints. For one VPC, assume you will pay something like $100 per month.
  • A NAT Gateway costs $0.05 per GB processed. For example, a cron task that runs on an hourly basis with a 1 GB Docker image will cost, over 30 days: 24 * 30 * 1 * 0.05 = $36. VPC endpoints will thus save you money starting from 3 hourly cron tasks that use a 1 GB Docker image (3 * 36 = $108, which is more than $100).
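The break-even point can be reproduced with a few Terraform locals. This is a sketch under stated assumptions: it counts the six interface endpoints set up later in this article, assumes they are deployed in two availability zones (interface endpoints are billed per endpoint and per AZ), and ignores the NAT Gateway hourly charge and the endpoint data processing charge.

```hcl
locals {
  interface_endpoints   = 6     # ecr.dkr, ecr.api, logs, ecs, ecs-agent, ecs-telemetry
  availability_zones    = 2     # assumption, adapt to your VPC
  endpoint_usd_per_hour = 0.011 # price quoted above
  hours_per_month       = 24 * 30

  # 6 * 2 * 0.011 * 720 = 95.04 USD per month
  endpoints_usd_per_month = local.interface_endpoints * local.availability_zones * local.endpoint_usd_per_hour * local.hours_per_month

  nat_usd_per_gb = 0.05
  image_size_gb  = 1

  # 720 * 1 * 0.05 = 36 USD per month for one hourly task
  nat_usd_per_hourly_task = local.hours_per_month * local.image_size_gb * local.nat_usd_per_gb

  # ceil(95.04 / 36) = 3 hourly tasks
  break_even_hourly_tasks = ceil(local.endpoints_usd_per_month / local.nat_usd_per_hourly_task)
}

output "break_even_hourly_tasks" {
  value = local.break_even_hourly_tasks
}
```

With these assumptions the endpoints cost about $95 per month, which is consistent with the rough $100 estimate, and they pay for themselves from the third hourly task onwards.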

So, when you have a lot of ingress in your private subnets from AWS services and you use Fargate, as we do, VPC endpoints are worth their price. Nevertheless, with Fargate and VPC endpoints we do not really pay less than we did with EC2 instances and no VPC endpoints. Indeed:

  • With EC2 instances and no VPC endpoints, you don’t have much NAT Gateway usage
  • With Fargate and VPC endpoints, you avoid NAT Gateway usage and pay less computing costs, BUT you pay for VPC endpoints. In our experience, the final price is equivalent before and after Fargate and VPC endpoints.

See the final price when including VPC endpoints, computing costs and NAT Gateway costs. The price is very close between March and June 2022 😀 :

VPC + Computing + NAT Gateway costs

Now we:

  • ✅ Have more consistent cron tasks
  • ✅ Have less instance management to do
  • 👉 Have costs similar to before

At the same cost, consistency and stability are still a huge improvement!

Implementing VPC endpoints in your VPC: The terraform code

Now, the fun part! You can find below the Terraform code to implement VPC endpoints in your own AWS private cloud. Here, I am assuming that:

  • You already have a VPC with existing private subnets and route tables for each subnet
  • You are using ECS to run your tasks, within private subnets
  • You are using the ECR registry

Applying the code below with the appropriate VPC, subnets and route tables is all you have to do to implement VPC endpoints in your own AWS infrastructure. The full repository for this code can be found at the end of the article.

Implement Security Groups

First, we implement a security group with a single ingress rule on port 443. It lets every address inside the VPC communicate with the interface endpoints over HTTPS.

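# Security group attached to every interface endpoint.
# Replace the upper-case placeholders with your own values.
resource "aws_security_group" "vpce" {
  name   = "vpce-sg"
  vpc_id = YOUR_VPC_ID

  # Allow HTTPS from any address inside the VPC.
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [YOUR_VPC_CIDR_BLOCK]
  }
}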

Implement VPC endpoint resources

Then, we need to implement several VPC endpoint resources:

The com.amazonaws.region.ecr.dkr and com.amazonaws.region.ecr.api Amazon ECR VPC endpoints: Amazon ECS tasks hosted on Fargate using platform version 1.4.0 or later require both of them.

resource "aws_vpc_endpoint" "ecr_endpoint" {
  vpc_id              = YOUR_VPC_ID
  private_dns_enabled = true
  service_name        = "com.amazonaws.<YOUR-AWS-REGION>.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  security_group_ids = [
    aws_security_group.vpce.id,
  ]
  subnet_ids = LIST_OF_YOUR_PRIVATE_SUBNET_IDS
}

resource "aws_vpc_endpoint" "ecr_api_endpoint" {
  vpc_id              = YOUR_VPC_ID
  private_dns_enabled = true
  service_name        = "com.amazonaws.<YOUR-AWS-REGION>.ecr.api"
  vpc_endpoint_type   = "Interface"
  security_group_ids = [
    aws_security_group.vpce.id,
  ]
  subnet_ids = LIST_OF_YOUR_PRIVATE_SUBNET_IDS
}

An S3 gateway VPC endpoint: It allows accessing S3 through a gateway endpoint instead of the public endpoint. We need it because all ECR Docker image layers are stored on S3. Note that for this resource, you need to provide the route table ID of each of your private subnets. Indeed, we want the traffic from our VPC to S3 to be routed to the gateway endpoint, so AWS adds the corresponding route to the specified route tables.

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = YOUR_VPC_ID
  service_name      = "com.amazonaws.<YOUR-AWS-REGION>.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = PRIVATE_SUBNETS_ROUTE_TABLES
}
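If you only have the subnet IDs at hand, the route table IDs can be looked up rather than hard-coded. The sketch below is one possible way to do it and is not part of our repository: the variable and local names are hypothetical, and the lookup assumes each private subnet has an explicit route table association, as stated in the assumptions above.

```hcl
# Hypothetical input (declare it only once per module).
variable "private_subnet_ids" {
  type = list(string)
}

# Look up the route table explicitly associated with each private subnet.
data "aws_route_table" "private" {
  for_each  = toset(var.private_subnet_ids)
  subnet_id = each.value
}

locals {
  # Several subnets may share one route table, so remove duplicates.
  private_route_table_ids = distinct([
    for rt in data.aws_route_table.private : rt.route_table_id
  ])
}
```

The S3 endpoint can then use route_table_ids = local.private_route_table_ids.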

The CloudWatch Logs VPC endpoint: This one is only required if your tasks use the awslogs log driver to send logs to CloudWatch.

resource "aws_vpc_endpoint" "logs" {
  vpc_id              = YOUR_VPC_ID
  private_dns_enabled = true
  service_name        = "com.amazonaws.<YOUR-AWS-REGION>.logs"
  vpc_endpoint_type   = "Interface"
  security_group_ids = [
    aws_security_group.vpce.id,
  ]
  subnet_ids = LIST_OF_YOUR_PRIVATE_SUBNET_IDS
}

The ECS endpoint: It serves the API calls made by the CLI, the SDKs, etc., and the ECS agent itself uses it to signal state changes in tasks.

resource "aws_vpc_endpoint" "ecs_endpoint" {
  vpc_id              = YOUR_VPC_ID
  private_dns_enabled = true
  service_name        = "com.amazonaws.<YOUR-AWS-REGION>.ecs"
  vpc_endpoint_type   = "Interface"
  security_group_ids = [
    aws_security_group.vpce.id,
  ]
  subnet_ids = LIST_OF_YOUR_PRIVATE_SUBNET_IDS
}

ECS agent & ECS telemetry VPC endpoints: While AWS does not provide much information, the documentation suggests also having these two endpoints. When using PrivateLink, the ECS agent uses these endpoints for orchestration activities. Without PrivateLink, the ECS agent would use external endpoints (by external endpoints, I mean endpoints outside the VPC).
Based on my research, the ECS, ECS agent and ECS telemetry endpoints are not required when you use Fargate, but are required when you use EC2 instances in your cluster.

resource "aws_vpc_endpoint" "ecs_agent" {
  vpc_id              = YOUR_VPC_ID
  private_dns_enabled = true
  service_name        = "com.amazonaws.<YOUR-AWS-REGION>.ecs-agent"
  vpc_endpoint_type   = "Interface"
  security_group_ids = [
    aws_security_group.vpce.id,
  ]
  subnet_ids = LIST_OF_YOUR_PRIVATE_SUBNET_IDS
}

resource "aws_vpc_endpoint" "ecs_telemetry" {
  vpc_id              = YOUR_VPC_ID
  private_dns_enabled = true
  service_name        = "com.amazonaws.<YOUR-AWS-REGION>.ecs-telemetry"
  vpc_endpoint_type   = "Interface"
  security_group_ids = [
    aws_security_group.vpce.id,
  ]
  subnet_ids = LIST_OF_YOUR_PRIVATE_SUBNET_IDS
}

That’s it! By replacing the VPC, subnet and route table IDs with your own values, you have all the Terraform code needed to implement VPC endpoints in your AWS VPC.
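Since the six interface endpoints differ only by their service name, they could also be declared once with for_each. The following is an alternative sketch, not the code from our repository: the variables and the interface_services local are hypothetical names, and the block reuses the aws_security_group.vpce resource defined above.

```hcl
# Hypothetical inputs (declare each variable only once per module).
variable "aws_region" { type = string }
variable "vpc_id" { type = string }
variable "private_subnet_ids" { type = list(string) }

locals {
  # Keep only "ecr.dkr", "ecr.api" and "logs" for a Fargate-only cluster.
  interface_services = ["ecr.dkr", "ecr.api", "logs", "ecs", "ecs-agent", "ecs-telemetry"]
}

# One interface endpoint per service name.
resource "aws_vpc_endpoint" "interface" {
  for_each = toset(local.interface_services)

  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.aws_region}.${each.key}"
  vpc_endpoint_type   = "Interface"
  private_dns_enabled = true
  security_group_ids  = [aws_security_group.vpce.id]
  subnet_ids          = var.private_subnet_ids
}
```

The trade-off is readability versus repetition: the explicit resources above are easier to follow one by one, while the for_each form makes adding or removing a service a one-line change. Use one form or the other, not both, otherwise the endpoints would be created twice.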

Conclusion

If you are running several cron tasks on AWS, it’s worth considering ECS with Fargate instead of EC2 instances, to get more consistent results, fewer ops responsibilities and cheaper computing costs. However, if your tasks run in a private subnet, use AWS VPC endpoints to avoid high NAT bills.

You probably DON’T need VPC endpoints if:

  • Your cron tasks are running in a public subnet
  • You only have a few cron tasks, since the cost of VPC endpoints would be higher than the cost of the NAT Gateway
    – 1 hourly task * 1 GB Docker image => $36, which is lower than $100
    – If you are below 3 hourly tasks with a 1 GB Docker image => no need for VPC endpoints

Thanks for reading!
The code: https://github.com/legalstart/aws-vpc-endpoints-terraform

Nolwen Brosson
