The Big Problem with AWS Fargate and Auto Scaling

Michael Bamesberger
Ancestry Product & Technology
5 min read · Mar 16, 2021

Software developers like to find ways to do more with less code.

That’s part of the reason why my team at Ancestry has been looking at AWS ECS with Fargate to run containers for many of our applications in the cloud. Fargate handles a bunch of things we used to have to worry about, like provisioning and scaling instances, which means there’s less infrastructure code to write and maintain.

Yet, in our experiments with ECS and Fargate, there’s one aspect that’s caused a bit of a headache: scaling tasks.

I read that scaling tasks with Target Tracking was a breeze. It works like a thermostat — you set a target CPU or memory usage for your service, and ECS will scale tasks up or down to maintain that metric. There’s no need to think about the underlying EC2 instances at all.
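
For instance, a CPU-based Target Tracking policy really is only a few lines. Here is a rough boto3 sketch, where “my-cluster” and “my-service” are placeholder names and the service is assumed to already be registered as a scalable target:

import boto3

autoscaling = boto3.client("application-autoscaling")

# Sketch: keep the service's average CPU near 60%.
# "my-cluster" and "my-service" are placeholder names.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)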

The tasks my team runs have predictable memory and CPU usage, so those metrics don’t tell us much about load. Instead, we want to scale tasks based on the number of samples that have arrived in a queue for processing. And this is where things get tricky.

Here’s what we’ve attempted while using AWS’s serverless architecture with auto scaling.

Long-Running Tasks

When Ancestry receives sequenced DNA from our lab, we use AWS SQS to queue the samples while they await processing through our bioinformatic algorithms.

One such algorithm that matches a customer’s DNA to all others in our database can take anywhere from 5 to 30 minutes to execute, depending on the number of matches a customer has. The long-running nature of this process effectively ruled out AWS Lambda, which has a strict 15-minute execution limit.

If I wanted to use another serverless option, ECS with Fargate seemed like the way to go. I first needed to think about how I wanted this ECS cluster to consume messages from our queue, and specifically how it would scale tasks to handle the load of messages.

Ideally, ECS looks at how many samples are waiting in the queue and scales tasks up from zero to handle them, with Fargate provisioning the underlying compute behind the scenes. When there are no more messages, everything scales back down to zero, ensuring that we’re not spending money on resources when they aren’t in use.

Auto Scaling 101

Before we go any further, let’s quickly talk about the differences in auto scaling options between serverless and non-serverless ECS clusters.

Before AWS offered Fargate, you needed to configure EC2 instances for your ECS cluster using EC2 auto scaling, which scales EC2 instances in your cluster based on some metric tied to the load of your application.

If you configure an ECS cluster to use Fargate, however, the EC2 configuration is abstracted from you. So, you’ll need to configure Service Auto Scaling, which scales up and down the number of tasks in your ECS service based on a metric. The scaling of EC2 instances happens entirely behind the scenes.

Yet, Service Auto Scaling does not have built-in support for queue-based scaling. It does, however, allow you to configure scaling on a custom CloudWatch metric. If that metric were based on the number of messages in our queue, I thought, this could be our solution.

The setup would look like this:

Every minute, a Lambda function fetches the number of messages available in my queue, as well as the number of tasks currently running in my ECS cluster. Then the function publishes the number of messages divided by the number of running tasks to CloudWatch. AWS calls this metric the “backlog per instance”, or in our case, backlog_per_task.

  • For example, if I have 100 messages in my queue, and I have 2 tasks currently running, my backlog_per_task metric would be published as 50.
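
In Python with boto3, that Lambda function might look roughly like this. The queue URL, cluster, service, and metric namespace below are placeholders for illustration, not our actual resources:

import boto3

# Placeholder names for illustration -- substitute your own queue URL,
# cluster, and service.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/dna-samples"
CLUSTER = "sample-processing"
SERVICE = "matching-service"

sqs = boto3.client("sqs")
ecs = boto3.client("ecs")
cloudwatch = boto3.client("cloudwatch")

def handler(event, context):
    # Number of messages waiting in the queue.
    attrs = sqs.get_queue_attributes(
        QueueUrl=QUEUE_URL,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])

    # Number of tasks currently running in the service.
    tasks = ecs.list_tasks(
        cluster=CLUSTER, serviceName=SERVICE, desiredStatus="RUNNING"
    )
    # Treat zero running tasks as one, so a non-empty queue still
    # produces a large backlog value and triggers a scale-out.
    running = max(len(tasks["taskArns"]), 1)

    # Publish backlog_per_task so the scaling policy can track it.
    cloudwatch.put_metric_data(
        Namespace="Custom/ECS",
        MetricData=[{
            "MetricName": "backlog_per_task",
            "Dimensions": [
                {"Name": "ClusterName", "Value": CLUSTER},
                {"Name": "ServiceName", "Value": SERVICE},
            ],
            "Value": backlog / running,
            "Unit": "None",
        }],
    )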

In Terraform, I’ll create a Target Tracking auto scaling policy that takes as input the custom metric I created, as well as a “target value” for my cluster to maintain.

  • For example, if I specified a target value of 5, and my custom metric was logging 50, my ECS cluster would scale up tasks that would consume messages from the queue. This would also bring the custom metric down.
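
In Terraform this becomes an aws_appautoscaling_policy resource; the boto3 sketch below shows the same configuration with placeholder names matching the Lambda sketch above, with the custom metric plugged in and a target value of 5:

import boto3

autoscaling = boto3.client("application-autoscaling")

# Placeholder resource ID, matching the Lambda sketch above.
RESOURCE_ID = "service/sample-processing/matching-service"

# Register the ECS service as a scalable target (task count 0..50).
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=RESOURCE_ID,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=0,
    MaxCapacity=50,
)

# Target Tracking policy: keep backlog_per_task near 5.
autoscaling.put_scaling_policy(
    PolicyName="backlog-per-task-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=RESOURCE_ID,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,
        "CustomizedMetricSpecification": {
            "MetricName": "backlog_per_task",
            "Namespace": "Custom/ECS",
            "Dimensions": [
                {"Name": "ClusterName", "Value": "sample-processing"},
                {"Name": "ServiceName", "Value": "matching-service"},
            ],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 60,
        "ScaleOutCooldown": 60,
    },
)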

When we tested this setup, ECS successfully scaled tasks up and down in proportion to the number of messages in our queue. But it didn’t take us long to find a big problem with this configuration.

Protecting tasks from termination

As I mentioned earlier, the tasks running in this ECS cluster can take anywhere from 5 to 30 minutes to complete. What happens when a scale-in event occurs while my tasks are still running? ECS could kill a task mid-execution, causing errors or forcing the message back onto the queue and wasting resources. To my team, this was an unacceptable risk.

If I weren’t using Fargate and instead spent the time configuring EC2 instances, I could prevent ECS from killing a long-running task during a scaling event by preventing the instance the task is running on from being terminated. AWS calls this “instance protection.”

You can enable this instance protection through an API call, or by configuring the relatively new type of EC2 auto scaling called Cluster Auto Scaling.
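
As a rough sketch, a task could mark the instance it is running on as protected before starting a long job and release the protection afterward. The group name and instance ID below are placeholders:

import boto3

autoscaling = boto3.client("autoscaling")

# Sketch: protect this instance from scale-in before starting a long job,
# then call the same API with ProtectedFromScaleIn=False when done.
# The group name and instance ID are placeholders.
autoscaling.set_instance_protection(
    AutoScalingGroupName="ecs-cluster-asg",
    InstanceIds=["i-0123456789abcdef0"],
    ProtectedFromScaleIn=True,
)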

Cluster Auto Scaling abstracts some EC2 scaling configuration away from you, much like Fargate does. In fact, both rely on AWS’s “capacity provider” feature to manage scaling.

It offers “managed instance termination protection,” which automatically prevents EC2 instances that contain running tasks from being scaled in.
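
Wiring that up is essentially a capacity provider pointed at your Auto Scaling group, with managed scaling and managed termination protection enabled. In Terraform this is the aws_ecs_capacity_provider resource; a boto3 sketch with a placeholder ARN looks like this:

import boto3

ecs = boto3.client("ecs")

# Placeholder ARN of an existing Auto Scaling group. Managed termination
# protection also requires instance scale-in protection to be enabled on
# the group itself.
ASG_ARN = (
    "arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:"
    "11111111-2222-3333-4444-555555555555:"
    "autoScalingGroupName/ecs-cluster-asg"
)

# Capacity provider with managed scaling and managed termination
# protection, so instances running tasks are not scaled in.
ecs.create_capacity_provider(
    name="protected-capacity",
    autoScalingGroupProvider={
        "autoScalingGroupArn": ASG_ARN,
        "managedScaling": {
            "status": "ENABLED",
            "targetCapacity": 100,
        },
        "managedTerminationProtection": "ENABLED",
    },
)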

Since I’m using ECS with Fargate, I don’t have that sort of control over my EC2 instances, and there’s no way to protect my tasks from termination.

Maybe going serverless wasn’t the right choice for this specific use case. In the end, I decided to use ECS with EC2 Auto Scaling, and skip serverless architecture entirely.

Sometimes writing a little extra code is worth the extra control.

If you’re interested in joining Ancestry, we’re hiring! Feel free to check out our careers page for more info.
