Beat The Burst: Optimizing AWS ALB and ECS Fargate for Sudden Traffic Spikes (Part 1)

Oleksandr Hanhaliuk
3 min read · Apr 18, 2024

AWS ECS Fargate is a serverless way to run containers without managing the underlying infrastructure. It combines configurable resources (you choose the CPU and memory for each task), robust autoscaling, and almost no DevOps overhead.

However, in high-traffic systems with sudden bursts, Fargate autoscaling can react too slowly, which can leave the existing containers overloaded.

Problem

Imagine the following scenario:

  • You have an Application Load Balancer (ALB) with a target group that routes traffic to an ECS service.
  • The ECS service runs on Fargate and returns a specific response to customers.
  • The application in the Fargate tasks consumes a lot of resources (CPU and memory).
  • Autoscaling is based on CPU utilization, request count per target, or task memory utilization.
  • When traffic suddenly spikes, a CloudWatch alarm triggers autoscaling.

Seems like a very common scenario for web applications, right? Nothing wrong here.

However, after the CloudWatch alarm triggers scaling, it can take a few minutes before the new Fargate tasks are running and pass their health checks.

During this time, your existing tasks might be overloaded, which can lead to 503 responses to customers.

We don’t want that, right?

Solution

There are multiple solutions to this problem, and we will cover some of them in upcoming articles. This article covers a solution that offloads traffic to AWS Lambda during traffic spikes.

Diagram

Steps:

  1. Create a Lambda function that serves the same traffic and implements the same business logic as your Fargate container.
  2. Create an additional target group and register the Lambda function as its target.
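Step 1 can be sketched as a TypeScript handler for an ALB Lambda target. This is a minimal sketch, assuming your business logic can be called as a plain function: `handleRequest` is a hypothetical stand-in for it, and the `AlbEvent`/`AlbResult` interfaces are hand-written subsets of the request and response shapes the ALB uses for Lambda targets (the `@types/aws-lambda` package ships fuller `ALBEvent` and `ALBResult` types).

```typescript
// Hand-written subset of the event an ALB sends to a Lambda target
interface AlbEvent {
  httpMethod: string
  path: string
  queryStringParameters?: Record<string, string>
  headers?: Record<string, string>
  body: string | null
  isBase64Encoded: boolean
}

// Response shape the ALB expects back from a Lambda target
interface AlbResult {
  statusCode: number
  statusDescription?: string
  headers?: Record<string, string>
  body: string
  isBase64Encoded: boolean
}

// Hypothetical placeholder for the business logic shared with the Fargate container
function handleRequest(method: string, path: string, query: Record<string, string>): unknown {
  return { method, path, query }
}

// Pure function: turns an ALB event into an ALB response
export function buildAlbResponse(event: AlbEvent): AlbResult {
  const payload = handleRequest(event.httpMethod, event.path, event.queryStringParameters ?? {})
  return {
    statusCode: 200,
    statusDescription: '200 OK',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
    isBase64Encoded: false,
  }
}

// Lambda entry point, referenced by the function's handler setting
export const handler = async (event: AlbEvent): Promise<AlbResult> => buildAlbResponse(event)
```

Keeping the response logic in a pure function makes it easy to share with, or test against, the container's own request handling.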

3. Add the new target group to your Application Load Balancer listener:

EC2 -> Load balancers -> YourALB -> HTTPS:443 listener -> Edit listener -> Add target group
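If you prefer code to the console, steps 2 and 3 can be expressed as request inputs for the AWS SDK for JavaScript v3. The sketch below only builds the input objects. You would pass them to `CreateTargetGroupCommand`, `RegisterTargetsCommand`, and `ModifyListenerCommand` from `@aws-sdk/client-elastic-load-balancing-v2`, and to `AddPermissionCommand` from `@aws-sdk/client-lambda`. The interfaces are hand-written subsets of the SDK input types, the statement ID is hypothetical, and weights are expressed as percentages for readability (the ALB itself accepts any relative weight from 0 to 999).

```typescript
// Hand-written subsets of the SDK input shapes used below
interface TargetGroupTuple { TargetGroupArn: string; Weight: number }
interface ForwardAction { Type: 'forward'; ForwardConfig: { TargetGroups: TargetGroupTuple[] } }
interface ModifyListenerInput { ListenerArn: string; DefaultActions: ForwardAction[] }

// CreateTargetGroup input: Lambda target groups take no protocol, port, or VPC
export function lambdaTargetGroupInput(name: string) {
  return { Name: name, TargetType: 'lambda' as const }
}

// Lambda AddPermission input: lets the ALB target group invoke the function.
// Grant this before registering the function as a target.
export function allowAlbInvokeInput(functionName: string, targetGroupArn: string) {
  return {
    FunctionName: functionName,
    StatementId: 'allow-alb-invoke', // hypothetical statement ID
    Action: 'lambda:InvokeFunction',
    Principal: 'elasticloadbalancing.amazonaws.com',
    SourceArn: targetGroupArn,
  }
}

// RegisterTargets input: a Lambda target is identified by its function ARN
export function registerLambdaInput(targetGroupArn: string, functionArn: string) {
  return { TargetGroupArn: targetGroupArn, Targets: [{ Id: functionArn }] }
}

// ModifyListener input: forward to both target groups, splitting traffic by percentage
export function weightedListenerInput(
  listenerArn: string,
  fargateTargetGroupArn: string,
  lambdaTargetGroupArn: string,
  lambdaPercent: number,
): ModifyListenerInput {
  if (!Number.isInteger(lambdaPercent) || lambdaPercent < 0 || lambdaPercent > 100) {
    throw new RangeError('lambdaPercent must be an integer from 0 to 100')
  }
  return {
    ListenerArn: listenerArn,
    DefaultActions: [
      {
        Type: 'forward',
        ForwardConfig: {
          TargetGroups: [
            { TargetGroupArn: fargateTargetGroupArn, Weight: 100 - lambdaPercent },
            { TargetGroupArn: lambdaTargetGroupArn, Weight: lambdaPercent },
          ],
        },
      },
    ],
  }
}
```

With `lambdaPercent` set to 0, the Lambda target group is attached to the listener but receives no traffic until its weight is raised.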

4. Choose the CloudWatch metrics you want to rely on for traffic offload and create alarms on them. Good candidates include:

  • Number of requests per target (RequestCountPerTarget)
  • Average target response time (TargetResponseTime)
  • ECS service metrics such as CPU and memory utilization

Pick metrics that signal a spike as early as possible. For example, if the number of requests per target starts climbing, a traffic spike is likely on its way.
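As an example of step 4, here is a sketch of a `PutMetricAlarm` input (for `PutMetricAlarmCommand` in `@aws-sdk/client-cloudwatch`) that watches `RequestCountPerTarget` for the Fargate target group. The threshold, period, and evaluation count are hypothetical values to tune for your traffic, and `alarmActionArn` is meant to be the ARN of the offload Lambda function from step 5, so the function is invoked both when the alarm fires and when it returns to OK. `RequestCountPerTarget` is published in the `AWS/ApplicationELB` namespace and supports only the `Sum` statistic; its dimension values are the trailing resource parts of the target group and load balancer ARNs, which the helpers extract.

```typescript
// Hand-written subset of the CloudWatch PutMetricAlarm input shape
interface PutMetricAlarmInput {
  AlarmName: string
  Namespace: string
  MetricName: string
  Dimensions: { Name: string; Value: string }[]
  Statistic: 'Sum'
  Period: number
  EvaluationPeriods: number
  Threshold: number
  ComparisonOperator: 'GreaterThanThreshold'
  TreatMissingData: 'notBreaching'
  AlarmActions: string[]
  OKActions: string[]
}

// ELB ARNs end in ":<resource>"; CloudWatch dimension values use that resource part
function arnResource(arn: string): string {
  return arn.split(':').slice(5).join(':')
}

// e.g. "targetgroup/my-tg/73e2d6bc24d8a067"
export function targetGroupDimension(targetGroupArn: string): string {
  return arnResource(targetGroupArn)
}

// e.g. "app/my-alb/50dc6c495c0c9188" (the "loadbalancer/" prefix is dropped)
export function loadBalancerDimension(loadBalancerArn: string): string {
  return arnResource(loadBalancerArn).replace(/^loadbalancer\//, '')
}

export function requestSpikeAlarmInput(
  alarmName: string,
  targetGroupArn: string,
  loadBalancerArn: string,
  alarmActionArn: string,
): PutMetricAlarmInput {
  return {
    AlarmName: alarmName,
    Namespace: 'AWS/ApplicationELB',
    MetricName: 'RequestCountPerTarget',
    Dimensions: [
      { Name: 'TargetGroup', Value: targetGroupDimension(targetGroupArn) },
      { Name: 'LoadBalancer', Value: loadBalancerDimension(loadBalancerArn) },
    ],
    Statistic: 'Sum',
    Period: 60, // one-minute datapoints (hypothetical)
    EvaluationPeriods: 1, // react on the first breaching minute (hypothetical)
    Threshold: 1000, // requests per target per minute (hypothetical)
    ComparisonOperator: 'GreaterThanThreshold',
    TreatMissingData: 'notBreaching',
    AlarmActions: [alarmActionArn], // invoke the offload function on ALARM
    OKActions: [alarmActionArn], // and again when the alarm returns to OK
  }
}
```

A single evaluation period makes the alarm react quickly, at the cost of more false positives; that trade-off is exactly what the choice of early-warning metric has to balance.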

5. Trigger the Lambda function from the CloudWatch alarm

Previously, the alarm had to publish a message to an SNS topic, which then invoked Lambda. AWS has since added Lambda as a native alarm action, so an alarm can now invoke a function directly.
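Invoking Lambda directly from an alarm also requires a resource-based policy on the function that lets CloudWatch alarms call it. Below is a minimal sketch of the `AddPermission` input (for `AddPermissionCommand` in `@aws-sdk/client-lambda`), assuming the `lambda.alarms.cloudwatch.amazonaws.com` service principal described in the CloudWatch documentation for Lambda alarm actions and the standard `aws` partition. The statement ID is hypothetical; scoping `SourceArn` to one alarm ARN keeps other alarms from invoking the function.

```typescript
// Hand-written subset of the Lambda AddPermission input shape
interface AddPermissionInput {
  FunctionName: string
  StatementId: string
  Action: string
  Principal: string
  SourceAccount: string
  SourceArn: string
}

// CloudWatch alarm ARNs have the form arn:aws:cloudwatch:<region>:<account>:alarm:<name>
// (assumes the standard "aws" partition)
export function alarmArn(region: string, accountId: string, alarmName: string): string {
  return `arn:aws:cloudwatch:${region}:${accountId}:alarm:${alarmName}`
}

// Allows only the named alarm in this account to invoke the function
export function allowAlarmInvokeInput(
  functionName: string,
  region: string,
  accountId: string,
  alarmName: string,
): AddPermissionInput {
  return {
    FunctionName: functionName,
    StatementId: 'allow-cloudwatch-alarm', // hypothetical statement ID
    Action: 'lambda:InvokeFunction',
    Principal: 'lambda.alarms.cloudwatch.amazonaws.com',
    SourceAccount: accountId,
    SourceArn: alarmArn(region, accountId, alarmName),
  }
}
```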

6. In that Lambda function, use the AWS SDK or API (see docs) to update the target group weights on the ALB listener, shifting some traffic to the Lambda target group created in Step 2.

This offloads part of the traffic from Fargate and gives the existing tasks breathing room until autoscaling brings new tasks online.
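The core of step 6 is deciding the Lambda weight from the alarm state. This is a minimal sketch, assuming the alarm lists the function in both its alarm actions and its OK actions, and that the event carries the new state under `alarmData.state.value`, as in the payload CloudWatch sends for Lambda alarm actions. `OFFLOAD_PERCENT` and the ARNs are hypothetical. The function returns a `ModifyListener` input (or `null` when nothing should change), which a real handler would send with `ModifyListenerCommand` from `@aws-sdk/client-elastic-load-balancing-v2`.

```typescript
type AlarmState = 'OK' | 'ALARM' | 'INSUFFICIENT_DATA'

// Subset of the payload CloudWatch sends when an alarm invokes Lambda directly
interface AlarmEvent {
  alarmArn: string
  alarmData: { alarmName: string; state: { value: AlarmState; reason?: string } }
}

// Hand-written subset of the ModifyListener input shape
interface TargetGroupTuple { TargetGroupArn: string; Weight: number }
interface ModifyListenerInput {
  ListenerArn: string
  DefaultActions: { Type: 'forward'; ForwardConfig: { TargetGroups: TargetGroupTuple[] } }[]
}

const OFFLOAD_PERCENT = 40 // hypothetical share of traffic sent to Lambda during a spike

// ALARM: offload; OK: send everything back to Fargate; INSUFFICIENT_DATA: leave weights alone
export function lambdaPercentFor(state: AlarmState): number | null {
  switch (state) {
    case 'ALARM':
      return OFFLOAD_PERCENT
    case 'OK':
      return 0
    default:
      return null
  }
}

export function offloadUpdate(
  event: AlarmEvent,
  listenerArn: string,
  fargateTargetGroupArn: string,
  lambdaTargetGroupArn: string,
): ModifyListenerInput | null {
  const lambdaPercent = lambdaPercentFor(event.alarmData.state.value)
  if (lambdaPercent === null) return null
  return {
    ListenerArn: listenerArn,
    DefaultActions: [
      {
        Type: 'forward',
        ForwardConfig: {
          TargetGroups: [
            { TargetGroupArn: fargateTargetGroupArn, Weight: 100 - lambdaPercent },
            { TargetGroupArn: lambdaTargetGroupArn, Weight: lambdaPercent },
          ],
        },
      },
    ],
  }
}

// Inside the real handler (requires @aws-sdk/client-elastic-load-balancing-v2):
//   const input = offloadUpdate(event, LISTENER_ARN, FARGATE_TG_ARN, LAMBDA_TG_ARN)
//   if (input) await elbClient.send(new ModifyListenerCommand(input))
```

As a rough illustration with hypothetical numbers: if four tasks can each handle about 100 requests per second and traffic jumps to 600, each task is asked for 150% of its capacity. Sending 40% to Lambda leaves 360 requests per second for Fargate, or 90% of capacity, until new tasks come online.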

7. Additionally, you can control Fargate scaling from the same Lambda function to avoid conflicts with the default scaling metrics. To do this, use the AWS SDK:

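```typescript
import {
  ECSClient,
  DescribeServicesCommand,
  UpdateServiceCommand,
} from '@aws-sdk/client-ecs'

const ecsClient = new ECSClient({})

// Placeholders: replace with your own cluster, service, and step size
const ecsCluster = 'my-cluster'
const ecsService = 'my-service'
const scalingSize = 2

export async function scaleService(direction: 'up' | 'down'): Promise<void> {
  // Read the service's current desired task count
  const describeServiceCommand = new DescribeServicesCommand({
    services: [ecsService],
    cluster: ecsCluster,
  })
  const serviceResponse = await ecsClient.send(describeServiceCommand)

  const desiredCount = serviceResponse.services?.[0]?.desiredCount
  if (desiredCount === undefined) {
    throw new Error(`Service ${ecsService} not found in cluster ${ecsCluster}`)
  }

  const scalingUpDesiredCount = desiredCount + scalingSize
  // Scale down by the same step, but never below zero tasks
  const scalingDownDesiredCount = Math.max(0, desiredCount - scalingSize)

  // Apply the new desired count; ECS starts or stops tasks to match it
  const updateServiceCommand = new UpdateServiceCommand({
    service: ecsService,
    cluster: ecsCluster,
    desiredCount: direction === 'up' ? scalingUpDesiredCount : scalingDownDesiredCount,
  })
  await ecsClient.send(updateServiceCommand)
}
```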
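Scaling up by a fixed step on every alarm has no upper bound, so repeated alarms could keep raising `desiredCount`. One way to guard against that is to derive the direction from the alarm state and clamp the result. This is a minimal sketch; `minTasks` and `maxTasks` are hypothetical bounds that you would align with the minimum and maximum capacity of your service's autoscaling configuration.

```typescript
type AlarmState = 'OK' | 'ALARM' | 'INSUFFICIENT_DATA'

// ALARM scales up by one step, OK scales down by one step,
// INSUFFICIENT_DATA keeps the current count; the result is always clamped to the bounds
export function nextDesiredCount(
  current: number,
  scalingSize: number,
  state: AlarmState,
  minTasks: number,
  maxTasks: number,
): number {
  let target = current
  if (state === 'ALARM') target = current + scalingSize
  else if (state === 'OK') target = current - scalingSize
  // Keep the result inside [minTasks, maxTasks]
  return Math.min(maxTasks, Math.max(minTasks, target))
}
```

The handler would then pass the returned value as `desiredCount` to `UpdateServiceCommand`.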

Summary

By setting up an extra target group for Lambda and adjusting traffic weights, we can keep things running smoothly until Fargate catches up. However, there remains a possibility of encountering 503 errors if our CloudWatch metrics fail to detect the traffic surge promptly.

In the next article, we will look at how to use Lambda as a fallback for ECS Fargate.
