AWS ECS container auto-scaling with Lambda and Cloudwatch rules

How we combined Step Scaling and Lambda functions to control the step scaling policy

Luca Tiozzo
THRON tech blog
Jul 23, 2019


After discussing how to manage host auto-scaling, this article focuses on how we manage AWS ECS container auto-scaling.

There are several container types in ECS:

  • Task: started either manually or on a schedule, they are mostly useful for one-shot jobs and don’t need to scale.
  • Daemon: a container launched on every host, useful for host monitoring; they don’t need to scale.
  • Service: a group of identical containers that share a balanced stream of requests, with DNS or a Load Balancer distributing the load; they are the focus of this article.

Our goals

Service containers are a group of containers that work on the same type of requests and need to share the load uniformly. As for most SaaS applications, our daily load profile changes a lot, so we want to:

  • Improve resiliency. Adjusting the Service count provides better reliability by delivering optimal performance to each request regardless of the load.
  • Lower costs. When load falls there are more resources than needed, so it’s important to quickly lower the Service container count to avoid wasting resources until the next load peak arrives.
  • Reduce maintenance effort. Having a fully automated way of adjusting Service availability, both for load management and failure management, greatly reduces the effort of on-call engineers and lets us focus engineering resources on high-value tasks.

When should my Service count change?

We have two different Service container cases to manage:

  • CPU-intensive. Some tasks require more CPU as requests increase. The ideal outcome is a constantly high average usage: this ensures CPU is not wasted across tasks.
  • Consumers. Consuming tasks from a queue is a frequent need in a SaaS architecture. The ideal outcome is to keep the queue almost empty at all times, increasing workers when the consumption rate is lower than the task creation rate: this ensures maximum efficiency in performing actions (minimum wait time for the end user).

Out-of-the-box AWS Service autoscaling

AWS provides tools to manage container autoscaling. First you create a Scalable Target, which defines the resource to scale (ECS Services in this case) and its limits (minimum and maximum container count). You then link a Scaling Policy to that resource.
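
As a minimal sketch (we use boto3 here purely for illustration), registering a Scalable Target for a Service looks like this; cluster name, service name and capacity limits are placeholders:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the ECS Service as a Scalable Target: we scale its DesiredCount
# between a minimum and a maximum number of containers.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",   # service/<cluster>/<service>
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=20,
)
```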

AWS provides two different scaling policies: Target Tracking and Step Scaling. Target Tracking basically adds and removes containers to keep a load metric at a target value, and it works as expected. You provide the metric and the target value, and AWS automatically scales containers to keep the metric as close as possible to that target; the metric can be any Cloudwatch metric, including custom ones.

For CPU-intensive Services, the ECSServiceAverageCPUUtilization metric works very well.
Scale out: when CPU exceeds the target value for more than 3 consecutive minutes, one or more containers are added, depending on how far the target value is exceeded.
Scale in: when CPU stays below 90% of the target value for more than 15 consecutive minutes, one or more containers are removed, based on the difference between the average CPU usage and the target value.
The only parameters you can work with are the metric, the target value and the cooldown periods for scale in/out.
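
For reference, such a Target Tracking policy can be sketched with boto3 as follows; the 60% target and the cooldown values are only illustrative:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Target Tracking policy sketch: keep average Service CPU close to 60%.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,   # seconds, illustrative
        "ScaleInCooldown": 300,   # seconds, illustrative
    },
)
```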

For Worker Services we tried using Target Tracking with the ActivityScheduleTime metric (AWS/States namespace), but it showed issues during scale in, mostly caused by the metric rather than by the scaling policy. The metric sometimes has “holes”: when data is not available, the metric is not automatically filled with zero, it is simply left with no data. This behaviour is not supported by Target Tracking, as documented (4th bullet point).

Using Step Scaling instead of Target Tracking

Step Scaling doesn’t automatically create Cloudwatch alarms, so it requires more effort, but for the same reason it also provides more flexibility. We tried applying Step Scaling to solve our worker Service scaling needs.

By manually creating alarms we can customise when scale out/scale in should be performed, instead of the fixed 3/15 minutes of Target Tracking. It also lets us manage the alarm behaviour when data is not available: should the metric be considered over or under the threshold in such an event?
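
For example, a scale-in alarm for a worker Service could be created along these lines, treating missing data as breaching so that an empty queue still triggers it; the dimensions, threshold and action ARN below are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Scale-in alarm sketch: missing data is treated as "breaching", so the alarm
# also fires when the queue is empty and the metric publishes no data points.
cloudwatch.put_metric_alarm(
    AlarmName="my-worker-scale-in",
    Namespace="AWS/States",
    MetricName="ActivityScheduleTime",
    Dimensions=[{"Name": "ActivityArn", "Value": "arn:aws:states:..."}],  # placeholder
    Statistic="Average",
    Period=60,
    EvaluationPeriods=5,
    Threshold=1000.0,                      # illustrative value
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="breaching",          # the key setting discussed above
    AlarmActions=["arn:aws:sns:..."],      # placeholder: SNS topic or policy ARN
)
```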

Thanks to this setting we can trigger alarms even when the queue is empty (which produced missing values in the standard metric), attach a Step Scaling policy to those alarms, and add or remove containers based on the metric value. With Step Scaling you can define step sizes to quickly perform scale-in and scale-out operations; the only limitation is the number of steps (20).

Step Scaling policy examples
Sample step scaling policy applied to scale out
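
As an illustration of such a policy, a scale-out Step Scaling policy created with boto3 might look like the sketch below; the step boundaries and adjustment sizes are only examples:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Scale-out Step Scaling policy sketch: the further the metric is above the
# alarm threshold, the more containers are added.
response = autoscaling.put_scaling_policy(
    PolicyName="worker-scale-out",
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-worker-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="StepScaling",
    StepScalingPolicyConfiguration={
        "AdjustmentType": "ChangeInCapacity",
        "MetricAggregationType": "Average",
        "Cooldown": 60,
        "StepAdjustments": [
            {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 50, "ScalingAdjustment": 1},
            {"MetricIntervalLowerBound": 50, "MetricIntervalUpperBound": 100, "ScalingAdjustment": 3},
            {"MetricIntervalLowerBound": 100, "ScalingAdjustment": 5},
        ],
    },
)

# The returned PolicyARN is what gets attached to the Cloudwatch alarm's actions.
policy_arn = response["PolicyARN"]
```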

When reducing the cluster size we prefer a slower approach, so we are better prepared for the upcoming load increase. We can’t simply set the policy to remove one container at a time: once the alarm triggers it invokes just one scaling action and then stops, because the alarm doesn’t change state and therefore no further scaling actions are triggered.

Slowing down the scale in: Lambda functions to the rescue

We would love to have the Simple Scaling policy (available for hosts) for Service containers too: it adds or removes one instance, waits for a cooldown period, then re-evaluates the state and starts all over, without requiring the alarm to change state.

Here we describe how we achieved a similar result by using the AWS Swiss Army knife: Lambda functions.

We configured Cloudwatch alarms to publish an event to an SNS topic, which in turn triggers a Lambda function. The Lambda function doesn’t receive any useful input in this scenario: it only knows which alarm has been triggered. To keep the Lambda function as stateless as possible we didn’t store a map linking alarms to Services; instead we embedded the information we needed in the alarm itself.

Unfortunately AWS doesn’t allow attaching arbitrary extra data to an alarm, so we used the alarm description (a string) to inject a JSON object that contains all the information the Lambda function needs to execute.
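
A minimal sketch of how the Lambda side can recover that JSON, both when the alarm invokes it through SNS and when it is re-invoked later by the Cloudwatch rule described below (the field names shown are hypothetical; the actual contents are listed further down):

```python
import json

def handler(event, context):
    """Sketch of the Lambda entry point (field names are hypothetical)."""
    if "Records" in event:
        # Invoked by the alarm through SNS: the SNS Message is the Cloudwatch
        # alarm payload, and our own JSON lives in its AlarmDescription field.
        alarm_payload = json.loads(event["Records"][0]["Sns"]["Message"])
        cfg = json.loads(alarm_payload["AlarmDescription"])
    else:
        # Invoked by the Cloudwatch rule: the rule passes the same JSON
        # object directly as the event.
        cfg = event

    # ... cfg now drives the scaling step (see the scale-in sketch further down)
    return cfg
```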

We are almost there, but there’s still an issue: how do we trigger the Lambda execution again after the scale in has been performed?

Lambda functions are pure serverless executions and you are charged by execution time: active waiting is not smart, and there is an execution time limit (15 minutes) to be aware of.

To overcome this obstacle we configured Cloudwatch rules to reschedule the Lambda execution.
For each Service we create a Cloudwatch rule, disabled by default.
When the Lambda function removes a container from the Service, it checks whether the minimum container count has been reached for that Service; if it has not, it updates the Cloudwatch rule, enabling and scheduling it. The rule triggers the Lambda again after a cooldown of a few minutes, defined through a cron expression. The cooldown value and the rule to use are part of the JSON object in the alarm description.

The rule specifies the correct target (the Lambda) and the data to send (the JSON description), and everything works fine because JSON input for Lambda is supported out-of-the-box, without involving black magic :)
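
As an illustration, the one-time setup of such a rule for a Service could look roughly like this; the rule name, ARNs, cron expression and JSON field names are all placeholders of this sketch:

```python
import boto3
import json

events = boto3.client("events")

# One-time setup per Service: a disabled rule that, once enabled by the Lambda,
# re-invokes it on a cron schedule with our JSON object as input.
events.put_rule(
    Name="scale-in-my-worker-service",
    ScheduleExpression="cron(0/5 * * * ? *)",   # illustrative: every 5 minutes
    State="DISABLED",                           # stays off until a scale-in cycle starts
)

events.put_targets(
    Rule="scale-in-my-worker-service",
    Targets=[{
        "Id": "scale-in-lambda",
        "Arn": "arn:aws:lambda:...",            # placeholder: the scaling Lambda
        "Input": json.dumps({                   # hypothetical field names
            "serviceName": "my-worker-service",
            "ruleName": "scale-in-my-worker-service",
            "alarmName": "my-worker-scale-in",
            "cooldownMinutes": 5,
            "stepSize": 1,
        }),
    }],
)

# The Lambda also needs a resource policy allowing events.amazonaws.com to
# invoke it (lambda add-permission), omitted here for brevity.
```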

The Lambda executes the scale in, sets the Cloudwatch rule, the Cloudwatch rule invokes the Lambda again, and so on. The cycle stops when the Lambda detects that the minimum container count for the given Service has been reached, or when the alarm that triggered the process is no longer in the ALARM state. When one of those two end conditions is met, the Lambda function disables the Cloudwatch rule.
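
A condensed sketch of that loop with boto3; this is not our exact implementation, and the cfg field names (plus the extra cluster field) are assumptions of the sketch:

```python
import boto3

ecs = boto3.client("ecs")
events = boto3.client("events")
cloudwatch = boto3.client("cloudwatch")
autoscaling = boto3.client("application-autoscaling")

def scale_in(cfg):
    """One step of the scale-in cycle (a sketch; field names are hypothetical)."""
    cluster = cfg.get("cluster", "default")     # extra field assumed by this sketch
    service_name = cfg["serviceName"]

    # Current desired count of the Service.
    service = ecs.describe_services(cluster=cluster, services=[service_name])["services"][0]
    desired = service["desiredCount"]

    # Minimum allowed count, read from the Scalable Target.
    target = autoscaling.describe_scalable_targets(
        ServiceNamespace="ecs",
        ResourceIds=[f"service/{cluster}/{service_name}"],
        ScalableDimension="ecs:service:DesiredCount",
    )["ScalableTargets"][0]
    minimum = target["MinCapacity"]

    # Is the alarm that started the cycle still in ALARM state?
    alarm = cloudwatch.describe_alarms(AlarmNames=[cfg["alarmName"]])["MetricAlarms"][0]
    still_in_alarm = alarm["StateValue"] == "ALARM"

    if desired <= minimum or not still_in_alarm:
        # End condition reached: stop re-scheduling ourselves.
        events.disable_rule(Name=cfg["ruleName"])
        return

    # Remove `stepSize` containers, never going below the minimum.
    ecs.update_service(
        cluster=cluster,
        service=service_name,
        desiredCount=max(minimum, desired - cfg["stepSize"]),
    )

    # Re-arm the cooldown: the rule's cron will invoke this Lambda again.
    events.enable_rule(Name=cfg["ruleName"])
```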

Scaling in process

The JSON alarm description contains:

  • ECS Service name
  • Cloudwatch rule name
  • Alarm name (used in the rule to check whether the alarm that triggered the cycle is still in alarm state or not)
  • Cooldown delay between scale in actions
  • How many containers to remove or add (step size)

The last parameter allows the same Lambda function to be used both for scale in/out and for other scaling needs that follow the same pattern.

Target Tracking and Step Scaling, pros and cons

The Target Tracking scaling policy is easy to set up and works fine, but it has several limitations:

  • No control over the “speed” of scale in/out actions (fixed delay times between scale in/out operations, 15 and 3 minutes respectively);
  • No scale-in threshold customization (fixed at 90% of scale out threshold);
  • Can’t be used with every metric (some metrics have no data at times, and this is not supported);

Step Scaling doesn’t have these limitations and is, in our opinion, the better choice. We do miss a “Simple Scaling” policy for Services though: AWS, are you planning to close this gap? :)

One more thing

When configuring a Scalable Target you can specify Scheduled Actions: these trigger at given times, at a given rate, or on a cron expression schedule. Within those actions you can only update the scaling limits, which is somewhat limiting but still very powerful, and it enables a whole new set of usages for this feature.

You could, for example, leverage your prior knowledge of peak load hours to trigger container scaling before the actual usage increases or, as we did, use this option to shrink the development environment to its minimum size outside office hours if most of your developers are in the same timezone.

Now let’s consider the option of removing the development environment altogether during company closures: by setting both minimum and maximum size to 0, Services are removed, freeing space on the hosts which, thanks to the scaling method described in the first article, removes all hosts from the cluster, with cost savings and no manual intervention at all.
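
For illustration, such a Scheduled Action could be defined roughly like this (names and schedule are placeholders); a mirror action restoring the normal limits in the morning is needed as well:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Scheduled Action sketch: shrink a development Service to zero in the evening.
autoscaling.put_scheduled_action(
    ServiceNamespace="ecs",
    ScheduledActionName="dev-shutdown",
    ResourceId="service/dev-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    Schedule="cron(0 20 ? * MON-FRI *)",        # illustrative: 20:00 on weekdays
    ScalableTargetAction={"MinCapacity": 0, "MaxCapacity": 0},
)
```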

Conclusions

Combining Host and Service autoscaling allowed us to create a system that reacts automatically to any load change and also enabled us to implement cost-saving policies within the same system.

Service container count compared with request count: we consider this a very good result

In the following diagram you can see the average CPU consumption before (orange) and after (blue) we enabled the Service autoscaling.

Average CPU on service, before and after autoscaling: we achieved our goal of stabilising average CPU usage

Before enabling Service autoscaling we had a variable CPU usage that also caused some performance and reliability issues in edge cases, because a starving Service would use CPU allocated to other Services. After enabling Service autoscaling, CPU usage is very stable and under control, allowing us to prevent overbooking of CPU resources.

Further improvements

  • ActivityScheduleTime is updated only when an Activity goes from queued to being worked on; this is a problem when tasks are very long, because the metric receives no new data and the whole scaling slows down. We would like to replace it with a metric managed by ourselves and generated by our monitoring system;
  • Container start time has a big impact on the overall efficiency of the scaling system: a slow-starting container creates a lag between the load and the system’s reaction. For this reason we want to refactor legacy containers, breaking them down into smaller pieces to improve reaction times;
  • Scaling policies have a limitation: each one can only read a single metric value, so if you need to combine multiple metrics in your scaling strategy you have to create a new metric that implements the combined logic (a small sketch follows below).
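
As a purely hypothetical example of that last point, the combined logic could be computed by our own monitoring code and published as a single custom metric; the namespace, metric name and formula below are invented for illustration:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def publish_combined_metric(queue_backlog, consumption_rate):
    """Publish one custom metric that combines two raw values
    (names and formula are purely illustrative)."""
    # e.g. "minutes of backlog" = items waiting / items consumed per minute
    pressure = queue_backlog / max(consumption_rate, 1)
    cloudwatch.put_metric_data(
        Namespace="Custom/Scaling",
        MetricData=[{
            "MetricName": "QueuePressureMinutes",
            "Value": pressure,
            "Unit": "None",
        }],
    )
```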

This is the solution we came up with, given our time constraints and overall requirements. Would you have done it differently? Is there a better way to manage the slow scale in, in your opinion? Please let us know in the comments below.
