Instance Auto Scaling Using AWS Capacity Providers

Incorporating capacity providers into your ECS architecture using CDK

Annie Holladay
YipitData Engineering
6 min read · Oct 4, 2022

Background

How do you determine the number of EC2 instances you need to run an application? At YipitData, we have been handling this calculation ourselves using our own tooling. This calculation adds complexity to our system, is error-prone, and limits our deployment flexibility. We are migrating to AWS Capacity Providers, which manage EC2 instance auto scaling automatically.

Architecture Overview

We use AWS ECS clusters with EC2 instances that are provisioned using Auto Scaling Groups. ECS clusters manage your containers and Auto Scaling Groups (ASGs) manage the EC2 instances; instances in a given ASG are registered to a cluster so that the cluster can use the ASG’s instances to run its containers.

For EC2-backed ECS Clusters, there are two types of auto scaling: ECS Service auto scaling and EC2 instance auto scaling.

  • ECS Service auto scaling is the ability to increase or decrease the desired number of containers (i.e. tasks) automatically based on a metric, for example based on the number of requests a web application gets (see the sketch after this list)
  • EC2 instance auto scaling is the ability to increase or decrease the number of EC2 instances available to your cluster based on the number of desired containers
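
For reference, service auto scaling is configured on the ECS service itself. A minimal CDK sketch, where the existing service object, the task-count bounds, and the CPU target are placeholder values (and this is not something we use in the architecture described below):

# `service` is an existing ecs.Ec2Service (or FargateService)
scaling = service.auto_scale_task_count(min_capacity=2, max_capacity=20)

# Add or remove tasks to keep average CPU utilization near 60%
scaling.scale_on_cpu_utilization(
    "CpuScaling",
    target_utilization_percent=60,
)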

Limitations of our Current Architecture

In our current architecture, we do not use service auto scaling or instance auto scaling. The ASG simply maintains a specified number of instances. Application owners initiate a change in the number of desired tasks, and our platform then calculates the number of instances needed based on the size and number of tasks the application owner has specified and sets the ASG to that value.

Besides adding a layer of complexity to our architecture, this methodology limits our deployment options. Because the number of instances is fixed, we cannot temporarily add instances during a deployment for extra reliability. Instead, we have to deploy by stopping some or all of our current tasks and replacing them with new ones, since we do not have the spare capacity to keep the current tasks running while we spin up new ones.

What are Capacity Providers?

The main goal of a capacity provider is to determine how many instances are required to serve the desired number of tasks and to scale the number of instances to match that value. A capacity provider is thus the interface that links your Amazon ECS cluster with your Auto Scaling Group: each EC2-backed capacity provider is associated with one ASG. A cluster can use multiple capacity providers and spread tasks across them using capacity provider strategies. This allows one cluster to use Auto Scaling Groups in different availability zones or different types of infrastructure: EC2 On-Demand, EC2 Spot, Fargate, and Fargate Spot (the Fargate providers are managed by AWS and are not tied to an ASG).

How do Capacity Providers work?

Determining the Desired Number of Instances

The capacity provider calculates a lower bound on the number of instances required to place the desired tasks, taking into account the vCPU, memory, ENIs, ports, and GPUs of the tasks and the instances. It then determines the new desired number of instances from this lower bound, along with other constraints such as the minimum and maximum scaling step sizes.
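
As a rough illustration of the lower-bound calculation, the sketch below packs tasks by vCPU and memory alone; the real algorithm also accounts for ENIs, ports, GPUs, and placement constraints, so treat it only as an approximation. The task and instance sizes are made up.

import math

def naive_instance_lower_bound(
    task_count: int,
    task_cpu: int,          # CPU units per task (1024 = 1 vCPU)
    task_memory_mib: int,   # memory per task in MiB
    instance_cpu: int,      # CPU units per instance
    instance_memory_mib: int,
) -> int:
    """Very rough lower bound on instances needed to place task_count tasks.

    Simplification: assumes tasks are limited only by CPU and memory and can
    be packed perfectly.
    """
    tasks_per_instance = min(
        instance_cpu // task_cpu,
        instance_memory_mib // task_memory_mib,
    )
    if tasks_per_instance == 0:
        raise ValueError("A single task does not fit on one instance")
    return math.ceil(task_count / tasks_per_instance)

# Example: 10 tasks of 0.5 vCPU / 1 GiB on instances with 2 vCPU / 4 GiB
print(naive_instance_lower_bound(10, 512, 1024, 2048, 4096))  # -> 3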

Scaling to Meet the Desired Number of Instances

ECS manages the scale-in and scale-out actions via a target tracking scaling policy. When you create a capacity provider, ECS creates a new CloudWatch metric, CapacityProviderReservation. This metric is the ratio of the desired number of instances to the number of instances currently running, multiplied by 100. The target tracking policy monitors this metric, and its alarms fire when the value deviates from the target (100 by default), i.e. when the number of instances you want does not match the number you have; the alarm then triggers a scaling action to add or remove instances.
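
For intuition, here is how the metric behaves for a couple of hypothetical data points (a simplified sketch; in practice ECS emits this metric for you):

def capacity_provider_reservation(desired_instances: int, running_instances: int) -> float:
    """CapacityProviderReservation = desired / running * 100."""
    return desired_instances / running_instances * 100

# 4 instances needed but only 2 running -> 200: above the 100 target, so scale out
print(capacity_provider_reservation(4, 2))  # 200.0

# 2 instances needed but 4 still running -> 50: below the 100 target, so scale in
print(capacity_provider_reservation(2, 4))  # 50.0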

Scaling Out & In

When you redeploy your service, ECS will attempt to place the desired number of tasks and then recalculate the desired number of instances. Tasks that cannot be placed because there are insufficient resources on the available instances sit in the “Provisioning” state. Because there are tasks in the “Provisioning” state, the desired number of instances increases when it is recalculated, the CloudWatch target tracking alarm triggers, and new instances are added.

If no tasks are in the “Provisioning” state, ECS has successfully placed the desired number of tasks. When the desired number of instances is later recalculated, it may stay the same or drop. If it drops, then after 15 minutes of consecutive metric values showing fewer desired instances than running instances, the CloudWatch target tracking alarm triggers and instances are terminated when possible.

If you are using enable_managed_termination_protection on your capacity provider and new_instances_protected_from_scale_in on your Auto Scaling Group, an instance will only be terminated if there are no non-daemon tasks running on the instance. If you are using a binpack placement strategy for your service, ECS should place and terminate your tasks efficiently, allowing you to scale in when possible.

Our New Architecture Using Capacity Providers

During this migration, we would like to maintain our existing architecture but replace our manual instance scaling with capacity providers, so we will use one capacity provider and one ASG to provision the infrastructure for an ECS cluster.

CDK Implementation

We first define our ECS cluster and our Auto Scaling Group. Consider the best value for max_capacity: the capacity provider will only scale out up to this value, so you can use it as a safety brake. AWS recommends not setting the desired_capacity of the Auto Scaling Group. The desired_capacity is the initial number of instances in the ASG; if it is set, every deployment resets the number of instances to that value. When it is not set, the ASG starts at the minimum capacity and the instance count is left unchanged by subsequent deployments.
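
A minimal sketch of this step, assuming self is the surrounding CDK stack and vpc is an existing VPC; the construct IDs, instance type, and capacity bounds are placeholders:

from aws_cdk import aws_autoscaling as autoscaling
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_ecs as ecs

# ECS cluster in an existing VPC
cluster = ecs.Cluster(self, "Cluster", vpc=vpc)

# Auto Scaling Group that will back the cluster's EC2 capacity.
# Note: desired_capacity is intentionally left unset (see above).
asg = autoscaling.AutoScalingGroup(
    self,
    "Asg",
    vpc=vpc,
    instance_type=ec2.InstanceType("c5.large"),
    machine_image=ecs.EcsOptimizedImage.amazon_linux2(),
    min_capacity=0,
    max_capacity=20,  # the capacity provider will never scale past this
    # required for managed termination protection on the capacity provider
    new_instances_protected_from_scale_in=True,
)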

Next we define our capacity provider and attach it to our cluster. Consider the best value for target_capacity_percent: it is the target for the Capacity Provider Reservation metric that the target tracking policy monitors, so you can adjust it to keep your cluster over- or under-provisioned. The default value is 100, meaning you want the desired number of instances to equal the number of running instances; setting it below 100 keeps spare capacity in the cluster. Additionally, if any service needs to access the EC2 metadata endpoint, such as Datadog tracing, you will need to pass can_containers_access_instance_role=True to your AsgCapacityProvider, as well as in the call to add_asg_capacity_provider (due to this bug).
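
Continuing the sketch, with the same placeholder names:

# Capacity provider that links the cluster to the ASG
capacity_provider = ecs.AsgCapacityProvider(
    self,
    "CapacityProvider",
    auto_scaling_group=asg,
    target_capacity_percent=100,  # keep desired == running; lower it for spare capacity
    enable_managed_termination_protection=True,
    # needed if containers must reach the EC2 metadata endpoint (e.g. Datadog tracing)
    can_containers_access_instance_role=True,
)

cluster.add_asg_capacity_provider(
    capacity_provider,
    # currently must be repeated here as well (see the bug linked above)
    can_containers_access_instance_role=True,
)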

Lastly, we define our Service*. The service desired_count is the desired number of tasks. We only have one capacity provider, so our capacity provider strategy has a single entry. The weight designates the relative percentage of the total number of tasks that should use the specified capacity provider, and at least one capacity provider must have a non-zero weight. We also use a binpack-by-CPU placement strategy since our workload is CPU-bound.

*Note: you cannot use ecs_patterns.ApplicationLoadBalancedEc2Service until AWS fixes this issue
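
A sketch of the service definition using the L2 Ec2Service construct instead of the pattern; the task definition, container image, and counts are placeholders:

# Task definition with a single placeholder container
task_definition = ecs.Ec2TaskDefinition(self, "TaskDef")
task_definition.add_container(
    "app",
    image=ecs.ContainerImage.from_registry("amazon/amazon-ecs-sample"),
    cpu=512,
    memory_limit_mib=1024,
)

service = ecs.Ec2Service(
    self,
    "Service",
    cluster=cluster,
    task_definition=task_definition,
    desired_count=10,  # desired number of tasks
    capacity_provider_strategies=[
        ecs.CapacityProviderStrategy(
            capacity_provider=capacity_provider.capacity_provider_name,
            weight=1,  # all tasks use this provider; at least one weight must be non-zero
        )
    ],
    # binpack by CPU so instances fill up before new ones are used,
    # which lets the capacity provider scale in sooner
    placement_strategies=[ecs.PlacementStrategy.packed_by_cpu()],
)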

Final Thoughts and Learnings

Adding capacity providers has simplified our architecture immensely and given us more flexibility in our deployment strategies. We have already begun to use a rolling deployment strategy instead of in-place deployments. Additionally, the implementation of capacity providers using CDK revealed that although CDK’s L3 constructs seem quite useful, you will often still need to use the L2 constructs because the patterns are too opinionated or not fully developed.
