Auto Scaling Microservices on ECS
A guide to using Application Auto Scaling with the Elastic Container Service and CloudFormation
This article covers auto-scaling ECS services and follows up on my previous article about deploying microservices on the ECS Fargate platform. To help you get started, a GitHub repository accompanies this article; it includes a sample web service as well as a CloudFormation stack for provisioning the underlying infrastructure. The guide first covers the concept of auto-scaling and how it is implemented as a service in AWS. We then review the sample web service before bringing the concepts together in a working deployment. Basic knowledge of AWS and ECS is recommended. If you’re new to AWS, please remember that some resources may incur charges, so make sure you have a billing alarm set up.
Application Auto Scaling
Arguably, one of the biggest advantages of migrating application workloads to the cloud is the promise of paying only for the resources an application truly needs. This, coupled with the ability to automatically scale an application’s underlying resources, allows teams to avoid under- or over-provisioning infrastructure. Teams not only avoid paying for unnecessary resources but also ensure their applications have the capacity to handle changing workloads.
There are various AWS application resources that can be configured to scale automatically. Such resources include DynamoDB tables, EMR clusters, and ECS services, among others. To configure automatic scaling of these resources, AWS provides the aptly named Application Auto Scaling service. The diagram below depicts one particular use case of this service, Service Auto Scaling:
Let’s have a closer look at the components at play.
The ECS service on the right side of the diagram is the actual resource registered with the auto scaling service as a scalable target. The auto scaling service will update the ECS service’s desired task count based on the particular scaling policy which we’ll describe below. The desired task count of this service is known as the scalable dimension. In the context of ECS, this is the only scaling property that exists. For other services such as DynamoDB, one could scale various dimensions such as read or write capacity of a particular table.
The following example CloudFormation snippet provisions a scalable target for an ECS service named my-service in the cluster
In this example we set an upper limit of 5 desired tasks when scale-out events occur. You might want to use a CloudFormation mapping to vary this upper limit based on an environment parameter, so that each environment stays within a predefined budget. However, the running task count metric should be monitored to see whether the service is constantly running at this upper limit, in which case the limit may need to be increased.
Once the scalable target has been registered, it should be possible to list it with the following AWS CLI command:
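For readers who prefer to experiment from a script, the same registration can be sketched with the Application Auto Scaling API through boto3. This is a minimal sketch, not the article's CloudFormation template: the cluster name `my-cluster`, the lower limit of 1 task, and the helper names are assumptions made for illustration.

```python
"""Sketch: request parameters for registering an ECS service as a scalable target."""


def scalable_target_params(cluster: str, service: str,
                           min_tasks: int = 1, max_tasks: int = 5) -> dict:
    """Build keyword arguments for register_scalable_target."""
    if min_tasks > max_tasks:
        raise ValueError("min_tasks must not exceed max_tasks")
    return {
        "ServiceNamespace": "ecs",
        # ECS resource IDs take the form service/<cluster>/<service>.
        "ResourceId": f"service/{cluster}/{service}",
        # The desired task count is the only scalable dimension for ECS.
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }


def register(params: dict) -> None:
    """Send the request; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it
    boto3.client("application-autoscaling").register_scalable_target(**params)


if __name__ == "__main__":
    # "my-cluster" is a placeholder cluster name (an assumption).
    print(scalable_target_params("my-cluster", "my-service"))
```

Keeping the parameter builder separate from the API call makes it easy to vary `max_tasks` per environment, in the same spirit as the mapping suggested above.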
$ aws application-autoscaling describe-scalable-targets --service-namespace ecs
The scaling policy defines when a scaling event — either scale in or scale out — should occur. There are several types of scaling policies available but we have chosen to implement the Target Tracking scaling policy type in our demo project. As the name suggests, this policy tracks a selected metric and monitors its current value in comparison to a selected target value. A common choice is to track the ECS Service CPU Utilization metric. As the CPU utilization passes a predefined target value, a scale-out event is triggered and the service desired task count is incremented. Conversely, when the CPU utilization drops below the target value, a scale-in event is triggered and the service desired task count is decremented.
As an example, the CloudFormation snippet below provisions a scaling policy that targets the ECS service CPU utilization metric at 75%.
To avoid scaling out excessively, a cooldown period can be specified. As the period does not have to be set explicitly, the configuration above defaults to a period of 5 minutes. A cooldown period follows a scale-out event, and no additional scale-out events should occur while it lasts (CloudWatch alarms associated with the scaling policy are ignored). Similarly, a separate cooldown period is specified for scale-in events to avoid scaling in too quickly. Note that during a scale-in cooldown period, scale-out events can still be triggered.
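The target tracking policy and its cooldown periods can also be expressed as a boto3 request. The sketch below assumes a policy name derived from the service name and spells out both cooldowns as 300 seconds (the 5 minute default mentioned above); the helper names and the `my-cluster` placeholder are illustrative, not taken from the demo project.

```python
"""Sketch: a target tracking policy on average service CPU utilization."""


def target_tracking_policy_params(cluster: str, service: str,
                                  target_cpu: float = 75.0,
                                  scale_out_cooldown: int = 300,
                                  scale_in_cooldown: int = 300) -> dict:
    """Build keyword arguments for put_scaling_policy."""
    return {
        "PolicyName": f"{service}-cpu-target-tracking",  # hypothetical name
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Keep average CPU utilization of the service near this value.
            "TargetValue": target_cpu,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
            # Cooldowns are expressed in seconds.
            "ScaleOutCooldown": scale_out_cooldown,
            "ScaleInCooldown": scale_in_cooldown,
        },
    }


def put_policy(params: dict) -> None:
    """Send the request; requires boto3 and AWS credentials."""
    import boto3
    boto3.client("application-autoscaling").put_scaling_policy(**params)


if __name__ == "__main__":
    print(target_tracking_policy_params("my-cluster", "my-service"))
```

A shorter scale-out cooldown with a longer scale-in cooldown is a common way to react quickly to load while shedding capacity cautiously.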
When capacity needs are proportional to the monitored metric, such as service CPU utilization, a target tracking policy is likely to suffice. However, it is possible to use a step scaling policy for additional control. With step scaling we map different ranges of a tracked metric to different adjustments, known as step adjustments. To build on our target tracking example above, we could scale out in increments of 2 tasks when CPU utilization is at or above 90%, and in increments of a single task when it is between 75% and 90%. In other words, we are able to define a more aggressive scale-out policy when the CPU utilization is particularly high.
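The mapping from metric ranges to step adjustments can be sketched in a few lines of plain Python. The model below assumes an alarm threshold of 75% that is breached at or above the threshold, with step bounds given as offsets from that threshold (lower bound inclusive, upper bound exclusive). The class and function names are hypothetical.

```python
"""Sketch: choosing a step adjustment from a CPU utilization reading."""
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StepAdjustment:
    lower: Optional[float]  # offset from the threshold, inclusive; None = unbounded
    upper: Optional[float]  # offset from the threshold, exclusive; None = unbounded
    adjustment: int         # number of tasks to add


THRESHOLD = 75.0  # alarm threshold in percent CPU utilization

SCALE_OUT_STEPS = (
    StepAdjustment(lower=0.0, upper=15.0, adjustment=1),   # 75% up to 90%
    StepAdjustment(lower=15.0, upper=None, adjustment=2),  # 90% and above
)


def tasks_to_add(cpu: float, threshold: float = THRESHOLD,
                 steps: Sequence[StepAdjustment] = SCALE_OUT_STEPS) -> int:
    """Return how many tasks the matching step would add (0 if no breach)."""
    offset = cpu - threshold
    if offset < 0:
        return 0  # below the threshold: the scale-out alarm is not breached
    for step in steps:
        above_lower = step.lower is None or offset >= step.lower
        below_upper = step.upper is None or offset < step.upper
        if above_lower and below_upper:
            return step.adjustment
    return 0


if __name__ == "__main__":
    for reading in (60.0, 80.0, 95.0):
        print(reading, tasks_to_add(reading))
```

Expressing the bounds as offsets from the threshold means the steps keep their meaning if the threshold itself is later changed.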
Target tracking and step scaling are useful for auto-scaling during times of unpredictable load. Scheduled scaling, on the other hand, can be used to update resource capacity based on a predictable pattern. This is useful when, for example, an online store sees a spike in traffic at certain times of the day.
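As a rough illustration of scheduled scaling, the sketch below builds two scheduled actions: one that raises the minimum task count ahead of an evening peak and one that lowers it again afterwards. The action names, times, capacities, and the `my-cluster` placeholder are all assumptions; scheduled scaling is not part of the demo project.

```python
"""Sketch: scheduled actions that adjust capacity limits around a daily peak."""


def scheduled_action_params(cluster: str, service: str, name: str,
                            schedule: str, min_tasks: int, max_tasks: int) -> dict:
    """Build keyword arguments for put_scheduled_action."""
    return {
        "ServiceNamespace": "ecs",
        "ScheduledActionName": name,
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        # Six-field cron expression, evaluated in UTC by default.
        "Schedule": schedule,
        # New limits applied to the scalable target when the action runs.
        "ScalableTargetAction": {
            "MinCapacity": min_tasks,
            "MaxCapacity": max_tasks,
        },
    }


# Hypothetical daily pattern: more headroom from 17:00 to 21:00 UTC.
EVENING_PEAK = scheduled_action_params(
    "my-cluster", "my-service", "evening-peak-start",
    "cron(0 17 * * ? *)", min_tasks=3, max_tasks=5)
EVENING_OFF_PEAK = scheduled_action_params(
    "my-cluster", "my-service", "evening-peak-end",
    "cron(0 21 * * ? *)", min_tasks=1, max_tasks=5)


def put_action(params: dict) -> None:
    """Send the request; requires boto3 and AWS credentials."""
    import boto3
    boto3.client("application-autoscaling").put_scheduled_action(**params)


if __name__ == "__main__":
    print(EVENING_PEAK)
    print(EVENING_OFF_PEAK)
```

Because a scheduled action only moves the limits of the scalable target, it can be combined with a target tracking policy that continues to react to load within those limits.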
To describe existing scaling policies under the ECS service namespace, issue the following command:
$ aws application-autoscaling describe-scaling-policies --service-namespace ecs
Additionally, to review scaling activities of the past 6 weeks, issue the following command:
$ aws application-autoscaling describe-scaling-activities --service-namespace ecs
IAM Auto Scaling Role
As depicted by the diagram above, the Application Auto Scaling service is responsible for creating and deleting CloudWatch alarms based on the defined scaling policies. It is therefore important not to update or delete these alarms manually. In addition, the service monitors the ECS service and updates it when scaling events occur. To allow it to perform these actions, we need to provision an IAM role that the service can assume, as defined below:
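The shape of such a role can be sketched as two policy documents: a trust policy that lets the auto scaling service assume the role, and a permissions policy covering the ECS and CloudWatch actions described above. The exact principal and action list below are assumptions based on that description and may differ from the role defined in the demo project.

```python
"""Sketch: policy documents for an ECS auto scaling role."""
import json

# Trust policy: who is allowed to assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "application-autoscaling.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions policy: what the role may do once assumed.
PERMISSIONS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            # Read and update the desired task count of the service.
            "ecs:DescribeServices",
            "ecs:UpdateService",
            # Manage the alarms that drive the scaling policy.
            "cloudwatch:DescribeAlarms",
            "cloudwatch:PutMetricAlarm",
            "cloudwatch:DeleteAlarms",
        ],
        "Resource": "*",
    }],
}

if __name__ == "__main__":
    print(json.dumps(TRUST_POLICY, indent=2))
    print(json.dumps(PERMISSIONS_POLICY, indent=2))
```

In practice the `Resource` element could be narrowed to the specific service and alarms instead of a wildcard.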
Now that we have covered the basic building blocks of auto scaling with ECS, let’s move on to build a simple web service that we could deploy in order to test our target tracking policy implementation.
A Rust Web Service
One of my colleagues recently introduced me to the Rust language. Rust has a rich type system, is highly performant and its compiler helps you avoid common pitfalls. It comes with great tools such as Cargo — its build tool and package manager. The crates.io site hosts tens of thousands of packages (“crates”) supported by a very active and helpful community. It is truly exciting to learn this fairly young language and benefit from its ecosystem.
Our service will imitate the Python service implementation that we’ve covered in this previous article. A health check endpoint is exposed to be checked by the load balancer and a simple struct is returned from another sample endpoint.
The code sample builds on the Warp web server framework, which in turn builds on Hyper, an HTTP implementation for Rust. The basic building block in Warp is the Filter, which defines how to handle requests. In the following code snippet we compose two Filters to handle the aforementioned endpoints.
To prepare this HTTP server for deployment, we utilize a multi-stage build as described by Alexander Brand in this post. By preparing a Docker image that includes only the target executable, we end up with a very lightweight image (~5MB).
The built image has already been uploaded to Docker Hub, so we can refer to it in our deployment next.
Deployment and Load-Testing
Our project’s code repository includes both the web server code and the CloudFormation templates. The project architecture is kept very simple in order to focus on the auto-scaling aspect of the deployment. We provision two public subnets which are used by our load balancer as well as the ECS tasks. It is generally better to place the ECS tasks in private subnets, as we’ve done previously, but in this example the public subnets are used so that our Docker image can be pulled from an external source (Docker Hub) without the need to set up NAT gateways.
To deploy our CloudFormation stack, you’ll need to create an S3 bucket, or use an existing one, to store the template artifacts. We package the artifacts in the cloudformation folder by issuing the following command:
$ aws cloudformation package --template-file main.yaml --s3-bucket <CHOSEN S3 BUCKET> --output-template-file generated.yaml
This will generate the generated.yaml file which we can then deploy with the following command:
$ aws cloudformation deploy --template-file generated.yaml --stack-name <CHOSEN STACK NAME> --capabilities CAPABILITY_NAMED_IAM
This command could take several minutes. As soon as the stack has been provisioned, the nested load balancer stack should include the output value for the key LoadBalancerDNSName. We can try out this endpoint with:
$ curl http://<load balancer endpoint>/valuation
This should return our hard-coded valuation struct response as JSON. Now let’s put some load on the service and monitor the number of instantiated tasks.
We’ll be using hey to stress test the service. Let’s send 10 million requests to our service:
$ hey -n 10000000 http://<load balancer endpoint>/valuation
The service should scale to two tasks after a few minutes, followed by a 5 minute cooldown period. Following this cooldown period, another task should be added. This repeats until either the service reaches its upper bound on the number of tasks (defined by the scalable target) or the load test has finished running. Stopping the load test will gradually decrease the number of tasks back to one, with a scale-in cooldown period between each decrement.
In this guide we’ve covered the basics of Application Auto Scaling and, in particular, how it applies to scaling microservices in ECS. We’ve built a lightweight Rust web service container and provisioned an ECS service with a target tracking scaling policy. We then load tested it through the load balancer to confirm that the service auto-scales as expected.