Autoscaling in AWS Part 1: Autoscale ECS Services

Santi Muñoz
Signaturit Tech Blog
Sep 28, 2018 · 5 min read

For any SaaS company, the availability and performance of its products are key to the business. To respond quickly to an increase in traffic while maintaining the best possible performance, we need a system that scales automatically.

This is an important milestone for a tech company. In this series of posts we will explain all the resources and services we implemented to build an autonomous system that is able to scale services and machines up and down.

All the resources are created with CloudFormation, and the services involved are written in Node.js and deployed as Lambdas.

Initial Scenario

All our services are dockerized and run in AWS ECS. We have one cluster for the lightweight services and a dedicated cluster for each big service that needs more resources. Each cluster has two Auto Scaling groups, one with On-Demand machines and the other with Spot machines.

The main goal is to automate the scaling up/down of the services and machines in the clusters.

In this post we will analyze the scalability of the HTTP services:

  • Scale out HTTP services
  • Scale in HTTP services

In part two of this series, we will talk about the scalability of the workers:

  • Scale out workers
  • Scale in workers

In the last part of this series, we will implement the autoscaling for the machines:

  • Scale out machines
  • Scale in machines

Scale Out HTTP services

When scaling services we can use a variety of metrics: CPU, Memory, Load balancer metrics…

For the HTTP services we decided to use the TargetResponseTime metric of their Target Group, although in the future we also plan to add CPU and memory to the equation.

Most of the services have an average response time below 1 second, so the idea is to scale out when the latency rises above 5 seconds for 2 minutes.

To achieve this we will create the following resources:

  • A ScalableTarget attached to the cluster’s service. With it we can configure the minimum and maximum number of tasks for the service, securing the minimum number of tasks we want available and capping the maximum number of tasks we are willing to run (see the sketch below).

The If statement in the ResourceId is only there because of the naming scheme we use for our clusters.

Also notice that the ResourceId is composed as service/<cluster-name>/<service-name>
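
A minimal sketch of what such a ScalableTarget could look like is shown below. The resource, parameter and condition names (WebServiceScalableTarget, HasDedicatedCluster, DedicatedClusterName, SharedClusterName, ServiceName, AutoScalingRole) and the capacity limits are illustrative assumptions, not our actual template:

```yaml
# Illustrative sketch: names and capacity limits are assumptions.
WebServiceScalableTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: ecs
    ScalableDimension: ecs:service:DesiredCount
    # ResourceId is composed as service/<cluster-name>/<service-name>;
    # the If only selects the right cluster name for this service.
    ResourceId: !Sub
      - service/${ClusterName}/${ServiceName}
      - ClusterName: !If [HasDedicatedCluster, !Ref DedicatedClusterName, !Ref SharedClusterName]
        ServiceName: !Ref ServiceName
    MinCapacity: 2   # minimum number of tasks we always keep running
    MaxCapacity: 10  # maximum number of tasks we are willing to run
    RoleARN: !GetAtt AutoScalingRole.Arn  # IAM role assumed to exist elsewhere in the template
```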

  • A ScalingPolicy whose action increases the desired number of tasks.

The attribute to note here is StepAdjustments. It is very important to choose between the MetricIntervalLowerBound and MetricIntervalUpperBound attributes, depending on the alarm that will trigger the policy.

If your alarm is triggered when a metric goes above a threshold, you need to use MetricIntervalLowerBound; if instead your alarm is triggered when a metric falls below a threshold, you have to use MetricIntervalUpperBound.

In our case we are using the metric TargetResponseTime, and the alarm will be triggered when the response time is higher than 5 seconds for 2 minutes. With this configuration, the StepAdjustment has to use MetricIntervalLowerBound.

As for the value of ScalingAdjustment, a positive number adds to the current capacity and a negative number subtracts from it.
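
A sketch of such a scale-out policy, under the same naming assumptions as before (the cooldown and step values are illustrative too):

```yaml
WebServiceScaleOutPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: web-service-scale-out
    PolicyType: StepScaling
    ScalingTargetId: !Ref WebServiceScalableTarget
    StepScalingPolicyConfiguration:
      AdjustmentType: ChangeInCapacity
      Cooldown: 300
      MetricAggregationType: Average
      StepAdjustments:
        # The alarm fires when TargetResponseTime goes above the threshold,
        # so the step uses MetricIntervalLowerBound.
        - MetricIntervalLowerBound: 0
          ScalingAdjustment: 1  # positive value: add one task
```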

  • A CloudWatch Alarm that will trigger the Scaling Policy when the latency is higher than 5 seconds for 2 minutes.
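
A sketch of this alarm could look like the following; the load balancer and target group references (LoadBalancer, WebTargetGroup) are assumed resources in the same template:

```yaml
WebServiceScaleOutAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: Scale out when latency is higher than 5 seconds for 2 minutes
    Namespace: AWS/ApplicationELB
    MetricName: TargetResponseTime
    Dimensions:
      - Name: LoadBalancer
        Value: !GetAtt LoadBalancer.LoadBalancerFullName
      - Name: TargetGroup
        Value: !GetAtt WebTargetGroup.TargetGroupFullName
    Statistic: Average
    Period: 60            # one-minute periods
    EvaluationPeriods: 2  # 2 minutes above the threshold
    Threshold: 5          # seconds
    ComparisonOperator: GreaterThanThreshold
    AlarmActions:
      - !Ref WebServiceScaleOutPolicy  # Ref returns the scaling policy ARN
```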

With these three resources we are able to increase the desired count of an ECS service when its latency increases.

Scale In HTTP services

For scaling down the HTTP services we will use the same TargetResponseTime metric, but with a new alarm that will be triggered when the latency stays below 2 seconds.

For this we will use the following resources:

  • The same ScalableTarget created earlier. Since we already created it, we don’t need to create it again.
  • A ScalingPolicy whose action decreases the desired number of tasks.

Whereas for the scale-out policy we chose MetricIntervalLowerBound, in this case we use MetricIntervalUpperBound, because the scale-in alarm is triggered when the latency is lower than 2 seconds. Since the metric is below the threshold, we have to use MetricIntervalUpperBound.

Notice also that the ScalingAdjustment is a negative number, indicating that it will decrease the desired count.
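
Under the same assumptions, the scale-in policy could be sketched like this:

```yaml
WebServiceScaleInPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: web-service-scale-in
    PolicyType: StepScaling
    ScalingTargetId: !Ref WebServiceScalableTarget
    StepScalingPolicyConfiguration:
      AdjustmentType: ChangeInCapacity
      Cooldown: 300
      MetricAggregationType: Average
      StepAdjustments:
        # The alarm fires when TargetResponseTime drops below the threshold,
        # so the step uses MetricIntervalUpperBound.
        - MetricIntervalUpperBound: 0
          ScalingAdjustment: -1  # negative value: remove one task
```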

  • A CloudWatch Alarm that will trigger the Scaling Policy when the latency is lower than 2 seconds for 5 minutes.

This alarm does not indicate that anything is wrong; its only purpose is to trigger the scale-in policy of the service.
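
And a sketch of the scale-in alarm, again with illustrative names and assuming five one-minute evaluation periods:

```yaml
WebServiceScaleInAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: Trigger scale in when latency is lower than 2 seconds
    Namespace: AWS/ApplicationELB
    MetricName: TargetResponseTime
    Dimensions:
      - Name: LoadBalancer
        Value: !GetAtt LoadBalancer.LoadBalancerFullName
      - Name: TargetGroup
        Value: !GetAtt WebTargetGroup.TargetGroupFullName
    Statistic: Average
    Period: 60            # one-minute periods
    EvaluationPeriods: 5  # 5 minutes below the threshold
    Threshold: 2          # seconds
    ComparisonOperator: LessThanThreshold
    AlarmActions:
      - !Ref WebServiceScaleInPolicy
```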

Summary

Scaling an ECS service up or down based on the value of a metric is pretty straightforward: we only need an alarm that triggers a scaling policy.

In the following screenshot you can see the scale up and down in action :)

We can also see that the scale in process is less aggressive.

Since this is a reactive process, scaling is only executed after the alarm is triggered. One of the next improvements will be to implement a predictive algorithm for scaling up and down.

In the next post we will implement the same for workers running in ECS. The main difference is that we will use a Lambda and a custom metric to scale in.

About Signaturit

Signaturit is a trust service provider that offers innovative solutions in the field of electronic signatures (eSignatures), certified registered delivery (eDelivery) and electronic identification (EID).

Open Positions

We are always looking for talented people who share our vision and want to join our international team in sunny Barcelona :) Be a SignaBuddy > jobs
