Autoscaling in AWS Part 4: Scale In EC2 machines using mixed Autoscaling Groups

Published in Signaturit Tech Blog, Jun 17, 2019

In Part 3 of this series we implemented the scale-out and scale-in strategies for the machines attached to an ECS Cluster and managed by an Autoscaling Group.

The Scale In algorithm from the last post only works when all the machines of the cluster are of the same type. Since AWS now allows a single Autoscaling Group to combine multiple instance types through a Launch Template, we had to implement a new Scale In script to handle that case.

In this post we explain this new script. As a recap, our scenario consists of multiple ECS Clusters, each provisioned by two Autoscaling Groups: one with On-Demand instances only and another with Spot machines. What has changed since the last post is that we now use multiple machine types in the Spot Autoscaling Group.

The premise is the same as in the last post:

A cluster can scale down when it has enough free resources to remove one machine, reschedule all its tasks, and end up with space to schedule one more container of the largest service.

And always subject to the requirements below:

Maintain the AZ balance in the Autoscaling Group.
Wait until all the containers are stopped before terminating a machine.
Remove only Spot machines.

Collect data from the cluster

The first thing we need to do is collect some data about the cluster, its services and its tasks so that we can make decisions with it later. For each cluster we collect:

{  
   "clusterName":"Cluster-AA",
   "largestService":{  
      "serviceName":"main_service",
      "memoryReservation":2000
   },
   "daemonServices":[  
      "cron",
      "monitor",
      "logstash"
   ],
   "containerInstances":[  
      {  
         "remainingMemory":1259,
         "totalInstanceMemory":3704,
         "containerInstanceArn":"arn:aws:ecs:eu-west-1:XXX:container-instance/yyy-zzz",
         "instanceId":"i-secretId",
         "isSpot":true,
         "availabilityZone":"eu-west-1b",
         "tasks":[  
            {  
               "taskArn":"arn:aws:ecs:eu-west-1:XXX:task/11-00",
               "group":"service:cron",
               "reservedMemory":25
            },
            {  
               "taskArn":"arn:aws:ecs:eu-west-1:XXX:task/22-00",
               "group":"service:logstash",
               "reservedMemory":300
            },
            {  
               "taskArn":"arn:aws:ecs:eu-west-1:XXX:task/33-00",
               "group":"service:monitor",
               "reservedMemory":120
            },
            {  
               "taskArn":"arn:aws:ecs:eu-west-1:XXX:task/44-00",
               "group":"service:main_service",
               "reservedMemory":2000
            }
         ]
      }
   ]
}
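As an illustration of where these fields can come from, here is a minimal Python sketch that maps one entry of an ECS `describe_container_instances` response to the per-instance record shown above. The function names and the `spot_instance_ids` parameter are hypothetical; the sketch assumes the set of Spot instance ids was obtained separately (for example from the EC2 API), and it leaves the `tasks` list empty because filling it requires further calls.

```python
from typing import Any, Dict, List, Set


def _resource(resources: List[Dict[str, Any]], name: str) -> int:
    """Return the integer value of a named ECS resource such as MEMORY."""
    for resource in resources:
        if resource["name"] == name:
            return resource["integerValue"]
    raise KeyError(name)


def summarize_container_instance(
    container_instance: Dict[str, Any], spot_instance_ids: Set[str]
) -> Dict[str, Any]:
    """Build the per-instance record from one describe_container_instances entry.

    spot_instance_ids is an assumption of this sketch: the ids of the
    instances known to be Spot, collected elsewhere.
    """
    attributes = {
        attribute["name"]: attribute.get("value")
        for attribute in container_instance.get("attributes", [])
    }
    instance_id = container_instance["ec2InstanceId"]
    return {
        "remainingMemory": _resource(container_instance["remainingResources"], "MEMORY"),
        "totalInstanceMemory": _resource(container_instance["registeredResources"], "MEMORY"),
        "containerInstanceArn": container_instance["containerInstanceArn"],
        "instanceId": instance_id,
        "isSpot": instance_id in spot_instance_ids,
        "availabilityZone": attributes.get("ecs.availability-zone"),
        "tasks": [],  # filled in by a separate step, not shown here
    }
```

Keeping the transformation free of network calls makes it easy to exercise with a hand-written response before pointing it at a real cluster.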

Filter the instances of the cluster and keep only the Spot ones from the largest AZs

From the list of instances of the cluster we keep only the Spot ones that belong to the largest AZs. The goal is to avoid breaking the balance in the Autoscaling Group, so we only remove machines from the Availability Zone that has the most machines.
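A minimal Python sketch of this filter, working on the `containerInstances` records collected earlier. The function name is hypothetical, and the sketch makes one assumption the post does not spell out: AZ sizes are counted over Spot instances only, since the balance that matters here is the one inside the Spot Autoscaling Group.

```python
from collections import Counter
from typing import Any, Dict, List


def filter_scale_in_candidates(
    container_instances: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Keep the Spot instances that live in the most populated AZ(s)."""
    spot_instances = [ci for ci in container_instances if ci["isSpot"]]
    if not spot_instances:
        return []

    # Assumption: only Spot machines count towards the size of an AZ.
    az_sizes = Counter(ci["availabilityZone"] for ci in spot_instances)
    largest_size = max(az_sizes.values())
    largest_azs = {az for az, size in az_sizes.items() if size == largest_size}

    return [ci for ci in spot_instances if ci["availabilityZone"] in largest_azs]
```

When several AZs are tied for the largest size, all of them are kept, so removing one machine from any of them does not make the group less balanced than it already is.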

Evaluate if an instance can be drained

Once we have the list of candidate instances, we iterate over them until we find one whose tasks, except the daemons, can all be rescheduled on other machines while still leaving enough free room to fit N containers of the largest service.

In the code below you can see the implementation of this idea:
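The snippet embedded in the original post is not included in this text. As a stand-in, here is a minimal Python sketch of the idea, not the authors' script. It works on the cluster record collected earlier; the function names and the `spare_slots` parameter (the N above) are hypothetical, and the placement strategy (first fit, largest task first, memory only) is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional


def can_be_drained(
    candidate: Dict[str, Any], cluster: Dict[str, Any], spare_slots: int = 1
) -> bool:
    """Check whether the candidate's tasks fit elsewhere with room to spare."""
    daemons = set(cluster["daemonServices"])
    largest = cluster["largestService"]["memoryReservation"]

    # Free memory on every other machine of the cluster.
    free = [
        ci["remainingMemory"]
        for ci in cluster["containerInstances"]
        if ci["containerInstanceArn"] != candidate["containerInstanceArn"]
    ]

    # Daemon tasks run on every machine, so they do not need a new home.
    to_move = sorted(
        (
            task["reservedMemory"]
            for task in candidate["tasks"]
            if task["group"].split(":", 1)[-1] not in daemons
        ),
        reverse=True,
    )

    for needed in to_move:
        for index, available in enumerate(free):
            if available >= needed:
                free[index] -= needed
                break
        else:
            return False  # no machine can host this task

    if largest <= 0:
        return True
    # Count how many extra containers of the largest service still fit.
    return sum(available // largest for available in free) >= spare_slots


def find_instance_to_drain(
    candidates: List[Dict[str, Any]], cluster: Dict[str, Any], spare_slots: int = 1
) -> Optional[Dict[str, Any]]:
    """Return the first candidate that can be drained, or None."""
    for candidate in candidates:
        if can_be_drained(candidate, cluster, spare_slots):
            return candidate
    return None
```

First fit is a heuristic: it can reject a machine that a smarter packing would accept, which errs on the side of keeping capacity.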

Drain and terminate the instance

Now that we have chosen the instance to drain, we put it in DRAINING status, wait until all its tasks are stopped, and then remove it from the Autoscaling Group and terminate it.

To drain it safely, we give the instance 70 seconds to stop all its tasks:
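The original snippet is not included in this text, so the following is only a minimal Python sketch of a drain-with-timeout step. It assumes `ecs` is a boto3 ECS client (or any object exposing `update_container_instances_state` and `describe_container_instances`). The function name, the polling interval and the injectable `sleep` and `clock` parameters are hypothetical. Returning `False` on timeout instead of terminating anyway is also an assumption, chosen to respect the requirement of never killing running containers.

```python
import time
from typing import Any, Callable

DRAIN_TIMEOUT_SECONDS = 70  # value taken from the post


def drain_instance(
    ecs: Any,
    cluster_name: str,
    container_instance_arn: str,
    timeout: float = DRAIN_TIMEOUT_SECONDS,
    poll_interval: float = 5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Set the instance to DRAINING and wait until no task is running on it.

    Returns True when the instance is empty, False when the timeout expires.
    """
    ecs.update_container_instances_state(
        cluster=cluster_name,
        containerInstances=[container_instance_arn],
        status="DRAINING",
    )

    deadline = clock() + timeout
    while True:
        described = ecs.describe_container_instances(
            cluster=cluster_name,
            containerInstances=[container_instance_arn],
        )
        if described["containerInstances"][0]["runningTasksCount"] == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

Passing `sleep` and `clock` as parameters lets the waiting loop be tested with a fake client and a fake clock, without real delays.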

Once it is drained, we remove it from the Autoscaling Group and terminate it.
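One way to do both steps at once, sketched in Python under the assumption that `autoscaling` is a boto3 Auto Scaling client: `terminate_instance_in_auto_scaling_group` terminates the machine and, with `ShouldDecrementDesiredCapacity=True`, lowers the desired capacity so the group does not launch a replacement. The function name is hypothetical, and the original script may well have used separate detach and terminate calls instead.

```python
from typing import Any, Dict


def remove_from_autoscaling_group(autoscaling: Any, instance_id: str) -> Dict[str, Any]:
    """Terminate the instance and shrink its Autoscaling Group by one."""
    response = autoscaling.terminate_instance_in_auto_scaling_group(
        InstanceId=instance_id,
        # Without this flag the group would start a replacement machine.
        ShouldDecrementDesiredCapacity=True,
    )
    # The scaling activity describes the termination that was started.
    return response.get("Activity", {})
```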

Wrapping Up

The method that orchestrates all of this would look something like this:
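The original snippet is not included in this text. As a rough stand-in, here is a minimal Python sketch of such an orchestrator. To stay self-contained it receives each step (candidate selection, drain check, drain, terminate) as a callable; all names are hypothetical. Removing at most one machine per run, and leaving a machine alone when it does not drain in time, are assumptions of this sketch.

```python
from typing import Any, Callable, Dict, List, Optional

Instance = Dict[str, Any]


def scale_in_cluster(
    cluster: Dict[str, Any],
    select_candidates: Callable[[List[Instance]], List[Instance]],
    can_drain: Callable[[Instance, Dict[str, Any]], bool],
    drain: Callable[[Instance], bool],
    terminate: Callable[[Instance], None],
) -> Optional[str]:
    """Remove at most one machine; return its instance id, or None."""
    # 1. Keep only the Spot machines from the largest AZs.
    for instance in select_candidates(cluster["containerInstances"]):
        # 2. Skip machines whose tasks would not fit elsewhere.
        if not can_drain(instance, cluster):
            continue
        # 3. Drain it; if tasks are still running after the timeout,
        #    do not terminate (assumption: a later run tries again).
        if not drain(instance):
            return None
        # 4. Remove it from the Autoscaling Group and terminate it.
        terminate(instance)
        return instance["instanceId"]
    return None
```

The function would be called once per cluster on every scheduled run, so a cluster with a lot of spare capacity shrinks one machine at a time.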

If you have implemented a different script to scale down your cluster safely, don’t hesitate to share it with all of us! Each case has different requirements, and in this scenario we have tried to solve a scaling challenge that appears when using Autoscaling Groups with multiple instance types.
