AWS Auto-scaling in production

This post is the first in a series of blog posts and tutorials on running AWS in production. The series will cover a review of different data stores; setting up and managing Elasticsearch instances and writing sophisticated queries; ElastiCache clusters; building scalable web apps; recommender systems; common security vulnerabilities, how to guard against them, and how to recover from an attack and do basic forensic analysis; scalable logging solutions; SQL partitioning; and pretty much everything else I have worked on in production.

In a project I am working on, we receive around 2,000 requests/minute, which spikes seasonally during Ramadan and holidays to over 5,000 requests/minute. On average we serve over 1,500,000 requests per day. These requests range from simple restaurant page views to more complex searches using Elasticsearch.

Previously we ran several c3.large EC2 instances 24/7 to keep up with the load at any time, which drove our AWS bills up.

We started analyzing our daily traffic and found that it is heavy during the rush hours of the day but much lighter the rest of the time, especially overnight, when it drops to around 10% of the peak rush-hour traffic. We were paying 24/7 for machines we did not fully use, so we started looking into AWS Auto Scaling groups.
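The analysis itself can be as simple as comparing each hour's request count with the daily peak. The sketch below assumes you have already exported 24 hourly request counts (for example from access logs); the numbers in `HOURLY_REQUESTS_K` are made up for illustration, and the 70%-of-peak cut-off for a "rush hour" is an arbitrary, hypothetical choice.

```python
from typing import Dict, List

# Hypothetical hourly request counts in thousands (index = hour of day).
# Illustrative numbers only, not the production data described above.
HOURLY_REQUESTS_K = [
    40, 35, 30, 30, 30, 35,        # 00:00-05:59
    45, 55, 65, 75, 85, 100,       # 06:00-11:59
    150, 280, 300, 290, 160, 140,  # 12:00-17:59
    150, 260, 300, 270, 120, 60,   # 18:00-23:59
]


def traffic_profile(hourly: List[int], rush_fraction: float = 0.7) -> Dict[str, object]:
    """Summarize one day of hourly request counts.

    Hours whose traffic is at least `rush_fraction` of the daily peak are
    labelled rush hours (the 0.7 default is an assumption; tune it).
    """
    if len(hourly) != 24:
        raise ValueError("expected exactly 24 hourly counts")
    peak = max(hourly)
    trough = min(hourly)
    rush_hours = [hour for hour, count in enumerate(hourly) if count >= rush_fraction * peak]
    return {
        "peak": peak,
        "trough": trough,
        "trough_to_peak": trough / peak,  # e.g. 0.1 means off-peak is 10% of peak
        "rush_hours": rush_hours,
    }


if __name__ == "__main__":
    profile = traffic_profile(HOURLY_REQUESTS_K)
    print(profile["rush_hours"])      # [13, 14, 15, 19, 20, 21]
    print(profile["trough_to_peak"])  # 0.1
```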

Auto Scaling groups

Auto Scaling groups are an AWS feature that lets you set rules the group uses to launch or terminate instances. For example, you can set thresholds on CPU utilization, Network In, Network Out, etc. With these rules in place, the group terminates instances whenever there is not enough traffic, so you don't pay for them, and launches instances when there is a traffic spike, so your services stay available.
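Whatever rule fires, the resulting group size always stays within the group's minimum and maximum size. The toy model below is not the AWS API; it only illustrates how a simple "add/remove N instances" adjustment is clamped to those bounds. The function names and the example bounds are hypothetical.

```python
from typing import Iterable, List


def next_capacity(current: int, adjustment: int, min_size: int, max_size: int) -> int:
    """Apply one simple scaling adjustment (e.g. +1 or -1 instance) and
    clamp the result to the group's [min_size, max_size] range."""
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    return max(min_size, min(max_size, current + adjustment))


def simulate(start: int, adjustments: Iterable[int], min_size: int, max_size: int) -> List[int]:
    """Return the group size after each adjustment, starting from `start`."""
    sizes = [start]
    for adjustment in adjustments:
        sizes.append(next_capacity(sizes[-1], adjustment, min_size, max_size))
    return sizes


if __name__ == "__main__":
    # Two scale-out events followed by three scale-in events in a group
    # with min 3 / max 10: the last scale-in has no effect at the minimum.
    print(simulate(3, [+1, +1, -1, -1, -1], min_size=3, max_size=10))
    # [3, 4, 5, 4, 3, 3]
```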

Our Auto-scaling architecture

Instead of the c3.large machines, we used t2.micro instances in an Auto Scaling group. The reason is that smaller machines give us finer control over the resources we use (think of quantization levels: the smaller the step, the closer capacity can track demand). The t2.micro instance uses HVM virtualization and provides a single vCPU with 1 GB of RAM. We set the minimum number of instances to 3 and the maximum to 10, which we practically never reach (I do not remember reaching it except manually during catastrophes such as failed deployments). I wrote a cloud-init script that runs when a new instance initializes; it pulls the latest version of the code from the remote repository, then runs a build-and-run script.
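To make the quantization argument concrete: capacity can only be bought in whole instances, so the larger the instance, the more idle capacity you pay for when demand falls between two steps. The sketch assumes, purely for illustration, that a large instance handles 1,000 requests/minute and a small one 250; these capacities are hypothetical and are not measured figures for c3.large or t2.micro.

```python
import math
from typing import Tuple

# Hypothetical per-instance capacities in requests/minute (illustration only).
LARGE_CAPACITY = 1000
SMALL_CAPACITY = 250


def provision(demand: int, unit_capacity: int) -> Tuple[int, int]:
    """Return (instances needed, idle capacity) to cover `demand`
    with identical instances of `unit_capacity` each."""
    if unit_capacity <= 0:
        raise ValueError("unit_capacity must be positive")
    count = math.ceil(demand / unit_capacity)
    return count, count * unit_capacity - demand


if __name__ == "__main__":
    for demand in (300, 1100, 2600):  # hypothetical load levels
        large = provision(demand, LARGE_CAPACITY)
        small = provision(demand, SMALL_CAPACITY)
        print(f"{demand} req/min -> large: {large}, small: {small}")
```

At 1,100 requests/minute, for instance, the large machines leave 900 requests/minute of paid-for capacity idle while the small ones leave 150.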

Scaling policies

Our scaling policies were based on CPU utilization, Network In and Network Out. The policies were:

Increase the number of instances by 1 if:

  • Average CPU utilization > 90% for 5 mins
  • Average Network In > 18,000,000 bytes for 5 mins
  • Average Network Out > 25,000,000 bytes for 5 mins

Decrease the number of instances by 1 if:

  • Average CPU utilization < 40% for 5 mins
  • Average Network In < 6,000,000 bytes for 5 mins
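In AWS each of these rules is a separate CloudWatch alarm attached to a scaling policy. The sketch below folds them into one function so the thresholds are easy to read and test. It simplifies things in two ways that are assumptions of the sketch, not descriptions of how AWS resolves alarms: the rules in each list are combined with OR, and scale-out takes precedence when an "increase" and a "decrease" condition hold at the same time.

```python
from dataclasses import dataclass

# Thresholds from the policies above; the evaluation logic is a simplified sketch.
CPU_HIGH_PERCENT = 90
CPU_LOW_PERCENT = 40
NET_IN_HIGH_BYTES = 18_000_000
NET_IN_LOW_BYTES = 6_000_000
NET_OUT_HIGH_BYTES = 25_000_000


@dataclass
class FiveMinuteAverages:
    """Metric averages over the last 5-minute window."""
    cpu_percent: float
    network_in_bytes: float
    network_out_bytes: float


def scaling_decision(m: FiveMinuteAverages) -> int:
    """Return +1 (add an instance), -1 (remove one) or 0 (do nothing)."""
    scale_out = (
        m.cpu_percent > CPU_HIGH_PERCENT
        or m.network_in_bytes > NET_IN_HIGH_BYTES
        or m.network_out_bytes > NET_OUT_HIGH_BYTES
    )
    if scale_out:  # assumption: scale-out wins over scale-in
        return 1
    scale_in = m.cpu_percent < CPU_LOW_PERCENT or m.network_in_bytes < NET_IN_LOW_BYTES
    return -1 if scale_in else 0


if __name__ == "__main__":
    print(scaling_decision(FiveMinuteAverages(95, 10_000_000, 10_000_000)))  # 1
    print(scaling_decision(FiveMinuteAverages(60, 10_000_000, 10_000_000)))  # 0
    print(scaling_decision(FiveMinuteAverages(30, 10_000_000, 10_000_000)))  # -1
```

The gap between the scale-out and scale-in thresholds (90% vs 40% CPU, for example) is what keeps the group from adding and removing an instance back and forth around a single value.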

We initially set these thresholds by inspecting the averages of these metrics over the period before we launched the Auto Scaling group, and we kept adjusting them until we reached stable thresholds that give us reasonable performance at all times.

Isolating and replacing faulty instances

Every so often, the web server on one of the instances crashes for some reason and goes out of service. When that happens, even before we investigate the incident and fix the bug, it makes sense to stop routing requests to that instance, terminate it and replace it with a fresh one. For this we use an Elastic Load Balancer (ELB) to run health checks and report to the Auto Scaling group, and we configured the group to terminate unhealthy instances and replace them with new ones. That way enough machines are maintained for us automatically at all times, even in case of failure. The ELB does this by pinging a route on each instance and inspecting the status code and the response time; in the ELB health-check configuration we set a timeout after which a check counts as failed and the instance is flagged unhealthy.
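The health-check logic can be modelled as a small state machine. Classic ELB health checks have a response timeout plus "unhealthy" and "healthy" thresholds: the number of consecutive failed or successful checks needed before an instance's state flips. The values below (a 5-second timeout, thresholds of 2 and 3) are hypothetical, and treating only HTTP 200 as a passing check is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class HealthCheckConfig:
    timeout_s: float = 5.0        # hypothetical response-time limit
    unhealthy_threshold: int = 2  # consecutive failures before "unhealthy"
    healthy_threshold: int = 3    # consecutive passes before "healthy" again


def check_passed(status_code: int, response_time_s: float, cfg: HealthCheckConfig) -> bool:
    """One probe passes if it returns 200 within the timeout (assumption)."""
    return status_code == 200 and response_time_s < cfg.timeout_s


def track_health(
    probes: Iterable[Tuple[int, float]],
    cfg: HealthCheckConfig,
    start_healthy: bool = True,
) -> List[bool]:
    """Return the instance's health state after each (status, seconds) probe."""
    healthy = start_healthy
    streak = 0  # consecutive probes that disagree with the current state
    states = []
    for status_code, response_time_s in probes:
        passed = check_passed(status_code, response_time_s, cfg)
        if passed == healthy:
            streak = 0
        else:
            streak += 1
            needed = cfg.unhealthy_threshold if healthy else cfg.healthy_threshold
            if streak >= needed:
                healthy = not healthy
                streak = 0
        states.append(healthy)
    return states


if __name__ == "__main__":
    probes = [(200, 0.1), (500, 0.1), (200, 0.1), (200, 6.0), (502, 0.2)]
    print(track_health(probes, HealthCheckConfig()))
    # [True, True, True, True, False]
```

A single failure (the 500) is absorbed; only the timeout followed by the 502 makes two consecutive failures. With ELB health checks enabled on the Auto Scaling group, that last state is the point at which the group terminates the instance and launches a replacement.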


Overnight we run the minimum number of instances, i.e. 3 machines. During rush hours, normally 1:00 pm to 4:00 pm and 7:00 pm to 10:00 pm, we have 7–8 instances running. During the rest of the day we have 4–5 instances running.
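A rough way to see what this schedule buys is to count instance-hours per day. The exact boundaries of "overnight" are not given above, so the schedule below assumes 00:00 to 08:00 and takes the upper end of each range (8 instances at rush hours, 5 otherwise); treat the result as an illustration, not a measurement.

```python
from typing import List, Tuple

# (start_hour, end_hour, instances), end exclusive. The overnight window and
# the use of each range's upper end are assumptions for illustration.
SCHEDULE: List[Tuple[int, int, int]] = [
    (0, 8, 3),    # overnight: the group's minimum
    (8, 13, 5),   # regular daytime
    (13, 16, 8),  # rush hours 1:00 pm to 4:00 pm
    (16, 19, 5),  # regular daytime
    (19, 22, 8),  # rush hours 7:00 pm to 10:00 pm
    (22, 24, 5),  # regular evening
]


def instance_hours(schedule: List[Tuple[int, int, int]]) -> int:
    """Sum (hours x instances) over a schedule covering hours 0..24 contiguously."""
    expected_start = 0
    total = 0
    for start, end, count in schedule:
        if start != expected_start or end <= start:
            raise ValueError("blocks must be contiguous and non-empty")
        total += (end - start) * count
        expected_start = end
    if expected_start != 24:
        raise ValueError("schedule must end at hour 24")
    return total


if __name__ == "__main__":
    used = instance_hours(SCHEDULE)
    peak_all_day = max(count for _, _, count in SCHEDULE) * 24
    print(used, peak_all_day)  # 122 192
```

Under these assumptions the group uses about 122 instance-hours a day, versus 192 if it were sized for the rush-hour peak around the clock.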

The c3.large instances we previously used cost $0.105/hour, while t2.micro instances cost $0.013/hour, roughly 8 times less. In other words, we could run 8 t2.micro instances 24/7 for the price of each c3.large instance we used to run.
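The arithmetic behind that comparison, using the hourly prices quoted above (check current pricing before reusing them), is short enough to write down; `Decimal` avoids floating-point rounding in the money values. One consequence worth spelling out: even at the busiest point described above, 8 t2.micro instances, the whole fleet costs slightly less per day than a single c3.large running 24/7.

```python
from decimal import Decimal

# Hourly prices in USD as quoted in this post.
C3_LARGE_HOURLY = Decimal("0.105")
T2_MICRO_HOURLY = Decimal("0.013")


def daily_cost(instances: int, hourly_price: Decimal) -> Decimal:
    """Cost of running `instances` machines for a full 24-hour day."""
    return instances * 24 * hourly_price


if __name__ == "__main__":
    ratio = C3_LARGE_HOURLY / T2_MICRO_HOURLY
    print(round(ratio, 2))                 # 8.08
    print(daily_cost(1, C3_LARGE_HOURLY))  # 2.520
    print(daily_cost(8, T2_MICRO_HOURLY))  # 2.496
```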


Step-by-step guide to setting up the Auto Scaling group in a VPC and using AWS CodeDeploy to deploy your apps to it.
