AWS Code Deploy & Auto Scaling — Is there something better?

Miguel Mendez
Yik Yak Engineering
4 min read · Jan 24, 2017

In this post we will share our experiences using AWS Code Deploy and AWS Auto Scaling and suggest something that might be better.

The initial version of Yik Yak’s backend used a manual code deployment process (git pull) and did not perform any auto scaling. This combination was acceptable as an evolutionary step, but as the app’s popularity grew it became untenable. Some obvious examples of the difficulties that arose from this arrangement were:

  1. What git SHA-1 was deployed to which instances?
  2. What instance types do we use for the API servers again?
  3. How do I update just the API servers?
  4. We expect more traffic due to XYZ event. Can we quickly add more instances?

AWS Code Deploy

Code Deploy has a lot of nice features. You can define an “Application” that identifies an item that can be deployed. You define deployment groups within the Application and bind these to specific machines or Auto Scaling groups. You specify what revision to deploy, and these revisions can come from GitHub or S3. Lastly, historical deployment information is tracked so you can review it at a later time. So, it does a nice job of answering questions 1–3 above.
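For a sense of what a deployable revision looks like, here is a minimal appspec.yml sketch of the kind Code Deploy reads from the root of a revision. The file paths and script names are hypothetical; the hook names (ApplicationStop, AfterInstall, ApplicationStart, ValidateService) are part of Code Deploy's lifecycle.

```yaml
# Illustrative appspec.yml — paths and scripts are hypothetical
version: 0.0
os: linux
files:
  - source: /build/api-server
    destination: /opt/api-server
hooks:
  ApplicationStop:
    - location: scripts/stop.sh
  AfterInstall:
    - location: scripts/install_deps.sh
  ApplicationStart:
    - location: scripts/start.sh
  ValidateService:
    - location: scripts/health_check.sh
      timeout: 300
```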

There is one catch though: starting up and deploying a brand-new EC2 instance takes around 5 minutes. I’ll come back to why this can be an issue in the next section of this post.

Okay, but how do we address question 4 above? Glad you asked. You use AWS Auto Scaling.

AWS Auto Scaling

Auto Scaling is a very useful feature that gives you the ability to define policies that govern when EC2 instances will be launched or terminated. Very complex policies, that integrate a multitude of signals, can be created, but for the purposes of this discussion we will assume CPU utilization is the only factor the policy looks at. That is, when the average CPU utilization across a group of machines exceeds some configurable threshold for a given amount of time, more machines will be added, and when it drops below that threshold for some amount of time, machines will be terminated.
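The CPU-based policy above boils down to a simple rule. Here is a minimal sketch (plain Python, not AWS code; the thresholds and function name are our own) of the decision logic: scale out only when every sample in the evaluation window is above the high threshold, scale in only when every sample is below the low one.

```python
def scaling_decision(cpu_samples, high=70.0, low=30.0):
    """Decide a scaling action from average CPU utilization (%) samples,
    one sample per evaluation period. Illustrative only."""
    if all(s > high for s in cpu_samples):
        return "scale_out"   # sustained high load: add instances
    if all(s < low for s in cpu_samples):
        return "scale_in"    # sustained low load: terminate instances
    return "no_change"       # mixed signals: hold steady

print(scaling_decision([85, 90, 88]))  # scale_out
print(scaling_decision([20, 15, 25]))  # scale_in
print(scaling_decision([50, 80, 40]))  # no_change
```

Requiring the threshold to be breached for the whole window avoids thrashing on brief spikes, at the cost of reacting more slowly.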

However, there are a few things to consider.

Auto Scaling Takes Time — Is that okay for your application?

Auto Scaling works in conjunction with Code Deploy, which is not instantaneous. To be concrete, if I launch a clean EC2 m3.medium instance in the us-east-1d zone, it takes close to 5 minutes for that machine to be ready to take traffic.

If you use Auto Scaling for servers that process user requests, as opposed to a queue or batch processor, the user’s experience may be impacted during the 5-minute window it takes to bring up a new instance. That is an eternity in terms of server processing time.

You can mitigate this by lowering the Auto Scaling threshold. This causes more instances to be brought to bear earlier, before load reaches the point where the user experience is significantly impacted. The tradeoff is that you now keep more excess capacity around than you technically need, which means you are wasting money.

Auto Scaling — Workload Types & Pricing Gotcha

Auto Scaling works well for workloads that are somewhat regular, i.e. they don’t fluctuate much. For example, if most days at around the same time your system experiences increased load for a set number of hours, then Auto Scaling will work well for you.

However, if your workload is sporadic, bursty, or completes in under an hour, you are likely to spend more money on EC2 instances than you should. Look at the graph below, showing the number of instances in service over an hour as driven by Auto Scaling. How many machine-hours do you think were billed?

The answer is three hours. The reason is that once an EC2 instance is spun up, you pay for a whole hour whether you use the whole hour or not. Even though the two new instances were only used for twenty minutes in total, we were billed for two additional hours.
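The arithmetic above can be checked with a short sketch of the round-up-to-the-hour billing rule (the helper name and run times are our own; the two burst instances ran ten minutes each, twenty minutes combined):

```python
import math

def billed_hours(run_minutes):
    """Total billed machine-hours: each instance's running time is
    rounded up to a whole hour, with a one-hour minimum."""
    return sum(max(1, math.ceil(m / 60)) for m in run_minutes)

# One base instance running the full hour, plus two burst instances
# that each ran only ten minutes:
print(billed_hours([60, 10, 10]))  # 3
```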

Conclusion

AWS Code Deploy and Auto Scaling are good tools if you understand the gotchas:

  • It can take minutes to bring up a new instance
  • You pay for a whole hour of machine time even if you only use it for a few minutes
  • You can adjust your auto-scaling thresholds to compensate for the scaling delays at the expense of machine utilization

But is there a more cost-effective way to manage machines without losing the ability to scale based on workload? Yes. It’s called Kubernetes and it’ll be the subject of our next post.
