Designing for EC2 Per Second Billing

Published in The Blue Sentry Blog

Oct 4, 2017

Amazon’s new per-second billing for EC2 usage is good news. Moving utility billing from per hour down to per second (with a 60-second minimum) is a huge change. Organizations with long-running workloads on AWS won’t necessarily see a benefit from this change. For companies that run a lot of job or worker instances, the savings could be huge.

Transient EMR Clusters

Batch processing of data is typically done using Elastic MapReduce (EMR) clusters. These clusters run on EC2, so they also benefit from per-second billing. With per-hour billing, the typical goal is to size the cluster so the job runs as close to an hour as possible without going over. Allocate too much power, and the job completes in 20 minutes but you are billed for the full hour. Allocate too little, and it could run 63 minutes, and now you’re paying for 2 hours. This is a fine balance that requires monitoring, and if the amount of data to be processed varies, it’s almost impossible to get right.

With per-second billing, you can size your cluster with plenty of power and only pay for the processing time you actually use. If it runs 63 minutes, no big deal: you pay for 63 minutes. If it completes in 32 minutes, you only pay for 32 minutes. Companies can also run their EMR jobs more frequently, keeping their data warehouses more up to date.
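To make the difference concrete, here is a minimal Python sketch comparing the two billing models for the 20-, 32- and 63-minute runs discussed above. The hourly rate is a made-up placeholder, not a real EC2 or EMR price, and the per-second model assumes the same 60-second minimum charge that applies to EC2 per-second billing.

```python
import math

# Hypothetical cluster cost in USD per hour; substitute your own instance and EMR pricing.
HOURLY_RATE = 2.00


def cost_per_hour_billing(runtime_seconds: float, rate: float = HOURLY_RATE) -> float:
    """Old model: every started hour is billed in full."""
    return math.ceil(runtime_seconds / 3600) * rate


def cost_per_second_billing(runtime_seconds: float, rate: float = HOURLY_RATE,
                            minimum_seconds: int = 60) -> float:
    """New model: billed by the second, with a minimum charge per launch."""
    billable_seconds = max(math.ceil(runtime_seconds), minimum_seconds)
    return billable_seconds * rate / 3600


for minutes in (20, 32, 63):
    seconds = minutes * 60
    print(f"{minutes:>2} min run: per-hour ${cost_per_hour_billing(seconds):.2f}, "
          f"per-second ${cost_per_second_billing(seconds):.2f}")
```

Under per-hour billing the 63-minute run costs twice as much as a 59-minute one; under per-second billing the cost grows smoothly with runtime, which is what makes generous cluster sizing safe.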

Remember to take into account the bootstrapping time of an EMR cluster. Bootstrapping takes about 10 minutes on average, so running transient EMR clusters more frequently means paying for that bootstrap time more often.
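A rough way to see the bootstrap trade-off is to hold the day’s processing work fixed and vary how many runs it is split into. This sketch assumes processing time splits evenly across runs and uses the article’s 10-minute average bootstrap; the 120 minutes of daily processing is a hypothetical input.

```python
# Rough average bootstrap time for an EMR cluster, per the article; measure your own.
BOOTSTRAP_MINUTES = 10


def daily_billed_minutes(runs_per_day: int, daily_processing_minutes: float,
                         bootstrap_minutes: float = BOOTSTRAP_MINUTES) -> float:
    """Cluster minutes billed per day when every run pays for its own bootstrap.

    Assumes the day's processing work is fixed and simply split across runs.
    """
    return runs_per_day * bootstrap_minutes + daily_processing_minutes


def bootstrap_share(runs_per_day: int, daily_processing_minutes: float) -> float:
    """Fraction of billed cluster time spent bootstrapping rather than processing."""
    total = daily_billed_minutes(runs_per_day, daily_processing_minutes)
    return runs_per_day * BOOTSTRAP_MINUTES / total


for runs in (1, 4, 24):
    print(f"{runs:>2} runs/day: {daily_billed_minutes(runs, 120):.0f} billed min, "
          f"{bootstrap_share(runs, 120):.0%} spent bootstrapping")
```

Fresher data is worth something, but past a certain frequency most of what you pay for is cluster startup rather than processing.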

Also be aware that per-second billing applies to Spot Instances as well.

SQS/ASG Configuration

Textbook Simple Queue Service (SQS) examples use an Auto Scaling group (ASG) to launch EC2 instances based on the size of the queue. When the jobs have been completed and removed from the queue, the ASG terminates some of the instances, leaving just enough to handle the current load.
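The queue-driven sizing rule can be sketched in a few lines of Python, assuming a target number of queued messages each worker should own. `desired_capacity` and its parameters are hypothetical; in AWS you would typically publish a backlog-per-instance metric to CloudWatch and let an ASG scaling policy act on it, rather than compute capacity by hand.

```python
import math


def desired_capacity(queue_depth: int, backlog_per_instance: int,
                     min_size: int, max_size: int) -> int:
    """Number of workers needed so each handles at most `backlog_per_instance` messages.

    `queue_depth` would typically come from the queue's ApproximateNumberOfMessagesVisible
    attribute; the backlog target and group limits are hypothetical tuning values.
    """
    if backlog_per_instance <= 0:
        raise ValueError("backlog_per_instance must be positive")
    needed = math.ceil(queue_depth / backlog_per_instance)
    # Clamp to the Auto Scaling group's configured minimum and maximum size.
    return max(min_size, min(max_size, needed))


print(desired_capacity(queue_depth=450, backlog_per_instance=100, min_size=1, max_size=10))
```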

Prior to per-second billing, people were very conservative with their ASG scale-out and scale-in rules. As a result, queues backed up more than they should have. Companies didn’t want to pay for an instance for an entire hour if the jobs in the queue fell back to normal levels after 15 minutes.

ASGs were also configured not to terminate instances quickly. There’s nothing worse than running an instance for 15 minutes and terminating it, only to have another surge of jobs start another instance for another 15 minutes. Now you’ve paid for 2 hours while only using 30 minutes.
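The 2-hours-for-30-minutes arithmetic can be checked with a small Python sketch that totals billed time for a list of instance lifetimes. The per-second branch assumes the 60-second minimum charge per launch.

```python
import math


def billed_minutes(instance_lifetimes_minutes, per_second: bool) -> float:
    """Total billed minutes for a set of instances under either billing model."""
    total = 0
    for lifetime in instance_lifetimes_minutes:
        if per_second:
            # Per-second billing with a one-minute minimum per launch.
            total += max(lifetime, 1)
        else:
            # Per-hour billing: each launch pays for every started hour.
            total += math.ceil(lifetime / 60) * 60
    return total


surges = [15, 15]  # two short-lived instances, as in the scenario above
print("per-hour:", billed_minutes(surges, per_second=False), "min")
print("per-second:", billed_minutes(surges, per_second=True), "min")
```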

Now, with per-second billing, architects should take a fresh look at their ASG configurations. Don’t be afraid to scale out and service your queues better. Also, don’t be afraid to scale back in. If there’s no work to be done, don’t keep paying for idle machines just because that used to be the smart thing to do.
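The shift in scale-in reasoning can be illustrated with a hypothetical decision function. This is not an ASG setting; the function name, idle threshold and "end of hour" window are illustrative assumptions. Under per-hour billing the remainder of the current hour is already paid for, so holding an idle instance until near the hour boundary was rational; under per-second billing, idle time is simply cost.

```python
def should_scale_in(idle_minutes: float, uptime_minutes: float, per_second_billing: bool,
                    idle_threshold_minutes: float = 5,
                    hour_end_window_minutes: float = 5) -> bool:
    """Decide whether an idle worker should be terminated (hypothetical policy)."""
    if idle_minutes < idle_threshold_minutes:
        return False  # Not idle long enough; work may still be arriving.
    if per_second_billing:
        return True  # Idle time is pure cost, so release the instance now.
    # Per-hour billing: the rest of the current hour is already paid for,
    # so the old habit was to hold the instance until the hour is nearly over.
    minutes_left_in_hour = 60 - (uptime_minutes % 60)
    return minutes_left_in_hour <= hour_end_window_minutes


print(should_scale_in(idle_minutes=10, uptime_minutes=20, per_second_billing=False))
print(should_scale_in(idle_minutes=10, uptime_minutes=20, per_second_billing=True))
```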

When Lambda’s 5-Minute Limit Is Not Enough

When deciding whether a new project will run on Lambda or EC2, a major point to consider is, “Will this job run for more than 5 minutes?” If the answer is “Yes,” Lambda is out of the question because of its 5-minute execution limit. Lambda bills in 100-millisecond increments, which makes it very attractive. Now, with per-second EC2 billing, running jobs on EC2 isn’t nearly as expensive as it used to be.
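A minimal Python sketch of the decision rule and the two billing granularities described above. It assumes Lambda duration rounds up to the next 100 ms, EC2 time rounds up to a whole second with a 60-second minimum, and the 5-minute Lambda limit in force when this was written. The function names are hypothetical.

```python
import math

# Lambda's maximum execution time when this was written (5 minutes).
LAMBDA_MAX_SECONDS = 300


def lambda_billed_ms(duration_ms: float) -> int:
    """Lambda duration rounded up to the next 100 ms increment."""
    return math.ceil(duration_ms / 100) * 100


def ec2_billed_seconds(duration_seconds: float, minimum_seconds: int = 60) -> int:
    """EC2 per-second billing, rounded up to a whole second, with a 60-second minimum."""
    return max(math.ceil(duration_seconds), minimum_seconds)


def pick_platform(expected_runtime_seconds: float) -> str:
    """Rule of thumb: anything that may exceed Lambda's limit goes to EC2."""
    return "lambda" if expected_runtime_seconds <= LAMBDA_MAX_SECONDS else "ec2"


for runtime in (45, 240, 900):
    print(runtime, "s ->", pick_platform(runtime))
```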

A common use case I’ve seen is cron jobs: processes that you want to run on a schedule, such as every 15 minutes or every hour. If the job could finish in under 5 minutes, it was natural to run it on Lambda, using a scheduled CloudWatch Events rule to start the process. If it took more than 5 minutes, it was common to have an always-on EC2 instance.
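CloudWatch Events schedules accept `rate(value unit)` expressions, where the unit is singular when the value is 1 (for example `rate(1 hour)`) and plural otherwise (`rate(15 minutes)`). Here is a small hypothetical helper that builds one from an interval in minutes; it only produces the string and does not call any AWS API.

```python
def rate_expression(interval_minutes: int) -> str:
    """Build a CloudWatch Events rate() schedule expression for a whole-minute interval."""
    if interval_minutes <= 0:
        raise ValueError("interval must be a positive number of minutes")
    # Use the largest unit that divides the interval evenly.
    if interval_minutes % 1440 == 0:
        value, unit = interval_minutes // 1440, "day"
    elif interval_minutes % 60 == 0:
        value, unit = interval_minutes // 60, "hour"
    else:
        value, unit = interval_minutes, "minute"
    # Singular unit for a value of 1, plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


print(rate_expression(15))
print(rate_expression(60))
```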

Sure, you could still use CloudWatch Events to start a transient EC2 instance on a schedule that did the processing and then terminated itself, but you were billed for the entire hour even if the instance was only on for 10 minutes. So if the job ran frequently, it was easier to leave the instance running all the time.

Now you can spin up an instance and only pay for the time you are really using that machine.
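The always-on versus transient trade-off can be sketched as billed instance-hours per day for a single instance. This assumes a hypothetical 10-minute job and ignores instance boot time, which in practice adds a little to each transient run.

```python
import math


def daily_instance_hours(runs_per_day: int, job_minutes: float, model: str) -> float:
    """Billed instance-hours per day for a scheduled job (boot time ignored)."""
    if model == "always_on":
        return 24.0
    if model == "transient_per_hour":
        # Each run pays for every started hour.
        return runs_per_day * math.ceil(job_minutes / 60)
    if model == "transient_per_second":
        # Each run pays only for the minutes it actually runs.
        return runs_per_day * job_minutes / 60
    raise ValueError(f"unknown model: {model}")


# A 10-minute job run every 15 minutes (96 runs per day).
for model in ("always_on", "transient_per_hour", "transient_per_second"):
    print(f"{model:>20}: {daily_instance_hours(96, 10, model):.1f} instance-hours/day")
```

With per-hour billing, a transient instance for this job would cost four times as much as leaving one running; with per-second billing, it costs a third less than always-on.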

Conclusion

There are many situations where per-second billing will save your company money. Reassess your current architecture and always look for new ways to save. I’ve made plenty of design decisions for customers based on costs and savings. Sometimes the game changes and new opportunities come up to save even more.

Written by Dennis Webb