AWS Cost Cutting 101 — Compute

Adam
Call For Atlas
Apr 12, 2021

In a previous article, we went through the economic benefits of deploying and scaling on the Cloud. With its lack of upfront costs and general agility, cloud deployments are a cheap and effective way to experiment with your ideas.

Though be warned: cloud operating costs have no ceiling.

Say you scale your 4-instance MVP to a worldwide app running on 4,000 EC2 instances without any thought. Your usual $40/month bill just transformed into a $40,000/month monster.

With great agility comes infinite cost potential!

Harness the Compute

AWS compute is the most basic building block of the cloud. It is also where the money goes: your EC2 or Fargate instances will typically make up around 80% of your cloud costs!

AWS categorizes compute under the following:

  • General Purpose — Good enough for most scenarios. These are the Ts, Ms and As in AWS.
  • Compute Optimized — Perfect for high-performance compute workloads, from media encoding to dedicated game servers. These are the Cs in the EC2 setup.
  • Memory Optimized — Good for memory-hungry solutions and large in-memory data sets. These are the Rs, Xs and Zs.
  • Accelerated Computing — Anything you want hardware accelerated (maybe that bitcoin miner). These are the Ps, Gs and Fs in the EC2 setup.
  • Storage Optimized — For instances with heavy read and write operations that are hungry for IOPS. These are the Ds and Hs in the EC2 setup.

AWS updates these groups with new instance types every year, so keeping an eye out for new releases is a must for any cloud engineer.

Some instances within these categories can be further boosted with the following capabilities:

  • Burst performance: Unlike fixed-performance instances, these can take on load beyond their baseline (using accrued CPU credits) at no additional cost. The T3s from our general purpose group are an example of this.
  • Storage options: From general purpose SSD and Provisioned IOPS down to cheap magnetic volumes. Some instances (mainly the Ms) come with dedicated throughput for their EBS volumes, or for roughly half the price you can get instance storage (a.k.a. ephemeral storage).
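To make the storage choice concrete, here is a minimal boto3 sketch that launches an instance with a general purpose SSD (gp3) root volume; the AMI ID, instance type and volume size are placeholders, not values from the article.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a single instance with a general purpose SSD (gp3) root volume.
# m5d types also ship with NVMe instance (ephemeral) storage included.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="m5d.large",          # placeholder type
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[
        {
            "DeviceName": "/dev/xvda",
            "Ebs": {
                "VolumeType": "gp3",          # general purpose SSD
                "VolumeSize": 50,             # GiB; pay only for what you need
                "DeleteOnTermination": True,
            },
        }
    ],
)
print(response["Instances"][0]["InstanceId"])
```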

If you know your tech stack well and know what to expect, you will pay exactly what you need. Keep in mind that all this compute is rented hardware in an AWS datacenter; most of it is virtualized, which means you will still see slightly varying performance.

At scale, even the slightest overprovisioning (for example in storage, paying for 500GB when you only need 450GB) turns cents into thousands of dollars. To avoid this we set a baseline and scale up and back down using auto scaling groups (ASGs).

Baseline and Scale

The most efficient use of paid cloud resources is buying the minimum you need, at a discount, and scaling up or down as demand arrives and leaves. This is done through auto scaling groups (ASGs).

An ASG monitors a metric that denotes the stress on your system, usually CPU utilization reported through AWS's CloudWatch.

To use an ASG, your software needs to be architected right: it should operate asynchronously and be stateless. If instead of lightweight stateless microservices you have a big clunky monolith, you won't gain as much from AWS's auto scaling.

When load hits your system, the ASG creates new instances to handle it, then pulls them back out when the load subsides.
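As a rough sketch of such a setup, assuming a launch template named webapp-template and two placeholder subnets, creating an ASG with the 2-to-20 instance range used in the client example below could look like this:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Create an ASG that keeps a 2-instance baseline and can grow to 20.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="webapp-asg",
    LaunchTemplate={
        "LaunchTemplateName": "webapp-template",  # placeholder template
        "Version": "$Latest",
    },
    MinSize=2,            # never go below the baseline
    MaxSize=20,           # cap the worst-case bill
    DesiredCapacity=2,    # start at the baseline
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",  # placeholder subnets
)
```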

Provisioning for our highest load would be catastrophic for our expenses; provisioning nothing past the baseline and letting our users take the hit would be even worse. Here are some statistics from a client's webapp:

Typical webapp usage patterns

There is a significant load from 9am to 2pm (users accessing the app at the start of their day): that is 5 hours using the equivalent of 20 t3.micros. For the remaining 19 hours, usage is a negligible 2 t3.micros.

Let's calculate the costs. A t3.micro costs $0.0104/hr, so in a day you pay (0.0104 * 19 * 2) + (0.0104 * 5 * 20) = $1.44, which is $523.85 a year.

If you were to provision for the max load you would be paying (0.0104 * 24 * 20) = $4.99 a day, or $1,822.08 a year. With an educated use of ASGs, you just cut your costs by roughly 70% year over year.
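The arithmetic is easy to reproduce; here is a small Python sketch using the same $0.0104/hr price and the 2/20 instance split:

```python
HOURLY_RATE = 0.0104   # t3.micro on-demand price, USD/hr

# Scale with demand: 2 instances for 19 hours, 20 instances for 5 hours.
scaled_daily = HOURLY_RATE * 19 * 2 + HOURLY_RATE * 5 * 20
# Provision for peak: 20 instances around the clock.
peak_daily = HOURLY_RATE * 24 * 20

print(f"Scaled:  ${scaled_daily:.2f}/day, ${scaled_daily * 365:.2f}/year")
print(f"Peak:    ${peak_daily:.2f}/day, ${peak_daily * 365:.2f}/year")
print(f"Savings: {1 - scaled_daily / peak_daily:.0%}")
# Scaled:  $1.44/day, $523.85/year
# Peak:    $4.99/day, $1822.08/year
# Savings: 71%
```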

ASGs scale on these criteria:

  • Scheduled scaling — Ideal for the example above, where we know when traffic spikes every day (see the sketch after this list).
  • Demand scaling — Monitors a metric through CloudWatch, usually average CPU utilization or network in.
  • Predictive scaling — Based on AWS's ML models. Great when you have patterns, as in our case: AWS does the work of guessing when to scale out or in, and by how much.
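For the daily 9am-2pm spike in our example, scheduled scaling takes just two scheduled actions. A minimal sketch, assuming the webapp-asg group from earlier (the cron expressions are evaluated in UTC):

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Scale out to 20 instances for the 9am spike...
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="webapp-asg",
    ScheduledActionName="morning-scale-out",
    Recurrence="0 9 * * *",    # every day at 09:00 (UTC)
    MinSize=2,
    MaxSize=20,
    DesiredCapacity=20,
)

# ...and back down to the 2-instance baseline at 2pm.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="webapp-asg",
    ScheduledActionName="afternoon-scale-in",
    Recurrence="0 14 * * *",   # every day at 14:00 (UTC)
    MinSize=2,
    MaxSize=20,
    DesiredCapacity=2,
)
```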

When it comes to adding or removing instances, you have 3 ways:

  • Simple scaling — On stress, scale by a fixed amount, then cool down and wait for the next breach of your metric.
  • Step scaling — On stress, scale by a fixed amount, but add more as demand increases and your metric is breached by larger margins. The difference with the above is that there is no cooldown period with this policy.
  • Target tracking — You pick a target value for a metric (for example 50% average CPU) and the ASG adds or removes instances to keep the metric there (see the sketch below).
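Target tracking is the least work to configure. The sketch below keeps the webapp-asg group at 50% average CPU; the target value is an assumption to be tuned for your workload:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Keep the group's average CPU utilization around 50%; the ASG adds
# or removes instances whenever the metric drifts from the target.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="webapp-asg",
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,   # assumed target, tune for your workload
    },
)
```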

Say we don't want the hassle of scaling, or our system is not ready to scale (some stateful monolith). In this case we can look at discount programs.

Reservation Discounts

Reserved instances (RIs), scheduled RIs and convertible RIs (cRIs, more on these in another article) are the main way to get a lower cost for large and protracted computation efforts.

More recently, AWS also released Savings Plans, a variant based on a per-hour on-demand commitment that is applied automatically to your EC2s.

When you buy a reserved instance, you sign a contract to use it for 1 or 3 years and pay according to one of these setups:

  • No upfront — around a 30% discount.
  • Partial upfront — up to a 60% discount.
  • Full upfront — up to a 72% discount.

RIs are best used in combination with ASGs: the baseline is covered by a set of RIs and the spike is handled with on-demand EC2 instances or spot instances, which are described in the next section.

Applying this to the client's webapp mentioned earlier: our baseline of 2 t3.micro instances running around the clock costs $182.21 a year on demand. Reserving that baseline with a 1-year all-upfront contract costs us about $106, a 42% discount and roughly a fifth of our previous $523.85 bill:

A 42% reservation discount for 1 year.

We also said we have 5 hours of high traffic to scale for. We will handle this spike with on-demand instances and get a nice discount using a Compute Savings Plan.

With the Compute Savings Plan we commit to a consistent amount of usage for 1 or 3 years. Our spike is 20 instances for 5 hours a day, or 36,500 instance-hours a year; with a 1-year all-upfront plan we pay $255.50 instead of $379.60, a 33% saving.

Discounts on our t3.micro on-demand instances
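Putting the two discounts together, here is a quick sketch comparing the discounted bill to the on-demand bill from earlier (the $106 and $255.50 figures are taken from the screenshots above; note the reserved baseline also runs during the spike hours, so this setup buys slightly more capacity than the pure on-demand schedule):

```python
HOURLY_RATE = 0.0104   # t3.micro on-demand, USD/hr

# On-demand bill from earlier: 2 instances for 19h + 20 for 5h, every day.
on_demand_yearly = (HOURLY_RATE * 19 * 2 + HOURLY_RATE * 5 * 20) * 365  # ~$523.85

# Discounted figures taken from the article's screenshots:
baseline_reserved = 106.0    # 2 t3.micros on 1-year all-upfront RIs
spike_savings_plan = 255.5   # 36,500 spike hours on a 1-year Compute Savings Plan

discounted_yearly = baseline_reserved + spike_savings_plan              # $361.50

print(f"On-demand:  ${on_demand_yearly:.2f}/year")
print(f"Discounted: ${discounted_yearly:.2f}/year")
print(f"Saving:     {1 - discounted_yearly / on_demand_yearly:.0%}")    # ~31%
```

That is roughly another 31% shaved off the already-optimized on-demand bill.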

There is a marketplace on AWS where you can buy or sell your RIs, in case you want to get creative with your discounts. This is not applicable to Savings Plans though.

It goes without saying that to make use of RIs you need a long-term IT strategy.

SPOT the Instance

Spot instances run on AWS's unused EC2 capacity: you are buying compute hours from AWS's excess.

Before 2018, spot instances were an auction: whoever offered the most got the capacity for that time period. That has since changed, and bidding has been removed.

You can get a spot block reserved for up to 6 hours; otherwise your instance runs until the capacity is reclaimed, at which point you receive a notice that it will be terminated in 2 minutes.

This setup is quite complicated, so why use spot instances at all? Because spot instances can cost up to 90% less than standard on-demand instances.

Note the 70% discount and 5% interruption rate

Let's create a quick estimate for our t3.micro fleet. We said it would cost us $523.85 a year on demand; running it entirely on spot instances at a 70% discount means we pay only $157.15. But here is the catch:

  • Interruptions! — The 5% in the screenshot above is the interruption rate, which differs by region. Being interrupted in the middle of a long computation will hurt our user experience.
  • Delayed response — You are requesting capacity that may only become available at some point in the future. This isn't good for our case, where users need an instant response to their requests.

Spot instances come with the largest discounts, but also the least control. Your application needs to run asynchronous jobs with tolerance for failure. They work well if you have a data-crunching workload with no scheduling constraints.
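One convenient way to tap spot capacity for the failure-tolerant part of a fleet is an ASG mixed instances policy: keep a small on-demand base and fill everything above it from the spot market. A minimal sketch, again with placeholder launch template and subnets:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Keep 2 on-demand instances as a stable base; everything above that
# comes from the spot market across a couple of interchangeable types.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="webapp-spot-asg",
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "webapp-template",  # placeholder
                "Version": "$Latest",
            },
            "Overrides": [
                {"InstanceType": "t3.micro"},
                {"InstanceType": "t3a.micro"},  # more pools, fewer interruptions
            ],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 2,
            "OnDemandPercentageAboveBaseCapacity": 0,  # 100% spot above the base
            "SpotAllocationStrategy": "capacity-optimized",
        },
    },
    MinSize=2,
    MaxSize=20,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",  # placeholder subnets
)
```

Listing more than one instance type gives AWS more spot pools to choose from, which lowers the chance of interruption.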

Instantly Right

Sometimes the greatest wins come when you choose the right instance for the job.

In the client's webapp we used the t3.micro extensively. T3 is an upgrade over T2, with roughly 30% better price/performance. There is also the T3a instance type, which runs on AMD EPYC processors and shaves about 10% off the price, from the T3's $0.0104/hr to $0.0094/hr.

t3a > t3

Last year (2020) AWS released the Graviton2-based (ARM) T4g, which comes at an even lower cost of $0.0084/hr and offers EBS bursting for those heavy storage operations.

t4g > all of the t3s

You will need to keep a pulse on what AWS is doing to be able to always select the best instance to use.
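To keep that pulse programmatically, you can pull the specs of candidate types straight from the EC2 API and line them up against their hourly prices (the prices below are the ones quoted in this article, hard-coded for the sketch):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Hourly prices quoted in this article (hard-coded; look them up for your region).
PRICES = {"t3.micro": 0.0104, "t3a.micro": 0.0094, "t4g.micro": 0.0084}

response = ec2.describe_instance_types(InstanceTypes=list(PRICES))
for it in response["InstanceTypes"]:
    name = it["InstanceType"]
    vcpus = it["VCpuInfo"]["DefaultVCpus"]
    mem_gib = it["MemoryInfo"]["SizeInMiB"] / 1024
    arch = ",".join(it["ProcessorInfo"]["SupportedArchitectures"])
    print(f"{name}: {vcpus} vCPU, {mem_gib:.1f} GiB, {arch}, ${PRICES[name]}/hr")
```

Note the architecture column: the T4g is ARM-based, so your software needs an ARM build before you can cash in that last discount.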

Contact us to build your AWS infrastructure or cut down on those evil AWS bills: info@callforatlas.com

This Article by Adam Darmanin is licensed under CC BY-NC-SA 4.0
