How to select the most efficient AWS EC2 instance types using Pareto front analysis

The results demonstrate that the latest generation of instances provides a significant cost-efficiency boost over previous instance types

Willard Simmons
A Cloud Guru
5 min read · Jun 20, 2017


UPDATE: AWS has released the C5 and P3 instance families; an updated chart is available in this newer blog post: https://medium.com/@dxsimmons/update-nov-2017-selecting-the-most-efficient-aws-ec2-instance-types-using-pareto-front-analysis-e2e76ec95e31

At DataXu, we power a large part of our technology stack using Amazon Web Services (AWS). At any given time, we may be running thousands of AWS EC2 instances — the virtual servers in Amazon’s Elastic Compute Cloud (EC2) that run applications on the AWS infrastructure. For DataXu, these servers power data ingestion of hundreds of terabytes per day, elastic reporting, analytics, machine learning, and consumer-facing web services.

As a CTO, I thoroughly enjoy working with the constantly changing elastic cloud services from AWS. There are so many enticing options and combinations to choose from that I feel like a kid in a candy store.

The downside of such a vast array of options is that evaluating the financial impact of choosing any one offering can be mind-numbingly complex. Due to the nature and scale of DataXu’s business, even small opportunities for improvement in our AWS costs affect our overall profitability. Understanding the different tradeoffs and cost impacts of EC2 instance choices in AWS is time-consuming and tricky.

This led me to search for an easier way to compare and contrast the various offerings of AWS. I needed to find the optimal choice for DataXu while also saving time and simplifying the process for future decision-making. Since I wanted to better understand the compute vs. memory cost tradeoff of the 76 different EC2 instance types, I used a method called an “efficient frontier” or “Pareto front” analysis to organize and understand the pricing information.

Analyzing the Efficient Frontier

This is a Pareto front analysis of all EC2 instance types in terms of two metrics:

  • Compute Cost Efficiency
  • Memory Cost Efficiency

The goal is to organize the set of point solutions — in this case, instance types — in terms of normalized metrics, then shrink the consideration set to a smaller set of the most efficient choices.

The metrics should be orthogonal, meaning they don’t depend on each other. In this case, I used normalized measures of compute efficiency vs. memory efficiency as the two metrics, defined as follows:

The cost in this study is based on EC2 on-demand pricing in AWS region us-east-1, with units of dollars per hour indicated by “$-hr”.

Compute Efficiency: Measured as “Compute ECU / $-hr”
An Elastic Compute Unit (ECU) is a normalized unit of CPU integer processing power available in an AWS instance. According to AWS, one EC2 Elastic Compute Unit provides the equivalent CPU capacity of a 1.0–1.2 GHz 2007 Opteron or 2007 Xeon processor. An m1.small instance is measured at 1 ECU, whereas a c4.8xlarge has 132 ECUs.

Memory Efficiency: Measured as “Memory GB / $-hr”
This is simply the available GB of memory divided by the hourly on-demand price. The so-called “utopia” point, or the theoretical point in the direction of maximal compute and memory efficiency, is in the upper right of the chart. The Pareto front, or efficient frontier, is indicated by the dashed line. This is an envelope connecting the set of “non-dominated” solutions.
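To make the two metrics concrete, here is a minimal sketch of the two ratios in Python. The instance specs and prices below are illustrative placeholders, not live AWS pricing:

```python
# Each instance: (ECU, memory in GB, on-demand $/hr in us-east-1).
# Numbers are illustrative placeholders, not live AWS pricing.
instances = {
    "c4.8xlarge": (132, 60.0, 1.591),
    "r4.large":   (7,   15.25, 0.133),
}

def efficiencies(ecu, mem_gb, usd_per_hr):
    """Return (compute ECU / $-hr, memory GB / $-hr)."""
    return ecu / usd_per_hr, mem_gb / usd_per_hr

for name, spec in instances.items():
    compute_eff, memory_eff = efficiencies(*spec)
    print(f"{name}: {compute_eff:.1f} ECU/$-hr, {memory_eff:.1f} GB/$-hr")
```

Note how the two ratios pull in opposite directions: the compute-optimized instance scores high on the first metric, the memory-optimized one on the second.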

A solution is considered “dominated” if there exists another point that is at least as efficient in both metrics and strictly more efficient in at least one. The “non-dominated” set consists of the remaining points — those for which no such dominating point exists.

In this case, the non-dominated solution set is c4.8xlarge, m4.4xlarge, r4.large, and x1.32xlarge. These points define the edges of the “efficient frontier”. Each of these four instance types offers a best-available tradeoff between compute and memory efficiency: no other instance type beats any of them in both metrics at the same time.
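The domination test can be sketched directly in a few lines of Python. The instance names and efficiency numbers here are illustrative stand-ins, not exact figures from the chart:

```python
# A minimal sketch of extracting the non-dominated set from
# (compute_eff, memory_eff) pairs. Numbers are illustrative only.
def pareto_front(points):
    """Return the names of points not dominated by any other point.

    A point is dominated if some other point is at least as good in
    both metrics and strictly better in at least one.
    """
    front = []
    for name, (x, y) in points.items():
        dominated = any(
            ox >= x and oy >= y and (ox > x or oy > y)
            for other, (ox, oy) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front

points = {
    "c4.8xlarge": (83.0, 37.7),   # high compute efficiency
    "r4.large":   (52.6, 114.7),  # high memory efficiency
    "m3.medium":  (45.0, 56.0),   # dominated by r4.large
}
print(pareto_front(points))  # → ['c4.8xlarge', 'r4.large']
```

This brute-force pairwise check is O(n²), which is perfectly fine for 76 instance types.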

Caveats to this Analysis

Note that this is only a study of compute and memory efficiency. Available instance types have different sizes and speeds of local storage, different networking speeds and other interesting features like GPUs and FPGA co-processors. The most efficient solution for you may not match the ones that appear on this chart, if you require the benefits of these kinds of features.

Furthermore, this analysis assumes you are fully utilizing both compute and memory. While this is often the case for DataXu’s applications, it is often not the case for most typical web apps. Consider your application’s use of memory and CPU when studying these results.

Additionally, instance types with burstable CPU, like the t1 and t2 families, are not modeled correctly here. The amount of ECU you get from these instances depends on your workload, so I didn’t know how to represent this fairly on the chart.

I tried a few different approaches, but decided instead to set their ECU to zero and call out the discrepancy here. The plot accurately shows the memory cost efficiency of those families, but the vertical position of the t1 and t2 instance types heavily depends on your particular use case.

The Final Results

If you group the instance types by family, it’s easy to see that the latest generation of instances — C4, M4, R4, and X1 — demonstrates a very significant cost efficiency boost over previous instance types. For example, the C4 family averages a compute efficiency score of 85, whereas C3 averages about 65.

That works out to roughly a 23% reduction in cost per unit of compute if you use C4 instead of C3!
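For the skeptical, the arithmetic behind that figure: going from 65 to 85 ECU per $-hr cuts the dollar cost of one ECU-hour by 1 − 65/85 ≈ 23.5%.

```python
# Cost per ECU-hour falls from 1/65 to 1/85 dollars; the saving as a
# fraction of the old cost is 1 - 65/85.
c4_eff, c3_eff = 85, 65
cost_reduction = 1 - c3_eff / c4_eff
print(f"{cost_reduction:.1%}")  # → 23.5%
```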

Note that I didn’t complete a similar study for reserved instance (RI) pricing. While most RI discounts are roughly in the same proportion regardless of instance type, this is not universally true. Some instance types have much deeper RI discounts than others.

Raw data for this study was pulled from the excellent tool hosted at http://www.ec2instances.info. It’s very convenient to pull a CSV from this page and gather data for your own analysis.
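As a starting point for your own analysis, here is a sketch of turning a CSV export into the two metrics with pandas. The column names (“ECU”, “Memory”, “On Demand”) are assumptions about the export format — check the header of the file you actually download:

```python
import pandas as pd

def add_efficiency_columns(df, ecu="ECU", mem="Memory", price="On Demand"):
    """Append compute and memory efficiency columns (per $-hr).

    Default column names are assumptions about the CSV export format.
    """
    out = df.copy()
    out["compute_eff"] = out[ecu] / out[price]
    out["memory_eff"] = out[mem] / out[price]
    return out

# Hypothetical usage with a downloaded export (file name assumed):
# df = add_efficiency_columns(pd.read_csv("ec2instances.csv"))
# print(df.sort_values("compute_eff", ascending=False).head())
```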

I hope this was helpful. I look forward to any feedback from the AWS community about the analysis method and would enjoy hearing about any tricks or tips you use to make cost impacting decisions at scale.

Thanks for reading! If you like what you read, hit the ❤ button below so that others may find this. You can follow me on Twitter.
