Teads Engineering

Estimating AWS EC2 Instances Power Consumption

1U custom AWS server, from the re:Invent 2016 AWS Innovation at Scale keynote by James Hamilton

  • A review of existing research on power modeling applied to cloud infrastructure, as well as papers describing factors that can greatly affect actual consumption (Chapter 2),
  • Our experiment and the tooling we used to build power consumption profiles for a set of EC2 instances (Chapter 3),
  • Finally, an analysis of the results and a list of key findings (Chapters 4 and 5).

1 — A closer look at AWS instances

As a reminder, we are trying to find a way to convert our EC2 hourly usage metrics into kWh. To estimate the power consumption of instances, our hypothesis is that if we can measure consumption at the hardware level, we can then apply a simple conversion factor based on the number of vCPUs.

1.1 — How instances are sized and what’s inside

In this great talk from re:Invent 2017, Adam Boeglin describes how c5 instances are sized. Basically, resources are cut into instances in a linear fashion. In his example, the c5.18xlarge instance is the equivalent of two c5.9xlarge instances, and the CPU-to-memory ratio stays the same across all sizes.
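The linear sizing described above can be checked with a few lines of Python. This is only a sketch: the vCPU and memory figures are the publicly listed c5 sizes, and the `share_of` helper is a hypothetical name that treats an instance as a simple fraction of the .metal host, which is a simplification since sizes of one family can run on different hardware generations.

```python
# Publicly listed c5 sizes: (vCPUs, memory in GiB).
C5_SIZES = {
    "c5.9xlarge": (36, 72),
    "c5.12xlarge": (48, 96),
    "c5.18xlarge": (72, 144),
    "c5.metal": (96, 192),
}


def memory_per_vcpu(instance):
    """Return the memory-to-vCPU ratio (GiB per vCPU) of an instance size."""
    vcpus, memory_gib = C5_SIZES[instance]
    return memory_gib / vcpus


def share_of(instance, host="c5.metal"):
    """Fraction of the host's vCPUs allocated to the instance (assumed model)."""
    return C5_SIZES[instance][0] / C5_SIZES[host][0]


for name in C5_SIZES:
    print(name, memory_per_vcpu(name), share_of(name))
```

Every size yields the same GiB-per-vCPU ratio, which is what makes a vCPU-based conversion factor plausible in the first place.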

1.2 — Looking for power profiles of equivalent server configurations

Using these hardware specifications we tried to find comparable servers in the SPECpower database. SPECpower [1] is an industry-standard benchmark that evaluates the power and performance characteristics of servers. It requires a power analyzer device to be able to measure power at the AC input while running a set of workloads (mostly CPU and memory-bound).

AWS c5.metal and similar servers comparison: matching characteristics are shown in green, higher ones in blue, and lower ones in red

2 — Server power consumption distribution and modeling

Before jumping into power measurements we need to know more about how power consumption is distributed in a server. This will help us analyze our results and check if they make sense.

2.1 — Server power consumption drivers and factors

According to a recent study from Philippe Roose et al. [3], the two largest power consumers in a modern datacenter server (without GPU) are the CPU and memory.

  • Manufacturing: in a recent study, Yewan Wang et al. [4] observed a 7.8% power variation among 12 identical servers running the same test suites. Another study from Coles et al. [5] tested three servers with similar specifications and showed a 5% power variation, mainly due to the CPU.
  • Thermal conditions: Yewan Wang also demonstrated that “the ambient temperature affects server power in two ways: through temperature sensitive components (i.e. CPU) and through server internal cooling fans”. The variation observed, here running a cpuburn benchmark, is significant: “with an Intel Xeon E5-2609v3 CPU, the power is increased by 16% when the temperature of CPU varies from 37.7°C to 74.5°C.”

2.2 — Server power modeling

A lot of research has been done regarding power modeling at different scales (datacenter, server, virtual machine, down to the application level). There is a really interesting study from Leila Ismail that does a comparative evaluation of software-based power models for datacenter servers [10]. The study covers linear and non-linear models that use either mathematical or machine learning approaches. Another survey of power models has been published by Weiwei Lin et al. [11], this time for cloud servers specifically.
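To make the idea of a linear model concrete, here is a minimal sketch of one common form, where power grows proportionally with CPU utilization between an idle and a maximum value. The function name and the wattage figures are illustrative assumptions, not values taken from the cited studies.

```python
def linear_power(utilization, idle_watts, max_watts):
    """Estimate server power (W) for a CPU utilization between 0.0 and 1.0.

    Simple linear interpolation between idle and full-load power.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    return idle_watts + (max_watts - idle_watts) * utilization


# Hypothetical server: 100 W at idle, 300 W at full load.
for load in (0.0, 0.25, 0.5, 1.0):
    print(load, linear_power(load, idle_watts=100.0, max_watts=300.0))
```

Non-linear models replace the straight line with a fitted curve, which matters when consumption rises faster at low utilization than at high utilization.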

3 — Building power consumption profiles for AWS instances

In a cloud environment, we can only use software metering tools. In our previous article, we discussed Intel RAPL (Running Average Power Limit), a software power meter available starting from Intel’s Sandy Bridge CPU architecture. Research from Fahad et al. [12] has demonstrated a good correlation between RAPL readings and system power meters. Nizam Khan et al. [13] successfully used this feature on EC2 instances by accessing RAPL counters through model-specific registers (MSRs).

3.1 — Reading RAPL counters on EC2

There are few resources about reading MSRs on EC2. One of them is an older blog post from Brendan Gregg [14], in which he explains how to read temperature levels on EC2. It’s also important to note that MSRs are specific to each processor family: to read them, we first have to look up their addresses in the relevant Intel datasheet, and these documents aren’t reader-friendly.

In this screenshot, we can see some of the information that is read from the MSRs like the CPU and DRAM TDP.
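For readers who want to see what reading RAPL through MSRs involves, here is a rough Python sketch. It assumes a Linux host with the `msr` kernel module loaded and root access, and it uses the RAPL register addresses documented in the Intel Software Developer's Manual. The helper names are ours, and this is not how Turbostat is implemented.

```python
import os
import struct
import time

# RAPL MSR addresses as documented by Intel (verify against your CPU's datasheet).
MSR_RAPL_POWER_UNIT = 0x606
MSR_PKG_ENERGY_STATUS = 0x611


def read_msr(address, cpu=0):
    """Read a 64-bit MSR via the Linux msr driver (root and `modprobe msr` needed)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)


def energy_unit_joules(power_unit_raw):
    """Bits 12:8 of MSR_RAPL_POWER_UNIT hold ESU; one count equals 1/2**ESU joules."""
    esu = (power_unit_raw >> 8) & 0x1F
    return 1.0 / (1 << esu)


def counter_delta(previous, current):
    """Difference between two 32-bit energy counter readings, handling wraparound."""
    return (current - previous) & 0xFFFFFFFF


def average_watts(previous, current, unit_joules, seconds):
    """Average power over an interval from two raw energy counter readings."""
    return counter_delta(previous, current) * unit_joules / seconds


def sample_package_watts(interval=1.0, cpu=0):
    """Measure average package power over `interval` seconds (hardware access)."""
    unit = energy_unit_joules(read_msr(MSR_RAPL_POWER_UNIT, cpu))
    before = read_msr(MSR_PKG_ENERGY_STATUS, cpu)
    time.sleep(interval)
    after = read_msr(MSR_PKG_ENERGY_STATUS, cpu)
    return average_watts(before, after, unit, interval)
```

The DRAM counter is read the same way from its own register, but on several Xeon server generations it uses a fixed energy unit that differs from the one in `MSR_RAPL_POWER_UNIT`, so the datasheet has to be checked before trusting the conversion.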

3.2 — Building a stress test protocol

Ideally, we want to measure our instance consumption under different loads. Looking at possible stress testing tools we found stress-ng, a tool that has a wide range of options and can help in performing granular stress tests (CPU, memory, IO).

  • --ipsec-mb, a stress test that performs cryptographic processing using advanced instructions like AVX-512 (test called ipsec later). We wanted to observe the impact of such instructions on power consumption.
  • --vm, a test that specifically performs memory stress methods so that we can observe the impact of memory-intensive workloads (test called VMStress later).
  • --maximize, in this test stress-ng will launch different types of stressors (CPU, cache, memory, file) and set these to the maximum settings allowed in order to get an estimate of a worst-case scenario (test called maximize later).
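As an illustration, the tests listed above could be expressed as stress-ng command lines built from Python. This is a sketch under assumptions: the 60-second duration, the 10% load step, and the stressors combined with `--maximize` are our choices for the example, not necessarily the settings used in the experiment. A worker count of `0` asks stress-ng to start one worker per online CPU.

```python
def stress_ng_command(test, load_percent=100, duration_s=60):
    """Build a stress-ng argument list for one of the named tests."""
    base = ["stress-ng", "--timeout", f"{duration_s}s", "--metrics-brief"]
    if test == "cpu":
        # CPU stressors throttled to a target load percentage.
        return base + ["--cpu", "0", "--cpu-load", str(load_percent)]
    if test == "ipsec":
        # Cryptographic stressor that can exercise AVX-512 code paths.
        return base + ["--ipsec-mb", "0"]
    if test == "vm":
        # Memory stressor.
        return base + ["--vm", "0"]
    if test == "maximize":
        # Illustrative stressor selection pushed to maximum settings.
        return base + ["--cpu", "0", "--cache", "0", "--vm", "0",
                       "--hdd", "0", "--maximize"]
    raise ValueError(f"unknown test: {test}")


def cpu_load_sweep(step=10):
    """Commands for increasing CPU load levels up to 100% (step is an assumption)."""
    return [stress_ng_command("cpu", load) for load in range(step, 101, step)]


for command in cpu_load_sweep():
    print(" ".join(command))
```

Each list can be passed to `subprocess.run` on a machine where stress-ng is installed.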

3.3 — Automating the experiments

We wanted our experiments to be easy to repeat, reproduce, and verify. Vincent Miszczak packaged and automated Turbostat and stress-ng to run these tests and log the associated power consumption. We simply called this tool turbostress 🔥, and you can find it on our GitHub.
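To give an idea of what the logging step involves, here is a small Python sketch that parses Turbostat-style tabular output and averages the power columns. It is not the turbostress implementation. It assumes whitespace-separated output restricted to the `PkgWatt` and `RAMWatt` columns, and the sample values are made up.

```python
import statistics


def parse_turbostat(text):
    """Parse whitespace-separated Turbostat-like output into a list of dicts.

    The first non-empty line is taken as the header; repeated headers are skipped.
    """
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if columns is None or fields == columns:
            columns = fields
            continue
        rows.append(dict(zip(columns, map(float, fields))))
    return rows


def mean_watts(rows, column):
    """Average one power column over all parsed samples."""
    return statistics.mean(row[column] for row in rows)


# Made-up sample resembling a summary-only Turbostat log.
SAMPLE = """\
PkgWatt RAMWatt
120.50  30.25
121.50  30.75
PkgWatt RAMWatt
119.00  29.00
"""

rows = parse_turbostat(SAMPLE)
print(len(rows), mean_watts(rows, "PkgWatt"), mean_watts(rows, "RAMWatt"))
```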

4 — Results

We have run the turbostress test suite on the c5, m5, r5, z1d, and m5zn bare metal instances and logged the results. We detail our findings below.

4.1 — CPU load tests

In this test, stress-ng spawns one stressor per thread and exercises the CPU by sequentially running different stress methods. These are supposed to cover the variety of possible workloads we can have in production. This test is performed for each load increment until it reaches 100%.

4.2 — Memory intensive tests

We learned earlier that DDR4 memory consumption should be around 0.4 W/GB for normal workloads. We also saw that memory consumption was quite stable in our first CPUStress test. This was observed for all the instances.
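The 0.4 W/GB rule of thumb gives a quick sanity check for RAPL memory readings. The sketch below applies it to a 192 GB configuration (the c5.metal memory size) and compares it with a measured value. The function names are ours and the measured figure is hypothetical.

```python
DDR4_WATTS_PER_GB = 0.4  # rule of thumb quoted in the text


def expected_memory_watts(capacity_gb, watts_per_gb=DDR4_WATTS_PER_GB):
    """Expected DRAM power (W) for a given capacity under normal workloads."""
    return capacity_gb * watts_per_gb


def deviation_ratio(measured_watts, capacity_gb):
    """Ratio of a measured DRAM power to the rule-of-thumb expectation."""
    return measured_watts / expected_memory_watts(capacity_gb)


print(expected_memory_watts(192))          # roughly 77 W
print(deviation_ratio(90.0, 192))          # hypothetical 90 W reading
```

A ratio well above 1 would suggest a memory-intensive workload or a reading worth double-checking.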

4.3 — CPU intensive tests with AVX-512 instructions

Another scenario we wanted to investigate is the impact of advanced CPU instructions such as AVX-512. Here we compare power consumption metrics for four tests that fully load the CPU.

4.4 — Comparison with SPECpower profiles

Of course, we cannot really compare our figures with SPECpower reports as the methodology differs significantly (throughput versus CPU utilization rates). On AWS we only have CPU and memory consumption estimated by RAPL whereas SPECpower data comes from power analyzers that measure the whole server.

  • We failed to find comparable machines in the SPECpower database. The CPUs do not match exactly and have a lower TDP on average in our selection, although they are from the same Xeon family.
  • Our tooling is biased: RAPL readings are off and/or stress-ng tests are flawed. Even if more robust studies have demonstrated that RAPL can be a good proxy for actual power consumption, it is not as accurate as a power analyzer.
  • Our experiment protocol is wrong: stress test sessions are too short and the simple average we use doesn’t handle potential outliers well. Our different sessions showed relatively consistent measures, but we kept them short (60 measures per test maximum), so we would need to run more tests to confirm.
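One way to gauge how much outliers affect a simple average is to compare it with the median over the same session. The sketch below does this on made-up readings. It illustrates the concern raised in the last point and is not part of the protocol we ran.

```python
import statistics


def summarize(samples):
    """Return mean, median, and standard deviation of power samples (W)."""
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples),
    }


# Made-up session: nine stable readings and one spike.
session = [200.0] * 9 + [400.0]
summary = summarize(session)
print(summary["mean"], summary["median"])
```

Here a single spike shifts the mean by 10% while the median stays at the stable level, which is why short sessions with few samples are sensitive to outliers.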

4.5 — Converting to virtual machines

We still need to take into account the other resources that we are not measuring. From what we have seen in some power models, a simple constant could be ok as memory and CPU are the main power proportional components in a server. For the sake of simplicity and to avoid any arbitrary heuristic we will wait for on-premise measurements to set this constant.
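Putting the pieces together, the conversion from bare-metal measurements to an instance's energy use could look like the following sketch. The `other_watts` parameter stands for the constant that is still to be determined and defaults to zero here. The wattages and vCPU counts in the example are hypothetical, and applying the constant before the vCPU split is an assumption.

```python
def instance_watts(metal_cpu_watts, metal_ram_watts, instance_vcpus,
                   metal_vcpus, other_watts=0.0):
    """Scale whole-machine power down to an instance by its share of vCPUs."""
    share = instance_vcpus / metal_vcpus
    return (metal_cpu_watts + metal_ram_watts + other_watts) * share


def energy_kwh(watts, hours):
    """Convert an average power (W) sustained for some hours into kWh."""
    return watts * hours / 1000.0


# Hypothetical machine: 300 W CPU + 100 W memory on 96 vCPUs, instance of 24 vCPUs.
watts = instance_watts(300.0, 100.0, instance_vcpus=24, metal_vcpus=96)
print(watts, energy_kwh(watts, hours=24))
```

With hourly usage metrics, `energy_kwh` is applied per instance type and summed over the billing period.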

5 — Key Takeaways

  • First, there are many external factors that should be considered and can greatly impact actual server consumption, such as manufacturing and thermal conditions. In a cloud context, this isn’t something we can manage and we can have different hardware generations for the same instance type.
  • For a given CPU utilization rate, the actual power consumption is tied to the type of workload that is processed. We have seen significant variations across our tests and confirmed that advanced AVX-512 instructions can be power-hungry compared to more common instructions.
  • Memory shouldn’t be considered to have a marginal footprint. According to RAPL, it can even exceed CPU consumption in some configurations. Our measurements are close to memory manufacturers' benchmarks which is reassuring.
  • Finally, using a statistical approach based on SPECpower data could greatly underestimate power consumption for AWS instances, at least for the Intel Xeon-based configurations we tested.

Call for contribution

We designed our power measurement protocol to be easy to reproduce and challenge. We are looking for help to check how these software measurements compare with on-premise measurements using power analyzers. This could help us check the accuracy of our approach and eventually extrapolate the data (CPU + memory) to the whole system.

Bibliography

  1. Power and Performance Benchmark Methodology V2.2, SPECpower committee — December 2014
  2. Energy Proportional Servers: Where Are We in 2016? Congfeng Jiang, School of Compute Science and Technology Hangzhou Dianzi University — 2016
  3. Prediction & Modeling Energy Consumption for IT Data Center Infrastructure, Philippe Roose et al., LIUPPA, University of Pau — 2018
  4. Potential effects on server power metering and modeling, Yewan Wang et al., IMT Atlantique — 2018
  5. Comparing Server Energy Use and Efficiency Using Small Sample Sizes, Coles et al., Lawrence Berkeley National Laboratory — 2014
  6. Experimental analysis of vectorized instructions impact on energy and power consumption under thermal design power constraints, Amina Guermouche et al., Telecom SudParis — 2019
  7. System-Power Calculator for SDRAM devices, Micron
  8. Calculating Memory Power for DDR4 SDRAM, Micron — 2017
  9. How Much Power Does Memory Use?, Crucial
  10. Computing Server Power Modeling in a Data Center: Survey, Taxonomy and Performance Evaluation, Leila Ismail — 2020
  11. A Taxonomy and Survey of Power Models and Power Modeling for Cloud Servers, Weiwei Lin et al., South China University of Technology — 2020
  12. A Comparative Study of Methods for Measurement of Energy of Computing, Fahad et al., School of Computer Science, UCD — June 2019
  13. RAPL in Action: Experiences in Using RAPL for Power Measurements, Nizam Khan et al., Aalto University — 2018
  14. The MSRs of EC2, Brendan Gregg — 2014
  15. Reading RAPL energy measurements from Linux, Vince Weaver, University of Maine
  16. Energy Efficient Scheduling of Servers with Multi-Sleep Modes for Cloud Data Center, Chonglin Gu et al. — 2018
