Published in Teads Engineering

Evaluating the carbon footprint of a software platform hosted in the cloud

Photo by Taylor Vick on Unsplash
  • How to estimate the environmental impact of industrial activities and how existing methodologies apply to information technologies (Chapter 3)
  • Where research on cloud data center energy consumption stands, and whether there are tools and solutions we can use to measure it (Chapter 4)
  • What it would take to build a proper estimation for our AWS infrastructure, based on available data (Chapter 5)

1 — Our Stack

We are excluding internal IT tools and services as well as direct value chain partners: demand-side platforms used by Brands and Agencies to program their campaigns and Publisher infrastructure where ads are displayed.

Identifying sustainability initiatives for our main providers

2 — First analysis with costs and resources

2.1 — Using costs as a proxy

Evolution of the infrastructure cost per impression (in blue) compared to the growing number of impressions (in grey) for the past three years 👏 to the Teads Engineering team

2.2 — Using CPU and RAM consumption as a proxy

Teads EC2 resources usage over time (vCPUs and RAM quantity per day) — April 2020 to September 2020
vCPU footprint per impression (in blue) compared to the total number of impressions (in grey) — April 2020 to September 2020

3 — How to estimate the environmental impact of information systems

  • ISO 14064 is the international standard and a reference when it comes to greenhouse gas (GHG) emission reporting. It was created to standardize practices and certify the results. It can build upon the accounting methodologies below.
  • Bilan Carbone is the French accounting methodology, compliant with ISO 14064. This methodology was created by the French Agency for ecological transition (ADEME).
  • Greenhouse Gas Protocol was created by the World Resources Institute (WRI) and the World Business Council for Sustainable Development. The GHG Protocol works with NGOs and governments to build a credible and efficient GHG accounting methodology.

3.1 — Emission Boundaries and Scopes

Overview of GHG Protocol scopes — Source GHG Protocol
  • Scope 1 emissions are direct emissions from owned or controlled sources.
  • Scope 2 emissions are indirect emissions from the generation of purchased energy.
  • Scope 3 emissions are all indirect emissions (not included in scope 2) that occur in the value chain of the reporting company, including both upstream and downstream emissions.

3.2 — Emission Factors

Global Warming Potential (GWP) — Converting 1 kg of GHGs into X kg of CO2 equivalent — Source Clim’Foot European Project
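
As a minimal sketch of how GWP factors are applied, the Python snippet below converts a mass of each gas into kg of CO2 equivalent. The GWP values are an assumption: they are the 100-year figures from the IPCC Fifth Assessment Report (without climate-carbon feedbacks) and may differ from the ones shown in the Clim'Foot figure. The names `GWP_100` and `to_co2e`, and the sample inventory, are illustrative only.

```python
# Assumed 100-year GWP values (IPCC AR5, without climate-carbon feedbacks).
# These may not match the values used in the Clim'Foot figure.
GWP_100 = {
    "CO2": 1.0,
    "CH4": 28.0,
    "N2O": 265.0,
}


def to_co2e(masses_kg):
    """Return total kg CO2e for a mapping of gas name -> emitted mass in kg."""
    return sum(GWP_100[gas] * mass for gas, mass in masses_kg.items())


# Hypothetical inventory, for illustration only.
inventory = {"CO2": 100.0, "CH4": 2.0, "N2O": 0.5}
print(f"{to_co2e(inventory):.1f} kg CO2e")
```

Working in CO2 equivalent is what lets emissions of different gases be added together in a single report.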

3.3 — Available methodologies for Information Systems

Information System Annual Environmental Report methodology — Source: Déployer la sobriété numérique, The Shift Project — October 2020
  • They recommend pragmatism when dealing with unknown emission factors. Even without precise data, we should try to estimate the correct order of magnitude of an emission.
  • As for the calculation itself, we should use local electricity emission factors for cloud infrastructure and add compensation measures separately when they exist.
  • We need to consider the whole lifecycle of the equipment we use, covering at least embodied emissions (production phase) in addition to actual energy consumption (run phase).
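
The lifecycle point can be illustrated with a short sketch that amortizes embodied emissions over the expected lifetime of a machine and adds them to run-phase emissions. All the server figures (1000 kgCO2e embodied, 4-year lifetime, 200 kWh per month) are hypothetical and only here to show the arithmetic; the 0.335 kgCO2/kWh grid factor is the Virginia figure quoted in Chapter 5. Function names are illustrative.

```python
def amortized_embodied_kg(embodied_kg, lifetime_hours, used_hours):
    """Share of manufacturing emissions attributed to `used_hours` of use."""
    if lifetime_hours <= 0:
        raise ValueError("lifetime_hours must be positive")
    return embodied_kg * used_hours / lifetime_hours


def run_phase_kg(energy_kwh, grid_factor_kg_per_kwh):
    """Emissions from the electricity consumed while running."""
    return energy_kwh * grid_factor_kg_per_kwh


# Hypothetical server, observed over one 30-day month.
lifetime_hours = 4 * 365 * 24
month_hours = 30 * 24

embodied = amortized_embodied_kg(1000.0, lifetime_hours, month_hours)
run = run_phase_kg(energy_kwh=200.0, grid_factor_kg_per_kwh=0.335)

print(f"embodied share: {embodied:.2f} kgCO2e")
print(f"run phase:      {run:.2f} kgCO2e")
print(f"total:          {embodied + run:.2f} kgCO2e")
```

Linear amortization over lifetime is the simplest allocation rule. Other rules are possible, and the choice should be stated alongside the result.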

4 — What is the state of research on cloud data centers energy consumption & measurement

  • On Theodo’s blog, Cyrielle Willerval recently explored how to monitor a server’s energy consumption to optimize the impact of a web application. While it’s an interesting method to use when developing or refactoring services, it does not give us the global footprint of a service.
  • We can also list Carbonalyser, a browser extension that estimates the global footprint associated with internet browsing. In that case, the computed value is based on really high-level estimations.
  • While writing this article we came across Argos, a new initiative that intends to bridge the gaps and estimate the energy footprint of software at the system level (client, server, network, and database). The estimate is limited to watt-hours for now.
  • NegaOctet is a French research program that aims to define a dedicated methodology and emission factors to evaluate the impact of digital services. The results of this initiative are eagerly awaited.
  • Clim’Foot is a European research project that aims to harmonize carbon accounting practices at the European level. For now, it lacks emission factors for the tech industry.

4.1 — Components of a Cloud Platform

Cloud services components — Source: Assessing the suitability of the Greenhouse Gas Protocol for calculation of emissions from public cloud computing workload, David Mytton, August 2020
  • The physical location of the infrastructure isn’t precise. For example, AWS has a single region in Virginia but has 55 physical data centers in that geography. This forces us to use energy mix intensity aggregates on a regional level for our calculations.
  • Cloud providers often develop their own custom hardware for which we don’t have any specifications or lifecycle information.
  • We run virtual machines and do not precisely know the corresponding physical server specifications. It gets even harder for serverless services (marginal usage in our case, but still).
  • Each instance family and generation has a specific footprint.
  • VM allocation ⁷ has a significant impact on actual energy consumption.
  • Actual server energy consumption doesn’t scale linearly according to CPU load and requires modeling ⁸.
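
To illustrate the last point, here is a minimal sketch of a non-linear power model: it interpolates between a few measured (CPU load, watts) points instead of assuming a straight line from idle to full load. The curve values are hypothetical and do not describe any real server. The names `POWER_CURVE`, `power_watts`, and `energy_kwh` are illustrative.

```python
from bisect import bisect_right

# Hypothetical (CPU load fraction, watts) measurements for one physical server.
POWER_CURVE = [(0.0, 100.0), (0.1, 140.0), (0.5, 200.0), (1.0, 250.0)]


def power_watts(load, curve=POWER_CURVE):
    """Piecewise-linear interpolation of power draw for a load in [0, 1]."""
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0 and 1")
    loads = [point[0] for point in curve]
    i = bisect_right(loads, load)
    if i == len(curve):  # load equals the last measured point
        return curve[-1][1]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (load - x0) / (x1 - x0)


def energy_kwh(load_samples, interval_hours):
    """Energy over a series of load samples taken at a fixed interval."""
    return sum(power_watts(load) for load in load_samples) * interval_hours / 1000.0


# One day of hourly samples: mostly idle, with a short busy period.
samples = [0.1] * 20 + [0.8] * 4
print(f"{energy_kwh(samples, interval_hours=1.0):.2f} kWh")
```

On this hypothetical curve, a straight line between idle and full load would give 115 W at 10% load, against 140 W from the measured points. This is why a lightly loaded fleet can consume more than a linear model suggests.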

4.2 — Cloud Data Center energy footprint distribution

Energy Footprint of Cloud Computing Systems — Source: Supporting energy-awareness for cloud users, David Guyon, January 2019

4.3 — Techniques to determine compute carbon footprint

Option 1: Estimation based on available hardware specifications

Option 2: Estimating consumption based on software metering

RAPL Power Domains — Source: RAPL in Action: Experiences in Using RAPL for Power Measurements, Nizam Khan et al.
  • This technique only covers the run phase and we still have to estimate emissions from manufacturing.
  • RAPL readings might not be reliable for profiling a VM, because RAPL reads consumption at the processor level and not at the core or thread level (vCPU). In a virtualized cloud environment, the same physical resources are shared between users, so other co-running instances affect the overall power consumption and load of the system. This could be seen as an obstacle to determining a precise software footprint, but in our case it is simply a direct consequence of running in the cloud. We can accept this limitation as part of the physical reality of our infrastructure.
  • This approach is CPU-centric and needs to be extrapolated to the overall system to be used in a carbon footprint analysis.
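
For readers who want to see what software metering looks like in practice, here is a minimal sketch that samples the package-level RAPL counter exposed by the Linux powercap interface. It assumes a bare-metal Linux host with an Intel CPU where `/sys/class/powercap/intel-rapl:0` exists; this is usually not the case inside a cloud VM. The helper names are illustrative, and the wraparound handling assumes the counter wraps at most once between two reads.

```python
import time
from pathlib import Path

# Package 0 RAPL domain as exposed by the Linux powercap framework (assumed present).
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(start_uj, end_uj, max_range_uj):
    """Microjoules consumed between two counter reads, handling one wraparound."""
    if end_uj >= start_uj:
        return end_uj - start_uj
    return (max_range_uj - start_uj) + end_uj


def read_uj(name):
    """Read an integer microjoule value from the RAPL sysfs directory."""
    return int((RAPL_DIR / name).read_text().strip())


def measure_package_watts(duration_s=1.0):
    """Average package power over `duration_s` seconds."""
    max_range = read_uj("max_energy_range_uj")
    start = read_uj("energy_uj")
    t0 = time.monotonic()
    time.sleep(duration_s)
    end = read_uj("energy_uj")
    elapsed = time.monotonic() - t0
    return energy_delta_uj(start, end, max_range) / 1e6 / elapsed


if __name__ == "__main__":
    if (RAPL_DIR / "energy_uj").exists():
        try:
            print(f"package power: {measure_package_watts():.1f} W")
        except PermissionError:
            print("reading the RAPL counter requires elevated privileges here")
    else:
        print("RAPL powercap interface not available on this machine")
```

The value covers the whole processor package, so it carries the same limitations as listed above: it is CPU-centric and cannot be attributed to a single vCPU.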

5 — What it would take to build an estimation for our AWS infrastructure

  • Emissions from running compute primitives
  • Embodied emissions from the compute hardware manufacturing phase
  • Emissions from running and manufacturing network primitives
  • Emissions from storage primitives

5.1 — Emissions from running compute primitives

EC2 running hours * Instance Ratio * Physical Instance Energy Consumption (kWh per hour) * Region Emission Factor (kgCO2/kWh) * PUE
  • Some of this data is reported as “No Instance Type”. According to Cost Explorer’s documentation, “This category includes costs (e.g., data transfer in/out) that are not directly attributable to a specific Instance Type”. We can assume this would be covered in our calculation for network primitives.
  • Lambda (AWS serverless compute service) and maybe other marginal services are missing from this report (not significant for us).
  • Virginia US eGRID: ~0.335 kgCO2/kWh in 2018 (reported as 739.35 lbCO2/MWh), using the “Virginia” state data on the service
  • Ireland SEAI: 0.375 kgCO2/kWh in 2018
  • Tokyo Bureau of Government: 0.470 kgCO2/kWh in 2017
  • France eco2mix: 0.035 kgCO2/kWh in 2019. It’s interesting to see that we can greatly reduce our infrastructure’s impact by locating it next to low-carbon grids.
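
The formula and the emission factors above can be combined in a short sketch. Several points are assumptions on our side: the instance ratio is read as the fraction of a physical server that the instance represents, the energy term is taken per running hour so that the units work out to kgCO2, and the workload figures (1000 hours, ratio 0.25, 0.4 kWh per hour, PUE of 1.2) are hypothetical. Only the regional factors come from the list above. Function and key names are illustrative.

```python
LB_TO_KG = 0.45359237


def lb_per_mwh_to_kg_per_kwh(value):
    """Convert an eGRID-style factor (lbCO2/MWh) to kgCO2/kWh."""
    return value * LB_TO_KG / 1000.0


# Emission factors quoted above, in kgCO2/kWh. Keys are illustrative labels.
REGION_FACTORS = {
    "virginia": 0.335,
    "ireland": 0.375,
    "tokyo": 0.470,
    "france": 0.035,
}


def compute_emissions_kg(running_hours, instance_ratio, physical_kwh_per_hour, region, pue):
    """Run-phase emissions for EC2 usage in one region, in kgCO2."""
    energy_kwh = running_hours * instance_ratio * physical_kwh_per_hour * pue
    return energy_kwh * REGION_FACTORS[region]


# Hypothetical workload, identical in every region.
for region in REGION_FACTORS:
    kg = compute_emissions_kg(1000.0, 0.25, 0.4, region, pue=1.2)
    print(f"{region:<9} {kg:6.1f} kgCO2")

print(f"eGRID check: {lb_per_mwh_to_kg_per_kwh(739.35):.3f} kgCO2/kWh")
```

Running the same hypothetical workload across regions makes the effect of the grid mix visible: the energy consumed is identical, and only the emission factor changes.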

5.2 — Other emission sources

Embodied emissions from the hardware manufacturing phase

Emissions from running and manufacturing network primitives

Emissions from storage primitives

Takeaways

  • We didn’t think we would have to go this far to get meaningful numbers. But the lack of readily usable and accepted emission factors makes it quite complex to estimate the carbon footprint of software platforms.
  • Things are improving and we are starting to receive some data from our providers, but methodologies are not fully disclosed, which makes the data difficult to compare and aggregate. We lack true customer reporting standards.
  • We cannot use costs as a proxy since we are billed according to the usage of virtual resources, without considering load and its impact on energy consumption. As a result, an idle resource costs the same as an instance running at 100% CPU, even if their respective impacts may differ widely. Pricing models may also distort the emission reality (on-demand resources versus spot resources, EMR markup, etc.).
  • Finally, there is a need for more transparency on the life cycle analyses that are produced, so that the community can benefit from these efforts. Having a set of consumption profiles in kWh with good confidence intervals for infrastructure primitives would be a game-changer. It would help in performing life cycle analysis and making informed decisions about software architecture and infrastructure location.

Bibliography

  1. Bilan Carbone methodological guide v8, Association Bilan Carbone — 2017
  2. Hiding greenhouse gas emissions in the cloud, David Mytton, Nature — July 2020
  3. Net Zero Initiative — A framework for collective carbon neutrality, Carbon 4 — April 2020
  4. Ecoconception Web : les 115 bonnes pratiques, Frédéric Bordage, GreenIT.fr — April 2019
  5. Déployer la sobriété numérique, The Shift Project — October 2020
  6. Assessing the suitability of the Greenhouse Gas Protocol for calculation of emissions from public cloud computing workloads, David Mytton, personal blog — August 9, 2020 — A state of the art of the available data for Cloud customers to calculate their emissions.
  7. An experiment-driven energy consumption model for virtual machine management systems, Mar Callau-Zori et al., INRIA — 2018 — A study illustrating how VM allocation on physical hosts can impact energy consumption.
  8. Energy Measurement and Modeling in High-Performance Computing with Intel’s RAPL, Nizam Khan et al. — 2018
  9. Supporting energy-awareness for cloud users, David Guyon, INRIA Myriads — 2018 — PhD Thesis covering the energy footprint of Cloud Computing systems.
  10. Electricity Intensity of Internet Data Transmission, Joshua Aslan et al., Center for Environmental Strategy, University of Surrey — 2018
  11. The energy and carbon footprint of the ICT and E&M sector in Sweden 1990–2015 and beyond, Jens Malmodin et al., Ericsson Research — 2016
  12. Towards The Systematic Reporting Of The Energy And Carbon Footprints Of Machine Learning, Henderson et al., Stanford — 2020
  13. A Comparative Study of Methods for Measurement of Energy of Computing, Fahad et al., School of Computer Science, UCD — June 2019
  14. RAPL in Action: Experiences in Using RAPL for Power Measurements, Nizam Khan et al. — 2018
