Understanding the Carbon Footprint of Economic Systems with pymrio: An EEIO Model-Based Approach
Environmentally Extended Input–Output (EEIO) models are a powerful macroeconomic approach to analyzing CO2 emissions trends.
Authors: Almudena Hellin & Jaime Oliver Huidobro (Data Science @ ClarityAI)
According to the Greenhouse Gas Protocol (GHG Protocol), a company’s CO2 emissions can be divided into three categories. Scope 1 covers the company’s direct CO2 emissions, Scope 2 covers indirect emissions derived from the company’s energy use, and Scope 3 covers the indirect emissions from the company’s value chain, from the distribution of raw materials to the recycling of the products it produces. As explained in this article by Clarity AI, indirect emissions can account for 90% of the total emissions for publicly traded companies. When analyzing a portfolio or global interactions, as we study in this article, we must consider how some of a company’s indirect emissions (Scope 3) may be other companies' direct emissions (Scope 1).
In this post, we will show how to leverage the pymrio package to use one of the most common macroeconomic approaches to model carbon emissions: Environmentally Extended Input–Output (EEIO) models.
Impact analysis at Clarity AI
At Clarity AI we use machine learning and artificial intelligence to bring societal impact to markets. We are continuously developing new ways to measure and optimize data, enabling more conscious and informed investment decisions based on people’s values toward impact. This way, we put environmental, social, and governance dimensions at the forefront of decision-making.
As part of the Data Science team, we leverage data from different sources to build a robust and scalable model. It all starts with a solid understanding of each industry’s operations and impacts at a global level, which is why this kind of analysis is so important and useful.
In particular, this article covers how we leverage EEIO models to understand global value chains and, specifically, a simple method for obtaining upstream indirect emissions from the available matrices.
The Environmentally Extended Input–Output models
The EEIO models describe the complex network of macroeconomic trade relationships between industries at a global level, and their environmental and social effects. The whole model is a potent tool for the comprehensive analysis of a wide variety of factors, such as employment, water use, or greenhouse gas pollution.
The model is composed of two inputs. The first is the Input-Output tables, which are based on the values of economic transactions between all the industries and countries participating in global supply chains. The second is the set of environmental stressors: factors of environmental strain per euro of monetary output.
This model is especially useful because it provides information on the direct impact of the industries and calculates the indirect contributions related to the supply chain. In this way, the EEIO tables have become an essential support element for analyzing global value chains in a sustainable economy and calculating their environmental impacts.
Extracting information
So, how can we implement one of these EEIO models? Pymrio (Stadler 2021) is an open-source Python library that works with different EEIO databases like EXIOBASE, OECD, and EORA. It includes download functions, visualization methods, and automatic checks to calculate any missing tables, which makes the analysis much easier.
Of the available databases, EXIOBASE stands out for its high level of detail at the sector level, the variety of indicators available, the wide temporal coverage, and ease of use. The economic tables are provided in million EUR, while the environmental and social metric tables are given in the unit of the metric per million EUR, e.g. kg CO2eq per million EUR.
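As a quick illustration of these units, an intensity expressed per million EUR turns into absolute emissions once it is multiplied by a monetary output in million EUR. The sketch below uses made-up numbers and a hypothetical helper name, `absolute_emissions_tonnes`; neither comes from EXIOBASE.

```python
def absolute_emissions_tonnes(intensity_kg_per_meur: float, output_meur: float) -> float:
    """Convert an intensity (kg CO2eq per million EUR) into tonnes of CO2eq."""
    emissions_kg = intensity_kg_per_meur * output_meur  # (kg / M EUR) * M EUR = kg
    return emissions_kg / 1000.0  # 1 tonne = 1000 kg


# Hypothetical industry: 250,000 kg CO2eq per million EUR and 40 million EUR of output
example_tonnes = absolute_emissions_tonnes(250_000, 40)
print(example_tonnes)
```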
To start working with EXIOBASE we only need two libraries:
import pymrio
import pandas as pd
When downloading the EEIO tables, we select the year or years we are interested in. As a note, the original EXIOBASE data series ends in 2011, and the subsequent tables up to 2022 were estimated based on a range of trade and macroeconomic data.
exio_meta = pymrio.download_exiobase3(storage_folder='data/', system="ixi", years=[2020])
Once the data is downloaded, it is ready to be used by parsing the zip archive stored in the download folder into an Input-Output object:
exio3 = pymrio.parse_exiobase3(path='data/IOT_2020_ixi.zip')
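Since the parser has to point at the archive that the download step wrote, it can help to build that path in one place. The following is only a sketch: the helper name `exiobase_archive_path` is hypothetical, and it assumes the `IOT_<year>_<system>.zip` file naming seen in the snippet above.

```python
from pathlib import Path


def exiobase_archive_path(storage_folder: str, year: int, system: str = "ixi") -> Path:
    """Build the path of an EXIOBASE 3 archive named IOT_<year>_<system>.zip."""
    if system not in ("ixi", "pxp"):
        raise ValueError(f"unknown system {system!r}, expected 'ixi' or 'pxp'")
    return Path(storage_folder) / f"IOT_{year}_{system}.zip"


# Same folder, year and system as in the download call
archive = exiobase_archive_path("data/", 2020)
print(archive.as_posix())
```

The resulting path can then be passed as the `path` argument of `pymrio.parse_exiobase3`, so that a change of folder or year only has to be made once.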
The database comes with pre-calculated tables containing economic transactions between the different sectors, namely: industry monetary output, production factors, and the extended matrices with the intensities of the environmental and social stressors. EXIOBASE provides 163 industries and 44 countries. The industry classification is EXIOBASE’s own, and the country names come in ISO2 format (2 letters per country). Pymrio makes it easy to manipulate the dataset. For instance, we can aggregate all the industries within the waste incineration sector into one and recalculate the tables with the Pymrio methods:
# Sub-sectors to merge into a single waste incineration sector
incineration_sectors = [
    'Incineration of waste: Food',
    'Incineration of waste: Paper',
    'Incineration of waste: Plastic',
    'Incineration of waste: Metals and Inert materials',
    'Incineration of waste: Textiles',
    'Incineration of waste: Wood',
    'Incineration of waste: Oil/Hazardous waste',
]
# Map every sub-sector to the same target name
sector_dict = {sector: 'Incineration of waste' for sector in incineration_sectors}
exio3.rename_sectors(sector_dict)
# Merge the sectors that now share a name
exio3.aggregate_duplicates()
# Recalculate the remaining tables from the aggregated data
exio3.calc_all()
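To make the idea behind this rename-then-aggregate step concrete, here is a minimal pandas sketch on a toy flow table. It is not pymrio’s implementation: the helper `merge_labels`, the three sector labels, and all numbers are made up for illustration. Note that only flow tables can be summed like this; coefficient tables have to be recalculated afterwards, which is the role of `calc_all()` above.

```python
import pandas as pd

# Toy transaction table in million EUR (made-up labels and values)
sectors = ["Incineration of waste: Food", "Incineration of waste: Paper", "Electricity"]
Z_toy = pd.DataFrame(
    [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]],
    index=sectors,
    columns=sectors,
)

# Map both incineration sub-sectors to one shared name
mapping = {s: "Incineration of waste" for s in sectors if s.startswith("Incineration")}


def merge_labels(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Rename row labels and sum the rows that end up sharing a label."""
    return df.rename(index=mapping).groupby(level=0, sort=False).sum()


# Merge the rows first, then the columns (via a transpose)
aggregated = merge_labels(merge_labels(Z_toy, mapping).T, mapping).T
print(aggregated)
```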
Also, EXIOBASE provides 417 emission categories, ranging from ozone layer depletion to carcinogenic effects on humans or methane emissions. In this post, we use the ‘Carbon dioxide (CO2) CO2EQ IPCC categories 1 to 4 and 6 to 7 (excl land use, land use change and forestry)’ indicator to study carbon emissions.
Mathematical interlude
Now, to correctly understand which tables we need to use, let’s have a bit of mathematical context.
At the heart of the IO tables, we find the transaction matrix, 𝑍, that describes the global inter-industry flows within and across countries. The blocks on the diagonal represent domestic transactions, while the off-diagonal blocks describe the trade from region A to region B for each industry.
It is complemented by the matrix that contains the final demand for products, 𝑌, where the diagonal blocks represent domestic demand and the off-diagonal blocks describe the exports from region A to region B for each industry.
Matrices 𝑍 and 𝑌 are used to describe the global economy, extract the vector with the total monetary output per country and industry, 𝑥, and, ultimately, the so-called Leontief matrix 𝐿, where each element represents the total amount of euros passing through industry 1 for every €1 of final demand for industry 2. The expression to calculate this matrix is:
$$L = (I - A)^{-1}$$
where 𝐼 is the identity matrix and 𝐴, called the direct requirements matrix, is calculated as:
$$A = Z \hat{x}^{-1}$$
Here $\hat{x}$ denotes the diagonal matrix built from the output vector 𝑥.
By combining the total monetary output per country and industry with the factor of production matrix, 𝐹, which represents the environmental and social factors per country and industry, we arrive at the matrix with the coefficients for the direct intensities of each industry, labeled as 𝑆:
$$S = F \hat{x}^{-1}$$
Finally, by multiplying the direct intensities matrix with the Leontief matrix, we arrive at the table with the extended coefficients 𝑀, i.e. the total intensity matrix (direct and indirect), calculated as:
$$M = S L$$
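We use these two matrices, 𝑆 and 𝑀, to perform our analysis. To calculate the indirect upstream emissions, we just have to use the matrices supplied by EXIOBASE: since 𝑀 holds the total (direct and indirect) intensities and 𝑆 only the direct ones, the indirect upstream intensities are their difference:
$$M_{\mathrm{upstream}} = M - S$$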
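The whole chain can be checked by hand on a tiny example. The NumPy sketch below builds a made-up two-industry economy and applies the standard input-output relations: output as the row sums of 𝑍 and 𝑌, the direct requirements matrix, the Leontief inverse, the direct and total intensities, and the upstream part taken as total minus direct. All numbers are illustrative and the variable names simply mirror the symbols used in the text.

```python
import numpy as np

# Toy economy with two industries; monetary values in million EUR (made up)
Z = np.array([[10.0, 20.0],
              [30.0, 40.0]])    # inter-industry transactions
Y = np.array([[70.0],
              [130.0]])         # final demand
F = np.array([[500.0, 200.0]])  # direct emissions per industry, kg CO2eq

x = Z.sum(axis=1) + Y.sum(axis=1)      # total output per industry
A = Z / x                              # direct requirements: column j divided by x[j]
L = np.linalg.inv(np.eye(len(x)) - A)  # Leontief matrix
S = F / x                              # direct intensities, kg CO2eq per million EUR
M = S @ L                              # total intensities (direct + indirect)
upstream = M - S                       # indirect upstream intensities

print("direct  :", S.round(3))
print("total   :", M.round(3))
print("upstream:", upstream.round(3))
```

Two identities are useful sanity checks here: multiplying 𝐿 by the final demand should give back the total output 𝑥, and multiplying 𝑀 by the final demand should give back the total emissions in 𝐹.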
Results
We can extract some interesting insights from the transaction matrix. For example, the following graph shows the raw materials and transport industries that contribute the most to the production of energy in the EU27. We left out business services (like HR support) and machinery to focus only on raw materials and transport needs. One of the first things we notice is that the energy sector is still largely dominated by non-renewable sources.
The non-renewable energy industry buys coal and gas from both European and non-European sources. Interestingly, energy production in Europe mostly uses natural gas produced in Europe, and the low contribution of non-EU natural gas to energy production suggests that it is consumed directly by consumers and other industries.
Also, as a curious note, it seems that all energy production industries need to buy natural gas to some degree. For nuclear energy, the money spent on uranium seems to be minor in comparison with gas purchases, as it didn’t make the cut for the graph.
Continuing, from the matrices of intensities we extract the following figures, representing the median direct intensity:
For direct emissions, we observe the expected patterns. The highest-emitting industries are non-renewable energy producers, extraction industries, and transportation. At the other end we find retail services, electricity trade, and nuclear energy production.
In the next plot, we represent the ratio of indirect to direct intensities. The higher the ratio, the smaller the direct contribution relative to the indirect sources.
For indirect emissions, we observe other interesting results. Energy trading, of course, has a large indirect impact, since traders do not emit themselves but buy a lot of polluting energy. Other industries with a high indirect impact are those that use a lot of energy in their operations, such as the manufacture of machinery, electronics, and textile products. We also see that the production of nuclear energy has a high impact ratio, probably due to its low direct emissions.
One industry that is always the center of attention is the transportation sector, because it is a big contributor to Scope 3 emissions throughout a company’s value chain due to the amount of greenhouse gas emissions that it produces. We can see this behavior in the graphs. The ratios of indirect to direct emissions for water and air transport fall on the low side of the graph, probably as a result of their large direct emissions.
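The two quantities discussed above, the median direct intensity and the indirect-to-direct ratio per industry, could be computed along these lines. This is a sketch on made-up numbers, not the code behind the figures: it assumes that one indicator row has already been selected from the direct and total intensity tables, and that the columns carry `region` and `sector` levels.

```python
import pandas as pd

# One value per (region, sector) pair, as in a single row of S or M
columns = pd.MultiIndex.from_product(
    [["ES", "FR", "DE"], ["Air transport", "Retail trade"]],
    names=["region", "sector"],
)

# Made-up intensities in kg CO2eq per million EUR
S_co2 = pd.Series([800.0, 10.0, 900.0, 20.0, 1000.0, 30.0], index=columns)     # direct
M_co2 = pd.Series([1000.0, 60.0, 1080.0, 100.0, 1300.0, 120.0], index=columns)  # total

# Median direct intensity of each industry across countries
median_direct = S_co2.groupby(level="sector").median()

# Indirect (total minus direct) over direct, then the median per industry
ratio = (M_co2 - S_co2) / S_co2
median_ratio = ratio.groupby(level="sector").median()

print(median_direct)
print(median_ratio)
```

In this toy data the transport sector has a high direct intensity and a low ratio, while retail shows the opposite pattern, which mirrors the qualitative behavior described in the text.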
Conclusion
In summary, this post showed how EEIO tables can be used to analyze social and environmental indicators arising from both the direct and indirect activities of an industry. We also explained how the pymrio tool lets us work with these tables with little effort.
Given the macroeconomic nature of the EEIO tables, it is a model that allows one to analyze and gain a very good understanding of supply-chain-related environmental pressures due to production and consumption activities. With some additional considerations, it can be used to evaluate the carbon footprint of a given company.
As data scientists, we need a deep knowledge of the data we are handling, gained while exploring the data and well before we even start to build the machine learning model. It is an essential part of our work, not only to be able to discern the quality and correctness of the data, but also to be able to evaluate and explain the results of the model.