Validating near real-time electricity emissions rates with an hourly historical benchmark

Gailin Pease
Singularity
Sep 20, 2022

Electricity generation is responsible for one-quarter of carbon emissions in the U.S., so many companies, cities, and policymakers who care about climate change have focused their efforts on reducing emissions from their electricity use. Real-time data about the emissions intensity of grid electricity, such as that available through Singularity’s Carbonara platform, is one important tool that helps consumers understand their emissions and make timely decisions about their electricity use to decrease emissions.

In one use case, Singularity and Sense collaborated to demonstrate that carbon-aware decision making can reduce the carbon impact of electric vehicle charging by up to 43%. Accurate, real-time emissions data also enables more timely accounting of scope 2 emissions, which will be increasingly important for enforcement of emissions regulations and reporting of corporate climate risk. Realizing these benefits depends on the accuracy of real-time emission rate estimates, but how confident should we be in their quality?

Singularity Energy has performed what we believe is the first validation of near real-time hourly emissions data against a historical benchmark. This validation was made possible by Singularity’s recently released Open Grid Emissions Initiative (OGE), an open-source dataset of historical, validated grid emissions that we believe represents the most accurate, comprehensive, and high-resolution grid emissions dataset for the U.S. available to date. Our analysis revealed that Singularity’s near real-time emissions data is accurate for the vast majority of electricity consumers in the United States. In the regions where we found issues with near real-time data, the comparison has already informed directions for future work on improving near real-time data in those regions.

Figure 1: In the New York ISO (NYISO), real-time emission rates agree in trend and magnitude with the benchmark provided by OGE. However, there is a larger difference in magnitude of emission rates in the summer (right panel) than in the winter and spring (left panel), indicating that changes in emission rates throughout the year may not be fully captured by real-time data.

Why might real-time estimates differ from benchmark data?

To understand why real-time grid data may differ from benchmark data, it’s useful to understand where real-time emission intensities come from. Singularity multiplies near real-time net generation data for each regional generation fleet (natural gas, coal, wind, etc.) by an annual-average, fleet-specific generated emission factor from the most recent year of the EPA’s eGRID database. The resulting emissions intensities are available as soon as generation data is published (hence our use of “near real-time” to refer to the data compared here, though the same insights apply to true real-time data calculated using the same methodologies). In contrast, the Open Grid Emissions benchmark emissions data is drawn mostly from measured hourly plant-level emissions, supplemented by reported fuel consumption data.
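The calculation described above can be sketched in a few lines. This is only an illustration: the fuel names and emission factor values are placeholders chosen for the example, not actual eGRID figures, and `emission_rate` is a hypothetical helper rather than part of Singularity’s API.

```python
# Hypothetical annual-average fleet emission factors in lb CO2/MWh.
# These are illustrative placeholders, NOT real eGRID values.
EMISSION_FACTORS = {"natural_gas": 900.0, "coal": 2200.0, "wind": 0.0, "hydro": 0.0}


def emission_rate(generation_mwh, factors=EMISSION_FACTORS):
    """Return the grid emission rate (lb CO2/MWh) for one hour.

    generation_mwh maps each fuel type to its net generation in that hour.
    Emissions are generation times the fleet's annual-average factor; the
    rate is total emissions divided by total generation.
    """
    total_generation = sum(generation_mwh.values())
    if total_generation <= 0:
        return None  # no generation reported, so the rate is undefined
    total_emissions = sum(
        mwh * factors[fuel] for fuel, mwh in generation_mwh.items()
    )
    return total_emissions / total_generation


# One hypothetical hour of reported generation by fuel type (MWh).
hour = {"natural_gas": 500.0, "coal": 100.0, "wind": 200.0, "hydro": 200.0}
rate = emission_rate(hour)
print(rate)  # 670.0
```

Because each factor is a single annual value, the only thing that changes the estimated rate from hour to hour is the fuel mix, which is the root of several of the limitations discussed next.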

There are several reasons why near real-time emission rate estimates may be inconsistent with benchmark data:

  1. The near real-time net generation data reported by grid operators may sometimes be incomplete, with grid operators missing or mis-reporting some fraction of generation on their grid.
  2. The broad fuel categories used to describe real-time generation data may include several energy sources with different emission rates. For example, biomass, geothermal, and waste would all be grouped into the “other” category in EIA’s Hourly Electric Grid Monitor, one of the main sources of real-time generation data.
  3. The emission factor we use to convert generation to emission rates is specific to the generators of each fuel type in each region, but is an annual-average value. Using a constant value doesn’t allow us to capture seasonal or daily variation in fleet emission intensity.
  4. We source the emission factors from eGRID, which is released with a 1–2 year lag. This means that the emission factor will not reflect recent changes in each fleet, for example, if older, dirtier plants of a fuel type have been retired.

Real-time data is both vital for decision making and inherently limited by the data available in real time. Without comparing to a validated benchmark dataset, we can only guess how much the limitations of real-time data affect the quality of that data. Without an accurate picture of data quality, decision makers may have limited confidence in real-time data, which results in less effective decisions. We hope that this validation increases confidence in real-time emissions reductions and informs improvements of real-time data.

How did we validate real-time estimates?

Accurate real-time emissions data should reflect both the hourly profile of emissions and the total magnitude of emissions. For applications like load shifting and energy storage, an accurate hourly shape is the most important feature of real-time data, since it represents which hours are relatively cleaner or dirtier than others. For scope 2 carbon accounting, an accurate emission magnitude is the most important feature, so that grid emissions can be correctly allocated.

To determine if a real-time dataset meets these criteria, we need an hourly picture of true grid operations, made freely available for the first time by Singularity Energy’s Open Grid Emissions Initiative. The first release of this dataset covers 2019 and 2020, and builds on decades of work in understanding the electric grid, including the EPA’s work on the eGRID emissions database and Catalyst Cooperative’s work to make U.S. energy data uniform and accessible via PUDL. To learn more, and to hear about use cases outside of validation, check out our announcement blog post.

Figure 1 compares near real-time and benchmark data for 2020 in NYISO, the grid operator serving New York state. The shapes of the two profiles match well throughout the year, but the magnitude is different, with larger magnitude errors in near real-time data visible throughout the summer. To capture these differences, we use several metrics to evaluate how well near real-time estimates match OGE benchmark data in magnitude and shape.

To evaluate how accurate the hourly magnitude of the data is, we calculate median absolute error and median absolute percentage error, which reflect how far each hourly estimate is from the benchmark value. To evaluate how closely the shape of real-time data matches the benchmark, we calculate the correlation of the two time series across the entire year. A correlation coefficient of 1 indicates that the shapes match exactly, and a coefficient of -1 means that the shapes are opposite (e.g. near real-time data shows that emissions are relatively high when they are actually relatively low). A coefficient of zero means that there is no relationship between the two shapes. Ideal real-time data would have median absolute and percentage error close to zero and correlation close to 1.
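As a rough sketch of these three metrics, the snippet below computes them for two short series using only the Python standard library. It assumes the correlation is a Pearson correlation coefficient (the text above does not name a specific coefficient), and the four hourly values are invented for illustration.

```python
from math import sqrt
from statistics import mean, median


def median_absolute_error(estimate, benchmark):
    """Median of |estimate - benchmark| across hours (same units as the inputs)."""
    return median(abs(e - b) for e, b in zip(estimate, benchmark))


def median_absolute_percentage_error(estimate, benchmark):
    """Median of |estimate - benchmark| / |benchmark|, in percent.

    Hours with a zero benchmark are skipped because the ratio is undefined.
    """
    return 100 * median(
        abs(e - b) / abs(b) for e, b in zip(estimate, benchmark) if b != 0
    )


def pearson_correlation(estimate, benchmark):
    """Pearson correlation: 1 = same shape, -1 = opposite shape, 0 = unrelated."""
    mean_e, mean_b = mean(estimate), mean(benchmark)
    cov = sum((e - mean_e) * (b - mean_b) for e, b in zip(estimate, benchmark))
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    var_b = sum((b - mean_b) ** 2 for b in benchmark)
    return cov / sqrt(var_e * var_b)


# Invented hourly emission rates (lb CO2/MWh) for illustration only.
benchmark = [600.0, 650.0, 700.0, 750.0]
near_real_time = [570.0, 620.0, 665.0, 720.0]

mae = median_absolute_error(near_real_time, benchmark)
mape = median_absolute_percentage_error(near_real_time, benchmark)
corr = pearson_correlation(near_real_time, benchmark)
print(mae, round(mape, 2), round(corr, 4))
```

In this toy example the estimate is consistently a little low (an error in magnitude) while tracking the benchmark’s rises closely (a correlation near 1), which is the same pattern Figure 1 shows for NYISO.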

For this comparison, we used near real-time emission rates calculated from EIA-930 generation data using Singularity’s methodology. We use EIA-930 data because it is the only public near real-time data source available across the entire continental U.S. In some regions, more accurate and/or more timely real-time data may be available directly from grid operators; however, the same methodological limitations will affect real-time data regardless of source. We evaluate near real-time data after outlier detection via a rolling filter, since some regions report extreme values during periods of missing data that have an outsized effect on correlation.
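The exact rolling filter is not described here, so the following is one plausible sketch rather than the method actually used: it flags any hour that sits far from the median of a centered window, measured in units of the window’s median absolute deviation. The function name, window length, threshold, and `min_scale` floor are all assumptions made for this example.

```python
from statistics import median


def rolling_outlier_flags(values, window=5, threshold=3.0, min_scale=1.0):
    """Flag values far from the median of a centered rolling window.

    A point is flagged when its distance from the window median exceeds
    `threshold` times the window's median absolute deviation (MAD).
    `min_scale` is a floor on the MAD so that a perfectly flat window
    does not flag tiny deviations. All parameters are illustrative.
    """
    half = window // 2
    flags = []
    for i, value in enumerate(values):
        neighborhood = values[max(0, i - half): i + half + 1]
        center = median(neighborhood)
        mad = median(abs(x - center) for x in neighborhood)
        scale = max(mad, min_scale)
        flags.append(abs(value - center) > threshold * scale)
    return flags


# Invented hourly rates (lb CO2/MWh) with one extreme value of the kind
# that can appear during periods of missing data.
rates = [600.0, 610.0, 605.0, 5000.0, 615.0, 620.0, 612.0]
flags = rolling_outlier_flags(rates)
cleaned = [r for r, is_outlier in zip(rates, flags) if not is_outlier]
print(flags)
```

A median-based filter is used in this sketch because a single extreme value barely moves the window median, whereas it would drag a rolling mean toward itself and could mask the outlier.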

Figure 2: Two of the metrics we use to evaluate the accuracy of real-time data plotted against BA size. Real-time data in larger BAs generally has better accuracy than in smaller BAs, both in magnitude (difference) and shape (correlation). Ideal real-time data would have correlation of 1 and difference 0 when compared to benchmark data (vertical lines).

Results of our validation

When we compared Singularity’s real-time grid emissions data to benchmark OGE data, we found that the near real-time data agreed with the benchmark in most hours and regions. Emissions intensities match well in larger balancing authorities (BAs), including California ISO (CISO), the Midcontinent ISO (MISO), and ISO New England (ISNE). Figure 2 demonstrates this trend, with larger BAs having consistently high correlation and low differences between near real-time and benchmark data. This means that most electricity consumers can feel confident when making decisions using real-time grid data.

The absolute average difference in emission rate across all U.S. BAs (weighted by BA generation) is 39.5 lb CO2/MWh, only 6.8% of the average benchmark emission rate. The average difference is negative, meaning that on average, near real-time estimates are slightly cleaner than benchmark data, although this varies by BA (see Table 1). The average correlation across BAs (weighted by BA generation) is 0.88, indicating that the shape of real-time data matches well with benchmark data. Many large BAs actually perform better than these averages (Table 1). When making decisions based on real-time data, customers should consider real-time data quality in their specific region.
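To make the generation weighting concrete, here is a minimal sketch of a weighted average over BAs. The three BAs, their correlations, and their generation totals are invented for the example and do not come from Table 1 or Table 2.

```python
def weighted_mean(values, weights):
    """Average of `values` in which each entry counts in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical per-BA correlations and annual generation (arbitrary units).
# One large BA with good data, one mid-sized BA, one small BA with poor data.
correlations = [0.95, 0.90, 0.50]
generation = [800.0, 150.0, 50.0]

weighted = weighted_mean(correlations, generation)
unweighted = sum(correlations) / len(correlations)
print(round(weighted, 3), round(unweighted, 3))
```

Weighting by generation means the summary reflects the data quality experienced by a typical unit of electricity rather than by a typical BA, so a small BA with poor data pulls the weighted figure down far less than it pulls the simple average.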

Two large BAs, PJM and BPAT, have relatively worse metrics, indicating that they are further in shape (PJM) and magnitude (BPAT) from benchmark data. In the case of PJM, a large grid operator in the mid-Atlantic, the relatively low correlation is due to low-quality near real-time data in January 2020. Excluding January 2020, the correlation between real-time and benchmark data in PJM is 0.94. In BPAT, a grid operator in the Northwest with primarily hydro generation, near real-time emission rates are consistently underestimated because of a single coal power plant. Although this plant supplies power throughout the year, especially during fall and winter, BPAT reports no coal generation in near real-time data, artificially lowering real-time emission rate estimates.

Table 1: Data quality metrics comparing Singularity’s real-time and benchmark OGE data for the 12 largest BAs by annual generation (find data for all BAs at the bottom of this post). Negative rate differences indicate that real-time data underestimates emissions, while positive differences indicate that real-time overestimates emissions. Ideal real-time data would have a rate difference of zero and a correlation of 1. All comparisons are made using generated emission rates over 2020.

In small balancing authorities, the picture is more mixed. Smaller BAs have more frequent and longer reporting gaps in near real-time data, perhaps because they have fewer resources to dedicate to reporting. In addition, there are inherent limitations to applying real-time methodologies in a small region. The same sources of noise present in all real-time data cause larger problems in smaller BAs. Temporal and inter-plant variation in emission rates may be more important in small BAs, where only a few plants are operating in each hour. In large BAs, generation from a fuel type represents generation over dozens of plants in each hour, so the emission rate over those plants will remain close to the average emission rate for that fuel type. In a BA with only a handful of plants of each fuel type, variation in the emission rates of each plant has a larger impact on the average emission rate.

Future directions

Singularity continually seeks to improve our real-time data, and we are excited about the opportunities Open Grid Emissions provides to further that goal. In order to improve the quality of the near real-time net generation data that we use for this calculation, we’re working with the EIA to improve the data balancing authorities submit to EIA’s Hourly Electric Grid Monitor. Our work on Open Grid Emissions has revealed some cases where balancing authorities submit data with the wrong time stamps, or submit transmission data incompatible with their generation and demand data. By working directly with the EIA to fix these data issues, we ensure that all users will have the best possible near real-time data, enabling us to work together towards grid decarbonization.

This validation also revealed opportunities to improve the fleet emissions factors that we are using to convert net generation to emissions. For example, for some fleets, there is strong seasonal or hourly variation in the carbon intensity of generation, suggesting that we could potentially improve the accuracy of our real-time estimates by using month-specific or monthly time-of-day specific fleet emission factors from OGE. Figure 3 compares the benchmark hourly natural gas emission rate to eGRID’s annual natural gas emission rate, used to calculate real-time emission rates. There is significant variation in the benchmark natural gas emission rate, with consistently higher rates than eGRID during the summer, which may explain some of the discrepancy between real-time and benchmark data shown in Figure 1.
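One way month-specific fleet emission factors could be derived from hourly benchmark data is sketched below. The record layout and all numbers are assumptions made for illustration; they are not taken from OGE.

```python
from collections import defaultdict


def monthly_emission_factors(records):
    """Compute a fleet emission factor (lb CO2/MWh) for each month.

    `records` is an iterable of (month, generation_mwh, emissions_lb)
    tuples for a single fleet, one tuple per hour. Each month's factor is
    total emissions divided by total generation in that month.
    """
    generation = defaultdict(float)
    emissions = defaultdict(float)
    for month, gen_mwh, emis_lb in records:
        generation[month] += gen_mwh
        emissions[month] += emis_lb
    return {
        month: emissions[month] / generation[month]
        for month in generation
        if generation[month] > 0
    }


# Invented hourly records for a hypothetical natural gas fleet.
records = [
    (1, 100.0, 85_000.0),
    (1, 100.0, 87_000.0),
    (7, 100.0, 95_000.0),
    (7, 200.0, 196_000.0),
]

factors = monthly_emission_factors(records)
annual_factor = sum(e for _, _, e in records) / sum(g for _, g, _ in records)
print(factors, annual_factor)
```

In this made-up data the single annual factor sits between the winter and summer values, so applying it year-round would overestimate January emissions and underestimate July emissions, the same direction of summer discrepancy described for NYISO above.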

Figure 3: Hourly vs. annual natural gas emission rates in NYISO, the New York ISO. Hourly emission rates have significant variability around eGRID’s annual average emission rate, used to calculate real-time emission rates. Large peaks (e.g., May 21, 2020) are likely due to plant start-up.

Furthermore, because the OGE dataset uses higher-resolution input data than eGRID, we’ve found that even on an annual basis, the OGE fleet emissions factors may be more accurate than those reflected in eGRID. Moving forward, we may switch to using OGE data as the source of these fleet emission factors.

In the coming months, Singularity will be working to incorporate these insights into our real-time and near real-time data offerings. As the Open Grid Emissions initiative continues, we will continue to look for opportunities to improve our data, helping our customers work towards their climate goals.

Appendix: full comparison of results across all balancing authorities

Table 2: Results across all balancing authorities.

Table notes:

1. HGMA and HST are not included because they have zero 2020 generation in OGE data.

2. JEA, EEI, and NSB are not included because their real-time emission rates are not available from Singularity’s API.

3. CHPD, GCPD, DOPD, WAUW, SEPA, YAD, WWA, GWA, CPLW, and HGMA are hydro-only balancing authorities with 0 lb/MWh CO2 emission rates in OGE. In some cases, real-time emission rates are higher because the BA reports non-hydro generation to EIA-930. Because the emission rate in these BAs is zero, we cannot calculate a correlation or percentage difference.
