During the past few weeks, in conjunction with the AWS Public Dataset Program, we have been working to bring reanalysis data to AWS. Today we are excited to announce that an initial subset of ECMWF ERA5 data is now available in Amazon S3.
Introduction to ERA5
ERA5 is the fifth generation of ECMWF atmospheric reanalyses of the global climate, and the first reanalysis produced as an operational service. Reanalysis products are important because they assimilate vast amounts of observational data with physical models to create a detailed and consistent overview of weather and climate conditions. This is particularly important for areas where observational data is either sparse or non-existent, such as the oceans or polar regions.
ERA5 utilizes the best available observation data from satellites and in-situ stations, which are assimilated and processed using ECMWF’s Integrated Forecast System (IFS) Cycle 41r2. ERA5 provides improved accuracy and better spatiotemporal resolution than its widely used predecessor ERA-Interim, which utilizes a 2006 release of IFS (Cy31r2).
The ERA5 dataset contains one (31 km) high resolution realization (HRES) with hourly temporal resolution, and a reduced resolution (62 km) ten member ensemble (EDA) with 3-hourly temporal resolution. A set of basic statistical parameters (min, max, mean, standard deviation), accumulations, and an uncertainty estimate from the ensemble are also produced.
Vertically, the atmospheric data span 137 hybrid sigma/pressure (model) levels and are also interpolated to 37 pressure levels, 16 potential temperature levels, and 1 potential vorticity level. Over 240 parameters are produced for the surface and single levels alone, including parameters unavailable in ERA-Interim such as 100-meter wind components.
The dataset is being released incrementally and is currently available from 2008 through near-present day (roughly 3 months after real-time). In 2019 the full extent of the data will become available, spanning from 1950 through present day. In total, the entire ERA5 dataset is roughly 9 petabytes of data.
Initial Studies Highlight Notable Improvements
Given its global spatial coverage, higher resolution, and improved data assimilation and processing, ERA5 has been a widely anticipated reanalysis dataset. Studies investigating the initial tranche of ERA5 data have shown favorable results, particularly when its performance is compared to that of other reanalysis products, and in regions where observed data is unavailable.
In his article ERA5: The new champion of wind power modelling?, Jon Olauson compares ERA5 to MERRA-2 in terms of modeling wind power both for countries and for 1,051 individual Swedish wind turbines. He concludes “that ERA5 performs better in all analyzed aspects. As an average for the five studied countries, the mean absolute errors in modeled hourly generation (when compared to measurements) were 24% lower for ERA5.”
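For readers unfamiliar with the metric, the following minimal sketch shows how a mean absolute error comparison of this kind can be computed. The generation values are made up purely for illustration and are not taken from the study, and the function names are our own.

```python
def mean_absolute_error(modeled, measured):
    """Average of the absolute differences between modeled and measured values."""
    if len(modeled) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - o) for m, o in zip(modeled, measured)) / len(measured)


def relative_reduction(mae_candidate, mae_reference):
    """Fraction by which the candidate MAE is lower than the reference MAE."""
    return (mae_reference - mae_candidate) / mae_reference


# Hypothetical hourly generation values (arbitrary units), for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
model_a = [11.0, 19.0, 32.0, 40.0]  # stand-in for one reanalysis-driven model
model_b = [12.0, 18.0, 33.0, 39.0]  # stand-in for another

mae_a = mean_absolute_error(model_a, measured)
mae_b = mean_absolute_error(model_b, measured)
print(f"MAE A: {mae_a}, MAE B: {mae_b}")
print(f"A is {relative_reduction(mae_a, mae_b):.0%} lower than B")
```

A result of "24% lower" in this sense means the first model's average hourly error is about three quarters of the second model's.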
In Evaluation of global horizontal irradiance estimates from ERA5 and COSMO-REA6 reanalyses using ground and satellite-based data, which investigates the gap between satellite-based data and reanalysis data, Urraca et al. conclude that “the quality of previous reanalyses such as ERA-Interim or MERRA-2 was quite limited and their use was generally not recommended. The new ERA5 is a substantial quality leap in the estimation of surface irradiance with reanalysis models, but still satellite-derived data should be preferred when available… ERA5 can be a valid alternative for regions not covered by geostationary satellites such as the polar regions, as well as to fill gaps in time series.”
Although these studies are based on a limited subset of the final ERA5 dataset, the improvements over ERA-Interim and MERRA-2 are notable. As more ERA5 data is published over the next year or so, it will be interesting to see what additional research reveals.
ERA5 Data on S3
ERA5 data are officially archived and distributed in two locations: ECMWF’s Meteorological Archival and Retrieval System (MARS) and the Copernicus Climate Data Store (CDS). The ERA5 data on S3 have been acquired from the ECMWF MARS system in GRIB format and transcoded into NetCDF 4 files, each containing one month of data for one variable.
The bucket currently contains a curated set of 18 widely used surface and single level parameters from the ERA5 HRES sub-daily forecast stream. This includes 10-meter and 100-meter wind components, 2-meter temperature, total precipitation, significant wave height, and downwards solar radiation. Additional details on the available variables and data structure are available in the online documentation.
We’ve published a sample Jupyter notebook on GitHub to help you get started using ERA5 data on S3. The notebook examines how the data is organized, shows how to download files in NetCDF format, and finally performs some basic analysis on the data to investigate temperature in various locations.
An additional Jupyter notebook that uses ERA5 data to analyze potential wind energy production is also available on GitHub. This example uses the Planet OS Datahub API to acquire a subset of the ERA5 data, but it can easily be adapted to use S3 instead.
You can also download files directly from S3 via HTTP using the bucket name and the full key name for the object. For example, the URL to download air temperature at 2 meters for January 2008 is: http://era5-pds.s3.amazonaws.com/2008/01/data/air_temperature_at_2_metres.nc4.
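As a rough illustration, the Python sketch below builds such a URL and downloads the file using only the standard library. The year/month/data/variable key layout is inferred from the single example above, so treat it as an assumption and check the online documentation for the authoritative structure. The function names are hypothetical.

```python
from urllib.request import urlretrieve

# Bucket endpoint taken from the example URL above.
BUCKET_URL = "http://era5-pds.s3.amazonaws.com"


def era5_url(year, month, variable):
    """Build the download URL for one variable and one month.

    Assumes keys follow <YYYY>/<MM>/data/<variable>.nc4, as in the
    January 2008 example; this pattern is inferred, not documented here.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return f"{BUCKET_URL}/{year:04d}/{month:02d}/data/{variable}.nc4"


def download_era5(year, month, variable, dest=None):
    """Download one monthly file and return the local path."""
    url = era5_url(year, month, variable)
    dest = dest or f"{variable}_{year:04d}_{month:02d}.nc4"
    path, _headers = urlretrieve(url, dest)
    return path


if __name__ == "__main__":
    # Prints the URL only; call download_era5(...) to fetch the file.
    print(era5_url(2008, 1, "air_temperature_at_2_metres"))
```

Each file holds a full month of global hourly data, so it is worth checking available disk space and bandwidth before looping over many months.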
If you’re not familiar with the NetCDF file format, we recommend reviewing Unidata’s Introduction to NetCDF. Unidata also provides a comprehensive list of software tools for manipulating and displaying NetCDF data. For quick visualization of NetCDF data, we recommend NASA’s Panoply data viewer.
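Before opening a downloaded file in one of these tools, a quick sanity check can confirm that the download is really a NetCDF file rather than, say, an error page. The sketch below is one way to do that with the standard library. It relies on NetCDF-4 files being HDF5 containers that normally begin with the 8-byte HDF5 signature, while classic NetCDF files begin with "CDF" followed by a version byte. The function name is hypothetical, and the check only inspects the start of the file.

```python
# The 8-byte signature that opens an HDF5 (and therefore NetCDF-4) file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Version bytes used by the classic NetCDF formats (CDF-1, CDF-2, CDF-5).
CLASSIC_VERSIONS = (b"\x01", b"\x02", b"\x05")


def netcdf_flavor(path):
    """Guess the NetCDF flavor of a file from its first bytes.

    Returns "netcdf4-hdf5", "netcdf-classic", or "unknown". Only offset 0
    is checked, so an HDF5 file with a user block would be reported as
    "unknown".
    """
    with open(path, "rb") as f:
        head = f.read(8)
    if head == HDF5_SIGNATURE:
        return "netcdf4-hdf5"
    if head[:3] == b"CDF" and head[3:4] in CLASSIC_VERSIONS:
        return "netcdf-classic"
    return "unknown"
```

For the files in the ERA5 bucket, which are described above as NetCDF 4, the expected result would be "netcdf4-hdf5".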
If you’re looking to access specific subsets of the ERA5 data, it is also available on the Planet OS Datahub, which provides additional query options and response formats. This interface may be more suitable for those who perform coordinate-based analyses or who are more familiar with ASCII data formats such as CSV and JSON.
We will be continuously updating the ERA5 S3 bucket as new data is released and plan to support the full temporal extent once available.
As always, feedback on the data is welcome. If there are additional variables that you’d like to see made available on S3, please reach out to email@example.com and let us know!