Changing the way we look at Earth data with xESMF:

How our different ways of looking at Earth data became a mathematical and software problem

Raphael Dussin
pangeo
Oct 26, 2020

--

A representation of ocean model grids: a global tripolar grid and the US East Coast in a Lambert conformal conic (LCC) projection

You cannot compare apples to oranges. That’s one of the problems Earth scientists face on a regular basis. There are multiple ways to describe a finite set of geographical locations and the areas they encompass. Because Earth is a sphere (sorry not sorry, flat earthers), one needs to choose a coordinate reference system (CRS) to transform three-dimensional data into the two-dimensional data usually referred to as a map. There are about a dozen commonly used map projections: the one most people are probably familiar with is the Mercator projection, which is used in a lot of classroom world maps. But other projections can be better suited to specialized fields of research. Polar scientists, for example, will prefer polar stereographic projections, which give them a map centered on their area of interest.

Once a CRS is chosen, one must decide on the spatial resolution (the distance between two neighboring points) as well as how to represent the areas delimited by these points. The most common representation is called a grid, where each cell is a quadrilateral delimited by 4 points. More complex representations using triangles or hexagons (think chicken wire) are referred to as meshes. The choice of representation is usually dictated by the numerics used to solve the set of equations of interest (e.g. an ocean or atmosphere general circulation model). For example, ocean models would use a displaced-pole or tripolar grid to avoid the North Pole singularity (cell areas converge to zero), and atmosphere models would use a cubed-sphere grid or a spectral representation.
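To make the North Pole singularity concrete, here is a minimal sketch that computes the area of 1-degree cells on a regular longitude/latitude grid, from the equator to the pole. It assumes a perfectly spherical Earth with a mean radius of 6371 km; the function and variable names are made up for this illustration and do not come from any model.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for illustration


def latlon_cell_area(lat_south, lat_north, dlon_deg, radius=EARTH_RADIUS_KM):
    """Area (km^2) of a cell bounded by two parallels and two meridians.

    On a sphere: area = R^2 * dlon * (sin(lat_north) - sin(lat_south)).
    """
    dlon = np.deg2rad(dlon_deg)
    band = np.sin(np.deg2rad(lat_north)) - np.sin(np.deg2rad(lat_south))
    return radius**2 * dlon * band


# 1-degree cells stacked along one meridian, from the equator to the North Pole
lat_edges = np.linspace(0.0, 90.0, 91)
areas = latlon_cell_area(lat_edges[:-1], lat_edges[1:], dlon_deg=1.0)

print(f"cell touching the equator: {areas[0]:10.1f} km^2")
print(f"cell touching the pole:    {areas[-1]:10.1f} km^2")
print(f"ratio pole / equator:      {areas[-1] / areas[0]:10.4f}")
```

The cells keep the same size in degrees but shrink rapidly in physical area as they approach the pole, which is the behavior displaced-pole and tripolar grids are designed to avoid.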

Because of the diversity of approaches used in models and observations, scientists often have to go back and forth between different representations. This process is called regridding or remapping. It is the mathematical operation that transforms an observed or modeled quantity (e.g. air or ocean temperature) from one set of geographical points onto a different set. This is a rather computationally expensive operation, since it has to compute how much each point of the source set contributes to any given point of the destination set. Because these contributions come only from neighboring source points, the result is a sparse matrix of weights, which is then used to remap quantities. There are several methods for evaluating the contribution (i.e. weight) of the source points, each with its own strengths and weaknesses. The most common algorithms are:

  • Nearest Neighbor: find which source point is the closest and set its weight to 1
  • Bilinear: find the 4 closest source points and set their relative weights according to their distance to the destination point
  • Conservative: adjust weights so that the area integral of the property is conserved
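The three algorithms above can be sketched in one dimension, where the weights fit in a small matrix and remapping is a single matrix product. This is a simplified illustration, not how ESMF computes its weights: it uses dense NumPy arrays instead of a sparse matrix, linear interpolation between 2 points stands in for bilinear interpolation between 4, and it assumes sorted coordinates with destination points inside the source range. All names are hypothetical.

```python
import numpy as np


def nearest_weights(x_src, x_dst):
    """One weight of 1 per destination point, placed on the closest source point."""
    w = np.zeros((x_dst.size, x_src.size))
    closest = np.abs(x_dst[:, None] - x_src[None, :]).argmin(axis=1)
    w[np.arange(x_dst.size), closest] = 1.0
    return w


def linear_weights(x_src, x_dst):
    """Two distance-based weights per destination point (1-D analogue of bilinear)."""
    w = np.zeros((x_dst.size, x_src.size))
    # Index of the source point just to the right of each destination point
    right = np.clip(np.searchsorted(x_src, x_dst), 1, x_src.size - 1)
    left = right - 1
    frac = (x_dst - x_src[left]) / (x_src[right] - x_src[left])
    rows = np.arange(x_dst.size)
    w[rows, left] = 1.0 - frac
    w[rows, right] = frac
    return w


def conservative_weights(edges_src, edges_dst):
    """Weight = length of the source/destination cell overlap / destination cell length."""
    lo = np.maximum(edges_dst[:-1, None], edges_src[None, :-1])
    hi = np.minimum(edges_dst[1:, None], edges_src[None, 1:])
    overlap = np.clip(hi - lo, 0.0, None)
    return overlap / np.diff(edges_dst)[:, None]


# Source: 12 cells of width 1. Destination: 3 uneven cells over the same interval.
edges_src = np.linspace(0.0, 12.0, 13)
edges_dst = np.array([0.0, 2.5, 6.0, 12.0])
x_src = 0.5 * (edges_src[:-1] + edges_src[1:])  # cell centers
x_dst = 0.5 * (edges_dst[:-1] + edges_dst[1:])

field = 2.0 * x_src + 1.0  # a simple field defined on the source cells

w_near = nearest_weights(x_src, x_dst)
w_lin = linear_weights(x_src, x_dst)
w_cons = conservative_weights(edges_src, edges_dst)

# Remapping is a matrix product: destination = weights @ source
for name, w in [("nearest", w_near), ("linear", w_lin), ("conservative", w_cons)]:
    print(f"{name:>12}: {w @ field}")

# Only the conservative weights are built to preserve the integral of the field
print("source integral:     ", np.sum(field * np.diff(edges_src)))
print("conservative integral:", np.sum((w_cons @ field) * np.diff(edges_dst)))
```

Most entries of each weight matrix are zero, which is why real remapping tools store the weights in a sparse format and reuse them for every field defined on the same pair of grids.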

ESMF (the Earth System Modeling Framework) is a scientific library used in Earth System Models that provides this remapping capability to Earth scientists. Unlike general-purpose algorithms for two-dimensional interpolation, ESMF is suitable for remapping on the sphere and has special treatments for East-West periodicity and polar singularities. Because it is written with parallel computing support, it can handle massive remapping operations. Remapping between grids of several hundred million points (1e17 weights to compute) can be done in minutes on hundreds of CPU cores. The remapping algorithms listed above, and more, are available in ESMF, along with support for grids, meshes and unstructured collections of points.
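A small sketch shows why East-West periodicity needs special care. It searches for the nearest neighbor of a point at 359°E, first treating longitude and latitude as flat Cartesian coordinates, then using the great-circle distance from the haversine formula. This only illustrates the problem under simple assumptions (a few made-up points on the equator); it is not the search ESMF performs internally.

```python
import numpy as np


def great_circle_angle(lon1, lat1, lon2, lat2):
    """Central angle (radians) between points given in degrees (haversine formula)."""
    lon1, lat1, lon2, lat2 = map(np.deg2rad, (lon1, lat1, lon2, lat2))
    a = (
        np.sin((lat2 - lat1) / 2.0) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    )
    # Clip guards against rounding slightly outside [0, 1]
    return 2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Hypothetical source points on the equator, longitudes in [0, 360)
src_lon = np.array([1.0, 120.0, 240.0, 350.0])
src_lat = np.zeros_like(src_lon)

# Destination point just west of the Greenwich meridian
dst_lon, dst_lat = 359.0, 0.0

# Naive approach: treat (lon, lat) as flat Cartesian coordinates
planar = np.hypot(src_lon - dst_lon, src_lat - dst_lat)

# Spherical approach: distance along the great circle
spherical = great_circle_angle(src_lon, src_lat, dst_lon, dst_lat)

print("nearest source (planar):   ", src_lon[planar.argmin()], "degrees East")
print("nearest source (spherical):", src_lon[spherical.argmin()], "degrees East")
```

The flat calculation picks the point at 350°E, while on the sphere the point at 1°E is only 2 degrees away across the 0°/360° seam.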

Over the last decade, the Python programming language has become very popular within the scientific community thanks to the efforts of developer communities building specialized packages to address the needs of scientists, from linear algebra (NumPy/SciPy) to creating maps (matplotlib/cartopy). In recent years, xarray has become one of these go-to packages for Earth scientists.

xESMF was developed by Jiawei Zhuang to provide an xarray-compatible interface to the powerful ESMF remapping library, building on top of the low-level ESMF Python interface (ESMPy). Using xESMF, scientists can perform the remapping operations needed for their science in a simple and intuitive way. They can also make use of the lazy and distributed processing available in xarray through dask. xESMF provides most of the functionality available in the ESMF library and is fully compatible with the ESMF remapping command-line tools. xESMF started as a nights-and-weekends project by a single Ph.D. student, but it quickly grew to have a wide audience. We are happy to announce that xESMF is transitioning to a community development model, hosted under the Pangeo GitHub organization (see updated documentation and source code).
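As a rough idea of what this looks like in practice, the sketch below builds two global grids as plain dictionaries of NumPy arrays and wraps the xESMF call in a small function. The helper names (`make_rectilinear_grid`, `regrid_with_xesmf`) are invented for this example. It assumes that `xesmf.Regridder` accepts dictionaries with `lon`/`lat` (cell centers) and `lon_b`/`lat_b` (cell corners) keys and can be applied to a NumPy array; check the documentation of your installed version. The grid helper only needs NumPy, while the remapping step requires xESMF and ESMPy to be installed.

```python
import numpy as np


def make_rectilinear_grid(lon_step, lat_step):
    """Global lon/lat grid as a dict of 2-D cell centers and cell corners.

    Centers ("lon", "lat") have shape (n_lat, n_lon); corners ("lon_b", "lat_b")
    have shape (n_lat + 1, n_lon + 1). Hypothetical helper for this example.
    """
    n_lon = int(round(360.0 / lon_step))
    n_lat = int(round(180.0 / lat_step))
    lon_edges = np.linspace(0.0, 360.0, n_lon + 1)
    lat_edges = np.linspace(-90.0, 90.0, n_lat + 1)
    lon_centers = 0.5 * (lon_edges[:-1] + lon_edges[1:])
    lat_centers = 0.5 * (lat_edges[:-1] + lat_edges[1:])
    lon, lat = np.meshgrid(lon_centers, lat_centers)
    lon_b, lat_b = np.meshgrid(lon_edges, lat_edges)
    return {"lon": lon, "lat": lat, "lon_b": lon_b, "lat_b": lat_b}


def regrid_with_xesmf(field, grid_in, grid_out, method="bilinear"):
    """Remap a (n_lat, n_lon) array from grid_in to grid_out with xESMF."""
    import xesmf as xe  # imported here so the grid helper works without xESMF

    # periodic=True asks for East-West periodicity on global grids
    regridder = xe.Regridder(grid_in, grid_out, method, periodic=True)
    return regridder(field)


# Example usage (requires xESMF):
#   coarse = make_rectilinear_grid(5.0, 4.0)
#   fine = make_rectilinear_grid(1.0, 1.0)
#   field = np.cos(np.deg2rad(coarse["lat"]))
#   field_fine = regrid_with_xesmf(field, coarse, fine, method="bilinear")
```

The expensive part is building the regridder, which computes the weights once; it can then be reused for any number of fields living on the same pair of grids.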

Regular releases are planned to provide support for more remapping applications, including meshes. Thanks to the efforts of new international collaborators, xESMF version 0.4 is now out with better support for the core functionality. The current release includes:

  • Remapping to and from grid and locstream (collection of points) objects
  • Masking and extrapolation
  • All remapping algorithms (nearest neighbor, bilinear, conservative, patch, …)

To conclude, Earth scientists will always have different ways to peel their apples and oranges. Remapping their slices of data is a recurring task and xESMF provides a way of doing it accurately and efficiently. The quality of the science depends on all the successive steps taken in producing and analyzing the data, and good remapping is critical because… you cannot compare apples to oranges!
