Vertical coordinate transformation with Pangeo — Have some pancakes and let Xgcm do the work

Julius Busecke
pangeo
Published in
9 min readNov 2, 2020
The perks of fieldwork in the subtropics: Sunny days, all day! — Sarmiento De Gamboa 2013 by Julius Busecke

The ocean, a vast, mysterious, and beautiful place, but did it ever make you think of pancakes?

Well if you are an oceanographer, it makes a lot of sense. The first time I heard that analogy was in a seminar by my good friend Sjoerd Groeskamp.

“If you pour syrup on a pancake, it very easily spreads horizontally but needs a long time to sink into the pancake”, which is quite a good analogy to what is happening with ‘stuff’ in the ocean. The ocean is stratified, meaning that lighter waters sit on top of heavier waters. You can imagine the ocean as consisting of layers (or pancakes).

‘Stuff’ like carbon, oxygen, nutrients, and other things are transported relatively easily along the density layer, but moving things across layers requires a lot of energy. Thus, in a lot of ocean science, it makes a lot of sense to ‘extract’ a certain layer of interest and look at how properties evolve on there.

The problem is that these pancakes in the ocean are neither fixed in time (they get thicker/thinner and move around) nor are they aligned with the depth. The density layers usually tilt upwards in the polar regions, where the heaviest water masses are formed.

This presents an acute problem since most datasets are given in depth-coordinates, which depending on the location in the ocean can give you a false image of where stuff enters the ocean. In this really cool video by Dhruv Balwada you can see this in action. Dhruv ran an idealized model and put a tracer in the surface ocean. The left panel shows the time evolution on a depth level. The right panel shows the same thing but on a surface of constant density.

All that red you see further up is likely coming from a different part of the surface. When you look at the right panel you can nicely see how the red color is mixing slowly away from the high concentrations.

So how do we transform datasets into different vertical coordinate systems in Python? The main idea relies on the fact that we can treat each ‘column’ of the ocean as a 1-dimensional array, which is independent of its neighbors. We can do that due to the large aspect ratio of the ocean: Lateral dimensions are orders of magnitude larger than vertical dimensions.

The task of vertical coordinate transformation is an old and heavily used technique, but there was no ‘go-to’ Python package to achieve this task for large datasets. Many people realized this and have developed methods:

  • My own attempt in Xarrayutils, a similar approach to Raphael Dussins VCR package
  • Spencer Jones’ Xlayers
  • An elegant solution to preserve extensive variables by Romain Caneill
  • and I am sure about 100 other implementations that I am not aware of…

All of these approaches had pros and cons, and we saw the need to gather all the experience the community had made on this problem to galvanize around a standard way to perform this very common task with Xgcm.

Our requirements were:

  • Seamless integration with the existing Pangeo ecosystem
  • Ease of use, while maintaining accurate calculations.
  • Flexibility. We had some ideas of which methods to implement (linear interpolation and conservative transformation — more on that later) but this approach should be flexible enough for others to ‘plug in’ their own transformation approaches.
  • Speed. Many implementations (like my own) did not scale properly and transforming modern high-resolution datasets took hours to days.

The breakthrough for this development was the use of Numba guvectorize . Without going into too much detail, this enables you to write up very simple algorithms that deal with a 1-dimensional array and numba will vectorize it on any n-dimensional array, and it will do it in WARP SPEED!

Ryan Abernathey implemented the low-level logic for both linear interpolation and conservative transformation (preserves extensive quantities along the axis) in a pull request back in 2019, and I immediately went to try it on a project I was working on for my postdoc. It worked phenomenally well, and I was convinced we should generalize this in Xgcm. Well, first I was a bit sour because it was better than the approach I developed and I felt that we could have saved time if we would have coordinated better. But in retrospect, I think that us (and many others) working on it has made the end result better for everyone, and that is what open source development should be about, not my ego!

As soon as I started my new job at the Climate Data Science Lab at Columbia University in September, I set out to work on the higher-level implementation of these algorithms in Xgcm. We started by rallying folks on GitHub and had a lively discussion on how to design the API for this new function and how to call it.

I have to say that this whole process was a definite highlight of my daily work. We got a ton of input from many different users and settled on ‘crowd-sourced’ naming and API design before I even started writing any serious code. We ended up naming the new function.transform() and after all of this sentimental writing, let’s get to the good stuff: Some demos with real data.

Let's use one of the CMIP6 models (NCAR’s CESM2) on Google Cloud Storage as an example:

As of now, you will have to install two packages that we will use from source: Cmip6_preprocessing and Fastjmd95 (and optional packages to reproduce the animation at the end). You can do that by executing this code in your notebook cell (or run it in your terminal withouth the !):

!pip install git+https://github.com/jbusecke/cmip6_preprocessing.git
!pip install git+https://github.com/xgcm/fastjmd95
# These are only needed for the animation at the end
!conda install ffmpeg -y
!pip install xmovie

And then we can import the data from the Pangeo catalog

So this dataset contains the potential temperature thetao , the salinity of ocean water so , and cfc11 the concentration of CFC11, a Chlorofluorocarbon (still one of the best tongue twisters I remember from high school Spanish) that had a strong anthropogenic increase in the atmospheric concentrations starting around the 1970s.

Source: https://www.esrl.noaa.gov/gmd/hats/combined/CFC11.html

To help us with the heavy lifting we will also need a dask cluster

It is important to read in the dataset before setting up the dask cluster, or the preprocessing won't work!

For more details on what is happening here, check out this notebook that explains in detail, how Cmip6_preprocessing makes your life easier when working with CMIP6 data in the Pangeo cloud.

We only have three variables in the dataset but none of them is density. In many regions of the global oceans, temperature is a good proxy of density, but since we are going to look at the Southern Ocean, let's calculate potential density from potential temperature and salinity

Ok, let's move on and check out the data. For this example, I will focus on a section of the Southern ocean along 130W in the upper 2000m of the water column

Plotting potential temperature, salinity, and CFC11 concentrations with some density layers highlighted with white contours

Latitude-Depth sections along 130W in the Southern Ocean. From top to bottom: Potential temperature, Salinity, CFC11 concentrations

Notice how that ‘tongue’ of high CFC concentration and low salinity extends from the surface towards deeper layers, starting at around 60S. This is a well-known signature of ocean subduction, and the local salinity minimum in the subsurface forms a distinct water mass: Antarctic Intermediate Water.

But now let's get fancier. We will transform the dataset into density coordinates using linear interpolation (the default method). To do this we will need to create an Xgcm Grid object with both center and outer grid positions (more info about this in the Xgcm docs).

The dataset does have the vertical cell center position but we need to construct the outer position from the cell bounds

Now we can create the grid object

Note: You could add more axes if you want to do other operations like e.g. derivatives, integrals, etc, but for this example, we will keep it simple.

The last thing we need is an array of ‘target values’, meaning the values we want to interpolate our data to, in this case, these values will describe density.

Lets first transform the salinity

Ok, what happened here? We passed the data array so to grid.transform() with the additional information of which axis to act on (here Z ), which values to interpolate to ( target_values) and where to find the density values in depth space ( target_data). The result is a data array which has all the same dimensions except that lev (the depth) was replaced by sigma_0 (the values correspond to target_values) . For more information on Xgcm’s transform function head over to the docs.

Let's repeat this with the CFC data and package the data arrays into a new dataset for convenience

Now we can plot the same section we saw previously but in density coordinates

Latitude-Density sections along 130W in the Southern Ocean. Top: salinity; Bottom: CFC11 concentrations

We can very nicely see that the local salinity minimum aligns along an isopycnal (26.6 is highlighted in orange). The transformation also allows us to analyze the subtle differences between these two tracers. While the CFC values did look quite similar in the depth view, here they do not align as well with the density lines. So why would that be?

A likely reason for the difference is that CFC is a transient tracer, meaning its concentrations changed dramatically over time, and the gradient along isopycnals is actually representing the CFC from the surface subducting over time. Even if the concentrations of CFC11 are higher in the surface along the 26.6 isopycnal, we find higher concentrations on lighter layers at depth. This tells us that the surface values from further north are subducted at a faster rate than the water masses in the Antarctic Intermediate Water. This is just one way of how transformation into density coordinates can give us a more quantitative/refined look at ocean phenomena.

Another nice thing we can do with the transformed data is to look at the spatial distribution of tracers

Salinity (upper) and CFC11 (lower) on the sigma0=26.6 surface in 2010

We can see that many of the spatial patterns between salinity and CFC are similar: The western boundary currents carry high salinity and low CFC concentrations southward along the coasts of Afrika, Australia, and South America. The furthest northward extent of the low salinity and high CFC follows the interior gyre flow from the eastern side of the basin towards the center in all major ocean basins.

Finally, because there is no way my first Pangeo blog post does not have an animation, lets quickly visualize the ‘invasion’ of CFCs into the Antarctic Intermediate Waters using xmovie

We think that Xgcm’s transform functionality fulfills all the goals we set out to achieve. It is a fast, accurate and user-friendly package that integrates nicely into the existing Pangeo ecosystem. This was only possible due to the amazing comments and volunteers who tested and improved this feature! We will continue to rely on the input of the broader earth science community to further evolve this functionality. Ultimately we hope that due to the modular structure, we can integrate even more algorithms under the same ‘hood’ and make Xgcm the default choice to do any kind of vertical coordinate transformation in Python.

This work was supported by the Gordon and Betty Moore Foundation.

--

--

Julius Busecke
pangeo
Writer for

(he/him) | Associate Research Scientist, Earth & Environmental Sciences, Columbia University. https:\\www.juliusbusecke.com