Introducing Pyleoclim: Paleoclimate Timeseries Analysis and Visualization with Python

Deborah Khider
CyberPaleo
Published in
6 min readOct 3, 2022

If you have been following LinkedEarth since its inception, you may already be familiar with Pyleoclim: a Python package for the analysis of paleoclimate data.

Since its first GitHub commit on May 24th 2016, the package has undergone continuous development, several major overhauls in how a user can interface with the package, and has been used in several workshops for the paleoclimate community. The most recent (and stable) version of the package is published in Paleoceanography and Paleoclimatology (a pre-print version of the article is available on ESSOAr).

What is Pyleoclim?

Pyleoclim’s raison d’être is to make the analysis of paleoclimate data easier, more rigorous, and frankly, more fun! Under the hood, it uses the libraries familiar to many scientists using Python such as: NumPy, Pandas, scipy, scikit-learn, Matplotlib, cartopy, and seaborn to name a few. This enables most tasks a paleoclimatologists would like to perform: spectral and wavelet analysis, coherence analysis, paleo-aware correlations, model-data comparison, singular spectrum analysis along with the pre-processing functions necessary for tasks such as standardizing, detrending, interpolation, binning, filtering…

What makes Pyleoclim easy to use is its object-oriented interface. If you’re familiar with Python, you may already be familiar with the power of object-oriented programming (OOP). OOP places the data at the center of the analysis rather than the functions. The objects contain both data and metadata in the form of fields that can be entered by a user (e.g. a timeseries would require at least values for time and the quantity being measured in time, but optionally allow for labels and units) and code that represents procedures that are valid for the data. The number of fields is dictated by the procedures (and their desired level of automation).

Pyleoclim contains several objects ubiquitous to paleoclimate data analysis as shown in the figure below.

The Pyleoclim User Interface describing main objects and functions attached to these objects.

Fantastic, but how does this all work in practice?

Let’s give Pyleoclim a try and perform spectral and wavelet analysis on on a Niño3.4 sea-surface temperature dataset. First, we need to import the relevant libraries and load the data into Pandas.

import pandas as pd
import pyleoclim as pyleo
url = 'https://raw.githubusercontent.com/LinkedEarth/Pyleoclim_util/master/example_data/oni.csv'df = pd.read_csv(url,header=0)
df

Now that we have the data loaded in Python, our next step is to create a Series object. Series is the Pyleoclim object that carries most of the functionalities. As you can glean from the documentation, this object can be created by providing two pieces of required information:

  • time: the values for the time axis
  • value: the values for the variable of interest

And other pieces of metadata. Although they are not required, they allow for some automation when provided (we will see some example in this blog):

  • time_unit: the units of the time axis
  • value_unit: the units for the variable of interest
  • time_name: the name used to represent time (e.g., Year, Age)
  • value_name: the name of the environmental variable
  • label: a label for the timeseries
  • clean_ts: if set to True, will automatically remove NaNs.

Let’s create our object…

ts = pyleo.Series(time=df['Dec year'],value=df['NINO34_ANOM'],time_name='Year', value_name='SST anomaly',time_unit='CE', value_unit='$^\circ$C',label='Niño 3.4', clean_ts=True)

… and make a plot. This is as simple as ordering the Series to plot (literally):

ts.plot()
This graph displays the Nino3.4 timeseries from 1880 to present
Niño 3.4 timeseries

Our command was simple but since we provided the names and units, Pyleoclim automatically labeled the axes. Let us now perform spectral analysis. First, we need to decide what pre-processing steps to apply. There is no obvious trend in the data so detrending is not necessary. Standardizing prior to spectral analysis is recommended, however. Furthermore, some of the methods for spectral analysis requires the data to be evenly-spaced. Let’s check that assumption:

ts.is_evenly_spaced()

This returns False, which is most likely due to the conversion to decimal year. This means that we need to interpolate first. Here, we will use a simple linear interpolation, which is the default in Pyleoclim:

ts_interp = ts.interp()

We now have a new Series object that we can standardize prior to analysis:

ts_std = ts_interp.standardize()

Now let’s perform spectral analysis using the MTM method:

PSD = ts_std.spectral(method='mtm')

This operation results in the creation of the PSD (Power Spectral Density) object, which has specific functionalities:

Table listing available methods for the PSD object in Pyleoclim
Methods associated with the PSD objects in Pyleoclim

One of them is a plot. Notice that we were able to reuse the name plot to describe the same action (i.e. plotting) for a Series and a PSD. However, you would expect different plots for a timeseries versus a periodogram. And you would be correct in that assumption:

PSD.plot()
Periodogram of the Niño3.4 series
Periodogram of the Niño3.4 series.

By default, Pyleoclim plots this spectrum in log-log coordinates, where power laws show up as straight lines. This can be further quantified using the beta_est() method.

This is one of the advantages of OOP: a similar action (in this case, the act of plotting) performed on different objects yield results consistent for that object.

Another functionality attached to the PSD object is to establish significance through an AR(1) benchmark:

PSD_signif = PSD.signif_test()
PSD_signif.plot()
Periodogram of the Niño3.4 series with 95% confidence interval measured by an AR1 benchmark

Unsurprisingly, the record displays significant power in the ENSO band.

Another advantage of OOP is that is allows for method chaining, which allows the plot above to be obtained in a single (if somewhat long-winded) line of code:

ts.interp().standardize().spectral(method='mtm').signif_test().plot()

The advantage is that such a line succinctly encapsulates all the transformations and computations applied to the data to get to the visualization (that is, the workflow). This makes it easier for people to follow what you did and reproduce your analysis.

Great! We got the results of our spectral analysis. But it doesn’t tell us whether the significant periodicities were consistent throughout the record. We need wavelet analysis to do so. In this example, let’s use the continuous wavelet transform, popularized in Matlab by Torrence and Compo (1998):

scal_signif=ts.interp().standardize().wavelet(method='cwt').signif_test()

And plot the results (and bonus points if you can guess the name of the function that allows you to do just that):

scal_signif.plot()
Scalogram of the Niño 3.4 timeseries

The periodicities are robust throughout the record in the ENSO band with a “quiet” period in the 1940s and 50s.

ok, I’m sold! How can I learn more about Pyleoclim?

We have several resources to get you started:

  • The Pyleoclim documentation will help you get started with the package, including installation
  • the manuscript describes the package and main functionalities along with three scientific use cases. You can reuse this code to create your own!
  • Self-paced tutorials to get you started with the package. If you wish to see the notebooks in their regular format, use the GitHub repo directly instead of the book.
  • PaleoBooks, a collection of Jupyter notebooks illustrating the scientific use of software libraries maintained by LinkedEarth. This collection will see rapid growth in 2022/2023
  • You can also get help from the community on our Discourse forum!

If you are new to Python, we recommend the following resources:

  • Pythia Foundations, a community learning resource for Python-based computing in the geosciences.
  • Self-paced tutorials to learn basic Python, including the scientific Python ecosystem (NumPy, SciPy, Jupyter, Matplotlib, Pandas, Cartopy, GitHub).

You can also use the LinkedEarth Community Hub for your research with Pyleoclim. You can request an account here.

--

--

Deborah Khider
CyberPaleo

Research Scientist at the USC Information Sciences Institute - Data Science, AI, and paleoclimatology