Exploring Temporal Trends: Analyzing Time Series and Gridded Data with Python

Jaydeo K Dharpure
10 min readMay 9, 2024

--

Introduction

Understanding trends within time series and gridded datasets holds immense significance for identifying underlying patterns and behaviors within the data, which further aids in developing accurate prediction models to anticipate future trends. Trend analysis can be utilized for various applications such as hydrology, climatology, urban mapping, crop monitoring, and more. A wide range of gridded data, including satellite observations, reanalysis data, and climate scenario observations, etc., from coarse to fine scales at the global or regional level, are freely available for spatio-temporal monitoring and modeling. In addition to this, non-parametric methods such as the Mann-Kendall (MK) trend test and Sen’s slope estimator are capable of providing trends of temporal and spatial scale data along with their significance levels.

Aim of the work

The main motivation of this tutorial is to assist researchers in implementing the Mann-Kendall test in temporal and gridded data using the open-source Python programming language. Additionally, it aims to calculate Sen’s slope value with both datasets. In this tutorial, we will utilize satellite-based Gravity Recovery and Climate Experiment (GRACE) Terrestrial Water Storage Anomaly (TWSA) data at a resolution of 0.5° from 2003 to 2021 for trend analysis. We outline simple steps using three types of datasets:

  1. Temporal trend test using Excel,
  2. Temporal trend test using mean value of raster images,
  3. Spatial trend test using time series of raster dataset

The entire code and material of this blog can be found in the GitHub repository.

Mann-Kendal and Sen’s slope methods

Mann-Kendal (MK) (Mann 1945) and Sen’s slope (Sen 1968) methods are non-parametric statistical methods which does not require normal distribution and autocorrelation in the time series data. We have demonstrated spatial trend and their significant level in our articles (Dharpure et al. 2020, 2021; Patel et al. 2021). The MK trend test assumes null hypothesis is that there is no trend in the time series data and alternate hypothesis is that it has a trend (increasing or decreasing). For this, Z-score is calculated that indicate the trend will be positive or negative. If the Z-score > 0 it means a trend is positive and Z-score < 0 it means the trend is negative, Z-score = 0 means no trend. However, we need to find the significant level of the trend. For this, we can select the ǀZǀ ≥ 1.645, ǀZǀ ≥ 1.96, and ǀZǀ ≥ 2.576 for the confidence level of p ≤ 0.1, p ≤ 0.05, and p ≤ 0.01, respectively or depending on the researchers. The Sen’s slope demonstrated the magnitude of trend, which is calculated by the median of the slopes of each pair of points. In this tutorial, we will demonstrate the statistically significant trend at p ≤ 0.05.

Temporal trend test using Excel

Loading all necessary python libraries

# Import necessary libraries
import os # Operating system functions
import re # Regular expressions
import numpy as np # Numerical computing
import pandas as pd # Data manipulation and analysis
import rasterio # Raster data manipulation
import matplotlib.pyplot as plt # Plotting library
import matplotlib.colors as mcolors # Custom color maps
from pymannkendall import original_test # Mann-Kendall trend test

Creating necessary functions

# Function for natural key sorting
def atoi(text):
"""
Convert a string to an integer if possible.

Args:
text (str): The input string.

Returns:
int or str: The converted integer or the original string.
"""
return int(text) if text.isdigit() else text

def natural_keys(text):
"""
Convert a string to a list of integers and strings for natural sorting.

Args:
text (str): The input string.

Returns:
list: A list of integers and strings.
"""
return [atoi(c) for c in re.split(r'(\d+)', text)]

We will use the pyMannKendall library for this purpose.

pyMannKendall provides 13 different versions of the MK test, including modified Mann-Kendall tests, Multivariate MK Test, Regional MK Test, Correlated MK test, Partial MK Test, etc.

Here, we use the Original Mann-Kendall test (original_test): The Original Mann-Kendall test is a nonparametric test that does not consider serial correlation or seasonal effects.

The test will return a tuple containing the named parameters:

  1. trend: indicates the trend (increasing, decreasing, or no trend),
  2. h: True (if a trend is present) or False (if the trend is absent),
  3. p: p-value of the significance test,
  4. z: normalized test statistics,
  5. Tau: Kendall Tau,
  6. s: Mann-Kendall’s score,
  7. var_s: Variance S,
  8. slope: Theil-Sen estimator/slope,
  9. intercept: intercept of Kendall-Theil Robust Line

For example: If we pass temporal data to the method, the return will be:

original_test(results_df.iloc[:, 1])
Mann_Kendall_Test(trend='decreasing', h=True, p=1.4475868503893707e-08, z=-5.667662571184689, Tau=-0.9532163742690059, s=-163.0, var_s=817.0, slope=-1.3711816358484143, intercept=6.001031021996766)

Function to perform the Mann-Kendall test and return the slope, intercept, and p-value

def mk_test(data):
"""
Perform the Mann-Kendall test and obtain the slope, intercept, and p-value.

Args:
data (array_like): The input data for the test.

Returns:
tuple: A tuple containing the slope, intercept, and p-value.
"""
# Perform the test and obtain the results
result = original_test(data)

# Extract slope, intercept, and p-value from the result
slope, intercept, p = result.slope, result.intercept, result.p

return slope, intercept, p

This code performs the Mann-Kendall trend test on data stored in an Excel file. It automates the process of conducting the Mann-Kendall trend test on multiple columns of data and then saves the results in an Excel file for further analysis or reporting.

# Define the file path
file_path = r'C:\Users\dharpure.1\OneDrive - The Ohio State University\Paper work\Blog\Mann-Kendall trend test\\'
input_filename = 'MK data.xlsx'

# Combine the file path and input filename
input_file_path = os.path.join(file_path, input_filename)

# Read data from Excel file
data_frame = pd.read_excel(input_file_path, sheet_name='Sheet1', engine='openpyxl')

# Set index starting from 1
data_frame.index = range(1, len(data_frame) + 1)

# Initialize lists to hold results
mk_results = {
'Column': [], # List to store column names
'Slope': [], # List to store slope values
'P-value': [] # List to store p-values
}

# Perform the Mann-Kendall test on each column in the data frame
for column_name in data_frame.columns[1:]:
# Calculate the slope, intercept, and p-value for the current column
slope, intercept, p_value = mk_test(data_frame[column_name])

# Store the results in the lists
mk_results['Column'].append(column_name)
mk_results['Slope'].append(slope)
mk_results['P-value'].append(p_value)

# Create a DataFrame from the dictionary
results_df = pd.DataFrame(mk_results)

# Define the save path for the results
save_path = os.path.join(file_path, "MK Test")

# Create the output directory if it doesn't exist
if not os.path.exists(save_path):
os.makedirs(save_path)

# Define the output file path
output_filename = 'MK result.xlsx'
output_file_path = os.path.join(save_path, output_filename)

# Save the results to an Excel file
results_df.to_excel(output_file_path, index=False)

print("Mann-Kendall test results saved to", output_file_path)

Plot the trend using matplotlib

# Plot the trend using matplotlib
fig, ax = plt.subplots(figsize=(12, 4)) # Set the figure size
colors = ['red', 'black'] # Define colors for different columns
x = np.arange(len(data_frame))
index = data_frame.iloc[:, 0]

# Iterate over each column in the data frame
for i, column_name in enumerate(data_frame.columns[1:]):
slope, intercept, p_value = mk_test(data_frame[column_name]) # Perform the Mann-Kendall test

# Plot the data with corresponding label and color
ax.plot(x, data_frame[column_name], marker='o', label=f'{column_name} (Slope: {slope:.2f}, Intercept: {intercept:.2f}, P-value: {p_value:.2f})', color=colors[i % len(colors)]) # Use modulus to cycle through colors

if p_value < 0.05:
# Plot the trend line for significant trends
ax.plot(x, slope * data_frame.index + intercept, linestyle='--', color=colors[i % len(colors)])

# Set the x-axis ticks and labels
interval = 1
ax.set_xticks(x[::interval])
ax.set_xticklabels(index[::interval], rotation=45)

# Set the y-limits to start and end on the x-axis
ax.set_xlim(-1, len(index))
ax.set_ylabel('GRACE TWSA (cm)', fontsize=12)
ax.set_xlabel('Year', fontsize=12)
ax.set_title('Trend Analysis')
ax.legend() # Display legend
ax.grid(True) # Show grid

# Define the output file path
output_figure_name = 'MK_figure.png'
output_figure_path = os.path.join(file_path, output_figure_name)

# Save the graph as TIFF with 300 DPI
fig.savefig(output_figure_path, dpi=300, format='png', bbox_inches='tight')

MK trend test using mean value of raster images

This code calculates the mean of data stored in GeoTIFF files. It reads the data from the files, excluding nodata values, and calculates the mean value. The results are stored in a DataFrame along with the corresponding year. The DataFrame is then saved to an Excel file. Additionally, it performs the Mann-Kendall trend test on the mean values to determine the slope, intercept, and p-value of the trend.

# Define a function to calculate the mean of a GeoTIFF file
def calculate_mean(tif_file):
try:
with rasterio.open(tif_file) as src:
data = src.read(1, masked=True) # Read data from GeoTIFF
nodata_value = src.nodatavals[0] # Get nodata value

# Remove nodata values
data_no_nodata = data.compressed() # Equivalent to data[~data.mask]

# Calculate mean value
mean_value = data_no_nodata.mean()
# Return the mean value
return mean_value

except Exception as e:
print(f"Error processing {tif_file}: {e}") # Print error message if an exception occurs
return np.nan # Return NaN if there is an error

# Base directory path
folder_path = r"C:\Users\dharpure.1\OneDrive - The Ohio State University\Paper work\Blog\Mann-Kendall trend test\TWSA\\"

# Initialize a dictionary to store results
results_dict = {'Year': []}

# Get list of GeoTIFF files in the current folder
tif_files = [f for f in os.listdir(folder_path) if f.endswith('.tif')]
# Sort files naturally
tif_files.sort(key=natural_keys)

# Initialize lists to store mean values
mean_list = []

# Iterate through files and calculate mean values
for tif_file in tif_files:
tif_path = os.path.join(folder_path, tif_file)
mean_value = calculate_mean(tif_path)

# Append mean value to the list
mean_list.append(mean_value)

results_dict['Year'].append(int(tif_file.rstrip('.tif')))

# Add mean values to results dictionary
results_dict['TWSA'] = mean_list

# Convert results dictionary to DataFrame
results_df = pd.DataFrame(results_dict)


dst_path = os.path.join(folder_path, 'result')
if not os.path.exists(dst_path):
os.makedirs(dst_path)

# Save DataFrame to Excel file
results_df.to_excel(os.path.join(dst_path, 'Raster mean value.xlsx'), index=False)

slope, intercept, p_value = mk_test(results_df.iloc[:, 1])
print(f'Slope: {round(slope, 3)}, intercept: {round(intercept, 3)}, p-value: {round(p_value, 3)}')
Slope: -1.371, intercept: 6.001, p-value: 0.0

Plot the trend using matplotlib

# Set figure size
plt.figure(figsize=(10, 6))

# Iterate over each column in the DataFrame starting from the second column
for column_name in results_df.columns[1:]:
# Calculate slope, intercept, and p-value
slope, intercept, p_value = mk_test(results_df[column_name])

# Plot the data for the current column
plt.plot(results_df.index, results_df[column_name], label=f'{column_name} (Slope: {slope:.2f}, Intercept: {intercept:.2f}, P-value: {p_value:.2f})')

# Plot trend line for significant trends (p-value < 0.05)
if p_value < 0.05:
plt.plot(results_df.index, slope * results_df.index + intercept, linestyle='--')

# Set labels and title
plt.xlabel('Time')
plt.ylabel('Data')
plt.title('Trend Analysis')

# Show legend, grid, and plot
plt.legend()
plt.grid(True)
plt.show()

Spatial trend test using time series of raster dataset

This code processes raster data stored as GeoTIFF files. It reads the data from each GeoTIFF file, performs the Mann-Kendall test on each pixel, calculates the slope and p-value, and stores the results in separate arrays. The code then saves the slope and p-value arrays as new GeoTIFF files, preserving the metadata of the original files.

# Define the data path and list of input files
path_data = r'C:\Users\dharpure.1\OneDrive - The Ohio State University\Paper work\Blog\Mann-Kendall trend test\TWSA\\'
files_list = [f for f in os.listdir(path_data) if f.endswith(".tif")]

# Open a sample file to get metadata and nodata value
with rasterio.open(os.path.join(path_data, files_list[0])) as src:
profile = src.profile
nodata = src.nodatavals[0]

# Sort files naturally
files_list.sort(key=natural_keys)
array_path_list = [os.path.join(path_data, f) for f in files_list]

# Initialize an array to store raster data
raster_data = []

# Read each file and append its data to the raster_data list
for file in array_path_list:
with rasterio.open(file) as src:
data = src.read(1, masked=True)
raster_data.append(data)

# Convert list of rasters to a numpy array
raster_data = np.array(raster_data)

# Get the dimensions of the data
rows, cols = raster_data.shape[1], raster_data.shape[2]

# Define the save path for the results
save_path = os.path.join(path_data, "MK Test")

# Create the output directory if it doesn't exist
if not os.path.exists(save_path):
os.makedirs(save_path)

# Initialize arrays for slope and p-value
slope_array = np.full((rows, cols), nodata, dtype=np.float32)
p_array = np.full((rows, cols), nodata, dtype=np.float32)

# Perform the Mann-Kendall test on each pixel
for r in range(rows):
for c in range(cols):
# Exclude nodata values
if raster_data[0, r, c] != nodata:
slope,__, p = mk_test(raster_data[:, r, c])
slope_array[r, c] = slope
p_array[r, c] = p

# Update profile for the slope and p-value rasters
profile.update(
dtype=rasterio.float32,
count=1,
nodata=nodata
)

# Save the slope raster
slope_filename = os.path.join(save_path, "slope.tif")
with rasterio.open(slope_filename, "w", **profile) as dst:
dst.write(slope_array, 1)

# Save the p-value raster
p_filename = os.path.join(save_path, "p.tif")
with rasterio.open(p_filename, "w", **profile) as dst:
dst.write(p_array, 1)

print(f"Saved slope and p-value rasters to {save_path}")

Plot the trend using matplotlib

# Define longitude and latitude arrays
lon = np.linspace(profile["transform"][2], profile["transform"][2] + cols * profile["transform"][0], cols)
lat = np.linspace(profile["transform"][5], profile["transform"][5] + rows * profile["transform"][4], rows)

# Create mesh grid for longitude and latitude
lon_grid, lat_grid = np.meshgrid(lon, lat)

# Set figure size
figsize = (10, 6)

# Plot slope
plt.figure(figsize=figsize)
plt.imshow(slope_array, extent=[lon.min(), lon.max(), lat.min(), lat.max()], cmap='jet', aspect='auto')
plt.colorbar(label='Slope (cm/yr)')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title("Sen's Slope")
plt.show()

# Plot p-value
plt.figure(figsize=figsize)
plt.imshow(p_array, extent=[lon.min(), lon.max(), lat.min(), lat.max()], cmap='jet', aspect='auto')
plt.colorbar(label='p-value')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('p-value')
plt.show()

Plot significant and non-significant pixel based on p-values

# Plot significant p-values
plt.figure(figsize=figsize) # Set figure size here
# Set colorbar range
divnorm = mcolors.TwoSlopeNorm(vmin=0, vcenter=0.05, vmax=1)
# Colormap
colors = ['red', 'blue']
cmap = mcolors.ListedColormap(colors)
# Plot the p-values
plt.imshow(p_array, norm=divnorm, extent=[lon.min(), lon.max(), lat.min(), lat.max()], cmap=cmap)
plt.colorbar(label='p-values', ticks=[0, 0.05, 1])
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Significant P-values')
plt.show()

Conclusion

This tutorial has provided a comprehensive guide for researchers to implement Mann-Kendall test and calculate Sen’s slope value in both temporal and gridded datasets using Python. By leveraging open-source tools and utilizing satellite-based GRACE Terrestrial Water Storage Anomaly data, we’ve demonstrated step-by-step procedures for trend analysis at a resolution of 0.5° spanning from 2003 to 2021.

References

Dharpure JK, Goswami A, Patel A, et al (2021) Assessment of snow cover variability and its sensitivity to hydro-meteorological factors in the Karakoram and Himalayan region. Hydrological Sciences Journal 00:1–18. https://doi.org/10.1080/02626667.2021.1985125

Dharpure JK, Patel A, Goswami A, Kulkarni A V (2020) Spatiotemporal snow cover characterization and its linkage with climate change over the Chenab river basin , western Himalayas. GIScience & Remote Sensing 00:1–25. https://doi.org/10.1080/15481603.2020.1821150

Mann HB (1945) Non-Parametric Test Against Trend. Econometrica 13:245–259

Patel A, Goswami A, Dharpure JK, et al (2021) Regional mass variations and its sensitivity to climate drivers over glaciers of Karakoram and Himalayas. GIScience & Remote Sensing 00:1–23. https://doi.org/10.1080/15481603.2021.1930730

Sen PK (1968) Estimates of the Regression Coefficient Based on Kendall’s Tau. Journal of the American Statistical Association 63:1379–1389.

--

--

Jaydeo K Dharpure

Postdoctoral Research Associate at Byrd Polar and Climate Research Center, The Ohio State University, Columbus, USA