Understanding Spatial Autocorrelation in Python: A Practical Guide
--
By Jeff Barlow-Spady
Introduction
NB: I got some help from ChatGPT with this article.
In this article, we’ll explore spatial autocorrelation using Python and libraries like GeoPandas, libpysal/esda, and inequalipy. We’ll cover Moran’s I, the Kolm-Pollak measures, and an Excess Density Estimator (EDE), with practical code examples throughout.
Spatial autocorrelation is a powerful concept in spatial statistics that describes how similar or dissimilar values are arranged across geographic space. I’m currently learning this while working on a project and wanted to share a high-level overview to help cement my learning.
What is Spatial Autocorrelation?
Spatial autocorrelation indicates the degree of similarity between observations in a geographic space.
More simply, it helps answer questions like: Are similar values clustered together, or are they randomly distributed? Python, with its vast number of geospatial libraries, allows us to analyze and visualize these patterns effectively.
Moran’s I: A Measure of Global Spatial Autocorrelation
Moran's I is a widely used statistic for quantifying global spatial autocorrelation. It ranges from -1 (perfect dispersion) to 1 (perfect clustering), with values near 0 suggesting a spatially random pattern. The formula for Moran's I is (a small NumPy sketch follows the definitions below):
I = \frac{n}{W} \cdot \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
where:
- (n) is the number of spatial units indexed by (i) and (j),
- (x) is the variable of interest,
- (\bar{x}) is the mean of (x),
- (w_{ij}) is the spatial weight between units (i) and (j), i.e. an element of the spatial weights matrix, and
- (W) is the sum of all (w_{ij}).
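To connect the formula to code, here is a minimal, self-contained NumPy sketch; the morans_i helper, the toy values, and the binary weights matrix are made up purely for illustration:
import numpy as np
def morans_i(x, w):
    # x: 1-D array of n values; w: (n, n) spatial weights matrix
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()                    # deviations from the mean
    W = w.sum()                           # sum of all weights
    num = (w * np.outer(dev, dev)).sum()  # sum_ij w_ij (x_i - xbar)(x_j - xbar)
    den = (dev ** 2).sum()                # sum_i (x_i - xbar)^2
    return (n / W) * (num / den)
# Toy example: four locations along a line, neighbours share an edge
x = np.array([1.0, 2.0, 2.5, 4.0])
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(f"Moran's I: {morans_i(x, w):.4f}")  # positive: similar values sit next to each other
In practice you would lean on libpysal and esda rather than rolling your own, as in the next snippet.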
Implementing Moran’s I in Python
Let's dive into some Python code using GeoPandas and the PySAL stack (libpysal and esda):
import geopandas as gpd
import numpy as np
from libpysal.weights import lat2W
from esda.moran import Moran
# Load your GeoDataFrame (we'll use it later when mapping results)
gdf = gpd.read_file('path/to/data.shp')
# Assuming your_data_array is a 2-D numpy array of gridded values
your_data_array = np.array([...])  # replace with your data
# Create lattice (rook-contiguity) weights matching the grid's shape
w = lat2W(your_data_array.shape[0], your_data_array.shape[1])
# Create the Moran object; flatten the grid row by row to match the weights' ordering
mi = Moran(your_data_array.flatten(), w)
# Print Moran's I results
print(f"Moran's I: {mi.I:.4f}")
print(f"p-value: {mi.p_norm:.4f}")  # p-value under the normality assumption
# Keep the statistic for the combined index later
moran_statistic = mi.I
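The lat2W weights above assume gridded data. If your values live on the polygons of a GeoDataFrame instead, a common pattern is to build contiguity weights straight from the geometries. Here is a hedged sketch of that route; the 'value' column name is made up, so swap in your own attribute:
from libpysal.weights import Queen
from esda.moran import Moran
# Queen contiguity: polygons sharing an edge or a vertex count as neighbours
w_queen = Queen.from_dataframe(gdf)
mi_poly = Moran(gdf['value'].values, w_queen)  # 'value' is a hypothetical column name
print(f"Moran's I (polygons): {mi_poly.I:.4f}, p-value: {mi_poly.p_sim:.4f}")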
Kolm-Pollak: Accounting for Inequality Across Spatial Units
The Kolm-Pollak approach, implemented in the inequalipy library, comes from the inequality literature rather than from autocorrelation proper: it summarizes a distribution of values while penalizing unevenness, and so complements Moran's I in a spatial workflow. Moran's I tells you whether values cluster; Kolm-Pollak tells you how unequally they are spread across your spatial units. Its equally-distributed equivalent (EDE) is the level that, if every unit had it, would be judged as good as the actual distribution, and the Kolm-Pollak index measures how far the distribution falls short of that. One common form of the EDE for a quantity (x) with inequality-aversion parameter (\kappa) is:
EDE = -\frac{1}{\kappa} \ln\left( \frac{1}{n} \sum_{i=1}^{n} e^{-\kappa x_i} \right)
where:
- (n) is the number of observations,
- (x_i) is the value of the variable of interest for the (i)th observation, and
- (\kappa) is the inequality-aversion parameter; inequalipy can derive it from (\epsilon) and the data.
The Python code follows a similar pattern to the Moran's I example, this time using the inequalipy library:
import numpy as np
import inequalipy as ineq
# Assuming your_data_array is a 1-D numpy array
your_data_array = np.array([...]) # replace with your data
# epsilon is the inequality-aversion parameter; inequalipy accepts either
# epsilon or kappa (via the kappa keyword), so supply just one of them
epsilon = ...  # replace with your value
# Calculate the Kolm-Pollak EDE (equally-distributed equivalent)
kp_ede = ineq.kolmpollak.ede(your_data_array, epsilon=epsilon)
# Calculate the Kolm-Pollak Index
kp_index = ineq.kolmpollak.index(your_data_array, epsilon=epsilon)
print(f'Kolm-Pollak EDE: {kp_ede:.4f}')
print(f'Kolm-Pollak Index: {kp_index:.4f}')
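As a quick sanity check with made-up numbers (and assuming the epsilon-only calls shown above), compare two distributions that share the same mean: you should see the EDE drift further from the plain mean, and the index grow in magnitude, for the more unequal one:
import numpy as np
import inequalipy as ineq
# Two made-up distributions with the same mean (10) but different spread
even = np.array([9.0, 10.0, 10.0, 11.0])
unequal = np.array([2.0, 5.0, 13.0, 20.0])
for name, values in [("even", even), ("unequal", unequal)]:
    ede = ineq.kolmpollak.ede(values, epsilon=0.5)
    idx = ineq.kolmpollak.index(values, epsilon=0.5)
    print(f"{name}: mean={values.mean():.2f}, KP EDE={ede:.2f}, KP index={idx:.2f}")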
Excess Density Estimator (EDE): Analyzing Density-Based Autocorrelation
The EDE, as used here, is a custom Moran-style statistic for density-like data: it measures how strongly values co-vary across nearby locations, with "nearby" encoded by an inverse-distance weight function. Written out, it is:
EDE = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (f_i - \bar{f}) \cdot (f_j - \bar{f})}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}}
where:
- (n) is the total number of observations,
- (i) and (j) are indices over observations, each running from 1 to (n),
- (w_{ij}) is the weight given to the pair of observations (i) and (j); here it is the inverse of the distance between them, but any similarity-based weighting works,
- (f_i) and (f_j) are the feature values of observations (i) and (j),
- (\bar{f}) is the average feature value across all observations, and
- the double summation (\sum_{i=1}^{n} \sum_{j=1}^{n}) runs over all pairs of observations.
import numpy as np
def weight_func(i, j, data):
    # Calculate the Euclidean distance between the i-th and j-th observations
    distance = np.linalg.norm(data[i] - data[j])
    # Use the inverse distance as the weight
    # (a small constant in the denominator avoids division by zero)
    return 1.0 / (distance + 1e-6)
def compute_ede(data):
    n = len(data)
    f_bar = np.mean(data, axis=0)  # average feature vector
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip self-pairs so they don't dominate the weights
            weight = weight_func(i, j, data)
            # Cross-product of the two deviations from the mean, matching the
            # formula above (a dot product when observations are vectors)
            numerator += weight * np.dot(data[i] - f_bar, data[j] - f_bar)
            denominator += weight
    return numerator / denominator
# Assuming your_data_array is a 2-D numpy array (rows = observations, columns = features)
your_data_array = np.array([...])  # replace with your data
ede_statistic = compute_ede(your_data_array)
print(f'Excess Density Estimator (EDE): {ede_statistic:.4f}')
density_estimator = ede_statistic
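A quick synthetic usage example, reusing compute_ede from above with random points (purely illustrative), shows the expected input shape: rows are observations, columns are coordinates or features.
import numpy as np
rng = np.random.default_rng(42)
# 40 made-up observations with 2 features each
points = rng.random((40, 2))
print(f'EDE (random points): {compute_ede(points):.4f}')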
Combining Statistics into an Index
To summarize the overall spatial pattern in a single number, you can combine the statistics into an index. A simple weighted average is one option, though in practice you may want to rescale each component first, since the statistics live on different scales:
# Combine statistics into an index (equal weights as a starting point)
index = 0.25 * moran_statistic + 0.25 * kp_ede + 0.25 * kp_index + 0.25 * density_estimator
This index can be added to your GeoDataFrame and visualized.
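Here is a minimal sketch of that last step, assuming the gdf loaded earlier and a hypothetical attribute column named 'value'. Because the combined index is a single global number, it is attached as a constant column and works best as an annotation on a map of the underlying variable:
import matplotlib.pyplot as plt
# Attach the combined index (one global value) to every row
gdf['combined_index'] = index
# Map the underlying variable and annotate the plot with the index
ax = gdf.plot(column='value', legend=True, cmap='viridis')  # 'value' is a hypothetical column
ax.set_title(f'Combined spatial autocorrelation index: {index:.3f}')
plt.show()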
That’s It!
Spatial autocorrelation analysis in Python provides real insight into the spatial distribution of your data. By leveraging libraries like GeoPandas, libpysal/esda, and inequalipy, you can compute and visualize Moran's I, the Kolm-Pollak measures, and an EDE. Understanding these concepts empowers you to uncover hidden patterns and make informed decisions based on the spatial nature of your data.
Happy mapping!
---