Understanding Spatial Autocorrelation in Python: A Practical Guide

4 min readNov 11, 2023

By Jeff Barlow-Spady

NB: I got some help from ChatGPT for the article

In this article, we’ll explore spatial autocorrelation using Python and key libraries like GeoPandas and inequalipy. We’ll cover concepts like Moran’s I, Kolm-Pollak, and Excess Density Estimator (EDE), providing a practical guide with code examples.

Spatial autocorrelation is a powerful concept in spatial statistics that helps us understand how similar or dissimilar values are distributed across geographic space. I’m currently learning while working on a project and wanted to share a high-level overview to help cement my learning.

What is Spatial Autocorrelation?

Spatial autocorrelation indicates the degree of similarity between observations in a geographic space.

More simply, it helps answer questions like: Are similar values clustered together, or are they randomly distributed? Python, with its vast number of geospatial libraries, allows us to analyze and visualize these patterns effectively.

Moran’s I: A Measure of Global Spatial Autocorrelation

Moran’s I is a widely used statistic to quantify spatial autocorrelation. It ranges from -1 (perfect dispersion) to 1 (perfect clustering). The formula for Moran’s I is:

where:

(n) is the number of spatial units indexed by (i) and (j),
(x) is the variable of interest,
(\bar{x}) is the mean of (x),
(w_{ij}) is a matrix of spatial weights, and
(W) is the sum of all (w_{ij}).

Implementing Moran’s I in Python

Let’s dive into Python code using GeoPandas and LibPySAL:

import geopandas as gpd
from libpysal.weights import lat2W
from esda.moran import Moran
import numpy as np
# Load your GeoDataFrame
gdf = gpd.read_file('path/to/data.shp')
# Assuming your_data_array is a 2-D numpy array
your_data_array = np.array([...]) # replace with your data

# Create the matrix of weights
w = lat2W(your_data_array.shape[0], your_data_array.shape[1])

# Create the Moran object
mi = Moran(your_data_array, w)

# Print Moran's I results
print(f"Moran's I: {mi.I:.4f}")
print(f"p-value: {mi.p_norm:.4f}")

moran_statistic = mi.I

Kolm-Pollak: Exploring Spatial Autocorrelation with Weights

Kolm-Pollak is an advanced spatial autocorrelation measure that incorporates spatial weights. It helps us understand how spatial units interact. The formula is complex but essentially captures spatial interdependence.

(n) is the number of observations,
(f_i) and (f_j) are the values of the variable of interest for the (i)th and (j)th observations, respectively,
(W_{ij}) is the weight assigned to the pair of observations (i) and (j).

The Python code for Kolm-Pollak is similar to Moran’s I, just using the inequalipy library:

import numpy as np
import inequalipy as ineq

# Assuming your_data_array is a 1-D numpy array
your_data_array = np.array([...]) # replace with your data

# Assuming epsilon and kappa are the inequality aversion parameters
epsilon = ... # replace with your value
kappa = ... # replace with your value

# Calculate the Kolm-Pollak EDE
kp_ede = ineq.kolmpollak.ede(your_data_array, epsilon, kappa)

# Calculate the Kolm-Pollak Index
kp_index = ineq.kolmpollak.index(your_data_array, epsilon, kappa)

print(f'Kolm-Pollak EDE: {kp_ede:.4f}')
print(f'Kolm-Pollak Index: {kp_index:.4f}')

Excess Density Estimator (EDE): Analyzing Density-Based Autocorrelation

EDE is especially useful for capturing spatial patterns in density. The formula involves the spatial density function.

n: This is the total number of observations or data points.
i, j: These are indices used to denote a specific observation or data point. They range from 1 to n.
w_{ij}: This represents the weight assigned to the interaction between observations i and j. The weights can be based on the distance between observations, their similarity, or any other criteria.
f_i, f_j: These represent the feature values of observations i and j, respectively.
\bar{f}: This represents the average feature value across all observations.
\sum: This is the summation operator. It is used to sum up the values generated by the expression that follows it. The double summation (\sum_{i=1}^{n} \sum_{j=1}^{n}) means that we sum over all pairs of observations.

import numpy as np

def weight_func(i, j, data):
 # Calculate the Euclidean distance between the i-th and j-th observations
 distance = np.linalg.norm(data[i] - data[j])
 
 # Use the inverse distance as the weight
 # Add a small constant to the denominator to avoid division by zero
 weight = 1.0 / (distance + 1e-6)
 
 return weight

def compute_ede(data):
 n = len(data)
 f_bar = np.mean(data, axis=0)

 numerator = 0
 denominator = 0
 for i in range(n):
 for j in range(n):
 weight = weight_func(i, j, data)
 diff = data[i] - f_bar
 numerator += weight * np.dot(diff, diff)
 denominator += weight

 ede = numerator / denominator
 return ede

# Assuming your_data_array is a 2-D numpy array
your_data_array = np.array([...]) # replace with your data

ede_statistic = compute_ede(your_data_array)
print(f'Excess Density Estimator (EDE): {ede}')
density_estimator = ede

Combining Statistics into an Index

To visualize the overall spatial autocorrelation pattern, combine multiple statistics into an index. A simple weighted sum or average works well:

# Combine statistics into an index
# Combine statistics into an index
index = 0.25 * moran_statistic + 0.25 * kp_ede + 0.25 * kp_index + 0.25 * density_estimator

This index can be added to your GeoDataFrame and visualized.

Originally published at https://medium.com on November 11, 2023.