10 Essential Operations for Spatial Data in Python

Duccio Piovani
namR
Published in
7 min readFeb 19, 2019

--

An incomplete introduction to treating and visualising spatial data in python

Building Coordinates in Paris (France)

Four years ago I started working with geo located data, with no or little previous experience in the field. Helped by the incredible amount of altruistic people that populate online forums, I learnt, patch after patch, most of the techniques required to efficiently (or not) analyse and visualise spatial data in Python.

Today at nam.R, together with my team, I work with spatial data every day. Data on building footprints, address geolocations, administrative boundaries, industrial zones coordinates or aerial images just to mention some. For this reason I have decided to aggregate in the following list some of the most important operations we have to repeat very often, hoping that this will simplify the life of someone out there approaching the field.

Disclaimer: there are probably smarter and more elegant ways to solve the problems I have presented, but the codes below definitely get the job done. I really hope this list will be useful to people just getting started in the field.

  1. Read Geometries

First things first: lets start by reading the geometries.

from csv

geoms = pd.read_csv('geometries.csv')

Immagine the file contains polygons under the columns geometry we now have to convert them to a geometry type ( by default they will be read as strings). Typically the geometry will be written in the well-known text format, which we can convert using the library shapely

# if the geometries are stored as well-know text
geoms['geometry'] = geoms['geometry'].apply(lambda g: wkt.loads(g))

from shapefile

If the file you want to read is a shapefile then we can use the geo-pandas library, which will handle the geometry types automatically

import geopandas as gpd
gdf = gpd.read_file('geometries.shp')

2. Change coordinate reference system (CRS)

When dealing with spatial data, often one needs to pass from one coordinate system to another one. Every part of the globe has its particular coordinate reference system or CRS, which minimises the error of treating 3D (almost) spherical coordinates, such as those referring to points on the earth, on a 2D plane. The most common CRSs are labelled with a specific code called epsg.

Using the Proj and Transform function of the pyproj library (point pojection)

from pyproj import Proj, transform# the crs of the point (in this case GPS Lat/Lon)
inProj = Proj(init='epsg:4326')
# the desired crs (in this case Lambert, French crs)
outProj = Proj(init='epsg:2154')
# the coordinates of the center of paris in Long/Lat
xold, yold = 2.349014, 48.864716
# project the coordinates in the new system.
xnew,ynew = transform(inProj,outProj, xold,yold)

In order to change the coordinate system of an entire file we can use geopandas

import geopandas as gpd#read the geo-dataframe
gdf = gpd.read_file('geometries.shp')
# define the current crs
gdf.crs = {'init':'epsg:4326'}
# project to a new crs
gdf = gdf.to_crs({'init':'epsg:2154'})

3. Extract coordinates and Plot Polygons in Matplotlib

Even though this may seem a trivial thing, a problem I often faced was to extract the coordinates of a geometry and use them to visualise it. I am still not sure this is the best way to do it but the code below does work. In the following example we will look how to do so for a polygon and a multipolygon (which is a collection or array of polygons)

import matplotlib.pyplot as plt
import numpy as np

# get the geometries from my DataFrame (or GeoDataFrame)
gp = geoms.loc[0,'geometry'] # Polygon
gm = geoms.loc[1,'geometry'] # Multi-Polygon
# extract their coordinates as a numpy array
# BEWARE: If you are dealing with multi-polygons you have to specify
# which polygon you are interested in. Usually its the first.
c1 = np.array(gp.exterior)# Polygon
c2 = np.array(gm[0].exterior)# Multi-Polygon
fig, (ax1,ax2) = plt.subplots(ncols=2,figsize=(10,7))ax1.fill(c1[:,0], c1[:,1], c='darkgreen' ,alpha=0.6, linewidth=4)
ax1.plot(c1[:,0], c1[:,1], c='darkgreen' ,alpha=0.9, linewidth=4)
ax2.fill(c2[:,0], c2[:,1], c='orange' ,alpha=0.6, linewidth=4)
ax2.plot(c2[:,0], c2[:,1], c='orange' ,alpha=0.9, linewidth=4)
ax1.set_title('Polygon 1',fontsize=20)
ax2.set_title('Polygon 2',fontsize=20)
plt.legend()
plt.show()
Example of two polygons visualised with matplotlib

4. Geometric Features with Shapely

Once we have read the geometries, we can easily calculate their basic geometric features, exploiting the library shapely, with the following commands

area = geoms['geometry'].apply(lambda g: g.area)perimeter = geoms['geometry'].apply(lambda g: g.length)convex_hull = geoms['geometry'].apply(lambda g: g.convex_hull)bounding_box = geoms['geometry'].apply(lambda g: g.bounds)

5. Find Intersections, between Polygons (the dummy way)

Calculating the intersections between a geometry g1, and a list of geometries geoms, can be a very long operation wich grows with the complexity of the shapes. A clever way of optimizing it is by spatially indexing the shapes, and a number of techniques exist to do so. We will see a few of these further down. But indexing the geometries does take some time and therefore for lists of small sizes we can simply use two functions built in shapely, intersection and intersects.

Here is an example:

# find the geometries with a non zero intersection with g1
bool_list = geoms['geometry'].apply(lambda g: g.intersects(g1))
# calculate the area of the intersections
areas = geoms.loc[bool_list,'geometry'].apply(lambda g: g.intersection(g1).area)

6. K-Nearest Neighbours with KD-tree

Calculating the nearest neighbours of list of n points, implies that one has to calculate the distance of every point to every other point, which needs n operations n times. This number therefore scales as the square of the size of the list, n², becoming very big very quickly.

The KD-Tree algorithm, by spatially organising the points on a tree, saves us many waiting hours by reducing the number of operations to n * log(n).

The following code is an example of how we can implement it using the scipy spatial library

from scipy import spatial#number of nearest neighbours
k=5
# if our geometries are polygons we start getting their centroids
centroids = geoms['geometry'].apply(lambda g:[g.centroid.x,g.centroid.y]).tolist()
# I create the Tree from my list of point
kdtree = spatial.KDTree(centroids)
# I calculate the nearest neighbours and their distance
neigh = kdtree.query(centroids, k=k)

7. Find all neighbours in a range with KD-tree

By following the same ideas introduced above we can calculate all the neighbours within a given range r

from scipy import spatial# We define the range
radius=100
# Like in the previous example we populate the KD-tree
kdtree = spatial.KDTree(centroids)
neigh_list = []# We cycle on every point and calculate its neighbours
# with the function query_ball_point
for m, g in enumerate(centroids):
neigh_list.append(kdtree.query_ball_point(g, r=radius))

8. Find the Maximum Intersection with R-tree

R-tree spatial indexing builds a tree to efficiently query 2D or 3D polygons by treating their bounding box. Like the KD-tree algorithm for points, the R-tree algorithm speeds up all spatial queries relating to polygons. In the following code we give an example of how to find, given a polygon go, the polygon with the biggest intersection in a list of geometries. To do this we need to install and import the rtree package.

from rtree import index# we create the spatial indexes from the bounding boxes of the 
# geometries.
idx = index.Index()
for n, g in enumerate(geoms['geometry']):
idx.insert(n,g.bounds)
# the polygons whose bounding boxes intersect go
possible_intesects = list(idx.intersection(go.bounds))
# the area of the intersections
intersections = [g.intersection(go).area for g in geoms.loc[possible_intesects,'geometry']]
# we take the polygon with the greatest intersection
id_poly_max = np.array(intersections).argmax()
poly_max = possible_intesects[id_poly_max]

9. Plotting Spatial Heatmaps with Geopandas

Thanks to geopandas we can easily plot choropleth maps of given quantities. In the example that follows we plot the population’s spatial distribution of Italian regions.

import geopdandas as gpd# read the shapefile with geopandas 
regions = gpd.read_file('italian_regions.shp')
#plot the spatial distribution of the population
fig, ax = plt.subplots(figsize=(10,8))
ax.set_title('Italian Population Distribution per Region',fontsize=18)
reg.plot(column ='population', cmap = 'Purples', legend=True, edgecolor='r',ax=ax)
ax.axis('off')

10. Plotting Polygons on a Map with Folium

In this example we will see how to plot the boundaries of the french departments with the map as a background. Once again this operation is made very simple by geopandas and folium:

import geopandas as gpd
import folium
# read the shape files of the french departments
dep = gpd.read_file('departments.shp')
# set the coordinates to Long/Lat WGS84
dep.crs = {'init':'epsg:4326'}
# define the folium map object
M = folium.Map(location=(47, 5),zoom_start=6)
# add the geometries
folium.GeoJson(dep.geometry.to_json()).add_to(M)
M

the map in this case looks like this:

And this is it.

Any comments on how to improve the code are not only welcomed but encouraged.

At nam.R (https://namr.com), we provide geo-spatial intelligence by aggregating, cleaning, organising and finally enriching a large amount of geo-localised data. Our goal is to build a product that helps companies, cities and organisations navigate the ecological transition and their business challenges.

--

--

Duccio Piovani
namR
Writer for

Climbing the giants’ shoulders one commit at the time. Lead Data Scientist at WFP, Hunger Unit.