Spatial joins in GeoPandas

You can join two GeoPandas GeoDataFrames through conventional means with merge, but you can also use sjoin to capitalize on the spatial relationship between two frames.

import pandas as pd
import geopandas as gpd
import requests
from shapely.geometry import Point

Let’s get some zip centroids from the US Census. This is the same zip points dataset we used in my recent Reverse Geocoding post.

url = ‘http://www2.census.gov/geo/docs/maps-data/data/gazetteer/2016_Gazetteer/2016_Gaz_zcta_national.zip'
zips = pd.read_csv(url, dtype={‘GEOID’ : ‘str’},sep=’\t’, usecols=[0,5,6])
zips.columns = zips.columns.str.strip() #some column cleanup
print (len(zips))
print (zips.head())
33144
GEOID INTPTLAT INTPTLONG
0 00601 18.180555 -66.749961
1 00602 18.361945 -67.175597
2 00603 18.455183 -67.119887
3 00606 18.158345 -66.932911
4 00610 18.295366 -67.125135

We’ll convert this from a Pandas DataFrame to a GeoPandas GeoDataFrame straight away. It’s a pretty simple process, we just create some geometry which is a Shapely Point object in our case. We also set the frame’s Coordinate Reference System to a popular projection. See the docs for more details

geom = zips.apply(lambda x : Point([x[‘INTPTLONG’],x[‘INTPTLAT’]]),           axis=1)
zips = gpd.GeoDataFrame(zips, geometry=geom) #geom is a Series
zips.crs = {‘init’ :’epsg:4326'}
print (zips.head())
GEOID   INTPTLAT  INTPTLONG                              geometry
0 00601 18.180555 -66.749961 POINT (-66.749961 18.180555)
1 00602 18.361945 -67.175597 POINT (-67.175597 18.361945)
2 00603 18.455183 -67.119887 POINT (-67.11988700000001 18.455183)
3 00606 18.158345 -66.932911 POINT (-66.932911 18.158345)
4 00610 18.295366 -67.125135 POINT (-67.125135 18.295366)

Let’s create another GeoDataFrame. This time using a geojson file containing the US states.

geojson_file = ‘data/gz_2010_us_040_00_500k.json’
states = gpd.read_file(geojson_file)[[‘NAME’, ‘geometry’]]
print (len(states))
print (states.head())
52
NAME geometry
0 Maine (POLYGON ((-67.619761 44.519754, -67.61541 44....
1 Massachusetts (POLYGON ((-70.832044 41.606504, -70.823735 41...
2 Michigan (POLYGON ((-88.684434 48.115785, -88.675628 48...
3 Montana POLYGON ((-104.057698 44.997431, -104.250145 4...
4 Nevada POLYGON ((-114.0506 37.000396, -114.049995 36....

As you can see, states geometry is being represented by polygons.

Now that we have two frames, we can join them. The first two args are self-explanatory. The op arg stands for operation. In our case, we want zip points that are within state polygons. Other options include intersects and contains. I’m taking the default on the how arg which is inner (left and right are the other choices)

zips_and_states = gpd.sjoin(zips, states, op=’within’)
print (zips_and_states[[‘GEOID’, ‘NAME’, ‘index_right’]].head())
   GEOID         NAME  index_right
0 00601 Puerto Rico 16
1 00602 Puerto Rico 16
2 00603 Puerto Rico 16
3 00606 Puerto Rico 16
4 00610 Puerto Rico 16

We now have zips(GEOID) alongside state names. We also have a new column (index_right) which is something that sjoin adds. It’s handy if you need to join back to states to get it’s geometry which is not included in the sjoin.

Check out the code here