A tutorial on how to intersect point features inside a polygon boundary in python geopandas
Intersection is one of the most commonplace geospatial analysis tool in GIS (Geographic Information Systems). The simplest intersect method is where various input geometric features (points, polygons, lines) overlap with one another, to derive the overlaid features as the output.
Usually, this can be done simply in a GIS software such as QuantumGIS or Esri ArcGIS. However I am getting a bit jaded switching back-and-forth between programming environments and GIS GUIs, especially when working on relatively simple tasks.
Therefore, here’s a step-by-step example of how to intersect datapoints features within the boundaries of a polygon. For non-GIS folks out there, think of it as another tool in your wrangling tool-set, if you ever need to filter a dataframe based on a spatial overlap.
Step 1: Observing the spatial distribution of our points of interest
I’ve collected and wrangled together a dataset of measurements from air monitoring sensors for the city of Seattle (Washington) from PurpleAir. Or so I thought: the dataset I downloaded comprises of the full extent of the map displayed, therefore I’ve got datapoints that fall outside of the area of interest.
Looking at the city of Seattle’s boundaries on Google Maps (left, below), the datapoints I’m interested in should be mostly spread over a north-south axis, and not be dispersed too much over a west-east axis. This is something that can be roughly observed by plotting the longitude against the latitude on a scatterplot (right, below).
But I hear you: this is too rough of a visual approximation! Wouldn’t it be nice if, for instance, we were to transpose the city’s boundaries over to the scatterplot?
Step 2: Get boundary (polygon) data to intersect our points with
Where does one get data for the boundaries of places? Well, you’re in luck, since that tends to be one of the easiest spatial features to find. A site I’ve been using for many years is overpass-turbo, a nifty little tool to extract mapping features from OpenStreetMap (OSM), the “Wikipedia” of open-source GIS data. Simply navigate to your area of interest on the right-side of the interface, and click on ‘Wizard’ to define your features of interest. Using OSM Wiki, we know that U.S. administrative boundaries for cities are defined as “admin_level = 8”, which will return all administrative boundaries within the extent of the map on display. Again, we just want Seattle, so amend the Wizard query to “admin_level = 8 & name=Seattle”, and voilà:
Click on the ‘Export’ button and select ‘download as GeoJSON’:
Another common source for data pertaining to municipalities are government open-data warehouses such as the City of Seattle Open Data portal, which also provides an option to get city limits as a GeoJSON.
Wait what’s a GeoJSON?
Step 3: Importing a GeoJSON into Pandas
import geopandas as gpdfname = “./qgis/outputs/seattle_B.geojson” # Your filepath herepoly = gpd.read_file(fname)print(type(poly))poly.head()
As you can see, the GeoJSON was imported as a geoDataFrame object (‘geopandas.geodataframe.GeoDataFrame’), since we used geopandas to import the file.
Step 4: Mapping the city boundaries on the scatterplot
Using matplotlib once more:
import matplotlib.pyplot as pltfig, ax = plt.subplots()ax = sns.scatterplot(x=”Longitude”, y=”Latitude”, data=df)
# this is plotting the datapoints from the PurpleAir dataframepoly.plot(ax=ax, facecolor=’none’, edgecolor=’purple’);
# plotting the city's boundaries here, with facecolor = none to
# remove the polygon's fill colorplt.tight_layout();
Neat! Now we can clearly observe where our datapoints are distributed relative to Seattle’s city borders. Therefore, we need to run an intersection so that only the points within the polygon are retained in our dataset.
Step 5: Intersect the datapoints with the city’s polygon
First, we need to convert the datapoint dataframe into a geoDataFrame, since we’ll be using a geopandas function (otherwise this will not work):
# Convert the purpleAir dataframe to a geodataframepoints_gdf = gpd.GeoDataFrame(
df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude))type(points_gdf) # check here if the conversion worked# Important: you may need to reset the dataframe index if you had an index defined.
This part below brought to you by: read package documentation for smoother sailingᵀᴹ.
# enable shapely.speedups which makes queries running faster.pip_mask = points_gdf.within(poly.loc[0, ‘geometry’])
# selecting the polygon's geometry field to filter out points that
# are not overlaidpip_data = df.loc[pip_mask]
What we did here was to create a “mask” of the points which fell within the polygon, and used it to filter the main dataframe (df) using the .loc() function.
We map once more our datapoints with the city’s polygon:
import matplotlib.pyplot as pltfig, ax = plt.subplots()ax = sns.scatterplot(x=”Longitude”, y=”Latitude”, data=pip_data)poly.plot(ax=ax, facecolor=’none’, edgecolor=’purple’);plt.tight_layout();
Not a bad option if you’re looking to take a break from switching back-and-forth while wrangling your spatial dataset!
Importing GeoJSON into pandas: https://ocefpaf.github.io/python4oceanographers/blog/2015/03/30/geo_pandas/
Geopandas documentation on intersecting features: https://automating-gis-processes.github.io/CSC18/lessons/L4/point-in-polygon.html