# Spatial Intersects with Geopandas

Jul 21, 2020 · 5 min read

A tutorial on how to intersect point features inside a polygon boundary in python geopandas

Intersection is one of the most commonplace geospatial analysis tool in GIS (Geographic Information Systems). The simplest intersect method is where various input geometric features (points, polygons, lines) overlap with one another, to derive the overlaid features as the output.

Usually, this can be done simply in a GIS software such as QuantumGIS or Esri ArcGIS. However I am getting a bit jaded switching back-and-forth between programming environments and GIS GUIs, especially when working on relatively simple tasks.

Therefore, here’s a step-by-step example of how to intersect datapoints features within the boundaries of a polygon. For non-GIS folks out there, think of it as another tool in your wrangling tool-set, if you ever need to filter a dataframe based on a spatial overlap.

Step 1: Observing the spatial distribution of our points of interest

I’ve collected and wrangled together a dataset of measurements from air monitoring sensors for the city of Seattle (Washington) from PurpleAir. Or so I thought: the dataset I downloaded comprises of the full extent of the map displayed, therefore I’ve got datapoints that fall outside of the area of interest.

Looking at the city of Seattle’s boundaries on Google Maps (left, below), the datapoints I’m interested in should be mostly spread over a north-south axis, and not be dispersed too much over a west-east axis. This is something that can be roughly observed by plotting the longitude against the latitude on a scatterplot (right, below).

But I hear you: this is too rough of a visual approximation! Wouldn’t it be nice if, for instance, we were to transpose the city’s boundaries over to the scatterplot?

Step 2: Get boundary (polygon) data to intersect our points with

Where does one get data for the boundaries of places? Well, you’re in luck, since that tends to be one of the easiest spatial features to find. A site I’ve been using for many years is overpass-turbo, a nifty little tool to extract mapping features from OpenStreetMap (OSM), the “Wikipedia” of open-source GIS data. Simply navigate to your area of interest on the right-side of the interface, and click on ‘Wizard’ to define your features of interest. Using OSM Wiki, we know that U.S. administrative boundaries for cities are defined as “admin_level = 8”, which will return all administrative boundaries within the extent of the map on display. Again, we just want Seattle, so amend the Wizard query to “admin_level = 8 & name=Seattle”, and voilà:

Another common source for data pertaining to municipalities are government open-data warehouses such as the City of Seattle Open Data portal, which also provides an option to get city limits as a GeoJSON.

Step 3: Importing a GeoJSON into Pandas

`import geopandas as gpdfname = “./qgis/outputs/seattle_B.geojson” # Your filepath herepoly = gpd.read_file(fname)print(type(poly))poly.head()`

As you can see, the GeoJSON was imported as a geoDataFrame object (‘geopandas.geodataframe.GeoDataFrame’), since we used geopandas to import the file.

Step 4: Mapping the city boundaries on the scatterplot

Using matplotlib once more:

`import matplotlib.pyplot as pltfig, ax = plt.subplots()ax = sns.scatterplot(x=”Longitude”, y=”Latitude”, data=df) # this is plotting the datapoints from the PurpleAir dataframepoly.plot(ax=ax, facecolor=’none’, edgecolor=’purple’);# plotting the city's boundaries here, with facecolor = none to # remove the polygon's fill colorplt.tight_layout();`

Neat! Now we can clearly observe where our datapoints are distributed relative to Seattle’s city borders. Therefore, we need to run an intersection so that only the points within the polygon are retained in our dataset.

Step 5: Intersect the datapoints with the city’s polygon

First, we need to convert the datapoint dataframe into a geoDataFrame, since we’ll be using a geopandas function (otherwise this will not work):

`# Convert the purpleAir dataframe to a geodataframepoints_gdf = gpd.GeoDataFrame( df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude))type(points_gdf) # check here if the conversion worked# Important: you may need to reset the dataframe index if you had an index defined.`

This part below brought to you by: read package documentation for smoother sailingᵀᴹ.

`import shapely.speedupsshapely.speedups.enable() # enable shapely.speedups which makes queries running faster.pip_mask = points_gdf.within(poly.loc[0, ‘geometry’]) # selecting the polygon's geometry field to filter out points that # are not overlaidpip_data = df.loc[pip_mask]`

What we did here was to create a “mask” of the points which fell within the polygon, and used it to filter the main dataframe (df) using the .loc() function.

We map once more our datapoints with the city’s polygon:

`import matplotlib.pyplot as pltfig, ax = plt.subplots()ax = sns.scatterplot(x=”Longitude”, y=”Latitude”, data=pip_data)poly.plot(ax=ax, facecolor=’none’, edgecolor=’purple’);plt.tight_layout();`

Not a bad option if you’re looking to take a break from switching back-and-forth while wrangling your spatial dataset!

Importing GeoJSON into pandas: https://ocefpaf.github.io/python4oceanographers/blog/2015/03/30/geo_pandas/

Geopandas documentation on intersecting features: https://automating-gis-processes.github.io/CSC18/lessons/L4/point-in-polygon.html

## Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data…

### By Analytics Vidhya

Latest news from Analytics Vidhya on our Hackathons and some of our best articles! Take a look.

Medium sent you an email at to complete your subscription.

Written by

## HP-Nunes

Aspiring data analyst with a background in GIS. Finishing a Master’s in Environmental Assessment on participatory air monitoring and Citizen Science.

## Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Written by

## HP-Nunes

Aspiring data analyst with a background in GIS. Finishing a Master’s in Environmental Assessment on participatory air monitoring and Citizen Science.

## Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

## Most Data Is Normal: Dating Is Anything But

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app