Working with Open Data shape files using Geopandas — how to match up your data with the areas defined in the shape file

A Gordon
DataExplorations
Published in
4 min readDec 10, 2018

I’m working on a project to gather information on historic buildings and other points of interest in the City of Toronto and wanted to generate some choropleth plots based on the data I’ve gathered so far. I downloaded the official neighbourhood boundaries shape file from Toronto Open Data and needed to find a way to match up my points of interest (with latitude/longitude) to the appropriate neighbourhood. How could I find out which area those coordinates fall into?

Happily, I found this project by Craig Booth in which he tackled a similar problem for Chicago and I was able to borrow and adapt his approach.

Reading shape files

First I used geopandas to import the shape file

import geopandas as gpddf_neighbourhoods = gpd.read_file("../data/Neighbourhoods/NEIGHBORHOODS_WGS84.shp")

This dataframe contains each neighbourhood and the coordinates that define the outline of its boundaries

We can take advantage of Geopandas/shapely functionality to extract the boundary coordinates for each area. The geometry column is of type shapely.coords.CoordinateSequence, so executing exterior.coords will return the exterior coordinates as a list. I then appended that back onto the dataframe in the geomlist column

df_neighbourhoods['geomlist'] = df_neighbourhoods['geometry'].apply(lambda x: list(x.exterior.coords))

Create a function for determining if a point lies inside the shape boundary

Having this information as a list allows us to take advantage of the slightly modified function from Craig Booth. The function, point_inside_polygon() accepts a set of coordinates and the “list-ified” boundary coordinates for a given neighbourhood and checks whether the coordinates fall within the boundary.

  • returns true if lat/long are within the bounds of the polygon
  • poly: a list of lat/long coordinates that define the boundary of an area
def point_inside_polygon(lat,lng,poly):
n = len(poly)
inside =False
p1x,p1y = poly[0]
for i in range(n+1):
p2x,p2y = poly[i % n]
if lat > min(p1y,p2y):
if lat <= max(p1y,p2y):
if lng <= max(p1x,p2x):
if p1y != p2y:
xinters = (lat-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
if p1x == p2x or lng <= xinters:
inside = not inside
p1x,p1y = p2x,p2y
return inside

Test it out…

Ok, great… let’s test this out and see if a couple sets of coordinates fall within the Yonge-St. Clair neighbourhood.

The City of Toronto Open Data portal shows an outline of this area

Now let’s check the coordinates of a point that should fall within this area: (43.685052, -79.396575), shown here on Google Maps

The method returns true, indicating correctly that the point does fall in this area


poly = df_neighbourhoods.loc[0,'geomlist']
lat=43.685052
lng=-79.396575
point_inside_polygon(lat, lng, poly)
>> True

Now let’s try a point that falls just outside the boundaries of this area (43.698740, -79.400233):

The method returns False, which again is correct!

poly = df_neighbourhoods.loc[0,'geomlist']
lat=43.698740
lng=-79.400233
point_inside_polygon(lat, lng, poly)
>> False

Match Coordinates to Neighbourhoods

Now that we have confidence that the function is working as expected, we can loop through the main dataframe to find the neighbourhood for all the points of interest

def get_neighbourhood(row):
for ix, area in df_neighbourhoods.iterrows():
is_in_area=False
if row['latitude'] and row['longitude']:
is_in_area = point_inside_polygon(row['latitude'], row['longitude'], area['geomlist'])
if is_in_area:
#found area, exit
return area['AREA_NAME']
return ""
df_poi['neighbourhood'] = df_poi.apply(lambda row: get_neighbourhood(row), axis=1)

You can find the source code for this post on Github. In my next post, I’ll use this data to create a choropleth map in Altair

Helpful Resources

--

--