Counting Stuff in Geopandas

5 min readOct 18, 2021

In my last blog entry I described python’s Geopandas library with a focus on using it, and some related libraries like Folium and contextily, to create map visualizations. Pretty pretty maps are only one side of the Geopandas coin as you may remember it builds on the spatial analysis capabilities of the Shapely library, making it a powerful tool to analyze geodata.

What kind of analysis you might ask? Types that you see everyday when you use apps like Yelp, or search for businesses on Google Maps: How many restaurants are within one mile of my location? How many hospitals are in my zip code area?

screenshot of Google Maps — After working with Geopandas Google Maps just isn’t as impressive…

To demonstrate what you can do with Geopandas I’m going to walk you through a simple example using two shapefiles:

- One containing the building footprints of all the individual landmark sites in Brooklyn.
- One containing the zip code area boundaries for Brooklyn.

Both shapefiles were downloaded from NYC Open Data and some processing was done “off stage” for the purposes of this tutorial:

The CRS of the landmark shapefile was reset using Geopandas’ set_crs method to match that of the area zip code file
Both files were reduced to only contain landmarks and zip code areas in Brooklyn

To begin with we’ll want to import the necessary libraries and set our shapefiles as GeoDataFrames.

import geopandas
import matplotlib.pyplot as plt
%matplotlib inlinebrooklyn_zip = geopandas.read_file(‘Data/Brookly_Zips.shp’)
brooklyn_landmarks = geopandas.read_file(‘Data/Brooklyn_Landmarks.shp’)

Then it’s always a good idea to plot your data to make sure that everything is lining up as expected:

#Do yourself a favor and check out your matplotlib color options at
# https://matplotlib.org/stable/gallery/color/named_colors.htmlbase = brooklyn_zip.plot(ax = ax, color=’mediumaquamarine’, edgecolor=’gainsboro’)
brooklyn_landmarks.plot(ax=ax, color=’royalblue’)ax.set_title(“Brooklyn Landmarks”)
ax.axis(‘off’)

Map of Brooklyn zip code areas overlaid with the locations of NYC landmarks — Map output from above code

So now we have a map that shows us where landmarked properties are in Brooklyn and how that relates to the zip code areas. But what if you need some hard numbers? How can we get the count of landmarks in each zip code? The brooklyn_landmarks GeoDataFrame doesn’t contain the full addresses of the landmarks so we can’t groupby() the zip codes; instead we’re going to get that information by combining our two GDF’s.

This is where the Geopandas overlay function comes into play. Overlay allows us to create new shapes based on where the two GeoDataFrames overlap and outputs a GeoDataFrame which contains not only the new shape but also the attributes from both inputs (like a pandas join with more geometry).

Here you can see the types of methods that can be applied to the shapes in the two files:

Shows visualizations of intersections, unions, sypmetrical differences, and differences — Source: https://geopandas.org/docs/user_guide/set_operations.html

In this case, since we want to count the landmarks in the zip code areas we’ll be using the “intersection” method which returns a new shape for every area where the two GeoDataFrames overlap.

contains = geopandas.overlay(brooklyn_zip, brooklyn_landmarks, how = ‘intersection’)

Taking a quick look at contains.info() you’ll see columns from both files — the ones carried over from the zip code shapefile are in all-caps:

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 31 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   ZIPCODE     214 non-null    object  
 1   BLDGZIP     214 non-null    object  
 2   PO_NAME     214 non-null    object  
 3   POPULATION  214 non-null    float64 
 4   AREA        214 non-null    float64 
 5   STATE       214 non-null    object  
 6   COUNTY      214 non-null    object  
 7   ST_FIPS     214 non-null    object  
 8   CTY_FIPS    214 non-null    object  
 9   URL         214 non-null    object  
 10  SHAPE_AREA  214 non-null    float64 
 11  SHAPE_LEN   214 non-null    float64 
 12  address     214 non-null    object  
 13  bbl         214 non-null    float64 
 14  block       214 non-null    float64 
 15  borough     214 non-null    object  
 16  date_des_d  214 non-null    object  
 17  time_des_d  214 non-null    object  
 18  landmark_t  214 non-null    object  
 19  lot         214 non-null    float64 
 20  lpc_altern  52 non-null     object  
 21  lpc_lpnumb  214 non-null    object  
 22  lpc_name    214 non-null    object  
 23  lpc_site_d  213 non-null    object  
 24  lpc_site_s  214 non-null    object  
 25  objectid    214 non-null    float64 
 26  objectid_1  214 non-null    float64 
 27  shape_area  214 non-null    float64 
 28  shape_leng  214 non-null    float64 
 29  url_report  214 non-null    object  
 30  geometry    214 non-null    geometry
dtypes: float64(11), geometry(1), object(19)
memory usage: 52.0+ KB

With our data consolidated we can group the new GeoDataFrame by the ‘ZIPCODE’ column and use .size() to get a count of the number of landmarks in each area code:

groups = contains.groupby(‘ZIPCODE’).size()# Display the 10 zip codes with the highest number of landmarks 
print(groups.sort_values(ascending=False)[:10])ZIPCODE
11201    43
11216    15
11215    14
11238    13
11211    11
11205     9
11226     8
11221     8
11213     7
11222     7
dtype: int64

To make this data easier to visualize we’re going to map the groups data into a new column of the brooklyn_zip GeoDataFrame called “num_landmarks”:

brooklyn_zip[‘num_landmark’] = brooklyn_zip[‘ZIPCODE’].map(groups)#some zip code areas did not contain any landmarks and we need to fill the #null value cells or they polygons will not be mappedbrooklyn_zip[‘num_landmark’] = brooklyn_zip[‘num_landmark’].fillna(0)

Finally we come to the visualization itself. The below code will generate a choropleth map color-coding each zip code area by the number of landmarks it contains. Note that .plot() is using “natural_breaks” as the classification scheme. This is a “optimal” classification scheme that minimizes within-class variance and maximizes between-class differences.*

import contextily as cx
fig, ax = plt.subplots(figsize=(10,10))brooklyn_zip.plot(ax=ax,column = 'num_landmark',scheme='natural_breaks', k=4, vmin = 0, cmap = 'cool',
                  edgecolor= 'gainsboro', alpha=.85, legend = True, figsize= (10,10))
cx.add_basemap(ax, crs=brooklyn_zip.crs, source=cx.providers.Stamen.TonerLite)ax.set_title("Brooklyn Zip Code Area Landmark Count")
ax.axis('off')

Map of Brooklyn area codes color coded with an cmap set called “cool”. It’s very 80s, ranging from lower values set to a bright turquoise to high values in bright magenta. — Sadly the ‘cool’ cmap color scheme doesn’t quite live up to its name…

This way of using overlay isn’t the only Geopandas tool that will show you how many of one object lie within another — there’s also the contains() method. Unlike the overlay function, which returns a new GeoDataFrame merging the data from both sets used, contains simply returns a Series where the valueTrue is given for each geometry that contains the input geometry.

test_landmark = brooklyn_landmarks[‘geometry’][0]
print(test_landmark)
results = brooklyn_zip[‘geometry’].contains(test_landmark)
# the index of the True value in results will give us the index of the row with the correct zip code in brooklyn_zip
# in this case row 13 holds the data on the 11201 zip code
results0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13     True .....

This is, as I’ve said before, only the tip of the iceberg when it comes to Geopandas spatial analysis capabilities, and I hope this blog entry has given you a base from which you can explore further on your own.

Counting Stuff in Geopandas

Written by Alexandra Marshall