Start Mapping with GeoPandas

Alexandra Marshall
Sep 27, 2021


Magazines are pretty much dead these days, so readers may not remember that if you subscribed to National Geographic, each month’s copy arrived with an additional map stuck between its pages. These beautiful maps, usually related to the magazine’s cover story, covered numerous subjects, from specific geographic landmarks like the Grand Canyon to more abstract concepts like the structure of the earth’s crust or how environments and landscapes changed over time. I loved spreading them out across our dining room table to pore over them, learning along the way that maps could contain more information than just “here’s where this thing is.”

May 1966 National Geographic Magazine map showing the globe-spanning origins of various flowers.

Imagine my delight when I discovered that, among its many libraries, Python has one that lets you easily create map visualizations if you have a passing familiarity with Pandas and Matplotlib. Called GeoPandas, it builds on Pandas’ data structures, Matplotlib’s plotting capabilities, and the Shapely library’s spatial analysis functionality to analyze, manipulate, and visualize geodata.

GeoData Building Blocks

One of the most important things to know when working with geodata is what kind of object your information describes. GeoPandas data structures can hold three different types of geometric object, or feature, data:

  • Points (and Multi-Points, which are collections of points): These can represent cities, businesses, or sites of historical significance, anything that could be described in real life by sticking a pin into a map. A point is defined by a single set of coordinates.
  • Lines (and Multi-Lines): Lines can be used to represent streets, contour lines, and boundaries. They are defined by at least two sets of coordinates, but can contain more as long as they remain open.
  • Polygons (and … oh, you know the drill): Polygons represent “the shape and location of homogenous feature types such as states, countries, parcels, soil types, and land-use zones”*. Polygons are closed geometric objects with at least three sides, described by multiple pairs of coordinates.
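GeoPandas stores these features as Shapely geometry objects, so a quick way to get a feel for the three types is to build one of each by hand. This is only a minimal sketch: the coordinates and variable names are made up for illustration.

```python
from shapely.geometry import LineString, MultiPoint, Point, Polygon

# A point: a single (x, y) coordinate pair.
pin = Point(-73.99, 40.73)

# A line: two or more coordinate pairs that stay open.
street = LineString([(0, 0), (1, 1), (2, 1)])

# A polygon: a closed ring of coordinates (Shapely closes the ring for you).
parcel = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])

# The "Multi" variants bundle several geometries of the same kind.
pins = MultiPoint([(0, 0), (1, 2)])

for shape in (pin, street, parcel, pins):
    print(shape.geom_type)

print(parcel.area)  # 12.0, a 4-by-3 rectangle
```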

Feature data is contained in a GeoSeries, and each GeoDataFrame has at least one GeoSeries column, known as the ‘geometry’. The geometry identifies the data that GeoPandas’ methods will be applied to. A GeoDataFrame can have multiple GeoSeries columns, and while only one can be the active ‘geometry’ at a time, that status can be reassigned.
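To make the idea of an active geometry a little more concrete, here is a small sketch. The two rectangles and the column names (name, center) are invented for the example; GeoDataFrame, set_geometry, centroid, and area are regular GeoPandas features.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two made-up rectangles standing in for real features.
shapes = [
    Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
    Polygon([(2, 0), (6, 0), (6, 2), (2, 2)]),
]
gdf = gpd.GeoDataFrame({"name": ["small", "large"]}, geometry=shapes)

# Add a second GeoSeries column holding each shape's centroid.
gdf["center"] = gdf.geometry.centroid

# Methods such as .area work on whichever column is the active geometry.
print(gdf.geometry.name)  # geometry
print(list(gdf.area))     # areas of the polygons

# Hand the 'geometry' role over to the points; this returns a new GeoDataFrame.
centers = gdf.set_geometry("center")
print(centers.geometry.name)  # center
print(list(centers.area))     # points have no area
```

Note that set_geometry leaves the original GeoDataFrame alone, so you can keep both views around and plot or measure whichever one you need.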

Coordinate Reference Systems

The other key piece of information when working with a GeoSeries or GeoDataFrame is its CRS, or Coordinate Reference System. The CRS defines how your features are related to real locations on the earth. A map of the earth transforms a 3-d object into a 2-d representation, and distorts the surface of the globe to achieve this transformation. You need all the geodata you’re working with to have the same kind of distortion, and the CRS serves as the point of alignment for this, ensuring that everything is wrong in the same way.

Various map projections, each with their own distortions
As you can see from this chart of projections, there’s no telling what Greenland actually looks like.

GeoPandas makes it easy to check a GeoSeries’ CRS through its .crs attribute, and to assign or convert it with the set_crs and to_crs methods.
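Here is a minimal sketch of checking and changing a CRS. The two points are rough longitude/latitude values for New York and London, chosen only for illustration; EPSG:4326 is the usual code for plain longitude/latitude and EPSG:3857 is the Web Mercator projection used by most web maps.

```python
import geopandas as gpd
from shapely.geometry import Point

# Approximate (longitude, latitude) pairs, tagged as EPSG:4326.
cities = gpd.GeoSeries(
    [Point(-74.006, 40.7128), Point(-0.1276, 51.5072)],
    crs="EPSG:4326",
)
print(cities.crs)  # identify the current CRS

# Convert the coordinates into a different CRS (Web Mercator).
web = cities.to_crs(epsg=3857)
print(web.crs.to_epsg())  # 3857

# Data that arrives without a CRS can be labeled with set_crs.
# This only attaches the label; it does not move any coordinates.
unlabeled = gpd.GeoSeries([Point(0, 0)])
labeled = unlabeled.set_crs("EPSG:4326")
```

The difference between the two methods matters: to_crs recalculates the coordinates, while set_crs just declares what the existing numbers already mean.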

Mapping, aka The Fun Stuff

Once you have identified your geometries and made sure your data sources are in alignment, GeoPandas’ .plot method makes it easy to start generating your visualizations. The below Jupyter Notebook goes over some examples of how this method is used, either alone or with other libraries.
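If you just want a feel for .plot before opening a notebook, a sketch along these lines should work. The three zones and their population figures are made up; the column, cmap, legend, edgecolor, and figsize arguments are standard options of the GeoPandas plot method.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Three made-up, side-by-side zones with invented population figures.
zones = gpd.GeoDataFrame(
    {"zone": ["A", "B", "C"], "population": [120, 450, 300]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
        Polygon([(2, 0), (3, 0), (3, 1), (2, 1)]),
    ],
)

# Color each zone by its population value (a choropleth map).
ax = zones.plot(
    column="population",
    cmap="viridis",
    legend=True,
    edgecolor="black",
    figsize=(6, 3),
)
ax.set_title("Population by zone (made-up data)")
ax.set_axis_off()
```

Because .plot returns a regular Matplotlib Axes, anything you already know about titles, labels, and saving figures in Matplotlib carries over.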

Other Libraries

The final example above uses an additional library, contextily, to pull in a base map from the web to plot the map over. GeoPandas’ visualizations can be enhanced with other libraries such as the following:
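As a rough sketch of the base map idea, the helper below reprojects a GeoDataFrame to Web Mercator (the projection most web tiles use) and asks contextily to draw tiles underneath it. The function name plot_over_basemap and the two approximate landmark coordinates are my own placeholders, and the call needs an internet connection to fetch tiles, so it is left commented out here.

```python
import geopandas as gpd
from shapely.geometry import Point


def plot_over_basemap(gdf):
    """Plot a GeoDataFrame on top of web map tiles (needs internet access)."""
    # Imported here so the rest of the script still runs without contextily.
    import contextily as cx

    # Web tiles are served in Web Mercator, so reproject first.
    web_mercator = gdf.to_crs(epsg=3857)
    ax = web_mercator.plot(figsize=(8, 8), color="red", markersize=40)
    cx.add_basemap(ax)  # downloads tiles covering the plot's extent
    ax.set_axis_off()
    return ax


# Approximate longitude/latitude values, for illustration only.
landmarks = gpd.GeoDataFrame(
    {"name": ["Times Square", "Brooklyn Bridge"]},
    geometry=[Point(-73.9855, 40.7580), Point(-73.9969, 40.7061)],
    crs="EPSG:4326",
)

# ax = plot_over_basemap(landmarks)  # uncomment when you are online
```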

  • Cartopy: generate publication-quality maps
  • Bokeh: create interactive visualizations that can be displayed in web browsers
  • Folium: visualize data and create interactive maps

The below Jupyter Notebook uses GeoPandas to generate the centroid** of each zip code area in New York City and the Folium library to create an interactive map that assigns a marker to each point that contains the population data for that area.

Non-interactive Zip code area population map with interactive markers
image of Zip code area population map with interactive markers that I promise I didn’t Photoshop

Unfortunately, there’s no way to embed a Folium map in a Medium article at this time, but clicking here will take you to a version of the notebook that allows you to interact with the map. This example aims to give you code for a simple visualization, but the Folium library is capable of much more complex ones. If you’re interested in learning more, I recommend this excellent tutorial.

Tip of the Iceberg

This article has focused on giving you the minimum information needed to start creating map visualizations with GeoPandas, but as we saw with the centroid method, GeoPandas is also a powerful tool for data manipulation and analysis. More information on these capabilities, along with an excellent “Getting Started” guide, can be found in the GeoPandas documentation.

*https://desktop.arcgis.com/en/arcmap/10.3/manage-data/geodatabases/feature-class-basics.htm

**A centroid is the point in an area whose coordinates are the mean of all the points in that area. If you were to print a shape on cardstock and cut it out, the centroid is the point at which you could perfectly balance it on the head of a pin.
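A triangle makes for an easy sanity check, since its centroid happens to equal the plain average of its three corners. The sketch below, with a made-up triangle, compares that hand calculation with what Shapely reports.

```python
from shapely.geometry import Polygon

corners = [(0, 0), (6, 0), (0, 3)]
triangle = Polygon(corners)

# Hand calculation: average the x values and the y values of the corners.
mean_x = sum(x for x, _ in corners) / len(corners)  # 2.0
mean_y = sum(y for _, y in corners) / len(corners)  # 1.0

# Shapely's answer should match up to floating-point rounding.
balance_point = triangle.centroid
print((mean_x, mean_y))
print((balance_point.x, balance_point.y))
```

For shapes with more corners the simple corner average no longer works, because the centroid is weighted by area rather than by vertices, which is why it is handy to let the library do it.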

Balancing a shape on its centroid
Balancing on the centroid
