GeoPandas with NYC Harbor Water Data

A Quick Introduction to GIS with Python’s GeoPandas

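Most of New York City sits on three islands: Manhattan and Staten Island are islands in their own right, and Brooklyn and Queens occupy the western end of Long Island. Even the Bronx, the only part of New York City connected to the mainland, is a peninsula surrounded by water on three sides. How are these waterways affected by the 9 million people who live and work there?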

How do you make data relatable? How do you summarize thousands of lines of data in one small, digestible packet? The easy answer is visualizations. So, how about a box-and-whisker plot? I hear groans and minds slamming shut. A line graph? Now the snoring starts. A pie chart? Now all the statistically literate people are rattling their sabers. Powerful as they are, visualizations are littered with pitfalls. They work for some people some of the time, but often the more engaging they are, the less meaningful or even less accurate they become. The more meaningful ones ask the observer to absorb the details and structure before the message hits home. Enter mapped data. It is immediately relatable: just look at all the arts and crafts on Etsy in the shape of states or printed with city maps. The observer is engaged and oriented at once, whatever their technical level. The map is a visual convention we silently and universally agreed upon the first time someone marked an ‘X’ for the treasure, and we have been refining it ever since. Maps transcend language and culture, and harnessing their power for data insights makes them the smart answer.


Overview

GIS (geographic information system) data come in two basic flavors: vector and raster. Vector data encode geographic features as points, each located by a longitude and latitude, along with the lines and polygons that connect those points. Think of the roads, borders, and landmarks drawn in Google Maps. Raster data store geographic information as a grid of cells (pixels), such as the satellite and aerial imagery in Google Earth. The focus here is on vector data representing components of the New York City sewer system, used to examine the impact wastewater has on the waterways that surround the city.
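To make the three vector building blocks concrete, here is a minimal sketch using shapely, the geometry library GeoPandas builds on. The coordinates are hypothetical points near lower Manhattan, chosen only for illustration. Note that shapely expects (x, y), which means (longitude, latitude) and not the reverse.

```python
from shapely.geometry import LineString, Point, Polygon

# Hypothetical (longitude, latitude) coordinates, for illustration only.
outfall = Point(-74.0130, 40.7033)  # a single location
shoreline = LineString([(-74.0180, 40.7000), (-74.0100, 40.7060), (-74.0050, 40.7080)])
park = Polygon([(-74.0170, 40.7030), (-74.0140, 40.7030),
                (-74.0140, 40.7050), (-74.0170, 40.7050)])  # ring is closed automatically

for geom in (outfall, shoreline, park):
    # geom_type names the geometry; bounds gives (minx, miny, maxx, maxy)
    print(geom.geom_type, geom.bounds)
```

The polygon's exterior ring repeats its first vertex at the end, so the four corners above become five stored coordinates.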

GIS data are readily available from many open data projects. Much of the data visualized here can be found at NYC Open Data. Many other cities make their GIS data available, as does the National Park Service. GIS data are commonly distributed as shapefiles, which are easy to find online.

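A shapefile is actually a set of files sharing one base name. Three are mandatory: .shp for the geometry, .shx for the geometry index, and .dbf for the attribute table. Most also include a .prj file describing the projection, and the whole set is usually distributed together in one folder. GeoPandas reads all of these in a single call, geopandas.read_file, creating a dataframe with a geometry column.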

Just pointing read_file at the folder that holds the shapefile's component files is sufficient.

The resulting GeoDataFrame looks like any other pandas dataframe, with the addition of a geometry column.

The geometries in this file are polygons, each an ordered set of longitude/latitude pairs that form the vertices of a shape. These shapes can be plotted immediately.
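A minimal sketch of that one-line plot, assuming two hypothetical square "neighborhoods" in place of the article's NYC file. GeoDataFrame.plot draws every geometry with matplotlib and returns the Axes, so the usual matplotlib calls still apply.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon

# Two hypothetical polygons; each is a list of (longitude, latitude) vertices.
gdf = gpd.GeoDataFrame(
    {"name": ["west", "east"]},
    geometry=[
        Polygon([(-74.02, 40.70), (-74.00, 40.70), (-74.00, 40.72), (-74.02, 40.72)]),
        Polygon([(-74.00, 40.70), (-73.98, 40.70), (-73.98, 40.72), (-74.00, 40.72)]),
    ],
    crs="EPSG:4326",
)

ax = gdf.plot()  # one call draws all polygons and returns a matplotlib Axes
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
plt.show()
```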

The axes display longitude (x) and latitude (y).
The geometry of an individual row may be accessed as well.
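Each entry of the geometry column is an ordinary shapely object, so it can be pulled out by position or label and inspected directly. The single hypothetical triangle below stands in for a real feature. In a Jupyter notebook, evaluating the geometry on its own also renders a small picture of the shape.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical one-row layer for illustration.
gdf = gpd.GeoDataFrame(
    {"name": ["pier"]},
    geometry=[Polygon([(-74.01, 40.70), (-74.00, 40.70), (-74.00, 40.71)])],
    crs="EPSG:4326",
)

geom = gdf.geometry.iloc[0]            # positional access returns a shapely Polygon
print(geom.geom_type)                  # Polygon
print(list(geom.exterior.coords))      # vertices, with the first point repeated to close
print(gdf.loc[0, "geometry"].bounds)   # label-based access to the same object
```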

Specifics
The details of the map can be adjusted by setting the parameters of the plot as with other matplotlib visualizations.
Here the color and edge color are specified as well as the layout of the axes.
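A sketch of that styling, assuming three hypothetical rectangular lots built with shapely's box(minx, miny, maxx, maxy). The color and edgecolor keywords are passed through to matplotlib. Creating the figure first with plt.subplots controls the size, and set_axis_off hides the coordinate ticks.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Hypothetical row of three rectangles, 0.01 degrees wide each.
gdf = gpd.GeoDataFrame(
    geometry=[box(-74.03 + 0.01 * i, 40.70, -74.02 + 0.01 * i, 40.71) for i in range(3)],
    crs="EPSG:4326",
)

fig, ax = plt.subplots(figsize=(8, 6))   # figure size in inches
gdf.plot(ax=ax, color="lightsteelblue", edgecolor="navy", linewidth=0.8)
ax.set_title("Styled polygons")
ax.set_axis_off()                        # hide the longitude/latitude ticks
plt.show()
```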
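The cmap parameter selects a matplotlib colormap. Combined with the column parameter, it shades each shape according to that column's values.
Choosing a sequential colormap that varies in intensity (lightness) rather than hue, such as "Blues", makes visualizations that remain readable for color-blind viewers.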
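A minimal choropleth sketch. The overflow_events column and its values are made up for illustration. The column argument picks the values to shade by, cmap="Blues" maps low values to light blue and high values to dark blue, and legend=True adds a colorbar.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Hypothetical areas with a made-up numeric attribute.
gdf = gpd.GeoDataFrame(
    {"overflow_events": [2, 15, 40, 7]},
    geometry=[box(-74.04 + 0.01 * i, 40.70, -74.03 + 0.01 * i, 40.71) for i in range(4)],
    crs="EPSG:4326",
)

fig, ax = plt.subplots(figsize=(8, 4))
# Shade each polygon by its value using a single-hue sequential colormap.
gdf.plot(column="overflow_events", cmap="Blues", edgecolor="gray", legend=True, ax=ax)
plt.show()
```

A single-hue ramp like this one encodes the value in lightness alone, so it does not depend on telling hues apart.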
GIS data may include lines rather than polygons.
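Each shapefile stores a single geometry type (points, lines, or polygons), so a GeoDataFrame read from one shapefile normally contains only that type. GeoPandas itself does allow mixed geometry types in one GeoDataFrame, which can be useful when combining sources.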
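Line layers plot exactly like polygons. The sketch below uses two hypothetical pipe segments with a made-up diameter_in column, and checks the layer's geometry types with geom_type, which is a quick way to confirm what a freshly read file contains.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import LineString

# Hypothetical sewer pipe segments as (longitude, latitude) polylines.
pipes = gpd.GeoDataFrame(
    {"diameter_in": [12, 24]},
    geometry=[
        LineString([(-74.010, 40.700), (-74.005, 40.705), (-74.000, 40.706)]),
        LineString([(-74.000, 40.706), (-73.995, 40.710)]),
    ],
    crs="EPSG:4326",
)

# geom_type reports each row's type; a single-type layer shows one value.
print(pipes.geom_type.unique())

ax = pipes.plot(color="saddlebrown", linewidth=2)
plt.show()
```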
Layering maps brings it all together.

A key element in layering maps is the coordinate reference system (CRS). A projected CRS defines how the curved, 3-dimensional surface of the earth is flattened onto a 2-dimensional map, and many different projections have arisen over the history of map making. Before layering, the GeoDataFrames must all be converted to the same CRS, or the layers will not line up.
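A sketch of the reprojection step, assuming two hypothetical layers. The first is a polygon in WGS 84 longitude/latitude (EPSG:4326). The second holds points in EPSG:2263, the New York State Plane Long Island projection in US feet that many NYC datasets use. to_crs actually transforms the coordinates. By contrast, set_crs only attaches a CRS label to data that lacks one. Once the CRS values match, layering is just plotting each GeoDataFrame onto the same Axes.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point, box

# Hypothetical layers in two different CRSs.
land = gpd.GeoDataFrame(geometry=[box(-74.05, 40.68, -73.95, 40.75)], crs="EPSG:4326")
outfalls = gpd.GeoDataFrame(
    geometry=[Point(-74.00, 40.70), Point(-73.97, 40.73)], crs="EPSG:4326"
).to_crs("EPSG:2263")  # now in feet, as if read from a State Plane file

print(land.crs == outfalls.crs)       # False: the layers would not line up
outfalls = outfalls.to_crs(land.crs)  # reproject to match the base layer
print(land.crs == outfalls.crs)       # True

# Layering: draw each GeoDataFrame onto the same Axes.
fig, ax = plt.subplots(figsize=(6, 6))
land.plot(ax=ax, color="whitesmoke", edgecolor="black")
outfalls.plot(ax=ax, color="red", markersize=20)
plt.show()
```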

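Points can be sized according to associated values in the dataframe. One route is a plain pandas dataframe with latitude and longitude as features, plotted with matplotlib's scatter and its s parameter. Recent GeoPandas versions also support this directly, because GeoDataFrame.plot accepts a column name for its markersize parameter.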
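Both routes are sketched side by side below. The sampling stations and the enterococci values are hypothetical. In each case the marker area comes straight from the data column: s= in matplotlib's scatter, and markersize= (given a column name) in GeoDataFrame.plot.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sampling stations with a made-up bacteria measurement.
df = pd.DataFrame({
    "station": ["A", "B", "C"],
    "longitude": [-74.01, -73.98, -73.95],
    "latitude": [40.70, 40.72, 40.74],
    "enterococci": [10, 120, 400],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Route 1: plain pandas + matplotlib; s= sets each marker's area.
ax1.scatter(df["longitude"], df["latitude"], s=df["enterococci"], alpha=0.6)
ax1.set_title("pandas + scatter")

# Route 2: build point geometries from the same columns and size by column name.
gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["longitude"], df["latitude"]), crs="EPSG:4326"
)
gdf.plot(ax=ax2, markersize="enterococci", alpha=0.6)
ax2.set_title("GeoDataFrame markersize")
plt.show()
```

Raw values can span orders of magnitude, so in practice it often helps to scale them (for example with a square root) before using them as marker areas.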

The next level is to make the maps interactive, so people can explore the waterways near their own neighborhoods. This can be done with mplleaflet.

All of the data is layered on an existing basemap with labeled streets and geographic features.
The actual interactive map can be accessed here:

Summary

Maps are an effective way to communicate insights with data that are both meaningful and accessible. GeoPandas is an excellent way of bringing these visualizations to life. The steps to get started are only a few lines of code. The sophistication of these maps can be increased by layering various data sources. The code for these maps and others can be found here: