Simple Geospatial Mapping with GeoPandas and the Usual Suspects
I haven’t written in a long time so I thought I would. I have recently had a need to start working on a geospatial plane and since I have NO experience doing such things, I thought I’d do something very simple… very very simple. Twenty-three lines of code simple.
I needed to get familiar with Shapely and how it works while finding new tools to quickly render a lot of maps for troubleshooting.
Going into this I decided to read through a tutorial I found at Yhat. It was well written and easy to follow but I do not work in Jupyter Notebook. I had to amend some of the code to suit my purposes.
I will go through:
- Importing a shapefile using geopandas
- Plotting the shapefile’s information with matplotlib
- Working with polygons
- Importing a plain csv into Pandas
- Plotting individual points onto the aforementioned map
First, the imports.
- Pandas is a super cool data package. It makes dealing with large amounts of data almost trivial. (I also recently started using its SQL magic but it works mostly in memory :( ). This package shines in mid/large-scale numerical analysis.
- GeoPandas is something new to me but so far, it seems like Pandas with methods and attributes specifically geared towards geography.
- Matplotlib is an oldie but a goodie
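For reference, the imports for those three packages would look something like this. The aliases (pd, gpd, plt) are just the conventional ones and are my assumption, not necessarily what the original script used.

```python
# Conventional aliases; assumed here, the original script may have differed.
import pandas as pd              # tabular data (the crime csv later on)
import geopandas as gpd          # shapefiles and geometry
import matplotlib.pyplot as plt  # rendering the plots
```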
I used the link given in the Yhat article to download the zip file, then I put the extracted file in my program’s directory. The tutorial does this all programmatically but I thought… why? This shapefile only concerns New York.
I then used geopandas to read in the .shx file.
This is the head of the GeoDataFrame:

Now the shapefile is in a GeoDataFrame, so all you need to plot it is:
We need to call plt.show() explicitly because GeoPandas uses matplotlib to plot and we are not in a Jupyter notebook.
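Here is a small sketch of that plot-then-show pattern. Since the New York shapefile isn't reproduced here, I'm using two made-up squares as stand-in geometry; with the real file, gdf would come from gpd.read_file() instead.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon

# Stand-in geometry (two made-up squares), NOT the New York shapefile.
gdf = gpd.GeoDataFrame(
    {"name": ["west", "east"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(3, 0), (5, 0), (5, 2), (3, 2)]),
    ],
)

# GeoDataFrame.plot() draws onto a matplotlib Axes and returns it.
ax = gdf.plot()
ax.set_title("Stand-in map")

# Outside Jupyter nothing appears until matplotlib is told to render.
plt.show()
```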

Neat huh!?
It would also be cool to lay some geometry over it.
The first plot is a convex hull. A convex hull is the smallest convex shape that contains all the points of a geometry; GeoPandas computes one for each shape in the GeoDataFrame.
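To make the idea concrete, here is a rough sketch using a made-up L-shaped polygon as a stand-in for the map. The convex_hull attribute is real GeoPandas; the geometry, colors, and variable names are my own assumptions.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon

# Made-up L-shaped polygon (deliberately non-convex), standing in for the map.
l_shape = Polygon([(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)])
gdf = gpd.GeoDataFrame({"name": ["L"]}, geometry=[l_shape])

# One convex hull per row, returned as a GeoSeries.
hulls = gdf.convex_hull

# Draw the original shape first, then the hull outline on the same Axes.
base = gdf.plot(color="lightgrey", edgecolor="black")
hulls.plot(ax=base, facecolor="none", edgecolor="red")
plt.show()
```

The hull "fills in" the notch of the L, which is why its area is larger than the original shape's.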
The resulting plot is:

The second plot is an envelope. I am not quite sure of any use cases just yet but it’s nice to look at.
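For what it's worth, the envelope of a shape is its axis-aligned bounding rectangle. A minimal sketch with a made-up triangle as stand-in geometry (the envelope attribute is real GeoPandas, everything else here is illustrative):

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon

# Made-up triangle standing in for the real map geometry.
triangle = Polygon([(0, 0), (4, 0), (2, 3)])
gdf = gpd.GeoDataFrame({"name": ["triangle"]}, geometry=[triangle])

# One bounding rectangle per row, returned as a GeoSeries.
envelopes = gdf.envelope
print(envelopes.iloc[0].bounds)  # (minx, miny, maxx, maxy) of the rectangle

# Original shape underneath, rectangle outline on top.
base = gdf.plot(color="lightgrey", edgecolor="black")
envelopes.plot(ax=base, facecolor="none", edgecolor="blue")
plt.show()
```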

Notice that you must pass in gdf.plot() every time. This is because you need the original map under the rendered polygons. Without it, New York will not appear. Like this:

Alright, now I have a map and I want it to say something (this is where I veered from the tutorial).
So I went to Kaggle and searched “New York” (I know, I am a regular Sherlock). I found a pretty interesting dataset concerning crime in New York between 2014 and 2015. I downloaded the csv and plopped it in my project’s directory.
Pandas handled this job for me.
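A sketch of the loading step, assuming nothing about the real Kaggle file: the rows and the column names (Latitude, Longitude, Offense) below are made up, and io.StringIO stands in for the csv on disk so the snippet runs on its own. With the real file you would pass its path to pd.read_csv() instead.

```python
import io

import pandas as pd

# Made-up stand-in rows; the column names are hypothetical, not the Kaggle schema.
csv_text = """Latitude,Longitude,Offense
40.7128,-74.0060,A
,-73.9857,B
40.7580,,C
40.6782,-73.9442,D
"""

crimes = pd.read_csv(io.StringIO(csv_text))
print(crimes.head())        # quick look at the first rows
print(crimes.isna().sum())  # how many missing values each column has
```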
I dropped all rows that had an NA value because no one has time for that. This time the plotting is a bit different.
From what I understand, plt.scatter() does not create another “figure”, so if there is one that hasn’t been rendered yet, the points are drawn onto it as another layer. The map looks like this:
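Roughly, the drop-and-plot step can be sketched like this. Everything here is stand-in data: a made-up rectangle instead of the New York shapefile and a tiny hand-written table with hypothetical Latitude and Longitude columns instead of the Kaggle csv.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
from shapely.geometry import box

# Made-up rectangle standing in for the New York shapefile.
gdf = gpd.GeoDataFrame(
    {"name": ["stand-in"]},
    geometry=[box(-74.3, 40.5, -73.7, 40.9)],
)

# Made-up points; column names are hypothetical, not the Kaggle schema.
crimes = pd.DataFrame(
    {
        "Latitude": [40.71, None, 40.76, 40.68],
        "Longitude": [-74.01, -73.99, None, -73.94],
    }
)
crimes = crimes.dropna()  # drop every row with a missing value

# gdf.plot() makes the map the current Axes; plt.scatter() then draws on it.
gdf.plot(color="lightgrey", edgecolor="black")
points = plt.scatter(crimes["Longitude"], crimes["Latitude"], s=5, color="red")
plt.show()
```

Note that longitude goes on the x axis and latitude on the y axis, which is easy to flip by accident.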

Those are all the locations where crimes took place between 2014 and 2015 (with all the missing values removed).
This is just a dip of the toe. There is so much out there. I am going to move on to Folium next. I think those maps will be a lot more informative. The crime data is well done and I want to explore it and perhaps do some more in-depth analysis.
