Reading Shapefile ZIPs from a URL in Python 3

This article takes inspiration from Andrew Gadius’ blog on the same topic, using updated libraries and Python 3 to achieve a similar effect. If you do not already have a working environment, check out Part One of this two-part blog post.

Prepare your workspace

In > jupyter notebook or your preferred Python 3 IDE, first import the relevant libraries. In Python:

import geopandas as gpd
import requests
import zipfile
import io
import matplotlib.pyplot as plt
%matplotlib inline # jupyter "magic" to display plots in notebook

Load the data

For this example we’ll download and graph a shapefile from the Census FTP of my home state, Florida. In Python:

url = 'http://www2.census.gov/geo/tiger/GENZ2015/shp/cb_2015_us_county_500k.zip'
print('Downloading shapefile...')
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
print("Done")
# z.extractall(path='tmp/') # extract to folder
filenames = [y for y in sorted(z.namelist()) for ending in ['dbf', 'prj', 'shp', 'shx'] if y.endswith(ending)] 
print(filenames)

Output:

Downloading shapefile...
Done
['cb_2015_us_county_500k.dbf', 'cb_2015_us_county_500k.prj', 'cb_2015_us_county_500k.shp', 'cb_2015_us_county_500k.shx']

You could have chosen to extract the files to a folder if you just want to download and unzip the files programmatically for use in a traditional GIS application, but we’re going to use the files directly in the notebook without saving to disk, so I’ve commented this line out.

From here, we can use a list comprehension to assign variable names to each of the four files, then read the shapefile into a geopandas dataframe. From here, check the number of records in the file and preview the data. In Python:

dbf, prj, shp, shx = [filename for filename in filenames]
usa = gpd.read_file(shp)
print("Shape of the dataframe: {}".format(usa.shape))
print("Projection of dataframe: {}".format(usa.crs))
usa.tail() #last 5 records in dataframe

Output:

Shape of the dataframe: (3233, 10)
Projection of dataframe: {'init': 'epsg:4269'}
Screencap of usa.tail()

Note: EPSG:4269 is the same map projection as WGS84

Plotting your data

Right away we can look at our data. In Python:

ax = usa.plot()
ax.set_title("USA Counties. Default view)");

Output:

Not exactly what we want, but very easy to implement!

It’s definitely a map! But let’s filter our dataframe to just include the counties of Florida. In Python:

fl = usa[usa.STATEFP=='12']

Geopandas is built on pandas and uses the same built-in plotting functions from matplotlib. Therefore many of the same matplotlib arguments work inside Geopandas commands, such as setting figure size, titles, axes labels or ticks, and coloring.

In the code below, I’ve added the arguments figsize to control the resulting plot size and cmap which specifies the color map of the plot. Use the matplotlib documentation to choose a nice color map. To make this a choropleth map, indicate the variable to display using the column argument and specify the scheme. By default, the scheme is set to “equal intervals” but can be set to any other pysal option. (For a more in-depth introduction to advanced plotting in matplotlib, see this helpful tutorial.) Make sure to run the following all in one cell, as Jupyter notebooks show plots by default, which clears the plot from memory and prevents savefig() from working as one might expect. In Python:

ax = fl.plot(figsize=(10,10), column='ALAND', cmap="tab20b", scheme='quantiles', legend=True)
ax.set(xticks=[], yticks=[]) #removes axes
ax.set_title("FL Counties by Land Area", fontsize='large')
#add the legend and specify its location
leg = ax.get_legend()
leg.set_bbox_to_anchor((0.5,0.3))
plt.savefig("FL_Counties.png", bbox_inches='tight')

Output:

Hello World , I’m your first geopandas map!

Lauren Oldja is a data scientist in Brooklyn, NY.