Mapping Detroit’s high-poverty areas and their access to public transit using geopandas, Mapbox, and Turf.js

Dan Henri
May 17, 2018


This article/tutorial mostly walks through the process I use to make a map showing the areas of high poverty in Detroit and their access to public transit. For a little background, Detroit generally has terrible public transit compared to other large US cities. In 2016, three counties in the Metro Detroit area voted on a comprehensive transportation bill, which failed to pass. Another plan was supposed to go up for a vote in 2018, but it is now in jeopardy because of one county administrator.

This exercise will attempt to investigate a small part of this story: poor residents’ access to public transit. I’m defining access here as proximity to a transit stop. This won’t reveal how useful any particular stop or line is for an individual: you can have a bus stop right outside your house and still face a two-hour commute if you depend on transit. But that and other issues are outside the scope of this project. Without further ado, let’s get to our data.

Data Gathering

As far as I know, the best data for this project is American Community Survey (ACS) data from the Census Bureau. The first thing I’ll do is get my census tract boundary data. A census tract is the second-smallest geographic unit for which the Census Bureau publishes ACS estimates; only block groups are smaller. Because we are mapping a single city, a smaller unit of geography would be nice, but at the block group level the statistical margins of error are too high for my liking. Instead of getting the boundaries from the Census Bureau, I’m going to the official Detroit Open Data portal. We can get the data in GeoJSON format here. Now throw it into Mapshaper to have a look:

Yes, that hole is supposed to be there.

Looks like Detroit, so we’re on the right track. For reference, I named this file “detroit-tracts.geojson”. Now, let’s find data on the poverty rate by census tract.

To get this data we’ll use the US Census Bureau’s API. Personally, I had a really difficult time getting any of my calls to work, so I ended up using the Python packages “census” and “us”. It would be easier to just use a curl command in the terminal, but I couldn’t make it work. Below are the commands I used in a Python 3 Jupyter Notebook to download, view, and export the data as a CSV:

from census import Census
from us import states
import pandas as pd
##Download tract-level ACS 5-year estimates for all of Wayne County (county FIPS 163)
##B01003_001E = total population, B17001_002E = population below the poverty line
c = Census("YOUR_KEY_HERE")
pov_list = c.acs5.state_county_tract(('NAME', 'B01003_001E', 'B17001_002E'), states.MI.fips, '163', Census.ALL)
##The census package returns a list of dicts, which pandas can load directly
pov_df = pd.DataFrame(pov_list)
pov_df.head()
Output of the pov_df.head() command
##export census data as csv
pov_df.to_csv("pov-data.csv", index=False)

B01003_001E and B17001_002E are the ACS estimates of the total population of each census tract and of the number of people living below the poverty line, respectively. It also seems you can’t download census tract data by city, so I downloaded all of Wayne County, which contains all of Detroit. To use the census package you will need your own Census API key, which you can request here.
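For comparison with the curl route mentioned above, here is a minimal standard-library sketch of what a tract-level request like the one this article needs might look like when built by hand. It assumes the ACS 5-year endpoint pattern https://api.census.gov/data/YEAR/acs/acs5, uses 2016 purely as an example vintage (the article doesn’t say which year the census package used), and assumes the API accepts repeated “in=” parameters for nested geographies. It only builds the URL (which you could paste into curl) and parses a response, where the first row of the returned JSON array is the header. The parser also assembles the full 11-digit tract GEOID (2-digit state + 3-digit county + 6-digit tract code). That is worth knowing because the 6-digit tract code is only unique within a county, so joining on it is safe here only because all of Detroit lies in Wayne County.

```python
from urllib.parse import urlencode

# Assumed endpoint pattern for ACS 5-year estimates; the year is a parameter.
BASE_URL = "https://api.census.gov/data/{year}/acs/acs5"


def build_acs_url(year, fields, state_fips, county_fips, key):
    """Build a URL requesting `fields` for every tract in one county."""
    params = [
        ("get", ",".join(fields)),
        ("for", "tract:*"),               # all tracts...
        ("in", f"state:{state_fips}"),    # ...in this state
        ("in", f"county:{county_fips}"),  # ...and this county
        ("key", key),
    ]
    # Leave ':', ',' and '*' unescaped so the URL stays readable in curl.
    return BASE_URL.format(year=year) + "?" + urlencode(params, safe=":,*")


def rows_to_records(rows):
    """Convert the API's list-of-lists response (header row first) to dicts.

    Each record also gets an 11-digit 'geoid' = state + county + tract codes.
    Values are left exactly as they arrive, so convert numbers before use.
    """
    header, *data = rows
    records = []
    for row in data:
        rec = dict(zip(header, row))
        rec["geoid"] = rec["state"] + rec["county"] + rec["tract"]
        records.append(rec)
    return records


# Example: 2016 is only an illustrative vintage; '26' is Michigan, '163' Wayne County.
url = build_acs_url(2016, ["NAME", "B01003_001E", "B17001_002E"],
                    "26", "163", "YOUR_KEY_HERE")
```

Printing url and pasting it into curl (or a browser) should return the same kind of rows the census package wraps for you.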

The next step is to merge our two datasets. I used a separate Jupyter Notebook, also running Python 3, for this. If you use Python for anything to do with data, you are probably aware of pandas (a library that makes it easy to handle and manipulate data), but you may not be aware of geopandas. Geopandas is a library built on pandas that lets you handle and manipulate geographically encoded (spatial) data, such as our GeoJSON file, and it also lets you run spatial operations on that data. Geopandas is great because it works basically the same as pandas, so if you already know pandas it’s easy to pick up. I’ll use it just a bit in this notebook:

import pandas as pd
import geopandas as gpd
import numpy as np
##import census data we just downloaded to pandas dataframe
pov_df = pd.read_csv('pov-data.csv')
##import Detroit geojson file from earlier using geopandas
tracts = gpd.read_file("./detroit-tracts.geojson")
##view tracts data
tracts.head()
output of tracts.head()

The image above shows the tracts.head() output as it appears in a notebook; there are more columns to the right as well. Dropping unnecessary columns reduces file size and makes the table easier to read, which is what the code below does:

##drop unnecessary columns (by position) and rename 'tractce_10' to 'tract'
tracts = tracts.drop(tracts.columns[np.r_[0, 1, 2, 5, 7, 8, 11, 12, 13]], axis=1)
tracts = tracts.rename({'tractce_10':'tract'}, axis=1)
tracts.head()
Output of tracts.head()

That’s much better. We’ve reduced the columns so they at least all fit in the notebook view without having to scroll to the right. By the way, see that “geometry” column? That’s geopandas in action. That column stores all of the geographic data for each feature; in this case, polygons representing census tracts. I’ve also kept a few columns in the dataframe which we won’t use right away, but which may be useful in the future.
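Dropping columns by position works, but it silently removes the wrong columns if the source file’s column order ever changes. Below is a sketch of a name-based alternative on a small made-up frame; the columns “objectid” and “namelsad10” are hypothetical stand-ins for whatever you want to discard. It also shows what np.r_ does: it stitches integers and slices into a single positional index array.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the tracts table; 'objectid' and 'namelsad10'
# are hypothetical columns we don't need.
df = pd.DataFrame({
    "objectid": [1, 2],
    "tractce_10": ["500100", "500200"],
    "namelsad10": ["Tract A", "Tract B"],
    "geoid_10": ["26163500100", "26163500200"],
})

# np.r_ concatenates integers and slices into one positional index array.
positions = np.r_[0:1, 2]          # -> array([0, 2])
by_position = df.drop(columns=df.columns[positions])

# Name-based version: state what to keep, then rename in the same chain.
keep = ["tractce_10", "geoid_10"]
slim = df[keep].rename(columns={"tractce_10": "tract"})
```

On the real GeoDataFrame, include "geometry" in the keep list; otherwise the result loses its geometry column.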

The next step is to convert the data types stored in the tracts dataframe to numeric ones. I’m not sure why the values sometimes end up as strings, but if we don’t convert them, we won’t be able to join our two datasets together. The following code converts these columns into either floats or integers:

tracts['shape_area'] = tracts['shape_area'].astype(np.float64)
tracts['awater_10'] = tracts['awater_10'].astype(np.int64)
tracts['geoid_10'] = tracts['geoid_10'].astype(np.int64)
tracts['aland_10'] = tracts['aland_10'].astype(np.int64)
tracts['tract'] = tracts['tract'].astype(np.int64)
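The five astype calls above can also be written as a single call that takes a dict mapping column names to dtypes. Here is a minimal sketch on a toy frame whose values arrive as strings. One thing to keep in mind: turning the 6-digit tract code into an integer drops any leading zeros. That is harmless for the join as long as both sides get the same conversion, and pd.read_csv already parses the numeric “tract” column of pov-data.csv as integers.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the tracts table, with numbers stored as strings.
tracts = pd.DataFrame({
    "shape_area": ["1.5", "2.25"],
    "awater_10": ["0", "1200"],
    "geoid_10": ["26163500100", "26163500200"],
    "aland_10": ["350000", "410000"],
    "tract": ["500100", "500200"],
})

# One astype call with a column -> dtype mapping replaces the five separate calls.
tracts = tracts.astype({
    "shape_area": np.float64,
    "awater_10": np.int64,
    "geoid_10": np.int64,
    "aland_10": np.int64,
    "tract": np.int64,
})
```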

Now for the part you’ve all been waiting for: joining our two datasets so that each census tract is associated with its correct poverty data. To do this we’ll use the pandas .merge() method with the how='inner' option, which is basically the same as an inner join in SQL. If you’re not familiar with SQL joins, that’s OK. The important thing to know is that this step finds the column called “tract” in both our geographic and poverty datasets and combines the rows based on the numbers in those columns: wherever the numbers match, the associated data is put into the same row of the dataframe, and any tract that doesn’t appear on both sides is dropped. If you want to know more, check out the pandas documentation or this Stack Overflow question about SQL joins. Below is the actual code:

tracts = tracts.merge(pov_df, how='inner', on='tract')
tracts
tracts output

Feel free to explore the output of tracts to make sure it worked correctly.
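To see exactly what how='inner' does, here is a toy version of the join with made-up numbers. Two optional merge arguments can make the real step safer. First, validate='one_to_one' raises an error if a tract code appears more than once on either side, which would happen if, say, block-group rows were merged by mistake. Second, indicator=True (used here on an outer join) adds a “_merge” column showing which side each row came from, which is a quick way to spot tracts that failed to match.

```python
import pandas as pd

# Made-up geographic and poverty tables keyed on the 6-digit tract code.
geo = pd.DataFrame({"tract": [500100, 500200, 500300], "aland_10": [1, 2, 3]})
pov = pd.DataFrame({"tract": [500100, 500200, 999999],
                    "B01003_001E": [1000, 0, 500],
                    "B17001_002E": [400, 0, 100]})

# Outer join with indicator=True shows which keys matched and which didn't.
check = geo.merge(pov, how="outer", on="tract", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["tract", "_merge"]]

# Inner join keeps only tracts present in both tables; validate guards
# against duplicate tract codes on either side.
merged = geo.merge(pov, how="inner", on="tract", validate="one_to_one")
```

Here unmatched lists tract 500300 (geometry with no poverty data) and 999999 (poverty data with no geometry), and merged keeps only the two tracts present in both tables.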

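The next thing to do is drop the rows that make up Belle Isle, where no one lives. This isn’t strictly necessary, but I think it will make our map clearer. In my dataframe these were the rows with index 296 to 309 (you can tell because their poverty data columns both read zero and they have a larger water area than the others). The positions depend on the order of your data, so check your own output first. The end of the slice is exclusive, so 296:310 covers rows 296 through 309: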

tracts = tracts.drop(tracts.index[296: 310])
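Dropping rows by position depends on the row order the merge happened to produce. A sketch of a more robust alternative drops rows by tract code instead. BELLE_ISLE_TRACTS below is a hypothetical placeholder with a made-up value, so look up the real code(s) in your own tracts output (the rows with zero population and an unusually large water area) before using it.

```python
import pandas as pd

# Hypothetical placeholder: replace with the real Belle Isle tract code(s)
# from your own data. The value below is made up.
BELLE_ISLE_TRACTS = {999999}


def drop_tracts(df, codes):
    """Return a new frame without the rows whose 'tract' is in codes."""
    return df[~df["tract"].isin(codes)].reset_index(drop=True)


# Toy example with made-up rows.
toy = pd.DataFrame({"tract": [500100, 999999, 500200],
                    "B01003_001E": [1000, 0, 800]})
toy = drop_tracts(toy, BELLE_ISLE_TRACTS)
```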

We also need a column holding the share of people in poverty in each census tract. Right now, “B01003_001E” holds the estimated total population of a census tract and “B17001_002E” the estimated number of those people below the poverty line, so our ratio is simply “B17001_002E” / “B01003_001E”:

tracts['pov_rate'] = tracts['B17001_002E'] / tracts['B01003_001E']
tracts
tracts output

As you can see in the tracts output, we now have a poverty rate column for each census tract.
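Tracts with no residents turn this division into 0 / 0, which pandas evaluates to NaN. Those are the empty tracts that get their own color on the map later. Below is a sketch, on made-up numbers, that makes that case explicit with Series.where instead of relying on 0 / 0. One methodological aside: ACS table B17001 has its own denominator, B17001_001E (the population for whom poverty status is determined), and dividing by it instead of total population gives the conventional poverty rate. The sketch keeps the article’s denominator.

```python
import pandas as pd

# Made-up counts: the middle tract has nobody living in it.
tracts = pd.DataFrame({
    "tract": [500100, 500200, 500300],
    "B01003_001E": [1000, 0, 2500],   # total population
    "B17001_002E": [400, 0, 250],     # people below the poverty line
})

pop = tracts["B01003_001E"]
# where() keeps the ratio only where pop > 0 and leaves NaN elsewhere,
# so empty tracts are flagged deliberately rather than by a 0/0 accident.
tracts["pov_rate"] = (tracts["B17001_002E"] / pop).where(pop > 0)
```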

Now the last thing to do in this Python notebook is export the data. I exported it as GeoJSON and called it “pov-rate.geojson”.

tracts.to_file('pov-rate.geojson', driver='GeoJSON')
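Before uploading the file to Mapbox, it can be worth a quick check of what actually ended up in it. Below is a small standard-library sketch that reports the number of features and the range of pov_rate in a GeoJSON FeatureCollection, skipping values that are missing, null, or NaN (an assumption about how your empty tracts were written out). Also keep in mind that GeoJSON coordinates are expected in longitude/latitude (WGS 84), which is what Mapbox expects. If your source file used another coordinate system, calling tracts.to_crs(epsg=4326) before exporting would take care of that.

```python
import json
import math


def summarize_pov_rates(feature_collection):
    """Return (feature count, min pov_rate, max pov_rate) for a GeoJSON dict.

    Features whose pov_rate is missing, null, or NaN are counted but left
    out of the min/max.
    """
    features = feature_collection["features"]
    rates = []
    for feat in features:
        value = (feat.get("properties") or {}).get("pov_rate")
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        rates.append(value)
    low = min(rates) if rates else None
    high = max(rates) if rates else None
    return len(features), low, high


# Toy FeatureCollection; for the real file use json.load(open("pov-rate.geojson")).
sample = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": null, "properties": {"tract": 500100, "pov_rate": 0.4}},
  {"type": "Feature", "geometry": null, "properties": {"tract": 500200, "pov_rate": null}},
  {"type": "Feature", "geometry": null, "properties": {"tract": 500300, "pov_rate": 0.1}}
]}
""")
summary = summarize_pov_rates(sample)
```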

Great! We’ve gathered and cleaned up our spatial poverty data. Now we will work on displaying it for the web.

Getting Set Up with Mapbox

There are a couple of different ways to style a basemap for your final map. I’m using Mapbox because I think it looks nice, and it integrates with Mapbox GL JS, a very cool new JavaScript library from Mapbox, for displaying the basemap and data. You’ll need to make a (free) account to begin. Once you do that, you’ll need to create and style your basemap. I’m not going to get into those details because Mapbox already has a good tutorial on how to do this; just replace the data used in that tutorial with the data we prepared above. You can choose your own color ramp, number of categories, and breaks. I made a new category for every ten percentage points of poverty rate up to 50%, so there are steps at 10%, 20%, 30%, 40%, and 50%, for a total of six colors. I used ColorBrewer2 to help choose a color scheme. I also created a category with the color black for the three tracts in our data without any people in them.
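The class breaks described above can be written down as a small function, which is handy for checking which class a given tract lands in. The sketch below also builds the same breaks as a Mapbox GL style “step” expression (["step", input, output0, stop1, output1, ...], where a value equal to a stop takes that stop’s output), wrapped in a “case” that falls back to black when pov_rate isn’t a number. This is useful if you’d rather set the fill color in code than in Mapbox Studio. The hex colors are placeholders for a six-class yellow-to-red ramp, not the colors used in the original map, and empty tracts are assumed to have a missing (null/NaN) pov_rate.

```python
import math
from bisect import bisect_right

# Class breaks from the text: 10%, 20%, 30%, 40%, 50% -> six classes.
BREAKS = [0.1, 0.2, 0.3, 0.4, 0.5]
# Placeholder six-class yellow-to-red ramp (not the original map's colors).
COLORS = ["#ffffb2", "#fed976", "#feb24c", "#fd8d3c", "#f03b20", "#bd0026"]
NO_DATA_COLOR = "#000000"  # black for tracts with no residents


def color_for(rate):
    """Pick a fill color the way the step expression below would."""
    if rate is None or math.isnan(rate):
        return NO_DATA_COLOR
    # bisect_right counts the breaks <= rate, which is the class index.
    return COLORS[bisect_right(BREAKS, rate)]


def step_expression(prop="pov_rate"):
    """Build the same classification as a Mapbox GL style expression (a list)."""
    step = ["step", ["get", prop], COLORS[0]]
    for stop, color in zip(BREAKS, COLORS[1:]):
        step += [stop, color]
    # Use black when the property is not a number (e.g. null for empty tracts).
    return ["case", ["==", ["typeof", ["get", prop]], "number"], step, NO_DATA_COLOR]
```

json.dumps(step_expression()) gives a value you could paste into a layer’s fill-color property.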

Displaying with JavaScript using Mapbox GL JS

To display this map using Mapbox GL JS, we’ll need to start coding. Again, I won’t be getting into the details of this part either, as Mapbox has yet another good tutorial about how to do this. The only thing I’ll add is that when you write the code that displays the poverty rate when a tract is moused over, remember to use Math.floor(tracts[0].properties.pov_rate * 100) in order to display the actual percentage and not the decimal ratio.
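One caveat about Math.floor(... * 100): it truncates rather than rounds. Also, because a ratio like 0.57 has no exact binary representation, 0.57 * 100 evaluates to 56.99999999999999, so the label would read 56%. Math.round(... * 100) avoids both problems. Python uses the same IEEE 754 doubles as JavaScript, so the effect is easy to check there. The percent_label helper below is a hypothetical label formatter, not part of the original app. Note that Python’s round() sends exact halves to the nearest even number, while JavaScript’s Math.round rounds them up.

```python
import math


def percent_label(rate):
    """Format a ratio like 0.57 as '57%', rounding instead of truncating."""
    return f"{round(rate * 100)}%"


rate = 0.57
truncated = math.floor(rate * 100)  # 56, because 0.57 * 100 == 56.99999999999999
label = percent_label(rate)         # '57%'
```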

If all goes well, you should end up with something like the picture below: an interactive map of poverty in Detroit! For a working product click here.

This concludes part one. For part two, I’m hoping to integrate Turf.js into this app in order to measure access to transit and any potential disparities between poorer and wealthier areas of the city. I hope to make it interactive and fun to investigate, so keep an eye out for it!


I'm interested in urban planning issues, maps and data, and immigration.