Using Data Science to Help Make Dangerous Bus Stops Safer in Detroit

Geoff P
Jul 15, 2020

(Author’s Note: this article was originally written in April 2019 but is just being published now, July 2020.)

For Ford’s City:One Challenge in Detroit this year, $100k+ in grants will be given to winners to fund pilots that best try to solve mobility and transportation problems in Detroit. The program grew in large part out of the Go Detroit Challenge, which finished in 2017. For that challenge, the City of Detroit, in partnership with Ford, led a number of ethnographic studies and community workshops to determine what the most pressing transportation problems were for city residents, and how they could best be solved. One especially large problem that came up again and again in these workshops was the lack of safety while waiting for the bus, especially for women and high school students.

Detroit Department of Transportation (DDOT)

As a data scientist supporting this year’s challenge, one of my major goals is to see if I can use data science to pinpoint where and when pilots should be implemented. The community workshops and ethnographic studies dovetail nicely with the data science: first, they shed light on which problems are most pressing and most important for community members to have solved; second, they inform what data should be looked at and which features should be extracted from that data.

I began by looking at Detroit Police 911 Calls for Service, subsetting to assaults and related calls, and then subsetting those calls to only calls within 50 meters of a DDOT bus stop. I then normalized the number of calls by the number of workers and residents (as a proxy of ‘daytime population’) in the census tract a bus stop is in. My hope was to look for patterns in the data, first temporally (which I will present here), and then spatially to see if there were any outlier points to potentially target for safety interventions.
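The filtering and normalization steps described above can be sketched as follows. This is a minimal, dependency-free illustration and not the code used for the analysis: the record layout, the field names (`id`, `lat`, `lon`, `tract`), the function names, and the sample values are all assumptions made up for the example, and a haversine distance stands in for whatever spatial join was actually used.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def calls_per_person(stops, calls, tract_population, radius_m=50.0):
    """Count calls within radius_m of each stop, divided by tract residents + workers."""
    rates = {}
    for stop in stops:
        n_calls = sum(
            1
            for call in calls
            if haversine_m(stop["lat"], stop["lon"], call["lat"], call["lon"]) <= radius_m
        )
        population = tract_population[stop["tract"]]
        rates[stop["id"]] = n_calls / population if population else float("nan")
    return rates


# Made-up illustrative records, not real DDOT or Detroit Police data.
stops = [
    {"id": "stop_a", "lat": 42.3314, "lon": -83.0458, "tract": "t1"},
    {"id": "stop_b", "lat": 42.3500, "lon": -83.0600, "tract": "t2"},
]
calls = [
    {"lat": 42.3315, "lon": -83.0458},  # roughly 11 m north of stop_a
    {"lat": 42.3314, "lon": -83.0460},  # roughly 16 m west of stop_a
    {"lat": 42.3400, "lon": -83.0500},  # far from both stops
]
tract_population = {"t1": 2000, "t2": 1500}  # residents + workers per tract

rates = calls_per_person(stops, calls, tract_population)
```

Note that in this sketch a call within 50 meters of two stops is counted for both. The brute-force loop is fine for a toy example, but a spatial index would be the more practical choice at city scale.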

Exploring the time series first, the last 2.5 years of 911 calls don’t seem to show any noticeable seasonality or long-term trend.

Looking at the time series by hour of week, there does seem to be a bit of a spike Sunday afternoons… but is it both significant and meaningful?
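An hour-of-week profile like the one described here can be built by folding every call timestamp into one of 168 weekly bins. The sketch below is only an illustration: the function names and the sample timestamps are made up, and Monday 00:00 is arbitrarily chosen as bin 0.

```python
from collections import Counter
from datetime import datetime


def hour_of_week(ts):
    """Map a datetime to a bin in 0..167, where bin 0 is Monday 00:00-00:59."""
    return ts.weekday() * 24 + ts.hour


def hour_of_week_counts(timestamps):
    """Return a 168-element list of call counts, one per hour of the week."""
    counts = Counter(hour_of_week(ts) for ts in timestamps)
    return [counts.get(h, 0) for h in range(168)]


# Made-up timestamps, not real call records.
timestamps = [
    datetime(2019, 3, 3, 15, 10),   # a Sunday, 3 pm hour
    datetime(2019, 3, 10, 15, 45),  # the following Sunday, 3 pm hour
    datetime(2019, 3, 4, 8, 5),     # a Monday, 8 am hour
]
profile = hour_of_week_counts(timestamps)
```

Comparing a single bin against the mean and standard deviation of all 168 bins is one simple way to start judging whether an apparent spike stands out.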

Time Series Looking at Full Time Series, and Hour of Week
Fourier Decomposition of 911 Call Data

Using a Fourier decomposition to look at the entire spectrum of frequencies the call data may exhibit, the largest spike is at a period of 37 days, meaning the number of calls rises to a peak and falls to a trough roughly once every 37 days (see chart above). However, this spike is only 3 standard deviations above the mean, and it would be difficult to convince others that an intervention targeting a cycle that repeats once every 37 days is worthwhile. Instead, perhaps we can find some spatial patterns?
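To make the idea concrete, here is a small sketch of finding the dominant period in a daily count series with a discrete Fourier transform. It runs on synthetic data with a 37-day cycle deliberately built in, not on the real call data. The function name is hypothetical, and scoring the peak against the mean and standard deviation of all spectral powers is just one possible reading of "standard deviations above the mean", so treat that part as an assumption.

```python
import cmath
import math
import statistics


def periodogram(series):
    """Return (period, power) pairs for each nonzero DFT frequency up to Nyquist."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the zero-frequency component
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        result.append((n / k, abs(coeff) ** 2 / n))  # period is in days
    return result


# Synthetic daily counts with a built-in 37-day cycle (not the real call data).
n_days = 37 * 20
daily_calls = [10 + 3 * math.sin(2 * math.pi * t / 37) for t in range(n_days)]

spectrum = periodogram(daily_calls)
peak_period, peak_power = max(spectrum, key=lambda pair: pair[1])

# How far the strongest frequency stands above the rest of the spectrum.
powers = [power for _, power in spectrum]
z_score = (peak_power - statistics.mean(powers)) / statistics.pstdev(powers)
```

The naive transform here is quadratic in the series length, which is fine for a few years of daily counts. An FFT routine from a numerical library would be the usual choice for anything larger.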

I used a localized spatial autocorrelation analysis to determine if there were hot spots in the city that would be good places to implement interventions to curb dangerous bus waiting conditions. (Full code / description of analysis to be added later… or if/when appropriate; unsure of the audience for this blog post.) And it turns out that, yes! There are hot spots! I used a Voronoi diagram to give each bus stop its own polygon containing the points closest to that stop, then looked for polygons that were both spatially similar to their neighbors and at least one standard deviation above the mean in terms of crime incidents. Those hot spots are presented in the maps at the bottom.
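One common form of localized spatial autocorrelation is the local Moran's I statistic, so the sketch below uses it as a stand-in. The post does not say which statistic was actually used, so that choice is an assumption, as are the function names, the toy adjacency list (which would in practice come from polygons that share an edge), and the made-up incident rates. Significance testing by permutation is left out to keep the example short.

```python
import statistics


def local_morans_i(values, neighbors):
    """Local Moran's I per unit, using row-standardized binary weights.

    values: dict of unit id -> observed value
    neighbors: dict of unit id -> list of adjacent unit ids
    """
    mean = statistics.mean(values.values())
    deviations = {unit: value - mean for unit, value in values.items()}
    m2 = sum(d * d for d in deviations.values()) / len(values)  # population variance
    result = {}
    for unit, dev in deviations.items():
        adjacent = neighbors.get(unit, [])
        # Spatial lag: average deviation of the neighboring units.
        lag = sum(deviations[j] for j in adjacent) / len(adjacent) if adjacent else 0.0
        result[unit] = dev * lag / m2
    return result


def hot_spots(values, neighbors, min_sd=1.0):
    """Units that resemble their neighbors (I > 0) and sit >= min_sd above the mean."""
    mean = statistics.mean(values.values())
    sd = statistics.pstdev(values.values())
    local_i = local_morans_i(values, neighbors)
    return sorted(
        unit
        for unit, value in values.items()
        if local_i[unit] > 0 and (value - mean) / sd >= min_sd
    )


# Made-up incident rates for six stops arranged in a line (a-b-c-d-e-f).
values = {"stop_a": 9, "stop_b": 10, "stop_c": 2, "stop_d": 1, "stop_e": 1, "stop_f": 1}
neighbors = {
    "stop_a": ["stop_b"],
    "stop_b": ["stop_a", "stop_c"],
    "stop_c": ["stop_b", "stop_d"],
    "stop_d": ["stop_c", "stop_e"],
    "stop_e": ["stop_d", "stop_f"],
    "stop_f": ["stop_e"],
}

local_i = local_morans_i(values, neighbors)
flagged = hot_spots(values, neighbors)
```

In this toy data, the two high-rate stops next to each other are flagged, while the low-rate stops that also resemble their neighbors are excluded by the one-standard-deviation filter.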

The next pass was an attempt to dig into the relationship between time and space. I have not quite been successful at showing one yet, but it is ripe for future work. Map presented below.

I also built a regression model using demographic and environmental variables to try to predict the weighted crime incidents per resident + worker. I included a number of variables to capture the surrounding built environment as well as the surrounding demographic features (see chart below). However, this simple linear regression model was only able to explain 5.7% of the variability of the response data around its mean (i.e. R-squared: 0.057).
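For readers less familiar with R-squared, the sketch below shows how it is computed: one minus the ratio of the residual sum of squares to the total sum of squares. To stay short it fits a line to a single made-up predictor, whereas the model described above used several variables. The function names and the numbers are illustrative assumptions only.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_mean, y_mean = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def r_squared(y, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    y_mean = statistics.mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up values: one hypothetical predictor and a noisy response.
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 4, 1, 5, 2]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]
r2 = r_squared(y, predicted)
```

A value near 0, as in this toy data, means the fitted line predicts the response barely better than simply using the mean of the response for every observation.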

Regression Model Inputs and Directional Impact
Correlation Matrix (correlations with weighted crime incidents per resident + worker were all low)
Hotspot Detection Map for DDOT Bus Stop Crime

Conclusions (for now): pattern recognition across space and time is difficult. There don’t seem to be any interpretable temporal patterns that jump out in this data, and demographic and environmental variables did a poor job of predicting crime rates. However, there do seem to be a number of spatial ‘hotspots’ that would be a good place to start when looking for places to launch a pilot. None of this should be taken at face value, though: physical and sexual violence is woefully under-reported, especially in Detroit, and this is where community interviews and ethnographic studies can add further value in solving this difficult problem. A next step may be to see which areas of the city have the largest number of vulnerable people waiting for a bus (women, high school students, riders along low-frequency bus routes), and look to start interventions there.

Next steps: taking an additional look at the time / space relationship, and talking to subject matter experts at the city, in the communities, at DDOT, and at Detroit Public Schools to come up with additional features to look at. The ultimate next step is to help inform and guide pilots in the City:One Challenge to maximize their impact in benefiting the most vulnerable populations in Detroit.

About the Author: Geoff is a data scientist working at Ford on the GDI&A Smart Mobility Analytics team. You can contact him via email at gperrin8@ford.com or on twitter at @moderngeoff — additional articles are at geoffperrin.com/articles.

Geoff P

urban data scientist — current: spatial solutions engineer, urbint past: ford smart mobility, bloomberg fellow @ detroit land bank authority, NYU's CUSP