Snapping Points to Street Segments

Michelle Ho
Jun 26, 2018

The NYPD Motor Vehicle Collisions dataset is a record of all motor vehicle collisions in NYC. For each collision it captures the location, any injuries or fatalities, the vehicle types involved, and contributing factors. For location, most collision records include a cross street, an on or off street, and latitude/longitude coordinates. None of these attributes gives a direct way to join a collision to a street segment in the LION Single Line Street Base Map maintained by the Department of City Planning. Snapping collisions to street segments matters because it lets us bring in street-level attributes such as the posted speed limit, the number of travel lanes, and whether a bike lane is present.

For a project investigating predictors of motor vehicle collisions, my unit of analysis was the street segment. I wanted the total number of collisions on each road segment in 2017. Dividing that count by the segment’s length then gives a collision density for every road segment in NYC.

One simple approach would be to draw a buffer around the single line streets and count the collisions that fall inside each buffer. The problem with this method is that a collision can fall inside more than one buffer and be double counted. This happens especially at intersections, where several segments meet.

Double counting where street segments intersect
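To make the double counting concrete, here is a toy example that uses made-up geometries instead of the real datasets. It has two segments crossing at (0, 0) in EPSG:3857 and two collision points. The 15-unit buffer width is an arbitrary assumption.

```sql
-- Toy data only: two segments that cross at (0, 0).
WITH roads(id, geom) AS (
  VALUES
    (1, ST_GeomFromText('LINESTRING(-100 0, 100 0)', 3857)),
    (2, ST_GeomFromText('LINESTRING(0 -100, 0 100)', 3857))
),
collisions(id, geom) AS (
  VALUES
    (1, ST_GeomFromText('POINT(0 0)', 3857)),  -- at the intersection
    (2, ST_GeomFromText('POINT(50 5)', 3857))  -- mid-block on segment 1
)
-- Count collisions that fall inside each segment's 15-unit buffer
SELECT r.id AS road_id, count(c.id) AS collision_count
FROM roads AS r
JOIN collisions AS c
  ON ST_Intersects(ST_Buffer(r.geom, 15), c.geom)
GROUP BY r.id
ORDER BY r.id;
```

Segment 1 gets 2 collisions and segment 2 gets 1. That is three hits from only two collisions, because the point at the crossing lies inside both buffers.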

A better method is to assign each collision point to its closest street segment. You then group by segment, count the collisions assigned to it, and sum their fatalities. Here’s a PostGIS code snippet that does this:

```sql
SELECT
  roads.cartodb_id AS cartodb_id,
  count(collisions.cartodb_id) AS collision_count,
  sum(collisions.number_of_persons_killed) AS sum_fatalities
FROM
  (SELECT cartodb_id,
          the_geom_webmercator,
          number_of_persons_killed
   FROM nypd_collisions_2017
   -- skip records without coordinates; they would otherwise be
   -- matched to an arbitrary segment
   WHERE the_geom_webmercator IS NOT NULL) AS collisions
CROSS JOIN LATERAL
  (SELECT l.cartodb_id
   FROM lion_lines_nyc AS l
   -- <-> is the PostGIS KNN distance operator: ordering by it and
   -- keeping one row returns the segment nearest to this collision
   ORDER BY collisions.the_geom_webmercator <-> l.the_geom_webmercator
   LIMIT 1) AS roads
GROUP BY roads.cartodb_id;
```

Street segments in NYC styled by collision density
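To get from counts to density, two more steps are needed. The counts must be joined back to every LION segment, because the query above only returns segments with at least one collision. Each count is then divided by the segment’s length.

Below is a sketch under a few assumptions:

* `the_geom` is stored in EPSG:4326, as on CARTO. Casting it to geography then gives lengths in meters. Lengths measured in Web Mercator would be inflated by about 1/cos(latitude), roughly 1.3 at NYC’s latitude.
* The CTE name `segment_counts` is my own.
* Reporting collisions per kilometre is an arbitrary choice of unit.

```sql
-- segment_counts (hypothetical name): nearest-segment counts, as above
WITH segment_counts AS (
  SELECT roads.cartodb_id, count(*) AS collision_count
  FROM nypd_collisions_2017 AS c
  CROSS JOIN LATERAL (
    SELECT l.cartodb_id
    FROM lion_lines_nyc AS l
    ORDER BY c.the_geom_webmercator <-> l.the_geom_webmercator
    LIMIT 1
  ) AS roads
  WHERE c.the_geom_webmercator IS NOT NULL
  GROUP BY roads.cartodb_id
)
-- LEFT JOIN keeps segments with no collisions (count 0)
SELECT
  l.cartodb_id,
  coalesce(s.collision_count, 0) AS collision_count,
  ST_Length(l.the_geom::geography) / 1000.0 AS length_km,
  -- nullif guards against zero-length segments
  coalesce(s.collision_count, 0)
    / nullif(ST_Length(l.the_geom::geography) / 1000.0, 0) AS collisions_per_km
FROM lion_lines_nyc AS l
LEFT JOIN segment_counts AS s USING (cartodb_id);
```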

Of course, this method isn’t bulletproof. Roads that are extremely close together or overlap one another can pick up collision points that didn’t actually occur on them. Examples include an elevated highway like the BQE and a bridge overpass. In those cases, a secondary check would be needed: does the cross street or on/off street name recorded for the collision match the name of the closest road segment? Based on visual inspection, though, the points seem to sit very close to the roads they occurred on.
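One way to add that secondary check is to look at a handful of the nearest segments. Among them, you prefer the closest one whose name matches a street recorded for the collision. The sketch below rests on several assumptions:

* The CARTO import kept the NYPD street columns as `on_street_name`, `cross_street_name` and `off_street_name`, and the LION street name as `street`.
* Five candidate segments are enough.
* An exact, case-insensitive match is an acceptable test.

Spellings may well differ between the two datasets (abbreviations, for instance). In practice the names would probably need normalizing first.

```sql
-- Assumed column names: on_street_name, cross_street_name,
-- off_street_name (collisions) and street (LION).
SELECT
  c.cartodb_id AS collision_id,
  best.cartodb_id AS road_id,
  best.name_match
FROM nypd_collisions_2017 AS c
CROSS JOIN LATERAL (
  SELECT
    cand.cartodb_id,
    -- coalesce so a missing name counts as "no match" rather than NULL,
    -- which would otherwise sort first under DESC
    coalesce(upper(trim(cand.street)) IN (upper(trim(c.on_street_name)),
                                          upper(trim(c.cross_street_name)),
                                          upper(trim(c.off_street_name))),
             false) AS name_match
  FROM (
    -- the 5 closest segments (assumed to be enough candidates)
    SELECT l.cartodb_id,
           l.street,
           c.the_geom_webmercator <-> l.the_geom_webmercator AS dist
    FROM lion_lines_nyc AS l
    ORDER BY dist
    LIMIT 5
  ) AS cand
  -- prefer a name match; otherwise fall back to the closest segment
  ORDER BY name_match DESC, cand.dist
  LIMIT 1
) AS best
WHERE c.the_geom_webmercator IS NOT NULL;
```

The `best` subquery could stand in for the `LIMIT 1` nearest-segment lookup in the counting query. The `name_match` column shows how often the fallback was used.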

Another thing to consider is collisions that occur at intersections. When a collision point actually lies on two line segments, the nearest-segment query assigns it to one of them more or less arbitrarily. Intersections arguably deserve to be a separate unit of study alongside street segments. For now, though, this gives a pretty good picture of collision density on the NYC street network.
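To set intersection collisions aside, one rough heuristic is to flag points that lie close to at least two segments, since several segments meet at every intersection. The 20-unit threshold is an assumption. In EPSG:3857 near NYC, 20 units is about 15 m on the ground.

```sql
-- Flag collisions whose second-nearest segment is within ~15 m
-- (20 Web Mercator units at NYC's latitude; threshold is an assumption).
SELECT
  c.cartodb_id,
  max(n.dist) < 20 AS likely_intersection  -- max of 2 = second nearest
FROM nypd_collisions_2017 AS c
CROSS JOIN LATERAL (
  -- distances to the two nearest segments
  SELECT c.the_geom_webmercator <-> l.the_geom_webmercator AS dist
  FROM lion_lines_nyc AS l
  ORDER BY dist
  LIMIT 2
) AS n
WHERE c.the_geom_webmercator IS NOT NULL
GROUP BY c.cartodb_id;
```

This also flags points near segment breaks that aren’t intersections, and points between closely spaced parallel roads. It is best treated as a first pass rather than a definitive classification.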
