Methodology, Notes, and Why There’s No Map

An addendum to this article

Kevin B
Apr 18, 2017

Data Analysis

The crash data in this article come directly from the NJ DOT website: http://www.state.nj.us/transportation/refdata/accident/

The data had to be downloaded as CSV files, one per year and county. I then used a homebrew Python script to filter for Jersey City and append the CSV files together. Next I loaded the data into MS Excel, added headers, and created pivot tables and charts. If you want to examine my work, I’ve put the data, Python scripts, and Excel sheets on GitHub.
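For readers who want to try the filter-and-append step themselves, here is a minimal sketch of how it could look. It is not the script from my repo: the column position and the exact spelling of the municipality field are assumptions you would need to check against the NJ DOT field layout, and the function name is made up for illustration. It assumes the raw files have no header row, which is why no header is skipped.

```python
import csv

# Hypothetical values: confirm the real column position and spelling of the
# municipality field against the NJ DOT field layout before relying on them.
MUNICIPALITY_COL = 2
MUNICIPALITY_NAME = "JERSEY CITY"


def filter_and_append(in_paths, out_path,
                      col=MUNICIPALITY_COL, name=MUNICIPALITY_NAME):
    """Copy every row whose municipality field equals `name` from all input
    files into a single output CSV. Returns the number of rows written."""
    written = 0
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in in_paths:
            with open(path, newline="") as in_file:
                for row in csv.reader(in_file):
                    # Skip blank or short rows, then compare after trimming
                    # padding and normalizing case.
                    if len(row) > col and row[col].strip().upper() == name:
                        writer.writerow(row)
                        written += 1
    return written
```

Calling `filter_and_append(list_of_county_year_files, "jersey_city.csv")` would then produce one combined file ready to load into Excel.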

Limitations

I would have liked to do more to tease out information about pedestrian injuries and fatalities. The problem is that I found evidence of faulty data. Specifically, I found a Jersey Journal article from 2009 that mentions two pedestrians being injured in a crash. I was able to find the corresponding row in my data-set, but it listed zero pedestrian injuries even though it listed three total injuries. For that reason I have excluded pedestrian numbers from this article.

Where’s the map?

When I first got my hands on this data, my first goal was to create a really nice map of Jersey City showing where crashes occurred, so people could learn more about their own neighborhoods and blocks. Like most great ideas, this one had already occurred to someone else. The impediment is the quality of the data: latitude/longitude values, SRI (Standard Route Identifier) codes, and street names were not consistent between reports, so there was no reliable way to plot the crashes on a map.

Insurance Costs

In my section on economic costs I make some estimates of the insurance costs of crashes to Jersey City. A friend pointed me toward this data-set from the National Association of Insurance Commissioners (NAIC): http://www.naic.org/prod_serv/AUT-PB-13_2016.pdf

From those tables I pulled Severity numbers for the Bodily Injury (BI), Property Damage (PD), and Collision (C) types of auto insurance. Severity is defined as losses divided by claims. Severity is broken down by state, and I used the New Jersey data from the most recent year, 2013. In my calculations I averaged the PD and C severities. I was told that PIP and Medical costs, also detailed in the NAIC report, were low enough to ignore for argument’s sake.

I multiplied the BI severity dollar amount by the injury count I observed, and the PD and C severity average by the crash count. Adding the two gave me a number for Total Severity. Note that I am comparing 2015 counts against 2013 severities, because those were the most recent years available for each. I then divided by the loss ratio to come up with the estimated premium costs to cover the injuries and crashes I observed.
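The arithmetic described above can be sketched in a few lines of Python. This is an illustration of the calculation only, not the spreadsheet itself; the function and parameter names are my own, and the numbers in the usage example are made-up placeholders rather than the NAIC figures or my observed counts.

```python
def estimate_premium(injuries, crashes, bi_severity,
                     pd_severity, c_severity, loss_ratio):
    """Estimate the premium needed to cover observed injuries and crashes.

    Severity is average loss per claim (losses divided by claims), and the
    loss ratio is losses divided by premiums, so dividing total losses by
    the loss ratio gives an implied premium.
    """
    # Average the Property Damage and Collision severities.
    pd_c_average = (pd_severity + c_severity) / 2
    # Injuries are costed at BI severity, crashes at the PD and C average.
    total_severity = bi_severity * injuries + pd_c_average * crashes
    return total_severity / loss_ratio


# Placeholder inputs for illustration only (not real data).
example = estimate_premium(injuries=10, crashes=100, bi_severity=1000.0,
                           pd_severity=300.0, c_severity=500.0,
                           loss_ratio=0.5)
print(example)  # 100000.0
```

With these placeholder inputs, total severity is 10 × 1,000 + 100 × 400 = 50,000, and dividing by a loss ratio of 0.5 gives an estimated premium of 100,000.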

A spreadsheet with my calculations is contained in my GitHub repo.

If you are able to make your own project from this data, please let me know. My Twitter handle is @kevinaskevin.

Kevin B

Interests include: bagels, feminism, manufacturing, econ, hiking, transit, and Jersey City. In that order? You decide.