How We Can Predict Roadway Accidents

By: Jeremy Roland

Published in Center for Urban Informatics and Progress

Oct 6, 2020 · 4 min read

CUIP researchers have developed an intelligent computer model that can help predict where some roadway accidents will take place. How does this work? How did this project come to be?

Backstory

In early 2018, CUIP was approached by the Hamilton County Emergency Service District. They presented our team with records of traffic crashes in Hamilton County and asked, “What can we do with this data?”

From there, we began collecting the data, organizing it, and establishing a method to collect new data daily. Then we began analyzing trends and patterns.

The data consisted of records going back to the early 2010s, but it was incomplete. It wasn’t until October 2016 that the Hamilton County Emergency Service District had collected complete and consistent records of traffic crashes.

Justification

The Crash Prediction Project can provide our local law enforcement with a tool that tells them where crashes are most likely to occur. Through this, they can better allocate resources and more efficiently respond to emergencies.

This is done through two methods: crash prediction and prescriptive analysis.

The goal of the project is to provide law enforcement with a live service application that will inform them of the areas in Chattanooga with the highest probability of traffic crashes. This is the crash prediction side of the project. Emergency services can place their officers at key locations to be closer to where these crashes will be happening, reducing response times to these crashes.

Prescriptive analysis refers to altering the infrastructure of the roadways to make high-probability areas safer. This could involve adding a stop or yield sign, adding a stoplight, widening a road from one lane to multiple lanes, or moving structures that are too close to the roadway. Suggestions for infrastructure changes will be given to local government agencies for approval.

Methods

When crash records are received, they are cleaned of duplicate calls, since multiple people sometimes call in the same crash. Once the records are cleaned, roadway geometric data is added to each one, including the number of lanes, pavement type, and type of terrain. The spatial attributes are then aggregated, and each crash is assigned to a hexagon. This hexagon layout of Chattanooga includes 694 hexagons, each covering 0.2 square miles.
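To make the hexagon assignment concrete, here is a minimal sketch in Python. It is not the project's actual GIS workflow: it assumes crash locations have already been projected to flat x/y coordinates in miles, it assumes a regular grid of pointy-top hexagons, and the function names (side_from_area, point_to_hex) are hypothetical. The only figure taken from the article is the 0.2 square mile area per hexagon.

```python
import math

HEX_AREA_SQ_MI = 0.2  # area of each hexagon, as stated in the article


def side_from_area(area):
    """Side length of a regular hexagon with the given area.

    Uses area = (3 * sqrt(3) / 2) * side**2, solved for side.
    """
    return math.sqrt(2 * area / (3 * math.sqrt(3)))


def round_axial(q, r):
    """Round fractional axial hex coordinates to the nearest hexagon."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error so that
    # the three cube coordinates still sum to zero.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def point_to_hex(x, y, side):
    """Map a planar point (in miles) to the (q, r) id of its hexagon.

    Assumes pointy-top hexagons with hexagon (0, 0) centered on the origin.
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / side
    r = (2 / 3 * y) / side
    return round_axial(q, r)


side = side_from_area(HEX_AREA_SQ_MI)

# Hypothetical crash locations, in miles from an arbitrary origin.
crash_points = [(0.05, 0.02), (0.50, 0.10), (1.20, 0.90)]
crash_hexes = [point_to_hex(x, y, side) for x, y in crash_points]
```

Under these assumptions, a hexagon covering 0.2 square miles has sides a little over a quarter of a mile long, which gives a sense of how coarse the prediction areas are.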

Spatial aggregation arose from the need to simplify the prediction area, as previous prediction attempts for the project produced an excessively high count of false positives, cases where the model predicts a crash that did not occur.

Once the basic data manipulation is finished, we use a data sampling method called negative sampling to provide class balance to our dataset. Early prediction attempts for the project used a dataset made up solely of crash records, which led to inconsistent results. Sometimes the model would give us an excessive number of false positives, and other times an excessive number of false negatives. Negative sampling generates non-crash records by altering the temporal and spatial information of crash records.

Training on non-crash records lets our prediction model recognize non-crashes as well as crashes. The best-performing method of negative sampling involved fixing the hour and date of a crash while changing its location information. Through extensive testing, we found that an even 50–50 split of negative and positive data was the best for predicting crashes. In other words, for every crash entry, there exists a non-crash entry.
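The idea of keeping a crash's date and hour while moving it somewhere else can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: each record is reduced to a hypothetical (hexagon id, date, hour) tuple, hexagons are numbered 0 to 693, and the function name negative_samples is made up for this example.

```python
import random


def negative_samples(crashes, hex_ids, seed=0):
    """Create one non-crash record per crash record.

    Each negative keeps the date and hour of its source crash but is
    moved to a hexagon with no recorded crash at that date and hour.
    """
    rng = random.Random(seed)
    taken = set(crashes)  # (hex_id, date, hour) combinations already used
    negatives = []
    for hex_id, date, hour in crashes:
        candidates = [h for h in hex_ids if (h, date, hour) not in taken]
        if not candidates:
            continue  # every hexagon had a crash at this time; skip
        new_hex = rng.choice(candidates)
        negatives.append((new_hex, date, hour))
        taken.add((new_hex, date, hour))  # avoid duplicate negatives
    return negatives


# Hypothetical crash records: (hexagon id, date, hour of day).
crashes = [
    (12, "2019-03-01", 8),
    (40, "2019-03-01", 17),
    (12, "2019-03-02", 8),
]
hex_ids = range(694)  # the article's layout has 694 hexagons

negatives = negative_samples(crashes, hex_ids)

# Label crashes 1 and non-crashes 0, giving the 50-50 split described above.
dataset = [(rec, 1) for rec in crashes] + [(rec, 0) for rec in negatives]
```

Because each crash yields exactly one negative, the combined dataset stays balanced as long as at least one crash-free hexagon exists for every date and hour.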

These predictions are made using a simple logistic regression model, with newton-cg as the solver and class weight set to balanced. The data is fed to the model for learning, and all predictions are made for dates that are not included in the main dataset. All modeling was done using the traffic crash data from 2017–2019, with 2020 dates acting as our prediction dates.
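For readers who want to see what such a model looks like in code, here is a minimal sketch using scikit-learn's LogisticRegression, which accepts solver="newton-cg" and class_weight="balanced". The article does not say which library the team used, so scikit-learn is an assumption here. The data is synthetic and the two features (hour of day, number of lanes) are placeholders, not the project's real feature set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)


def make_synthetic(n):
    """Generate made-up records with two placeholder features."""
    hour = rng.integers(0, 24, size=n)   # hour of day
    lanes = rng.integers(1, 5, size=n)   # number of lanes
    # Arbitrary rule so the labels carry some signal; not a real crash model.
    logit = 0.15 * (hour - 12) + 0.8 * (lanes - 2.5)
    labels = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    features = np.column_stack([hour, lanes]).astype(float)
    return features, labels


# Stand-ins for the training years and the held-out prediction dates.
X_train, y_train = make_synthetic(2000)
X_holdout, y_holdout = make_synthetic(500)

# Solver and class weighting as described in the article.
model = LogisticRegression(solver="newton-cg", class_weight="balanced")
model.fit(X_train, y_train)

# 1 = predicted crash, 0 = predicted non-crash.
predictions = model.predict(X_holdout)
```

The key design point is the split: the model only ever learns from the earlier period and is judged on dates it has not seen, which mirrors training on 2017–2019 and predicting for 2020.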

To evaluate our model’s predictive capabilities, we use recall and specificity. Recall is the percentage of actual crashes that the model correctly predicted, and specificity is the percentage of actual non-crashes that it correctly predicted. We currently have a recall of 70.01% and a specificity of 83.43%.
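Both metrics come straight from counting the four possible outcomes of a prediction. The short Python sketch below shows the arithmetic on a handful of made-up labels; the function name and the example values are illustrative only and are unrelated to the project's reported scores.

```python
def recall_and_specificity(y_true, y_pred):
    """Compute recall and specificity from 0/1 labels (1 = crash).

    Assumes y_true contains at least one crash and one non-crash.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # crashes caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # crashes missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-crashes caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return recall, specificity


# Made-up example: 4 actual crashes and 5 actual non-crashes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]

recall, specificity = recall_and_specificity(y_true, y_pred)
# recall = 3 / 4 = 0.75, specificity = 4 / 5 = 0.80
```

Reporting the two numbers together matters: a model that predicts a crash everywhere would score perfect recall but very low specificity, and the reverse is true for a model that never predicts a crash.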

Future

Going forward, we aim to offer this predictive tool as a live service to local emergency responders, and to make it available to the local city government on an as-needed basis for prescriptive analysis. Another main benefit of this project is that it uses widely available data. No highly specific data, such as driver- or vehicle-specific information, was used in this project, because that information was not available in our area of study. While initially a drawback, this proved to be of great benefit to our project’s portability. If a different city wishes to implement the algorithm used for our project, it would only need the crash records and some basic location information.

The Center for Urban Informatics and Progress is a smart city research center at the University of Tennessee at Chattanooga. CUIP is committed to applied smart city research that betters the lives of citizens every day. For more on the work we’re doing and our mission, visit www.utc.edu/cuip.

Written by Reid Belew