The Most Dangerous Roads in Montgomery County, Maryland

Kimberly Escobar
INST414: Data Science Techniques
Dec 3, 2023

Continuing my analysis with Montgomery County, Maryland motor vehicle incidents, I decided to analyze the relationships between roads and injury severity next.

Previous article on my exploratory analysis with this data:

The specific insight I want to extract from this data now is a rating of Montgomery County roads based on the number of vehicle incidents that have taken place and the injury severity levels reported. This will provide a better understanding of which roads are more susceptible to vehicle incidents, and it will direct stakeholders' attention to the most impactful areas, allowing for better traffic control and overall efficiency.

Identifying the injury severity level of each report on the named roads will also allow stakeholders such as public service officials to gauge the “danger level” of a road. For example, if Georgia Ave has more reports of drivers with a suspected serious injury than with no apparent injury, that road warrants further inspection into other factors that may be driving up injury severity.

Data Collection and Cleaning

I found this data from Data.gov, and collected it using the Python requests library with the API endpoint provided:

import requests

resp = requests.get('https://data.montgomerycountymd.gov/resource/mmzv-x632.json?$limit=5100')

The output provides details of all vehicle collisions occurring on Montgomery County roads within the last 3 months. After receiving the JSON response, I converted the dataset to a pandas DataFrame for better readability. I noticed several additional columns all named similarly, like “:@computed_region_xxxx_xxxx”. These were probably generated automatically, since the API collects data via the Automated Crash Reporting System (ACRS) used by Maryland State Police. Because these columns were not among the 43 relevant features listed in the API documentation, I removed them.
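A minimal sketch of that conversion and cleanup step, assuming the resp object from above (the column filter shown is just one way to drop these extra fields):

import pandas as pd

#Convert the JSON response into a DataFrame
df = pd.DataFrame(resp.json())

#Drop the auto-generated ":@computed_region" columns not listed in the API documentation
computed_cols = [col for col in df.columns if col.startswith(':@computed_region')]
df = df.drop(columns=computed_cols)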

Building the Network

I used NetworkX to populate the graph, but first I created an additional column in the DataFrame to represent the number of reports matching the same criteria. To do this, I built a dictionary recording all instances of each road, organized by injury severity level. I then added a new column to the DataFrame recording, for each row, the total number of reports with the same road name and injury severity. This acts as the weight for each edge: the more reports associated with the same road name and injury severity, the more weight the edge has.

import networkx as nx

#Count all injury severity answers
injury_severities = df['injury_severity'].value_counts()

road_counts = {}

#Iterate through the injury severity levels
for severity_level in injury_severities.index:
    #Get all reports with the current severity level
    df1 = df[df['injury_severity'] == severity_level]

    #Count all instances of roads reported with this injury severity
    road_counts[severity_level] = dict(df1['road_name'].value_counts(dropna=False))

#Iterate through each row, adding the total number of reports with the same injury severity and road name
for index, row in df.iterrows():
    df.loc[index, 'num_reports'] = road_counts[row['injury_severity']][row['road_name']]

#Build and populate the graph, using num_reports as the edge attribute
g = nx.from_pandas_edgelist(df, source='road_name', target='injury_severity', edge_attr='num_reports')

nx.write_graphml(g, "mod2.graphml") #Export the graph to GraphML

After exporting the graph in GraphML format, I used Gephi to visualize the network.

Defining “Importance”

For this network, I highlighted the edge weights by coloring edges according to their weight: the higher the edge weight, the lighter the edge color. In this case, more important nodes have heavier edges, since heavier edges correspond to the roads that appear most frequently in the vehicle collision reports. With further analysis in Gephi, I can also highlight all edges from a particular node. To demonstrate this, we will look at crashes on River Rd, New Hampshire Ave, and Georgia Ave.

[Gephi visualizations highlighting the edges from River Rd, New Hampshire Ave, and Georgia Ave]
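The same per-road breakdown can also be read directly from the NetworkX graph, without Gephi. A minimal sketch, assuming the graph g and the num_reports edge attribute built above (the road names are node labels taken from the data):

#Print each injury severity linked to a road, with its report count
for road in ['RIVER RD', 'NEW HAMPSHIRE AVE', 'GEORGIA AVE']:
    print(road)
    for severity, attrs in g[road].items():
        print(' ', severity, attrs['num_reports'])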

As shown, the mix of injury severities is more diverse for River Rd than for New Hampshire Ave and Georgia Ave. This insight should lead stakeholders to direct more of their attention toward River Rd, as it appears to account for more serious driver injuries. To get a list of the most dangerous roads, I extracted centrality metrics from the graph and printed the top-ranked nodes.

#Pick number of top central nodes to print
top_k = 20

#Calculate degree centrality for all nodes
centrality_degree = nx.degree_centrality(g)

#Sort node-centrality dictionary by metric, and reverse to get top elements first
for u in sorted(centrality_degree, key=centrality_degree.get, reverse=True)[:top_k]:
    print(u, centrality_degree[u])
VEIRS MILL RD 0.008143322475570033
RANDOLPH RD 0.006514657980456026
BRIGGS CHANEY RD 0.006514657980456026
GREAT SENECA HWY 0.006514657980456026
COLUMBIA PIKE 0.006514657980456026
RIVER RD 0.006514657980456026
NEW HAMPSHIRE AVE 0.006514657980456026
GERMANTOWN RD 0.006514657980456026
COLESVILLE RD 0.006514657980456026
CONNECTICUT AVE 0.006514657980456026
DARNESTOWN RD 0.006514657980456026
GEORGIA AVE 0.006514657980456026
WELLER RD 0.006514657980456026
LAYHILL RD 0.006514657980456026
EAST WEST HWY 0.006514657980456026

Limitations

This analysis may be biased due to the timeframe of the data I collected. Since it is the holiday season, vehicle collisions are expected to occur more frequently than usual, so the data may not best reflect the average Montgomery County environment. Another note is that a large number of reports (400+) did not have a road name recorded, which may impact the accuracy of this analysis.
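A quick way to check that second point, assuming the DataFrame df built earlier (a minimal sketch; the exact count depends on when the data is pulled):

#Count reports with no road name recorded
print('Reports missing a road name:', df['road_name'].isna().sum())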

GitHub Repository
