Return to Reaper’s Road: Not Even the Sidewalk is Safe

Jason Kibozi-Yocka
Dec 15, 2019

Are pedestrians at an increased risk of death even as traffic fatalities decline?

Credit: tedmcdonald

Is the number of traffic-related pedestrian deaths rising even as the number of traffic fatalities drops? I asked myself this question after reading a Verge article titled Drivers Killed the Most Pedestrians and Bicyclists in Almost 30 Years [1]. I’ve analyzed National Highway Traffic Safety Administration (NHTSA) data before: in Reaper’s Roads: Who’s Likely to Die in a Car Crash [2], where I looked at the demographics of traffic fatality victims, and in Be Safe: What’s the Safest Mode of Transportation for Commuters [3]. Yet I had never noticed this uptick in traffic-related pedestrian deaths over the years. To find my own answer, I returned to an NHTSA dataset I’d used in the past.

I returned to the NHTSA’s Fatality Analysis Reporting System (FARS), a repository of accident reports dating back to 1975, and grabbed all the records from 2000 to 2018. I then cleaned and reshaped the data for analysis using Jupyter Notebook and Python 3, in conjunction with MySQL and Tableau.

You can access my full Jupyter notebook here.

TL;DR? No problem, just jump down to the Conclusion section.

Data Cleaning

Because I was working with 19 datasets (one .csv file per year), I decided the most effective way to clean them was to write a function that could take any one of the files and return a cleaned version of it. To do this, I began by creating dictionaries and lists that would serve as guidelines for what the function would do.

I started by creating a list containing the names of the columns I wanted to keep:

# create a list of columns to keep
keepList = ['STATE','ST_CASE','VEH_NO','PER_NO','COUNTY','DAY','MONTH','HOUR', 'MINUTE','HARM_EV','SCH_BUS','MAKE','MOD_YEAR','AGE','SEX','PER_TYP','INJ_SEV','DRINKING','DRUGS','HISPANIC','RACE','DEATH_MO','DEATH_DA','DEATH_HR','DEATH_MN',]

Then I created a dictionary that attached state names to their FARS numbers:

# STATE
# create a dictionary that pairs state_numbers with their corresponding state_names
stateDict = {1: 'Alabama',2:'Alaska',4:'Arizona',5:'Arkansas', 6:'California',8:'Colorado',9:'Connecticut',10:'Delaware',11:'D.C.', 12:'Florida',13:'Georgia',15:'Hawaii',16:'Idaho',17:'Illinois', 18:'Indiana',19:'Iowa',20:'Kansas',21:'Kentucky',22:'Louisiana', 23:'Maine', 24:'Maryland',25:'Massachusetts',26:'Michigan', 27:'Minnesota',28:'Mississippi',29:'Missouri',30:'Montana', 31:'Nebraska',32:'Nevada',33:'New Hampshire',34:'New Jersey',35:'New Mexico',36:'New York',37:'North Carolina',38:'North Dakota', 39:'Ohio',40:'Oklahoma',41:'Oregon',42:'Pennsylvania',43:'Puerto Rico',44:'Rhode Island',45:'South Carolina',46:'South Dakota', 47:'Tennessee',48:'Texas',49:'Utah',50:'Vermont',51:'Virginia', 52:'Virgin Islands',53:'Washington',54:'West Virginia', 55:'Wisconsin',56:'Wyoming'}

I continued to create dictionaries like the one above for each of my columns, pairing FARS codes with their value names. I was able to do this by referencing the 2018 FARS/CRSS Coding and Validation Manual.

After creating my lists and dictionaries, I proceeded to create my function:

import pandas as pd

# keepList and stateDict are shown above; keepDict (column name -> lookup
# dictionary such as stateDict) and countyDict are defined earlier in the notebook

# create a function that handles cleaning for FARS datasets
def cleanFARS(filename, year):
    # read in dataset as dataframe
    farsDF = pd.read_csv(filename)

    # drop unnecessary columns (keep only the keepList columns present in this file)
    farsDF = farsDF[[col for col in keepList if col in farsDF.columns]].copy()

    # convert column codes to name values
    # https://stackoverflow.com/questions/20250771/remap-values-in-pandas-column-with-a-dict
    for col in keepDict:
        if col not in farsDF.columns:
            continue
        if col == 'COUNTY':
            # keep county codes that have no name in countyDict
            farsDF[col] = farsDF[col].map(lambda x: countyDict.get(x, x))
        else:
            # codes missing from the lookup dictionary become NaN
            farsDF[col] = farsDF[col].map(keepDict[col])
        # drop rows in dataframe where column value contains NaN
        #farsDF.dropna(axis=0, subset=[col], inplace=True)
    farsDF.reset_index(drop=True, inplace=True)

    # combine 'DAY' and 'MONTH' into 'DATE' column ('88' = N/A, '99' = unknown)
    dateList = []
    for _, row in farsDF.iterrows():
        date = str(row['MONTH']) + ' ' + str(row['DAY']) + ', ' + str(year)
        if '88' in date:
            dateList.append('N/A')
        elif '99' in date:
            dateList.append('Unknown')
        else:
            dateList.append(date)

    farsDF.drop(columns=['DAY', 'MONTH'], inplace=True)
    farsDF.insert(4, 'DATE', dateList, allow_duplicates=True)

    # combine 'HOUR' and 'MINUTE' into 'TIME' column
    timeList = []
    for _, row in farsDF.iterrows():
        if 'N/A' in str(row['DATE']) or 'Unknown' in str(row['DATE']):
            time = str(row['HOUR']) + ':' + str(row['MINUTE'])
        else:
            time = str(row['DATE']) + ' ' + str(row['HOUR']) + ':' + str(row['MINUTE'])
        if '88' in time:
            timeList.append('N/A')
        elif '99' in time:
            timeList.append('Unknown')
        else:
            timeList.append(time)

    farsDF.drop(columns=['HOUR', 'MINUTE'], inplace=True)
    farsDF.insert(5, 'TIME', timeList, allow_duplicates=True)

    # combine the DEATH_* columns into a single 'DEATH' column
    deathList = []
    for _, row in farsDF.iterrows():
        deathDay = str(row['DEATH_MO']) + ' ' + str(row['DEATH_DA']) + ', ' + str(year)
        deathTime = str(row['DEATH_HR']) + ':' + str(row['DEATH_MN'])
        if '88' in deathDay:
            deathList.append('N/A')
        elif '99' in deathDay:
            deathList.append('Unknown')
        elif '88' in deathTime or '99' in deathTime:
            # date of death is known but the time is not
            deathList.append(deathDay)
        else:
            deathList.append(deathDay + ' ' + deathTime)

    farsDF.drop(columns=['DEATH_MO', 'DEATH_DA', 'DEATH_HR', 'DEATH_MN'], inplace=True)
    # insert the combined death values
    farsDF.insert(0, 'DEATH', deathList, allow_duplicates=True)

    # reorder columns
    farsDF = farsDF[['STATE', 'COUNTY', 'ST_CASE', 'DATE', 'TIME', 'VEH_NO', 'SCH_BUS', 'MAKE', 'MOD_YEAR',
                     'PER_NO', 'PER_TYP', 'AGE', 'SEX', 'RACE', 'HISPANIC', 'HARM_EV', 'INJ_SEV', 'DEATH',
                     'DRINKING', 'DRUGS']]

    return farsDF

# one PERSON file per year, 2000 through 2018
myfiles = ['PERSON_' + str(y) for y in range(2000, 2019)]

Essentially, my function uses the pandas library’s .map() method to apply my dictionaries to each dataset, converting every coded value into its corresponding label. It seems complex, but in principle it is quite simple.
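To make the mapping step concrete, here is a small sketch of what Series.map() does with a dictionary. It assumes, as cleanFARS implies, that keepDict maps each column name to a lookup dictionary such as stateDict; the three-state dictionary and the sample codes below are illustrative only.

```python
import pandas as pd

# Illustrative subset of stateDict (codes copied from the dictionary above)
stateDict = {1: 'Alabama', 6: 'California', 48: 'Texas'}

# Assumed shape of keepDict: column name -> lookup dictionary
keepDict = {'STATE': stateDict}

raw = pd.DataFrame({'STATE': [48, 6, 1, 99]})

# Series.map with a dict replaces each code with its label;
# codes missing from the dict (99 here) become NaN
for col, lookup in keepDict.items():
    raw[col] = raw[col].map(lookup)

print(raw['STATE'].tolist())  # ['Texas', 'California', 'Alabama', nan]

# Mapping with a function instead keeps unmatched codes as they are
codes = pd.Series([48, 99])
kept = codes.map(lambda x: stateDict.get(x, x))
```

The NaN behavior is why the COUNTY branch in cleanFARS maps with a function: county codes without a name pass through unchanged instead of being lost.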

After creating this function, I ran all my datasets through it and wrote out cleaned .csv files. Alongside these, I also built a single combined dataset that merges all of the cleaned years together.

import os

import pandas as pd

# clean each year, save it, and collect the results for the combined dataset
cleanedFrames = []
for file in myfiles:
    myFile = file + '.csv'
    myYear = file[-4:]
    myDF = cleanFARS(myFile, myYear)
    myDF.to_csv(os.path.join('datasets', myFile.lower()), index=False)
    cleanedFrames.append(myDF)
    print('Completed: ' + myFile)

# DataFrame.append was removed in pandas 2.0; concatenate once at the end instead
mergedDF = pd.concat(cleanedFrames, ignore_index=True)

MySQL

After cleaning my datasets, I created a schema called fars in my MySQL database, infodata. If you’d like to know more about how I created this MySQL database, please see my previous article Greenhouse Healthcare: Is there a relationship between greenhouse gases and healthcare? [5].
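The article doesn’t show how the cleaned files were loaded into MySQL. One common route, offered here as an assumption rather than the author’s method, is pandas’ DataFrame.to_sql. The sketch below runs against an in-memory SQLite database so it is self-contained; for MySQL you would pass a SQLAlchemy engine instead. The table name person and the connection string in the comment are hypothetical.

```python
import sqlite3

import pandas as pd


def load_to_sql(df, con, table='person'):
    """Write a cleaned FARS dataframe to a SQL table and return its row count."""
    # if_exists='replace' drops and recreates the table on every run
    df.to_sql(table, con, if_exists='replace', index=False)
    count = pd.read_sql_query(f'SELECT COUNT(*) AS n FROM {table}', con)
    return int(count['n'].iloc[0])


# Toy stand-in for mergedDF (illustrative rows, not FARS records).
# For MySQL, pass a SQLAlchemy engine instead, e.g. (hypothetical credentials):
# create_engine('mysql+pymysql://user:password@localhost/fars')
demo = pd.DataFrame({'ST_CASE': [10001, 10001, 10002], 'STATE': ['Alabama'] * 3})
with sqlite3.connect(':memory:') as con:
    print(load_to_sql(demo, con))  # 3
```

Because of if_exists='replace', rerunning the cleaning notebook overwrites the table rather than appending duplicate rows.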

Analysis

Are the Number of Motor Vehicle Accidents Going Down?

According to my data, the number of motor vehicle accidents has been declining since January 2000, dropping by 7.0% overall. While the number of accidents is not at its lowest point ever, historical trends show a gradual decline, and projections suggest a continued, albeit small, decline (that is, if current trends persist).
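The article reports changes like this 7.0% drop without showing the calculation. Assuming it compares the first and last year of the period, a minimal sketch looks like this; the YEAR column is an assumption (for example, taken from each file name), and the rows are toy data. Since the PERSON files have one row per person, accidents are counted as distinct ST_CASE values per year.

```python
import pandas as pd


def percent_change(first, last):
    """Percent change from a starting value to an ending value."""
    return (last - first) / first * 100


# Toy person-level rows; YEAR is an assumed column and each accident
# (ST_CASE) can appear once per person involved
rows = pd.DataFrame({
    'YEAR':    [2000, 2000, 2000, 2000, 2000, 2018, 2018, 2018],
    'ST_CASE': [10001, 10001, 10002, 10003, 10004, 10001, 10002, 10003],
})

# count distinct accidents per year, not person rows
per_year = rows.groupby('YEAR')['ST_CASE'].nunique()
change = percent_change(per_year.iloc[0], per_year.iloc[-1])
print(change)  # -25.0
```

Grouping by year before counting matters because FARS case numbers restart each year, so the same ST_CASE value can appear in different years.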

The lowest reported number of motor vehicle accidents occurred in February 2010 (1,830 accidents) and the highest in July 2005 (3,760 accidents).

A strange pattern appears throughout the data: a consistent dip in the number of motor vehicle accidents every February (the valleys in my data), followed by a rise around July and August (the peaks in my data).
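One way to confirm a seasonal cycle like this is to average each calendar month across all years and look for the lowest and highest months. A minimal sketch with made-up monthly counts (not FARS figures), assuming a table with YEAR, MONTH, and ACCIDENTS columns:

```python
import calendar

import pandas as pd

# Toy monthly accident counts for two years (illustrative, not FARS figures)
monthly = pd.DataFrame({
    'YEAR': [2000] * 12 + [2001] * 12,
    'MONTH': list(range(1, 13)) * 2,
    'ACCIDENTS': [30, 20, 31, 32, 33, 34, 40, 39, 35, 33, 32, 31,
                  29, 21, 30, 31, 32, 33, 41, 38, 34, 32, 31, 30],
})

# Average each calendar month across all years to expose the seasonal cycle
seasonal = monthly.groupby('MONTH')['ACCIDENTS'].mean()
low_month = int(seasonal.idxmin())
high_month = int(seasonal.idxmax())
print(calendar.month_name[low_month], calendar.month_name[high_month])  # February July
```

Part of a February dip may simply reflect that February is the shortest month; dividing each count by the number of days in its month would separate that effect from a genuine seasonal change.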

If we break my data down even further, into fatal vs. non-fatal accidents, we see that…

…the majority of motor vehicle accidents are fatal (ranging anywhere from 1,000 to 4,000 accidents, and dropping by 10.2%)…

Note: An accident is considered fatal if at least one death occurs. This skews the data slightly in favor of fatal accidents, since a single death is enough for the whole accident to count as fatal.
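The rule in the note can be expressed as a group-wise "any": an accident (ST_CASE) is fatal if any person in it has a fatal injury. A minimal sketch on toy rows, assuming the cleaned INJ_SEV column uses the label 'Fatal Injury (K)' (the article doesn’t list its cleaned labels):

```python
import pandas as pd

FATAL = 'Fatal Injury (K)'  # assumed INJ_SEV label after cleaning

# Toy person-level rows: several people can belong to one accident (ST_CASE)
people = pd.DataFrame({
    'ST_CASE': [1, 1, 2, 2, 3],
    'INJ_SEV': [FATAL, 'No Apparent Injury (O)',
                'Possible Injury (C)', 'No Apparent Injury (O)',
                FATAL],
})

# An accident is fatal if at least one person in it has a fatal injury
is_fatal = people['INJ_SEV'].eq(FATAL).groupby(people['ST_CASE']).any()
print(int(is_fatal.sum()), int((~is_fatal).sum()))  # 2 1
```

Accident 1 has one fatality and one uninjured person, yet the whole accident counts as fatal, which is exactly the skew the note describes.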

…and, as a result, the minority of motor vehicle accidents are non-fatal (ranging anywhere from 5 to 50). Interestingly, projections also indicate that the number of non-fatal motor vehicle accidents is plateauing.

How Many Motor Vehicle Accidents Are Vehicle-on-Vehicle Collisions?

According to my data, the majority of motor vehicle accidents are vehicle-on-vehicle collisions. These collisions follow the same downward trend as accidents overall, dropping by 10.0% since 2000. The highest count was in July 2003 (5,296 collisions) and the lowest in February 2013 (2,262 collisions).

Note: It may seem odd that the number of collisions in a given period can exceed the number of reported accidents. This happens simply because a single accident can involve multiple vehicles and, by extension, more than one collision.
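The arithmetic behind that note can be seen on a toy example. The article doesn’t say exactly how collisions were counted, so this sketch assumes each distinct vehicle in an accident (each ST_CASE and VEH_NO pair) counts as one involvement:

```python
import pandas as pd

# Toy person-level rows: accident 1 involves three vehicles,
# accident 2 involves one vehicle carrying two people
rows = pd.DataFrame({
    'ST_CASE': [1, 1, 1, 2, 2],
    'VEH_NO':  [1, 2, 3, 1, 1],
})

accidents = rows['ST_CASE'].nunique()
vehicle_records = len(rows.drop_duplicates(subset=['ST_CASE', 'VEH_NO']))
print(accidents, vehicle_records)  # 2 4
```

Two accidents produce four vehicle records, so any count built on vehicles will run ahead of the accident count.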

If we break down motor vehicle collisions into fatal vs. non-fatal collisions, we see that…

…the majority of vehicle-on-vehicle collisions are fatal (meaning one or more people died as a result of the collision; ranging anywhere between 2,000 and 5,500 collisions)…

…and, as a result, the minority of vehicle-on-vehicle collisions are non-fatal. Interestingly, gaps appear in this data, showing entire periods in which every vehicle-on-vehicle collision was fatal.

How Many Deaths Result From Motor Vehicle Accidents?

The number of motor vehicle occupant deaths has been declining since 2000 (dropping by 6.6% for drivers and 24.0% for passengers). Most of these deaths are drivers rather than passengers, likely because so many accidents involve solo drivers.

Projections also indicate that passenger deaths are falling faster than driver deaths. Historically, the most and fewest driver deaths occurred in July 2005 (5,633 deaths) and February 2014 (2,766 deaths), respectively, while the most and fewest passenger deaths occurred in July 2003 (3,977 deaths) and February 2014 (1,354 deaths), respectively.

The number of motor vehicle-related pedestrian deaths has risen by 35% since 2000. At its height, there were 1,562 motor vehicle-related pedestrian deaths (in October 2016), and at its lowest there were 594 (in June 2010).
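A trend like this starts by isolating pedestrian fatalities and counting them per month. A minimal sketch on toy rows, assuming the cleaned PER_TYP and INJ_SEV columns hold the labels 'Pedestrian' and 'Fatal Injury (K)', and that MONTH was mapped to month names so DATE reads like 'January 3, 2000' (all assumptions; the article doesn’t list its cleaned labels):

```python
import pandas as pd

PEDESTRIAN = 'Pedestrian'   # assumed PER_TYP label after cleaning
FATAL = 'Fatal Injury (K)'  # assumed INJ_SEV label after cleaning

# Toy cleaned rows (illustrative, not FARS records)
people = pd.DataFrame({
    'DATE': ['January 3, 2000', 'January 9, 2000', 'February 1, 2000', 'February 7, 2000'],
    'PER_TYP': [PEDESTRIAN, 'Driver of a Motor Vehicle In-Transport', PEDESTRIAN, PEDESTRIAN],
    'INJ_SEV': [FATAL, FATAL, FATAL, 'Possible Injury (C)'],
})

# keep only pedestrians who were fatally injured
mask = (people['PER_TYP'] == PEDESTRIAN) & (people['INJ_SEV'] == FATAL)
peds = people[mask].copy()

# parse 'Month Day, Year' strings; 'N/A' or 'Unknown' dates become NaT
# and are dropped by the groupby below
peds['DATE'] = pd.to_datetime(peds['DATE'], format='%B %d, %Y', errors='coerce')
per_month = peds.groupby(peds['DATE'].dt.to_period('M')).size()
print(per_month)
```

The same filter with a pedal-cyclist label in place of PEDESTRIAN would give the cyclist series discussed next.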

Like motor vehicle-related pedestrian deaths, motor vehicle-related pedal-cyclist deaths have also risen since 2000, by 21%. At their height, there were 248 pedal-cyclist deaths (in July 2004), and at their lowest there were 55 (in January 2003).

Are there any demographic qualities that appear again and again among victims of motor vehicle-related pedestrian deaths?

Nearly twice as many male pedestrians (136,929 victims) as female pedestrians (69,851 victims) have been killed in motor vehicle-related accidents since 2000. As for race, the largest group of motor vehicle-related pedestrian deaths since 2000 is Non-Hispanic White (40,442 victims).

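The majority of accidents since 2000 occurred while the driver was not under the influence of any substance. A significance test comparing driver substance use and motor vehicle-related pedestrian deaths returned a p-value below 0.0001. Strictly speaking, a p-value that small points to a statistically significant association rather than to no correlation; what the data does show clearly is that most of these deaths involved sober drivers.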
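The article doesn’t say which test produced this p-value. As a hedged illustration, assuming a Pearson chi-square test of independence on a 2x2 table (for example, driver drinking yes/no against pedestrian killed yes/no), here is a standard-library sketch with made-up counts. A small p-value is evidence against independence, meaning the two variables are associated; a large one means the data is consistent with no association.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the 2x2 table
    [[a, b], [c, d]] without continuity correction.
    Assumes every row and column total is nonzero. Returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up counts (not FARS data)
stat, p = chi_square_2x2(30, 10, 10, 30)
print(stat)        # 20.0
print(p < 0.0001)  # True: strong evidence of an association
```

A perfectly balanced table such as chi_square_2x2(10, 10, 10, 10) returns a statistic of 0.0 and a p-value of 1.0, which is what "no correlation" actually looks like.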

The highest cumulative number of motor vehicle-related pedestrian deaths since 2000 occurred in Texas (31,327 victims), and the lowest in Vermont (205 victims).

The vehicle make most frequently involved in motor vehicle-related pedestrian deaths since 2000 was Ford (20,822 deaths), followed by Chevrolet (18,264 deaths) and Toyota (10,000 deaths). The least frequent were Brockway, Peugeot, and Sterling (1 death each). This is perhaps a matter of prevalence: far more people own and drive vehicles like Fords than vehicles like Peugeots.
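Rankings like these (and the sex and race counts earlier in this section) amount to frequency counts over the pedestrian-fatality rows. A minimal sketch with pandas’ value_counts, using made-up rows and assuming the cleaned STATE and MAKE columns hold names:

```python
import pandas as pd

# Toy pedestrian-fatality rows (illustrative values, not FARS counts)
peds = pd.DataFrame({
    'STATE': ['Texas', 'California', 'Texas', 'Texas', 'California', 'Vermont'],
    'MAKE':  ['Ford', 'Chevrolet', 'Ford', 'Toyota', 'Ford', 'Chevrolet'],
})

# value_counts sorts from most to least frequent
state_counts = peds['STATE'].value_counts()
make_counts = peds['MAKE'].value_counts()
print(state_counts.index[0], state_counts.index[-1])  # Texas Vermont
print(make_counts.index[0], make_counts.index[-1])    # Ford Toyota
```

Raw counts like these reflect how many vehicles of each make are on the road, so comparing makes fairly would require dividing by each make’s share of registered vehicles.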

Conclusion

So, are pedestrians at an increased risk nowadays, even as traffic fatalities drop off? My data suggests yes. In my analysis, we found that the number of motor vehicle accidents has dropped by 7.0% since 2000, and the number of fatal accidents (those in which at least one person died) has dropped by 10.2%. Motor vehicle occupant deaths have also fallen since 2000, by 6.6% for drivers and 24.0% for passengers. Unfortunately for pedestrians, the opposite has happened.

Motor vehicle-related pedestrian deaths have increased by 35.0% since 2000, and motor vehicle-related pedal-cyclist deaths have increased by 21.0%. This is alarming, but I don’t think we need to panic just yet. Pedestrian deaths average around 987 per month (or 11,844 per year). While this is bad, it is not necessarily a catastrophe: comparatively speaking, it adds up to only 260,562 deaths from 2000 through 2018, which is about half the population of Kansas City.

Two other takeaways from my analysis: most motor vehicle-related pedestrian deaths involve a sober driver, so substance use does not appear to be the main factor; and the typical pedestrian victim of a fatal motor vehicle accident is a Non-Hispanic White male in the state of California, most likely killed by a sober driver behind the wheel of a Ford.

While these analyses don’t paint the complete picture of why motor vehicle-related pedestrian deaths are increasing, I do believe my data confirms that the problem exists and suggests that substance use is not the main cause. Moving forward, I think that we, as a country, need to dig deeper into why more and more pedestrians are dying, because this suggests we’re doing something wrong. It’s all well and good that we’ve managed to reduce the number of motor vehicle accidents, but what good is that if more and more people (who aren’t behind the wheel or even in the vehicle) are dying?

Afterword

Before closing out this article, I would like to point out the flaws in my analysis. My data relies entirely on the NHTSA’s FARS database and the accuracy of its records. Also, because of the sheer volume (and messiness) of the data, I had to pick and choose which columns and data points to use to answer my question. This means a lot of potentially relevant data went unused, so my analysis is not all-encompassing (in terms of finding the causes of the rising number of motor vehicle-related pedestrian deaths). Lastly, I only used data from 2000 to 2018 rather than the FARS repository’s entire collection, which ranges from 1975 to 2019 (I chose not to use 2019, as it was incomplete at the time).

Anyway, thanks for reading. Ciao!
