Aggie Hacks 2024: How can traffic accidents in San Francisco be reduced using a data-driven approach?

Jihyun (Jenny) Kim
10 min read · Apr 18, 2024


Today is a day to remember — we won the 1st-place award at the Hackathon sponsored by DataSF, Standard Insight, and Spanda.ai. This unexpected victory makes the achievement all the more thrilling. I am incredibly proud to have worked with my outstanding team members, Cindy and Mia, and I want to express my heartfelt gratitude for the time and effort everyone dedicated to this competition.

Introducing Team Chimak: Cindy, who brings 13 years of experience in sales and marketing; Mia, who is highly skilled in Python and data visualization; and myself, with over 6 years of experience in data analytics.

As graduate students pursuing a Master of Science in Business Analytics, we participated in a Hackathon at the University of California, Davis, in partnership with DataSF, Standard Insight, and Spanda.ai. Our team, composed of individuals with diverse skill sets, was remarkably well-balanced. Cindy, our team leader, brings over 13 years of experience in sales and marketing at Samsung. Mia is exceptionally skilled in Python coding and data visualization, and I have more than six years of experience in data analytics, also at Samsung. We are excited to share our story about how we addressed the city’s traffic collision issues.

What are the primary factors affecting fatal accidents in San Francisco, and what are the solutions to address this problem?

Annually, around 30 people are killed and over 200 are seriously injured on San Francisco’s streets. Traffic fatalities are a close and personal concern, impacting everyone in our community, including our colleagues, family, and friends. A recent tragic accident that occurred in daylight near the West Portal Muni Station — just blocks away from where many of us live or work — underscores the vulnerability we all share. These incidents are not only devastating but also preventable. Proactive, data-driven solutions are essential to prevent future tragedies and safeguard our daily commute, whether on foot or by public transport.

Fueled by the urgency to end traffic fatalities in San Francisco, Vision Zero has adopted a bold plan to eliminate traffic deaths and reduce severe injuries. This initiative is educating the public on traffic safety, enforcing traffic laws, and advocating for policy changes that save lives. In support of Vision Zero’s commendable work, our team Chimak wanted to launch this project to garner increased attention and a united effort to make our streets safer for everyone.

Our project goal is straightforward. Firstly, we aimed to provide insights from data that can inform targeted interventions for road safety. This involved identifying high-risk locations, discerning collision patterns, and pinpointing factors that contribute to fatal traffic incidents. Additionally, we have developed a predictive model to forecast potential high-risk areas, which will help prioritize city resource allocation. Secondly, based on our data analysis, we proposed road safety strategies that are in line with Vision Zero’s objectives. Lastly, we also evaluated the economic and societal impacts of our strategies, including their financial implications.

To identify high-risk areas, we first clustered the geographical data to segment the city into four major areas. Upon analysis, we discovered that the northeastern portion of the city, encompassing the downtown area and the Tenderloin neighborhood, exhibited the highest rate of fatal accidents. The map specifically highlighted 101 Hyde Street as a particularly vulnerable location.
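
For readers who want to try the segmentation step, here is a minimal sketch of clustering collision coordinates into four areas. It assumes k-means from scikit-learn, which may differ from the clustering method we used at the hackathon, and it runs on synthetic points around made-up centers rather than the DataSF records. The `fatal` flag is also hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic (longitude, latitude) points around four made-up centers.
# These are NOT the DataSF collision records.
rng = np.random.default_rng(42)
centers = np.array([
    [-122.41, 37.78],
    [-122.48, 37.76],
    [-122.44, 37.73],
    [-122.39, 37.74],
])
points = np.vstack([c + rng.normal(scale=0.004, size=(50, 2)) for c in centers])

# Hypothetical flag: 1 if the collision was fatal, 0 otherwise.
fatal = rng.integers(0, 2, size=len(points))

# Segment the city into four areas. Treating degrees as flat coordinates
# is a simplification that is acceptable at city scale.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

# Share of fatal collisions within each cluster, and the worst cluster.
fatal_rate = {int(k): float(fatal[labels == k].mean()) for k in np.unique(labels)}
worst = max(fatal_rate, key=fatal_rate.get)
print(fatal_rate, "highest fatal share in cluster", worst)
```

On real data, the cluster with the highest fatal share is the one to inspect on the map first.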

As for collision patterns, we first examined the annual trend of traffic collisions in the city. The line chart indicates a generally decreasing trend in the number of accidents, but it is important to note the recent rebound following the significant drop during the pandemic. What is more worth highlighting is that, despite the launch of Vision Zero in 2014 with the aim of eliminating traffic fatalities and severe injuries, an uptick in accidents is observed, indicating the need for further strategic improvements and continuous monitoring.

We then analyzed the data to determine which day of the week and time of day are the most hazardous for traffic incidents overall. Our findings indicate that Friday is the most hazardous day, which could be attributed to heightened travel as the weekend approaches. Additionally, daytime periods throughout the week exhibit a higher frequency of serious or fatal collisions, likely corresponding with increased vehicular activity during rush hours.
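
The day-of-week and time-of-day breakdown can be sketched with pandas as follows. The column names (`collision_datetime`, `severity`) and the six rows are invented for illustration and do not reflect the actual DataSF schema.

```python
import pandas as pd

# Tiny made-up sample; column names are assumptions, not the DataSF schema.
collisions = pd.DataFrame({
    "collision_datetime": pd.to_datetime([
        "2023-01-06 17:30", "2023-01-06 08:15", "2023-01-07 23:10",
        "2023-01-09 17:45", "2023-01-13 18:05", "2023-01-11 12:00",
    ]),
    "severity": ["fatal", "severe", "other", "severe", "fatal", "other"],
})

# Derive the calendar features used in the analysis.
collisions["day_of_week"] = collisions["collision_datetime"].dt.day_name()
collisions["hour"] = collisions["collision_datetime"].dt.hour

# All collisions per weekday.
by_day = collisions["day_of_week"].value_counts()

# Serious (fatal or severe) collisions per hour of day.
serious = collisions[collisions["severity"].isin(["fatal", "severe"])]
serious_by_hour = serious.groupby("hour").size()

print(by_day.idxmax(), serious_by_hour.idxmax())
```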

We also investigated the top three types of collisions and determined the proportion of fatal accidents for each type. Our findings indicate that pedestrian collisions have the highest proportion of fatal accidents at 3.2%, followed by collisions with objects at 2.9% and overturned vehicles at 1.4%. This data emphasizes the need for targeted safety measures, particularly for pedestrian protection, to address the predominant risks in urban traffic.
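
To make the distinction between how common a collision type is and how often it is fatal concrete, here is a small sketch on invented records. The column names and values are assumptions for illustration only.

```python
import pandas as pd

# Invented records; collision_type and is_fatal are hypothetical columns.
records = pd.DataFrame({
    "collision_type": ["pedestrian"] * 4 + ["hit object"] * 4 + ["rear end"] * 2,
    "is_fatal": [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})

# How often each type occurs.
type_counts = records["collision_type"].value_counts()

# Proportion of collisions of each type that were fatal.
fatal_share = (
    records.groupby("collision_type")["is_fatal"]
    .mean()
    .sort_values(ascending=False)
)
print(fatal_share)
```

The mean of a 0/1 flag within each group is exactly the share of fatal collisions for that type, which is the kind of figure reported above.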

Who is most often at fault in fatal or severe traffic accidents? To determine this, we analyzed the proportion of fault within all severe accidents (those involving deaths or serious injuries) from 2005 to 2023. The data reinforces the prevailing assumption that drivers are the predominant cause, being at fault in 68% of these incidents. Pedestrians were at fault in 19% of the accidents, while bicyclists were at fault in 11% of the cases. These statistics underscore the need for comprehensive traffic safety policies that address the behaviors of all road users.

In our quest to enhance road safety, the preliminary step involved identifying factors contributing significantly to fatal accidents. Using Poisson regression, we distilled these to four main elements: sidewalk width, road surface condition, time of day, and lighting — each showing a statistically significant correlation with accident rates.

  • Sidewalk width emerged as a critical factor. Narrow sidewalks, which can inadvertently bring pedestrians into close proximity with vehicles, amplify the risk of accidents. Conversely, wider sidewalks serve as a protective barrier, decreasing the chances of pedestrian-vehicle incidents.
  • The condition of the road surface also plays a pivotal role. Poor drainage can cause hazardous conditions like hydroplaning, while adverse weather conditions can further affect driving safety. Thus, understanding how road surface interacts with environmental factors is crucial for risk mitigation.
  • Time of day significantly affects accident rates, with peak hours bringing increased traffic volume and the cover of night introducing visibility challenges. These temporal factors are vital for scheduling timely and effective interventions.
  • Lastly, lighting stands out as a factor influencing road safety. Adequate lighting is imperative for driver alertness and nighttime visibility, while insufficient lighting can mask potential dangers.

These findings serve as the foundation for our predictive model development, aimed at forecasting high-risk zones and ultimately curtailing road accidents.
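
For readers curious what a Poisson regression of this kind looks like, here is a self-contained sketch that fits one with Newton's method in NumPy and reports Wald z-scores. It is an illustration, not our hackathon code: the data are simulated, the two predictors (`sidewalk_width`, `poor_lighting`) and the coefficients in `true_beta` are made up, and in practice a statistics package would normally do the fitting.

```python
import numpy as np

def fit_poisson(X, y, n_iter=25):
    """Fit a log-link Poisson regression by Newton's method.

    Assumes the first column of X is the intercept.
    Returns the coefficients and their Wald z-scores.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)              # expected counts
        grad = X.T @ (y - mu)              # score vector
        fisher = X.T @ (X * mu[:, None])   # Fisher information X'WX
        step = np.linalg.solve(fisher, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    mu = np.exp(X @ beta)
    cov = np.linalg.inv(X.T @ (X * mu[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    return beta, z

# Simulated data with made-up effects (not estimates from the project).
rng = np.random.default_rng(0)
n = 5000
sidewalk_width = rng.normal(size=n)                       # standardized
poor_lighting = rng.integers(0, 2, size=n).astype(float)  # 1 = poorly lit
X = np.column_stack([np.ones(n), sidewalk_width, poor_lighting])
true_beta = np.array([0.2, -0.3, 0.5])
y = rng.poisson(np.exp(X @ true_beta))  # collision counts per location

beta_hat, z_scores = fit_poisson(X, y)
print(beta_hat, z_scores)
```

A predictor whose z-score exceeds about 1.96 in absolute value is significant at the 5% level, which is the sense in which a factor is called statistically significant here.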

Using the significant factors identified in the previous stage, along with two additional features deemed important, we trained a random forest model. Our objective with this machine learning model is to generate a probability of severe or fatal collisions occurring at specific locations. This probability is integrated into an interactive dashboard, introduced below. The dashboard is designed to enable city officials and stakeholders to monitor risk levels routinely and make informed decisions based on the results.

The model’s classification report indicates 75% accuracy in distinguishing high-risk from non-high-risk areas. Furthermore, precision, the share of locations flagged as high-risk that actually are high-risk, stands at 76%, an acceptably high figure.
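
The general workflow (train a random forest, score held-out locations, then read off accuracy and precision) can be sketched as below. Everything here is an assumption for illustration: the features and labels are simulated, the hyperparameters are arbitrary, and the resulting numbers will not match the 75% and 76% reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 1000

# Hypothetical features, one row per location and time slot.
X = np.column_stack([
    rng.normal(size=n),           # sidewalk width (standardized)
    rng.integers(0, 2, size=n),   # wet road surface (0/1)
    rng.integers(0, 24, size=n),  # hour of day
    rng.integers(0, 2, size=n),   # poor lighting (0/1)
])

# Simulated label: 1 = severe or fatal collision (made-up relationship).
logit = -0.5 - 1.0 * X[:, 0] + 1.0 * X[:, 1] + 1.5 * X[:, 3]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
risk = model.predict_proba(X_test)[:, 1]  # probability of the high-risk class

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f}")
```

The `risk` column is the kind of per-location probability that a dashboard can display and filter on.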

We didn’t stop there. Using the probabilities generated by the Random Forest model, we have developed an interactive dashboard accessible to everyone. This tool allows users to identify high-risk locations. For example, city officials can utilize this dashboard to pinpoint potential high-risk areas where the probability of a fatal accident exceeds 80%, enabling them to allocate city resources effectively to those locations.

Additionally, the dashboard includes filters that users can apply to assess the likelihood of fatal accidents under specific conditions, such as during rain. This feature could help city officials determine if infrastructure improvements are necessary in recurrent problem areas. Likewise, local residents could use this information to exercise extra caution in high-risk conditions, potentially preventing accidents.
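
The filtering logic behind such a view is simple to sketch. The table, location names, and the `high_risk` helper below are hypothetical and only mimic what the dashboard filters do. The 0.8 default mirrors the 80% threshold mentioned above.

```python
import pandas as pd

# Hypothetical model output: one risk score per location and condition.
scores = pd.DataFrame({
    "location": ["A St & 1st", "B St & 2nd", "C St & 3rd", "A St & 1st"],
    "weather": ["clear", "rain", "rain", "rain"],
    "risk": [0.62, 0.85, 0.40, 0.91],
})

def high_risk(df, threshold=0.8, weather=None):
    """Return rows above the risk threshold, optionally for one weather condition."""
    subset = df if weather is None else df[df["weather"] == weather]
    return subset[subset["risk"] > threshold].sort_values("risk", ascending=False)

# Locations whose predicted risk exceeds 80% when it rains.
rainy_hotspots = high_risk(scores, weather="rain")
print(rainy_hotspots)
```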

Based on our findings from the data, we developed a package of strategies, encapsulated in the acronym SAFE: Social Media, Analyze, Fix, and Engage.

  • Social Media: This strategy involves leveraging social media platforms to engage with a wider audience and advocate for the Vision Zero initiative through the creation of compelling campaigns. Establishing a presence on social media, such as an Instagram account, could enhance communication with San Francisco residents and emphasize the importance of road safety.
  • Analyze: This strategy entails the proactive use of the dashboard to identify potential high-risk locations. The aim is to monitor and assess the risk of fatal collisions and, if necessary, to allocate additional staff to the areas deemed high-risk in order to prevent accidents or enhance response capabilities. In addition, we can partner with emergency services to facilitate faster response times in predicted accident zones.
  • Fix: This strategy focuses on the proactive remediation of infrastructure to prevent accidents. As noted earlier, lighting and road surface conditions significantly affect the occurrence of fatal accidents. Armed with this knowledge of contributing factors, we should aim to fix these issues promptly as they are detected, which should reduce the likelihood of fatal incidents. Even with regular maintenance, some hidden infrastructure issues may go undetected. To address this challenge, we can use social media as a tool for the public to report infrastructure problems. This approach can not only reduce the cost of scheduled maintenance but also increase public engagement and prevent serious accidents by enabling road problems to be repaired as soon as they are discovered.
  • Engage: Having an excellent plan is pointless if it does not resonate with the community, as without their participation, no change will occur. Therefore, this strategy emphasizes increasing public engagement in road safety and participation in activities to prevent collisions. The creation of neighborhood watch groups can be a powerful tool for this purpose. These groups foster vigilance and mutual support among residents, especially in identified high-risk areas, ultimately aiming to reduce traffic collisions. Additionally, collaboration with “SF Safe,” a non-profit organization that aids community members in crime prevention and overall safety, should be considered. Partnership with such organizations can increase residents’ awareness of and involvement in the Vision Zero SF initiatives, bolstering community engagement and collective action towards improved safety.

Now, let’s talk about the impact!

Economically, enhancing road safety could boost local business revenue and reduce the city’s expenditure on social costs arising from traffic collisions. Specifically, the SAFE strategies are anticipated to yield an incremental business impact of $35.3 million within a year, while $71.4 million could be saved from social expenses, leading to a total projected economic impact of roughly $106 million in a year.

As for the societal benefits, increasing road safety and walkability can spur economic development in the city. Moreover, fostering a spirit of collaboration and support among neighbors contributes to stronger community ties, which are vital for a healthy society. Improving the quality of life and, most critically, reducing fatalities from traffic collisions are paramount outcomes of implementing the SAFE strategies.

Of course, we did have some challenges and limitations in our project.

Firstly, despite our success in training a predictive model for fatal collisions, we must acknowledge that it is impossible to predict all accidents with 100% accuracy; unexpected events can always occur.

Secondly, there is the issue of data limitation. Time constraints limited our exploration of additional datasets from DataSF, which may contain more features that could enhance our predictive model. Moreover, our dashboard is currently static; ideally, it should be refreshed regularly through a periodic data pipeline to keep the information up to date.

Finally, prioritizing traffic issues within the policy agenda of San Francisco presents its own challenge. The city is dealing with significant issues such as homelessness and drug use. Under these circumstances, obtaining attention and resources for traffic safety improvements can be difficult.

With that said, we are confident that with consistent effort and diligent application of our SAFE strategy, San Francisco will soon realize the goal of Vision Zero for everyone.

Team Chimak

If you’re curious about our interactive dashboard, you can find it HERE.

And our LinkedIn profiles below:

https://www.linkedin.com/in/jihyun-kim423/

https://www.linkedin.com/in/cindyjeon0721/

https://www.linkedin.com/in/xinyi-mia-lai-339386189/
