West Africa Road Accident Analysis

Odunlade Goshen
5 min readJul 9, 2023

--

https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRpuqYAA1XgXxOCSTlhR4exvZOJLvje1d5eNg&usqp=CAU

Introduction:

Every year, approximately 1.3 million lives are tragically cut short as a result of road traffic crashes, as stated by the World Health Organization. Furthermore, an additional 20 to 50 million individuals suffer non-fatal injuries, many of whom are left with disabilities caused by these injuries. Various risk factors contribute to these road accidents, including speeding, driving under the influence, unsafe road infrastructure, distracted driving, and failure to use essential road safety equipment such as helmets and seat belts.

Objective:

The scope of this project focuses on understanding the factors that contribute to different levels of casualties in road accidents and using that knowledge to propose strategies for improving road safety, uncovering patterns, correlations, potential risk factors, thereby reducing the number of severe and fatal injuries.

About the Dataset:

The dataset utilized for this project is fictional and not based on real-world data, it was gotten from Data Tutorial on Youtube for practice purpose. The dataset contains 307974 rows and 21 columns. Its content includes data about the Accident Index, Accident Date, Month, Year, Day of Week, Junction Control, Junction Detail, Accident Severity, Latitude, Light Conditions, Local Authority(District), Carriageway Hazards, Longitude, Number of Casualties, Number of Vehicles, Police Force, Road Surface Conditions, Road Type, Speed limit, Time Urban or Rural Area, Weather Conditions and Vehicle Type. Microsoft Excel was employed as the software tool for conducting the analysis.

A brief summary of the dataset’s contents

Data Cleaning

To initiate the data cleaning process, I implemented filters on the fields and conducted a thorough examination for any empty values, duplicate entries, or typographical errors within the dataset. Since each accident is unique, it is essential to have a dynamic accident index field without any instances of typos, blanks, or duplicates. As a result, I determined that utilizing this field as the primary key for the dataset would be the most appropriate approach.

During my analysis, I came across several blank rows in specific fields of the dataset. Specifically, I observed three blank rows in the “Carriage Way” field, 1534 blank rows in the “Road Surface Condition” field, and 6057 blank rows in the “Weather Conditions” field. After careful consideration, I decided to retain these blank rows as they were not expected to significantly impact the objective of my analysis or report. Furthermore, in the “Accident Severity” field, there were four entries: “fatal,” “fetal,” “serious,” and “slight.” Upon reviewing the data content, it became apparent that “fetal” was likely a typographical error. To rectify this, I utilized the “Find and Replace” function to replace “fetal” with “fatal” as it aligns better with the context of the data

Find and Replace to correct typo

To enhance the dataset, I created two additional fields by extracting information from the existing date field. Specifically, I utilized the Text function to derive a field for the year and another for the month. This extraction process allowed me to segregate and organize the data based on the year and month of the recorded accidents.

Data Analysis and Visualization

I employed pivot tables as a means to summarize and analyze the data. Furthermore, I incorporated pivot charts to visually represent the findings. In the initial stages, I focused on generating insights for both primary and secondary key performance indicators (KPIs). To visually present the primary KPIs, I utilized a consistent chart type. Instead of creating a new chart for each insight, I opted for a more efficient approach by duplicating and modifying the existing chart, adjusting the data source to reflect the specific insight it represented. This streamlined the visualization process and facilitated the representation of multiple insights in a cohesive manner.

Creating a Pivot Chart

Next, I proceeded to generate a pivot table and line chart to visualize the monthly trend of accidents per year. This provided a clear representation of how accident frequencies varied over time.

In order to examine the impact of road type on accidents, I created an additional pivot table. It’s worth noting that this particular field contained blank rows, which were retained and taken into consideration during the analysis. To visualize this relationship, I opted for a bar chart that effectively conveyed the data. To ensure the chart accurately represented the actual occurrences, I added data labels. However, due to space constraints, the labels appeared crowded. To address this, I adjusted the number format to a custom setting, allowing for a more concise and visually appealing representation.

To explore the influence of road conditions on accident occurrences, I created another pivot table. The relevant field included five entries: wet, flood, snow, frost, and dry. To streamline the data, I utilized the pivot table analyze feature to generate calculated fields. Specifically, I combined “wet” and “flood” into a single category called “wet,” and merged “snow” and “frost” into a category named “snow.” This consolidation enhanced the clarity of the data. Subsequently, I utilized a tree map chart to visualize this insight, providing an intuitive and informative representation of the relationships between road conditions and accidents

Creation of calculated fields

Dashboard Creation

To create a comprehensive dashboard, I dedicated a new sheet and populated it with the previously generated charts. I ensured that the visualizations for my secondary KPIs incorporated indicative icons for enhanced understanding. In order to add interactivity and improve usability, I included a timeline feature that facilitated easy switching between views based on years, months, or quarters. Additionally, I incorporated a slicer to enable streamlined filtering based on locality, specifically distinguishing between urban and rural areas. To optimize functionality, I connected the slicer and timeline to all the sheets within the dashboard.

On the leftmost section of the dashboard, I included interactive tools that provided additional functionality. These tools allowed users to navigate to an analysis sheet, which presented detailed points of analysis. Throughout the dashboard creation process, I performed various formatting adjustments on the charts and dashboard layout to ensure they aligned with the desired color scheme and visual aesthetics.

If you have any insights or recommendations after reviewing the dashboard, I would greatly appreciate your input.

Please feel free to leave a like, follow, or comment. Moreover, let’s connect on Twitter(@Gosyn_) and LinkedIn (Odunlade Goshen) so that I can share my growth journey with you.

--

--