The severity of airplane accidents

Himanshu Rawat

Published in

Analytics Vidhya

4 min read · Nov 27, 2020

Predicting the severity of airplane accidents based on past accidents

Introduction

Flying has been the go-to mode of travel for years now; it is time-saving, affordable, and extremely convenient. According to the FAA, 2,781,971 passengers fly every day in the US, as of June 2019. Passengers reckon that flying is very safe, considering that strict inspections are conducted and security measures are taken to avoid or mitigate any mishap. However, there remains a small chance of unfortunate incidents.

Here, the focus is on analyzing a data set from a HackerEarth competition. It consists of parameters recorded during each incident, such as cabin temperature, turbulence experienced, and the number of safety complaints prior to the accident, which we use to predict the severity of future accidents.

Questions in focus:

Which factor affects accident severity the most?
Did the airplanes score well on safety?
When was the last inspection, and how did it affect the accident?

Finally, we will build a model to predict the severity of airplane accidents.

The accompanying GitHub repository lets you follow the code side by side with this blog.

Data distribution and important features

When working with different data sets, one common problem is class imbalance, which can lead to biased analysis. So, the first thing we will check is how well the data is distributed.

The data distribution looks good: the classes are reasonably balanced.
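A balance check like this can be sketched with pandas. This is a minimal illustration, not the code from the repository: the column name "Severity", the placeholder class names other than "Highly_Fatal_And_Damaging", and the tiny hand-made frame are all assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the competition data; the target column name
# "Severity" and the "Class_*" labels are assumptions for illustration.
df = pd.DataFrame({
    "Severity": [
        "Highly_Fatal_And_Damaging", "Highly_Fatal_And_Damaging",
        "Class_B", "Class_B",
        "Class_C", "Class_D",
    ]
})

# Share of rows per class; similar shares suggest a balanced data set.
class_share = df["Severity"].value_counts(normalize=True)
print(class_share)

# Ratio of the largest class to the smallest one (1.0 = perfectly balanced).
imbalance_ratio = class_share.max() / class_share.min()
print(f"imbalance ratio: {imbalance_ratio:.2f}")
```

A ratio close to 1 means no class dominates; a large ratio would be a reason to consider resampling or class weights before modeling.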

Next, we use an ensemble technique, ‘ExtraTreesClassifier’, to find which features are important. We can see that ‘Safety_score’ is the most important of all the features, followed by ‘Days_Since_Inspection’.
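Ranking features with ExtraTreesClassifier might look like the sketch below. The data here is synthetic and built so that the label depends only on 'Safety_score'; the threshold of 45, the value ranges, and the hyperparameters are assumptions, not values taken from the competition data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in features (value ranges are made up for illustration).
X = pd.DataFrame({
    "Safety_score": rng.uniform(0, 100, n),
    "Days_Since_Inspection": rng.integers(1, 24, n),
    "Cabin_Temperature": rng.normal(80, 3, n),
})
# Synthetic label that depends on 'Safety_score' only.
y = (X["Safety_score"] < 45).astype(int)

forest = ExtraTreesClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances, one value per column, summing to 1.
importances = pd.Series(forest.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Because the synthetic label is derived from 'Safety_score', that column ranks first here; on the real data the ranking comes from whatever signal the features actually carry.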

How safe were the airplanes before the accident?

The safety score is based on a comprehensive analysis of crash and pilot-related serious incident data, combined with other analyses. The highest possible score is 100. I used a boxplot to understand how safe an airplane was with respect to the severity of the accident.

Looking at the results, we can see that the safety scores of the airplanes lie somewhere in the middle, with the average score going down for more severe accidents. We can also see that in some cases the safety score was high, between 80 and 100; for those we have to check other factors.
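The numbers behind such a boxplot (the quartiles per severity class) can be computed directly with pandas. The values below are made up for illustration, and the column name "Severity" and the label "Other_Class" are assumptions.

```python
import pandas as pd

# Made-up scores for two classes; only "Highly_Fatal_And_Damaging" is a
# label that appears in the article, "Other_Class" is a placeholder.
df = pd.DataFrame({
    "Severity": ["Highly_Fatal_And_Damaging"] * 4 + ["Other_Class"] * 4,
    "Safety_score": [30.0, 35.0, 40.0, 85.0, 45.0, 50.0, 55.0, 60.0],
})

# Quartiles per class: these are the box edges and the median line.
box_stats = (
    df.groupby("Severity")["Safety_score"]
      .describe()[["25%", "50%", "75%"]]
)
print(box_stats)

# With matplotlib installed, the plot itself can be drawn with:
# df.boxplot(column="Safety_score", by="Severity")
```

Comparing the median column across classes is the tabular equivalent of comparing the middle lines of the boxes, and a single high value such as 85 shows up as an outlier point in the plot.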

When was the last inspection, and how did it affect the accident?

Inspection is an important time in the life cycle of an aircraft. It is often during the inspection process that important decisions about maintaining an aircraft and keeping it looking its best are made. So it is important to see when the last inspection was done before each accident.

The result shows a bell-shaped curve, where most of the inspections were done 11–16 days before the accident. We can also see that as the number of days increases, there is a noticeable increase in the count of “Highly_Fatal_And_Damaging” accidents compared to the other accident types.
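One way to tabulate this kind of relationship is to bin the days since inspection and count accidents per severity class in each bin. The rows, the bin edges, and the names "Severity" and "Other_Class" below are assumptions chosen for illustration, not the competition data.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Days_Since_Inspection": [5, 12, 13, 14, 15, 18, 20, 21],
    "Severity": [
        "Other_Class", "Other_Class", "Highly_Fatal_And_Damaging",
        "Other_Class", "Other_Class", "Highly_Fatal_And_Damaging",
        "Highly_Fatal_And_Damaging", "Highly_Fatal_And_Damaging",
    ],
})

# Bin the days into the intervals (0, 10], (10, 16], (16, 25].
day_bins = pd.cut(df["Days_Since_Inspection"], bins=[0, 10, 16, 25])

# Count accidents per bin and severity class.
counts = pd.crosstab(day_bins, df["Severity"])
print(counts)
```

Reading down a column of the table shows how the count for one severity class changes as the time since the last inspection grows.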

Model Building & Implementation

The model we will use here is Gradient Boosting. The intuition behind Gradient Boosting is that the best possible next model, when combined with the previous models, minimizes the overall prediction error. The key idea is to set the target outcomes for this next model so as to minimize that error.
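To make the intuition concrete, here is a hand-rolled sketch of boosting for a regression problem with squared error, where the "target outcomes" for each new model are simply the residuals of the current ensemble. This is a simplified illustration, not what GradientBoostingClassifier does internally for classification; the synthetic data, tree depth, learning rate, and number of rounds are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

learning_rate = 0.3
prediction = np.full_like(y, y.mean())      # stage 0: predict the mean
errors = [np.mean((y - prediction) ** 2)]   # training MSE after each stage

for _ in range(20):
    residuals = y - prediction              # targets for the next model
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)
    # Add a damped version of the new model to the ensemble.
    prediction += learning_rate * tree.predict(X)
    errors.append(np.mean((y - prediction) ** 2))

print(f"MSE before boosting: {errors[0]:.3f}, after: {errors[-1]:.3f}")
```

Each round fits a small tree to what the ensemble still gets wrong, so the training error shrinks step by step; the learning rate controls how much of each correction is applied.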

Having seen above how relevant each feature is, we can drop some of them. We keep the top 5 features and drop the rest, which are 'Cabin_Temperature', 'Max_Elevation', 'Total_Safety_Complaints', 'Accident_ID', 'Violations' and 'Turbulence_In_gforces'. Given the amount of data we have, dropping features also helps to avoid overfitting.

A few lines of code:
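Assuming the six columns named above are the ones being dropped (the list does not include 'Safety_score', which ranked highest), the feature selection could be sketched as follows. The tiny frame, its values, the target column name "Severity", and the label "Other_Class" are hypothetical.

```python
import pandas as pd

# Columns named in the text; assumed here to be the ones removed.
drop_cols = [
    "Cabin_Temperature", "Max_Elevation", "Total_Safety_Complaints",
    "Accident_ID", "Violations", "Turbulence_In_gforces",
]

# Tiny hypothetical frame with made-up values.
df = pd.DataFrame({col: [0, 1] for col in drop_cols})
df["Safety_score"] = [49.2, 62.5]
df["Days_Since_Inspection"] = [14, 10]
df["Severity"] = ["Highly_Fatal_And_Damaging", "Other_Class"]

# Features: everything except the dropped columns and the target.
X = df.drop(columns=drop_cols + ["Severity"])
Y = df["Severity"]
print(list(X.columns))
```

Dropping an identifier such as 'Accident_ID' is usually sensible in any case, since a row ID carries no information that generalizes to new accidents.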

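from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# X (selected features) and Y (severity labels) come from the prepared data set
model = GradientBoostingClassifier()

# Hyperparameter values to try in the grid search
pm_grid = {'learning_rate': [0.2, 0.3, 0.5], 'max_depth': [5, 6, 7], 'random_state': [10]}

grid_model = GridSearchCV(estimator=model, param_grid=pm_grid, cv=5, verbose=10, n_jobs=-1)

grid_model.fit(X, Y)

# Weighted F1 from 5-fold cross-validation (the grid search is re-run inside each fold)
f1_scores = cross_val_score(estimator=grid_model, X=X, y=Y, cv=5, scoring='f1_weighted')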

We got an average weighted F1 score of 0.97, which is quite high, possibly because we have a small data set.

Conclusion

In this article, we analyzed data on the severity of past airplane accidents and built a prediction model.

We found that most of the airplanes had a low safety score (around 40–50 out of 100).
We saw that the chance and severity of an accident increase as the number of days since the last inspection increases.
We built a model that reaches an average weighted F1 score of about 0.97 in cross-validation.