Titanic Passenger Survival Rate Prediction

Abigail Chen
INST414: Data Science Techniques
4 min read · May 12, 2022

Introduction

The sinking of the Titanic is one of the most famous maritime disasters in history. At the time, the Titanic was the largest and most luxurious passenger ship in the world, with a reputation for being “unsinkable”. It was lost on its maiden voyage from Southampton, England to New York, U.S.A. On April 14, 1912, at about 23:40, the ship collided with an iceberg; its hull broke in two and it sank to the bottom of the Atlantic Ocean. Of the 2224 crew members and passengers, 1517 died, and only 333 bodies were recovered. It remains one of the deadliest peacetime maritime disasters.

According to accounts of the tragedy, many first-class passengers gave up their chance of survival to women and children. I would like to use the passenger data to check, in hindsight, whether the mortality rate is related to cabin class and to the passengers’ age. The data set is the Titanic passenger list, which contains the following columns:

  1. Survival — Survival (0 = No; 1 = Yes).
  2. Pclass — Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
  3. Name — Name
  4. Sex — Sex
  5. Age — Age
  6. Sibsp — Number of Siblings/Spouses Aboard
  7. Parch — Number of Parents/Children Aboard
  8. Ticket — Ticket Number
  9. Fare — Passenger Fare
  10. Cabin — Cabin

The data comes from a project hosted on OSF (OSFHome).

Process

First, while importing the data, I dropped the columns that could not be used as valid inputs, such as Name and Ticket Number.
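A minimal sketch of this cleaning step with pandas. The rows are made up for illustration and are not taken from the real passenger list, and the column names simply follow the list in the introduction, so the actual file may spell them differently. Encoding Sex as 0/1 and filling missing ages with the median are my assumptions here, not steps described in this post.

```python
import pandas as pd

# Toy rows invented for illustration only (not real passenger records).
df = pd.DataFrame({
    "Survival": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "Ticket": ["T-1", "T-2", "T-3", "T-4"],
    "Fare": [7.25, 71.28, 7.92, 13.0],
})

# Drop free-text identifier columns that cannot be used as model inputs.
clean = df.drop(columns=["Name", "Ticket"])

# Assumption: turn the remaining text column into numbers instead of dropping it.
clean["Sex"] = clean["Sex"].map({"male": 0, "female": 1})

# Assumption: fill missing ages with the median age of the known values.
clean["Age"] = clean["Age"].fillna(clean["Age"].median())

print(clean)
```

Mapping Sex to numbers keeps a variable that turns out to matter later, while Name and Ticket are dropped because almost every value is unique.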

Next, I used several seaborn charts to take a first look at each variable and pick the ones best suited for analysis. The bar chart shows clearly that the death rate of male passengers is much higher than that of female passengers. The analysis of Pclass against survival shows that the higher the class, the higher the survival rate. In contrast, the lower classes had many passengers but a low survival rate.
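The comparisons above come down to group averages of the survival column. Here is a small sketch of how they could be computed with pandas; the rows are again invented for illustration, so the printed numbers say nothing about the real passengers, and the column names are assumed from the list in the introduction.

```python
import pandas as pd

# Toy rows invented for illustration only (not real passenger records).
df = pd.DataFrame({
    "Survival": [0, 1, 1, 0, 1, 0, 0, 1],
    "Pclass": [3, 1, 3, 2, 1, 3, 2, 1],
    "Sex": ["male", "female", "female", "male",
            "female", "male", "male", "male"],
})

# The mean of a 0/1 column is the share of survivors in each group.
rate_by_sex = df.groupby("Sex")["Survival"].mean()
rate_by_class = df.groupby("Pclass")["Survival"].mean()

# Number of passengers per class, to compare group size with survival rate.
count_by_class = df["Pclass"].value_counts().sort_index()

print(rate_by_sex)
print(rate_by_class)
print(count_by_class)
```

seaborn's `barplot` with `x="Sex"` and `y="Survival"` draws the same group means as bars, since its default estimator is the mean.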

Bugs & Limitations

The problems I kept running into were the warning “STOP: TOTAL NO. of ITERATIONS REACHED LIMIT” and variables that could not be read.

In particular, the categorical variables often could not be read, so I had to filter out these error-prone non-numeric variables to keep the analysis moving. I also have to admit that I started this assignment too late to study the failures in depth, so I could not prepare the data needed for a more detailed and professional analysis. I am very sorry for that.
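That message is what the L-BFGS optimizer reports when it runs out of iterations. scikit-learn shows it inside a ConvergenceWarning, for example when `LogisticRegression` with its default solver hits `max_iter`. This post does not say which model produced it, so the sketch below assumes a scikit-learn logistic regression and shows two common remedies: scaling the features and raising `max_iter`. The data is synthetic and only mimics the value ranges of Pclass, Age and Fare.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in features (not real passenger data).
# Columns roughly mimic the ranges of Pclass, Age and Fare.
X = np.column_stack([
    rng.integers(1, 4, size=200),
    rng.uniform(1, 80, size=200),
    rng.uniform(5, 500, size=200),
])

# Synthetic label for illustration only: 1 for classes 1 and 2, 0 for class 3.
y = (X[:, 0] < 3).astype(int)

# Scaling puts all features on a similar range, which helps the solver converge;
# a larger max_iter gives it more room than the default of 100 iterations.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

predictions = model.predict(X)
print(predictions[:10])
```

Features on very different scales, such as Fare next to Pclass, are a typical reason the solver needs many iterations, which is why scaling is usually tried before simply raising the limit.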

Conclusion

The class result also reflects a reality of the time: the higher a passenger’s social class, the better the service they received, and in an emergency they also had a certain advantage in space on board. (Note that for Pclass, the larger the number, the lower the class.)

For the gender variable, the survival rate of women was much higher than that of men, which suggests that during the disaster most people upheld the tradition of “ladies first” and acted in a gentlemanly way.
