Titanic Survival Analysis

Abdulwasiu Tajudeen
7 min readMar 29, 2023

--

https://wallpapercave.com/titanic-ship-wallpaper

Introduction

The RMS Titanic was a British passenger liner that sank in the North Atlantic Ocean in the early morning hours of 15 April 1912, after it collided with an iceberg during its maiden voyage from Southampton to New York City. There were an estimated 1308passengers and crew aboard the ship, and more than 500 died, making it one of the deadliest commercial peacetime maritime disasters in modern history. The RMS Titanic was the largest ship afloat at the time it entered service and was the second of three Olympic-class ocean liners operated by the White Star Line. The Titanic was built by the Harland and Wolff shipyard in Belfast. Thomas Andrews, her architect, died in the disaster.

The tool used for this analysis was Microsoft Excel.

Objective

The aim of this project is to determine what factors contributed to the chances of survival, and what can we learn from the historical event. Through our investigation, we hope to gain a deeper understanding of the people who perished on the Titanic and reflect on the lessons we can learn from historical data to inform our understanding of the present and future.

The Dataset

This dataset for this study was obtained from Kaggle and can be downloaded here or can also be downloaded here. The data contains the following:

  • Survival (0=No, 1=Yes)
  • Passenger Class (Pclass) [1=1st Class, 2=2nd Class, 3=3rd Class]
  • Sex = Sex of Passenger
  • Age = Age of passenger in years
  • SibSp = Number of siblings and spouses onboard the titanic
  • Parch = Number of parents and children aboard the titanic
  • ticket = ticket number
  • Cabin = Cabin number
  • Embarked = Point of embarkation [C = Cherbourg, Q = Queenstown, S = Southampton]

Data Cleaning

The dataset was imported into Microsoft Excel for cleaning and analysis…

Data imported into Microsoft Excel

Pclass: I started the cleaning with the first column in the dataset which was the Pclass. The records in the column were represented with 1, 2, and 3. In order to make it more understandable I used a conditional function to replace the numerical values with First class, Second class and Third class.

Used the IF Function to replace numerical values.
After applying the IF Function

The initial column which was the Pclass was removed and replaced with the new column created. So, in order to remove the initial column, I copied the records in the new column and paste them into the initial column using the Paste value method so that I can get rid of the new column created. I actually did this so that I won’t loose the records in the new column created. So you go to your Home Ribbon > Select Paste > Paste Special > Then Select Values.

The values were copied and the new column was deleted.

Survived: Same process was used to clean the Survived Column

Name: The name column wasn’t arranged, and it can affect our analysis, so I had to re-arrange the names.

Before

The Name Column was split by Delimiter in order to separate the brackets from each name. And this was done using the Split-to-Column from the Data ribbon. Then you select delimeter and press next.

Then I input the delimiter which was bracket “(“ and also clicked next.

After separating with delimiter

I noticed that the name column is still not arranged well so I used the Text-to-Column method again to split the name. But, instead of using bracket as my delimiter, I used the comma (,) since the new name column was separated with comma that’s the names were written as Baxter, Mrs. James, etc so that was why I used comma as delimiter.

The names were separated sucessfully, but we need to join First name with the Last name. In order to do this, I used the Concatenate Function in Excel.

After cleaning

Sex: The sex column was accurate but just changed the records to Proper Case since they were all in small letters.

Age: I noticed there were some ‘blanks’ in the age column so I replaced them all with the average age of all the passengers.

I used the Data Analysis add-in in Microsoft Excel to get the average age of all the passengers. Then input the average age in all the blank cells.

After selecting the Descriptive Analysis, I entered the input range of the column which was the Age column and also the Output range which was were you want the result to be displayed. And select Summary Statistics

Descriptive Statistics

The above image was the output after applying Descriptive statistics to get the average age of the passengers.

The mean was used to represent the average age, it was used to replace all blank cells. The blank cells were automatically filled by using the Go-to- Special option in the Find and Select ribbon. Then you select Blanks.

Filled the blank cells with the mean.

I used the IF Function in the age column in order to have good descriptions for each record to have more insights when doing our analysis.

SibSp and Parch: These columns were added together in order to get the Family Size, which will help us during our analysis. The columns were also renamed for better description.

I also applied the IF Function to the Family Size column.

Ticket: I only changed the data-type from number to text.

Fare: The records were accurate but noticed some records were zero. But, I had to leave cause after making some assumptions, I concluded that it might be the Ship attendants that didn’t pay since they will be there to guide the passengers and won’t required to pay any fee….. Just an assumption. This requires domain knowledge.

Embarked: I replaced S/C/Q in the embarked column with their corresponding names Southampton/Cherbourg/Queenstown. For clarity.

After Cleaning

Data Analysis and Visualization

I used the Pivot Table for my analysis and inserted the Pivot Charts in order to visualize the data. I didn’t list the process of inserting pivot tables and pivot charts because I’ve done that in my previous articles. So, you can always check it to have the idea of how to use Pivot Table and charts.

Dashboard

Dashboard

That’s all the insights I got from the data that I represented on the Dashboard but that’s not all. You can explore the data to find out more for yourself.

And please, if you do, share it with me….

This is the link to the Dashboard:

4. Titanic Analysis.xlsx

Find me on Twitter and LinkedIn.

Thanks for reading, I hope you practice with it.

Peace :-)

--

--

Abdulwasiu Tajudeen

I'm a highly skilled data analyst with experience in Data processing and visualization tools. I have experience in collecting, processing, and analyzing data.