Let’s unravel the mysteries of ‘The Unsinkable’…

Exploratory Data Analysis — A Case Study on Titanic Data set (Part-2)

Preetam B A
Aug 13, 2020

Data is simply useless until you know what it’s trying to tell you.

With this quote, we’ll continue our quest to find the hidden secrets of the Titanic. ‘The Unsinkable’, as its designers and makers claimed it to be, proved that even the best of human engineering may sometimes fail when nature puts it to the test.

In the last article, we saw the different attributes of the data and had a quick glance at what the data looked like. If you haven’t read Part 1 of this blog, I recommend reading it before continuing (the link is at the end of this article). In this article, we’ll look at the relationship of each attribute to the survival of the passenger and continue our quest to find out whether you would’ve survived the sinking of the Titanic.

1. Correlation of Passenger Class with Survival

Since there are 3 passenger classes on the ship, let’s find the number of passengers in each class.

Output:

Now, let’s find out the total number of survivors from each class.

Output:

As you can see, Upper Class passengers fared better than the other two classes, with a survival percentage of around 62.96%.

The survival percentage of Middle Class passengers is around 47.28%, which is better than that of the Lower Class but worse than that of the Upper Class.

The Lower Class was hit the hardest, with a survival percentage of just 24.24%, which is significantly lower than that of the other two classes.

The results indicate that surviving the sinking of the Titanic was strongly associated with the class a passenger belonged to, which suggests discrimination based on class.
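The counts and percentages in this section can be computed with a couple of pandas calls. Here is a minimal sketch, assuming a DataFrame with the Kaggle-style columns `Pclass` and `Survived`; the rows are made-up toy data, so the printed numbers will not match the figures quoted above.

```python
import pandas as pd

# Toy stand-in for the Titanic data (made-up rows, NOT the real passengers).
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0, 0],
})

# Number of passengers in each class, ordered by class label.
class_counts = df["Pclass"].value_counts().sort_index()

# Survived is 0/1, so the sum counts survivors and the mean is the survival rate.
survivors = df.groupby("Pclass")["Survived"].sum()
survival_pct = (df.groupby("Pclass")["Survived"].mean() * 100).round(2)

print(class_counts)
print(survivors)
print(survival_pct)
```

Because `Survived` is coded as 0 and 1, taking the mean per group gives the survival rate directly, with no separate division step.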

2. Correlation of Gender with Survival

Let’s start by printing the number of passengers of each gender.

Output:

Now, let’s find out the survival percentage of the passengers belonging to each gender.

Output:

The information suggests that women were given the highest priority when lives were being saved. About 74.2% of the women survived, while only 18.89% of the men did. (How pure these gentlemen were!😢❤️)
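A cross-tabulation is one compact way to get both the counts and the percentages per gender. The sketch below assumes columns named `Sex` and `Survived` and uses made-up toy rows, so its numbers are illustrative only.

```python
import pandas as pd

# Toy stand-in for the Titanic data (made-up rows, NOT the real passengers).
df = pd.DataFrame({
    "Sex":      ["female", "female", "female", "female",
                 "male", "male", "male", "male", "male", "male"],
    "Survived": [1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
})

# Rows: gender; columns: 0 = did not survive, 1 = survived.
counts = pd.crosstab(df["Sex"], df["Survived"])

# normalize="index" divides each row by its total, giving per-gender shares.
pct = (pd.crosstab(df["Sex"], df["Survived"], normalize="index") * 100).round(2)

print(counts)
print(pct)
```

Normalizing by row matters here: it answers "what share of each gender survived", which is the comparison made above, rather than "what share of survivors were women".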

3. Correlation of Age with Survival

Now, let’s look at the effect of age on survival. But first, let’s take a quick look at some statistics of the age column, along with the number of values that are missing from the data set.

Output:

A total of 177 values are missing, i.e. the ages of 177 passengers are absent from the data set. These missing values may cause problems during prediction and hence need to be addressed.
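Summary statistics and the missing-value count can be obtained as in the following minimal sketch. It assumes a column named `Age` and uses a handful of made-up values, so it reports 2 missing ages rather than 177.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic data (made-up rows); np.nan marks a missing age.
df = pd.DataFrame({
    "Age":  [22.0, np.nan, 38.0, 4.0, np.nan, 54.0],
    "Fare": [7.25, 8.05, 71.28, 16.70, 7.90, 51.86],
})

# describe() skips NaN, so "count" is the number of non-missing ages.
age_stats = df["Age"].describe()

# isnull() flags missing entries; summing the booleans counts them.
missing_per_column = df.isnull().sum()
missing_age = int(df["Age"].isnull().sum())

print(age_stats)
print(missing_per_column)
```

Comparing `age_stats["count"]` with `len(df)` is a quick cross-check: the difference should equal `missing_age`.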

Now, let’s visualize the age data by plotting some histograms.

Setting kde=True draws a kernel density estimate over the histogram, and the rug is the row of small markings that show the exact points at which the data were recorded.

Output:

Now, let’s check survival in each group by plotting the following graph with a KDE. The y-axis denotes the probability density given by the kernel density estimate, so the area under the KDE curve between two values on the x-axis gives the estimated probability of an age falling in that range.

Output:
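To make the density reading concrete, here is a small NumPy sketch of a Gaussian kernel density estimate. The helper names `kde_gaussian` and `area`, the bandwidth of 8 years, and the toy ages are all assumptions for illustration; plotting libraries choose the bandwidth automatically. It shows that the whole curve encloses an area of about 1 and that the area over an age range is a probability.

```python
import numpy as np

# Toy ages (made-up values).
ages = np.array([2, 18, 22, 24, 26, 29, 30, 35, 38, 45, 54, 61], dtype=float)

def kde_gaussian(samples, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth` centred on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

def area(x, y):
    """Trapezoidal area under the curve y(x)."""
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

grid = np.linspace(-60, 120, 1801)  # step of 0.1, wide enough to cover the tails
density = kde_gaussian(ages, grid, bandwidth=8.0)

total_area = area(grid, density)  # close to 1: it is a probability density
in_20s = (grid >= 20) & (grid <= 30)
p_20_to_30 = area(grid[in_20s], density[in_20s])  # estimated P(20 <= age <= 30)

print(round(total_area, 4), round(p_20_to_30, 4))
```

The height of the curve at a single age is a density, not a probability, which is why the y-axis values on a KDE plot can look unexpectedly small.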

The following plot shows the distribution of gender in each age group.

Output:

Now, let’s compare survival in each of these groups using a KDE plot.

Output:

We can also understand what’s represented in these histograms as follows:

Output:
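One way to put numbers on the age plots is to bin the ages and tabulate survival per bin. The sketch below is an illustration only: the bin edges, the group labels and the toy rows are assumptions, not the grouping used in the original notebook.

```python
import pandas as pd

# Toy stand-in for the Titanic data (made-up rows); None is a missing age.
df = pd.DataFrame({
    "Age":      [4, 9, 17, 22, 25, 31, 38, 45, 52, 63, 70, None],
    "Survived": [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
})

# Hypothetical bin edges; (0, 12] means 0 < age <= 12.
bins = [0, 12, 18, 40, 60, 80]
labels = ["child", "teen", "adult", "middle-aged", "senior"]
df["AgeGroup"] = pd.cut(df["Age"], bins=bins, labels=labels)

# Rows with a missing age get AgeGroup = NaN and are left out of the groupby.
summary = df.groupby("AgeGroup", observed=False)["Survived"].agg(
    passengers="count", survivors="sum"
)
summary["survival_pct"] = (summary["survivors"] / summary["passengers"] * 100).round(2)

print(summary)
```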

4. Correlation of No. of Siblings/Spouses of the Passenger with Survival

Let’s start by understanding the distribution of values of this attribute.

Output:

Now, let’s plot a histogram showing the survival of passengers for each number of siblings/spouses.

Output:

The numbers behind the above histogram can be derived using the following code:

Output:
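A table of this kind can be built by counting passengers for every combination of sibling/spouse count and survival outcome. This is a minimal sketch assuming columns named `SibSp` and `Survived`, with made-up toy rows.

```python
import pandas as pd

# Toy stand-in for the Titanic data (made-up rows, NOT the real passengers).
df = pd.DataFrame({
    "SibSp":    [0, 0, 0, 0, 0, 1, 1, 1, 2, 4],
    "Survived": [0, 1, 0, 0, 1, 1, 1, 0, 0, 0],
})

# Count rows per (SibSp, Survived) pair, then spread Survived into columns.
# fill_value=0 covers combinations that never occur in the data.
table = (
    df.groupby(["SibSp", "Survived"]).size()
      .unstack(fill_value=0)
      .rename(columns={0: "died", 1: "survived"})
)
table["survival_pct"] = (
    table["survived"] / (table["died"] + table["survived"]) * 100
).round(2)

print(table)
```

Each row of `table` corresponds to one pair of bars in a survival-coloured count plot, so the two views can be checked against each other.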

5. Correlation of No. of Parents/Children with Survival

The distribution of the number of Parents/Children is as follows:

Output:

Here are two different plots showing the survival of passengers for each number of Parents/Children. The first one uses ‘distplot’ and the second one uses ‘countplot’.

Output:
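A count plot of this kind might be produced as in the sketch below. The column names `Parch` and `Survived` follow the Kaggle naming and are an assumption here, as are the made-up toy rows.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy stand-in for the Titanic data (made-up rows, NOT the real passengers).
df = pd.DataFrame({
    "Parch":    [0, 0, 0, 0, 1, 1, 1, 2, 2, 0],
    "Survived": [0, 0, 1, 0, 1, 1, 0, 1, 0, 0],
})

fig, ax = plt.subplots()
# One bar per (Parch, Survived) pair; the bar height is the passenger count.
sns.countplot(data=df, x="Parch", hue="Survived", ax=ax)
ax.set_ylabel("Passengers")
# In a notebook, call plt.show() here to display the figure.
```

Since the number of parents/children takes only a few whole-number values, a count plot with one bar per value is usually easier to read than a smoothed distribution.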

6. Correlation of Fare with Survival

Now, let’s try to understand whether there was any pattern in the fares and whether fare has any relation to survival. First, here is the distribution of the fare.

Output:

Let’s plot the distribution of the fare, split by survival.

Output:

Let’s check whether the passengers were charged uniformly or not. If not, let’s try to understand which factors decided the fare for the tickets.

To check whether ‘Gender’ was a factor in deciding the fare of the tickets, here’s the plot for each port of embarkation, followed by what can be inferred from it.

Output:

Output:

Thus, as per the data, the mean fare charged to women was significantly higher in Cherbourg and Southampton.
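The comparison of mean fares by port and gender can be sketched with a two-key groupby. The columns `Embarked`, `Sex` and `Fare` follow the Kaggle naming (an assumption), and the fares are made-up toy values chosen only to show the mechanics.

```python
import pandas as pd

# Toy stand-in for the Titanic data (made-up rows, NOT the real fares).
# Embarked codes: C = Cherbourg, Q = Queenstown, S = Southampton.
df = pd.DataFrame({
    "Embarked": ["C", "C", "C", "C", "Q", "Q", "S", "S", "S", "S"],
    "Sex":      ["female", "female", "male", "male", "female", "male",
                 "female", "female", "male", "male"],
    "Fare":     [80.0, 60.0, 40.0, 50.0, 8.0, 8.0, 30.0, 50.0, 20.0, 30.0],
})

# Mean fare per (port, gender); unstack() turns gender into columns.
mean_fare = df.groupby(["Embarked", "Sex"])["Fare"].mean().unstack()
mean_fare["female_minus_male"] = mean_fare["female"] - mean_fare["male"]

print(mean_fare)
```

A gap in mean fare between genders does not by itself show that gender set the price; it can also arise if women travelled in higher classes more often, which is worth checking before drawing conclusions.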

To check whether ‘Embarkation’, ‘Class’ and ‘Age’ were factors in deciding the fare of the tickets, here’s the plot for each port of embarkation and class, split by ‘Survival’ status, followed by what can be inferred from it.

Output:

Thus, it is evident from the data that tickets were priced mostly on the basis of Pclass and the point of Embarkation but not on the basis of Age.
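A numerical check of this claim could look like the following sketch: a pivot table of the median fare per class and port, plus the correlation between age and fare. The column names follow the Kaggle naming (an assumption) and the rows are made-up toy data, so the output only illustrates the method.

```python
import pandas as pd

# Toy stand-in for the Titanic data (made-up rows, NOT the real passengers).
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Embarked": ["C", "C", "S", "S", "C", "S", "Q", "S", "S", "C"],
    "Age":      [38.0, 24.0, 54.0, 19.0, 30.0, 42.0, 21.0, 35.0, 2.0, 27.0],
    "Fare":     [90.0, 80.0, 60.0, 50.0, 25.0, 15.0, 8.0, 8.0, 9.0, 7.0],
})

# Median fare for every (class, port) pair; NaN means no such passengers.
median_fare = df.pivot_table(
    values="Fare", index="Pclass", columns="Embarked", aggfunc="median"
)

# Pearson correlation between age and fare; a value near 0 suggests
# no linear relationship between the two.
age_fare_corr = df["Age"].corr(df["Fare"])

print(median_fare)
print(round(age_fare_corr, 3))
```

The median is used here instead of the mean because a few very expensive tickets can pull the mean upwards; that choice is ours, not the original notebook's.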

7. Correlation of Embarkation with Survival

So far, we have looked at descriptions of the numerical attributes. Here’s a look at the description of the categorical data.

Output:
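In pandas, a description of the categorical columns can be requested by excluding the numeric ones from `describe()`. The sketch below uses made-up toy rows and assumes the Kaggle-style columns `Sex` and `Embarked`.

```python
import pandas as pd

# Toy stand-in for the Titanic data (made-up rows); None is a missing port.
df = pd.DataFrame({
    "Sex":      ["male", "female", "female", "male", "male"],
    "Embarked": ["S", "C", "S", "S", None],
    "Fare":     [7.25, 71.28, 7.92, 8.05, 8.46],
})

# Excluding numeric columns leaves only the text columns. For those,
# describe() reports count (non-missing), unique, top (mode) and freq.
cat_summary = df.describe(exclude=["number"])

print(cat_summary)
```

A `count` lower than the number of rows, as for `Embarked` here, is a quick sign that the column has missing values.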

Here’s a plot showing the survival ratio of passengers from each port of Embarkation.

Output:

And now here’s the pair-plot of all the attributes that we have discussed so far.

Output:

As you might have noticed, we’ve ignored Passenger_Id, Name of the Passenger, Ticket and Cabin No., as they play little to no role in determining the survival of the passenger.
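Setting these attributes aside is a one-line operation in pandas. The sketch below assumes the Kaggle column names `PassengerId`, `Name`, `Ticket` and `Cabin`; the two rows are placeholders, not real passengers.

```python
import pandas as pd

# Toy stand-in for the Titanic data (placeholder rows, NOT real passengers).
df = pd.DataFrame({
    "PassengerId": [1, 2],
    "Survived":    [0, 1],
    "Pclass":      [3, 1],
    "Name":        ["Passenger A", "Passenger B"],
    "Ticket":      ["T-1", "T-2"],
    "Cabin":       [None, "X1"],
})

# drop() returns a new DataFrame; the original df keeps all of its columns.
ignored = ["PassengerId", "Name", "Ticket", "Cabin"]
features = df.drop(columns=ignored)

print(features.columns.tolist())
```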

Thus, we tried to understand the data by visualizing it using various techniques and uncovered various mysteries related to the Titanic. In the next article, we’ll look at the types of data and why some types of data need to be converted into a specific format before various Machine Learning models can be fit on them. Thank you for joining me throughout this journey of exploration. I hope you’ve enjoyed the experience of being a detective!🕵

Link to the Notebook: Click Here

Link to Part 1 of this Blog: Click Here
