Envision the Titanic Climax with Matplotlib, NumPy, and Pandas

Simple Data analysis with basic Python libraries used for Data Science

Diva Coders
Bloggers Bay
10 min read · Aug 22, 2020


Titanic — Movie

Did you know that a novel predicted the sinking of the Titanic 14 years before the actual disaster?

In 1898 (14 years before the Titanic sank), American author Morgan Robertson wrote a novel titled ‘The Wreck of the Titan.’

The book was about a fictional ocean liner that sinks after colliding with an iceberg. In the book, the ship is described as “unsinkable” and does not carry enough lifeboats for everyone on board. Sounds familiar? You’re right: it is the epic story of the Titanic, foretold years before it happened.

The Wreck of Titan

We cannot say whether the author had any technical basis for his prediction, but as responsible data science enthusiasts we can use the data set to estimate how likely the different outcomes of the disaster were, and even try to envision alternative versions of the climax.

I am sure that all of us know what happened to Rose and Jack in the movie Titanic. We all wished that the story had a different ending, didn’t we? Let’s try to make our wish come true by recreating the climax of the story through a simple analysis.

By the end of the analysis we will have created three climaxes and answered three questions:

• Was it possible for Jack to survive, and how likely was Rose’s survival?

• Was there a chance for Jack and Rose to narrate their adventurous story to their grandchildren together?

• Did Cal Hockley (Rose’s fiancé) have a higher chance of survival because he belonged to the upper class, and what would it have taken for the villain to die?

We carry out our analysis using the ‘Matplotlib’, ‘NumPy’, ‘Pandas’, and ‘Seaborn’ libraries.

Let us see what each library does:

Matplotlib is a Python library for visualizing data sets with a wide variety of plots, such as bar plots, line plots, and histograms.

NumPy is also a Python library. It provides a high-performance multidimensional array and basic tools to compute with and manipulate these arrays.

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.

Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

Let us start our journey…

Data Exploration:

We import all the necessary libraries and use the ‘pandas’ library to read the data set, which holds the passenger statistics of the Titanic disaster.

Then we display the first five entries using the head command to get a glimpse of the nature of the data and the labels we are going to explore.

Titanic.csv First 5 entries
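The loading step described above can be sketched as follows. This is only an illustration: the rows are made up, the column names are assumed to follow the usual Titanic data set layout, and the file name in the comment is an assumption rather than the exact name used in the original notebook.

```python
import io

import pandas as pd

# Illustrative stand-in for the real file: a few made-up rows, NOT real
# passenger records. With the actual data you would call something like
# pd.read_csv("titanic.csv") instead (file name assumed).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Fare
1,0,3,male,21,1,0,7.50
2,1,1,female,40,1,0,70.00
3,1,3,female,27,0,0,8.00
4,1,1,female,33,1,0,55.00
5,0,3,male,36,0,0,8.10
6,0,3,male,,0,0,8.40
"""

titanic_df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# head() returns the first five rows by default.
first_rows = titanic_df.head()
print(first_rows)
```

The empty Age field in the last row is read as NaN, which is the kind of missing value discussed in the feature analysis.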

Feature Analysis:

Looking at titanic_df.describe(), we gain a lot of useful insights and find the labels we can ignore.

Titanic.csv describe command to see the column characteristics

• PassengerId: Unique for each passenger, so it has no relation to the survival label and need not be considered in the analysis.

• Survived: A binary label, 0 if the passenger died and 1 if the passenger survived, so this will be the only ‘Y’ variable in our XY plots.

• Pclass: Integer equal to 1, 2, or 3 indicating the class of each passenger (upper, middle, or lower, respectively). We include it in the analysis because its three categories may contribute to the survival of passengers.

• Age: Number representing the age of each passenger. As we can see in titanic_df.tail(), some passengers have NaN for their age. We still consider it, since younger passengers may have acted more swiftly and escaped, so age may also contribute to the survival label.

• SibSp: Number of siblings and spouses also on board. We do not ignore this completely, as it may or may not influence the survival label.

• Parch: Number of parents and children also on board. This is a similar case to SibSp.

• Fare: Amount paid for the ticket by each passenger. This may add detail to the Pclass label, since a higher fare generally means a higher class of ticket.
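A minimal sketch of this feature check is shown below, using a handful of made-up rows in place of the real data. The column names are assumed to match the list above, and the variable names summary, missing_age, and features are ours.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only (not the real Titanic data).
titanic_df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 2, 3, 1, 3],
    "Age": [22.0, 38.0, np.nan, 30.0, 4.0, np.nan],
    "SibSp": [1, 1, 0, 0, 1, 0],
    "Parch": [0, 0, 0, 0, 2, 0],
    "Fare": [7.5, 70.0, 13.0, 8.0, 80.0, 7.5],
})

# Count, mean, std, min, quartiles, and max for every numeric column.
summary = titanic_df.describe()
print(summary)

# Number of passengers with no recorded age.
missing_age = titanic_df["Age"].isna().sum()
print("Missing ages:", missing_age)

# PassengerId is only a row label, so we leave it out of the analysis.
features = titanic_df.drop(columns=["PassengerId"])
```

Note that describe() skips NaN values, so the count reported for Age is lower than for the other columns.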

For a quick comparison, we’ll use NumPy functions to verify the mean, standard deviation, minimum, and maximum of the numerical columns.

Mean, Standard Deviation, Minimum and Maximum values for each Label
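One way to do this cross-check is sketched below. The helper numpy_summary is a hypothetical name of ours, the data is made up, and we assume the NaN-aware NumPy functions are wanted because Age has missing entries.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only (not the real Titanic data).
titanic_df = pd.DataFrame({
    "Age": [20.0, 40.0, np.nan, 30.0],
    "Fare": [10.0, 50.0, 20.0, 40.0],
})


def numpy_summary(series):
    """Mean, std, min, and max of a column, ignoring NaN entries."""
    values = series.to_numpy(dtype=float)
    return {
        "mean": np.nanmean(values),
        # ddof=1 gives the sample standard deviation, as pandas reports.
        "std": np.nanstd(values, ddof=1),
        "min": np.nanmin(values),
        "max": np.nanmax(values),
    }


for column in ["Age", "Fare"]:
    print(column, numpy_summary(titanic_df[column]))
```

The plain np.mean and np.std would return NaN for a column with missing values, which is why the sketch uses the nan-prefixed variants.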

Insights from these are:

• Survived is a categorical label with 0 or 1 values.

• Around 38% of the passengers in the sample survived, compared with an actual survival rate of about 32%.

• Most passengers (> 75%) did not travel with parents or children.

• Nearly 30% of the passengers had siblings and/or spouse aboard.

• Fares varied significantly, with a few passengers (<1%) paying as much as $512.

• There were few elderly passengers (<1%) in the age range 65–80.

Great numbers! Let us move on to realize our dream climaxes…

Climax 1: Jack Lived Rose Died!!!

If Jack Survived…

This is the reverse case, where Jack narrates his love story to his grandchildren and Rose sadly dies. Is there a possibility of this? Let us examine it, keeping in mind that Jack is male and belongs to the lower class, while Rose is female and belongs to the upper class.

We first need to break the analysis into several parts. First, we will look at the impact sex had on survival by pivoting the data frame.

titanic.csv description
Male Female survival calculation

This table shows us the percentage of females that survived and the percentage of males that survived. The female survival rate was 74.2%, and the male survival rate was 18.9%. The huge gap between these numbers is an immediate indication that the female survival rate on the Titanic was significantly higher than the male survival rate, and that being a woman did in fact increase Rose’s chances of survival.
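The pivot behind such a table can be sketched as below. The rows are made up, so the rates differ from the article’s figures; with the 74.2% and 18.9% reported above, the same subtraction gives a gap of 55.3 percentage points.

```python
import pandas as pd

# Made-up rows for illustration only (not the real Titanic data).
titanic_df = pd.DataFrame({
    "Sex": ["female", "female", "female", "female",
            "male", "male", "male", "male", "male"],
    "Survived": [1, 1, 1, 0, 0, 0, 0, 0, 1],
})

# Survived is 0 or 1, so its mean per group is that group's survival rate.
pivot = titanic_df.pivot_table(index="Sex", values="Survived", aggfunc="mean")
survival_by_sex = pivot["Survived"]
print(survival_by_sex)

# Gap between the two rates, expressed in percentage points.
gap_points = (survival_by_sex["female"] - survival_by_sex["male"]) * 100
print(f"Gap: {gap_points:.1f} percentage points")
```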

Age Range for survivors
Age Gender Comparisons

This breakdown gives us an interesting and informative view of the survival rate for women and children. If we look at the data for children aged 5 and under, we can see that sex did not have much of an impact on survival: by and large, most of the children in this sample survived, including the boys. Another insight is just how many of the males on board died (aside from male children aged 5 and under). Men were more likely to have died than to have survived. When we make the same comparison for females, we can see that females in almost every age range were more likely to have survived than to have died.

To validate the inferences made here, we can look at the numbers in a table once again, though a table becomes harder to read as more variables are added. Viewing this type of table emphasizes how helpful histograms can be for visualizing data.

Gender Age Range and Survival Comparisons
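A table of this kind could be built roughly as follows. The bin edges and labels are our own assumptions, chosen only to separate children aged 5 and under from the rest, and the rows are made up.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only (not the real Titanic data).
titanic_df = pd.DataFrame({
    "Sex": ["male", "female", "male", "female",
            "male", "male", "female", "male"],
    "Age": [3.0, 4.0, 25.0, 30.0, 45.0, np.nan, 16.0, 35.0],
    "Survived": [1, 1, 0, 1, 0, 0, 1, 0],
})

# Passengers without a recorded age cannot be placed in an age range.
known_age = titanic_df.dropna(subset=["Age"]).copy()

# Assumed age ranges; each bin includes its upper edge.
bins = [0, 5, 18, 40, 80]
labels = ["0-5", "6-18", "19-40", "41-80"]
known_age["AgeGroup"] = pd.cut(
    known_age["Age"], bins=bins, labels=labels, include_lowest=True
)

# Survival rate and head count for every (age range, sex) pair that occurs.
table = (
    known_age.groupby(["AgeGroup", "Sex"], observed=True)["Survived"]
    .agg(["mean", "count"])
)
print(table)
```

Keeping the count next to the mean helps show when a rate rests on only one or two passengers.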

Based on these observations and numbers, we can conclude that both women and children had a higher chance of survival, so our Climax 1 has a lower probability of being realized. We also cannot change factors like gender and age to suit our wish. So there is a lower possibility of Rose dying and Jack surviving. Poor Jack!!!

Poor Jack…

Climax 2: Jack and Rose Escaped and lived happily!!!

Imagine that, just as Rose is about to finish the story of her adventurous trip, Jack joins her and ends it with, “that’s how your grandma and grandpa fell in love with each other and ended up together.”

That’s the most heart-warming climax one could ever want. So in what ways could it be realized? Let us analyze the passenger class, as it is one of the important labels in the Titanic data set.

Were upper-class passengers more likely to have made it onto a lifeboat than middle- and lower-class passengers? Let us make it interesting by examining this with bar and point plots.

Point Plot

This plot shows the average survival rate and confidence interval of passengers by class. The breakdown shows a correlation between class and rate of survival: lower-class survival ranged somewhere between 20–30%, middle-class survival somewhere between 35–55%, and upper-class survival somewhere between 55–70%.
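The numbers behind such a plot can be approximated with plain pandas and Matplotlib, as sketched below. This is not the seaborn call used for the figure: the interval here is a simple normal approximation (1.96 times the standard error), so it will not match a bootstrap-based interval exactly, and the rows are made up.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only (not the real Titanic data).
titanic_df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0],
})

# Mean survival and standard error of the mean for each class.
by_class = titanic_df.groupby("Pclass")["Survived"].agg(["mean", "sem"])
by_class["ci95"] = 1.96 * by_class["sem"]  # normal-approximation half-width
print(by_class)

# One point per class with a vertical bar for the approximate interval.
fig, ax = plt.subplots()
ax.errorbar(
    by_class.index.to_numpy(),
    by_class["mean"].to_numpy(),
    yerr=by_class["ci95"].to_numpy(),
    marker="o",
    capsize=4,
)
ax.set_xticks([1, 2, 3])
ax.set_xlabel("Pclass (1 = upper, 3 = lower)")
ax.set_ylabel("Survival rate")
```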

So, to make our Climax 2 come true, instead of joining Jack for the party in his lower-class quarters, Rose could have taken Jack along with her to the upper class and enjoyed the day there, which might have kept them both alive for a happy ending.

Jack and Rose

Climax 3: Cal Hockley, want him dead or alive???

Cal Hockley, the villain who took advantage of Rose’s situation and tricked her into accepting his marriage proposal, also seems to have escaped the sinking. What could be the reason?

Using seaborn plots, let us explore in detail the passenger class and gender labels, as they contribute most significantly and are the parameters that differ between Jack and Cal.

Bar Plot

We can see that middle-class female passengers had almost the same rate of survival as upper-class females, but middle-class men had about the same rate of survival as lower-class men, which further illustrates the greater likelihood of survival for women. Overall, we can observe that upper-class passengers did indeed have a higher chance of survival than lower-class passengers, regardless of sex.
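The rates that a class-by-sex bar plot displays can be tabulated with a pivot, as in this sketch. The rows are made up and deliberately tiny, so the exact rates are not those of the real data set.

```python
import pandas as pd

# Made-up rows for illustration only (not the real Titanic data).
titanic_df = pd.DataFrame({
    "Pclass": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Sex": ["female", "female", "male", "male"] * 3,
    "Survived": [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0],
})

# One row per class, one column per sex, each cell a survival rate.
rates = titanic_df.pivot_table(
    index="Pclass", columns="Sex", values="Survived", aggfunc="mean"
)
print(rates)
```

If seaborn is installed, a call such as sns.barplot(data=titanic_df, x="Pclass", y="Survived", hue="Sex") draws the same rates as grouped bars.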

Correlation Matrix

A negative correlation tells us that as the class number increases (1 → 2 → 3), survival decreases. Since the lower class is represented as 3, the lower class is correlated with lower survival.
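A correlation matrix like this can be computed directly with pandas, as sketched below on made-up rows. Only the sign of the Pclass and Survived entry is meant to mirror the discussion above, not its size.

```python
import pandas as pd

# Made-up rows for illustration only (not the real Titanic data).
titanic_df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 1, 0, 0, 1, 0],
    "Fare":     [80.0, 70.0, 60.0, 30.0, 25.0, 20.0, 8.0, 7.0, 9.0, 8.0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = titanic_df.corr()
print(corr)

# Negative: a larger class number (a lower class) goes with lower survival.
class_vs_survival = corr.loc["Pclass", "Survived"]
print("Pclass vs Survived:", round(class_vs_survival, 3))
```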

Unfortunately, Cal Hockley had a higher chance of survival!

Cal Hockley

Apart from the assumptions behind our climaxes, the analysis has certain limitations:

  • Some of these inferences were drawn from correlations, and it is always important to remember that correlation does not imply causation.
  • Some passengers did not have a recorded age, so entries with ‘NaN’ (null) were not taken into account when running the age-related numbers.
  • Conclusions were drawn from descriptive statistics and charts; we opted not to run t-tests on the sample.

Interesting Findings:

What proportion of passengers in the sample survived?

  • 38% of the passengers in the sample survived.

Did women and children have a higher survival rate?

  • The female survival rate in this sample was 55.3 percentage points higher than the survival rate for males.
  • Women had a much higher rate of survival than men.
  • Children aged 5 and under, regardless of sex, had a much higher rate of survival.

Did upper-class passengers in the sample have an advantage that translated into a higher survival rate than lower-class passengers?

  • Class is correlated with survival: upper-class passengers had a much higher survival rate than lower-class passengers, regardless of sex.
  • Upper-class passengers were more likely to survive than lower-class passengers.

So, as we approach the climax of our post, let’s quickly summarize: we gained some insights into the Matplotlib, NumPy, pandas, and seaborn libraries, which are essential for data science.

Also, instead of mourning the loss of Jack and the separation of true love, we explored the possibilities for changing the climax. That is exactly the duty of a data scientist: to analyze the data and come up with useful possibilities for attaining desired outcomes.

Now it’s your turn, pals, to create your own customized climaxes and conclusions with this kind of simple analysis of a data set, and to come up with creative and innovative endings for your favorite historical epics. Kudos to all learners!

The End (Happy Ending)!

You can find the code and dataset at:

https://github.com/PradeepaK1/Envision-the-Titatnic-Climax-with-Matplotlib-Numpy-Pandas

Contributors:

Anjana M P — https://anjana21it.wixsite.com/mysite

Pradeepa K — https://ptljkpd.wixsite.com/pradeepa
