A quest to find a Jack and a Rose in the real Titanic

Raghunandh GS
Published in DataComics
May 11, 2017

I am pretty sure most of us have watched the epic movie Titanic, a fictional story set aboard the RMS Titanic, which struck a giant iceberg and sank. I recently came across a data set that lists the passengers who were on the Titanic, along with whether each of them survived. This data set is very popular among Kagglers, who use it to train machine learning models that predict whether a person survived, based on a few given inputs.

The idea of predicting whether a person survived did not sound very interesting to me. After all, a person who survived would make sure to tell the world about it. What interested me is that a fictional love story was set on this ship, and there is a chance that a story like it really happened.

This data set has information on 1308 passengers of the Titanic, with details such as their name, age, gender, ticket fare, passenger class, whether they had spouses/siblings or parents/children on board, and whether they survived the disaster. My very first step was to plot the age distribution of the passengers.
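For readers who want to try this step, here is a minimal sketch of an age distribution in plain Python. The ages below are made-up sample values, not rows from the real data set, and the 5-year bin width is my own assumption.

```python
from collections import Counter

# Hypothetical sample of passenger ages; None marks a missing age.
ages = [22, 38, 26, 35, None, 20, 17, 20, 54, 2, 27, 14, 20, 19, None, 21]


def age_histogram(ages, bin_width=5):
    """Count passengers per age bin, skipping missing ages."""
    counts = Counter()
    for age in ages:
        if age is None:
            continue
        # Map each age to the lower edge of its bin, e.g. 22 -> 20.
        bin_start = int(age // bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


hist = age_histogram(ages)

# Text version of the plot: one '#' per passenger in the bin.
for bin_start, count in hist.items():
    print(f"{bin_start:>3} | {'#' * count}")
```

With the real passenger list, the tallest bar is where you would look for the peak.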

The distribution had a peak around 20, the most likely age for men and women to fall in love, rise in life, swim in opportunities and drown in problems. My eyes got brighter when I realised there could be more than one Titanic-ish love story here. But our concern is not to meddle with every love story on board; our prime motive is only to find Jack and Rose. A quick Google search told me that, according to the movie, Jack was 20 years old and Rose was 17. From the passenger data set, I filtered out everyone except the men who were 20 years old and the women who were 17. So our interest shrank from a list of 1308 passengers to 25, made up of 8 females and 17 males. This tells us there could be at most 8 love stories, and the rest of the guys would be doomed to the friendzone.

Like any other classic romantic movie, this one portrays the love between a high-class woman and a poor man. According to the movie, Rose was a first-class passenger and Jack was a third-class passenger. So when I filtered for male passengers aged 20 in third class and female passengers aged 17 in first class, I got 17 males and 2 females. This number is still too high to single out one love story, so I had to resort to other filters. The next best information we have about a passenger's financial status is the ticket fare, so I plotted the distribution of ticket fares.
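The age, gender and class filters can be sketched as below. The rows are made up for illustration, and the field names (`name`, `sex`, `age`, `pclass`) and the `select` helper are my assumptions, not the exact columns or code of the data set I used.

```python
# Hypothetical rows; the field names are assumptions for this sketch.
passengers = [
    {"name": "A", "sex": "male", "age": 20, "pclass": 3},
    {"name": "B", "sex": "male", "age": 20, "pclass": 2},
    {"name": "C", "sex": "female", "age": 17, "pclass": 1},
    {"name": "D", "sex": "female", "age": 17, "pclass": 3},
    {"name": "E", "sex": "female", "age": 20, "pclass": 1},
    {"name": "F", "sex": "male", "age": 17, "pclass": 3},
    {"name": "G", "sex": "male", "age": None, "pclass": 3},
]


def select(rows, sex, age, pclass=None):
    """Keep rows matching sex and age, and passenger class if one is given."""
    return [
        r for r in rows
        if r["sex"] == sex
        and r["age"] == age
        and (pclass is None or r["pclass"] == pclass)
    ]


# Step 1: age and gender only (men aged 20, women aged 17).
jacks_by_age = select(passengers, "male", 20)
roses_by_age = select(passengers, "female", 17)

# Step 2: add the class condition from the movie.
jacks = select(passengers, "male", 20, pclass=3)
roses = select(passengers, "female", 17, pclass=1)

print([r["name"] for r in jacks], [r["name"] for r in roses])
```

Rows with a missing age simply drop out, because `None` never equals 20 or 17.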

According to the movie, this love sparks between two people from two extremely different financial backgrounds. The best way to account for this extreme difference is to look at passengers at the extreme ends of the fare distribution. Hence I straight away ignore the middle 80 percent, in the spirit of the Pareto rule, and keep only the remaining 20 percent: males below the 10th percentile and females above the 90th percentile of the ticket fare distribution. After removing females whose ticket fare is less than 78.268 and males whose ticket fare is more than 7.73, we are left with only 5 passengers.
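Here is a rough sketch of the percentile cut using Python's `statistics.quantiles`. The fares and candidates are made-up values, and I am assuming the percentiles are computed over all fares with linear interpolation (`method="inclusive"`), so the thresholds here will not match the 7.73 and 78.268 above.

```python
from statistics import quantiles

# Hypothetical ticket fares for the whole passenger list.
all_fares = [7.0, 7.5, 8.0, 10.0, 13.0, 15.0, 26.0, 30.0, 52.0, 80.0, 210.0]

# n=10 returns the 9 cut points between deciles:
# the first is the 10th percentile, the last is the 90th.
deciles = quantiles(all_fares, n=10, method="inclusive")
p10, p90 = deciles[0], deciles[-1]

# Hypothetical candidates left over from the age and class filters.
candidates = [
    {"name": "Jack-like 1", "sex": "male", "fare": 7.0},
    {"name": "Jack-like 2", "sex": "male", "fare": 8.0},
    {"name": "Rose-like 1", "sex": "female", "fare": 210.0},
    {"name": "Rose-like 2", "sex": "female", "fare": 52.0},
]

# Keep men at or below the 10th percentile, women at or above the 90th.
poor_men = [c for c in candidates if c["sex"] == "male" and c["fare"] <= p10]
rich_women = [c for c in candidates if c["sex"] == "female" and c["fare"] >= p90]

print(p10, p90)
print([c["name"] for c in poor_men], [c["name"] for c in rich_women])
```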

So on the real Titanic, there were 4 Jacks. Maybe a Jack of Clubs, a Jack of Spades, a Jack of Diamonds and finally a Jack of Hearts, the protagonist. One of them even managed to survive the disaster. There was only one Rose, who, unlike in the movie, was not engaged to someone but married. She was only 17 but married, a sad love story indeed.
