Data Analysis of the Titanic Dataset from Kaggle

Cynthia Allan-Gyimah
2 min read, Jun 28, 2024

Introduction

The Titanic dataset from Kaggle contains information on the passengers of the Titanic, and the associated Kaggle task is to predict survival from the passengers' features. The dataset has the following columns: PassengerId, Survived, Pclass (ticket class), Name, Sex, Age, SibSp (number of siblings or spouses aboard), Parch (number of parents or children aboard), Ticket, Fare, Cabin, and Embarked (port of embarkation). The purpose of this analysis is to derive initial insights that can guide further data analysis and predictive modeling.

https://hng.tech/internship and https://hng.tech/hire
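Before any statistics can be computed, the columns need to be read into typed values. Below is a minimal sketch of a loader that uses only the standard library. It assumes the file is a CSV with exactly these headers; Kaggle ships the training split as `train.csv`. The helper names `parse_row` and `load_passengers` and the two sample rows are hypothetical and were made up for illustration, not taken from the dataset. Empty cells are kept as `None` so that missing values stay visible instead of silently turning into zeros.

```python
import csv
import io


def parse_row(raw):
    """Convert one raw CSV row (all strings) into typed values.

    Empty cells become None so missing Age, Fare, Cabin and Embarked stay visible.
    """
    def to_number(value, cast):
        return cast(value) if value != "" else None

    return {
        "PassengerId": int(raw["PassengerId"]),
        "Survived": int(raw["Survived"]),
        "Pclass": int(raw["Pclass"]),
        "Name": raw["Name"],
        "Sex": raw["Sex"],
        "Age": to_number(raw["Age"], float),
        "SibSp": int(raw["SibSp"]),
        "Parch": int(raw["Parch"]),
        "Ticket": raw["Ticket"],
        "Fare": to_number(raw["Fare"], float),
        "Cabin": raw["Cabin"] or None,
        "Embarked": raw["Embarked"] or None,
    }


def load_passengers(file_obj):
    """Read the Titanic CSV from an open text file and return one dict per passenger."""
    return [parse_row(raw) for raw in csv.DictReader(file_obj)]


# Hypothetical two-row excerpt in the Kaggle column layout (made-up passengers).
SAMPLE = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,A/5 00000,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,,0,0,PC 00000,71.28,C85,C
"""

if __name__ == "__main__":
    rows = load_passengers(io.StringIO(SAMPLE))
    print(rows[1]["Age"], rows[0]["Cabin"])  # None None
```

With the real file, `with open("train.csv", newline="") as f: rows = load_passengers(f)` would produce one dict per passenger.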

Observations

Summary Statistics

  • Total Passengers: 891
  • Survival Rate:
      • Survived: 38.4%
      • Did not survive: 61.6%
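The survival rate is the mean of the 0/1 `Survived` column, expressed as a percentage. Below is a minimal sketch with a hypothetical helper named `survival_rate`. If you feed it 342 ones and 549 zeros (342 is the survivor count that reproduces 38.4% of 891), it returns the two percentages listed above.

```python
def survival_rate(survived_flags):
    """Return (survived %, did-not-survive %) from a list of 0/1 Survived values."""
    total = len(survived_flags)
    if total == 0:
        raise ValueError("no passengers")
    pct_survived = 100 * sum(survived_flags) / total
    # Round to one decimal place, as in the summary above.
    return round(pct_survived, 1), round(100 - pct_survived, 1)


if __name__ == "__main__":
    print(survival_rate([1] * 342 + [0] * 549))  # (38.4, 61.6)
```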

Class Distribution:

  • 1st Class: 24.2%
  • 2nd Class: 20.7%
  • 3rd Class: 55.1%

Gender Distribution:

  • Male: 64.8%
  • Female: 35.2%
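The class and gender distributions are the same computation: count each distinct value, then divide by the total. Below is a sketch that uses `collections.Counter`. The function name `distribution` is hypothetical. The counts in the usage lines (216, 184 and 491 passengers per class; 577 male and 314 female) are the whole-number counts that reproduce the percentages above.

```python
from collections import Counter


def distribution(values):
    """Percentage share of each distinct value, rounded to one decimal place."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in sorted(counts.items())}


if __name__ == "__main__":
    print(distribution([1] * 216 + [2] * 184 + [3] * 491))  # {1: 24.2, 2: 20.7, 3: 55.1}
    print(distribution(["male"] * 577 + ["female"] * 314))  # {'female': 35.2, 'male': 64.8}
```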

Initial Insights

Survival by Gender:
  • Males: 18.9% survived
  • Females: 74.2% survived

Survival by Class:

  • 1st Class: 63.0% survived
  • 2nd Class: 47.3% survived
  • 3rd Class: 24.2% survived
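Survival by gender and survival by class both follow the same pattern: group the passengers, then take the mean of `Survived` within each group. Below is a standard-library sketch. It assumes each passenger is a dict keyed by the Kaggle column names, and `survival_by_group` is a hypothetical name. The usage rows are synthetic. They are built from 233 surviving and 81 non-surviving females and 109 surviving and 468 non-surviving males, which are counts that reproduce the gender figures above.

```python
from collections import defaultdict


def survival_by_group(rows, key):
    """Survival rate (%) for each value of `key`, e.g. "Sex" or "Pclass"."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += 1
        survivors[group] += row["Survived"]  # Survived is 0 or 1
    return {g: round(100 * survivors[g] / totals[g], 1) for g in sorted(totals)}


if __name__ == "__main__":
    # Synthetic rows built from counts that reproduce the gender figures above.
    rows = (
        [{"Sex": "female", "Survived": 1}] * 233
        + [{"Sex": "female", "Survived": 0}] * 81
        + [{"Sex": "male", "Survived": 1}] * 109
        + [{"Sex": "male", "Survived": 0}] * 468
    )
    print(survival_by_group(rows, "Sex"))  # {'female': 74.2, 'male': 18.9}
```

Passing `"Pclass"` instead of `"Sex"` gives the per-class rates from the same rows.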

Age Distribution:

  • Average Age: 29.7 years (calculated over the 714 passengers with a recorded age)
  • Age Range: 0.42 to 80 years

Fare Distribution:

  • Average Fare: $32.20
  • Fare Range: $0 to $512.33
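Because Age has empty cells, its average and range are only meaningful over the passengers whose age is recorded. Below is a sketch of a summary function that skips missing values (represented as `None`) and uses `statistics.mean`. The name `numeric_summary` is hypothetical and the four-value input is made up. The same function works unchanged for Fare.

```python
from statistics import mean


def numeric_summary(values):
    """Mean, min and max of the non-missing values; None marks a missing cell.

    Raises an error if every value is missing, since there is nothing to summarize.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": round(mean(present), 2),
        "min": min(present),
        "max": max(present),
    }


if __name__ == "__main__":
    print(numeric_summary([22.0, None, 38.0, 26.0]))
    # {'count': 3, 'missing': 1, 'mean': 28.67, 'min': 22.0, 'max': 38.0}
```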

Basic Visualization

  • Survival by Gender: [Figure: Survival rate by Gender]
  • Survival by Class: [Figure: Survival by Class]

Conclusion

The initial analysis shows large differences in survival rates by gender and class. Female passengers survived at roughly four times the rate of male passengers, and first-class passengers survived at roughly two and a half times the rate of third-class passengers. Further analysis should focus on:

  • Correlation Analysis: Investigating relationships between survival and features like Age, Fare, SibSp, and Parch.
  • Predictive Modeling: Using classification algorithms to predict survival outcomes.
  • Handling Missing Data: Addressing missing values in the Age (177 missing), Cabin (687 missing), and Embarked (2 missing) columns so that later analysis can use the full dataset.
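The correlation and missing-data steps can both be started with the standard library alone. Below is a minimal sketch that assumes each passenger is a dict keyed by the Kaggle column names, with `None` for empty cells. The function names and sample rows are hypothetical. Because `Survived` is 0/1, the Pearson coefficient computed here is the point-biserial correlation. It only captures linear association, so it is a first screen rather than a full correlation analysis.

```python
from math import sqrt


def missing_counts(rows, columns):
    """Number of rows where each column is None (i.e. the CSV cell was empty)."""
    return {col: sum(1 for row in rows if row[col] is None) for col in columns}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Fails with ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_with_survival(rows, feature):
    """Correlate a numeric feature with Survived, skipping rows where it is missing."""
    pairs = [(row[feature], row["Survived"]) for row in rows if row[feature] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


if __name__ == "__main__":
    # Made-up rows for illustration; not real passengers.
    rows = [
        {"Age": 30.0, "Fare": 9.50, "Cabin": None, "Survived": 0},
        {"Age": 41.0, "Fare": 80.00, "Cabin": "B12", "Survived": 1},
        {"Age": None, "Fare": 8.00, "Cabin": None, "Survived": 0},
        {"Age": 27.0, "Fare": 45.00, "Cabin": "D7", "Survived": 1},
    ]
    print(missing_counts(rows, ["Age", "Cabin"]))  # {'Age': 1, 'Cabin': 2}
    print(round(correlation_with_survival(rows, "Fare"), 3))
```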


Cynthia Allan-Gyimah

Spatial Data Analyst, Environmental Activist, and Geomatic Engineer. Passionate about learning new things to bring solutions to business stakeholders.