Introduction

Regina Rukeme Agaren
3 min read · Jun 28, 2024

The Titanic

The Titanic dataset from Kaggle is a classic in machine learning and data analysis, containing records of passengers on the Titanic. This report presents initial insights from the dataset, highlighting key observations about the passengers and their survival. The findings are based on a preliminary review of the data, focusing on gender distribution, survival across passenger classes, and age groups.

Observations

1. Gender Distribution
Out of the 891 passengers in the dataset, a clear majority were men, accounting for 64.76% of the total (577 passengers); women made up the remaining 35.24% (314 passengers). This gender distribution provides important context for the survival figures that follow.
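A distribution like this can be checked by counting the values in the gender column and dividing by the total. The snippet below is a minimal sketch, not the code used for this report: `category_shares` is a hypothetical helper, and the list stands in for the dataset's Sex column, assuming the 577 male and 314 female rows usually reported for Kaggle's train.csv.

```python
from collections import Counter

def category_shares(values):
    """Return {category: percentage of all values}, rounded to 2 decimals."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: round(100 * n / total, 2) for key, n in counts.items()}

# Stand-in for the Sex column (assumed counts: 577 male, 314 female).
sex_column = ["male"] * 577 + ["female"] * 314

shares = category_shares(sex_column)
print(shares)
```

The same helper would work for any categorical column, such as passenger class.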

2. Survival Rates by Class
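Survivors were spread across the passenger classes as follows (each figure is that class's share of all 342 survivors):
- Lower Class: 34.80%
- Middle Class: 25.44%
- Upper Class: 39.77%

These figures are shares of survivors, not survival rates within each class. Measured within each class, 62.96% of upper-class passengers survived (136 of 216), compared with 47.28% of middle-class passengers (87 of 184) and 24.24% of lower-class passengers (119 of 491). Passengers in the upper class therefore had a markedly higher chance of survival than those in the middle and lower classes.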
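When reading per-class percentages, it helps to separate two quantities: the survival rate within a class (survivors in the class divided by passengers in the class) and the class's share of all survivors (survivors in the class divided by total survivors). The sketch below computes both. The helper name `rate_and_share` is hypothetical, and the per-class counts are assumed to be the ones commonly reported for Kaggle's train.csv.

```python
def rate_and_share(survived_by_group, total_by_group):
    """For each group, return its within-group survival rate and its
    share of all survivors, both as percentages rounded to 2 decimals."""
    all_survivors = sum(survived_by_group.values())
    result = {}
    for group, total in total_by_group.items():
        survived = survived_by_group[group]
        result[group] = {
            "survival_rate": round(100 * survived / total, 2),
            "share_of_survivors": round(100 * survived / all_survivors, 2),
        }
    return result

# Assumed counts per passenger class (upper = 1st, middle = 2nd, lower = 3rd).
total_by_class = {"upper": 216, "middle": 184, "lower": 491}
survived_by_class = {"upper": 136, "middle": 87, "lower": 119}

by_class = rate_and_share(survived_by_class, total_by_class)
for name, stats in by_class.items():
    print(name, stats)
```

The lower class contributes a large share of survivors mainly because it is the largest class, even though its within-class survival rate is the lowest. The same helper can be reused for any other grouping, such as gender.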

3. Age Group Analysis
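Young adults aged 16 to 30 accounted for 32.46% of all survivors, the largest share of any age group. Because this is a share of survivors rather than a survival rate within the group, it partly reflects how many passengers fell in this age range. Age is nonetheless an important factor to examine when assessing the likelihood of survival.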
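An age-group comparison requires binning ages first and deciding what to do with passengers whose age is missing. The following is a minimal sketch under stated assumptions: only the 16 to 30 band comes from this report, while the other bands, the function names, and the tiny sample records are hypothetical and chosen for illustration.

```python
from collections import defaultdict

def age_group(age):
    """Map an age in years to a labelled band; None means the age is missing."""
    if age is None:
        return "unknown"
    if age < 16:
        return "under 16"
    if age <= 30:
        return "16 to 30"
    return "over 30"

def survival_rate_by_age_group(records):
    """records: iterable of (age, survived) pairs, with survived equal to 0 or 1.
    Returns {age band: percentage of that band that survived}."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for age, survived in records:
        group = age_group(age)
        totals[group] += 1
        survivors[group] += survived
    return {g: round(100 * survivors[g] / totals[g], 2) for g in totals}

# Made-up sample records, not taken from the Titanic dataset.
sample = [(4, 1), (10, 1), (22, 0), (25, 1), (28, 0), (30, 0),
          (45, 1), (52, 0), (None, 0), (None, 1)]

rates = survival_rate_by_age_group(sample)
print(rates)
```

Keeping missing ages in their own "unknown" band, instead of silently dropping them, makes it visible how much of the data the age comparison actually covers.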

4. Gender and Survival
Women accounted for 68.13% of all survivors and men for 31.87%. Within each group the gap is wider still: 74.20% of women survived (233 of 314), compared with 18.89% of men (109 of 577). This disparity is a significant observation, emphasizing the influence of gender on survival chances.

5. Overall Survival Rate
The overall survival rate among the passengers was 38.38%. This figure provides a general overview of the survival odds faced by passengers on the Titanic.

6. Socioeconomic Status
A considerable portion of the passengers, 55.11%, were from the lower socioeconomic class. This observation, combined with the survival rates by class, suggests a correlation between socioeconomic status and survival likelihood.

Visualizations

Gender Distribution and Survival Rates

This visual illustrates the distribution of genders among passengers and their respective survival rates, highlighting the higher survival rate of women.

Age Group and Survival Rate

This visualization depicts the survival rates among different age groups, emphasizing the higher survival rate of young adults.

Conclusion

The initial review of the Titanic dataset reveals significant insights into the factors influencing passenger survival. Gender, socioeconomic status, and age all appear to be important factors in survival odds. Women and upper-class passengers had higher survival rates, while young adults made up the largest group of survivors. These observations lay the foundation for more in-depth analyses and highlight the importance of considering multiple factors when studying survival data.

This analysis was conducted as part of my data analysis internship with HNG. For more information about the HNG Internship program and how it can benefit aspiring data analysts, visit https://hng.tech/internship and https://hng.tech/hire.

Regina Rukeme Agaren

A fervent data analyst who's on a journey to master data science and AI. Passionate about teaching, art, and football—especially Arsenal.