Limah-bee
2 min read · Jun 30, 2024

A review of the Titanic Dataset on Kaggle

INTRODUCTION

The Titanic Dataset on Kaggle provides passenger-level data about the people who boarded the Titanic. It includes variables such as passenger age, gender, class, fare, and survival status. The purpose of this review is to identify initial insights from the dataset and summarize the number of survivors.

OBSERVATION

  • At first sight, there is missing data in the age and cabin columns.
  • Ages are stored as decimals, so fractional values can be expressed in years and months.
  • Passenger tickets mix alphanumeric, text-only, and numeric formats.
  • Some passengers have more than one cabin assigned to them.
  • Fares are recorded either as whole numbers or with 2, 3, or 4 decimal places.
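The observations above could be checked programmatically. The following is a minimal sketch in pandas, not the method used for this review. It assumes the standard Kaggle column names (Age, Ticket, Cabin, Fare) and uses a few hypothetical sample rows in place of the real file, which would normally be loaded with pd.read_csv. The helper ticket_kind is a made-up name for illustration.

```python
import pandas as pd

# Hypothetical sample rows (NOT real results); column names are assumed
# to match the Kaggle files. Real data: df = pd.read_csv("train.csv")
df = pd.DataFrame({
    "Age":    [22.0, 0.92, None, 35.0],
    "Ticket": ["A/5 21171", "LINE", "113803", "PC 17599"],
    "Cabin":  [None, "C23 C25 C27", "E46", None],
    "Fare":   [7.25, 151.55, 26.0, 71.2833],
})

# 1. Missing values in the age and cabin columns
missing = df[["Age", "Cabin"]].isna().sum()

# 2. Ages that carry a fractional part (e.g. 0.92 years, about 11 months)
fractional_ages = int(df["Age"].dropna().mod(1).ne(0).sum())

# 3. Ticket formats: numeric, alphanumeric, or text only
def ticket_kind(ticket: str) -> str:
    """Illustrative helper: classify a ticket string by its characters."""
    compact = ticket.replace(" ", "")
    if compact.isdigit():
        return "numeric"
    if any(ch.isdigit() for ch in compact):
        return "alphanumeric"
    return "text"

ticket_kinds = df["Ticket"].map(ticket_kind).value_counts()

# 4. Passengers with more than one cabin (cabins are space-separated)
multi_cabin = int(df["Cabin"].dropna().str.split().map(len).gt(1).sum())

# 5. Decimal places per fare, assuming at most 4 decimals are meaningful
fare_decimals = [
    len(f"{fare:.4f}".rstrip("0").split(".")[1]) for fare in df["Fare"]
]

print(missing, fractional_ages, ticket_kinds, multi_cabin, fare_decimals, sep="\n")
```

Running the same checks on the full dataset would show how widespread each issue is before any cleaning is attempted.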

VISUALIZATION

Analysis of the Titanic Dataset shows 1,309 passengers in total, of whom only 494 survived and 815 died.

Broken down by gender, 385 of the survivors were female and 109 were male.
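Counts like these can be derived with a simple aggregation. The sketch below is illustrative only: it assumes the Kaggle column names Sex and Survived (with 1 meaning survived and 0 meaning died) and uses six hypothetical rows, so its numbers are not the figures reported in this review.

```python
import pandas as pd

# Hypothetical rows for illustration; column names are assumed to
# match the Kaggle files (Survived: 1 = survived, 0 = died).
df = pd.DataFrame({
    "Sex":      ["female", "male", "female", "male", "male", "female"],
    "Survived": [1, 0, 1, 1, 0, 0],
})

# Overall totals
total = len(df)
survivors = int(df["Survived"].sum())
deaths = total - survivors

# Survivors per gender, and the share of each gender that survived
survivors_by_sex = df.groupby("Sex")["Survived"].sum()
rate_by_sex = df.groupby("Sex")["Survived"].mean()

print(f"{total} passengers: {survivors} survived, {deaths} died")
print(survivors_by_sex)
print(rate_by_sex.round(2))
```

Reporting the survival rate alongside the raw count helps when the groups differ in size, since a larger group can have more survivors while still having a lower rate.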

CONCLUSION

Upon preliminary examination, the Titanic dataset exhibits multiple data quality problems, such as missing values and inconsistent data formats. These findings lay the groundwork for more thorough data cleansing and in-depth examination. The connections between these factors and how they affect survival rates can be investigated further.

Be a part of this life-changing transformation: https://hng.tech/internship and https://hng.tech/premium