TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


All You Need Is Statistics to Analyze Tabular Datasets

15 min read · Sep 10, 2024


Photo by Dan Cristian Pădureț on Unsplash

Tabular datasets are one of the most common forms of data and consist of a mix of variable types, such as binary, categorical, textual, and continuous values. A well-known example is the Titanic dataset. The major challenge with such datasets is that each variable type has to be analyzed differently: continuous values need different statistics and/or models than categorical values, and so on. It is also key to detect multicollinearity in the dataset, because variables with statistically similar behavior can affect the reliability of models. In this blog post, I will demonstrate the steps of pre-processing tabular datasets and how statistical tests, such as the hypergeometric test, can reveal relationships across variables. In addition, I will explain the importance of multiple testing correction and show how to apply Principal Component Analysis to a tabular dataset.
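To make the idea of hypergeometric testing with multiple testing correction concrete, here is a minimal sketch using `scipy` and `numpy`. It is an illustration under stated assumptions, not necessarily the implementation used in the rest of this post: the helper names `hypergeom_pvalue` and `benjamini_hochberg` are hypothetical, the variables are assumed to be already encoded as boolean columns, and the toy arrays are made up rather than taken from the Titanic dataset.

```python
import numpy as np
from scipy.stats import hypergeom


def hypergeom_pvalue(target, feature):
    """P-value for over-representation of `feature` among samples where `target` is True.

    Hypothetical helper: both inputs are boolean arrays of equal length.
    """
    target = np.asarray(target, dtype=bool)
    feature = np.asarray(feature, dtype=bool)
    population = target.size                # total number of samples
    successes = int(feature.sum())          # samples that have the feature
    draws = int(target.sum())               # samples in the target group
    overlap = int((target & feature).sum()) # samples with both
    # sf(k - 1) is P(X >= k) for the hypergeometric distribution
    return hypergeom.sf(overlap - 1, population, successes, draws)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working from the largest p-value down
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# Made-up toy data: does the feature co-occur with the target more than by chance?
target = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
feature = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
p_single = hypergeom_pvalue(target, feature)

# When many variable pairs are tested, adjust the raw p-values together.
p_adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```

In practice, one test would be run per pair of categories, and the whole set of p-values would be corrected in one pass. Without that correction, the number of false positives grows with the number of tests performed.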

The Very First Step Is a Visual Inspection.




Written by Erdogan Taskesen

Machine Learning | Statistics | D3js visualizations | Data Science | Ph.D | erdogant.github.io
