Omics Diary

A platform to share knowledge and information on how to use Python for systems vaccinology and omics analysis. Also covers interesting scientific literature related to infectious diseases and vaccines. Interested in contributing? Contact me to collaborate!

How to perform a comprehensive exploratory data analysis with 3 lines of code in Python

Exploratory data analysis can be a breeze with Pandas Profiling.

5 min read · Jan 1, 2022


Exploratory data analysis is critical for data analysis. How do we systematically do it?

The first and most important step in data analysis is exploratory data analysis (EDA). EDA involves checking for null and duplicate values, preprocessing the data, assessing the distribution of each variable, and identifying simple trends between variables. Depending on the complexity of the dataset, data scientists can spend 50% to 90% of their time on EDA to ensure that the data is properly processed for further in-depth analysis.
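To make these checks concrete, here is a minimal sketch using plain pandas rather than Pandas Profiling. The DataFrame, its column names, and its values are all made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical toy dataset; names and values are invented for illustration.
df = pd.DataFrame(
    {
        "age": [34, 45, None, 29, 45],
        "titer": [120.0, 80.0, 95.0, None, 80.0],
        "group": ["vaccine", "placebo", "vaccine", "vaccine", "placebo"],
    }
)

# Count missing values in each column.
null_counts = df.isna().sum()

# Count rows that are exact duplicates of an earlier row.
n_duplicates = int(df.duplicated().sum())

# Summary statistics give a first look at each numeric variable's distribution.
summary = df.describe()

print(null_counts)
print(n_duplicates)
print(summary)
```

Each of these checks is a single call, but running them one by one across many columns is the repetitive part of EDA that an automated report aims to save.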

The pre-processing methods used will depend on the data. For instance, if your data has only a few null values, it may be fine to drop them. However, if a large proportion of the values are null, you may want to fill in the missing values using the mean, the median, or machine learning methods. EDA is also useful when deciding which statistical tests to use: the tests appropriate for variables with a Gaussian distribution differ from those appropriate for variables with a skewed distribution. Finally, the strength of correlation between different variables can provide insights for machine learning, where features with poor correlation or…
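The decisions described above can be sketched in plain pandas as follows. This is only an illustration under stated assumptions: the data are invented, the 20% cut-off for "a few" null values is an arbitrary choice, and Spearman correlation is used here simply as one rank-based option that tolerates skewed data.

```python
import pandas as pd

# Hypothetical data; values are invented for illustration.
df = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0, None, 6.0, 7.0, 8.0, 9.0, 10.0],
        "y": [2.0, 4.0, None, None, None, 12.0, 14.0, 16.0, 18.0, 100.0],
    }
)

# Arbitrary cut-off (an assumption, not a rule): at or below this fraction
# of nulls we drop the affected rows, above it we impute instead.
MAX_NULL_FRACTION = 0.2

# Fraction of missing values per column.
null_fraction = df.isna().mean()
drop_cols = [c for c in df.columns if null_fraction[c] <= MAX_NULL_FRACTION]
fill_cols = [c for c in df.columns if null_fraction[c] > MAX_NULL_FRACTION]

# Drop rows that are null in sparsely missing columns,
# then fill the remaining gaps with each column's median.
cleaned = df.dropna(subset=drop_cols).copy()
for column in fill_cols:
    cleaned[column] = cleaned[column].fillna(df[column].median())

# Skewness near 0 suggests a roughly symmetric distribution;
# a large positive value indicates a long right tail.
skewness = cleaned.skew()

# Rank-based correlation, which does not assume a Gaussian distribution.
correlation = cleaned.corr(method="spearman")

print(null_fraction)
print(skewness)
print(correlation)
```

In this toy example, `x` has few nulls so its incomplete row is dropped, while `y` has many and is filled with the median. The large skewness of `y` is the kind of signal that would steer you away from tests that assume a Gaussian distribution.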



Published in Omics Diary



Written by Kuan Rong Chan, Ph.D.

Kuan Rong Chan, PhD, Senior Principal Research Scientist at Duke-NUS Medical School. Virologist | Data Scientist | Loves mahjong | Website: kuanrongchan.com
