Sweetviz: EDA in 2 Lines of Python Code
Overview and Implementation of the Sweetviz Library
4 min read · Jan 8, 2021
Exploratory data analysis (EDA) is an approach to analyzing data and summarizing its main characteristics, often with visual methods. Data scientists spend a large share of their time understanding the data and extracting insights from it, which makes EDA an essential and time-consuming step in the end-to-end machine learning pipeline.
EDA involves many steps, including statistical tests, quantitative checks, and data visualization. Some of the key steps are:
- Data Quality Check: Analyzing each feature's data type, duplicate values, missing values, and so on. This can be done with pandas functions such as describe() and info().
- Type Inference: Identifying whether each feature is text, numerical, or categorical.
- Statistical Tests: Tests such as ANOVA and Pearson correlation that measure how features relate to each other and whether those relationships are statistically significant.
- Visualization: Graphical techniques such as bar plots and pie charts help in understanding categorical features, whereas scatter plots and histograms are used for numerical features. Plots can also compare the train and test data, or compare two…
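The data quality check from the list above can be sketched in pandas as follows. This is a minimal sketch on a small, made-up DataFrame (the column names and values are hypothetical). Besides describe() and info(), it uses isna() and duplicated(), since those two are what actually count missing values and duplicate rows.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset for illustration only
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 32],
    "city": ["Paris", "Lyon", "Paris", None, "Lyon"],
    "income": [40000, 52000, 61000, 58000, 52000],
})

# Summary statistics for numeric columns: count, mean, std, min, quartiles, max
print(df.describe())

# Column dtypes, non-null counts and memory usage (info() prints directly)
df.info()

# Number of missing values in each column
missing = df.isna().sum()
print(missing)

# Number of rows that exactly repeat an earlier row
n_duplicates = df.duplicated().sum()
print(n_duplicates)
```

Here the last row repeats the second one, so one duplicate is reported, and the age and city columns each contain one missing value.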
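Type inference, the second step in the list above, can be approximated with a simple rule of thumb. The function below, infer_feature_type, and its max_unique_ratio threshold are hypothetical choices for illustration, not how Sweetviz or any other library decides: numeric columns are treated as numerical, booleans and low-cardinality string columns as categorical, and everything else as text.

```python
import pandas as pd


def infer_feature_type(series: pd.Series, max_unique_ratio: float = 0.5) -> str:
    """Hypothetical heuristic: label a column as numerical, categorical or text."""
    # Check booleans first, because pandas also reports bool as a numeric dtype
    if pd.api.types.is_bool_dtype(series):
        return "categorical"
    if pd.api.types.is_numeric_dtype(series):
        return "numerical"
    non_null = series.dropna()
    if non_null.empty:
        return "unknown"  # nothing to inspect
    # Few distinct values relative to the number of rows suggests categories
    unique_ratio = non_null.nunique() / len(non_null)
    return "categorical" if unique_ratio <= max_unique_ratio else "text"


# Hypothetical dataset for illustration only
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "city": ["Paris", "Lyon", "Paris", "Nice", "Lyon", "Paris"],
    "review": ["great service", "too slow", "ok", "loved it", "would return", "meh"],
})

types = {col: infer_feature_type(df[col]) for col in df.columns}
print(types)
```

A ratio threshold adapts to dataset size better than a fixed count of unique values, but numeric codes (such as zip codes or ratings) would still be labeled numerical and may need manual overrides.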
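The statistical tests named in the list above, Pearson correlation and ANOVA, can be sketched with pandas alone. This minimal sketch assumes small hypothetical datasets with no missing values; the helper one_way_anova_f is a hypothetical name. It computes the one-way ANOVA F statistic as the ratio of between-group to within-group mean squares.

```python
import pandas as pd

# Hypothetical numeric features for the correlation example
pairs = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]})

# Pearson correlation coefficient (Series.corr defaults to method="pearson")
pearson_r = pairs["x"].corr(pairs["y"])
print(f"Pearson r = {pearson_r:.4f}")  # → Pearson r = 0.7746


def one_way_anova_f(df: pd.DataFrame, value_col: str, group_col: str) -> float:
    """One-way ANOVA F statistic; assumes no missing values."""
    grand_mean = df[value_col].mean()
    groups = df.groupby(group_col)[value_col]
    n_total = len(df)
    k = df[group_col].nunique()
    # Between-group sum of squares: group size * (group mean - grand mean)^2
    ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    # Within-group sum of squares: squared deviations from each row's group mean
    ss_within = ((df[value_col] - groups.transform("mean")) ** 2).sum()
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical scores for three groups
scores = pd.DataFrame({
    "group": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "score": [4, 5, 6, 6, 7, 8, 8, 9, 10],
})
f_stat = one_way_anova_f(scores, value_col="score", group_col="group")
print(f"ANOVA F = {f_stat:.2f}")  # → ANOVA F = 12.00
```

These functions only return the test statistics. For statistical significance, scipy.stats.pearsonr and scipy.stats.f_oneway compute the same statistics together with p-values.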
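The visualization step from the list above can be sketched with pandas and matplotlib. This sketch uses hypothetical train and test splits: a bar plot of category counts for a categorical feature, and overlaid histograms that compare the train and test distributions of a numerical feature.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical train/test splits for illustration only
train = pd.DataFrame({
    "city": ["Paris", "Lyon", "Paris", "Nice", "Lyon", "Paris"],
    "age": [25, 32, 47, 51, 38, 29],
})
test = pd.DataFrame({"city": ["Lyon", "Paris", "Nice"], "age": [45, 33, 28]})

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Categorical feature: one bar per category, height = number of rows
train["city"].value_counts().plot.bar(ax=ax_bar)
ax_bar.set_title("city (train)")

# Numerical feature: semi-transparent histograms so both splits stay visible
ax_hist.hist(train["age"], bins=5, alpha=0.5, label="train")
ax_hist.hist(test["age"], bins=5, alpha=0.5, label="test")
ax_hist.set_title("age: train vs test")
ax_hist.legend()

fig.tight_layout()
# Call fig.savefig("some_file.png") to write the figure to disk
```

Overlaying the two histograms on one axis makes shifts between the train and test distributions easy to spot.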