Sweetviz: EDA in 2 Lines of Python Code

Overview and Implementation of the Sweetviz Library

Satyam Kumar
The Startup


Image by Mudassar Iqbal from Pixabay

Exploratory data analysis (EDA) is an approach to analyzing data and summarizing its main characteristics, often with visual methods. Data scientists often spend a large share of their time understanding the data and drawing insights from it, which makes EDA an essential but time-consuming step in the end-to-end machine learning pipeline.

EDA involves many steps, including statistical tests, quantitative checks, and data visualization. Some of the key steps are:

  • Data Quality Check: The analysis of each feature's data type, duplicate values, missing values, and so on. Much of this can be done with pandas functions such as describe() and info().
  • Type Inference: Classifying each feature as text, numerical, or categorical.
  • Statistical Test: Statistical tests such as ANOVA or Pearson correlation are used to measure the relationships between features and their statistical significance.
  • Visualization: Graphical techniques such as bar plots and pie charts are used to understand categorical features, whereas scatter plots and histograms are used for numerical features. Plots can also include a comparison of the train and test data or a comparison between two…
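
To make the first three steps concrete, here is a minimal sketch in plain pandas. The toy dataset, the column names, and the infer_feature_type helper (including its max_categories threshold) are assumptions made up for illustration; they are not from any library, and real type inference usually needs more careful rules.

```python
import pandas as pd

# Hypothetical toy dataset, invented purely for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, 32, None],
    "income": [30000, 52000, 81000, 52000, 45000],
    "city": ["Pune", "Delhi", "Delhi", "Delhi", "Pune"],
})

# --- Data quality check ---
column_dtypes = df.dtypes                      # data type of each feature
missing_per_column = df.isnull().sum()         # missing values per feature
duplicate_rows = int(df.duplicated().sum())    # fully duplicated rows
numeric_summary = df.describe()                # count, mean, std, quartiles


# --- Type inference (a simple heuristic, not a standard algorithm) ---
def infer_feature_type(series, max_categories=10):
    """Label a column as numerical, categorical, or text."""
    if pd.api.types.is_numeric_dtype(series):
        return "numerical"
    # Assumption: few distinct values means the column is categorical.
    if series.nunique(dropna=True) <= max_categories:
        return "categorical"
    return "text"


feature_types = {col: infer_feature_type(df[col]) for col in df.columns}

# --- Statistical test: Pearson correlation between two numerical features ---
# Series.corr ignores rows where either value is missing.
pearson_r = df["age"].corr(df["income"], method="pearson")

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(feature_types)
print("Pearson r (age vs income):", round(pearson_r, 3))
```

Each of these checks is a separate call that has to be written and interpreted by hand for every dataset, which is the repetitive work that automated EDA libraries aim to reduce.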
