Sweetviz: EDA in 2 Lines of Python Code

Overview and Implementation of the Sweetviz Library

Satyam Kumar
The Startup


Image by Mudassar Iqbal from Pixabay

Exploratory data analysis (EDA) is an approach to analyzing data and summarizing its main characteristics, often with visual methods. Data scientists spend a large share of their time understanding data and extracting insights from it, which makes EDA an essential and time-consuming step in the end-to-end machine learning pipeline.

EDA involves many steps, including statistical tests, quantitative analysis, and data visualization. Some of the key steps are:

  • Data Quality Check: Analyzing each feature for its data type, duplicate values, missing values, and so on. Pandas functions such as describe() and info() help with this step.
  • Type Inference: Classifying each feature as text, numerical, or categorical.
  • Statistical Tests: Applying tests such as ANOVA or Pearson correlation to measure the relationships between features and their statistical significance.
  • Visualization: Bar plots and pie charts help in understanding categorical features, whereas scatter plots and histograms are used for numerical features. Plots can also compare the train and test data, or make a comparison between two…
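The data quality check can be sketched without any dependencies. This is a minimal illustration, assuming a hypothetical dataset stored as a list of dicts (`ROWS`) in which `None` marks a missing value; the helper `quality_report` is also hypothetical. With a pandas DataFrame, `df.info()`, `df.describe()`, `df.isnull().sum()` and `df.duplicated().sum()` cover roughly the same ground.

```python
from __future__ import annotations

from typing import Any

# Hypothetical toy dataset: one dict per row, None marks a missing value.
ROWS: list[dict[str, Any]] = [
    {"age": 22, "fare": 7.25, "sex": "male"},
    {"age": 38, "fare": 71.28, "sex": "female"},
    {"age": None, "fare": 8.05, "sex": "male"},
    {"age": 22, "fare": 7.25, "sex": "male"},  # exact duplicate of the first row
]


def quality_report(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarize row count, missing values, observed types, and duplicate rows."""
    features = sorted({key for row in rows for key in row})
    # Missing values per feature (similar to pandas' df.isnull().sum()).
    missing = {f: sum(1 for row in rows if row.get(f) is None) for f in features}
    # Python types observed among the non-missing values of each feature.
    types = {
        f: sorted({type(row[f]).__name__ for row in rows if row.get(f) is not None})
        for f in features
    }
    # Rows that exactly repeat an earlier row (similar to df.duplicated().sum()).
    seen: set[tuple] = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"n_rows": len(rows), "missing": missing, "types": types, "duplicates": duplicates}


print(quality_report(ROWS))
```

A feature that reports more than one type (for example both `int` and `str`) is a typical sign of dirty data worth inspecting before any modeling.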
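Type inference is usually heuristic. The function below is one possible rule set, not the logic used by Sweetviz or pandas: numeric columns with more than two distinct values are labeled numerical, while non-numeric columns with at most `max_categories` distinct values (an arbitrary cut-off of 10, chosen only for illustration) are labeled categorical, and the rest text. The name `infer_type` is hypothetical.

```python
from typing import Any, Sequence


def infer_type(values: Sequence[Any], max_categories: int = 10) -> str:
    """Heuristically label a feature as 'numerical', 'categorical', 'text', or 'unknown'.

    max_categories and the two-value rule are illustrative assumptions, not standard cut-offs.
    """
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"  # every value is missing
    distinct = set(present)
    # bool is a subclass of int in Python, so exclude it from the numeric check.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        # Numeric columns with at most two distinct values (e.g. 0/1 flags) act like categories.
        return "categorical" if len(distinct) <= 2 else "numerical"
    # Non-numeric: a few repeated labels are categorical, many distinct strings are free text.
    return "categorical" if len(distinct) <= max_categories else "text"


print(infer_type([22, 38, None, 35, 54]))  # numerical
print(infer_type(["male", "female", "male"]))  # categorical
```

Counting distinct values is cheap but crude: a short free-text column with few rows would be labeled categorical, so real tools typically combine several signals.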
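The two statistical tests named above reduce to short formulas. This sketch computes the Pearson correlation coefficient and the one-way ANOVA F statistic from scratch, purely to show what they measure; the function names `pearson_r` and `anova_f` are hypothetical. In practice, `scipy.stats.pearsonr` and `scipy.stats.f_oneway` return these statistics together with a p-value, which is what establishes statistical significance.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


def anova_f(*groups: Sequence[float]) -> float:
    """One-way ANOVA F statistic: between-group variance over within-group variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need at least two non-empty groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Sum of squares between groups (df = k - 1) and within groups (df = n - k).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    if ss_within == 0:
        raise ValueError("F is undefined when every group is constant")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print(pearson_r([1, 2, 3], [1, 3, 2]))  # 0.5
print(anova_f([1, 2, 3], [4, 5, 6]))  # 13.5
```

Pearson correlation suits pairs of numerical features, while one-way ANOVA compares a numerical feature across the groups defined by a categorical one.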
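The train/test comparison mentioned in the visualization step boils down to binning both sets with the same edges and normalizing the counts. Plotting libraries such as matplotlib draw the actual charts; to stay dependency-free, this sketch renders text bars instead. The helpers `histogram`, `proportions` and `text_histogram`, as well as the sample ages, are hypothetical.

```python
import bisect
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values into the bins defined by sorted edges.

    Bins are [edges[i], edges[i+1]); the last bin also includes edges[-1].
    Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


def proportions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Normalize bin counts so train and test sets of different sizes are comparable."""
    counts = histogram(values, edges)
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def text_histogram(train, test, edges, width: int = 20) -> List[str]:
    """Render side-by-side text bars for the train and test distributions."""
    lines = []
    pairs = zip(proportions(train, edges), proportions(test, edges))
    for i, (p_train, p_test) in enumerate(pairs):
        bar_train = "#" * round(p_train * width)
        bar_test = "#" * round(p_test * width)
        lines.append(
            f"{edges[i]:>6g}-{edges[i + 1]:<6g} train {bar_train:<{width}} test {bar_test}"
        )
    return lines


# Hypothetical feature values from a train/test split.
train_ages = [22, 25, 31, 35, 38, 41, 47, 52, 58, 63]
test_ages = [19, 24, 27, 29, 33, 36, 44, 49]
for line in text_histogram(train_ages, test_ages, edges=[0, 20, 40, 60, 80]):
    print(line)
```

Normalizing to proportions matters because train and test sets usually differ in size; raw counts would make the larger set look shifted even when both follow the same distribution.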

