EDA Using Python in Just One Command
Exploratory Data Analysis (EDA) is an important step in any data science project. Today I will explain how a single command in Python can give you a complete EDA report, using the Honey Production data set as a sample.
Let us download the honey production data from Kaggle and explore it.
Code:
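Here is a minimal sketch of this code. It assumes the Kaggle CSV has been saved next to the script as honeyproduction.csv (an assumed file name) and that the pandas-profiling package is installed. The function name build_report and the report title are hypothetical, chosen only for illustration.

```python
# Install once from a terminal (package name used in this post):
#   pip install pandas-profiling
# Newer releases are published as ydata-profiling; there the import becomes
#   from ydata_profiling import ProfileReport
import pandas as pd


def build_report(csv_path: str, output_path: str = "output.html") -> None:
    """Read a CSV file and write a pandas-profiling HTML report for it."""
    # pd.read_csv() comes from pandas and loads the file into a DataFrame.
    df = pd.read_csv(csv_path)

    # Imported inside the function so this module can be loaded
    # even before pandas-profiling has been installed.
    from pandas_profiling import ProfileReport

    # Build the report. The title only sets the heading of the HTML page.
    # For very large data sets, ProfileReport(df, minimal=True) skips
    # the most expensive calculations.
    profile = ProfileReport(df, title="Honey Production EDA")

    # Write the full interactive report to an HTML file.
    profile.to_file(output_path)
```

Calling build_report("honeyproduction.csv") would then write output.html into the current working directory.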
Understanding the code
First we install the library named “pandas-profiling” and import ProfileReport from “pandas_profiling”. We also need to import pandas to read our data: “import pandas as pd”, and “pd.read_csv()” belongs to the pandas library.
After you run the code, installing the library may take some time, and generating the report takes a while too. The result is a large report, which we will go through page by page.
When the process is complete, it creates an HTML file named output.html in your project folder. Open this file.
Overview: This section summarizes the data set: how many variables and observations it has, whether any values are missing, whether there are duplicate rows, and what types the variables are (how many are numeric and how many are categorical).
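To see where the Overview numbers come from, here is a rough pandas sketch that computes similar counts. It assumes that "numeric" simply means a numeric pandas dtype; pandas-profiling uses its own type inference, so its numeric/categorical split can differ. The function name overview is hypothetical.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Compute dataset-level counts similar to the report's Overview section."""
    # Columns with a numeric pandas dtype (int, float, ...).
    numeric_cols = df.select_dtypes(include="number").columns
    return {
        "n_variables": df.shape[1],
        "n_observations": df.shape[0],
        # Total number of missing cells across the whole table.
        "missing_cells": int(df.isna().sum().sum()),
        # Rows that repeat an earlier row exactly.
        "duplicate_rows": int(df.duplicated().sum()),
        "numeric_variables": len(numeric_cols),
        "other_variables": df.shape[1] - len(numeric_cols),
    }
```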
In addition to the summary, the report provides detailed information about each variable:
# How many unique values
# Type of variable
# Max and Min values
# Mean
# Data distribution graph
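The per-variable statistics listed above can be approximated for a single numeric column with pandas and NumPy. This is a sketch, not the library's own implementation; the helper name describe_variable and the default of 10 histogram bins are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def describe_variable(s: pd.Series, bins: int = 10) -> dict:
    """Summarize one numeric column: type, distinct count, min/max, mean, histogram."""
    # Ignore missing values for the numeric statistics.
    values = s.dropna()
    # Bin counts and bin edges for a simple distribution graph.
    counts, edges = np.histogram(values, bins=bins)
    return {
        "dtype": str(s.dtype),
        "distinct": int(s.nunique()),  # nunique() skips NaN by default
        "min": values.min(),
        "max": values.max(),
        "mean": values.mean(),
        "histogram_counts": counts.tolist(),
        "bin_edges": edges.tolist(),
    }
```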
Interactions: Use this plot to see how any two variables relate to each other. Its tabs let you pick any combination of variables and view the relationship between them.
The report also provides several correlation plots.
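Correlation matrices like the ones behind these plots can be computed directly with pandas' DataFrame.corr. The sketch below restricts itself to numeric columns and to the Pearson and Spearman methods; pandas also accepts method="kendall", which additionally requires SciPy. The helper name correlation_matrices is hypothetical.

```python
import pandas as pd


def correlation_matrices(df: pd.DataFrame) -> dict:
    """Return Pearson and Spearman correlation matrices for the numeric columns."""
    # Drop text columns first; recent pandas versions raise an error otherwise.
    numeric = df.select_dtypes(include="number")
    return {method: numeric.corr(method=method) for method in ("pearson", "spearman")}
```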
Highly correlated variables are easy to find under the Warnings tab.
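A simple way to reproduce this kind of warning by hand is to scan the upper triangle of a correlation matrix for large absolute values. The 0.9 cutoff and the helper name highly_correlated_pairs are assumptions chosen for illustration, not necessarily the thresholds pandas-profiling uses.

```python
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9,
                            method: str = "pearson") -> list:
    """List (col_a, col_b, r) for numeric column pairs with |r| >= threshold."""
    corr = df.select_dtypes(include="number").corr(method=method)
    cols = corr.columns
    pairs = []
    # Walk the upper triangle only, so each pair appears once and the diagonal is skipped.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            # Constant columns give NaN correlations; skip those.
            if pd.notna(r) and abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs
```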
You can also check each variable's details, such as its values, histogram and other statistics.
This is one of the best tools for EDA.