Quick Exploratory Data Analysis: Pandas Profiling
Secret Sauce for EDA
Data is nothing until you understand it and visualize it effectively, and this is what we call Exploratory Data Analysis (EDA).
EDA cycle: Understanding data quality, description, shape, patterns, relationships, and visualizing it for better understanding. Read more about EDA.
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of…
Pandas is a Python library that provides extensive means for data analysis. We often work with data stored in table formats like .csv and .xlsx, and Pandas makes it very convenient to load, process, and analyze such data. In conjunction with Matplotlib and Seaborn, Pandas provides a wide range of opportunities for data analysis. If you are familiar with Jupyter Notebook, then I am sure you have already used pandas in one way or another.
EDA can be tedious or thrilling
For some, EDA is tedious; for others, it is thrilling. Either way, the ultimate goal is to understand the data and visualize it in order to find patterns and trends within it.
Whether you find EDA tedious or thrilling, pandas_profiling will be your secret sauce.
Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great but a little basic for…
Once you have installed pandas and pandas_profiling, open up Jupyter Notebook.
Then, let’s import the required packages and dependencies.
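A minimal sketch of that import cell, assuming the package is installed under the name pandas_profiling used in this post. The availability check and the profiling_installed variable are my own additions, so that a missing install is reported with a message rather than an ImportError.

```python
import importlib.util

import pandas as pd

# find_spec returns None when a package cannot be found, which lets the cell
# report a missing install instead of stopping with ImportError.
profiling_installed = importlib.util.find_spec("pandas_profiling") is not None

if profiling_installed:
    from pandas_profiling import ProfileReport
else:
    print("pandas_profiling not found; install it before running the report cell.")

print("pandas version:", pd.__version__)
```

If you already know both packages are installed, the two plain import lines are all you need.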
Now, let’s import our dataset. For demo purposes, I am using the Student Alcohol Consumption dataset from Kaggle.
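Loading the data is a single pd.read_csv call. The sketch below reads a few made-up rows from an in-memory string so that it runs without the download; the rows only borrow some of the dataset's column names and are not real records. With the real download you would pass the path of the CSV instead (the file name student-mat.csv in the comment is an assumption about how the file is saved locally).

```python
import io

import pandas as pd

# Stand-in for the downloaded Kaggle file: made-up rows for illustration only.
# With the real data this would be something like:
#   df = pd.read_csv("student-mat.csv")   # assumed local file name
csv_text = """school,sex,age,Dalc,Walc,absences,G3
GP,F,18,1,1,6,6
GP,F,17,1,1,4,6
GP,M,15,2,3,10,10
MS,M,16,1,2,0,15
MS,F,17,1,1,2,11
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first rows, to confirm the load worked
```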
Now here comes the Secret Sauce
One line of magical code that gives you an entire EDA report.
Here pandas_profiling builds on the pandas DataFrame: ProfileReport(df) profiles df for quick data analysis.
It gives the entire data report inside a single cell of Jupyter Notebook.
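As a rough sketch, the one-liner is ProfileReport(df). Here it is wrapped in a small helper so the import only happens when you actually build a report; build_profile and the report title are hypothetical names of mine, not part of the library.

```python
import pandas as pd


def build_profile(df: pd.DataFrame, title: str = "Student Alcohol Consumption Report"):
    """Return a pandas_profiling ProfileReport for df (hypothetical helper)."""
    # Imported inside the function so a missing install only fails when called.
    from pandas_profiling import ProfileReport

    return ProfileReport(df, title=title)


# Typical notebook usage, left commented out because it needs pandas_profiling
# installed and a DataFrame named df already loaded:
# profile = build_profile(df)
# profile.to_notebook_iframe()          # render the report inside the cell
# profile.to_file("eda_report.html")    # or save it as a standalone HTML file
```

Saving to HTML is handy when you want to share the report with someone who does not have the notebook.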
So for a given dataset, it computes the following statistics:
1. Essentials: type, unique values, missing values.
2. Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range.
3. Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness.
4. Most frequent values.
5. Histogram.
6. Correlations: highlights the correlated variables, with Spearman and Pearson matrices.
7. A sample of the dataset.
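To get a feel for what those numbers mean, here is a rough sketch that computes several of them by hand with plain pandas. The two columns and their values are made up for illustration (they are not taken from the Kaggle dataset), and the report's exact formulas may differ in detail, for example in how quantiles are interpolated.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "age": [15, 16, 17, 17, 18, 18, 18, 19],
    "absences": [0, 2, 4, 4, 6, 10, 10, 16],
})
col = df["absences"]

# Essentials: type, unique values, missing values.
essentials = {
    "dtype": str(col.dtype),
    "unique": col.nunique(),
    "missing": int(col.isna().sum()),
}

# Quantile statistics (pandas interpolates linearly by default).
q1, median, q3 = col.quantile([0.25, 0.5, 0.75]).tolist()
quantiles = {
    "min": col.min(), "Q1": q1, "median": median, "Q3": q3, "max": col.max(),
    "range": col.max() - col.min(),
    "IQR": q3 - q1,
}

# Descriptive statistics.
descriptive = {
    "mean": col.mean(),
    "mode": col.mode().tolist(),
    "std": col.std(),
    "sum": col.sum(),
    "MAD": (col - col.median()).abs().median(),  # median absolute deviation
    "CV": col.std() / col.mean(),                # coefficient of variation
    "kurtosis": col.kurt(),
    "skewness": col.skew(),
}

# Most frequent values.
most_frequent = col.value_counts().head(3)

# Correlation matrices.
pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

# Sample of the dataset.
sample = df.head()

print(essentials, quantiles, descriptive, sep="\n")
print(most_frequent, pearson, spearman, sample, sep="\n\n")
```

Doing this for every column is exactly the tedious part that the report takes off your hands.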
And for each variable:
Under Toggle details, we can see Statistics, Histogram, Common Values, and more.
Lastly, it shows a sample of the dataset.
Pandas profiling is a great tool to speed up your exploratory data analysis (EDA). A single line of Python code (your Secret Sauce) generates detailed insights from the data, which helps boost our productivity as data scientists and analysts. This does not mean that your EDA is complete, though. To understand the data more deeply, we should sometimes complete the EDA manually.
Stay tuned for the next data science post.