Quick Exploratory Data Analysis: Pandas Profiling

Secret Sauce for EDA

Mala Deep
Feb 3 · 3 min read

Data is nothing until you understand it and visualize it effectively. This process is what we call Exploratory Data Analysis (EDA).

The EDA cycle: understanding the data's quality, description, shape, patterns, and relationships, and visualizing them for better understanding. Read more about EDA.

Pandas is a Python library that provides extensive means for data analysis. We often work with data stored in table formats like .csv and .xlsx, and Pandas makes it very convenient to load, process, and analyze such data. Pandas, in conjunction with Matplotlib and Seaborn, provides a wide range of opportunities for data analysis. If you are familiar with Jupyter Notebook, then I am sure you have already used pandas in one way or another.

EDA can be tedious or thrilling

For some, EDA is tedious; for others, it is thrilling. Either way, the ultimate goal is to understand the data and visualize it in order to find patterns and trends within it.

Whichever camp you belong to, pandas_profiling will be your secret sauce.

Action Time

Prerequisites:

Install Python and Jupyter Notebook (I prefer using Anaconda, as it is simple and easy).

Once you install them, open up Jupyter Notebook.

Then, let’s import the required packages/dependencies.
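The import cell is not preserved in this text, so here is a minimal sketch of what it might look like. It assumes the profiling package has already been installed (for example with pip install pandas-profiling). Newer releases are published as ydata-profiling and imported as ydata_profiling, so the sketch tries both names; that fallback loop is an addition for illustration, not part of the original post.

```python
import importlib

import pandas as pd  # loading and manipulating tabular data

# Try the package name used in this post first, then the name used by
# newer releases. ProfileReport stays None if neither is installed.
ProfileReport = None
for module_name in ("pandas_profiling", "ydata_profiling"):
    try:
        ProfileReport = importlib.import_module(module_name).ProfileReport
        break
    except ImportError:
        continue
```

If ProfileReport is still None after this cell, the package needs to be installed before moving on.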

Now, let’s import our dataset. For demo purposes, I am using the Student Alcohol Consumption dataset from Kaggle.
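Loading the data might look like the sketch below. The file name student-mat.csv is an assumption about how the Kaggle download is named on your machine, and load_dataset is a hypothetical helper written for this example, not something pandas provides.

```python
import pandas as pd


def load_dataset(path):
    """Read a CSV file into a DataFrame and print its shape."""
    df = pd.read_csv(path)
    print(f"Loaded {df.shape[0]} rows and {df.shape[1]} columns")
    return df


# Usage in a notebook cell (the file name is an assumption, adjust it to
# match your download):
# df = load_dataset("student-mat.csv")
# df.head()
```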

Now here comes the Secret Sauce

One line of magical code that gives you an entire EDA report.

Here pandas_profiling provides ProfileReport(df), and also extends the pandas DataFrame with df.profile_report(), for quick data analysis.

It renders the entire data report inside a single cell of Jupyter Notebook.
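The one-line call itself is not preserved in this text. Based on the ProfileReport(df) call mentioned above, it presumably looked like the return line of the helper below. The profile name, the title default, and the import inside the function are illustrative choices; on newer releases the import would come from ydata_profiling instead.

```python
import pandas as pd


def profile(df: pd.DataFrame, title: str = "Student Alcohol Consumption"):
    """Build a profiling report for df (hypothetical helper)."""
    # Imported inside the function so that defining this helper does not
    # fail when the package is missing; newer releases are imported as
    # ydata_profiling instead of pandas_profiling.
    from pandas_profiling import ProfileReport

    return ProfileReport(df, title=title)  # the "one line" from the post


# Usage in a notebook cell: the returned report renders inline when it is
# the last expression of the cell.
# profile(df)
```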

So for a given dataset, it computes the following statistics:

1. Essentials: type, unique values, missing values.
2. Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range.
3. Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness.
4. Most frequent values.
5. Histogram.
6. Correlations: highlights highly correlated variables, with Spearman and Pearson matrices.
7. Sample of the dataset.
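To make the list above concrete, here is a sketch that computes a few of these numbers by hand with plain pandas. It illustrates what the statistics mean, not how pandas_profiling computes them internally; the summarize helper and the toy values are made up for illustration.

```python
import pandas as pd


def summarize(s: pd.Series) -> dict:
    """Compute a handful of the statistics listed above for one numeric column."""
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
    return {
        "distinct": s.nunique(),
        "missing": int(s.isna().sum()),
        "min": s.min(),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": s.max(),
        "range": s.max() - s.min(),
        "iqr": q3 - q1,
        "mean": s.mean(),
        "std": s.std(),
        # median absolute deviation: median distance from the median
        "mad": (s - s.median()).abs().median(),
        "cv": s.std() / s.mean(),  # coefficient of variation
        "kurtosis": s.kurt(),
        "skewness": s.skew(),
    }


# Toy stand-in data (made up, not taken from the Kaggle dataset).
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10]})

stats = summarize(toy["x"])
most_frequent = toy["x"].value_counts().head(3)  # most frequent values
pearson = toy.corr(method="pearson")             # Pearson matrix
spearman = toy.corr(method="spearman")           # Spearman matrix
```

Writing even this small subset by hand for every column shows how much work the generated report saves.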

Overview:

[Image: overview section of the generated report]

And for each variable:

[Images: per-variable summaries in the generated report]

On clicking Toggle details, we can see Statistics, Histogram, Common Values, and so on.

[Image: expanded details for a variable]

Correlation:

[Image: correlation section of the generated report]

Lastly, it gives a sample of the dataset.

[Image: sample section of the generated report]

Conclusion
Pandas profiling is a great tool to speed up your exploratory data analysis (EDA). With just one line of Python code (your Secret Sauce), it generates detailed insights from the data, which helps boost our productivity as data scientists/analysts. This does not mean that your EDA is complete. To understand the data more deeply, sometimes we should complete the EDA manually.

Happy profiling!

Stay tuned for the next Data Science related post.

Written by

Mala Deep

Data Science | Data Visualization | Community Work Focused | Philekoos | https://www.linkedin.com/in/maladeep/

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com
