Understand a Dataset in Seconds Using Pandas Profiling

Do EDA on a whole dataset in one shot. Super easy! Don’t believe it? Try it for yourself!

Published in Nerd For Tech · 5 min read · Apr 29, 2021

In this blog, we will look at the mini-reports and EDA that Pandas Profiling generates, how to analyze data with them, and how to save the report as HTML and other formats so that you can present it right away and drive your data analysis from it.

About Pandas Profiling:

Pandas Profiling is a Python package built on top of pandas that lets you do exploratory analysis of your dataset. Where the pandas df.describe() function gives only a basic summary, pandas_profiling extends the DataFrame with df.profile_report() to produce a complete report.
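To see the baseline that the profile report builds on, here is a small sketch of df.describe() on its own. The toy DataFrame and its column names are made up for illustration and are not the article's dataset.

```python
import pandas as pd

# Toy data invented for illustration; not the article's dataset.
df = pd.DataFrame({
    "country": ["India", "Pakistan", "India", "Pakistan"],
    "daily_cases": [100, 40, 120, 60],
})

# By default describe() summarizes only the numeric columns
# (count, mean, std, min, quartiles, max).
summary = df.describe()
print(summary)
```

Notice that the text column is left out of the summary entirely. That gap is what the profile report is meant to fill.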

Pandas Profiling is an incredible open-source tool that every data scientist should consider for data exploration.

It is an efficient way to digest and analyze an unfamiliar dataset by providing in-depth descriptive statistics, visual distribution graphs, and a set of correlation tools.

The Pandas-profiling report offers:

A complete dataset overview
A report on each attribute (variable)
Different types of correlations between attributes
Warnings: inaccuracies and duplicates in the dataset that you might need to work on
Variable types: categorical, numerical, etc.
Missing values and zeros (with graphs)
A detailed report, generated very quickly
Distinct values, common values, cardinality, memory usage
Statistical report: descriptive, quantile

and much more…

You can toggle further details for each sub-report, and all of this takes only a few lines of code!
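To get a feel for the kind of per-variable numbers listed above, here is a rough sketch that computes a few of them by hand with plain pandas. This is not how pandas_profiling works internally; the toy data and the column names in per_column are my own assumptions.

```python
import pandas as pd

# Toy data invented for illustration.
df = pd.DataFrame({
    "country": ["India", "Pakistan", "India", None],
    "daily_cases": [100, 0, 120, 60],
})

# Count zeros only in numeric columns, then fill in 0 for the others.
zeros = df.select_dtypes(include="number").eq(0).sum()
zeros = zeros.reindex(df.columns, fill_value=0)

per_column = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "distinct": df.nunique(),        # missing values are not counted
    "missing": df.isna().sum(),
    "zeros": zeros,
    "memory_bytes": df.memory_usage(index=False, deep=True),
})
print(per_column)
```

The report gives you all of this (plus graphs) without writing any of these lines yourself.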

With Pandas profiling we can quickly do an exploratory data analysis with just a few lines of code.

If this is not enough to convince you to use this tool, it also generates interactive reports in a web format that can be presented to anyone, even if they don’t know how to program.

In short, what pandas profiling does is save us all the work of visualizing and understanding the distribution of each variable. It generates a report with all the information easily available.

Installation:

Using Pip:

pip install pandas-profiling[notebook]

From GitHub:

pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

Using conda:

You can install using the conda package manager by running

conda install -c conda-forge pandas-profiling
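After installing, you may want to confirm which version you got. A minimal sketch using only the standard library (Python 3.8+); the helper name installed_version is my own:

```python
from importlib import metadata


def installed_version(package="pandas-profiling"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())
```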

Documentation

You can find the documentation of pandas_profiling here.

Using pandas profiling

#pip install pandas_profiling

Importing libraries

import pandas as pd
import pandas_profiling

Hands-on With a Dataset:

You can check out the complete clean code and dataset on my Jupyter Notebook here: https://github.com/shelvi31/Pandas-Profiling

The dataset has Covid-19 cases reported country-wise.

Code:

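import pandas as pd
import pandas_profiling  # adds df.profile_report() to DataFrames

df2 = pd.read_csv("corona_dataset")
profile2 = df2.profile_report(title="Corona Small dataset report")
profile2  # as the last line of a notebook cell, this displays the report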

Here’s the Output

1. Output Dataset Overview: The profile report gives a statistical overview of the complete dataset, including the number of categorical and numerical variables, duplicates, and missing values. It’s basically a statistical snapshot of the dataset. The image shows the statistics for my Covid dataset.
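If you are curious what goes into such an overview, the same kind of numbers can be approximated by hand with plain pandas. This is only a sketch on made-up data; the keys of the overview dict are my own naming, not the report's.

```python
import pandas as pd

# Toy data invented for illustration; the last row duplicates the first.
df = pd.DataFrame({
    "country": ["India", "Pakistan", "India", "India"],
    "daily_cases": [100.0, 40.0, None, 100.0],
})

n_rows, n_cols = df.shape
missing_cells = int(df.isna().sum().sum())

overview = {
    "variables": n_cols,
    "observations": n_rows,
    "missing_cells": missing_cells,
    "missing_cells_pct": missing_cells / df.size * 100,
    "duplicate_rows": int(df.duplicated().sum()),
    "numeric_variables": df.select_dtypes(include="number").shape[1],
    "non_numeric_variables": df.select_dtypes(exclude="number").shape[1],
}
print(overview)
```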

2. Output Report on Variables: The profile report gives an individual, detailed report on each variable. It is so detailed that we would hardly need to look at anything else.

3. Output: Interaction Report

The profile report generates interaction reports between pairs of variables. Shown in the image is the interaction report between the India and Pakistan Covid-19 cases. You can find these interactions for any pair of columns in your dataset.

4. Output Report Correlational Matrix:

The relationship of the variables with each other. You can always toggle for more details.
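For intuition, a correlation matrix like this can also be computed directly in pandas. The sketch below uses two made-up columns; which correlation measures the report shows depends on its settings, so treat Pearson and Spearman here simply as two common examples.

```python
import pandas as pd

# Toy data invented for illustration: the second column grows with the
# first, but not in a straight line.
df = pd.DataFrame({
    "india_cases": [10, 20, 30, 40, 50],
    "pakistan_cases": [1, 4, 9, 16, 25],
})

pearson = df.corr(method="pearson")    # measures linear relationship
spearman = df.corr(method="spearman")  # measures rank (monotonic) relationship

print(pearson)
print(spearman)
```

Because the relationship is monotonic but not linear, the Spearman value is a perfect 1 while the Pearson value falls a little short of it.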

Other correlation graphs generated by the report:

5. Output Report Warnings Issued:

The profile report issues warnings and alerts about things in the dataset that we might need to work on or be cautious about, including high cardinality, high correlation, etc.
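To make these two warnings concrete, here is a hand-rolled sketch of how such checks could look in plain pandas. The function simple_alerts and its thresholds (max_distinct, corr_threshold) are assumptions of mine for illustration, not the rules pandas_profiling actually applies.

```python
import pandas as pd


def simple_alerts(df, max_distinct=50, corr_threshold=0.9):
    """Flag high-cardinality text columns and highly correlated numeric pairs.

    The thresholds are illustrative assumptions, not the library's defaults.
    """
    alerts = []

    # High cardinality: a non-numeric column with many distinct values.
    for col in df.select_dtypes(exclude="number").columns:
        if df[col].nunique() > max_distinct:
            alerts.append(f"{col}: high cardinality")

    # High correlation: check every pair of numeric columns once.
    corr = df.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if corr.loc[a, b] > corr_threshold:
                alerts.append(f"{a} / {b}: high correlation")

    return alerts


# Toy data invented for illustration.
demo = pd.DataFrame({
    "id": [f"row-{i}" for i in range(100)],
    "x": range(100),
    "y": [2 * i for i in range(100)],
})
print(simple_alerts(demo))
```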

6. Output Dataset Sample: Sample values from the dataset that give a detailed view: first rows, last rows, etc.

7. Output Report on Missing Values:

The profile report shows the missing values per column; here, the missing values for each country.
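The same per-column count is easy to check by hand if you want to double-check the report. A minimal sketch on made-up data with one column per country:

```python
import pandas as pd

# Toy data invented for illustration: one column per country.
df = pd.DataFrame({
    "India": [10.0, None, 30.0, 40.0],
    "Pakistan": [None, None, 5.0, 8.0],
})

missing = pd.DataFrame({
    "missing": df.isna().sum(),             # number of missing values
    "missing_pct": df.isna().mean() * 100,  # share of missing values
}).sort_values("missing", ascending=False)

print(missing)
```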

Trying a Larger Dataset:

You can find the dataset here: https://github.com/shelvi31/Pandas-Profiling

df = pd.read_csv("worldometer_coronavirus_daily_data.csv")

The pandas_profiling library includes a class named ProfileReport, which builds the report from a DataFrame.

Generating Profile Report for Large Dataset

profile = pandas_profiling.ProfileReport(df)
profile
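On a really large dataset the full report can take a while. One possible workaround, sketched below, is to profile a random sample and switch on the library's minimal mode (the minimal=True argument of ProfileReport). The helper names sample_for_profiling and build_minimal_report, the max_rows value, and the report title are my own assumptions.

```python
import pandas as pd


def sample_for_profiling(df, max_rows=10_000, seed=0):
    """Return df unchanged if it is small, otherwise a random sample of it."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)


def build_minimal_report(df, title="Large dataset report"):
    """Build a lighter report from a sample (needs pandas_profiling installed)."""
    # Imported here so the sampling helper above works without the library.
    from pandas_profiling import ProfileReport

    return ProfileReport(sample_for_profiling(df), title=title, minimal=True)
```

You would then call build_minimal_report(df) in place of the plain ProfileReport(df) call. Keep in mind that a sample only approximates the statistics of the full dataset.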

Output:

Converting to Jupyter Widget

Ways to display and save the generated report:

Jupyter Widgets

profile.to_widgets()

Iframes

profile.to_notebook_iframe()

As a string

json_data = profile.to_json()

As a file:

profile.to_file("report.json")

As an HTML File:

profile.to_file("profile.html")
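If you want both formats at once, the two to_file calls can be wrapped in a small helper. This is just a sketch: save_report and its arguments are names I made up, and it relies on to_file choosing the format from the file extension, as in the calls shown above.

```python
from pathlib import Path


def save_report(profile, out_dir, stem="report"):
    """Save a report as both HTML and JSON and return the two paths.

    `profile` is any object with a to_file(path) method, such as a
    ProfileReport. The output directory is assumed to exist already.
    """
    out_dir = Path(out_dir)
    paths = [out_dir / f"{stem}.html", out_dir / f"{stem}.json"]
    for path in paths:
        profile.to_file(path)  # the extension decides the output format
    return paths
```

For example, save_report(profile, ".", stem="corona") would write corona.html and corona.json into the current folder.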

Converting my report to a Jupyter widget gives a result something like this:

profile2.to_widgets()

You can check out the complete clean code and dataset on my Jupyter Notebook here: https://github.com/shelvi31/Pandas-Profiling

Also check out the output report on your live server: https://raw.githubusercontent.com/shelvi31/Pandas-Profiling/main/output.html

… and if you like this article, feel free to leave a few hearty claps :)

Written by Shelvi Garg