Quickly Generate Data Reports With Python

An intro to Pandas Profiling.

Vinicius Porfirio Purgato
Analytics Vidhya
4 min read · Feb 16, 2021

Image by Pandas Profiling

Before we start

Do not forget to follow me on my GitHub and LinkedIn accounts. I love to write about Data Science and to share cool stuff with people on the internet.

Easing your EDA

How about having an easier way to start your Exploratory Data Analysis (EDA) and make data reports that give you great insights? Sounds nice, eh?

With Pandas Profiling, that is possible.

You might be asking yourself what Pandas Profiling is. No, it’s not a bunch of Chinese pandas computing data.

Photo by Damian Patkowski on Unsplash

Pandas Profiling is an open-source Python library that lets you do your EDA very quickly. It also generates an interactive HTML report that you can show to anyone. Imagine going to your boss, who doesn’t code, with an interactive description of the company’s data. Great for your branding, right?

These are some of the things you get in your report:

  • Type inference: detect the types of columns in a DataFrame.
  • Essentials: type, unique values, missing values.
  • Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range.
  • Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness.
  • Most frequent values.
  • Histogram.
  • Correlations: highlighting of highly correlated variables, plus Spearman, Pearson and Kendall matrices.
  • Missing values: matrix, count, heatmap and dendrogram of missing values.
  • Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.
  • File and image analysis: extract file sizes, creation dates and dimensions, and scan for truncated images or those containing EXIF information.

Given this, let’s get going.

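First of all, you need to install the package, for example with pip install pandas-profiling (newer releases of the project are published on PyPI as ydata-profiling).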

Now, let’s import both pandas and pandas_profiling.

We will be using the Titanic dataset to complete our analysis. Let’s load it:
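One way to do this, sketched under the assumption that the data comes as a CSV file. So that the snippet runs on its own, a few made-up rows with Titanic-style column names stand in for the real file (they are not real passenger records); with the actual dataset you would pass the path of your CSV file to read_csv instead.

```python
import io

import pandas as pd

# Made-up stand-in rows with Titanic-style columns (illustrative only).
sample_csv = """Survived,Pclass,Sex,Age,Fare
0,3,male,22.0,7.25
1,1,female,38.0,71.28
1,3,female,,7.92
0,1,male,54.0,51.86
"""

# read_csv accepts a file path or a file-like object.
# With a real file this would be something like pd.read_csv("titanic.csv"),
# where "titanic.csv" is a hypothetical file name.
df = pd.read_csv(io.StringIO(sample_csv))
```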

After you load it, you should always take a look at your dataset before building a report from it:
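A quick look can be as simple as the sketch below. It uses a small hand-made DataFrame with Titanic-style column names (the values are illustrative, not real records) so that it runs without the actual file.

```python
import pandas as pd

# Illustrative stand-in for the Titanic data.
df = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0, 1, 0],
        "Pclass": [3, 1, 3, 1, 2, 3],
        "Sex": ["male", "female", "female", "male", "female", "male"],
        "Age": [22.0, 38.0, None, 54.0, 27.0, 2.0],
        "Fare": [7.25, 71.28, 7.92, 51.86, 11.13, 21.08],
    }
)

print(df.head())    # first five rows
print(df.dtypes)    # column types as pandas inferred them

# Number of missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```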

Now you simply have to “tell” Pandas Profiling to make a report out of your dataset.

There you go. As simple as that. You can check the result here.

GIF by Pandas Profiling.

If you use a Jupyter Notebook, your report is embedded in it. However, you may want to use it in other places, and Pandas Profiling also allows you to do that. Just type this to save your report as an HTML file:

If you want the HTML source “code” (don’t kill me for calling it code), which is rarely needed but possible, just type:

It will return the whole HTML source code.

HTML returned

You can even save it as a JSON file:

Conclusion

Today you learned the basics (it doesn’t get much more complex than that) of Pandas Profiling, a simple yet powerful tool.

In your report you will have the following sections:

  • Overview.
  • Variables.
  • Interactions.
  • Correlations.
  • Missing Values.
  • Sample.

With four lines of code, you can have this beautiful report. If I were you, I would definitely add it to the tool list for my data analysis routine. It just makes your work much more dynamic.

It can even save you a couple of hours.

Not to mention that the report is beautiful, minimalist and interactive, which makes it easy to understand for anyone who takes a look at it.

By the way, you can also edit the report, but that’s something for another post. :)

Vinicius Porfirio Purgato
Analytics Vidhya

Computer Science student in love with teaching and learning Data Science. Python lover and R bully :)