Exploratory Data Analysis Made Easy: Using Pandas Profiling

@lee-rowe
Published in Nerd For Tech
3 min read · Jun 14, 2021


In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods. That is the textbook definition of what EDA involves. It can seem like a daunting task when you are first starting out in data science, but luckily a very intuitive tool has been designed to help. This open-source Python library is called Pandas Profiling, and in this short blog post I will discuss how to use it and how much help it provides.

Photo by Pascal Müller on Unsplash

This module is super easy to use, and it works on top of Python's pandas library. When you begin analyzing your data, you no longer need to start with df.describe(). The profile report command, df.profile_report() (available once pandas_profiling has been imported), covers everything describe() would have and also generates a full report on your data frame, displayed as an interactive, easy-to-use HTML report. The following statistics are computed for each column, depending on the column’s type and the values it contains:

  • Type inference: detect the types of columns in a data frame.
  • Essentials: type, unique values, missing values
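To make the points above concrete, here is a minimal sketch rather than the library's own implementation. The toy data frame and the helper names essentials and build_report are hypothetical and not from the original post. essentials reproduces the "Essentials" (type, unique values, missing values) by hand with plain pandas, and build_report assumes the profiling package is installed, either under its original name pandas_profiling or under its newer name ydata_profiling.

```python
import pandas as pd


def essentials(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column type, unique-value count, and missing-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),  # inferred column type
        "unique": df.nunique(),          # distinct non-null values
        "missing": df.isna().sum(),      # null / NaN entries
    })


def build_report(df: pd.DataFrame, path: str = "report.html") -> None:
    """Write the interactive HTML report (needs the profiling package)."""
    try:
        from ydata_profiling import ProfileReport   # newer package name
    except ImportError:
        from pandas_profiling import ProfileReport  # name used in this post
    ProfileReport(df, title="Profiling Report").to_file(path)


# Hypothetical toy data, for illustration only.
df = pd.DataFrame({
    "age": [34, 28, None, 45],
    "city": ["Oslo", "Lima", "Oslo", None],
})

summary = essentials(df)
print(summary)
```

Calling build_report(df) would then write report.html, which you can open in a browser. The hand-rolled summary is only there to show what a few of the report's numbers mean; the report itself covers far more.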
