Exploratory Data Analysis in Seconds

Kelvin Jose
Published in The Startup
5 min read · Jun 4, 2020

Exploratory Data Analysis (EDA) is not a formal process with a strict set of rules. More than anything, EDA is a state of mind. During the initial phases of EDA you should feel free to investigate every idea that occurs to you. Some of these ideas will pan out, and some will be dead ends. As your exploration continues, you will home in on a few particularly productive areas that you’ll eventually write up and communicate to others.

EDA is an important part of any data analysis, even if the questions are handed to you on a platter, because you always need to investigate the quality of your data. Data cleaning is just one application of EDA: you ask whether your data meets your expectations. To do data cleaning, you’ll need to deploy all the tools of EDA: visualization, transformation, and modelling.

Today we’re going to look at a tool called pandas_profiling, which automates much of the routine EDA work for us. It generates profile reports from a pandas DataFrame. The pandas df.describe() function is great, but it is a little basic for serious exploratory data analysis: by default it only summarizes the numeric columns. Once imported, pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.
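To make the contrast concrete, here is a minimal sketch that assumes pandas is available and, for the report step, that pandas_profiling is already installed. The toy DataFrame, the build_report helper, and the report.html file name are made up for illustration and are not part of the library.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame({
    "age": [23, 35, 41, None, 29],
    "city": ["Kochi", "Delhi", "Kochi", "Mumbai", None],
})

# describe() summarizes only the numeric columns by default,
# so the text column "city" is left out of this summary.
summary = df.describe()
print(summary)


def build_report(frame, path="report.html"):
    """Write an HTML profile report for `frame` (illustrative helper).

    Assumes pandas_profiling is installed; importing it is what
    registers the profile_report() method on DataFrame.
    """
    import pandas_profiling  # noqa: F401

    report = frame.profile_report(title="Toy profile")
    report.to_file(path)
    return path
```

Calling build_report(df) should write an HTML file you can open in a browser. The import is kept inside the helper so that the describe() part of the sketch still runs on a machine without pandas_profiling.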

We can get started with pandas_profiling by installing it with the pip package manager:

pip install pandas-profiling
