Exploratory Data Analysis & Data Visualization in Less Than 10 Minutes

Your secret sauce for exploratory data analysis (Using Pandas Libraries).

Gaurav Rajgor
4 min read · Jul 31, 2022
by KDnuggets.

Exploratory data analysis & data visualization in less than 10 minutes, with low code or no code?

Yes, you heard it right!

Exploratory data analysis (EDA) is one of the most important parts of the data analytics journey. It is often said to take around 70%-80% of the time in a data science project cycle, and it focuses on data quality, description, shape, patterns, and relationships, and on visualizing the data for better understanding.

In John Tukey’s words,

“Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone – as the first step.”

EDA can feel time-consuming to some and exciting to others. Either way, the final aim is to analyze and visualize the data in order to discover unique patterns and trends within it.

Wait a minute… but how are we going to do 80% of such a time-consuming task in less than 10 minutes?

Let me spill the beans.

Prerequisites:

Install Python and Jupyter Notebook (I prefer Google Colab, as it is simple and easy to use).

Let’s import the necessary packages/dependencies.
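The original import cell is not shown here, so the following is only a minimal sketch of what the setup might look like. It assumes the profiling library is installed under one of its two package names: pandas-profiling (the name used in this article) or ydata-profiling (the name the project adopted later). The fallback to None is my own addition so the cell does not crash when neither is installed.

```python
# Install first if needed (in Colab or Jupyter, prefix the command with "!"):
#   pip install ydata-profiling
import pandas as pd

try:
    # Newer releases of the library are published as ydata-profiling.
    from ydata_profiling import ProfileReport
except ImportError:
    try:
        # Older releases use the pandas-profiling name from this article.
        from pandas_profiling import ProfileReport
    except ImportError:
        # Assumption for this sketch: fall back to None instead of failing.
        ProfileReport = None
        print("Profiling library not found; install ydata-profiling first.")
```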

Now here comes the Secret Sauce

One-line magical code that ultimately gives you an entire EDA report.

It provides a comprehensive data report within a single Jupyter Notebook cell.
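The one-liner itself did not survive in this copy of the article, so here is a hedged sketch of what it typically looks like. The tiny DataFrame, its column names, and the report title are made up for illustration; in practice you would load your own data, for example with pd.read_csv.

```python
import pandas as pd

# Made-up sample data so the sketch is self-contained.
df = pd.DataFrame({
    "age": [22, 38, 26, 35],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "embarked": ["S", "C", "S", "S"],
})

try:
    from ydata_profiling import ProfileReport  # newer package name
except ImportError:
    try:
        from pandas_profiling import ProfileReport  # older package name
    except ImportError:
        ProfileReport = None  # neither package is installed

profile = None
if ProfileReport is not None:
    # The "one line": build the report object for the whole DataFrame.
    profile = ProfileReport(df, title="EDA Report")
    # To view it, call profile.to_notebook_iframe() inside a notebook,
    # or profile.to_file("eda_report.html") to save a standalone page.
```

Building the report object is cheap; the heavy computation happens when the report is rendered or written to a file, so expect that step to take longer on large datasets.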

As a result, it computes the following statistics for a given dataset:

1. Essentials: type, unique values, missing values.
2. Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range.
3. Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness.
4. Most frequent values.
5. Histogram.
6. Correlations: highlights highly correlated variables, with Spearman and Pearson matrices.
7. Missing Values.
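To make the list above concrete, here is a rough sketch of how a few of these statistics could be computed by hand with plain pandas. It is not how the profiling library works internally; the sample data and column names are made up for illustration.

```python
import pandas as pd

# Made-up sample data with a couple of gaps.
df = pd.DataFrame({
    "age": [22, 38, 26, 35, None, 54, 2, 27],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 21.07, 11.13],
    "embarked": ["S", "C", "S", "S", "S", "S", "S", None],
})
age = df["age"]

# 1. Essentials: type, unique values, missing values per column.
essentials = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "unique": df.nunique(),
    "missing": df.isna().sum(),
})

# 2. Quantile statistics for one numeric column (NaN is skipped).
q1, median, q3 = age.quantile(0.25), age.median(), age.quantile(0.75)
iqr = q3 - q1
value_range = age.max() - age.min()

# 3. Descriptive statistics.
descriptive = {
    "mean": age.mean(),
    "std": age.std(),
    "sum": age.sum(),
    "median_abs_dev": (age - median).abs().median(),
    "coef_of_variation": age.std() / age.mean(),
    "kurtosis": age.kurt(),
    "skewness": age.skew(),
}

# 4. Most frequent values of a categorical column.
most_frequent = df["embarked"].value_counts()

# 5. Histogram as bin counts (no plotting library needed).
hist_counts = pd.cut(age, bins=4).value_counts(sort=False)

print(essentials)
print(q1, median, q3, iqr, value_range)
```

The profiling report gathers all of this, for every column, in one place, which is where the time saving comes from.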

Overview:

And for each variable:

Clicking "Toggle details" shows the statistics, histogram, common values, and more for that variable.

Customize your plot:

Correlation:

To understand how the attributes are connected to one another, use a correlation matrix.
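As a rough illustration of what the correlation tab summarizes, the Pearson and Spearman matrices can also be computed directly in pandas. The columns below are made up: y grows with x but not linearly, and z falls linearly as x rises.

```python
import pandas as pd

# Made-up data: y = x cubed (monotonic, non-linear), z = 10 - 2x (linear).
df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
df["y"] = df["x"] ** 3
df["z"] = 10 - 2 * df["x"]

# Keep numeric columns only, then compute both matrices.
numeric = df.select_dtypes("number")
pearson = numeric.corr(method="pearson")    # linear relationships
spearman = numeric.corr(method="spearman")  # rank-based (monotonic) relationships

print(pearson.round(3))
print(spearman.round(3))
```

Here Spearman reports a perfect relationship between x and y because the ranks agree exactly, while Pearson comes out lower because the relationship is not a straight line. Comparing the two matrices is a quick way to spot non-linear but monotonic relationships.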

Interactions:

Handling missing values:

Missing values are cells for which no data was recorded in a particular observation. Analyzing them is important, because ignoring them may lead to weak or biased analysis.
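The same kind of missing-value overview can be sketched by hand in pandas, which is handy when you want the numbers rather than the plot. The data and column names below are made up for illustration.

```python
import pandas as pd

# Made-up sample data with gaps in two columns.
df = pd.DataFrame({
    "age": [22, None, 26, 35, None],
    "cabin": [None, "C85", None, "C123", None],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05],
})

# Count and percentage of missing cells per column.
missing_count = df.isna().sum()
missing_pct = (df.isna().mean() * 100).round(1)
summary = pd.DataFrame({"missing": missing_count, "percent": missing_pct})

# Rows that have at least one missing cell.
rows_with_gaps = df[df.isna().any(axis=1)]

print(summary)
print(len(rows_with_gaps), "rows contain at least one missing value")
```

What to do next (drop, impute, or keep the gaps) depends on the dataset and is a decision the report cannot make for you.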

And that’s it……

Conclusion

As you can see, pandas-profiling is an excellent tool for accelerating exploratory data analysis (EDA). With only one line of Python code (your secret sauce), we can extract deep insights from data, boosting our productivity as data scientists and analysts. However, this does not mean that your EDA is finished. Sometimes we still need to complete parts of the EDA manually to better understand the data.

References

  1. https://docs.dataprep.ai/index.html
  2. Titanic Exploratory Data Analysis | Kaggle
  3. https://www.youtube.com/channel/UC7OpZsQwWcmuD0SUaOjGBMA
  4. https://www.youtube.com/watch?v=5iWoOMgo5I0&t=5s

There’s always room for improvement. I welcome feedback and discussions.

Thank you for reading. Please post comments if you have any suggestions.
