Exploratory Data Analysis

Manish Kumar Thota
Published in Analytics Vidhya
Jun 11, 2020

Welcome to EDA

“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.”

John Tukey

The Fast Fourier Transform (FFT) algorithm and the box plot are two of John Tukey's exceptional contributions. He is also known as the father of EDA.

In this blog, I'll discuss the steps involved in exploratory data analysis (EDA) and its relevance to the industry.

I'll show you how to extract the maximum insight from your data with minimum effort, and how to turn that insight into useful business decisions.

EDA is an essential step in any kind of data analysis. It is all about uncovering interesting patterns in the data, and it includes the following steps:

  • Data sourcing
  • Data cleaning
  • Univariate analysis
  • Bivariate and multivariate analysis
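To make the four steps concrete, here is a minimal sketch using only the Python standard library. The tiny dataset and its column names (age, salary, response) are made up for illustration and are not from any real source; in practice you would load your own file and would probably reach for a library such as pandas.

```python
import csv
import io
import statistics

# Hypothetical data: in practice this would come from a file or a database.
RAW_CSV = """age,salary,response
34,52000,yes
45,,no
29,48000,no
51,91000,yes
38,60000,no
42,75000,yes
"""

# 1. Data sourcing: read the raw rows.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# 2. Data cleaning: drop rows with a missing salary and convert types.
clean = [
    {"age": int(r["age"]), "salary": float(r["salary"]), "response": r["response"]}
    for r in rows
    if r["salary"].strip()
]

# 3. Univariate analysis: summarize one variable at a time.
salaries = [r["salary"] for r in clean]
salary_mean = statistics.mean(salaries)
salary_median = statistics.median(salaries)

# 4. Bivariate analysis: compare salary across the two response groups.
median_by_response = {
    label: statistics.median(r["salary"] for r in clean if r["response"] == label)
    for label in ("yes", "no")
}

print(f"rows read: {len(rows)}, rows kept: {len(clean)}")
print(f"salary mean: {salary_mean}, salary median: {salary_median}")
print(f"median salary by response: {median_by_response}")
```

Dropping incomplete rows is only one possible cleaning choice. Depending on the data, filling in missing values may be a better fit.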

At this stage, you might be wondering: why do we need EDA?

To answer that, EDA helps you understand your data before you hand it to machine learning algorithms. Before you jump into machine learning or modeling, your first priority should be to go through these steps, which make your data more insightful and more useful for taking important decisions.

EDA is a critical step in any data science activity, and its findings can completely direct the ML or non-ML activities that we do later.

Bonus: Plotting box plots lets us see where the major chunk of the data lies and helps detect outliers and anomalies.
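As a rough sketch of how a box plot flags outliers, the snippet below applies the common 1.5 × IQR rule (Tukey's fences) by hand. The salary values and the helper name tukey_outliers are hypothetical, and plotting libraries may compute quartiles slightly differently.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    # Quartiles with linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical salaries (in thousands); 250 is deliberately extreme.
salaries = [40, 42, 45, 48, 50, 52, 55, 58, 60, 250]
lower, upper, outliers = tukey_outliers(salaries)
print(f"fences: [{lower}, {upper}], outliers: {outliers}")
```

Points outside the fences are the ones a box plot would typically draw as individual dots beyond the whiskers.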

Here the box plot gives a clear idea that more positive responses came from people with higher salaries, because 50% of the data with a 'yes' response lies in the higher salary region. This is despite the fact that people with positive and negative responses have almost the same median values.
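The same reading can be checked numerically. The sketch below uses made-up salary figures (not the data behind the plot described above) to show how two groups can share a median while the box for one group stretches much further into the higher salaries. The helper name box_summary is hypothetical.

```python
import statistics

# Hypothetical salaries (in thousands) grouped by response.
salary_by_response = {
    "yes": [30, 40, 50, 60, 70, 90, 110],
    "no": [30, 40, 45, 60, 62, 65, 70],
}


def box_summary(values):
    """Return the three quartiles that define the box of a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3}


summary = {label: box_summary(v) for label, v in salary_by_response.items()}
for label, stats in summary.items():
    print(label, stats)
```

In this toy data both groups have the same median, but the upper quartile of the 'yes' group is noticeably higher, which is the pattern the box plot makes visible at a glance.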

In the coming days, I'll be adding a series of posts on each step of EDA, explaining how each one helps you model your data, including with machine learning algorithms. Keep in touch so that we can have fun with data.

Keep coding!

Manish Kumar

Data Science Enthusiast
