Exploratory Data Analysis in Python

Raghavendra R
astringe
3 min read · Jan 22, 2021

An Exploratory Data Analysis, or EDA, is an exhaustive look at existing data from current and historical surveys conducted by a company.

In addition, the appropriate variables from your company’s customer database — such as information about rate plans, usage, account management, and others — are typically included in the analysis.

The intent of the EDA is to determine whether a predictive model is a viable analytical tool for a particular business problem, and if so, which type of modeling is most appropriate.

The deliverable is a low-risk, low-cost, comprehensive report of findings from the univariate analysis, along with recommendations on how the company should use additional modeling.

At the very least, the EDA may reveal aspects of your company’s performance that others may not have seen.

Why EDA

An EDA is a thorough examination meant to uncover the underlying structure of a data set and is important for a company because it exposes trends, patterns, and relationships that are not readily apparent.

You can’t draw reliable conclusions from a massive quantity of data by just glancing over it; instead, you have to look at it carefully and methodically through an analytical lens.

Getting a “feel” for this critical information can help you detect mistakes, debunk assumptions, and understand the relationships between different key variables. Such insights may eventually lead to the selection of an appropriate predictive model.
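As a small illustration of "detecting mistakes", here is a sketch of two quick data-quality checks, assuming a handful of hypothetical customer records (the field names and values are made up for the example). It counts missing values per column and flags outliers with Tukey's IQR fences, which is one common rule of thumb rather than the only option.

```python
import statistics

# Hypothetical customer records (field names and values are illustrative only).
# None marks a missing value.
records = [
    {"plan": "basic", "monthly_charge": 20.0, "minutes_used": 310},
    {"plan": "basic", "monthly_charge": 22.0, "minutes_used": None},
    {"plan": "premium", "monthly_charge": 55.0, "minutes_used": 820},
    {"plan": "premium", "monthly_charge": 58.0, "minutes_used": 790},
    {"plan": "basic", "monthly_charge": 21.0, "minutes_used": 295},
    {"plan": "basic", "monthly_charge": 950.0, "minutes_used": 300},  # likely a data-entry error
]

def missing_counts(rows):
    """Count missing (None) values in each column."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + int(value is None)
    return counts

def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

print(missing_counts(records))  # {'plan': 0, 'monthly_charge': 0, 'minutes_used': 1}
charges = [r["monthly_charge"] for r in records]
print(iqr_outliers(charges))  # [950.0]
```

With pandas, `DataFrame.isna().sum()` gives the same per-column missing-value count; a flagged value still needs a human decision, since an outlier can be a genuine extreme customer rather than an error.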

Why is EDA necessary for Machine Learning?

Sometimes even the things we see with our naked eyes are not the “naked” truth. It takes time, conviction, and certainty to get to the truth, and EDA (Exploratory Data Analysis) does this for Machine Learning enthusiasts. It is a way of visualizing, summarizing, and interpreting the information hidden in rows and columns. EDA is one of the crucial steps in data science: it lets us gain insights and statistical measures that are essential to business continuity, stakeholders, and data scientists. It also helps us define and refine the selection of the important feature variables that will be used in our model.

Once EDA is complete and insights are drawn, its features can be used for supervised and unsupervised machine learning modeling. EDA is carried out mainly through univariate visualization, bivariate visualization, multivariate visualization, and dimensionality reduction.

We usually form several hypotheses by looking at the data before we start modeling. This is good practice because it engages you more deeply with the EDA. EDA helps you confirm and validate the hypotheses you make, and from there you begin feature engineering and take off into machine learning modeling.
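As one way to check a hypothesis formed during EDA, here is a sketch of a two-sided permutation test on a difference in group means. The hypothesis ("churned customers made more support calls") and the numbers are hypothetical, and the permutation test is a swapped-in technique chosen because it needs no distributional assumptions and only the standard library.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference, approximate p-value), where the p-value is
    the share of random relabelings whose absolute mean difference is at
    least as large as the observed one."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = statistics.fmean(pooled[:size_a]) - statistics.fmean(pooled[size_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations

# Hypothetical hypothesis: churned customers made more support calls.
# Toy numbers for illustration only.
churned_calls = [5, 4, 6, 5, 3]
retained_calls = [0, 1, 1, 0, 2]
diff, p_value = permutation_test(churned_calls, retained_calls)
print(diff, p_value)
```

A small p-value says the gap is unlikely under random labeling; it does not establish causation, and testing many hypotheses suggested by the same data raises the chance of false positives.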
