What is Exploratory Data Analysis?

Brijesh Singh
Published in Nucleusbox
2 min read · Feb 3, 2022

One day a client said to me: look, we have a set of users and they have some data, but they are not sure what questions to ask of it.

Now you have to analyze the data, but I am not going to tell you what to analyze.

This is not an uncommon request. We often hear, “Here is some data, do some magic.” Is there any science to this, or is it just that some people are good with data and can find interesting things in it?

At least this process has a name: Exploratory Data Analysis, also called EDA.

EDA (Getting a better understanding of data)

Exploratory data analysis is the first and foremost step in analyzing any kind of data. EDA is an approach that seeks to uncover the most important, and often hidden, patterns in a data set.
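A first look at a data set usually starts with a per-column summary. Here is a minimal sketch using only the Python standard library. The records, the column names (age, city, spend), and the summarize helper are all made up for illustration; they are not from any real project.

```python
import statistics
from collections import Counter

# Hypothetical sample records; None marks a missing value.
records = [
    {"age": 34, "city": "Pune", "spend": 120.0},
    {"age": 45, "city": "Delhi", "spend": 80.5},
    {"age": None, "city": "Pune", "spend": 200.0},
    {"age": 29, "city": "Mumbai", "spend": None},
    {"age": 52, "city": "Pune", "spend": 95.0},
]


def summarize(rows):
    """Return a per-column summary: missing count plus basic statistics."""
    summary = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None]
        info = {"missing": len(values) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            # Numeric column: centre and range.
            info.update(
                mean=statistics.mean(present),
                median=statistics.median(present),
                min=min(present),
                max=max(present),
            )
        else:
            # Categorical column: most frequent values.
            info["top_values"] = Counter(present).most_common(3)
        summary[col] = info
    return summary


summary = summarize(records)
for col, info in summary.items():
    print(col, info)
```

Even a summary this small raises questions worth chasing: why is an age missing, and why does one city dominate the records?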

While exploring the data we form hypotheses, which we can then check using hypothesis testing techniques. Statisticians call this taking a bird’s-eye view of the data and trying to make some sense of it.
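To make the idea of checking a hypothesis concrete, here is a small sketch of a permutation test, which is just one of many hypothesis testing techniques and is not necessarily the one you would pick in practice. The hypothesis ("weekend spend differs from weekday spend"), the numbers, and the permutation_test function are all assumptions made up for this example.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, estimated_p_value).
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if the groups do not differ, any split is as likely.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical spend per visit, made up for illustration.
weekday_spend = [20, 22, 19, 24, 21, 23, 20, 22]
weekend_spend = [30, 28, 33, 31, 29, 32, 30, 34]

observed, p_value = permutation_test(weekday_spend, weekend_spend)
print(f"difference in means: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value suggests the difference seen during exploration is unlikely to be a fluke of how the rows happened to fall into the two groups.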

To solve business problems we need clean historical data. The better the data, the more insight you can get. Better data means a clean, normalized, and standardized data set. Data comes from many different sources: relational databases such as Oracle, MySQL, and PostgreSQL, as well as the web, IoT devices, marketing, sales, and the list goes on and on.
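One common reading of "normalized" and "standardized" for numeric columns is min-max scaling and z-score scaling. The sketch below assumes that reading; the function names and the sample incomes are hypothetical.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_standardize(values):
    """Shift and scale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical monthly incomes, made up for illustration.
incomes = [10, 20, 30]
print(min_max_normalize(incomes))
print(z_score_standardize(incomes))
```

Both transforms put columns measured in different units on a comparable scale, which makes patterns across columns easier to compare.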

There are two main sources of data: private and public.

  • Private Data: Private data is more sensitive in nature and is not available in the public domain. Banking, retail, telecom, and media are some of the sectors that rely on such data to make decisions. Organizations leverage data analytics to become customer-centric.
  • Public Data: Government and public agencies have collected a large amount of data for research purposes. Such data does not require any special permission to access. Public data is available on the internet and various other platforms. Anyone can use these datasets for analysis purposes.

Data Cleaning Approach

There are various types of data quality issues that show up in data. Data cleaning is often the most time-consuming job in the analytics process.

Data cleaning steps by www.nucleusbox.com
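As a rough illustration of what cleaning can look like in code, the sketch below handles three typical issues: inconsistent formatting, duplicate rows, and missing values. These three issues, the sample rows, and the clean function are assumptions chosen for the example; they are not a list of the steps in the figure above.

```python
import statistics

# Hypothetical raw rows as they might arrive from a CSV export.
raw_rows = [
    {"name": " Asha ", "city": "pune", "age": "34"},
    {"name": "Asha", "city": "Pune", "age": "34"},    # duplicate once tidied
    {"name": "Ravi", "city": "DELHI", "age": ""},      # missing age
    {"name": "Meera", "city": "Mumbai", "age": "41"},
]


def clean(rows):
    """Tidy formatting, drop duplicate rows, and fill missing ages."""
    # 1. Standardize formatting: trim whitespace, fix case, parse numbers.
    tidy = []
    for row in rows:
        age_text = row["age"].strip()
        tidy.append({
            "name": row["name"].strip(),
            "city": row["city"].strip().title(),
            "age": int(age_text) if age_text else None,
        })

    # 2. Drop exact duplicates, keeping the first occurrence.
    seen = set()
    unique = []
    for row in tidy:
        key = (row["name"], row["city"], row["age"])
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 3. Fill missing ages with the median of the known ages.
    known = [row["age"] for row in unique if row["age"] is not None]
    fill = statistics.median(known)
    for row in unique:
        if row["age"] is None:
            row["age"] = fill
    return unique


cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

Filling with the median is only one option; depending on the data, dropping the row or flagging it for review may be the better choice.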



Brijesh Singh
Nucleusbox

Working at @Informatica. Master in Machine Learning & Artificial Intelligence (AI) from @LJMU. Love to work on AI research and application. (1+2+3+…~ = -1/12)