Basics to know before you even start exploratory data analysis (EDA)
Data enthusiasts love EDA. People who have worked through a lot of data usually build their own pathways or templates, which save them time and lead them to conclusions faster.
But for data aspirants who are just starting out, EDA can be exhausting if you do not keep reiterating your questions in your mind so that you do not get lost in the data.
In this article, I am going to list a few things that can help guide you through EDA.
Broadly, EDA has the following objectives:
- Maximise insights
- Uncover underlying structure
- Extract important variables
- Detect anomalies
- Test underlying assumptions
Your objective can be different from the ones I have listed, but you need to have one before you start.
Univariate Analysis
This means looking at one variable at a time. In this analysis, a five-point summary (minimum, first quartile, median, third quartile and maximum) is typically calculated.
Measures of central tendency: mean, median and mode
Measures of dispersion: standard deviation, variance
Measures of shape: skewness (asymmetry: right-skewed or left-skewed) and kurtosis (tailedness)
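The measures above can be computed in a few lines. What follows is a minimal sketch using only Python's standard `statistics` module, so that the definitions stay visible; the sample values are made up for illustration. Skewness and kurtosis are computed here as the standardised third and fourth moments (population versions), which is one common convention among several.

```python
import statistics

# Hypothetical sample: ten values with one large observation on the right
values = [2, 3, 3, 4, 5, 6, 7, 9, 14, 30]

# Central tendency
mean = statistics.mean(values)
median = statistics.median(values)
modes = statistics.multimode(values)  # a list, because a sample can have several modes

# Dispersion (sample versions, dividing by n - 1)
variance = statistics.variance(values)
std_dev = statistics.stdev(values)

# Shape: standardised third and fourth moments (population versions)
n = len(values)
pop_std = statistics.pstdev(values)
skewness = sum((x - mean) ** 3 for x in values) / (n * pop_std ** 3)
excess_kurtosis = sum((x - mean) ** 4 for x in values) / (n * pop_std ** 4) - 3

print("mean:", mean, "median:", median, "modes:", modes)
print("variance:", round(variance, 2), "std dev:", round(std_dev, 2))
print("skewness:", round(skewness, 2), "excess kurtosis:", round(excess_kurtosis, 2))
```

A positive skewness means the long tail is on the right (right-skewed), which also shows up as the mean sitting above the median. Kurtosis is a separate quantity: it describes how heavy the tails are, not which side they fall on.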
Bivariate Analysis
This means looking at the relationship between two variables. The thing to take care of is that, when comparing groups, we should always look at means or proportions rather than absolute row counts, because groups of different sizes are not directly comparable.
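To see why proportions matter more than raw counts, here is a small sketch assuming pandas. The DataFrame, its `plan` and `churned` columns, and the numbers in it are all hypothetical.

```python
import pandas as pd

# Hypothetical data: 8 customers on the basic plan, 2 on the premium plan
df = pd.DataFrame({
    "plan": ["basic"] * 8 + ["premium"] * 2,
    "churned": [1, 1, 0, 0, 0, 0, 0, 0, 1, 0],  # 1 = customer left
})

# For each plan: absolute number of churners, group size, and churn proportion
summary = df.groupby("plan")["churned"].agg(
    churned_count="sum",
    customers="count",
    churn_rate="mean",  # the mean of a 0/1 flag is the proportion of 1s
)
print(summary)
```

By count, the basic plan looks worse because it has more churners. By proportion, the premium plan has the higher churn rate, simply because the two groups differ so much in size.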
Types of variables:
Continuous variable: A continuous variable is a specific kind of quantitative variable used in statistics to describe data that is measurable in some way. If your data deals with measuring height, weight, or time, then you have a continuous variable.
Categorical variables: Categorical variables contain a finite number of categories or distinct groups. Categorical data might not have a logical order. For example, categorical predictors include gender, material type, and payment method.
Univariate Visualisation
Now let's see some code in action for step-by-step EDA:
The following code snippet will give us the five-point summary for all the continuous variables:
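A minimal sketch of such a summary, assuming pandas and a small made-up DataFrame (the column names `age`, `income` and `city` are hypothetical). `describe()` also reports count, mean and standard deviation, so the five-point rows are selected explicitly.

```python
import pandas as pd

# Hypothetical dataset with two continuous columns and one categorical column
df = pd.DataFrame({
    "age": [22, 25, 31, 38, 45, 52, 60],
    "income": [28000, 32000, 41000, 39000, 58000, 61000, 47000],
    "city": ["A", "B", "A", "C", "B", "A", "C"],
})

# Keep only the numeric columns, then pick the five-point rows from describe()
numeric = df.select_dtypes(include="number")
five_point = numeric.describe().loc[["min", "25%", "50%", "75%", "max"]]
print(five_point)
```

Calling `df.describe()` on its own gives the same rows plus count, mean and standard deviation, which is often what you want for a first look.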
Visualising mean, median and mode
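One simple way to do this is to draw a histogram and mark the three values with vertical lines. The sketch below assumes pandas and matplotlib; the sample values are made up, and other plotting libraries would work just as well.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical right-skewed sample
values = pd.Series([2, 3, 3, 4, 5, 6, 7, 9, 14, 30], name="value")

mean = values.mean()
median = values.median()
mode = values.mode().iloc[0]  # mode() can return several values; take the first

fig, ax = plt.subplots()
ax.hist(values, bins=10, color="lightgray", edgecolor="black")

# One vertical line per measure of central tendency
ax.axvline(mean, color="red", linestyle="--", label=f"mean = {mean:.1f}")
ax.axvline(median, color="green", linestyle="-", label=f"median = {median:.1f}")
ax.axvline(mode, color="blue", linestyle=":", label=f"mode = {mode}")

ax.set_xlabel("value")
ax.set_ylabel("frequency")
ax.legend()
```

In a script, finish with `plt.show()` or `fig.savefig(...)` to see the figure. When the mean line sits to the right of the median line, as it does here, the distribution is right-skewed.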
Bivariate Visualisation
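Which plot to use depends on the types of the two variables. As a rough sketch, assuming pandas and matplotlib and a made-up DataFrame (the columns `plan`, `tenure_months` and `monthly_spend` are hypothetical): a scatter plot for continuous versus continuous, and a bar chart of group means for categorical versus continuous.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one categorical and two continuous variables
df = pd.DataFrame({
    "plan": ["basic"] * 4 + ["premium"] * 4,
    "tenure_months": [1, 3, 6, 4, 8, 12, 18, 10],
    "monthly_spend": [20, 22, 25, 21, 40, 45, 50, 45],
})

fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Continuous vs continuous: scatter plot
ax_scatter.scatter(df["tenure_months"], df["monthly_spend"])
ax_scatter.set_xlabel("tenure_months")
ax_scatter.set_ylabel("monthly_spend")

# Categorical vs continuous: compare group means, not row counts
group_means = df.groupby("plan")["monthly_spend"].mean()
ax_bar.bar(list(group_means.index), group_means.values)
ax_bar.set_xlabel("plan")
ax_bar.set_ylabel("mean monthly_spend")

fig.tight_layout()
```

As before, call `plt.show()` or `fig.savefig(...)` to display the result. The bar chart plots the mean per group, in line with the earlier advice to compare means or proportions across groups.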
Thanks!