Data Exploration and Analysis Using Python
Simple ways to make your data talk
Data exploration is a key aspect of data analysis and model building. Without spending significant time understanding the data and its patterns, one cannot expect to build efficient predictive models. Data exploration, which includes data cleaning and preprocessing, takes up a major chunk of the time in a data science project.
In this article, I will walk through the steps involved in data exploration using simple explanations and Python code snippets. The key steps are:
> Load data
> Identify variables
> Variable analysis
> Handling missing values
> Handling outliers
> Feature engineering
Load data and Identify variables:
Data sources can range from databases to websites. Data as it is collected from the source is known as raw data. Raw data usually cannot be used directly for model building, because it tends to be inconsistent and unsuitable for prediction. It first has to be treated for anomalies and missing values. Variables can be of different types, such as character, numeric, categorical, and continuous.
Identifying the predictor and target variables is also a key step in model building. The target is the dependent variable, and the predictors are the independent variables on which the prediction is based…
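As a minimal sketch of loading data and identifying variable types, the snippet below reads a small, made-up CSV with pandas and separates numeric columns from the rest. The column names and values are hypothetical and exist only for illustration. In a real project the data would come from a file or database, for example through `pd.read_csv` with a file path.

```python
import io

import pandas as pd

# Hypothetical raw data, inlined so the example is self-contained.
# Note the empty income value in the second row.
raw_csv = io.StringIO(
    "age,city,income,purchased\n"
    "34,Pune,52000,yes\n"
    "45,Delhi,,no\n"
    "29,Pune,61000,yes\n"
    "51,Mumbai,75000,no\n"
)

df = pd.read_csv(raw_csv)

# Inspect the type pandas inferred for each column.
print(df.dtypes)

# Split columns by broad type: numeric versus everything else.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Count missing values per column to see what needs treatment later.
missing_counts = df.isna().sum()
print(missing_counts)
```

Here `numeric_cols` holds `age` and `income`, `categorical_cols` holds `city` and `purchased`, and `missing_counts` shows one missing value in `income`.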
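One common way to separate the two in pandas is sketched below. The DataFrame and the choice of `purchased` as the target are assumptions made for this example, not something taken from a real dataset.

```python
import pandas as pd

# Hypothetical dataset; the column names are illustrative only.
df = pd.DataFrame(
    {
        "age": [34, 45, 29, 51],
        "income": [52000, 48000, 61000, 75000],
        "purchased": ["yes", "no", "yes", "no"],
    }
)

# Assumed target (dependent variable) for this example.
target_col = "purchased"

# y is the target; X holds the predictors (independent variables).
y = df[target_col]
X = df.drop(columns=[target_col])

print(X.columns.tolist())
print(y.name)
```

Keeping the target name in a single variable such as `target_col` makes it easy to switch targets without touching the rest of the code.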