Data Exploration and Analysis Using Python

Simple ways to make your data talk

Raji Rai
Towards Data Science
6 min read · Jun 12, 2020

Data exploration is a key aspect of data analysis and model building. Without spending significant time understanding the data and its patterns, one cannot expect to build efficient predictive models. Data exploration, which comprises data cleaning and preprocessing, takes up a major chunk of the time in a data science project.

In this article, I will explain the various steps involved in data exploration through simple explanations and Python code snippets. The key steps involved in data exploration are:

> Load data
> Identify variables
> Variable analysis
> Handling missing values
> Handling outliers
> Feature engineering

Load data and Identify variables:

Data sources can vary from databases to websites. Data in the form in which it is sourced is known as raw data. Raw data cannot be used directly for model building, as it is usually inconsistent and not suitable for prediction. It has to be treated for anomalies and missing values. Variables can be of different types, such as character, numeric, categorical, and continuous.

Figure: Variable Type
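
As a minimal sketch of loading data and checking variable types, the snippet below uses pandas. The column names and values are made up for illustration, and the data is embedded as a string so the example runs without an external file. In a real project the source might be a CSV file, a database, or a website instead.

```python
import io

import pandas as pd

# Hypothetical raw data (not from the article), with two deliberately missing entries.
RAW_CSV = """age,income,city,purchased
34,52000.0,Pune,yes
45,,Delhi,no
29,61000.5,Pune,yes
51,48000.0,,no
"""

# Load the raw data into a DataFrame.
# With a real file this would typically be pd.read_csv("<path to your file>").
df = pd.read_csv(io.StringIO(RAW_CSV))

# Inspect the type pandas inferred for each variable.
print(df.dtypes)

# Count missing values per column; raw data usually needs treatment here.
print(df.isna().sum())

# Split the columns into numeric and non-numeric (character/categorical) variables.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```

Inferred types are only a starting point. A numeric column can still be categorical in meaning (for example, a numeric code for a region), so it is worth checking each variable against what it actually represents.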

Identifying the predictor and target variables is also a key step in model building. The target is the dependent variable, and the predictors are the independent variables on which the prediction is based.
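
The split into target and predictors can be sketched as follows. The data set and the choice of `purchased` as the target are assumptions made only for this example. Which column is the target depends on the question being asked.

```python
import pandas as pd

# Hypothetical data set (not from the article).
df = pd.DataFrame(
    {
        "age": [34, 45, 29, 51],
        "income": [52000.0, 39000.0, 61000.5, 48000.0],
        "city": ["Pune", "Delhi", "Pune", "Mumbai"],
        "purchased": ["yes", "no", "yes", "no"],
    }
)

# Assumed target (dependent variable) for this illustration.
target_col = "purchased"

# y holds the target; X holds the predictors (independent variables).
y = df[target_col]
X = df.drop(columns=[target_col])

print("Predictors:", list(X.columns))
print("Target:", y.name)
```

Keeping the target in its own variable from the start makes it harder to accidentally include it among the predictors later on.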
