In-Depth ETL in Machine Learning

Published in My Data Science Journey

Nov 6, 2021

Data extraction

We want to extract the data from the CSV file and make it usable for our experiments. To do so, we first create a small Python Data Manager class that takes care of parsing the CSV, then extracting and formatting any data relevant to our analysis.
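As a rough illustration of what such a class could look like, here is a minimal sketch using only the standard library. The class name `DataManager`, its `load` method, and the column names `age`, `income`, and `city` are hypothetical and not taken from the original project; the sketch also assumes the relevant columns are numeric.

```python
import csv
import io
from typing import Dict, List, Optional


class DataManager:
    """Hypothetical helper: parses a CSV source and keeps selected numeric columns."""

    def __init__(self, columns: List[str]):
        self.columns = columns  # columns relevant to the analysis (assumed numeric)
        self.rows: List[Dict[str, Optional[float]]] = []

    @staticmethod
    def _to_float(value):
        """Convert a raw cell to float; empty or malformed cells become None."""
        try:
            return float(value)
        except (TypeError, ValueError):
            return None

    def load(self, handle):
        """Read rows from an open text handle (a file object or io.StringIO)."""
        reader = csv.DictReader(handle)
        self.rows = [
            {name: self._to_float(record.get(name)) for name in self.columns}
            for record in reader
        ]
        return self.rows


# Usage with an in-memory CSV; a real run would pass open("data.csv", newline="").
sample = io.StringIO("age,income,city\n34,52000,Lisbon\n,48000,Porto\n")
manager = DataManager(columns=["age", "income"])
rows = manager.load(sample)
```

Missing cells are kept as `None` at this stage rather than dropped, so that the decision about what to do with them stays in the transformation phase.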

Data transformation

We’ll focus on two transformation phases:

Preprocessing transformations
Analytics transformations

The idea is that we absolutely need to preprocess the incoming raw data: eliminate duplicates and drop null and missing values. Furthermore, univariate analysis soon shows that many of the samples have out-of-range values for ratio variables, so we also need to detect and delete outliers.
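The preprocessing steps described above can be sketched as follows. This is only one possible approach: the article does not say how outliers are detected, so the sketch assumes Tukey's IQR rule (values outside 1.5 times the interquartile range are dropped). The function names and the sample rows are hypothetical.

```python
from statistics import quantiles


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_missing(rows):
    """Discard rows that contain at least one None value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def drop_outliers(rows, column, k=1.5):
    """Assumed rule: keep rows whose value lies within k * IQR of the quartiles."""
    values = [row[column] for row in rows]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[column] <= high]


def preprocess(rows, ratio_columns):
    """Deduplicate, drop missing values, then filter outliers column by column."""
    clean = drop_missing(drop_duplicates(rows))
    for column in ratio_columns:
        clean = drop_outliers(clean, column)
    return clean


# Made-up sample: one duplicate, one missing value, one extreme income.
raw = [
    {"age": 30.0, "income": 50000.0},
    {"age": 30.0, "income": 50000.0},
    {"age": None, "income": 42000.0},
    {"age": 32.0, "income": 51000.0},
    {"age": 35.0, "income": 49000.0},
    {"age": 31.0, "income": 52000.0},
    {"age": 33.0, "income": 50500.0},
    {"age": 34.0, "income": 900000.0},
]
clean = preprocess(raw, ratio_columns=["age", "income"])
```

Deleting outliers is the choice the text makes; with small samples a single extreme value can distort the quartiles themselves, so the threshold `k` is worth checking against the univariate plots.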

Once the data is usable, we’ll implement Factor Analysis to extract insightful features that best explain the variance and correlation in the data.
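To give an idea of what this step might look like, here is a small sketch using scikit-learn's `FactorAnalysis` on synthetic data. The dataset, the choice of two factors, and the standardization step are all assumptions made for illustration; they are not taken from the original analysis.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned data: 200 samples, 6 observed variables
# driven by 2 hidden factors plus noise (all values are made up).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.3 * rng.normal(size=(200, 6))

# Standardize so the loadings are not dominated by variables with large scales.
X_std = StandardScaler().fit_transform(X)

# n_components=2 is an assumption; in practice it is chosen from the data.
fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(X_std)  # per-sample factor scores, shape (200, 2)
loadings = fa.components_         # factor loadings, shape (2, 6)

# Communality: variance of each standardized variable attributed to the factors.
communalities = (loadings ** 2).sum(axis=0)
```

Variables with a high communality are well explained by the extracted factors, while `fa.noise_variance_` holds the part of each variable that the factors leave unexplained.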

Written by Andre Vianna