TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


3 Common Techniques for Data Transformation

7 min read · Aug 1, 2021


Data Transformation Overview (image by author from www.visual-design.net)

Data transformation is the process of converting raw data into a format or structure that is more suitable for model building and for data discovery in general. It is an essential step in feature engineering that makes insights easier to uncover. This article covers three techniques for transforming numeric data: log transformation, clipping, and scaling.

Why is data transformation needed?

  • an algorithm is more likely to be biased when the data distribution is skewed
  • transforming features to the same scale lets the algorithm compare the relative relationships between data points more reliably
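To make the first point concrete, here is a minimal sketch of how a log transformation can reduce skew. The `spend` values are made up for illustration and the `skewness` helper is a hypothetical function written for this example; neither comes from the dataset used later in the article.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


# Made-up, right-skewed values (a few very large amounts dominate).
spend = [5, 8, 12, 15, 20, 35, 60, 120, 400, 1500]

# log1p computes log(1 + x), so it stays defined if a value is 0.
log_spend = [math.log1p(v) for v in spend]

print("skew before:", round(skewness(spend), 2))
print("skew after: ", round(skewness(log_spend), 2))
```

On data like this, the skewness after the transformation should be noticeably closer to zero, because the log compresses the long right tail.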

When to apply data transformation

When implementing supervised algorithms, the training and test data must be transformed in the same way. This is usually achieved by fitting the transformation on the training set only and then applying that fitted transformation to the test set.
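The fit-on-train, apply-to-test pattern can be sketched as follows. `StandardScalerSketch` is a hypothetical minimal class written for this illustration, and the income values are made up; libraries such as scikit-learn expose the same idea through `fit` and `transform` methods on their scalers.

```python
from statistics import mean, pstdev


class StandardScalerSketch:
    """Hypothetical minimal scaler: learns mean and std from training data only."""

    def fit(self, values):
        self.mean_ = mean(values)
        # Fall back to 1.0 so a constant feature does not cause division by zero.
        self.std_ = pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


# Made-up illustrative values, not taken from the article's dataset.
train_income = [30_000, 45_000, 52_000, 61_000, 150_000]
test_income = [40_000, 75_000]

# Learn the parameters from the training set only...
scaler = StandardScalerSketch().fit(train_income)

# ...then reuse those same parameters for both sets.
train_scaled = scaler.transform(train_income)
test_scaled = scaler.transform(test_income)
```

Because the test set is scaled with the training mean and standard deviation, no information from the test data leaks into the transformation, and both sets end up on the same scale.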

Basic Feature Engineering and EDA

For this exercise, I am using the Marketing Analytics dataset from Kaggle.


Published in TDS Archive



Written by Destin Gong

On my way to become a data storyteller | Website: www.visual-design.net
