3 Common Techniques for Data Transformation
How to Choose the Appropriate One for Your Data
Data transformation is the process of converting raw data into a format or structure that is better suited to model building and to data exploration in general. It is an essential step in feature engineering that makes insights easier to discover. This article covers three techniques for transforming numeric data: log transformation, clipping, and data scaling.
Why do we need data transformation?
- the algorithm is more likely to be biased when the data distribution is skewed
- transforming features onto the same scale allows the algorithm to compare the relative relationships between data points more reliably
When to apply data transformation
When implementing supervised algorithms, the training data and the test data need to be transformed in the same way. This is usually achieved by fitting the transformation on the training set and then applying the fitted transformation to the test set.
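To make the fit-on-train, apply-to-test idea concrete, here is a minimal sketch using only NumPy. The `SimpleStandardizer` class and the sample values are hypothetical and exist only for illustration; they are not taken from the dataset used later. The point is that the mean and standard deviation are learned from the training set alone and then reused unchanged on the test set.

```python
import numpy as np


class SimpleStandardizer:
    """Hypothetical helper: learns column means and stds from training data only."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        # Replace zero std (constant columns) with 1 to avoid division by zero
        self.std_ = np.where(std == 0, 1.0, std)
        return self

    def transform(self, X):
        # Reuse the statistics learned in fit(); nothing is recomputed here
        return (np.asarray(X, dtype=float) - self.mean_) / self.std_


# Made-up values for illustration (not from the Kaggle dataset)
X_train = np.array([[10.0], [20.0], [30.0]])
X_test = np.array([[20.0], [40.0]])

scaler = SimpleStandardizer().fit(X_train)   # fit on the training set only
X_train_scaled = scaler.transform(X_train)   # transform the training set
X_test_scaled = scaler.transform(X_test)     # apply the same fitted transform to the test set
```

Scikit-learn transformers follow the same `fit` / `transform` pattern, so in practice the hand-written class would likely be swapped for a library scaler. Fitting a second scaler on the test set would put the two sets on different scales and leak test-set information into the evaluation.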
Basic Feature Engineering and EDA
For this exercise, I am using the Marketing Analytics dataset from Kaggle.