Exploring Popular Normalization Techniques: CRISP-DM Data Preparation

Data Mastery Series — Episode 8: Normalization Techniques

Donato_TH
Donato Story
3 min read · Mar 12, 2023


If you are interested in articles related to my experience, please feel free to contact me: linkedin.com/in/nattapong-thanngam

CRISP-DM framework (Image by Author)

Normalization is a fundamental data preprocessing technique in data science that transforms features onto a common scale or range. It is widely used to improve the accuracy and performance of machine learning algorithms and to make data easier to interpret. Normalization makes features more directly comparable, and some methods also reduce the influence of outliers or extreme values. Each method has its strengths and weaknesses, and choosing the appropriate one depends on the nature of the data and the task at hand.

Data Set:

  • The dataset comprises 10,000 randomly generated samples drawn from a Gamma distribution with a shape parameter of 3 and a scale parameter of 3.
Dataset distribution (Image by Author)
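A sample like the one described above can be reconstructed with NumPy (the seed and variable names here are illustrative, not from the original article):

```python
import numpy as np

# Illustrative reconstruction of the dataset described above:
# 10,000 samples from a Gamma distribution with shape=3, scale=3.
rng = np.random.default_rng(42)  # seed chosen only for reproducibility
data = rng.gamma(shape=3.0, scale=3.0, size=(10_000, 1))

# For Gamma(k, theta): mean = k * theta = 9, variance = k * theta**2 = 27,
# so the sample statistics should land close to those values.
print(data.mean(), data.var())
```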

7 Popular Normalization Techniques

  1. StandardScaler Method
  • The StandardScaler method, also known as Z-score normalization or Standardization, scales the data to have a mean of 0 and a standard deviation of 1
StandardScaler Method (Image by Author)
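A minimal scikit-learn sketch of this method, applied to a Gamma sample like the one above (seed and names are assumptions for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.gamma(3.0, 3.0, size=(10_000, 1))  # right-skewed sample, as above

# z = (x - mean) / std: the result has mean ~0 and standard deviation ~1.
scaled = StandardScaler().fit_transform(X)
# Note: the shape (skewness) of the distribution is unchanged;
# standardization only shifts and rescales.
```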

2. Yeo-Johnson Transformation

  • The Yeo-Johnson transformation extends the Box-Cox transformation, which requires strictly positive data, so that it can also handle zero and negative values. It applies a power transformation whose parameter is chosen by maximum likelihood estimation, making non-normal data more nearly normal.
Yeo-Johnson Transformation (Image by Author)
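In scikit-learn this is available as `PowerTransformer`; a short sketch on the same kind of skewed sample (seed and names are illustrative):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(42)
X = rng.gamma(3.0, 3.0, size=(10_000, 1))

# The power parameter lambda is fit by maximum likelihood;
# standardize=True (the default) also rescales the output
# to zero mean and unit variance.
pt = PowerTransformer(method="yeo-johnson", standardize=True)
transformed = pt.fit_transform(X)
print(pt.lambdas_)  # the fitted power parameter per feature
```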

3. Min-Max Scaler

  • The Min-Max Scaler, also known as Linear normalization or Scaling to a range, is a method for scaling data to a fixed range of values, typically between 0 and 1.
Min-Max Scaler (Image by Author)
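A quick sketch with scikit-learn's `MinMaxScaler` (sample and names assumed as before):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
X = rng.gamma(3.0, 3.0, size=(10_000, 1))

# (x - min) / (max - min): minimum maps to 0, maximum maps to 1.
scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
# Caveat: a single extreme value stretches the range and compresses
# everything else, so this scaler is sensitive to outliers.
```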

4. Robust Scaler

  • The Robust Scaler method centers the data on the median and scales it by the interquartile range (IQR), i.e. (x − median) / IQR. Because the median and IQR are insensitive to extreme values, this method is robust to outliers.
Robust Scaler (Image by Author)
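The scikit-learn version, `RobustScaler`, uses the 25th to 75th percentile range by default (sample and names are illustrative):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(42)
X = rng.gamma(3.0, 3.0, size=(10_000, 1))

# (x - median) / IQR: after scaling, the median is ~0
# and the interquartile range is ~1.
scaled = RobustScaler().fit_transform(X)
```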

5. Max Absolute Scaler

  • The Max Absolute Scaler method divides each feature by its maximum absolute value, so the scaled values lie in the range [−1, 1]. It does not shift or center the data, which preserves the sign of each value and keeps sparse data sparse.
Max Absolute Scaler (Image by Author)
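A minimal `MaxAbsScaler` sketch (sample and names assumed, as in the earlier examples):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

rng = np.random.default_rng(42)
X = rng.gamma(3.0, 3.0, size=(10_000, 1))  # all values positive here

# x / max(|x|): each feature is divided by its maximum absolute value,
# so results lie in [-1, 1] (here in (0, 1], since the data is positive).
scaled = MaxAbsScaler().fit_transform(X)
```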

6. Log Transformation

  • The Log Transformation applies a logarithmic function to the data to compress the range of values and reduce right skew. It requires positive values; log(1 + x) is a common variant when the data contains zeros.
Log Transformation (Image by Author)
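With NumPy this is a one-liner; `np.log1p` is shown here as the zero-safe variant (sample and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.gamma(3.0, 3.0, size=(10_000, 1))

# log(1 + x): defined at x = 0, compresses large values.
logged = np.log1p(X)
# The transform is invertible: np.expm1 recovers the original values.
```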

7. Root Transformation

  • Root transformation takes the nth root of each value in a dataset, most commonly the square root (n = 2) or cube root (n = 3). Like the log transform, it compresses large values and reduces right skew, but less aggressively.
Root Transformation (Image by Author)
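Both common roots in NumPy, on the same kind of sample (names and seed assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.gamma(3.0, 3.0, size=(10_000, 1))

sqrt_t = np.sqrt(X)  # square root (n = 2); requires non-negative values
cbrt_t = np.cbrt(X)  # cube root (n = 3); also defined for negative values
# Both compress large values and so reduce right skew,
# the square root more strongly than the cube root... wait, other way:
```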

Summary:

Summary of 7 Popular Normalization Techniques (Image by Author)

Normalization methods play an important role in data preparation and feature engineering. Selecting the right method depends on the characteristics of the data and the goals of the analysis. Understanding the pros, cons, and limitations of each method can help in making an informed decision.

Please feel free to contact me, I am willing to share and exchange on topics related to Data Science and Supply Chain.
Facebook:
facebook.com/nattapong.thanngam
Linkedin:
linkedin.com/in/nattapong-thanngam


Data Science Team Lead at Data Cafe, Project Manager (PMP #3563199), Black Belt-Lean Six Sigma certificate