Exploring Popular Normalization Techniques: CRISP-DM Data Preparation
Data Mastery Series — Episode 8: Normalization Techniques
If you are interested in more articles based on my experience, feel free to connect with me: linkedin.com/in/nattapong-thanngam
Normalization is a fundamental data preprocessing technique in data science that transforms data onto a common scale or range. It is widely used to improve the accuracy and performance of machine learning algorithms and to make data easier to interpret. Normalization makes features more comparable, and some methods also reduce the impact of outliers or extreme values, although sensitivity to outliers varies by method. Each normalization method has its strengths and weaknesses, and choosing the appropriate one depends on the nature of the data and the task at hand.
Data Set:
- The dataset comprises 10,000 randomly generated samples (Gamma distribution with a shape parameter of 3 and a scale parameter of 3)
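A dataset like the one described above can be generated with NumPy. This is a minimal sketch; the random seed and variable names are my own choices, not from the original article.

```python
import numpy as np

# Reproducible Gamma-distributed sample: shape = 3, scale = 3, n = 10,000
# (seed 42 is an arbitrary choice for reproducibility)
rng = np.random.default_rng(42)
data = rng.gamma(shape=3.0, scale=3.0, size=10_000).reshape(-1, 1)

# For a Gamma distribution, mean = shape * scale = 9,
# so the sample mean should land close to 9
print(data.mean())
```

The `.reshape(-1, 1)` turns the sample into a single-column matrix, the shape scikit-learn scalers expect.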
7 Popular Normalization Techniques
1. StandardScaler Method
- The StandardScaler method, also known as Z-score normalization or Standardization, scales the data to have a mean of 0 and a standard deviation of 1
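Using scikit-learn, standardization is a one-liner. The Gamma sample below stands in for the article's dataset (the seed is an arbitrary choice of mine):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
data = rng.gamma(shape=3.0, scale=3.0, size=10_000).reshape(-1, 1)

# Z-score normalization: z = (x - mean) / std
scaled = StandardScaler().fit_transform(data)

# After scaling: mean ≈ 0, standard deviation ≈ 1
print(scaled.mean(), scaled.std())
```

Note that standardizing shifts and rescales the data but does not change its shape: a skewed Gamma distribution stays skewed.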
2. Yeo-Johnson Transformation
- The Yeo-Johnson transformation is a generalization of the Box-Cox transformation that can handle zero and negative values as well as positive ones. It applies a power transformation whose exponent (lambda) is optimized by maximum likelihood estimation, making non-normal data more closely approximate a normal distribution
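In scikit-learn this is available through `PowerTransformer`. A minimal sketch on a Gamma sample (seed and variable names are mine):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(42)
data = rng.gamma(shape=3.0, scale=3.0, size=10_000).reshape(-1, 1)

# method="yeo-johnson" also accepts zero/negative inputs, unlike Box-Cox;
# standardize=True additionally rescales the result to mean 0, std 1
pt = PowerTransformer(method="yeo-johnson", standardize=True)
transformed = pt.fit_transform(data)

# The fitted lambda per feature, chosen by maximum likelihood
print(pt.lambdas_)
```

Because the transformation pulls in the long right tail of the Gamma distribution, the skewness of the transformed data should be much closer to zero than the original's.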
3. Min-Max Scaler
- The Min-Max Scaler, also known as Linear normalization or Scaling to a range, is a method for scaling data to a fixed range of values, typically between 0 and 1.
4. Robust Scaler
- The Robust Scaler method, also known as median-IQR normalization, centers the data on the median and scales it by the interquartile range (IQR). Because the median and IQR are insensitive to extreme values, this method is far more robust to outliers than the StandardScaler.
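With scikit-learn's `RobustScaler`, the median maps to 0 and the IQR to 1 (again using an illustrative Gamma sample with an arbitrary seed):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(42)
data = rng.gamma(shape=3.0, scale=3.0, size=10_000).reshape(-1, 1)

# x' = (x - median) / (Q3 - Q1)
scaled = RobustScaler().fit_transform(data)

# After scaling: median ≈ 0, IQR ≈ 1
print(np.median(scaled), np.percentile(scaled, 75) - np.percentile(scaled, 25))
```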
5. Max Absolute Scaler
- The Max Absolute Scaler method scales the data so that the maximum absolute value of each feature is 1.
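A short sketch with scikit-learn's `MaxAbsScaler` (sample data and seed are mine):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

rng = np.random.default_rng(42)
data = rng.gamma(shape=3.0, scale=3.0, size=10_000).reshape(-1, 1)

# x' = x / max(|x|); for all-positive data this maps values into (0, 1]
scaled = MaxAbsScaler().fit_transform(data)

# The largest absolute value is now 1
print(np.abs(scaled).max())
```

Because it only divides by a constant and does not shift the data, this scaler preserves zeros, which makes it a common choice for sparse data.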
6. Log Transformation
- The Log Transformation applies a logarithmic function to the data to compress the range of values.
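With NumPy this can be sketched as follows; `np.log1p` (log of 1 + x) is used so the transform stays defined even when a value is exactly zero:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.gamma(shape=3.0, scale=3.0, size=10_000).reshape(-1, 1)

# log1p compresses large values much more strongly than small ones,
# pulling in the long right tail of the Gamma distribution
logged = np.log1p(data)

# The spread of the transformed data is much smaller than the original's
print(data.max() - data.min(), logged.max() - logged.min())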
7. Root Transformation
- Root transformation is a type of data transformation method that involves taking the nth root of each value in a dataset
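Square and cube roots are the most common cases; a minimal NumPy sketch (sample data and seed are mine):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.gamma(shape=3.0, scale=3.0, size=10_000).reshape(-1, 1)

# nth-root transforms: milder compression than a log transform
sqrt_t = np.sqrt(data)   # n = 2 (requires non-negative data)
cbrt_t = np.cbrt(data)   # n = 3 (also defined for negative values)

print(sqrt_t.mean(), cbrt_t.mean())
```

A higher root compresses the right tail more strongly; the cube root has the additional advantage of being defined for negative values.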
Summary:
Normalization methods play an important role in data preparation and feature engineering. Selecting the right method depends on the characteristics of the data and the goals of the analysis. Understanding the pros, cons, and limitations of each method can help in making an informed decision.
Please feel free to contact me; I am happy to share and exchange ideas on topics related to Data Science and Supply Chain.
Facebook: facebook.com/nattapong.thanngam
Linkedin: linkedin.com/in/nattapong-thanngam