07] Standardization and Normalization Techniques in Machine Learning: StandardScaler(), MinMaxScaler(), Normalizer() & RobustScaler()

Vinod Kumar G R
8 min read · Jan 6, 2024


Data is rarely perfect, and it often comes in various shapes and forms, with values that span different scales and ranges. Ensuring that your data is in the right form can make all the difference when training machine learning models. This is where standardization and normalization come into play, offering strategies to prepare your data for the most optimal model performance.

In this article, we will explore these techniques, their differences, and the scenarios where each is best applied. Whether you’re dealing with feature scaling in the broader context or looking to understand how to make your data machine-learning-ready, the insights you gain here will be invaluable.

In the last article, we discussed feature scaling and the different scaling methods used in machine learning. That deep dive provides a solid foundation for the concepts covered here.

I’ll give you a simple example of when these scaling methods are used.

Suppose you are working with an image dataset: each image is stored as pixel values ranging from 0 to 255. Those raw values are relatively large, so you apply a scaling method to bring the data into a common, smaller range before training.
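As a minimal sketch (assuming the image is stored as a NumPy array of 8-bit pixel values; the tiny array below is just a made-up example), a common convention is to divide by 255.0 so every pixel falls into the range [0, 1]:

import numpy as np

# A tiny hypothetical 2x2 grayscale "image" with 8-bit pixel values (0-255)
pixels = np.array([[0, 64],
                   [128, 255]], dtype=np.float64)

# Dividing by the largest possible pixel value maps everything into [0, 1]
scaled = pixels / 255.0
print(scaled)
# [[0.         0.25098039]
#  [0.50196078 1.        ]]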

Now let’s get into the topic,

1. Standardization

Standardization, also known as Z-score scaling or zero-mean scaling, is a common method used in data preprocessing to scale and center features in machine learning. It transforms each feature to have a mean of 0 and a standard deviation of 1, which makes the data suitable for algorithms that assume roughly standard-normal, similarly scaled inputs. Note that standardization re-centers and rescales the data but does not change the shape of its distribution.

Note: Standardization does not scale data into the range (0, 1); instead, it scales the data to have a mean of 0 and a standard deviation of 1. [This distinction will become clearer when we discuss normalization below.]

The mathematical formula for standardization:

x' = (x - mean(x)) / std(x)

where

  • x is the original feature.
  • x’ is the scaled feature.
  • mean(x) represents the mean (average) of the feature’s values across all samples in the dataset.
  • std(x) represents the standard deviation of the feature’s values across all samples in the dataset.

Explanation

  1. Calculate Mean and Standard Deviation: For each feature, you calculate the mean (average) and standard deviation. These statistics are used to determine the center and the spread of the data.
  2. Subtract the Mean: You subtract the mean of each feature from every data point. This operation centers the data, making the new mean of the feature 0.
  3. Divide by the Standard Deviation: You divide each data point by the standard deviation of the feature. This scaling operation makes the standard deviation of the feature 1.
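To make these three steps concrete, here is a minimal sketch (plain NumPy, with made-up feature values) that standardizes one feature by hand; it matches what StandardScaler produces, since both use the population standard deviation:

import numpy as np

# Hypothetical values of a single feature
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Step 1: calculate the mean and standard deviation of the feature
mean = x.mean()
std = x.std()            # population std (ddof=0), same as StandardScaler's default

# Steps 2 and 3: subtract the mean, then divide by the standard deviation
x_scaled = (x - mean) / std

print(x_scaled)          # [-1.41421356 -0.70710678  0.          0.70710678  1.41421356]
print(x_scaled.mean())   # ~0.0
print(x_scaled.std())    # 1.0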

You might wonder where this standardization technique is needed. Standardization is beneficial when features have very different spreads and you want to make the data suitable for machine learning algorithms that assume a standard normal distribution or are sensitive to feature scales. It transforms the data to have a mean of 0 and a standard deviation of 1, but it does not constrain the data to the range (0, 1). The standardized values can be both positive and negative, and the exact range depends on the characteristics of the original data.

I got the image from Google. In the top plot you can see the original data: the distribution is skewed, with most of the values lying roughly between 100 and 200. Such large, widely spread values make training harder, because the model has to capture the important patterns across a wide numeric range.

When you apply standardization, it rescales the data to have a mean of 0 and a standard deviation of 1, which re-centers it around the origin. You can see this in the bottom plot, where the data now falls at the center of the graph. This standardization process improves the training process of the model.

Practical Implementation

Official webpage of sklearn.preprocessing.StandardScaler()

"""
This is the basic code for standard scaling implementation
"""

# import the StandardScaler class from sklearn.preprocessing
from sklearn.preprocessing import StandardScaler

# load Sample data
data = [[1.0], [2.0], [3.0], [4.0], [5.0]]

# Create a StandardScaler instance
scaler = StandardScaler()

# Fit the scaler to the data and transform it in one step
scaled_data = scaler.fit_transform(data)

# Print the scaled data
print(scaled_data)
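For this small dataset, the scaler subtracts the mean (3.0) and divides by the population standard deviation (≈ 1.414), so the printed output is approximately [[-1.41421356], [-0.70710678], [0.], [0.70710678], [1.41421356]].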

The full code is in the Colab notebook linked below.

You can see the practical implementation of standardization in that Colab notebook. Go through it once, and if you have any questions, email me at the address given at the end of this article and I’ll try to clarify.

2. Normalization

Let me first define what normalization is.

Normalization is the process of transforming the features (variables) in a dataset to a common scale, typically within the range of (0, 1) or (-1, 1). The objective of normalization is to ensure that all features have similar scales, which helps prevent certain features from dominating the modeling process due to their larger numerical values.

In other words, normalization rescales the data onto a common scale, typically within the range (0, 1) or (-1, 1).

We have seen why feature scaling is important for machine learning in a previous article. If you haven’t read the previous article, please go through the Feature Scaling in Machine Learning article.

I got this image from Google. It compares the plots before scaling (the original data) and after applying normalization and standardization. When you apply normalization, the data falls into the range (0, 1).

Above, in the standardization section, I added a note: “StandardScaler does not scale data into the range (0, 1); instead, it scales the data to have a mean of 0 and a standard deviation of 1.”
This is the main difference between standardization and normalization.

Different Methods in Normalization

There are mainly four different methods of normalization:

  1. Min-Max scaling
  2. Mean normalization scaling
  3. Max-absolute scaling
  4. Robust scaling (uses the IQR method)

These are the main methods for normalizing data; the choice of method depends on the characteristics of your dataset and the requirements of your modeling task.

1. Min-Max Scaling

Min-Max scaling, also known as Min-Max normalization, transforms data into a specific range, often [0, 1] or [-1, 1]. It rescales the data so that the minimum value maps to 0 (or -1 when using the [-1, 1] range) and the maximum value maps to 1.

Mathematical Formula

For [0, 1] range:

x_normalized = (x - min(x)) / (max(x) - min(x))

For [-1, 1] range:

x_normalized = 2 * ((x - min(x)) / (max(x) - min(x))) - 1
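As a quick sketch (plain NumPy, made-up values), both variants can be computed directly from these formulas:

import numpy as np

# Hypothetical feature values
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# [0, 1] range
x_01 = (x - x.min()) / (x.max() - x.min())
print(x_01)    # [0.   0.25 0.5  0.75 1.  ]

# [-1, 1] range
x_11 = 2 * x_01 - 1
print(x_11)    # [-1.  -0.5  0.   0.5  1. ]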

Advantages:

  • Simple and intuitive method.
  • Preserves the relationships between data points.
  • Suitable for algorithms that assume data within a bounded range.

Disadvantages:

  • Sensitive to outliers, as they can affect the range of the scaling.
  • May not work well with data that does not have clear boundaries.

2. Mean Normalization Scaling

Mean normalization centers data around 0 by subtracting the feature mean and then divides by the feature’s range (max minus min), so the scaled values fall roughly within (-1, 1). Unlike Z-score standardization, it scales by the range rather than by the standard deviation.

Mathematical Formula:

x_normalized = (x - mean(x)) / (max(x) - min(x))
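A minimal sketch of mean normalization as defined above (made-up values; scikit-learn does not ship a dedicated scaler for this method, so it is computed by hand):

import numpy as np

# Hypothetical feature values
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Subtract the mean, then divide by the range (max - min)
x_mean_norm = (x - x.mean()) / (x.max() - x.min())
print(x_mean_norm)   # [-0.5  -0.25  0.    0.25  0.5 ]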

Advantages:

  • Centers the data around 0, which can help optimization-based algorithms.
  • Keeps the scaled values within a bounded range of roughly (-1, 1).
  • Preserves the relative distances between data points.

Disadvantages:

  • Sensitive to outliers, since the scale depends on the minimum and maximum values.
  • The exact spread of the scaled values depends on how the data is distributed around the mean.

3. Max-Absolute Scaling

Max-absolute scaling scales data to the [-1, 1] range by dividing each data point by the maximum absolute value in the dataset.

Mathematical Formula

x_normalized = x / max(|x|)
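A short sketch (made-up values including negatives) showing the computation by hand, which agrees with scikit-learn’s MaxAbsScaler:

import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical feature values with mixed signs
x = np.array([[-20.0], [-5.0], [0.0], [10.0], [40.0]])

# Manual max-absolute scaling: divide by the largest absolute value
x_manual = x / np.abs(x).max()
print(x_manual.ravel())                         # [-0.5   -0.125  0.     0.25   1.   ]

# Same result using scikit-learn
print(MaxAbsScaler().fit_transform(x).ravel())  # [-0.5   -0.125  0.     0.25   1.   ]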

Advantages:

  • Preserves the relative distances between data points and the sign of each value.
  • Does not shift or center the data, which keeps sparse data sparse.
  • Simple to compute: only the maximum absolute value of each feature is needed.

Disadvantages:

  • Data is not centered around 0.
  • Sensitive to outliers, since a single extreme value determines the scale.

4. Robust Scaling (Using IQR Method)

Robust scaling, often referred to as IQR scaling, scales data using the interquartile range (IQR). Because it centers on the median and scales by the middle 50% of the data, it is robust to outliers.

Mathematical Formula

x_normalized = (x - median(x)) / IQR(x)
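Here is a small sketch (made-up values with one outlier) computing robust scaling by hand and checking it against scikit-learn’s RobustScaler, which by default also centers on the median and scales by the IQR:

import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical feature values with one large outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Manual robust scaling: subtract the median, divide by the IQR (Q3 - Q1)
q1, median, q3 = np.percentile(x, [25, 50, 75])
x_manual = (x - median) / (q3 - q1)
print(x_manual.ravel())                          # [-1.  -0.5  0.   0.5 48.5]

# Same result using scikit-learn
print(RobustScaler().fit_transform(x).ravel())

Notice that the outlier (100) does not distort how the other points are scaled; it simply ends up far from the rest.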

Advantages:

  • Robust to outliers, as it focuses on the central portion of the data.
  • Preserves the relative distances between data points.
  • Suitable for data with extreme values.

Disadvantages:

  • Data is not bounded within a specific range.
  • May not be as effective with data that does not have a central tendency.

In the formulas above,
x represents the original data point,
min(x) and max(x) are the minimum and maximum values of the feature,
mean(x) and std(x) are its mean and standard deviation, and
median(x) and IQR(x) are its median and interquartile range (Q3 - Q1).

Practical implementation

Official Webpage of sklearn.preprocessing.MinMaxScaler()

Official Webpage for sklearn.preprocessing.Normalizer()

Official Webpage for sklearn.preprocessing.MaxAbsScaler()

Official Webpage for sklearn.preprocessing.RobustScaler()

"""
This is the basic code for MinMaxScaler implementation
"""

# import the MinMaxScaler library from sklearn
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import Normalizer
from sklearn.preprocessing import MaxAbsScaler
from sklearn.preprocessing import RobustScaler

# load Sample data
data = [[1.0], [2.0], [3.0], [4.0], [5.0]]

# Create a MinMaxScaler instance
min_max_scaler = MinMaxScaler()
normalize_scaler = Normalizer()
max_abs_scaler = MaxAbsScaler()
robust_Scaler = RobustScaler()

# Fit and Transform the scaler to the data and transform the data
scaled_data = min_max_scaler.fit_transform(data)

# Print the scaled data
print(scaled_data)
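For this tiny one-column dataset, MinMaxScaler prints [[0.], [0.25], [0.5], [0.75], [1.]], MaxAbsScaler prints [[0.2], [0.4], [0.6], [0.8], [1.]], and RobustScaler prints [[-1.], [-0.5], [0.], [0.5], [1.]]. Normalizer prints all 1s, because it rescales each row (sample) to unit length rather than each column (feature), and every row here has only one value.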

This is simple code to get the normalization methods running. Next, take a sample dataset, apply each normalization method, and examine the insights in the data.

Go through this Colab notebook, where I explain these normalization methods with an example dataset. If you have any queries, just write me an email (mentioned below) and I’ll try to respond.

That’s it for today’s topic; we’ll discuss another topic in the next article.

……………………………………TO BE CONTINUED………………………………..

Thank you for taking the time to read this article. I hope it has provided you with valuable insights into the world of feature scaling and how it can be used to enhance the performance of machine learning models. I’m excited to share these hands-on insights and make the content more engaging. Stay tuned for upcoming articles.

Previous article: 6. Feature Scaling and different scaling methods in Machine Learning

Next article: 8. Data Encoding In Machine Learning

**************************************************************************

Youtube channel link:

My LinkedIn Account:

Vinod Kumar G R

And if you have any queries reach me out :

Email

*************************************************************************

Keep learning……🖤

ALL LOVE NO HATE……🖤

Thank you……
