Standardization vs. Normalization

The difference between standardization and normalization, their formulas, and when to use each.

Yash Singh

--

Introduction:

Standardization and normalization are both feature-scaling techniques used to bring the data into a smaller, consistent range. This helps achieve faster convergence and improves the accuracy of distance-based algorithms such as KNN and SVM, since raw features may have a huge numerical difference between them.

For example: the age of a person and their salary will have a significant numerical difference, which may impact the performance of our model. To resolve this issue, we can perform feature scaling.

Standardization:

Standardization is a linear transformation technique that rescales the data in a uniform manner so that the:

Mean of the feature = 0 and standard deviation = 1

This technique is also known as Z-score standardization.

Formula: z = (x − μ) / σ, where μ is the mean of the feature and σ is its standard deviation.

Standardization can be implemented with the sklearn library using: from sklearn.preprocessing import StandardScaler
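
A minimal sketch of its usage, assuming scikit-learn and NumPy are installed (the age and salary values below are made up purely for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy dataset with two features: age and salary (illustrative values only)
X = np.array([[25, 30000],
              [32, 52000],
              [47, 81000],
              [51, 60000]], dtype=float)

scaler = StandardScaler()
X_std = scaler.fit_transform(X)  # each column now has mean 0 and std 1

print(X_std.mean(axis=0))  # approximately [0. 0.]
print(X_std.std(axis=0))   # approximately [1. 1.]
```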

When to use Standardization:

We can use standardization when our data follows a normal (Gaussian) distribution. In other words, the graph of the feature makes a bell-shaped curve.

Gaussian Distribution

One thing to remember here is that standardization DOES NOT make a distribution normal (Gaussian); it only fixes the mean and standard deviation.
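
A small sketch to illustrate this point: standardizing a deliberately skewed sample (exponential data, chosen here just as an example of a non-Gaussian shape) fixes the mean and standard deviation but leaves the skew intact:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=(1000, 1))  # clearly non-Gaussian

z = StandardScaler().fit_transform(skewed)

# Mean and std are now fixed, but the shape is unchanged:
print(z.mean(), z.std())  # approximately 0.0 and 1.0

# The right tail is much heavier than the left, so the data is still skewed
# (an exponential sample has no values far below its mean):
print((z > 2).mean(), (z < -2).mean())
```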

Normalization:

Normalization (min-max scaling) is a scaling technique that is used to bring all the data points into the range of 0 to 1.

Normalization formula: x′ = (x − x_min) / (x_max − x_min)

Normalization can be implemented with the sklearn library using: from sklearn.preprocessing import MinMaxScaler
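
A minimal sketch of its usage, mirroring the standardization example above (again with made-up values):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same toy age/salary data as before (illustrative values only)
X = np.array([[25, 30000],
              [32, 52000],
              [47, 81000],
              [51, 60000]], dtype=float)

scaler = MinMaxScaler()  # default feature_range is (0, 1)
X_norm = scaler.fit_transform(X)

print(X_norm.min(axis=0))  # [0. 0.]
print(X_norm.max(axis=0))  # [1. 1.]
```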

When to use Normalization:

Normalization can be used when our data does not follow a Gaussian (normal) distribution, i.e. when the feature's graph does not make a bell-shaped curve.

One thing to remember here is that normalization is heavily influenced by outliers in the data, since the minimum and maximum define the scaling range. Therefore, the data should be treated for outliers before performing normalization.
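
A short sketch of this effect: adding a single extreme (hypothetical) salary to the data squashes all the other scaled values toward 0:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

salaries = np.array([[30000], [52000], [60000], [81000]], dtype=float)
with_outlier = np.vstack([salaries, [[1_000_000]]])  # add one extreme value

print(MinMaxScaler().fit_transform(salaries).ravel())
# [0.         0.43137255 0.58823529 1.        ]

print(MinMaxScaler().fit_transform(with_outlier).ravel())
# [0.         0.02268041 0.03092784 0.05257732 1.        ]
# The original values are crushed near 0 because x_max is now 1,000,000.
```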
