What is the difference between Standardization & Normalization?
Scaling is one of the stages of data preprocessing. Data preprocessing typically involves the following steps:
- Dealing with outliers
- Dealing with null values
- Scaling the data
- Labeling/encoding the data (for example, binary categories)
- Dimensionality reduction
Scaling the data is one of the most important of these techniques. It is required because when the independent variables have different numeric ranges, the variables with larger values can dominate any computation performed on the features.
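As a minimal sketch of this effect, assume two hypothetical features: X1 with values between 0 and 1, and X2 with values in the thousands. The sample points and the `euclidean` helper below are illustrative and not taken from any real dataset. A distance calculation on the unscaled data is driven almost entirely by X2:

```python
import numpy as np

# Hypothetical samples: column 0 is X1 (small range), column 1 is X2 (large range).
a = np.array([0.1, 1000.0])
b = np.array([0.9, 1010.0])  # very different from a in X1, slightly different in X2
c = np.array([0.1, 1020.0])  # identical to a in X1, a bit further away in X2

def euclidean(p, q):
    """Plain Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((p - q) ** 2)))

dist_ab = euclidean(a, b)
dist_ac = euclidean(a, c)

# Share of the squared distance between a and b that comes from X1.
x1_share = (a[0] - b[0]) ** 2 / np.sum((a - b) ** 2)

print(dist_ab, dist_ac, x1_share)
```

Even though a and b differ across most of the X1 range, X1 contributes less than 1% of their squared distance, so the comparison is decided by X2 alone.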
Here the ranges of X1 and X2 are different. During computation, X1 may effectively be neglected in favor of X2, because the values of X1 are very small compared to those of X2. Scaling is therefore required for both the training set and the test set. There are two common ways to do this:
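1. Standardization: This is a process that rescales the data so that each variable has a mean of 0 and a standard deviation of 1. We use StandardScaler for this, which requires calculating the mean and standard deviation of each variable. The formula for standardization is:
Z = (x - µ) / σ
where Z = the new standardized value, x = the sample value, µ = the mean of the variable, and σ = the standard deviation of the variable.
The standardized values are not bounded to a fixed range; in principle they can range from -∞ to +∞.
It is also called Z-score normalization in statistics. Because the values are not squeezed into a fixed range, it is less sensitive to outliers than min-max normalization, although outliers still influence the mean and standard deviation. The code is as follows: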
from sklearn.preprocessing import StandardScaler  # import the scaler class
scaler = StandardScaler()
Scaled_X_train = scaler.fit_transform(X_train)  # fit on the training set, then scale it
Scaled_X_test = scaler.transform(X_test)  # scale the test set using the training mean and standard deviation
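To make the formula concrete, here is a small sketch that computes the Z-score by hand and compares it with StandardScaler. The arrays `X_train` and `X_test` are made-up values used only for illustration. Note that the test set is scaled with the mean and standard deviation learned from the training set, which is why only `transform` is called on it:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales.
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])
X_test = np.array([[2.5, 250.0], [5.0, 500.0]])

# Manual Z-score: statistics come from the training set only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population standard deviation (ddof=0)
manual_train = (X_train - mu) / sigma
manual_test = (X_test - mu) / sigma

# The same computation with StandardScaler.
scaler = StandardScaler()
sk_train = scaler.fit_transform(X_train)
sk_test = scaler.transform(X_test)

print(np.allclose(manual_train, sk_train), np.allclose(manual_test, sk_test))
```

After scaling, each training column has a mean of 0 and a standard deviation of 1, while the test values may fall outside the range seen in training.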
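2. Normalization: This is a process of rescaling the data to a fixed range. It is suitable when there are no outliers and we are not sure about the distribution of the data. The formula for normalization is:
X_scaled = (x - x_min) / (x_max - x_min)
where x_min and x_max are the minimum and maximum values of the variable.
The scaled data ranges from 0 to 1. If there is an outlier, it may become the max or min value, which compresses the remaining values into a narrow part of the range and leads to misleading results. The code is as follows: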
from sklearn.preprocessing import MinMaxScaler
mmscaler = MinMaxScaler()
Scaled_X_train = mmscaler.fit_transform(X_train)
Scaled_X_test = mmscaler.transform(X_test)
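The effect of an outlier on normalization can be sketched with a single made-up feature. The arrays `clean` and `with_outlier` below are hypothetical and differ only in their last value:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-feature data, with and without an extreme value.
clean = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])
with_outlier = np.array([[10.0], [20.0], [30.0], [40.0], [1000.0]])

scaled_clean = MinMaxScaler().fit_transform(clean)
scaled_outlier = MinMaxScaler().fit_transform(with_outlier)

# Manual check of the min-max formula on the clean data.
manual = (clean - clean.min(axis=0)) / (clean.max(axis=0) - clean.min(axis=0))

print(scaled_clean.ravel())
print(scaled_outlier.ravel())
```

Without the outlier the values are spread evenly between 0 and 1. With the outlier, the extreme value becomes 1 and the other four values are squeezed close to 0.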