Feature Scaling and Normalization: Do They Matter to Machine Learning Algorithms?
What is Z-score Normalization?
The result of Z-score normalization is that the features will be rescaled so that they have the properties of a standard normal distribution with

μ = 0 and σ = 1
where μ is the mean and σ is the standard deviation. The z scores of the samples are calculated as follows:

z = (x - μ) / σ
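A minimal sketch of this formula in NumPy, using a small made-up sample (the values are illustrative, not from the text):

```python
import numpy as np

# Hypothetical feature values (e.g., heights in cm)
x = np.array([160.0, 170.0, 180.0, 190.0])

mu = x.mean()          # sample mean μ
sigma = x.std()        # standard deviation σ
z = (x - mu) / sigma   # z = (x - μ) / σ

print(z.mean())  # ≈ 0: centered around zero
print(z.std())   # ≈ 1: unit standard deviation
```

After the transformation, the rescaled feature has zero mean and unit standard deviation, regardless of the original units.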
In summary, normalizing the features so that they are centered around 0 with a standard deviation of 1 is not only important when we are comparing measurements that have different units; it is also a general requirement for many machine learning algorithms. Intuitively, we can think of gradient descent (an optimization algorithm often used in logistic regression, SVMs, and neural networks) as a prominent example: with features on different scales, certain weights may update faster than others, since the feature values play a role in the weight updates:
Δw_j = η (t - o) x_j

where η is the learning rate, t is the target class label, and o is the actual output. Other intuitive examples include K-Nearest Neighbor (KNN) algorithms and clustering algorithms that use, for example, Euclidean distance measures.
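To see why differing scales skew the updates, here is a small sketch of the perceptron-style update rule above applied to one hypothetical sample whose two features differ in scale by a factor of 1000 (the values and learning rate are assumptions for illustration):

```python
import numpy as np

# One hypothetical sample with two features on very different scales
x = np.array([0.5, 500.0])   # feature 2 is 1000x larger than feature 1
t = 1.0                      # target class label
o = 0.0                      # actual output
eta = 0.01                   # learning rate η

# Weight update from the text: Δw_j = η (t - o) x_j
delta_w = eta * (t - o) * x
print(delta_w)  # the weight on the large-scale feature moves 1000x more
```

Because the feature value multiplies the update, the weight tied to the large-scale feature dominates the descent direction; standardizing both features to unit variance makes the per-weight updates comparable.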