Feature Scaling and Normalization: Do They Matter to Machine Learning Algorithms?

T Z J Y
4 min read · Oct 14, 2021

What is Z-score Normalization?

The result of Z-score normalization is that the features will be rescaled so that they'll have the properties of a standard normal distribution with:

μ = 0 and σ = 1,

where μ is the mean and σ is the standard deviation from the mean; the z-score of a sample x is calculated as follows:

z = (x − μ) / σ
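As a quick sketch (using NumPy and made-up values), the standardization above can be computed directly and checked for the zero-mean, unit-variance property:

```python
import numpy as np

# Hypothetical feature column; the values are illustrative only.
x = np.array([50.0, 60.0, 70.0, 80.0, 90.0])

mu = x.mean()          # sample mean μ
sigma = x.std()        # standard deviation σ
z = (x - mu) / sigma   # z-scores of the samples

# After normalization the feature is centered at 0 with unit variance.
print(z.mean())  # ≈ 0.0
print(z.std())   # ≈ 1.0
```

In practice the same transform is available as `StandardScaler` in scikit-learn, which also remembers μ and σ so the identical scaling can be applied to test data.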

In summary, normalizing the features so that they are centered around 0 with a standard deviation of 1 is not only important when we are comparing measurements that have different units; it is also a general requirement for many machine learning algorithms. Intuitively, we can think of gradient descent as a prominent example (an optimization algorithm often used in logistic regression, SVMs, and neural networks): with features on different scales, certain weights may update faster than others, since the feature values x_j enter the weight update

Δw_j = η (t − o) x_j,

where η is the learning rate, t is the target class label, and o is the actual output. Other intuitive examples include K-Nearest Neighbour (KNN) algorithms and clustering algorithms that use, for example, Euclidean distance measures.
