# Feature Scaling: Normalization vs Standardization

Feature scaling is one of the most important data preprocessing steps in machine learning. Algorithms that compute distances between samples are biased towards features with numerically larger values if the data is not scaled; tree-based algorithms, by contrast, are largely insensitive to feature scale. Feature scaling also helps machine learning and deep learning algorithms train and converge faster.

# Why Feature Scaling?

Suppose we have a dataset with two features, height and weight. Each value has two components: a magnitude and a unit. The unit is the form of the value (kg, g, mg, etc.), and the magnitude is the numeric value itself. Many machine learning algorithms, such as k-nearest neighbors and k-means, rely on the Euclidean distance between samples. If one feature has a much larger magnitude than the others, it dominates the distance computation and the remaining features are effectively ignored. Feature scaling fixes this. There are two major types: normalization and standardization.
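To see the dominance effect concretely, here is a minimal sketch with NumPy, using made-up height/income values and min/max bounds for illustration: the raw Euclidean distance is driven almost entirely by the larger-magnitude feature, while the scaled distance lets both features contribute.

```python
import numpy as np

# Two hypothetical samples: (height in m, income in USD).
a = np.array([1.80, 50_000.0])
b = np.array([1.60, 52_000.0])

# Raw Euclidean distance: dominated almost entirely by income.
raw_dist = np.linalg.norm(a - b)       # ~2000, the height difference vanishes

# Min-max scale each feature to [0, 1] using assumed observed bounds.
mins = np.array([1.50, 40_000.0])
maxs = np.array([2.00, 60_000.0])
a_scaled = (a - mins) / (maxs - mins)
b_scaled = (b - mins) / (maxs - mins)

# After scaling, both features contribute comparably to the distance.
scaled_dist = np.linalg.norm(a_scaled - b_scaled)
```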

# Normalization:

Normalization (also called **Min-Max normalization**) is a scaling technique that rescales the features so that the data falls in the *range [0, 1]*.

The normalized form of each feature can be calculated as follows:

*X' = (X - X_min) / (X_max - X_min)*
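The formula above is what scikit-learn's `MinMaxScaler` applies column by column; here is a short sketch on a hypothetical height/weight matrix:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix: heights (cm) and weights (kg).
X = np.array([[150.0, 50.0],
              [160.0, 65.0],
              [170.0, 80.0],
              [180.0, 95.0]])

# MinMaxScaler applies (X - X_min) / (X_max - X_min) per column,
# so every feature ends up in [0, 1].
scaler = MinMaxScaler()
X_norm = scaler.fit_transform(X)
```

Note that `fit_transform` learns the per-column min and max from the data; on new data you would call `transform` with the same fitted scaler.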

# Standardization:

Standardization (or **Z-score normalization**) is the process of rescaling the features so that they have a mean of zero and a standard deviation of one, i.e.

*μ* = 0 and *σ* = 1,

where *μ* is the mean and *σ* is the standard deviation. The standard scores (also called **z-scores**) of the samples are calculated as follows:

*Z = (X - μ) / σ*
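scikit-learn's `StandardScaler` applies this formula per column; a minimal sketch on the same kind of hypothetical height/weight matrix:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: heights (cm) and weights (kg).
X = np.array([[150.0, 50.0],
              [160.0, 65.0],
              [170.0, 80.0],
              [180.0, 95.0]])

# StandardScaler applies Z = (X - mu) / sigma per column,
# so each feature ends up with mean 0 and standard deviation 1.
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
```

Unlike min-max normalization, the standardized values are not confined to a fixed range.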

## Which one would you choose?

For example, in clustering analyses, standardization can be especially important for comparing similarities between samples with distance measures. However, this doesn't mean that normalization is not useful at all! A popular application is image processing, where 8-bit RGB pixel intensities lie in the range [0, 255] and are commonly rescaled to fit a fixed range such as [0, 1]. Many neural network pipelines likewise expect inputs on a 0-1 scale.
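One practical difference worth seeing side by side: min-max normalization is bounded but sensitive to outliers, which squash the remaining values into a narrow band, while standardization preserves the relative spread. A sketch with a made-up feature containing one outlier:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical single feature with one large outlier.
x = np.array([[1.0], [2.0], [3.0], [100.0]])

x_norm = MinMaxScaler().fit_transform(x)   # bounded to [0, 1];
                                           # inliers crowd near 0
x_std = StandardScaler().fit_transform(x)  # mean 0, std 1; unbounded
```

Because of this, when outliers are expected, standardization (or a robust scaler) is often the safer default; normalization shines when a hard [0, 1] range is required.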

If you would like to see code snippets of these concepts, jump into my notebook here. Keep exploring!