
Standardization or Normalization?

Whenever someone enters the beautiful world of Data Science, they are bombarded with a plethora of new terminology, which can be very intimidating for beginners. Just like everyone else, I have also faced this issue (don’t worry, it gets easier with time), and so I decided to help my fellow learners ease into this transition.

There are various steps, algorithms, parameters, etc. when it comes to Data Science; today our focus is going to be on data transformation techniques, specifically Standardization and Normalization. We will try to cover the basics of what they are, when they should be used, and the easiest way to implement them in your own projects. (Before we start, I would recommend having some basic knowledge of statistics, as it will help you in understanding the topic.)

Standardization

Any dataset has some statistics related to it (a statistic is just a number computed from the data that describes general properties of its distribution). The two most commonly used statistics are:

  1. Mean: the average of a dataset, found by adding all the values together and then dividing the sum by the number of values.
  2. Standard Deviation: a measure of the amount of variation or dispersion in a set of values, calculated as the square root of the variance, working from each data point’s deviation relative to the mean.
[Image: the normal distribution, from Wikipedia (read for more details about the distribution)]

A Gaussian distribution is a very interesting distribution of data that appears a lot in nature. A special form of it is the standard normal distribution, where the mean is 0 and the standard deviation is 1.

We can calculate the standardized value of any variable X as follows:

X_standard = (X − μ) / σ

where μ stands for the mean of our data and σ stands for the standard deviation of our data.

This transformation converts our data into a form that has mean 0 and standard deviation 1, as explained above.
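To make this concrete, here is a minimal sketch in Python (the ages are made-up values, purely for illustration) verifying that the transformed data really does end up with mean 0 and standard deviation 1:

```python
import numpy as np

# Made-up sample of customer ages, purely for illustration
X = np.array([22.0, 25.0, 31.0, 38.0, 44.0])

mu = X.mean()      # mean of the data
sigma = X.std()    # standard deviation of the data

X_standard = (X - mu) / sigma

print(X_standard.mean())  # ~0.0 (up to floating-point error)
print(X_standard.std())   # 1.0
```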

But why do this at all, you might ask? Well, the simple reason is that most Machine Learning algorithms work with something called “weights”. Let’s consider a simple example: suppose we have to estimate the price of a house in a certain neighborhood. The house will have various variables associated with it, for example the number of bedrooms, the area of the house (in sq.ft.), the number of hospitals nearby, etc.

If we do not standardize our data before feeding it to our algorithm, it may essentially ignore the number of bedrooms or hospitals and focus only on the area of the house, since that value will surely be much larger than the other two. This is not right, as the other factors are also very important in deciding the price of the house. Standardization, and likewise normalization, is done so that the algorithm does not favor one particular parameter while ignoring other important ones; the short sketch after the note below makes this concrete.

(Note: this is needed most for gradient-based algorithms.)
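To see the scale problem in action, here is a tiny sketch (all house numbers are made up for illustration) showing how a plain Euclidean distance between two houses is dominated entirely by the area column, and how standardizing fixes that:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical houses: [bedrooms, area (sq.ft.), hospitals nearby]
houses = np.array([[2.0, 1500.0, 1.0],
                   [4.0, 1550.0, 5.0]])

# Raw distance: the 50 sq.ft. gap in area swamps everything else
print(np.linalg.norm(houses[0] - houses[1]))  # ~50.2

# After standardizing each column, all three features contribute equally
scaled = StandardScaler().fit_transform(houses)
print(np.linalg.norm(scaled[0] - scaled[1]))  # ~3.46
```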

Now I hope that gives you a fair idea of why to standardize data; let’s move on to how to do it.

Well, we use our beloved ‘scikit-learn’ library. It contains the ‘StandardScaler’ class in the ‘sklearn.preprocessing’ module. To use it, we first import the class, create an object of it, call the fit_transform method and… that’s it! We’re done. Let me show you with an example:
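The original code appears as a screenshot in the article; the following is a minimal sketch of the same workflow, with made-up age/salary values standing in for the course dataset:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up age/salary pairs standing in for the course dataset
X = np.array([[25.0,  40000.0],
              [32.0,  60000.0],
              [47.0,  90000.0],
              [51.0, 110000.0]])

scaler = StandardScaler()
# fit computes the mean and standard deviation of each column;
# transform applies (X - mu) / sigma; fit_transform does both at once
X_standard = scaler.fit_transform(X)

print(X_standard)
```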

The first column is the age of our customer and the second column is their salary, for the given dataset that I have taken from this course.

[Image: the data before standardization]
[Image: the data after standardization]

Here, notice one important detail: our X_standard values do not strictly lie between 0 and 1; rather, they usually lie somewhere between −3 and +3, though this is not a hard and fast rule.

Now that we know why and how to do it, the last and most important question remains: when to do it?

Standardization is used for feature scaling when your data follows a Gaussian distribution. It is most useful for:

  • Optimizing algorithms such as gradient descent
  • Clustering models or distance-based classifiers like K-Nearest Neighbors
  • High-variance data ranges, such as in Principal Component Analysis.

So that is pretty much all you need to get started with standardization. Now we shift our focus to the other popular technique, normalization: the why, how and when of it.

Normalization

Unlike standardization, normalization approaches the problem of feature scaling slightly differently (more intuitively, in my opinion).

It is performed as follows:

X_norm = (x − min(x)) / (max(x) − min(x))

But what does it mean? The formula transforms any value x in the dataset by taking its difference from the minimum value and then dividing by the difference between the maximum and minimum values present in the dataset. As a result, X_norm has some interesting properties: since we compute x − min(x), the numerator always ranges between 0 and max(x) − min(x), and dividing by max(x) − min(x) ensures the value lies between 0 and 1, which, I may remind you, is not the case with standardization.
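Here is the same computation done by hand with NumPy (the salary values are made up for illustration), confirming the result stays inside [0, 1]:

```python
import numpy as np

# Made-up salary values, purely for illustration
x = np.array([40000.0, 60000.0, 90000.0, 110000.0])

x_norm = (x - x.min()) / (x.max() - x.min())

print(x_norm)                      # ≈ [0, 0.286, 0.714, 1]
print(x_norm.min(), x_norm.max())  # 0.0 1.0
```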

Interesting, right? But why would we use normalization if we already have standardization?

Well, two reasons:

  1. Firstly, and most obviously, when we need our data to lie strictly between 0 and 1, normalization is the way to go.
  2. If you don’t know the distribution of your dataset, normalization is the better choice in that case as well.

The “why?” remains the same: we use normalization to bring our data to a common scale so that the model does not treat any one variable unfairly, as described above.

Let us now get to the ‘how’ of it: how do I use normalization in my own code?

We again take the help of the ‘scikit-learn’ library. This time we import the ‘MinMaxScaler’ class from the preprocessing module of ‘sklearn’, just like we did for StandardScaler. (A quick word of caution: sklearn also ships a ‘Normalizer’ class, but that one rescales each row to unit norm; the min-max formula above is what MinMaxScaler implements.)
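A minimal sketch of that workflow, again with made-up age/salary values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# The same made-up age/salary data as before
X = np.array([[25.0,  40000.0],
              [32.0,  60000.0],
              [47.0,  90000.0],
              [51.0, 110000.0]])

scaler = MinMaxScaler()           # defaults to the (0, 1) feature range
# applies (x - min(x)) / (max(x) - min(x)) to each column independently
X_norm = scaler.fit_transform(X)

print(X_norm)
```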

[Image: the data before normalization]
[Image: the data after normalization]

That’s it. In a few lines of code we normalized our data using Python and scikit-learn.

Notice how all the values are between 0 and 1 now? That is the power of MinMaxScaler.

I would like to conclude by giving you some pros and cons of both of the above methods, so that you can make a more informed decision as to when, if at all, you should apply them.

The main reason people prefer standardization is that it handles outliers far better than normalization; on the other hand, normalization lets us go about our business without making strong assumptions about our data. So if you have few or no outliers, normalization is the way to go; otherwise, prefer standardization.

Finally, I will take your leave by providing this chart, which summarizes which algorithms require our features to be scaled.

(Note: algorithms like Naive Bayes, Linear Discriminant Analysis, and tree-based models (gradient boosting, random forest, etc.) do not require feature scaling, as they do not really depend on distances between data points.)

[Image: chart of which algorithms need feature scaling, from KDnuggets]

Conclusion

Both techniques, Standardization and Normalization, are used to transform our features into values of a similar scale. Standardization achieves this by transforming the data so that μ becomes 0 and σ becomes 1, whereas Normalization does so by converting every value into the 0 to 1 range.
