Feature Scaling — 4th Step of Pre-Processing Data!

Zeros & Ones!
Feb 27, 2023

Before we define what feature scaling is, let's look at an example:

So, we have the following data:

Now, if we want to see how these two variables move, we can make a line chart. Right!! We know that age keeps increasing; at some ages the salary increases with it, and at other ages it decreases. Let me draw a simple line chart in Excel.

Now, the salary values are so big that age looks like a flat line, a constant line at 0. This can’t be right. If I had to judge from this chart whether age has any impact on salary, I would say NO. NONE. By that reading, age is unimportant and insignificant for the study.

Imagine there are other variables, like distance from office to home or work-life satisfaction level. The graph would show even more flat lines and just one visibly moving line for salary.

All of these interpretations may be wrong. To tackle this issue, we use feature scaling. It is very important to scale the data; otherwise, the feature with the largest values will outshine the others and distort the analysis.

What are the ways of feature scaling?

  • Normalization
  • Standardization

Normalization

This is the feature scaling method to use when the data doesn’t have any outliers. The formula for finding the normalized value is:

Normalized Value = (Actual Value - Minimum Value) / (Maximum Value - Minimum Value)

One important thing to note here is that the normalized data ranges from 0 to 1.
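If you prefer code to Excel, here is a minimal Python sketch of the same formula. The function name `min_max_normalize` and the age and salary numbers are made up for illustration; they are not the data behind the charts in this post.

```python
def min_max_normalize(values):
    """Rescale values to the range 0 to 1 using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is the same, so the formula would divide by zero.
        # Returning zeros here is an assumption of this sketch.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Illustrative values only, not the article's data.
ages = [25, 30, 35, 40, 45]
salaries = [30000, 52000, 48000, 75000, 60000]

normalized_ages = min_max_normalize(ages)
normalized_salaries = min_max_normalize(salaries)

print(normalized_ages)
print(normalized_salaries)
```

After this step, the smallest value in each list becomes 0 and the largest becomes 1, so both variables share the same scale.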

Now, let me normalize the age and salary and build the graph:

We see that even though the salary values were very large, the age values are now given equal weight. We can see both variables moving.

Standardization

This is the feature scaling method to use when the data has outliers. The formula for finding the standardized value is:

Standardized Value = (Actual Value - Mean) / Standard Deviation

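One important thing to note here is that standardized data has a mean of 0 and a standard deviation of 1. For roughly normally distributed data, almost all values fall between -3 and 3, but this is not a hard limit: outliers can land outside that range.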
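Here is a minimal Python sketch of the standardization formula, using only the standard library. The function name `standardize` and the numbers are made up for illustration. It also assumes the population standard deviation (`pstdev`); the post does not say which version was used, and swapping in `statistics.stdev` would give the sample version instead.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values using (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # Every value is the same, so the formula would divide by zero.
        # Returning zeros here is an assumption of this sketch.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Illustrative values only, not the article's data.
ages = [25, 30, 35, 40, 45]
salaries = [30000, 52000, 48000, 75000, 60000]

standardized_ages = standardize(ages)
standardized_salaries = standardize(salaries)

print(standardized_ages)
print(standardized_salaries)
```

Each standardized list ends up centered on 0 with a standard deviation of 1, which is why age and salary can then be drawn on the same chart.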

Now, let me standardize the age and salary and build the graph:

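The difference between the raw data and the normalized/standardized data is very clear. Scaling the data is strongly recommended for regression problems, particularly when the model is trained with gradient descent or uses regularization, since both are sensitive to the size of each feature.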

Data used for the above graphs:

Do try it yourself; it's thrilling to see why we need feature scaling and how it affects the interpretation.