Why Data Normalization is necessary for Machine Learning models

Urvashi Jaitley
Oct 7, 2018

Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values. Not every dataset requires normalization for machine learning; it is needed only when features have different ranges.
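As a minimal sketch, assuming min-max scaling (one common choice of normalization), each value x in a column is mapped to (x − min) / (max − min). Every column then lies in [0, 1], and because the mapping is linear, the relative spacing between values is preserved. The function name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1] using min-max scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 35, 50, 80]
print(min_max_normalize(ages))  # [0.0, 0.25, 0.5, 1.0]
```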

For example, consider a dataset containing two features, age (x1) and income (x2), where age ranges from 0 to 100 while income ranges from about 20,000 to 500,000. Income values are therefore roughly 1,000 times larger than age values, so the two features are on very different scales. When we do further analysis, such as multivariate linear regression, the income attribute will intrinsically influence the result more because of its larger values. But this doesn't necessarily mean it is more important as a predictor.
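The same scale effect is easy to see in a Euclidean distance between two rows, as used by distance-based methods such as k-nearest neighbours (a swapped-in illustration, not the regression mentioned above). This sketch uses the age and income ranges from the example; the three people `a`, `b` and `c` are hypothetical. On raw values, a 40-year age gap looks smaller than a 10,000 income gap; after min-max scaling both features contribute on comparable terms.

```python
import math

# Ranges taken from the age/income example above.
AGE_RANGE = (0, 100)
INCOME_RANGE = (20_000, 500_000)


def scale(value, value_range):
    """Min-max scale a single value into [0, 1] given its (min, max) range."""
    lo, hi = value_range
    return (value - lo) / (hi - lo)


def scale_person(person):
    """Scale an (age, income) pair feature by feature."""
    age, income = person
    return (scale(age, AGE_RANGE), scale(income, INCOME_RANGE))


# Hypothetical people as (age, income) pairs.
a = (25, 50_000)
b = (65, 51_000)  # 40 years older than a, almost the same income
c = (26, 60_000)  # almost the same age as a, 10,000 more income

raw_ab, raw_ac = math.dist(a, b), math.dist(a, c)
scaled_ab = math.dist(scale_person(a), scale_person(b))
scaled_ac = math.dist(scale_person(a), scale_person(c))

print(raw_ab < raw_ac)        # True: on raw values, income dominates
print(scaled_ab < scaled_ac)  # False: after scaling, the age gap counts too
```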

To explain further, let's build two deep neural network models: one trained on unnormalized data and one trained on normalized data. At the end, I will compare the results of the two models and show the effect of normalization on their accuracy.
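For the normalized model, one common practice (sketched here assuming min-max scaling; the helper names `fit_min_max` and `transform_min_max` are hypothetical) is to compute each column's minimum and maximum on the training split only and then apply those same parameters to the test split, so no information from the test data leaks into training. scikit-learn's `MinMaxScaler` follows the same fit-then-transform pattern. The rows below are made-up (age, income) pairs standing in for real feature columns.

```python
def fit_min_max(rows):
    """Compute per-column (min, max) from the training rows only."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def transform_min_max(rows, params):
    """Apply previously fitted per-column min-max parameters to any rows."""
    scaled = []
    for row in rows:
        scaled_row = []
        for value, (lo, hi) in zip(row, params):
            # Constant training columns are mapped to 0.0 to avoid dividing by zero.
            scaled_row.append((value - lo) / (hi - lo) if hi != lo else 0.0)
        scaled.append(scaled_row)
    return scaled


# Hypothetical (age, income) rows.
train_rows = [(20, 30_000), (40, 80_000), (60, 130_000)]
test_rows = [(30, 55_000), (70, 150_000)]

params = fit_min_max(train_rows)
print(transform_min_max(train_rows, params))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
print(transform_min_max(test_rows, params))   # [[0.25, 0.25], [1.25, 1.2]]
```

Note that test values outside the training range land outside [0, 1]; that is expected, because the scaler must not look at the test data when it is fitted.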

First Few Rows Of Original Data

Below is a neural network model built using the original, unnormalized data:

'''Using the covertype dataset from Kaggle to predict forest cover type'''
# Import pandas, scikit-learn, TensorFlow and Keras
import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf
import keras
from keras.utils import to_categorical
from keras import models
from keras import layers

# Read the data from the CSV file
df = pd.read_csv('covtype.csv')
# Select predictors: the first 54 columns
x = df[df.columns[:54]]
# Target variable
y = df.Cover_Type

# Split the data into training (70%) and test (30%) sets
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, random_state=90)

'''As y is a multi-class categorical variable, softmax is used as the output activation function and sparse categorical cross-entropy as the loss function.'''
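To make the last comment concrete, here is a minimal plain-Python sketch of what softmax and sparse categorical cross-entropy compute for a single example; it is not Keras's implementation, and the logits are hypothetical. Softmax turns raw output scores into probabilities that sum to 1, and the "sparse" loss takes the true class as an integer index instead of a one-hot vector, returning −log of the probability assigned to that class.

```python
import math


def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, probs):
    """Loss for one example whose true class is given as an integer index."""
    return -math.log(probs[label])


logits = [2.0, 1.0, 0.1]  # hypothetical scores for 3 classes
probs = softmax(logits)
print(round(sum(probs), 6))  # 1.0
# Predicting the true class with high probability gives a lower loss.
print(sparse_categorical_crossentropy(0, probs) < sparse_categorical_crossentropy(2, probs))  # True
```

Because the loss indexes the probability vector directly, the integer labels are expected to run from 0 to the number of classes minus 1. If Cover_Type is coded 1 to 7, as is usually the case for this dataset, the labels need shifting by 1 or the output layer needs an extra unit.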