Data Normalization — A Brief Explanation

Wojtek Fulmyk, Data Scientist
3 min read · Jul 29, 2023


Article level: Beginner

My clients often ask me about the specifics of certain data pre-processing methods, why they’re needed, and when to use them. I will discuss a few common (and not-so-common) preprocessing methods in a series of articles on the topic.

In this preprocessing series:

Data Standardization — A Brief Explanation — Beginner
Data Normalization — A Brief Explanation — Beginner
One-hot Encoding — A Brief Explanation — Beginner
Ordinal Encoding — A Brief Explanation — Beginner
Missing Values in Dataset Preprocessing — Intermediate
Text Tokenization and Vectorization in NLP — Intermediate
Outlier Detection in Dataset Preprocessing — Intermediate
Feature Selection in Data Preprocessing — Advanced

In this short writeup I will explain what normalizing data is generally about. This article is not overly technical, but some understanding of specific terms will be helpful, so I have attached short explanations of the more complicated terminology. Give it a go, and if you need more info, just ask in the comments section!

preprocessing technique — Transforming raw data before modeling to improve performance.

rescale — Change the scale of values.

normalization — Rescaling data to a common range.

features — Input variables to a model.

convergence — Model parameters stabilizing.

gradient descent — Optimizing by taking gradient steps.

loss function — Metric to minimize in training.

regression — Models that predict continuous outcomes from relationships between features.

perceptron — Early single neuron model.

Data Normalization

The Why

Normalization is an essential preprocessing step for machine learning models. It rescales each feature to a common range, making them directly comparable. This provides two key benefits:

A) Preventing features with larger ranges from dominating the objective function — Features are often measured on different scales, such as height in meters versus weight in kilograms. A feature with a broad range of values can unduly influence the model simply because its variation overwhelms the more subtle effects of other features. Normalization rescales these ranges to be uniform.

B) Helping convergence during gradient descent optimization — Optimization algorithms like gradient descent take “steps” proportional to the gradient to minimize the loss function. If features vary greatly in scale, this can cause unstable updates. Normalization helps rescale the gradients to be more uniform, improving convergence.
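To make this concrete, here is a toy sketch (all data and the learning rate are invented for illustration) where the same gradient-descent step size converges on normalized features but diverges on the raw ones:

```python
import numpy as np

# Toy illustration: plain gradient descent on least squares, where one
# feature (x2) is measured on a much larger scale than the other (x1).
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 200)
x2 = rng.uniform(0, 1000, 200)
y = 2 * x1 + 0.004 * x2
X_raw = np.column_stack([x1, x2])
X_norm = (X_raw - X_raw.min(axis=0)) / (X_raw.max(axis=0) - X_raw.min(axis=0))

def final_loss(X, y, lr, steps=500):
    """Run gradient descent for a fixed number of steps; return the final MSE."""
    w = np.zeros(X.shape[1])
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(steps):
            w -= lr * 2 * X.T @ (X @ w - y) / len(y)
        return float(np.mean((X @ w - y) ** 2))

# The same step size converges on normalized features but blows up on raw ones.
print(final_loss(X_norm, y, lr=0.1))  # a small, finite loss
print(final_loss(X_raw, y, lr=0.1))   # nan/inf: the updates diverged
```

The intuition: the stable step size is limited by the largest curvature of the loss, which the large-scale feature inflates; normalization evens the curvatures out.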

The How

The most common approach is min-max scaling, which normalizes features to the 0–1 range using this formula:

x_normalized = (x - x_min) / (x_max - x_min)

Where x is the original value, and x_min and x_max are the feature's minimum and maximum values. This scales each value proportionally to fit the 0–1 interval.
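As a quick illustration of the formula applied by hand to a small made-up feature column:

```python
# Min-max scaling by hand: x_min = 10, x_max = 30, so the range is 20
values = [10.0, 15.0, 20.0, 30.0]
x_min, x_max = min(values), max(values)

normalized = [(x - x_min) / (x_max - x_min) for x in values]
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```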

Some other normalization techniques you can check out are: Z-score standardization, log transforms, and quantile normalization. Additionally, you can check out my standardization intro as well.
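For the curious, here is a minimal sketch of those alternatives using scikit-learn and NumPy (the right-skewed sample data is made up):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, QuantileTransformer

# Made-up, right-skewed sample data: 100 rows, 1 feature
rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(100, 1))

z_scored = StandardScaler().fit_transform(X)  # z-score: zero mean, unit variance
log_transformed = np.log1p(X)                 # log transform: compresses large values
quantiled = QuantileTransformer(n_quantiles=100).fit_transform(X)  # ranks mapped into [0, 1]
```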

Additional Considerations

1. Fit the normalization on the training data only — Computing the min-max bounds on the training set alone prevents information from the test set from leaking into preprocessing. Note that test values outside the training range will then fall outside 0–1, which some models handle gracefully and others do not.
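With scikit-learn's MinMaxScaler class, fitting on the training split only looks like this (the tiny train/test arrays are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[1.0], [5.0], [10.0]])  # made-up training feature
X_test = np.array([[0.0], [12.0]])          # test values outside the training range

scaler = MinMaxScaler()
scaler.fit(X_train)                    # learn min/max from training data only
X_test_scaled = scaler.transform(X_test)
print(X_test_scaled)                   # values fall outside [0, 1]
```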

2. Works well with sigmoid activation functions — The sigmoid is most sensitive to inputs near zero and saturates for large-magnitude inputs, so normalized inputs keep neurons in the responsive part of the curve.
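A quick numerical sketch of that saturation effect:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# The sigmoid's gradient is largest near z = 0 and vanishes as |z| grows,
# so inputs on a huge scale push the neuron into its flat, saturated region.
for z in [0.5, 5.0, 50.0]:
    grad = sigmoid(z) * (1 - sigmoid(z))
    print(z, grad)
```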

3. May not help models that are scale-invariant, like linear regression — Ordinary linear regression produces the same predictions regardless of feature scale, since the coefficients simply rescale to compensate (regularized variants such as ridge and lasso, however, are sensitive to scale).
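A quick way to see this for yourself (toy data, ordinary least squares without regularization):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up regression data
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=50)

X_scaled = X * 1000.0  # multiply every feature by a large constant

pred = LinearRegression().fit(X, y).predict(X)
pred_scaled = LinearRegression().fit(X_scaled, y).predict(X_scaled)

print(np.allclose(pred, pred_scaled))  # True: the coefficients rescale to compensate
```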

Useful Python Code

Using the scikit-learn library's minmax_scale function

import numpy as np
import pandas as pd
from sklearn.preprocessing import minmax_scale

# example df of 6 rows and 3 columns
df = pd.DataFrame(np.random.randn(6, 3))
normalized = minmax_scale(df)

print(normalized)

This outputs something like the following (actual values will vary, since the input is random):

[[0.28577697 0.6581599  0.        ]
 [0.         1.         0.48624898]
 [0.17229142 0.49684344 0.57570335]
 [0.37777359 0.42279875 0.69525902]
 [1.         0.         0.08697903]
 [0.46528936 0.74879511 1.        ]]
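Equivalently, the MinMaxScaler class performs the same scaling while remembering the fitted bounds, which also lets you undo the transformation later:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# example df of 6 rows and 3 columns
df = pd.DataFrame(np.random.randn(6, 3))

scaler = MinMaxScaler()
normalized = scaler.fit_transform(df)             # same result as minmax_scale(df)
recovered = scaler.inverse_transform(normalized)  # undo the scaling

print(np.allclose(recovered, df.values))  # True
```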

And that’s all! I will leave you with some “fun” trivia 😊

Trivia

  • “normalization” refers to transforming data to be more regular, standardized, and aligned to statistical assumptions — hence more “normal”
  • Normalization has been used in machine learning since the 1950s — early neural network pioneers like Frank Rosenblatt recognized the need for it when developing the perceptron.



Data Scientist, University Instructor, and Chess enthusiast. ML specialist.