Data Normalization in Python

Published in

Digital Studio Stream

3 min readDec 24, 2020

When working on machine learning projects, you need to properly prepare the data before feeding it into a model. One method to perform on a dataset is normalization.

What is Normalized Data?

Normalized data within statistics in most cases involves eliminating units of measurement from a dataset. As a result, this enables you to easily compare data with different scales and are measured from different sources.

Why is Normalized Data Important?

When training a machine learning model, we aim to bring the data to a common scale and so the various features are less sensitive to each other. In this case, we can utilize data normalization as a method of transforming our data, which may be of different units or scales. This allows our model to train using features that could lead to more accurate predictions.

Methods of Normalizing Data

Simple Feature Scaling

The simple feature scaling will normalize a value between -1 and 1 by dividing by the max value in the dataset.

We can implement this in python:

#importing pandas
import pandas as pd#creating dataframe our data 
df = pd.read.csv('example.csv')#function which return return of simple eq
def norm(item):
	return (item)/(item.abs().max())#apply norm function to each item in dataframe
df = df.apply(norm)
df

We can implement this using scikit-learn:

#importing maxabsscaler 
from sklearn.preprocessing import MaxAbsScaler#creating maxabsscaler object
norm = MaxAbsScaler()#applying norm to dataframe
df_norm = pd.Dataframe(norm.fit_transform(df), columns=df.columns)df_norm

Min-Max

The min-max method will scale the feature to a fixed range between 0 and 1.

We can implement this in python:

#function which return return of min-max eq
def norm(item):
	return (item - item.min())/(item.max() - item.min())#apply norm function to each item in dataframe
df = df.apply(norm)df

We can implement this using scikit-learn:

#importing minmaxscaler 
from sklearn.preprocessing import MinMaxScaler#creating minmaxscaler object
norm = MinMaxScaler()#applying norm to dataframe
df_norm = pd.Dataframe(norm.fit_transform(df), columns=df.columns)df_norm

Z-Score

The Z-Score is the measure of standard deviations between the actual value and a predicted value. In order to calculate this value, we must first know the mean value and the standard deviation.

We can implement this in python:

#function which return return of z-score eq
def z_score(item):
	return (item - item.mean())/(item.std())#apply norm function to each item in dataframe
df = df.apply(z_score)df

We can implement this using scikit-learn:

#importing standardscaler 
from sklearn.preprocessing import StandardScaler#creating standardscaler object
norm = StandardScaler()#applying norm to dataframe
df_norm = pd.Dataframe(norm.fit_transform(df), columns=df.columns)df_norm

Normalizing data allows for transforming each item to a common scale. Implementing data normalization is simple as we have shown by also utilizing scikit-learn to easily normalize without using the equations.

Linear Regression Using Scikit-Learn Library

Linear Regression is the simplest and the first machine learning algorithm one would study. In this post, I further…

medium.com

Python Packages for Data Science

Essential Packages for Data Science

medium.com

Privacy In A Big Data Economy

Big data has significant benefits but presents challenges in creating a policy that protects privacy in the big data…

medium.com

Data Normalization in Python

What is Normalized Data?

Why is Normalized Data Important?

Methods of Normalizing Data

Simple Feature Scaling

Min-Max

Z-Score

Linear Regression Using Scikit-Learn Library

Linear Regression is the simplest and the first machine learning algorithm one would study. In this post, I further…

Python Packages for Data Science

Essential Packages for Data Science

Privacy In A Big Data Economy

Big data has significant benefits but presents challenges in creating a policy that protects privacy in the big data…

Written by Jose Dominguez