## Methods of Feature Scaling with scikit-learn | Towards AI

# Feature Scaling with Python’s scikit-learn

One of the primary objectives of normalization is to bring the data close to zero. That makes the optimization problem more “numerically stable”.

Scaling by the mean and standard deviation works best when the data is roughly normally distributed, that is, when most of the data is sufficiently close to the mean. Shifting the mean to zero then ensures that most components of most data points are close to 0. Specifically, for normally distributed data, about 68% of the standardized values fall between -1 and 1.
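The 68% figure can be checked numerically. The snippet below is a minimal sketch, assuming a synthetic normally distributed feature (the mean of 50, the standard deviation of 3, and the variable names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature: normally distributed, mean 50, standard deviation 3
x = rng.normal(loc=50, scale=3, size=100_000)

# Standardize: subtract the mean, divide by the standard deviation
z = (x - x.mean()) / x.std()

# Fraction of standardized values that lie between -1 and 1
frac_within_one = np.mean(np.abs(z) <= 1)
print(f"Fraction within one standard deviation: {frac_within_one:.3f}")
```

For data that is far from normal, this fraction can differ noticeably from 68%.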

In this post we explore four methods of feature scaling that are implemented in scikit-learn:

`StandardScaler`

`MinMaxScaler`

`RobustScaler`

`Normalizer`

# Standard Scaler

The `StandardScaler` works best when the data within each feature is roughly normally distributed. It rescales each feature so that its distribution is centered around 0 with a standard deviation of 1.

The mean and standard deviation are calculated for each feature, and the feature is then scaled as:

$$x_{\text{scaled}} = \frac{x_i - \operatorname{mean}(x)}{\operatorname{stdev}(x)}$$

If the data is not normally distributed, this is not the best scaler to use.

Let’s take a look at it in action:

```python
import pandas as pd
import numpy as np
from sklearn import preprocessing
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter-only magic: render plots inline in the notebook
%matplotlib inline

matplotlib.style.use('ggplot')
```

```python
np.random.seed(1)

# Three normally distributed features with different means and spreads
df = pd.DataFrame({
    'x1': np.random.normal(0, 2, 10000),
    'x2': np.random.normal(5, 3, 10000),
    'x3': np.random.normal(-5, 5, 10000)
})

scaler = preprocessing.StandardScaler()
scaled_df = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_df, columns=['x1', 'x2', 'x3'])

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 5))

ax1.set_title('Before Scaling')
sns.kdeplot(df['x1'], ax=ax1)
sns.kdeplot(df['x2'], ax=ax1)
sns.kdeplot(df['x3'], ax=ax1)

ax2.set_title('After Standard Scaler')
sns.kdeplot(scaled_df['x1'], ax=ax2)
sns.kdeplot(scaled_df['x2'], ax=ax2)
sns.kdeplot(scaled_df['x3'], ax=ax2)

plt.show()
```

All features are now on the same scale relative to one another.
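To make the arithmetic behind the standard scaler concrete, here is a minimal sketch that standardizes a small synthetic array by hand and compares the result with `StandardScaler`. The array `X` and its parameters are made up for illustration. The comparison relies on `StandardScaler` using the population standard deviation (`ddof=0`), which is also NumPy's default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: three columns with different means and spreads
X = rng.normal(loc=[0, 5, -5], scale=[2, 3, 5], size=(1000, 3))

# Manual standardization, column by column
manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same transformation with scikit-learn
scaled = StandardScaler().fit_transform(X)

same = np.allclose(manual, scaled)
print("Manual result matches StandardScaler:", same)
print("Column means after scaling:", scaled.mean(axis=0).round(6))
print("Column stds after scaling: ", scaled.std(axis=0).round(6))
```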

# Min-Max Scaler

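The `MinMaxScaler` is probably the most famous scaling algorithm, and applies the following formula to each feature:

$$x_{\text{scaled}} = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$

It shrinks each feature so that its values lie between 0 and 1 by default. A different target range, such as -1 to 1, can be requested through the `feature_range` parameter.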

This scaler works better for cases in which the standard scaler might not work so well. If the distribution is not Gaussian or the standard deviation is very small, the min-max scaler works better.

However, it is sensitive to outliers, so if there are outliers in the data, you might want to consider the `RobustScaler` below.

For now, let’s see the min-max scaler in action:

```python
df = pd.DataFrame({
    # positive skew
    'x1': np.random.chisquare(8, 1000),
    # negative skew
    'x2': np.random.beta(8, 2, 1000) * 40,
    # no skew
    'x3': np.random.normal(50, 3, 1000)
})

scaler = preprocessing.MinMaxScaler()
scaled_df = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_df, columns=['x1', 'x2', 'x3'])

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 5))

ax1.set_title('Before Scaling')
sns.kdeplot(df['x1'], ax=ax1)
sns.kdeplot(df['x2'], ax=ax1)
sns.kdeplot(df['x3'], ax=ax1)

ax2.set_title('After Min-Max Scaling')
sns.kdeplot(scaled_df['x1'], ax=ax2)
sns.kdeplot(scaled_df['x2'], ax=ax2)
sns.kdeplot(scaled_df['x3'], ax=ax2)

plt.show()
```

Notice that the skewness of the distribution is maintained but the 3 distributions are brought into the same scale so that they overlap.
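The min-max computation is simple enough to reproduce by hand. The sketch below uses a made-up two-column array, one column of which is entirely negative. It compares the manual result with `MinMaxScaler` and shows how the `feature_range` parameter selects a different target interval.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Hypothetical data: one positive, skewed column and one all-negative column
X = np.column_stack([
    rng.chisquare(8, 1000),
    rng.normal(-50, 3, 1000),
])

# Manual min-max scaling, column by column
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Default target range is [0, 1], even when the input is negative
scaled = MinMaxScaler().fit_transform(X)

# A range of [-1, 1] has to be requested explicitly
scaled_sym = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

print("Manual result matches MinMaxScaler:", np.allclose(manual, scaled))
print("Default range:  ", scaled.min(axis=0), scaled.max(axis=0))
print("Symmetric range:", scaled_sym.min(axis=0), scaled_sym.max(axis=0))
```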

# Robust Scaler

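The `RobustScaler` uses a similar method to the min-max scaler, but it relies on the median and the interquartile range rather than the minimum and maximum, so it is robust to outliers. With scikit-learn's default settings, it applies the following formula to each feature:

$$x_{\text{scaled}} = \frac{x_i - \operatorname{median}(x)}{Q_3(x) - Q_1(x)}$$

where $Q_1(x)$ and $Q_3(x)$ are the 25th and 75th percentiles of the feature.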

Because the median and the quartiles ignore the extreme tails of the data, a handful of outliers barely changes the scaling, which makes this scaler more suitable when outliers are present.

Let’s take a look at this one in action on some data with outliers:

```python
x = pd.DataFrame({
    # Distribution with lower outliers
    'x1': np.concatenate([np.random.normal(20, 1, 1000), np.random.normal(1, 1, 25)]),
    # Distribution with higher outliers
    'x2': np.concatenate([np.random.normal(30, 1, 1000), np.random.normal(50, 1, 25)]),
})

scaler = preprocessing.RobustScaler()
robust_scaled_df = scaler.fit_transform(x)
robust_scaled_df = pd.DataFrame(robust_scaled_df, columns=['x1', 'x2'])

scaler = preprocessing.MinMaxScaler()
minmax_scaled_df = scaler.fit_transform(x)
minmax_scaled_df = pd.DataFrame(minmax_scaled_df, columns=['x1', 'x2'])

fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(9, 5))

ax1.set_title('Before Scaling')
sns.kdeplot(x['x1'], ax=ax1)
sns.kdeplot(x['x2'], ax=ax1)

ax2.set_title('After Robust Scaling')
sns.kdeplot(robust_scaled_df['x1'], ax=ax2)
sns.kdeplot(robust_scaled_df['x2'], ax=ax2)

ax3.set_title('After Min-Max Scaling')
sns.kdeplot(minmax_scaled_df['x1'], ax=ax3)
sns.kdeplot(minmax_scaled_df['x2'], ax=ax3)

plt.show()
```

Notice that after Robust scaling, the distributions are brought into the same scale and overlap, but the outliers remain outside of the bulk of the new distributions.

However, in Min-Max scaling, the outliers define the 0 and 1 endpoints, so the bulk of each distribution is squeezed toward opposite ends of the 0–1 range and the two normal distributions are kept separate.
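The robust scaling step can also be written out by hand. The sketch below assumes `RobustScaler` is used with its default settings, under which it centers each feature on its median and divides by the range between the 25th and 75th percentiles. The data is a made-up single feature with a few low outliers.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)

# Hypothetical feature: bulk around 20, plus 25 low outliers around 1
x = np.concatenate([
    rng.normal(20, 1, 1000),
    rng.normal(1, 1, 25),
]).reshape(-1, 1)

# Manual robust scaling: subtract the median, divide by the interquartile range
q1, median, q3 = np.percentile(x, [25, 50, 75])
manual = (x - median) / (q3 - q1)

# The same transformation with scikit-learn (default settings)
scaled = RobustScaler().fit_transform(x)

print("Manual result matches RobustScaler:", np.allclose(manual, scaled))
print("Median after scaling:", np.median(scaled))
```

The outliers still end up far from the bulk after scaling, but they have almost no influence on where the bulk itself lands.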

# Normalizer

The `Normalizer` scales each sample (row) rather than each feature: it divides every value in a sample by that sample's magnitude in n-dimensional space, where n is the number of features.

Say your features were x, y, and z Cartesian coordinates; your scaled value for x would be:

$$x_{\text{scaled}} = \frac{x_i}{\sqrt{x_i^2 + y_i^2 + z_i^2}}$$

Each non-zero point is now exactly 1 unit away from the origin of this Cartesian coordinate system.

```python
# Registers the '3d' projection on older matplotlib versions
from mpl_toolkits.mplot3d import Axes3D

df = pd.DataFrame({
    'x1': np.random.randint(-100, 100, 1000).astype(float),
    'y1': np.random.randint(-80, 80, 1000).astype(float),
    'z1': np.random.randint(-150, 150, 1000).astype(float),
})

scaler = preprocessing.Normalizer()
scaled_df = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_df, columns=df.columns)

fig = plt.figure(figsize=(9, 5))
ax1 = fig.add_subplot(121, projection='3d')
ax2 = fig.add_subplot(122, projection='3d')

ax1.scatter(df['x1'], df['y1'], df['z1'])
ax2.scatter(scaled_df['x1'], scaled_df['y1'], scaled_df['z1'])

plt.show()
```

Note that the points are all brought onto the surface of a sphere of radius 1 centered at the origin. Also, the axes that previously had different scales now share one scale.
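The row-wise nature of this scaler is easy to verify. The following minimal sketch assumes the default L2 norm and uses made-up continuous data, so that no row is exactly zero. It divides each row by its Euclidean length and compares the result with `Normalizer`.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

rng = np.random.default_rng(0)

# Hypothetical data: 1000 points in 3-dimensional space
X = rng.uniform(-100, 100, size=(1000, 3))

# Manual normalization: divide each row by its Euclidean (L2) length
norms = np.linalg.norm(X, axis=1, keepdims=True)
manual = X / norms

# The same transformation with scikit-learn (default norm='l2')
scaled = Normalizer().fit_transform(X)

# Every normalized row should have length 1
row_lengths = np.linalg.norm(scaled, axis=1)

print("Manual result matches Normalizer:", np.allclose(manual, scaled))
print("Min and max row length:", row_lengths.min(), row_lengths.max())
```

Because each row is scaled independently, `fit` learns nothing from the data here, unlike the three column-wise scalers above.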