Techniques of Feature Scaling with SAS Custom Macro that can increase the Accuracy and Performance of Predictive Machine Learning Models

suraj saini (Amar)
Published in Analytics Vidhya
4 min read · Dec 18, 2020

Part 1

What is Feature Scaling?

Feature scaling is the process of bringing numeric features onto a comparable scale, and it is one of the most important steps in data pre-processing. It is applied before data is fed into machine learning, deep learning and statistical algorithms/models. Scaling often improves model performance, especially for models based on Euclidean distance (such as k-nearest neighbours and k-means clustering), where a feature with a large numeric range would otherwise dominate the distance. Normalization and Standardization are the two main techniques of feature scaling. In this article, I define each technique and show how to implement it in SAS Studio or Base SAS using the SAS Macro facility.
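The following short Python sketch shows why distance-based models are sensitive to scale. It compares Euclidean distances before and after min-max scaling. The three records and the population ranges used for scaling (age 18 to 78, income 0 to 100,000) are hypothetical values chosen only for illustration.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Hypothetical records: (age in years, annual income in dollars).
a = (25, 50_000)
b = (60, 50_000)   # same income as a, 35 years older
c = (26, 50_800)   # one year older than a, $800 more income

# Hypothetical population ranges used for min-max scaling.
AGE_MIN, AGE_MAX = 18, 78
INC_MIN, INC_MAX = 0, 100_000


def min_max(record):
    """Scale each feature of a record to [0, 1] using the assumed ranges."""
    age, income = record
    return ((age - AGE_MIN) / (AGE_MAX - AGE_MIN),
            (income - INC_MIN) / (INC_MAX - INC_MIN))


raw_ab, raw_ac = euclidean(a, b), euclidean(a, c)
scaled_ab = euclidean(min_max(a), min_max(b))
scaled_ac = euclidean(min_max(a), min_max(c))

print(f"raw:    a-b = {raw_ab:.1f}, a-c = {raw_ac:.1f}")
print(f"scaled: a-b = {scaled_ab:.3f}, a-c = {scaled_ac:.3f}")
```

With raw values, record b (35 years older, same income) appears far closer to a than record c (one year older, $800 more income). This happens only because income is measured in larger units. After scaling, c becomes the nearer neighbour, which matches intuition.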

What is Normalization?

Normalization is the process of feature scaling in which data values are rescaled into a fixed range, most commonly between 0 and 1 or between -1 and 1. Min_MaxScaler and Mean_Normalization are common examples of Normalization.

1. Min_MaxScaler

It rescales the data values to the range 0 to 1. The mathematical formula is:

x_scaled = (x - min(x)) / (max(x) - min(x))
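The formula maps directly to code. Below is a minimal Python sketch of the same arithmetic. It is not the SAS macro from the GitHub repository. The `feature_range` parameter is a hypothetical addition that generalizes the formula to any target interval [a, b] as a + (x - min)(b - a) / (max - min), which also covers the -1 to 1 range mentioned earlier.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale values into feature_range using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has zero range; dividing by it is undefined.
        raise ValueError("min-max scaling needs at least two distinct values")
    a, b = feature_range
    return [a + (x - lo) * (b - a) / (hi - lo) for x in values]


print(min_max_scale([10, 20, 30, 50]))
print(min_max_scale([10, 20, 30, 50], feature_range=(-1, 1)))
```

The first call returns [0.0, 0.25, 0.5, 1.0] and the second returns [-1.0, -0.5, 0.0, 1.0].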

1.1 How can you use Min_MaxScaler in SAS?

The code is available on GitHub: https://github.com/Suraj-617/Blogs/blob/master/Techniques%20of%20Feature%20Scaling%20with%20SAS%20Custom%20Macro-%20A.sas

1.2 Min_MaxScaler SAS Custom Macro Definition

1.3 What does Min_MaxScaler SAS Macro do behind the scenes?

Min_MaxScaler takes the variable that you want to scale and creates a new variable “MMVariableName” with scaled values. It also creates a univariate report where you can see the histograms of both the Actual Variable and the new Scaled Variable.

2. Mean_Normalization

It centres the data values on the mean and rescales them into the range -1 to 1. The mathematical formula is:

x_scaled = (x - mean(x)) / (max(x) - min(x))
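Here is a minimal Python sketch of mean normalization, written as an illustration of the formula rather than a port of the SAS macro. Because the numerator is the distance from the mean and the denominator is the full range, every result falls between -1 and 1.

```python
from statistics import fmean


def mean_normalize(values):
    """Centre on the mean and divide by the range: (x - mean) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has zero range; dividing by it is undefined.
        raise ValueError("mean normalization needs at least two distinct values")
    mu = fmean(values)
    return [(x - mu) / (hi - lo) for x in values]


scaled = mean_normalize([10, 20, 30, 50])
print(scaled)
```

For [10, 20, 30, 50], the mean is 27.5 and the range is 40, so the result is [-0.4375, -0.1875, 0.0625, 0.5625]. The values are centred on zero, and their spread (max minus min) is exactly 1.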

2.1 How can you use Mean_Normalization in SAS?

2.2 Mean_Normalization SAS Custom Macro Definition

2.3 What does Mean_Normalization SAS Macro do behind the scenes?

Mean_Normalization takes the variable that you want to scale and creates a new variable “MNVariableName” with scaled values. It also creates a univariate report where you can see the histograms of both the Actual Variable and the new Scaled Variable.

What is Standardization?

Standardization is a technique of feature scaling in which data values are centred on the mean and divided by the standard deviation. After standardization, the data have a mean of 0 and a standard deviation (and therefore a variance) of 1.

Related reading: How to Use StandardScaler and MinMaxScaler Transforms in Python

3. Standard_Scaler

It rescales the data values so that their mean is 0 and their standard deviation is 1. The mathematical formula is:

z = (x - mean(x)) / sd(x)
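Here is a minimal Python sketch of the z-score formula, offered as an illustration rather than the macro itself. One detail matters when comparing results across tools: the standard deviation can use an n - 1 (sample) or n (population) divisor. SAS procedures such as PROC MEANS default to n - 1, while scikit-learn's StandardScaler divides by n. Which divisor the SAS macro uses is worth checking, so the sketch exposes it through a hypothetical `sample` flag.

```python
from statistics import fmean, pstdev, stdev


def standard_scale(values, sample=True):
    """z = (x - mean) / sd; sample=True uses the n - 1 divisor, False uses n."""
    mu = fmean(values)
    sd = stdev(values) if sample else pstdev(values)
    if sd == 0:
        raise ValueError("standardization needs a non-zero standard deviation")
    return [(x - mu) / sd for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]
z = standard_scale(data)
print(fmean(z), stdev(z))  # mean ~0 and sample standard deviation ~1, up to rounding
```

With `sample=False`, the same data give [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0], because the population standard deviation of that list is exactly 2.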

“Standardization assumes that your observations fit a Gaussian distribution (bell curve) with a well-behaved mean and standard deviation. You can still standardize your data if this expectation is not met, but you may not get reliable results.”

3.1 How can you use Standard_Scaler in SAS?

3.2 Standard_Scaler SAS Custom Macro Definition

3.3 What does Standard_Scaler SAS Custom Macro do behind the scenes?

Standard_Scaler takes the variable that you want to scale and creates a new variable “SDVariableName” with scaled values. It also creates a univariate report where you can see the histograms of both the Actual Variable and the new Scaled Variable.

4. Robust_Scaler

Robust_Scaler first subtracts the median from each data value and then divides by the IQR, the interquartile range (third quartile minus first quartile, Q3 - Q1). This centres the median at zero. Because the median and the IQR are barely affected by extreme values, the method is robust to outliers. The mathematical formula is:

x_scaled = (x - median(x)) / (Q3 - Q1)
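Here is a minimal Python sketch of robust scaling. One caveat applies: quartiles have several competing definitions. This sketch uses `statistics.quantiles` with `method="inclusive"` (linear interpolation between data points). SAS procedures let you choose among several percentile definitions, so Q1 and Q3 here may differ slightly from the values in the macro's STAT table.

```python
from statistics import median, quantiles


def robust_scale(values):
    """(x - median) / IQR, where IQR = Q3 - Q1 (inclusive/linear quartiles)."""
    med = median(values)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("robust scaling needs a non-zero interquartile range")
    return [(x - med) / iqr for x in values]


data = [1, 2, 3, 4, 5, 6, 7, 100]  # 100 is an outlier
result = robust_scale(data)
print(result)
```

For this data, the median is 4.5, Q1 is 2.75 and Q3 is 6.25 (IQR = 3.5). The value 1 maps to -1.0 and the outlier 100 maps to about 27.3, while 2 to 7 spread out between about -0.71 and 0.71. By contrast, min-max scaling would squeeze 1 to 7 into roughly the bottom 6% of the range.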

4.1 How can you use Robust_Scaler in SAS?

4.2 Robust_Scaler SAS Custom Macro Definition

4.3 What does Robust_Scaler SAS Custom Macro do behind the scenes?

Robust_Scaler takes the variable that you want to scale and creates a new variable “RSVariableName” with scaled values. In the WORK library, it creates a STAT table containing the median, the first quartile (Q1) and the third quartile (Q3), so you can verify your results. It also creates a univariate report where you can see the histograms of both the Actual Variable and the new Scaled Variable.
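All four macros follow the same pattern. Each one keeps the original variable and adds a new variable whose name is a two-letter prefix (MM, MN, SD or RS) followed by the original name. Below is a rough Python sketch of that pattern applied to a list of dictionaries. Several details are assumptions: the helper names, the `Income` variable and its values are hypothetical. The standard deviation uses the sample (n - 1) divisor and the quartiles use linear interpolation, which may not match the SAS macros exactly. The univariate histogram report is not reproduced.

```python
from statistics import fmean, median, quantiles, stdev

# All scalers below assume the variable is not constant (non-zero range/sd/IQR).


def min_max(v):
    lo, hi = min(v), max(v)
    return [(x - lo) / (hi - lo) for x in v]


def mean_norm(v):
    lo, hi, mu = min(v), max(v), fmean(v)
    return [(x - mu) / (hi - lo) for x in v]


def standard(v):
    mu, sd = fmean(v), stdev(v)  # sample standard deviation (n - 1 divisor)
    return [(x - mu) / sd for x in v]


def robust(v):
    med = median(v)
    q1, _, q3 = quantiles(v, n=4, method="inclusive")
    return [(x - med) / (q3 - q1) for x in v]


# Prefix -> scaler, following the MM/MN/SD/RS naming described in the article.
SCALERS = {"MM": min_max, "MN": mean_norm, "SD": standard, "RS": robust}


def add_scaled_columns(rows, var, prefixes=("MM", "MN", "SD", "RS")):
    """Return copies of rows with one new key per prefix, e.g. 'MM' + var."""
    values = [row[var] for row in rows]
    out = [dict(row) for row in rows]  # copies, so the input rows are unchanged
    for prefix in prefixes:
        for row, scaled in zip(out, SCALERS[prefix](values)):
            row[prefix + var] = scaled
    return out


rows = [{"Income": v} for v in [10, 20, 30, 50]]
scaled_rows = add_scaled_columns(rows, "Income")
print(scaled_rows[0])
```

Each returned row still holds `Income` alongside `MMIncome`, `MNIncome`, `SDIncome` and `RSIncome`. This mirrors how the macros keep the actual variable so that both can be compared.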


SAS Certified Programming Specialist, passionate about Machine Learning, Feature Engineering and Data Science.