Understanding Power Transformer
Power transforms are a technique for transforming numerical input or output variables so that their probability distribution becomes more Gaussian.
This is often described as removing skew from the distribution, and more generally as stabilizing its variance.
Many machine learning algorithms assume, or perform better when, numerical variables have a Gaussian (normal) probability distribution.
Power transforms are a family of parametric, monotonic transformations that are applied to make data more Gaussian-like. This is useful for modeling issues related to heteroscedasticity (non-constant variance), or other situations where normality is desired.
Currently, Power Transformer supports the Box-Cox transform and the Yeo-Johnson transform. The optimal parameter for stabilizing variance and minimizing skewness is estimated through maximum likelihood.
Box-Cox requires input data to be strictly positive, while Yeo-Johnson supports both positive and negative data.
By default, zero-mean, unit-variance normalization is applied to the transformed data.
Note: NaNs are treated as missing values: they are disregarded in ‘fit’ and maintained in ‘transform’.
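The input constraints and the NaN handling described above can be sketched as follows. This is an illustration only: it assumes the function behaves like scikit-learn's PowerTransformer, which the method names suggest but this page does not state, and the sample data is made up.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)

# Made-up column with negative values and one missing value.
x = rng.normal(loc=1.0, scale=2.0, size=(200, 1))
x[0, 0] = -3.0      # force at least one negative value
x[3, 0] = np.nan    # missing value

# Yeo-Johnson accepts positive and negative data.
x_yj = PowerTransformer(method="yeo-johnson").fit_transform(x)

# The NaN is ignored when lambda is estimated and kept in the output.
nan_kept = bool(np.isnan(x_yj[3, 0]))
others_finite = bool(np.isfinite(np.delete(x_yj, 3, axis=0)).all())

# Box-Cox rejects data that is not strictly positive.
try:
    PowerTransformer(method="box-cox").fit(x)
    box_cox_failed = False
except ValueError:
    box_cox_failed = True

print(nan_kept, others_finite, box_cox_failed)
```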
Application- Use it when a more Gaussian-like distribution of the variable is desired.
Note: Power Transformer is useful for linear-based algorithms and is not required for tree-based algorithms such as Decision Trees and Random Forests.
Example- We can generate a sample of random Gaussian numbers and impose a skew on the distribution by exponentiating the values. The Power Transformer can then be used to automatically remove the skew from the data.
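A minimal sketch of this example, assuming scikit-learn's PowerTransformer and SciPy's skew function stand in for the ATH function (an assumption, since this page does not name the underlying library). The sample size and random seed are arbitrary.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)

# Gaussian sample, then exponentiate to impose a right skew.
gaussian = rng.normal(size=1000)
skewed = np.exp(gaussian).reshape(-1, 1)  # single column as a 2-D array

# The data is strictly positive, so 'box-cox' would also be valid here.
pt = PowerTransformer(method="yeo-johnson", standardize=True)
transformed = pt.fit_transform(skewed)

skew_before = skew(skewed.ravel())
skew_after = skew(transformed.ravel())
print(f"skew before: {skew_before:.2f}, after: {skew_after:.2f}")
```

The skewness after the transform should be much closer to zero than before, which is what "removing the skew" means in practice.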
Input- In ATH, to run the function, select the numeric data column(s), and use the path:
Data Mining => Data Transformers (Tree Based) => Power Transformer (Py) to launch the function.
The user needs to specify the following inputs to the function:
Power Transform Method- ‘yeo-johnson’ works with positive and negative values, while ‘box-cox’ works only with strictly positive values.
Dataobjectname- Provide a name to save the model to file.
Standardize- Set to True to apply zero-mean, unit-variance normalization to the transformed output.
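The effect of the Standardize option can be illustrated with a short sketch. It assumes the option maps to the standardize parameter of scikit-learn's PowerTransformer, which is not confirmed by this page, and the input column is made up.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=(500, 1))  # strictly positive, right-skewed

standardized = PowerTransformer(method="box-cox", standardize=True).fit_transform(x)
raw = PowerTransformer(method="box-cox", standardize=False).fit_transform(x)

# With standardize=True the output has mean ~0 and standard deviation ~1.
print(standardized.mean(), standardized.std())
# With standardize=False the output keeps the scale produced by the power transform.
print(raw.mean(), raw.std())
```

Both calls estimate the same lambda; the only difference is the final rescaling step.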
Output and Interpretation:
The output consists of two main parts: new columns in the data table and a model summary in the output table.
1. The table contains the following values:
lambdas_: the lambda (power parameter) chosen for each column
2. The plots below show the data before and after the power transformation.
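For reference, the per-column lambdas_ values can be reproduced outside ATH with a sketch like the one below. It assumes scikit-learn's PowerTransformer, whose fitted attribute has the same name; the two columns and their names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(2)

# Two hypothetical columns: one right-skewed, one already roughly Gaussian.
data = np.column_stack([
    rng.lognormal(size=500),
    rng.normal(loc=5.0, size=500),
])

pt = PowerTransformer(method="yeo-johnson").fit(data)

# One lambda per column, estimated by maximum likelihood.
for name, lam in zip(["skewed_col", "gaussian_col"], pt.lambdas_):
    print(f"{name}: lambda = {lam:.3f}")
```

A lambda close to 1 means the column is left almost unchanged, while values further from 1 indicate a stronger correction.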
See Also:
Apply Transformer Model (Py), Quantile Transformer (Py) on ATH LEAPS