Power Transformations for Machine Learning

Mouad En-nasiry
2 min read · Apr 5, 2024


Data transformation is a crucial step in machine learning to prepare data for analysis and model training. Power transformations are a family of techniques used to address skewed data, where the distribution of values is not symmetrical. This article explores some common power transformations used in machine learning.

Why Use Power Transformations?

  • Reduce Skewness: When data is skewed, the majority of values cluster on one side of the distribution. Power transformations can help normalize the data by compressing the spread on the side with higher values.
  • Homogenize Variance: In skewed data, the variance (spread) of data points often increases with the mean (average). Power transformations can help stabilize the variance across the data range, leading to more consistent model performance.
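The effect is easy to see on a skewed sample. The sketch below uses NumPy and SciPy on illustrative log-normal data (an assumption for demonstration): the raw sample is strongly right-skewed, and taking the log maps it back to a roughly symmetric shape.

```python
import numpy as np
from scipy.stats import skew

# Illustrative log-normal sample: strongly right-skewed by construction.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# The log transform recovers the underlying normal shape,
# so the skewness drops toward zero.
print(f"skewness before: {skew(x):.2f}")
print(f"skewness after:  {skew(np.log(x)):.2f}")
```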

Common Power Transformations

  1. Log Transformation (lambda = 0): This transformation takes the logarithm of each data point. It is useful when the data is highly skewed, strictly positive, and the variance increases with the mean.

y = log(x)

  2. Square Root Transformation (lambda = 0.5): This transformation takes the square root of each data point. It is a milder correction than the log, suited to moderately skewed, non-negative data.

y = sqrt(x)

  3. Box-Cox Transformation: This is a more general family of power transformations that includes the log and square root transformations as special cases. It introduces a lambda parameter, typically estimated from the data by maximum likelihood, which gives it the flexibility to handle different degrees of skewness. Like the log transformation, it requires strictly positive data.

y = [(x^lambda) - 1] / lambda    if lambda != 0
y = log(x)                       if lambda = 0
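SciPy's `scipy.stats.boxcox` both estimates lambda by maximum likelihood and applies the transform. A minimal sketch, using an illustrative log-normal sample (for which the best lambda should be close to 0, i.e. the log transform):

```python
import numpy as np
from scipy.stats import boxcox, skew

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1_000)  # strictly positive, right-skewed

# boxcox returns the transformed data together with the lambda that
# maximizes the log-likelihood of the transformed sample being normal.
y, fitted_lambda = boxcox(x)
print(f"fitted lambda: {fitted_lambda:.3f}")  # expected near 0 for log-normal data
print(f"skewness: {skew(x):.2f} -> {skew(y):.2f}")
```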

  4. Yeo-Johnson Transformation: This transformation is similar to Box-Cox, but it is defined for zero and negative values as well as positive ones, which makes it the safer default when the data is not strictly positive.

y = [(x + 1)^lambda - 1] / lambda                 if x >= 0, lambda != 0
y = log(x + 1)                                    if x >= 0, lambda = 0
y = -[(1 - x)^(2 - lambda) - 1] / (2 - lambda)    if x < 0, lambda != 2
y = -log(1 - x)                                   if x < 0, lambda = 2
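SciPy exposes this as `scipy.stats.yeojohnson`, with the same interface as `boxcox`. A sketch on illustrative skewed data that has been shifted to include negative values, which Box-Cox cannot handle:

```python
import numpy as np
from scipy.stats import yeojohnson, skew

rng = np.random.default_rng(0)
# Right-skewed data shifted so that part of it is negative.
x = rng.exponential(scale=2.0, size=5_000) - 1.0

# yeojohnson estimates lambda by maximum likelihood, like boxcox,
# but accepts zero and negative inputs.
y, fitted_lambda = yeojohnson(x)
print(f"fitted lambda: {fitted_lambda:.3f}")
print(f"skewness: {skew(x):.2f} -> {skew(y):.2f}")
```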

Power Transformation: In practice, the two methods above are often bundled into a single power transform with a method parameter that selects Box-Cox or Yeo-Johnson. The power lambda can be any value and is estimated from the data, typically by maximum likelihood, rather than chosen by hand.

y = [(x^lambda) - 1] / lambda                     if method = "box-cox" and lambda != 0
y = log(x)                                        if method = "box-cox" and lambda = 0
y = [(x + 1)^lambda - 1] / lambda                 if method = "yeo-johnson", x >= 0, lambda != 0
y = log(x + 1)                                    if method = "yeo-johnson", x >= 0, lambda = 0
y = -[(1 - x)^(2 - lambda) - 1] / (2 - lambda)    if method = "yeo-johnson", x < 0, lambda != 2
y = -log(1 - x)                                   if method = "yeo-johnson", x < 0, lambda = 2
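In scikit-learn this interface is `sklearn.preprocessing.PowerTransformer`, whose `method` parameter selects between the two. A sketch on synthetic data (the log-normal feature is illustrative):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1_000, 1))  # one skewed feature

# method="yeo-johnson" is the default and also accepts negative values;
# method="box-cox" requires strictly positive input.
# standardize=True additionally rescales the output to zero mean, unit variance.
pt = PowerTransformer(method="yeo-johnson", standardize=True)
X_t = pt.fit_transform(X)

print("fitted lambdas:", pt.lambdas_)  # one lambda per feature
```

Because the transformer follows the usual fit/transform API, it drops directly into a scikit-learn `Pipeline` ahead of a model.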

To Sum Up

Power transformations reduce skewness and stabilize variance before model training. The log and square root transformations are simple special cases; Box-Cox generalizes them for strictly positive data, and Yeo-Johnson extends the idea to zero and negative values.
