9. Data Transformations in ML: Log Transformer, Reciprocal Transformer, Square Transformer, Square Root Transformer, Box-Cox Transformer, Yeo-Johnson Transformer, and Right-skewed vs. Left-skewed Data.

Vinod Kumar G R
9 min read · Jan 20, 2024


As I mentioned earlier, we cannot feed raw data to a model as it is; we need to convert or transform it into a format the model can learn from. There are several ways to transform data, and the function transformer is one of these transformation techniques.

So let’s dive into today’s topic,

What is a Data Transformation?

Data transformations are like backstage artists preparing the actors (features) for the grand performance (model training). They apply a mathematical or user-defined function to each column of the dataset, performing custom transformations on the data. This process ensures that the data is not just fed to the model but is crafted and tailored to enhance its predictive power.
In short, data transformation is a method of reshaping the data with mathematical or custom functions so that your model gets better data for training.
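
To make this concrete, here is a minimal sketch (assuming scikit-learn is installed) using scikit-learn's FunctionTransformer, which wraps an arbitrary function so it can be used like any other preprocessing step:

import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Wrap np.log so it behaves like a standard scikit-learn preprocessing step
log_transformer = FunctionTransformer(func=np.log)

# One feature column with a long right tail (all values positive)
X = np.array([[1.0], [10.0], [100.0], [1000.0]])
X_transformed = log_transformer.fit_transform(X)

print(X_transformed.ravel())  # approximately [0. 2.303 4.605 6.908]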

Why do we need to use transformation on our data?

To understand why we apply these transformations, you first need to know about the normal distribution. What is a normal distribution?
The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution widely used in machine learning and statistical modeling. It is a bell-shaped curve that is symmetric around its mean and is fully characterized by its mean and standard deviation.

[Image: bell curve of a normal distribution. Source: Google]

The image above shows normally distributed data, but in practice data can also be skewed to the left or to the right. In right-skewed data, the tail stretches out to the right while most data points cluster on the left; in left-skewed data, the tail stretches to the left while most data points cluster on the right.

[Image: right-skewed, normal, and left-skewed distributions. Source: Google]

In the image above, the first graph shows right-skewed data (a positive skew), the last one shows left-skewed data (a negative skew), and the center one is a normal distribution.

Let’s understand Right-skewed data:

  • Mean: The mean is usually greater than the median due to the influence of the longer right tail.
  • Median: The median is the middle value when the data is ordered. In a right-skewed distribution, it is less than the mean.
  • Mode: The mode is the most frequent value. It tends to be on the left side, where the majority of the data is concentrated.

And now Left-skewed data:

  • Mean: The mean is usually less than the median due to the influence of the longer left tail.
  • Median: The median is still the middle value when the data is ordered. In a left-skewed distribution, it is greater than the mean.
  • Mode: The mode is still the most frequent value but now tends to be on the right side, where the majority of the data is concentrated.

Why is skewed data a problem for model training? Machines learn from patterns, and extreme values in a long tail can distort their understanding. We want the model to see the full story, not just a part of it, so we often reshape the data to make it more balanced, which helps the machine learn better. These transformations help you achieve that balance.
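
Here is a quick sketch, using a synthetic right-skewed sample, that verifies the mean/median relationship described above (the exponential distribution is just one convenient example of right-skewed data):

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)

# Exponential samples are a classic example of right-skewed data
right_skewed = rng.exponential(scale=2.0, size=10_000)

print("mean  :", right_skewed.mean())      # pulled up by the long right tail
print("median:", np.median(right_skewed))  # smaller than the mean
print("skew  :", skew(right_skewed))       # positive for right-skewed data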

Data Transformations are of two types:

  1. Function Transformation:
    It is a method that we use to transform the data by using some mathematical functions.
  2. Power Transformations:
    It is a method that involves raising each data point to a certain power.

Different techniques used in Function Transformation:

  1. Log Transformer.
  2. Reciprocal Transformer.
  3. Square Transformer.
  4. Square Root Transformer.

1. Log Transformer:

The log transformation is a mathematical operation applied to each data point in a dataset using the natural logarithm (base ‘e’). The natural logarithm of a number x is denoted as ln(x) or log_e(x).

Why Log Transformation?

The purpose of log transformation is to reduce the impact of extreme values and make the data more interpretable and suitable for certain types of analyses or modeling.

Mathematical Formula:

A mathematical formula for log transformer is,

y = log_e(x)

In this formula:

  • y is the transformed value after applying the logarithm.
  • x is the original value.
  • log_e denotes the natural logarithm, which has the base e.
  • e is the mathematical constant approximately equal to 2.71828.

Remember these points:

  • This transformation brings the data closer to a normal distribution, but it will not follow a normal distribution exactly.
  • This transformation cannot be applied to features containing zero or negative values, since the logarithm is undefined there.
  • This transformation is mostly applied to right-skewed data.
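
If a feature contains zeros, a common workaround (not covered above) is to use log(1 + x) instead of log(x); NumPy provides this as np.log1p. A quick sketch:

import numpy as np

counts = np.array([0, 1, 10, 100, 1000])

# log(1 + x) maps 0 to 0, whereas plain np.log(0) would give -inf
print(np.log1p(counts))  # approximately [0. 0.693 2.398 4.615 6.909]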

Simple code:

import numpy as np
import matplotlib.pyplot as plt

# Generate example data spanning several orders of magnitude
original_data = np.linspace(1, 100000, 100)

# Apply log transformation
log_transformed_data = np.log(original_data)

# Plot original and log-transformed data
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.scatter(range(len(original_data)), original_data)
plt.title('Original Data')

plt.subplot(1, 2, 2)
plt.scatter(range(len(log_transformed_data)), log_transformed_data)
plt.title('Log-Transformed Data')

plt.show()

You can use the above code to apply log transformation and plot the resultant graph of features before and after applying the transformation.

I have written the code in detail and explained each line of code. The colab file link is attached at the end of the article.

2. Reciprocal Transformation

It is another transformation technique that converts each non-zero data point x into its reciprocal, 1/x.

Why Reciprocal Transformation:

  • If a feature contains large values, taking the reciprocal can help scale them down.
  • In a right-skewed distribution where most values are small, taking the reciprocal spreads out the small values and pulls the large values closer together.

Mathematical Formula:

The reciprocal y of a non-zero number x is given by:

y = 1/x

  • x is the original value.
  • y is the reciprocal of x.

If x is a small positive number, its reciprocal will be a larger value, and vice versa.

Remember these points:

  • This transformation is not defined for zero.
  • This transformation reverses the order among values of the same sign: if x is a small positive number, its reciprocal will be a large value, and vice versa.

Simple Code:

import numpy as np

# Original data
data = np.array([2, 4, 8, 16, 32])

# Reciprocal transformation (note: 1/data uses float division;
# np.reciprocal on this integer array would truncate every result to 0)
reciprocal_data = 1 / data

# Print the results
print("Original Data:", data)
print("Reciprocal Transformation:", reciprocal_data)

#Output
Original Data: [ 2 4 8 16 32]
Reciprocal Transformation: [0.5 0.25 0.125 0.0625 0.03125]

3. Square Transformation:

This technique converts each data point x into its square (x²).

Why Square Transformation:

By squaring the values, the transformation can spread out the data, reducing the skewness and potentially achieving a more symmetrical distribution.

Formula:

y = x²

where y is a squared result and x is the original data point.

Remember these points:

  • This technique is applied to left-skewed data.
  • When you apply this technique, squaring a negative value results in a positive value.
  • However, note that squaring reverses the order of negative values: the more negative the value, the larger its square. E.g., (−5)² = 25, while (−50)² = 2500.

Simple Code:

import numpy as np

# Original data
data = np.array([-3, -2, -1, 0, 1, 2, 3])

# Squaring transformation
squared_data = np.square(data)

# Displaying the results
print("Original Data:", data)
print("Squared Data:", squared_data)

#Output
Original Data: [-3 -2 -1 0 1 2 3]
Squared Data: [9 4 1 0 1 4 9]

4. Square Root Transformation:

This technique involves taking the square root of each value in the data, i.e., applying the formula (sqrt(x)).

Why Square Root Transformation:

This transformation is beneficial when dealing with right-skewed data or a few extreme outliers. Taking the square root can help compress the larger values and make the distribution more symmetric.

Formula:

y = sqrt(x)

Remember these points:

  • This transformation is defined only for non-negative numbers.
  • This transformation is weaker than the log transformation, i.e., it compresses large values less aggressively (see the quick comparison below).
  • This can be used to reduce the skewness of right-skewed data.
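
A quick sketch of what "weaker" means here: on the same inputs, the square root compresses large values far less than the natural log does.

import numpy as np

values = np.array([1.0, 100.0, 10_000.0])

print("sqrt:", np.sqrt(values))  # [  1.  10. 100.] -- still spans two orders of magnitude
print("log :", np.log(values))   # [0.    4.605 9.21] -- compressed much more strongly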

Simple Code:

import numpy as np

# Original data points
data_points = np.array([4, 9, 16, 25, 36])

# Square root transformation
transformed_data = np.sqrt(data_points)

# Display original and transformed data
print("Original Data Points:", data_points)
print("Transformed Data Points (Square Root):", transformed_data)

#Output
Original Data Points: [ 4 9 16 25 36]
Transformed Data Points (Square Root): [2. 3. 4. 5. 6.]

Here is the Colab link with a detailed code explanation for function transformation. Go through it, and if you have any queries, let me know in the responses.

Different Power Transformations:

  1. Box-Cox Transformer.
  2. Yeo-Johnson Transformer.

1. Box-Cox Transformer:

The Box-Cox transformation is a statistical technique used in machine learning to stabilize the variance and make the data more closely approximate a normal distribution.

Why Box-Cox Transformer:

This transformation is particularly useful when dealing with data that exhibits heteroscedasticity (unequal variance across different levels of the independent variable) or non-constant spread.

Formula:

y(λ) = (y^λ − 1) / λ,   if λ ≠ 0
y(λ) = log_e(y),        if λ = 0

  • y represents the original variable (which must be strictly positive).
  • λ is the transformation parameter.
  • Candidate values of λ are tried, and the optimal value is chosen to maximize the normality of the resulting distribution.

When λ=0, the transformation simplifies to the natural logarithm of y. The choice of λ is crucial, and it is typically determined through optimization techniques to maximize the goodness of fit or log-likelihood.

The Box-Cox transformation is a generalization that includes square root, logarithmic, and reciprocal transformations as special cases.
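
To see these special cases, here is a small sketch using scipy.special.boxcox, which applies the Box-Cox formula with a fixed λ that you supply (unlike scipy.stats.boxcox, which also estimates λ):

import numpy as np
from scipy.special import boxcox  # Box-Cox with a fixed, user-supplied lambda

x = np.array([1.0, 4.0, 9.0, 16.0])

print(boxcox(x, 0.0))   # lambda = 0   -> log_e(x)
print(boxcox(x, 0.5))   # lambda = 0.5 -> a shifted, scaled square root: 2*(sqrt(x) - 1)
print(boxcox(x, -1.0))  # lambda = -1  -> a shifted reciprocal: 1 - 1/x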

Simple Code:

import numpy as np
from scipy.stats import boxcox

# Generate 10 random positive numbers
original_data = np.random.exponential(size=10)

# Apply Box-Cox transformation
transformed_data, lambda_value = boxcox(original_data)

# Display the original and transformed data
print("Original Data:")
print(original_data)
print("\nTransformed Data:")
print(transformed_data)
print("\nOptimal Lambda Value:", lambda_value)

#Output
Original Data:
[0.89621983 0.0483009 0.51123247 0.13397667 0.00573868 0.36838719
0.24917142 1.01316943 0.41818676 0.25416257]

Transformed Data:
[ 0.61002597 -2.02849658 -0.11039456 -1.09107594 -4.06239837 -0.5731429
-1.45740922 0.78800485 -0.03594684 -1.43746666]

Optimal Lambda Value: 0.2522738826462975

2. Yeo-Johnson Transformer:

The Yeo-Johnson transformation is an extension of the Box-Cox transformation, designed to handle both positive and negative values in the dataset. Similar to Box-Cox, the Yeo-Johnson transformation aims to stabilize variance, make the data more symmetric, and bring it closer to a normal distribution.

Why Yeo-Johnson transformation?

  • Unlike Box-Cox, Yeo-Johnson can handle datasets with both positive and negative values. It adapts its transformation based on the sign of the data.
  • Similar to Box-Cox, the Yeo-Johnson transformation searches for the optimal λ value that maximizes the normality of the transformed data.

Formula:

For y ≥ 0:
y(λ) = ((y + 1)^λ − 1) / λ,   if λ ≠ 0
y(λ) = log_e(y + 1),          if λ = 0

For y < 0:
y(λ) = −((−y + 1)^(2−λ) − 1) / (2 − λ),   if λ ≠ 2
y(λ) = −log_e(−y + 1),                    if λ = 2

  • y is the response variable.
  • λ is the transformation parameter.
  • As shown above, the transformation is applied differently depending on whether y is non-negative or negative.
  • The search for the optimal λ involves maximizing the normality of the resulting distribution.

Simple Code:

from scipy.stats import yeojohnson
import numpy as np

# Generating example data
original_data = np.array([2, 5, 8, 12, 15, 18, 22, 25, 28, 32])

# Applying Yeo-Johnson transformation
transformed_data, lambda_value = yeojohnson(original_data)

# Displaying results
print("Original Data:", original_data)
print("Transformed Data:", transformed_data)
print(f"Optimal lambda value: {lambda_value}")

#Output
Original Data: [ 2 5 8 12 15 18 22 25 28 32]
Transformed Data: [ 1.53657772 3.66474078 5.54892469 7.99999986 9.66474105 11.2700034
13.8563896 15.35812179 16.80080805 18.32525574]
Optimal lambda value: 1.2328483983981867
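
Since the example above uses only positive values, here is one more small sketch (with made-up numbers) confirming that yeojohnson also accepts zeros and negative values, which Box-Cox cannot handle:

from scipy.stats import yeojohnson
import numpy as np

# Mixed-sign data: Box-Cox would fail here, Yeo-Johnson works fine
mixed_sign_data = np.array([-10.0, -3.0, -0.5, 0.0, 0.5, 3.0, 10.0])

transformed, lambda_value = yeojohnson(mixed_sign_data)
print("Transformed:", np.round(transformed, 4))
print("Optimal lambda value:", lambda_value)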

Here is the Colab link with a detailed code explanation for power transformation. Go through it, and if you have any queries, let me know in the responses.
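
If you use scikit-learn pipelines, both power transformations above are also available through sklearn.preprocessing.PowerTransformer; here is a minimal sketch (the values are made up for illustration):

import numpy as np
from sklearn.preprocessing import PowerTransformer

X = np.array([[2.0], [5.0], [8.0], [12.0], [15.0], [18.0]])

# method can be 'yeo-johnson' (the default) or 'box-cox' (positive data only);
# standardize=True additionally rescales the output to zero mean and unit variance
pt = PowerTransformer(method='yeo-johnson', standardize=True)
X_transformed = pt.fit_transform(X)

print(X_transformed.ravel())
print("Fitted lambda(s):", pt.lambdas_)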

Two other important data transformation techniques remain; we'll cover those in another article, Data Transformations Part 2.

In conclusion, the journey through data transformations in machine learning has revealed the versatility and significance of tailoring our data to meet the assumptions and requirements of different models. From the simplicity of the Log Transformer to the adaptability of the Box-Cox and Yeo-Johnson methods, each transformation plays a crucial role in reshaping our datasets. Whether dealing with right-skewed or left-skewed data, these transformations empower us to extract meaningful patterns and enhance the performance of our machine learning models.

Previous article: 8. Data Encoding In Machine Learning

Next article: 10. Data Transformations in Machine Learning Part 2
