Feature Transformation - Part of Feature Engineering

Rina Mondal
6 min read · Jul 10, 2024


Feature transformation involves changing the original features in your dataset to new representations that may be more suitable for model building. This process can improve the performance of machine learning models by making the data more understandable and manageable for algorithms.

The key transformation techniques covered in this blog:

  1. Scaling: Normalization, Standardization
  2. Log Transformation
  3. Box-Cox Transformation
  4. Encoding Categorical Variables
  5. Binning
  6. Reciprocal Transformation
  7. Polynomial Features
  8. Interaction Features

Here are key aspects and techniques of feature transformation:

1. Scaling: Scaling helps prevent features with very large values from having undue influence over a model compared to features with smaller values that may be equally important as predictors.

Normalization: Rescaling features to a range, usually between 0 and 1. This is useful when you need to ensure that all features contribute equally to the distance computations in algorithms like KNN. This process can be particularly useful when the features in your data have different units or widely varying magnitudes.

Xi normalized = (Xi-Xmin) / (Xmax-Xmin)

# Let's do an example

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Example DataFrame with an 'age' column
data = {
    'age': [21, 30, 25, 35, 40]
}

df = pd.DataFrame(data)

# Initialize MinMaxScaler
scaler = MinMaxScaler()

# Fit and transform the 'age' column
df['scaled_age'] = scaler.fit_transform(df[['age']])

# Print the original and scaled DataFrame
print("Original DataFrame:")
print(df[['age']])
print("\nDataFrame with scaled age:")
print(df[['scaled_age']])

Standardization: Transforming features to have a mean of 0 and a standard deviation of 1. This is commonly used when the data follows a Gaussian distribution.
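
Xi standardized = (Xi − μ) / σ

where μ is the mean of the feature and σ is its standard deviation.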

# Let's do an example
import numpy as np
from sklearn.preprocessing import StandardScaler

# Assuming X is your feature matrix
X = np.array([[1, 2, 3],
              [4, 5, 6],
              [100, 500, 950]])

# Create a StandardScaler instance
scaler = StandardScaler()

# Fit the scaler on the data and transform it
X_standardized = scaler.fit_transform(X)

print("Original Data:\n", X)
print("\nStandardized Data:\n", X_standardized)

2. Log Transformation: Log transformation can help to normalize skewed data distributions, making them more normally distributed. This is especially useful for features with long tails or outliers.
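
Xi transformed = log(Xi)

Because the logarithm is undefined for zero and negative values, a shifted variant such as log(1 + x) (np.log1p in NumPy) is a common choice when the data contains zeros.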

#Let's do an example
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data with right-skewed distribution
data = np.array([1, 10, 100, 1000, 10000])

# Set up the figure for the original data
plt.figure(figsize=(8, 5))

# KDE plot for the original (unnormalized) data
sns.kdeplot(data, color='green', fill=True, label='Unnormalized Data')
plt.xlabel('Original Values')
plt.ylabel('Probability Density')
plt.title('Kernel Density Estimation (KDE) Plot')
plt.legend()

# Log transform the data
log_normalized_data = np.log(data)

# Set up the figure for the log-transformed data
plt.figure(figsize=(8, 5))

# KDE plot for the log-transformed data
sns.kdeplot(log_normalized_data, color='green', fill=True, label='Log-normalized Data')

# Add labels and title
plt.xlabel('Log-normalized Values')
plt.ylabel('Probability Density')
plt.title('Kernel Density Estimation (KDE) Plot')

# Add a legend
plt.legend()

# Display the plots
plt.show()

3. Box-Cox Transformation: A family of power transformations that can stabilize variance and make data more normally distributed. It is more flexible than log transformation.
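
For positive values of y, the Box-Cox transform is

y(λ) = (y^λ − 1) / λ when λ ≠ 0, and ln(y) when λ = 0

where λ is usually chosen by maximum likelihood so that the transformed data is as close to normal as possible. Up to constant shifts and scaling, common choices of λ correspond to familiar transformations: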

  • λ=0: The natural logarithm transformation (ln(y)).
  • λ=1: No transformation applied (y).
  • λ=−1: The reciprocal transformation (1/y).
  • λ=0.5: The square root transformation (√y).

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import boxcox, boxcox_normmax
from scipy.special import inv_boxcox

# Generate example data
data = np.random.exponential(scale=2, size=1000)

# Find the optimal lambda using maximum likelihood estimation
lambda_opt = boxcox_normmax(data)

# Apply Box-Cox transformation with the optimal lambda
data_boxcox = boxcox(data, lmbda=lambda_opt)

# Print the optimal lambda value
print(f"Optimal Lambda value: {lambda_opt}")

# Plot original and Box-Cox transformed data
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.hist(data, bins=30, edgecolor='black')
plt.title('Original Data')
plt.xlabel('Value')
plt.ylabel('Frequency')

plt.subplot(1, 2, 2)
plt.hist(data_boxcox, bins=30, edgecolor='black')
plt.title('Box-Cox Transformed Data')
plt.xlabel('Value')
plt.ylabel('Frequency')

plt.tight_layout()
plt.show()

4. Encoding Categorical Variables

Label Encoding: Assigning a unique integer to each category.

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample data
data = {
    'color': ['red', 'blue', 'green', 'blue', 'red', 'green']
}
df = pd.DataFrame(data)

# Initialize the LabelEncoder
label_encoder = LabelEncoder()

# Fit and transform the data
df['color_encoded'] = label_encoder.fit_transform(df['color'])

# Show the original and encoded data
print(df)
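
With this sample, LabelEncoder assigns integers in alphabetical order of the categories: blue → 0, green → 1, red → 2. Keep in mind that this introduces an arbitrary ordering, so label encoding is best suited to tree-based models or to encoding target labels rather than features of linear models.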

One-Hot Encoding: Creating binary columns for each category.

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Sample data
data = {
    'color': ['red', 'blue', 'green', 'blue', 'red', 'green']
}
df = pd.DataFrame(data)

# Initialize the OneHotEncoder
one_hot_encoder = OneHotEncoder(sparse_output=False)  # use sparse=False on scikit-learn < 1.2

# Fit and transform the data
one_hot_encoded = one_hot_encoder.fit_transform(df[['color']])

# Get the feature names for the one-hot encoded columns
one_hot_encoded_df = pd.DataFrame(one_hot_encoded, columns=one_hot_encoder.get_feature_names_out(['color']))

# Show the one-hot encoded DataFrame
print(one_hot_encoded_df)
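
When the one-hot encoded features feed a linear model, OneHotEncoder's drop='first' option can be used to drop one redundant column per feature and avoid perfect multicollinearity (the dummy-variable trap).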

Ordinal Encoding: Encoding categories based on their order.

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Sample data
data = {
    'Customer ID': [1, 2, 3, 4, 5],
    'Satisfaction Level': ['Satisfied', 'Very Unsatisfied', 'Neutral', 'Very Satisfied', 'Unsatisfied']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Define the categories with the specified order
categories = [['Very Unsatisfied', 'Unsatisfied', 'Neutral', 'Satisfied', 'Very Satisfied']]

# Initialize the OrdinalEncoder
encoder = OrdinalEncoder(categories=categories)

# Fit and transform the 'Satisfaction Level' column
df['Satisfaction Level Encoded'] = encoder.fit_transform(df[['Satisfaction Level']])

# Display the encoded DataFrame
print(df)

Binary Encoding: Combining label and one-hot encoding. In binary encoding, each category is first label encoded as an integer. Then, the integer is converted into binary digits. Each binary digit becomes a separate column. This can be more efficient than one-hot encoding when dealing with a large number of categories.

import pandas as pd
import category_encoders as ce

# Sample data
data = {
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)

# Initialize the BinaryEncoder
binary_encoder = ce.BinaryEncoder(cols=['city'])

# Fit and transform the data
df_encoded = binary_encoder.fit_transform(df)

# Show the original and encoded data
print(df_encoded)
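
For these four cities, binary encoding typically needs only three 0/1 columns instead of the four that one-hot encoding would create, and the saving grows quickly as the number of categories increases.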

5. Binning: Converting continuous features into categorical ones by dividing them into intervals. This can be useful in various situations, such as when you want to reduce the impact of minor observation errors or when preparing data for algorithms that require categorical input.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
data = {
    'age': [22, 25, 47, 35, 46, 50, 65, 68, 70, 45]
}
df = pd.DataFrame(data)

# Define the bins and labels
bins = [0, 18, 35, 50, 65, 100]
labels = ['Child', 'Youth', 'Adult', 'Middle-aged', 'Senior']

# Use cut to bin the continuous data
df['age_group'] = pd.cut(df['age'], bins=bins, labels=labels)

# Plot the number of observations in each age group
sns.countplot(data=df, x='age_group')
plt.show()

# Show the original and binned data
print(df)

6. Reciprocal Transformation: Transforming features to reduce skewness and to manage large differences in scale. This transformation is particularly useful when dealing with data that has a skewed distribution with a long tail towards larger values: it compresses large values and expands small values, making the distribution more symmetrical. Note that it cannot be applied to values of zero.

import pandas as pd

# Sample data
data = {
    'value': [1, 2, 5, 10, 20]
}
df = pd.DataFrame(data)

# Apply reciprocal transformation
df['reciprocal_value'] = 1 / df['value']

# Show the original and transformed data
print(df)

7. Polynomial Features: Creating new features by raising existing features to a power or by multiplying them together. Polynomial features are used in various scenarios where the relationship between the dependent and independent variables is non-linear. Some machine learning algorithms, like polynomial regression or certain types of support vector machines (SVMs), may require or benefit from polynomial features to perform better.

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Sample data
data = {
    'feature1': [1, 2],
    'feature2': [4, 5]
}
df = pd.DataFrame(data)

# Initialize the PolynomialFeatures with the desired degree
poly = PolynomialFeatures(degree=3, include_bias=False)

# Fit and transform the data
poly_features = poly.fit_transform(df)

# Show the polynomial features
print(poly_features)

#Output:
[[ 1. 4. 1. 4. 16. 1. 4. 16. 64.]
[ 2. 5. 4. 10. 25. 8. 20. 50. 125.]]

Here's the detailed breakdown of the columns in the output:

  • feature1
  • feature2
  • feature1^2
  • feature1 * feature2
  • feature2^2
  • feature1^3
  • feature1^2 * feature2
  • feature1 * feature2^2
  • feature2^3

8. Interaction Features: Creating new features that capture the interaction between existing features. They are particularly useful when there are non-linear relationships between variables or when the combined effect of multiple variables is important.

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 3, 4, 5, 6]
}
df = pd.DataFrame(data)

# Initialize PolynomialFeatures to capture interaction terms
interaction = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)

# Fit and transform the data to create interaction features
interaction_features = interaction.fit_transform(df)

# Convert the result to a DataFrame for better readability
interaction_df = pd.DataFrame(interaction_features, columns=interaction.get_feature_names_out(df.columns))

# Show the original and interaction features
print(interaction_df)

The difference between polynomial features and interaction features is that with interaction_only=True the expansion keeps the original features and their products (e.g. feature1 * feature2) but drops the pure powers, such as feature1^2 and feature2^2, that a full polynomial expansion would include.
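
As a quick check, here is a minimal sketch (reusing the sample data from the example above) that prints the column names each expansion produces; the expected names are shown as comments.

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Same sample data as the interaction example above
df = pd.DataFrame({'feature1': [1, 2, 3, 4, 5],
                   'feature2': [2, 3, 4, 5, 6]})

# Full degree-2 polynomial expansion: keeps the squared terms
poly = PolynomialFeatures(degree=2, include_bias=False).fit(df)
print(list(poly.get_feature_names_out()))
# ['feature1', 'feature2', 'feature1^2', 'feature1 feature2', 'feature2^2']

# Interaction-only degree-2 expansion: drops the squared terms
interaction = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False).fit(df)
print(list(interaction.get_feature_names_out()))
# ['feature1', 'feature2', 'feature1 feature2']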

Feature transformation is a critical step in the machine learning pipeline that can significantly impact model performance. It involves changing the original features into new formats that are more suitable for model building. This can include scaling, encoding, transforming distributions, and creating new features through polynomial or interaction terms. Each transformation technique has its use cases and benefits, and the right choice often depends on the nature of the data and the specific requirements of the machine learning model being used.

Related Blogs:

Feature Engineering and its types

Feature Transformation

Complete Data Science Roadmap

Give it 👏👏👏👏
If you found this guide helpful, why not show some love? Give it a clap 👏, and if you have questions or topics you'd like to explore further, drop a comment 💬 below 👇. If you appreciate my hard work, please follow me; that is the only way I can continue my passion.
