Box-Cox Transformation, Explained

Box-Cox transformation example (Image by Author)

The Box-Cox transformation transforms the data so that its distribution is as close to a normal distribution as possible, that is, so the histogram looks like a bell curve.

This technique has its place in feature engineering because not all types of predictive models are robust to skewed data, so it is worth trying when experimenting. It probably won’t provide a spectacular improvement, but at the fine-tuning stage it can serve its purpose by improving our evaluation metric.

All code examples can also be found in this Colab notebook.

Box-Cox Equation in code

The transformation itself has the following formula:

Box-Cox equation (Source)
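For readers who cannot see the image, the standard one-parameter Box-Cox definition is:

```latex
y_i^{(\lambda)} =
\begin{cases}
\dfrac{y_i^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0, \\[6pt]
\ln y_i & \text{if } \lambda = 0.
\end{cases}
```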

Let’s express it in code using the standard Python library:

Box-Cox implementation in Python (Image by Author)
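The original snippet is only available as an image, so here is a minimal sketch of what a standard-library implementation looks like (the function name `box_cox` is my own choice):

```python
import math

def box_cox(y, lmbda):
    """Box-Cox transform of a single positive value y for a given lambda."""
    if lmbda == 0:
        # The lambda = 0 branch is the natural logarithm
        return math.log(y)
    return (y ** lmbda - 1) / lmbda
```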

Or using NumPy package

Box-Cox implementation in NumPy (Image by Author)
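Again, since the screenshot is not reproduced here, a vectorized NumPy sketch of the same formula might look like this:

```python
import numpy as np

def box_cox_np(y, lmbda):
    """Vectorized Box-Cox transform for an array of positive values."""
    y = np.asarray(y, dtype=float)
    if lmbda == 0:
        return np.log(y)
    return (y ** lmbda - 1) / lmbda
```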

I have the data, but how to select the lambda?

The case is not complicated: we need a normality test, we compare its results for several lambdas in the (customary) range [-5, 5], and then we choose the lambda whose test result is best. An out-of-the-box solution is provided by the SciPy package.

When the second argument (lambda) is not passed to the boxcox function, it will be estimated and returned.

Box-Cox lambda selection in SciPy (Image by Author)
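Concretely, `scipy.stats.boxcox` returns both the transformed data and the fitted lambda when no lambda is supplied (internally it picks the value that maximizes the log-likelihood):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=1000)  # skewed, strictly positive

# With no lambda given, boxcox estimates it and returns it alongside the data
transformed, best_lambda = stats.boxcox(data)
```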

The only problem we encounter when using this implementation is the requirement that the input data elements must be greater than zero. However, we can simply shift the values using the minimum of the dataset.

Shift to positives function (Image by Author)
Shift to positives function example (Image by Author)
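The author’s helper is shown only as an image; a minimal sketch, assuming it subtracts the minimum and adds a small constant so every element becomes strictly positive, could look like this:

```python
import numpy as np

def shift_to_positives(values, eps=1e-6):
    """Shift an array so that every element is strictly positive."""
    values = np.asarray(values, dtype=float)
    minimum = values.min()
    if minimum <= 0:
        # Subtract the minimum (making it zero), then nudge above zero
        values = values - minimum + eps
    return values
```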

Example for population by state in 2007

The full version of the code can be found in this online notebook, here I will only comment on the results.

Distributions comparison (Image by Author)

On the left, we see the distribution of our input data. A keen eye will notice that applying the logarithm (middle column) already brings our data close to a normal distribution, but the best effect is achieved by the title transformation (right column).

Box-Cox as a Scikit-learn transformer

Let’s implement it as a ready-to-use scikit-learn transformer, so you can use it in a Pipeline or FeatureUnion. It also works correctly with a train/test split. Remember, lambda has to be picked using the training dataset only.

Box-Cox as a scikit-learn transformer (Image by Author)
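The author’s transformer is shown only as an image; a minimal sketch of the same idea (fit lambda per column on the training data, then reuse it at transform time) could look like this. Note that scikit-learn also ships a built-in `PowerTransformer(method="box-cox")` covering the same use case:

```python
import numpy as np
from scipy import stats
from sklearn.base import BaseEstimator, TransformerMixin

class BoxCoxTransformer(BaseEstimator, TransformerMixin):
    """Estimates one lambda per column on fit, reuses it on transform."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Lambda is picked on the training data only
        self.lambdas_ = [stats.boxcox(X[:, i])[1] for i in range(X.shape[1])]
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        cols = [
            stats.boxcox(X[:, i], lmbda=self.lambdas_[i])
            for i in range(X.shape[1])
        ]
        return np.column_stack(cols)
```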

Originally published at on February 7, 2022.

Thanks for reading. Don’t hesitate to clap and follow if you like this content. It takes only 2 seconds to help.




Radoslaw Białowąs, AWS Certified ML Specialist, IT Team Lead