Understanding Marketing Analytics in Python. [Part 8] Transforming Variables.

Kamna Sinha
Published in Data At The Core !
Sep 24, 2023 · 7 min read

This is part 8 of the series on Marketing Analytics; have a look at the entire series introduction, with details of each part, here.

Transforming Variables before Computing Correlations:

A process distribution can take any shape, and we cannot say that one shape is inherently better than another; it depends on the nature of the process.

Various methods are used to convert distributions to normal before going ahead with further analysis:

1. Check for Outliers

2. Box-Cox Transformation

3. Johnson Transformation

Check for Outliers [Clipping]:

The first thing we need to do is check whether the data is non-normal because of outliers. Normally distributed data does not contain extreme outliers; hence, if there are outliers in your data, they may be the reason the data is not normally distributed.

a. First we need to check if the outliers in the data are because of any data entry errors. If so, we can correct the data and then check if the data is normally distributed.

b. If there are no data entry errors, the next question to ask is if the outliers are because of some special causes which are not going to recur in the future. If so, it may be okay to note the reasons and then delete these outliers.

c. If these outliers have a chance of recurring in the future then it would not be appropriate to just blindly delete them from analysis. We need to look for other ways of handling this data.

Now, coming back to our example (from the previous part of this series):

Many relationships in marketing data are nonlinear. For example, as we see in the cust_df data, the number of trips a customer makes to a store may be inversely related to distance from the store. When we compute the correlation between the raw values of distance_to_store and store_spend, we get a modest negative correlation:
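Since the original output is shown only as an image, here is a runnable sketch. The cust_df below is simulated (same column names as in the series, synthetic values), so the exact coefficient will differ from the article's:

```python
import numpy as np
import pandas as pd

# Simulated stand-in for cust_df: spending falls off with distance
rng = np.random.default_rng(seed=4)
distance = np.exp(rng.normal(2, 1.2, 1000))                   # right-skewed distances
spend = np.exp(rng.normal(3, 0.5, 1000)) / np.sqrt(distance)  # inverse-sqrt relationship
cust_df = pd.DataFrame({"distance_to_store": distance, "store_spend": spend})

# Pearson correlation between the raw variables
r_raw = np.corrcoef(cust_df.distance_to_store, cust_df.store_spend)[0, 1]
print(round(r_raw, 2))  # modest negative correlation
```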

However, if we transform distance_to_store to its inverse (1/distance), we find a much stronger linear association:

In fact, the inverse square root of distance shows an even greater linear association:
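All three versions can be compared in one pass (again with simulated data standing in for cust_df, so only the pattern, not the exact values, should match):

```python
import numpy as np

# Simulated data: spending falls off with the square root of distance
rng = np.random.default_rng(seed=4)
distance = np.exp(rng.normal(2, 1.2, 1000))
spend = np.exp(rng.normal(3, 0.5, 1000)) / np.sqrt(distance)

r_raw = np.corrcoef(distance, spend)[0, 1]                    # raw distance
r_inv = np.corrcoef(1 / distance, spend)[0, 1]                # inverse distance
r_inv_sqrt = np.corrcoef(1 / np.sqrt(distance), spend)[0, 1]  # inverse square root

# The transformed correlations are larger in magnitude than the raw one
print(round(r_raw, 2), round(r_inv, 2), round(r_inv_sqrt, 2))
```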

Interpretation of the above result: there is a smaller effect per additional mile as you get further away, because of the inverse square root relationship.

For example: someone who lives 1 mile from the nearest store will spend quite a bit more than someone who lives 5 miles away, yet someone who lives 20 miles away will spend only a little more than someone who lives 30 miles away.

These transformations are important when creating scatterplots between variables as well.

For example, examine the scatterplots in the following figures for raw distance_to_store versus store_spend, as compared to the inverse square root of distance_to_store versus store_spend. We create those two charts as follows:
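Since the original snippet appears as an image, here is a sketch of the plotting code (matplotlib, with simulated data standing in for cust_df):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to display the plots
import matplotlib.pyplot as plt

# Simulated stand-in for cust_df
rng = np.random.default_rng(seed=4)
distance = np.exp(rng.normal(2, 1.2, 1000))
spend = np.exp(rng.normal(3, 0.5, 1000)) / np.sqrt(distance)
cust_df = pd.DataFrame({"distance_to_store": distance, "store_spend": spend})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left: raw distance versus spend (nonlinear, crowded near the origin)
ax1.scatter(cust_df.distance_to_store, cust_df.store_spend, s=8)
ax1.set_xlabel("distance_to_store")
ax1.set_ylabel("store_spend")

# Right: inverse square root of distance versus spend (much more linear)
ax2.scatter(1 / np.sqrt(cust_df.distance_to_store), cust_df.store_spend, s=8)
ax2.set_xlabel("1 / sqrt(distance_to_store)")
ax2.set_ylabel("store_spend")

plt.tight_layout()
plt.show()
```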

scatterplot before transforming distance values
scatterplot after transforming the distance values

It is important to consider transforming variables to approximate normality before computing correlations or creating scatterplots; the appropriate transformation may help you to see associations more clearly.

Typical Marketing Data Transformations

Because marketing data often concern the same kinds of data in different datasets — counts, sales, revenue, and so forth — there are a few common transformations that often apply.

Typical Marketing Data Transformations
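As a quick sketch of how a few of these common transformations look in code (the example values below are made up for illustration):

```python
import numpy as np

# log transform: common for unit sales, revenue, and prices
revenue = np.array([120.0, 5300.0, 870.0, 64.0])
log_revenue = np.log(revenue)

# inverse square root: a common transform for distance variables
distance = np.array([1.0, 5.0, 20.0, 30.0])
inv_sqrt_distance = 1 / np.sqrt(distance)

# logit: common for shares and proportions bounded between 0 and 1
share = np.array([0.10, 0.25, 0.60])
logit_share = np.log(share / (1 - share))
```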

Box-Cox Transformation

While we do have common transformations that are often helpful with different types of marketing variables, there is also a general-purpose function that can determine the best transformation for a given variable: the Box-Cox transformation.

The Box-Cox transformation is a statistical tool that transforms non-normal data toward a normal distribution. This can improve the accuracy of predictions made using linear regression, and it can make data easier to understand and work with.

There are three main reasons for using the Box-Cox transformation:

  1. To stabilize the variance, so that the results of statistical tests are not unduly influenced by variability in the data.
  2. To improve normality — because many statistical techniques assume that the data is normally distributed.
  3. To make patterns in the data more easily recognizable — when we are trying to identify relationships between variables or trends over time.

UNDERSTANDING BOX-COX TRANSFORMATIONS:

Box-Cox transformation is a statistical technique that transforms your target variable so that your data more closely follows a normal distribution. (A target variable is the variable in your analytical model that you are trying to estimate.) The transformation can improve the predictive power of your analytical model by reducing the skewness and noise that obscure the underlying signal.

What Is the Box-Cox Transformation Equation?

Many of the transformations in the transformation table involve taking a power of x (see the table above). The Box-Cox transformation generalizes these power functions and is defined as:

y(lambda) = (x^lambda - 1) / lambda,  if lambda != 0
y(lambda) = log(x),                   if lambda = 0

where x is the original (strictly positive) value and lambda is the transformation parameter.

One could try different values of lambda to see which transformation makes the distribution best fit the normal distribution.

Instead of trying values of lambda by hand, the scipy.stats.boxcox() function calculates the optimal lambda for the input data and then transforms the data using that lambda.

Example: We find the best Box-Cox transformation for distance_to_store using boxcox() as follows:

applying boxcox transformation on distance values
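In code (with simulated distances standing in for cust_df.distance_to_store, so the lambda for this data will differ from the article's 0.01844):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Simulated right-skewed distances
rng = np.random.default_rng(seed=4)
cust_df = pd.DataFrame({"distance_to_store": np.exp(rng.normal(2, 1.2, 1000))})

# boxcox() returns the transformed data and the optimal lambda it found
dts_bc, lmda = stats.boxcox(cust_df.distance_to_store)
print(lmda)
```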

Understanding the result:

  1. This tells us that the value of lambda that makes distance_to_store as similar as possible to a normal distribution is 0.01844 (a lambda near 0 is approximately a log transformation).
  2. boxcox() also returned the transformed data, which we saved in the dts_bc variable.

To see how this changes cust_df.distance_to_store, we plot two histograms comparing the transformed and untransformed variables:

A skewed distribution of distance values changed to a normal distribution using the Box-Cox transformation
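The comparison above can be reproduced along these lines (simulated data in place of cust_df):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to display the plots
import matplotlib.pyplot as plt
from scipy import stats

# Simulated right-skewed distances
rng = np.random.default_rng(seed=4)
distance = np.exp(rng.normal(2, 1.2, 1000))
dts_bc, lmda = stats.boxcox(distance)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(distance, bins=40)
ax1.set_title("distance_to_store (raw, skewed)")
ax2.hist(dts_bc, bins=40)
ax2.set_title("distance_to_store (Box-Cox transformed)")
plt.tight_layout()
plt.show()
```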

Looking at correlation coefficients after Box-Cox transformations

Finally, we can compute correlations for the transformed variable. These correlations will often be larger in magnitude than correlations among raw, untransformed data points. We check r between distance and in-store spending.

We already transformed distance_to_store with stats.boxcox(); we will also transform store_spend the same way before computing the correlation.
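A sketch of both transformations and the resulting correlation (simulated data, so the exact r differs from the article's):

```python
import numpy as np
from scipy import stats

# Simulated stand-ins for distance_to_store and store_spend
rng = np.random.default_rng(seed=4)
distance = np.exp(rng.normal(2, 1.2, 1000))
spend = np.exp(rng.normal(3, 0.5, 1000)) / np.sqrt(distance)

# Box-Cox requires strictly positive input; both variables qualify here
dts_bc, _ = stats.boxcox(distance)
sps_bc, _ = stats.boxcox(spend)

r_bc = np.corrcoef(dts_bc, sps_bc)[0, 1]
print(round(r_bc, 2))  # strong negative correlation
```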

Observation: the relationship between distance to the store and spending can be interpreted as strong and negative.

Johnson Transformation

A third approach to transform the data to a normal distribution is to use another type of more complex transformation called the Johnson family of transformations. There are three different families of Johnson distributions:

In one common parameterization, the three families are:

SL (lognormal):  Y = gamma + eta * ln(X - epsilon)
SB (bounded):    Y = gamma + eta * ln((X - epsilon) / (lambda + epsilon - X))
SU (unbounded):  Y = gamma + eta * asinh((X - epsilon) / lambda)

where Y is the transformed data, X is the raw data, and gamma, eta, epsilon, and lambda are the Johnson parameters. Decision rules have been formulated for selecting the appropriate Johnson family (SU, SB, or SL), and several algorithms are available to fit the Johnson parameters for a given dataset. However, due to the complex nature of these algorithms, the solutions are not straightforward and require appropriate software to estimate the parameters.

Similar to a Box-Cox transformation, a computer can run through several combinations of these Johnson parameters to determine which set of parameters makes the transformed data as close to normal as possible. Since there are several parameters to fit the Johnson transformation, we usually find that a Johnson transformation does a better job of transforming the data to a normal distribution compared to a Box-Cox transformation.
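One way to experiment with this in Python is scipy's johnsonsu distribution: fit its parameters to the data, then use them to normalize it. This is a sketch on simulated data and assumes the SU (unbounded) family is appropriate:

```python
import numpy as np
from scipy import stats

# Simulated right-skewed data
rng = np.random.default_rng(seed=4)
distance = np.exp(rng.normal(2, 1.2, 1000))

# scipy parameterizes Johnson SU with shape parameters a, b plus loc and scale
a, b, loc, scale = stats.johnsonsu.fit(distance)

# For a fitted SU, z = a + b * asinh((x - loc) / scale) is ~standard normal
z = a + b * np.arcsinh((distance - loc) / scale)
print(round(float(stats.skew(z)), 2))
```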

More About Understanding Distributions

Some commonly used distributions and applications are shown in the table below.

Various kinds of distributions of data values [https://sigmamagic.com/blogs/how-do-i-transform-data-to-normal-distribution/]

Some statistical analyses assume that the data is normally distributed, for example the 1-sample t-test, ANOVA, and regression. This is because the normal distribution has special properties that may have been used to derive the tests' statistical properties. If the assumption of normality is not satisfied, the results of the statistical analysis may be incorrect. Hence, when performing statistical analysis, we should always be aware of any assumptions about the normality of the data and, if they exist, check them before using the analysis.

In part 9, the next and last story of this series, we shall look into handling ordinal (ranking) variables, using our same dataset cust_df, and see how simple changes to the data can give better visualization results that tell us significantly more about our data.

To go deeper into understanding transformations you can go through the reference links mentioned below.

ref :

  1. https://sigmamagic.com/blogs/how-do-i-transform-data-to-normal-distribution/
  2. https://www.visual-design.net/post/data-transformation-and-feature-engineering-in-python#:~:text=Data%20transformation%20is%20the%20process,engineering%20that%20facilitates%20discovering%20insights.
