Understanding Transformation of Random Variables using Python

S Joel Franklin
Published in Analytics Vidhya
6 min read · Nov 13, 2019

A random variable is a numerical description of the outcome of a statistical experiment. It is discrete or continuous depending on the set of values the outcome can take.

We will look at two kinds of transformations of random variables: ‘scaling’ and ‘shifting’.

SHIFTING TRANSFORMATION

Let ‘P’ be a random variable and ‘Q’ be the transformed random variable where Q = P + t (t is any real number). Let us look at one particular case where Q = P + 8. We plot the values of each random variable on the x-axis against a set of 100 values equally spaced between 1 and 10 on the y-axis. The y-values only spread the points out vertically so that they are easy to see.

import numpy as np # The necessary packages are imported.
import matplotlib.pyplot as plt
b = list(np.linspace(1, 10, 100)) # 'b' is initialised as 100 numbers equally spaced between 1 and 10.
P = list(np.random.randn(100)) # 'P' is initialised as 100 draws from the standard normal distribution.
plt.figure(figsize=(20, 10)) # The size of the figure is adjusted.
mean_Px = sum(P)/len(P) # 'mean_Px' is the mean value of 'P'.
mean_Py = sum(b)/len(b) # 'mean_Py' is the mean value of 'b'.
plt.plot(mean_Px, mean_Py, 'ro', markersize=10) # The mean value (mean_Px, mean_Py) point is plotted.
Q = [num + 8 for num in P] # 'Q' random variable is defined as (P+8).
mean_Qx = sum(Q)/len(Q) # 'mean_Qx' is the mean value of 'Q'.
mean_Qy = sum(b)/len(b) # 'mean_Qy' is the mean value of 'b'.
plt.plot(mean_Qx, mean_Qy, 'ro', markersize=10) # The mean value (mean_Qx, mean_Qy) point is plotted.
plt.scatter(P, b) # Scatter plot of 'P' on x-axis and 'b' on y-axis.
plt.scatter(Q, b) # Scatter plot of 'Q' on x-axis and 'b' on y-axis.
plt.show() # Display the figure (needed when running as a script).
‘P’ values are shaded in blue. ‘Q’ values are shaded in yellow. The means of both are shaded in red.

From the above scatter plot, we observe that the mean of ‘Q’ lies 8 units to the right of the mean of ‘P’ after the ‘shifting transformation’. Let us verify this by calculating the difference between the two means.

mean_Qx - mean_Px # Mean of 'Q' minus mean of 'P'.

As expected, the difference between the means of ‘Q’ and ‘P’ is 8 units (up to floating-point rounding). Hence shifting a random variable by ‘t’ units also shifts its mean by ‘t’ units: E[P+t] = E[P] + t, where E[P] is the expected value of ‘P’, which is the same as the mean value of ‘P’.
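The same check can be repeated for other values of ‘t’. The following is a minimal sketch, not part of the original code: the helper name `mean_shift`, the seed and the chosen values of ‘t’ are assumptions made for illustration.

```python
import numpy as np

# Hypothetical helper (not from the article): returns mean(P + t) - mean(P).
def mean_shift(P, t):
    P = np.asarray(P, dtype=float)
    return np.mean(P + t) - np.mean(P)

rng = np.random.default_rng(seed=0)  # seeded so that the run is repeatable
P = rng.standard_normal(100)         # 100 draws from the standard normal distribution

for t in (-3.5, 0.0, 8.0):
    # Each difference should equal t, up to floating-point rounding.
    print(t, mean_shift(P, t))
```

Negative and zero shifts behave the same way as the shift of 8 used above, which is what E[P+t] = E[P] + t predicts for any real ‘t’.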

Now let us look at what shifting does to the variance of a random variable. Variance measures how widely the points are spread about the mean. It is the average of the squared distances from each point to the mean. A high variance indicates that the points lie far from the mean. A low variance indicates that the points lie close to the mean. In the previous scatter plot, the blue and yellow points are distributed about their respective means in a similar fashion. So we would expect the variance to remain the same after the ‘shifting transformation’. Let us verify this by calculating the difference between the variances of ‘P’ and ‘Q’.

Variance_Px = np.var(P) # Variance of random variable ‘P’ is calculated.
Variance_Qx = np.var(Q) # Variance of random variable ‘Q’ is calculated.
Variance_Px - Variance_Qx # Variance difference between 'P' and 'Q' is calculated.

As expected, the difference between the variances of ‘P’ and ‘Q’ is 0 (up to floating-point rounding). Hence shifting a random variable by ‘t’ units does not affect its variance: Var[P+t] = Var[P], where Var[P] is the variance of ‘P’.
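To connect the definition of variance (the average squared distance to the mean) with `np.var`, here is a small sketch. The helper name `variance` and the seed are assumptions for illustration. It relies on `np.var` using the population formula (`ddof=0`) by default.

```python
import numpy as np

# Hypothetical helper: variance as the average squared distance to the mean.
def variance(values):
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    return np.mean((values - mean) ** 2)

rng = np.random.default_rng(seed=0)  # seeded so that the run is repeatable
P = rng.standard_normal(100)
Q = P + 8                            # shifted copy of P

# The manual formula should agree with np.var (default ddof=0).
print(np.isclose(variance(P), np.var(P)))
# Shifting moves every point and the mean by the same amount,
# so the distances to the mean, and hence the variance, are unchanged.
print(np.isclose(variance(P), variance(Q)))
```

The second check reflects why the shift drops out: (P + t) minus its mean E[P] + t is the same as P minus E[P].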

SCALING TRANSFORMATION

Let ‘P’ be a random variable and ‘Q’ be the transformed random variable where Q = t*P (t is any real number). Let us look at one particular case where Q = 8*P. We plot the values of each random variable on the x-axis against a set of 100 values equally spaced between 1 and 10 on the y-axis.

import numpy as np # Import the necessary packages.
import matplotlib.pyplot as plt
b = list(np.linspace(1, 10, 100)) # 'b' is initialised as 100 numbers equally spaced between 1 and 10.
P = list(np.random.rand(100)) # 'P' is initialised as 100 random numbers drawn uniformly from [0, 1).
plt.figure(figsize=(20, 10)) # The size of the figure is adjusted.
mean_Px = sum(P)/len(P) # 'mean_Px' is the mean value of 'P'.
mean_Py = sum(b)/len(b) # 'mean_Py' is the mean value of 'b'.
plt.plot(mean_Px, mean_Py, 'ro', markersize=10) # The mean value (mean_Px, mean_Py) point is plotted.
Q = [num*8 for num in P] # 'Q' random variable is defined as (8*P).
mean_Qx = sum(Q)/len(Q) # 'mean_Qx' is the mean value of 'Q'.
mean_Qy = sum(b)/len(b) # 'mean_Qy' is the mean value of 'b'.
plt.plot(mean_Qx, mean_Qy, 'ro', markersize=10) # The mean value (mean_Qx, mean_Qy) point is plotted.
plt.scatter(P, b) # Scatter plot of 'P' on x-axis and 'b' on y-axis.
plt.scatter(Q, b) # Scatter plot of 'Q' on x-axis and 'b' on y-axis.
plt.show() # Display the figure (needed when running as a script).
‘P’ values are shaded in blue. ‘Q’ values are shaded in yellow. The means of both are shaded in red.

From the above scatter plot, we observe that the mean of ‘Q’ is 8 times the mean of ‘P’ after the ‘scaling transformation’. Let us verify this with the code given below.

mean_Qx/mean_Px # Mean of ‘Q’ divided by mean of ‘P’.

As expected, the mean of ‘Q’ is 8 times the mean of ‘P’ after the ‘scaling transformation’. Hence scaling a random variable by a factor of ‘t’ also scales its mean by a factor of ‘t’: E[t*P] = t*E[P].

Now let us look at what scaling does to the variance of a random variable. In the previous scatter plot, the yellow points are spread much farther from their mean than the blue points are from theirs, indicating that the variance of ‘Q’ is significantly larger than the variance of ‘P’. Let us verify this with the code given below.

Variance_Px = np.var(P) # Variance of random variable ‘P’ is calculated.
Variance_Qx = np.var(Q) # Variance of random variable ‘Q’ is calculated.
Variance_Qx/Variance_Px # Variance of ‘Q’ divided by Variance of ‘P’.

As expected, the variance of ‘Q’ is 64 times the variance of ‘P’. Hence scaling a random variable by a factor of ‘t’ scales its variance by a factor of ‘t²’: Var[t*P] = t²*Var[P], where Var[P] is the variance of ‘P’.
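Because the variance scales with ‘t²’, the sign of ‘t’ affects the mean but not the variance. The sketch below illustrates this for a few factors. The helper name `scaling_ratios`, the seed and the chosen factors are assumptions for illustration only.

```python
import numpy as np

# Hypothetical helper: returns (mean(t*P) / mean(P), var(t*P) / var(P)).
def scaling_ratios(P, t):
    P = np.asarray(P, dtype=float)
    Q = t * P
    return np.mean(Q) / np.mean(P), np.var(Q) / np.var(P)

rng = np.random.default_rng(seed=0)  # seeded so that the run is repeatable
P = rng.random(100)                  # uniform on [0, 1), like np.random.rand(100)

for t in (8.0, -8.0, 0.5):
    mean_ratio, var_ratio = scaling_ratios(P, t)
    # mean_ratio should be close to t and var_ratio close to t**2.
    print(t, mean_ratio, var_ratio)
```

A factor between -1 and 1, such as 0.5, shrinks the variance instead of increasing it, since t² is then smaller than 1.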

SHIFTING & SCALING TRANSFORMATION

Let us see what happens when a random variable is both scaled and shifted. Let ‘P’ be a random variable and ‘Q’ be the transformed random variable where Q = n*P + t (n and t are real numbers). Let us look at one particular case where Q = 10*P + 8. We plot the values of each random variable on the x-axis against a set of 100 values equally spaced between 1 and 10 on the y-axis.

import numpy as np # The necessary packages are imported.
import matplotlib.pyplot as plt
b = list(np.linspace(1, 10, 100)) # 'b' is initialised as 100 numbers equally spaced between 1 and 10.
P = list(np.random.rand(100)) # 'P' is initialised as 100 random numbers drawn uniformly from [0, 1).
plt.figure(figsize=(20, 10)) # The size of the figure is adjusted.
mean_Px = sum(P)/len(P) # 'mean_Px' is the mean value of 'P'.
mean_Py = sum(b)/len(b) # 'mean_Py' is the mean value of 'b'.
plt.plot(mean_Px, mean_Py, 'ro', markersize=10) # The mean value (mean_Px, mean_Py) point is plotted.
Q = [10*num + 8 for num in P] # 'Q' random variable is defined as (10*P+8).
mean_Qx = sum(Q)/len(Q) # 'mean_Qx' is the mean value of 'Q'.
mean_Qy = sum(b)/len(b) # 'mean_Qy' is the mean value of 'b'.
plt.plot(mean_Qx, mean_Qy, 'ro', markersize=10) # The mean value (mean_Qx, mean_Qy) point is plotted.
plt.scatter(P, b) # Scatter plot of 'P' on x-axis and 'b' on y-axis.
plt.scatter(Q, b) # Scatter plot of 'Q' on x-axis and 'b' on y-axis.
plt.show() # Display the figure (needed when running as a script).
‘P’ values are shaded in blue. ‘Q’ values are shaded in yellow. The means of both are shaded in red.

From the scatter plot we observe that both the mean and the variance of ‘Q’ are larger than those of ‘P’.

The transformation first scales ‘P’ and then shifts the result. The mean is scaled by a factor of 10 and then shifted by 8 units: E[10*P+8] = 10*E[P] + 8. The variance is scaled by a factor of 10² = 100 and is unaffected by the shift: Var[10*P+8] = 100*Var[P]. In general, E[n*P+t] = n*E[P] + t and Var[n*P+t] = n²*Var[P].
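The combined case can be verified numerically in the same way as the two earlier ones. This is a minimal sketch, assuming a seeded generator in place of `np.random.rand` so that the run is repeatable. The checks use `np.isclose` because floating-point results are rarely exactly equal.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so that the run is repeatable
P = rng.random(100)                  # uniform on [0, 1), like np.random.rand(100)

n, t = 10, 8                         # scale factor and shift used in the example above
Q = n * P + t

# Mean: scaled by n, then shifted by t.
print(np.isclose(np.mean(Q), n * np.mean(P) + t))
# Variance: scaled by n squared, unaffected by the shift.
print(np.isclose(np.var(Q), n**2 * np.var(P)))
```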
