A Simple Primer on Meta Learners, from ‘The Effect’
While experimentation has its magic for identifying the causal effect of changes, more often than not it is not the solution to every day-to-day scenario. When a strategic decision is rolled out to everyone, the business always comes back with questions like “how was the decision made?”, and that is when causal analysis plays an important role in driving business refinement and more.
I have been searching for materials to study causal models, and The Effect is definitely one of the best I have read so far; I would recommend it to data science folks from different backgrounds. But it is a 600+ page book that takes time to read and digest. As a way to give back to the data science community, and to mark down the parts of the book that interest me, I would like to write a series of posts sharing my understanding of some of its content, and I welcome discussion and reviews so that everyone gets to learn.
This is the first post, and I want to focus on conditional conditional means. This comes from the chapter on describing relationships between variables, and conditional conditional means focus on the unexplained part (aka, the residuals).
Imagine two variables: X and Y. If we take the mean of Y conditional on X, part of Y is explained by X, and the other part of Y cannot be explained by X.
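To make this concrete, here is a minimal sketch of a conditional mean in code. It is not from the book; the bin count and the variable names are my own choices for illustration.
import numpy as np
import pandas as pd
x = np.random.randn(300)
y = x * 2 + np.random.randn(300)
df = pd.DataFrame({'x': x, 'y': y})
# cut x into 10 bins; the mean of y within each bin is the mean of y conditional on x
df['y_explained'] = df.groupby(pd.cut(df['x'], bins=10), observed=True)['y'].transform('mean')
# the residual is the part of y that the conditional mean leaves unexplained
df['y_residual'] = df['y'] - df['y_explained']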
Now add a third variable Z, where Z is a confounder of X and Y (aka, Z causes both X and Y). If we don’t do anything, X and Y will look correlated because of Z. But how do we know whether X and Y are related given Z?
There are many ways to ‘control for Z’ and then investigate the relationship between X and Y. Here we use the residuals.
The logic goes like this: if we subtract the part explained by Z from both X and Y (keeping the residuals of X and Y), what remains is the variation in X and Y that does not come from Z. If there is still a strong relationship between the two remaining parts, then X and Y may indeed have a relationship of their own.
Here is a quick coding illustration.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# generate a sample from the standard normal distribution to represent z
z = np.random.randn(300)
# make x_1 and x_2 both directly caused by z
x_1 = z * 3 + np.random.randn(300)
x_2 = z * 5 + np.random.randn(300)
# look at the correlation coef between x_1 and x_2: they are strongly correlated
np.corrcoef(x_1, x_2)[0, 1]  # ~0.93
# plot to see the relationship
plt.scatter(x_1, x_2)  # figure 1
# calculate the residual of x_1, controlling for z
model_1 = LinearRegression().fit(z.reshape(-1, 1), x_1)
pred_1 = model_1.predict(z.reshape(-1, 1))
residual_1 = x_1 - pred_1
# calculate the residual of x_2, controlling for z
model_2 = LinearRegression().fit(z.reshape(-1, 1), x_2)
pred_2 = model_2.predict(z.reshape(-1, 1))
residual_2 = x_2 - pred_2
# plot the residuals, figure 2
plt.scatter(residual_1, residual_2)
As you can see, after removing the z-part from x_1 and x_2, what remains looks unrelated, as it is supposed to be.
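We can also quantify this instead of eyeballing the plot. The snippet above stops at the scatter plot, but a quick numeric check is:
# the correlation between the residuals should now be close to zero
np.corrcoef(residual_1, residual_2)[0, 1]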
Let’s change the dynamics a little bit and look again.
z = np.random.randn(300)
x_1 = z * 3 + np.random.randn(300)
# this time, x_2 is directly caused by x_1
x_2 = x_1 * 2 + np.random.randn(300)
# the correlation coef is ~0.99
np.corrcoef(x_1, x_2)[0, 1]
# fit the linear regressions and compute the residuals exactly as before
model_1 = LinearRegression().fit(z.reshape(-1, 1), x_1)
residual_1 = x_1 - model_1.predict(z.reshape(-1, 1))
model_2 = LinearRegression().fit(z.reshape(-1, 1), x_2)
residual_2 = x_2 - model_2.predict(z.reshape(-1, 1))
# plot the residuals, figure 3
plt.scatter(residual_1, residual_2)
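Again, not part of the original snippet, but checking the number on top of the plot:
# the residuals remain strongly correlated even after controlling for z
np.corrcoef(residual_1, residual_2)[0, 1]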
This time the residuals still show a strong relationship after removing the part explained by z, which makes sense: x_1 causes x_2 directly, so controlling for z does not break their relationship. After reading through this part, I quickly related the content to the meta learner algorithms, and I found it interesting to see how the same residualization logic sits at their root.
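To make that connection concrete, here is a minimal sketch of the residual-on-residual idea used by, for example, double machine learning and the R-learner. This code is not from The Effect; the setup (treatment t, outcome y, confounder z) and the plain LinearRegression nuisance models are my own simplifying assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

z = np.random.randn(300)                   # confounder
t = z * 1.5 + np.random.randn(300)         # treatment, caused by z
y = t * 2 + z * 3 + np.random.randn(300)   # outcome, caused by t and z

# step 1: residualize both the treatment and the outcome on the confounder
t_res = t - LinearRegression().fit(z.reshape(-1, 1), t).predict(z.reshape(-1, 1))
y_res = y - LinearRegression().fit(z.reshape(-1, 1), y).predict(z.reshape(-1, 1))

# step 2: regress the outcome residuals on the treatment residuals;
# the slope recovers the effect of t on y with z controlled for (~2 here)
effect = LinearRegression().fit(t_res.reshape(-1, 1), y_res).coef_[0]
In practice these methods swap the two nuisance regressions for flexible machine learning models, but the residualization step is the same logic we just walked through.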