Hi James! Glad you’re enjoying the series!
I’m actually setting aside the problem of graph inference and showing the value of knowing the true graph.
For each of those data points (for R²), I generate a random causal graph over N variables (what did I choose, 50?). On top of that graph, I generate a random set of structural equations by drawing random values for the linear regression coefficients. Then I choose an arbitrary node in that graph as Y (without loss of generality, say, the first in the vertex set), and perform the regression by regressing Y on its parents.
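Roughly, the simulation can be sketched like this (a minimal sketch, not my exact code: I’m assuming N = 50, a sparse upper-triangular DAG, unit-variance Gaussian noise, and picking a node that actually has parents rather than literally the first vertex):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50          # number of variables (assumed)
n = 5000        # sample size (assumed)

# Random DAG: an upper-triangular adjacency matrix guarantees acyclicity.
adj = np.triu(rng.random((N, N)) < 0.1, k=1)

# Random linear structural equations: one random coefficient per edge.
coef = adj * rng.normal(size=(N, N))

# Simulate data in topological order (0..N-1 works since adj is upper-triangular).
X = np.zeros((n, N))
for j in range(N):
    X[:, j] = X @ coef[:, j] + rng.normal(size=n)

# Choose Y: here, a node that has at least one parent.
y_idx = int(np.argmax(adj.sum(axis=0)))
parents = np.flatnonzero(adj[:, y_idx])

# Regress Y on its parents only.
beta_hat, *_ = np.linalg.lstsq(X[:, parents], X[:, y_idx], rcond=None)

# The OLS estimates recover the true structural coefficients (direct effects).
print(np.allclose(beta_hat, coef[parents, y_idx], atol=0.1))
```

Because the noise on Y is independent of Y’s parents, OLS on exactly the parent set is consistent for the structural coefficients.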
What this demonstrates is that if you know the causes of Y, your subset selection problem is solved. No other variables in the graph should be included in your regression: by regressing on Y’s parents, you’ve blocked every other variable’s path to Y, so whatever information they carry about Y is already captured by the regression.
By solving the subset selection problem, you’ve also correctly estimated the direct effect of each cause of Y on Y, so you’ve solved the beta-hat problem as well.
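You can check both claims directly: throwing extra (non-descendant) variables into the regression neither improves the fit nor moves the parents’ coefficients. A self-contained sketch under the same assumptions as above (N = 50, sparse upper-triangular DAG, Gaussian noise; the extra regressors are hypothetical non-parent nodes earlier in the topological order):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 50, 5000
adj = np.triu(rng.random((N, N)) < 0.1, k=1)   # random DAG
coef = adj * rng.normal(size=(N, N))           # random linear SEM
X = np.zeros((n, N))
for j in range(N):
    X[:, j] = X @ coef[:, j] + rng.normal(size=n)

y_idx = int(np.argmax(adj.sum(axis=0)))        # a node with parents
y = X[:, y_idx]
parents = np.flatnonzero(adj[:, y_idx])
# Non-parent nodes earlier in topological order: guaranteed non-descendants of Y.
extras = np.setdiff1d(np.arange(y_idx), parents)[:5]

def fit(cols):
    """OLS of y on the given columns; returns (R^2, coefficients)."""
    Z = X[:, cols]
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1 - resid.var() / y.var(), beta

r2_parents, b_parents = fit(parents)
r2_both, b_both = fit(np.concatenate([parents, extras]))

# Adding the extras barely changes R^2 ...
print(r2_both - r2_parents < 0.01)
# ... and leaves the parents' coefficient estimates essentially unchanged.
print(np.allclose(b_both[:len(b_parents)], b_parents, atol=0.1))
```

(Descendants of Y are deliberately excluded from the extras: conditioning on a child of Y would bias the estimates, which is exactly why the parent set, not just any superset, is the right one.)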