Hi Andre!

I’m working on an article now that shows how to use the g-formula in more general machine learning contexts. Watch out for that one in the next week or so! The basic idea is that you want to move away from classic contrasts like E(Y|do(x=1)) - E(Y|do(x=0)) and toward the basic quantity P(Y|do(x)), so you actually know the effect of an intervention in absolute terms (and not just relative to another state). The g-formula gives you this quantity if you have a set Z to control for: P(Y|do(x)) is the sum over z of P(Y|x,z) P(z). Any probabilistic machine learning algorithm could produce the necessary inputs, namely P(Y|X,Z) and P(Z). That’s another good reason to go for probabilistic models!
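
To make that concrete, here is a minimal sketch of the g-formula on simulated data. Everything in it is an assumption made for illustration: the variables are binary, the data-generating process and its coefficients are made up, and plain frequency counts stand in for whatever probabilistic model you would actually fit for P(Y|X,Z) and P(Z). The function names (`simulate`, `g_formula`, `naive`) are hypothetical.

```python
import random
from collections import Counter


def simulate(n, seed=0):
    """Toy data (assumed for illustration): Z confounds X and Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        x = int(rng.random() < (0.8 if z else 0.2))
        y = int(rng.random() < 0.1 + 0.3 * x + 0.5 * z)
        rows.append((x, z, y))
    return rows


def g_formula(rows, x):
    """Estimate P(Y=1 | do(X=x)) as sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    n = len(rows)
    z_counts = Counter(z for _, z, _ in rows)
    total = 0.0
    for z, count in z_counts.items():
        # Frequency estimate of P(Y=1 | X=x, Z=z); a fitted model could go here.
        stratum = [y for xi, zi, y in rows if xi == x and zi == z]
        if not stratum:
            raise ValueError(f"no data for X={x}, Z={z}")
        p_y_given_xz = sum(stratum) / len(stratum)
        # Weight by the marginal P(Z=z), not by P(Z=z | X=x).
        total += p_y_given_xz * count / n
    return total


def naive(rows, x):
    """Plain conditional P(Y=1 | X=x), which mixes in the confounding."""
    ys = [y for xi, _, y in rows if xi == x]
    return sum(ys) / len(ys)


rows = simulate(200_000)
p_do_1 = g_formula(rows, 1)
p_do_0 = g_formula(rows, 0)
p_naive_1 = naive(rows, 1)
print(f"P(Y=1 | do(X=1)) ~ {p_do_1:.3f}")
print(f"P(Y=1 | do(X=0)) ~ {p_do_0:.3f}")
print(f"P(Y=1 | X=1)     ~ {p_naive_1:.3f}")
```

In this toy setup the true interventional values are 0.65 and 0.35 by construction, while the plain conditional P(Y=1|X=1) sits near 0.80 because X=1 rows are mostly Z=1 rows. The g-formula estimates should land close to the interventional values, up to sampling noise.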

I’m not really an expert on deep nets, but I’ve implemented some and have read a little about them. They look a lot like the functional version of Bayesian networks, where the internal relationships are deterministic and the inputs are stochastic (i.e. draws from some distribution). I don’t have much intuition for whether they’re somehow implementing causal graph inference in a way that blurs the causal graph out over the whole NN. That could be fun to explore.