but would need to get some direct observations of the overseer in order to make good predictions.
Human-in-the-counterfactual-loop
Paul Christiano
Interesting: you could improve the model of the human's decisions by dreaming up lots of realistic examples and then submitting the ones that look marginal to a human observer (an AI that does ‘thought experiments’?). This is an advantage of explicitly modelling the human’s preferences over worlds / having a counterfactual human in the loop.
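The "submit the marginal ones" idea resembles what is usually called uncertainty sampling in active learning. Here is a minimal toy sketch of that reading, not anything from the post itself. Everything in it is an assumption made for illustration: situations are numbers in [0, 1), the human approves exactly those above some hidden threshold, and the names `select_marginal`, `refine_threshold`, and `human_approves` are hypothetical.

```python
import random


def select_marginal(candidates, predict_approval, budget):
    """Return the `budget` candidates whose predicted approval is closest to 0.5."""
    return sorted(candidates, key=lambda x: abs(predict_approval(x) - 0.5))[:budget]


def refine_threshold(human_approves, rounds=5, per_round=200, budget=5, seed=0):
    """Narrow down a hidden approval threshold by querying only marginal cases.

    Toy assumption: the human approves x exactly when x is at or above
    some unknown threshold in (0, 1].
    """
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0  # the threshold is known to lie in (lo, hi]
    queries = 0
    for _ in range(rounds):
        round_lo, round_hi = lo, hi  # freeze the model for this round

        def predict_approval(x):
            # Crude model: approval probability ramps linearly across the
            # interval where the threshold could still be.
            if x <= round_lo:
                return 0.0
            if x >= round_hi:
                return 1.0
            return (x - round_lo) / (round_hi - round_lo)

        # "Dream up" many hypothetical situations...
        candidates = [rng.random() for _ in range(per_round)]
        # ...and show the human only the ones the model is least sure about.
        for x in select_marginal(candidates, predict_approval, budget):
            queries += 1
            if human_approves(x):
                hi = min(hi, x)
            else:
                lo = max(lo, x)
    return lo, hi, queries
```

For example, `refine_threshold(lambda x: x >= 0.37)` brackets the hidden threshold using 25 human judgments out of 1,000 imagined cases. A real system would need a generator of realistic situations and a much richer model of the human, which this sketch does not attempt.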