Approval-directed agents
Paul Christiano

I had one source of confusion when reading this. It seems that “Hugh” is being used in several ways: as a human, as an agent acting on behalf of the human, and as Arthur’s model of the human. Is that correct?

For example:

“If Arthur is making billions of small decisions each second, then Hugh can think in depth about each of them, and the resulting system can be much smarter than Hugh.”

In that sentence, the first instance of Hugh seems to be an agent acting on behalf of the human (since it can examine billions of decisions each second, which a human cannot do), while the second instance clearly refers to the actual human.

