Vadim Kosoy: "I think the existing systems you are talking about are based on imitation not because imitation is…"

A box which can imitate a human is weaker (and easier to construct) than a box which can maximize human approval, at least when the action space is large.
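
To make the contrast concrete, here is a minimal sketch of the two training signals, assuming a small discrete action space so that expected approval can be computed exactly. The function names and the particular surrogate objective are illustrative, not something specified in this discussion.

```python
import torch
import torch.nn.functional as F

def imitation_loss(policy_logits, human_action):
    # Imitation: supervised cross-entropy toward the action the human actually took.
    return F.cross_entropy(policy_logits, human_action)

def approval_loss(policy_logits, approval_scores):
    # Approval maximization: maximize the expected approval assigned to the
    # policy's action distribution (negated so minimizing the loss raises approval).
    probs = F.softmax(policy_logits, dim=-1)
    expected_approval = (probs * approval_scores).sum(dim=-1)
    return -expected_approval.mean()

if __name__ == "__main__":
    logits = torch.randn(4, 10, requires_grad=True)  # batch of 4, 10 candidate actions
    demo_actions = torch.randint(0, 10, (4,))        # the actions the human demonstrated
    approvals = torch.rand(4, 10)                    # an approval score for every candidate action
    print(imitation_loss(logits, demo_actions).item())
    print(approval_loss(logits, approvals).item())
```

Note that in this sketch the approval objective needs a score for every candidate action (or a learned approval model standing in for one), whereas imitation only needs the demonstration itself; that asymmetry grows with the size of the action space.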

But a box which can imitate any physical system (including other copies of itself) is equivalent to a box that can maximize approval of any physical system (including copies of itself), both in terms of power and feasibility.

Act-based agents are intended to receive training data produced by a system involving both humans and other act-based agents. So the situation is more like the second case, and the important distinctions depend on characteristics of a particular learning system.

A learning system can neither imitate nor maximize approval very well in this setting, though doing either perfectly would yield good results. The question is what happens when the system does a mediocre job of imitation versus a mediocre job of approval-maximization.

For now I do see a number of problems with using imitation, but I think that the situation is not nearly so clear, and the obstacles not nearly so fundamental, as you suggest.