Prosaic AI alignment
Paul Christiano

First of all, I have my own article on this point with similar reasoning and conclusions:

From section 2a of Paul Christiano's article: “we have no way to use RL to actually interpret and implement human wishes, rather than to optimize some concrete and easily-calculated reward signal”

We have to create an Ethics dataset to teach neural networks the full complexity of human wishes. This Ethics dataset should include, among other things, many examples from “Superintelligence”.

We all know how important datasets are in machine learning. The quality, size and diversity of datasets are approximately as important as the quality of learning algorithms. That’s why we need substantial funding to begin creating and improving the Ethics dataset right now. This would also attract a lot of attention from the mass media, because ethics is a highly controversial issue (which is one of the reasons why AI alignment is hard to implement).

If we don’t start creating a really big and diverse Ethics dataset, we will end up in a world where every other task has its big and diverse datasets and Ethics does not. AGI would then be good at enough tasks to outcompete humans, yet still poor at ethical problems.

I think the idea of an Ethics dataset isn’t represented enough in section 2c of the article. It’s not just about inferring human values from human behaviour. We can pretrain our neural network on an Ethics dataset of any form (ethical games, tests, predictions of court decisions) and then fine-tune it on the inverse reinforcement learning (IRL) task once the IRL framework is ready.

Overall, I agree with the article.
I agree that prosaic AGI is conceivable (and moreover that it’s at least ~30% likely to be invented in the next 10 years), and that it is a very appealing target for research on AI alignment. I’m also focused on prosaic AGI.

Clapping shows how much you appreciated Sergej Shegurin’s story.