ALBA: an explicit proposal for aligned AI
Paul Christiano

“I’ve described one approach to bootstrapping. ALBA could use any mechanism for turning a weak agent into a (much slower) strong agent without compromising alignment.”

I would suggest looking at reinforcement learning approaches for game-playing programs (like chess), in particular TDLeaf(λ). The algorithm is somewhat underdeveloped in the literature, but it works quite well in practice. It has the key property you want for bootstrapping: you get a stronger agent simply by letting the same agent search the game tree for longer, while keeping the learned evaluation function fixed.
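For concreteness, here is a minimal sketch of the TDLeaf(λ) idea, assuming a differentiable evaluation function J(state, w). The game interface (`evaluate`, `legal_moves`, `apply_move`) and all function names are illustrative placeholders, not any particular library's API, and the leaf values fed to the update are assumed to be expressed from the learner's perspective throughout the game:

```python
import numpy as np

def pv_leaf_value(state, depth, evaluate, legal_moves, apply_move):
    """Negamax search; returns the evaluation of the principal-variation
    leaf, from the perspective of the side to move. Increasing `depth`
    with a fixed `evaluate` is the bootstrapping step described above."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)
    return max(-pv_leaf_value(apply_move(state, m), depth - 1,
                              evaluate, legal_moves, apply_move)
               for m in moves)

def tdleaf_update(leaf_values, leaf_grads, alpha=0.01, lam=0.7):
    """One TDLeaf(lambda) weight update from a single game.

    leaf_values[t] is J(l_t, w), the evaluation of the PV leaf found by
    search at position t; leaf_grads[t] is the gradient of J at that
    leaf with respect to the weights w.
    """
    T = len(leaf_values)
    # Temporal differences between successive PV-leaf evaluations.
    temporal_diffs = [leaf_values[t + 1] - leaf_values[t]
                      for t in range(T - 1)]
    delta_w = np.zeros_like(leaf_grads[0])
    for t in range(T - 1):
        # Lambda-weighted sum of future temporal differences.
        future = sum(lam ** (j - t) * temporal_diffs[j]
                     for j in range(t, T - 1))
        delta_w += alpha * leaf_grads[t] * future
    return delta_w
```

The point of the sketch is the separation of concerns: `tdleaf_update` only ever adjusts the evaluation function, while playing strength scales independently with the `depth` passed to the search.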

You might be able to construct a clever little toy game in which play has some notion of moral alignment, in order to test and refine your ideas.
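As one illustration of what such a test game could look like (everything below, including the "harmful move" rule, is invented for the sake of the example): a Nim variant where taking three stones is flagged as harmful, so you can ask at which positions winning play conflicts with aligned play:

```python
def legal_moves(stones):
    """Standard Nim: take 1-3 stones; the player taking the last wins."""
    return [n for n in (1, 2, 3) if n <= stones]

def aligned_moves(stones):
    """Aligned play forbids the (arbitrarily chosen) 'harmful' move of 3."""
    return [n for n in legal_moves(stones) if n != 3]

def can_force_win(stones, allowed):
    """True if the side to move can force a win using only `allowed` moves."""
    return any(not can_force_win(stones - m, allowed)
               for m in allowed(stones))

# Positions where capability and alignment conflict: the unrestricted
# player can force a win, but the aligned player cannot.
conflicts = [n for n in range(1, 20)
             if can_force_win(n, legal_moves)
             and not can_force_win(n, aligned_moves)]
print(conflicts)  # [3, 6, 9, 15, 18]
```

The positions in `conflicts` are exactly where a bootstrapping scheme would be stress-tested: deeper search makes the agent more capable of winning, and the question is whether alignment survives that gain.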
