This project has some relevance to AI control, but I don’t currently consider it to be an especially promising direction
Security and AI alignment
Paul Christiano

i.e. boxing.
Seems like a pretty promising approach to me. It will hurt performance (which may be why you’d say it’s not promising), but I’ve been hoping that researchers (e.g. at OpenAI) will use boxing as a research tool to try to elicit dangerous instrumental-goal-driven behaviour in a safe virtual setting.

Like what you read? Give David Krueger a round of applause.
