This project has some relevance to AI control, but I don’t currently consider it to be an especially promising direction
Security and AI alignment
Paul Christiano

i.e., boxing.
Seems like a pretty promising approach to me. It will hurt performance (which is maybe why you'd say it's not promising), but I've been hoping that researchers (e.g. at OpenAI) will use boxing as a research tool to try to elicit dangerous instrumental-goal-driven behaviour in a safe virtual setting.

Clapping shows how much you appreciated David Krueger’s story.