DeepMind AGI safety researcher Rohin Shah recently published an interesting paper on how agents can learn the wrong goal, described in this post: "Goal Misgeneralisation: Why Correct Specifications Aren't Enough For Correct Goals" (Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, and Zac Kenton; deepmindsafetyresearch.medium.com).

Then @DavidSKrueger quote-tweeted a related paper from his group with a similar focus: "Goal Misgeneralization in Deep Reinforcement Learning" (Langosco et al. 2021), https://arxiv.org/abs/2105.14111.