Are Unfriendly Values Unstable?

Source: The Moral Economist

In artificial intelligence (AI) safety, there is a concept known as human-friendly values. In short, if an AI has human-friendly values, then it will do what humans want it to do.

This is hard to achieve for a number of reasons. First, human-friendly values are difficult to define. Second, complex concepts are hard to program accurately. Third, humans don’t always know what they want. Finally, do…




