
These are rarely new problems; rather, the formal process of explaining our desires to a computer — the ultimate case of someone with no cultural context or ability to infer what we don’t say — forces us to be explicit in ways we generally aren’t used to. Whether this involves making a life-or-death decision years ahead of time, rather than delaying it …
What people are good at, it turns out, isn’t explaining how they made decisions: it’s coming up with a reasonable-sounding explanation for their decision after the fact. Sometimes this is perfectly innocent: for example, we identify some fact which was salient for us i…
For any situation the model encounters, the only “explanation” it has of what it’s doing is “well, the following thousand variables were in these states, and then I saw the letter ‘c,’ and I know that this should change the probability of the user talking about a dog according to the following polynomial…”
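To make that concrete, here is a minimal sketch (all names and numbers hypothetical, not any real model) of what such an "explanation" amounts to from the inside: the effect of seeing the letter 'c' on the probability that the user is talking about a dog is just a learned coefficient applied inside a logistic function, not anything resembling a reason.

```python
import math

# Hypothetical learned per-character weights and bias; in a real model
# these would be thousands of opaque parameters.
weights = {"c": 0.8, "a": -0.3, "t": 0.1}
bias = -1.2

def p_dog(text: str) -> float:
    """Toy model's probability that the user is talking about a dog."""
    score = bias + sum(weights.get(ch, 0.0) for ch in text)
    return 1.0 / (1.0 + math.exp(-score))

before = p_dog("at")
after = p_dog("cat")  # seeing 'c' shifts the log-odds by exactly weights["c"]
```

Asked "why did the probability go up?", the most honest answer this model can give is "because the coefficient on 'c' is 0.8" — which is exactly the kind of non-explanation described above.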