I am not yet convinced that “lack of philosophical understanding” is a really significant driver of…
Paul Christiano

I seem to be more philosophically uncertain than you in general (e.g., I seem to be less sure that UDT is right or applicable to humans, or that utilitarianism is correct). This may account for some part of our disagreement, if you think the "default" high-coordination outcome won't be too bad under "actual" normativity, whereas I think there's a much higher chance that normativity would look rather exotic/surprising, and that we likely won't get close to what's actually normative without prior understanding. For example, consider the possibility that actual normativity consists of tiling the universe with hedonium, or some form of negative utilitarianism, or putting everything towards trying to escape from a simulation (or otherwise trying to influence something besides our apparent environment).

Another part seems to be that I think making AI defer to future human reflection will be difficult and costly, so doing so will require placing a high value on philosophical reflection, not just lacking antipathy towards it. How many people talk about anything like this, or about indirect normativity, even now that AI safety is a hot topic? Adding this feature may well be considered too costly even in terms of safety, unless someone is deeply uncertain about their values. (Think of system vulnerabilities introduced by bugs in crypto code.)

From my perspective, it seems that aside from an unlikely coincidence between what will happen and what is normative, there are only a few scenarios where something close to optimal will be achieved. 1) Some AI group (one that shares my philosophical uncertainties) obtains such a large lead that it can afford to build in a lot of AI control features, including deferring to human reflection, and still win the AI race and obtain a decisive strategic advantage. 2) Some sort of metaphilosophical breakthrough is made that offers obvious benefits at such a low cost that every AI incorporates it and uses it instead of relying on human philosophical understanding. 3) No AI is built until both global coordination and human philosophical understanding are greatly improved (i.e., at least to the extent that most people realize how uncertain they ought to be and how much they ought to value reflection). Maybe there are a few more scenarios like these, but hopefully you get the idea.