Capacities aren’t totally ordered.
Paul Christiano

>I don’t think that an AI should unpack the entire action a or state s

Presumably the vulnerable A_n will still end up looking at some functions of a and s, and if A_{n+1} can predict what those functions are, it can work backwards and figure out what a and s need to be such that A_n ends up looking at the exploit string.
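(A toy sketch of the mechanism, with all names hypothetical: the vulnerable evaluator only sees a fixed function of the action, so an attacker who can predict that function can work backwards and place the exploit where it will be seen.)

```python
# Toy illustration only; "EXPLOIT" stands in for whatever string would
# actually corrupt the vulnerable evaluator A_n.
EXPLOIT = "EXPLOIT"

def summarize(action: str) -> str:
    # The function of the action `a` that A_n actually ends up looking at:
    # here, just a short prefix.
    return action[:16]

def a_n_evaluate(action: str) -> str:
    # A_n never unpacks the whole action, only the summary of it.
    view = summarize(action)
    return "corrupted" if EXPLOIT in view else "ok"

def craft_action(payload: str) -> str:
    # A_{n+1} predicts `summarize` and works backwards: it places the
    # exploit string exactly where the summary will pick it up.
    return EXPLOIT + " " + payload

print(a_n_evaluate("benign action"))        # benign input passes
print(a_n_evaluate(craft_action("stuff")))  # crafted input reaches A_n
```

The point of the sketch is that restricting A_n to functions of a and s doesn't help if those functions are themselves predictable to A_{n+1}.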

>In particular, suppose that it has reasonable views about what constitutes a persuasive argument in the abstract, and what kind of analysis is appropriate for determining if an argument is reasonable.

It seems to me that we are still pretty far from having such reasonable views with regard to philosophical arguments, and it seems hard to create an AI with such reasonable views when we don’t ourselves have them. Do you disagree with either of these statements?

>But the point of bootstrapping is to instead break this question down into manageable parts and to answer each part separately

Do you think we can do this with humans? That is, take a group of people and have them evaluate some philosophical argument without anyone ever looking at the whole argument? Even if this approach fixes some problems compared with individual humans evaluating arguments as wholes, how do you know that overall it's an improvement, that you haven't, through an incomplete or incorrect understanding of how to break down the task of evaluating a philosophical argument, introduced more (or just different) problems than you fixed? If you can't rule out this possibility, how can you show that bootstrapping increases capacity, rather than just moving to another capacity that's neither higher nor lower?