Eliciting Latent Knowledge: the A.I. Contest

~ the problem formulation is equivalent to other unsolved ones ~


“We gave the A.I. capabilities that we CANNOT DETECT. And, we have no independent verification — we routed all of our sensors through the A.I. exclusively, without access ourselves. And… the A.I. might be lying, with plans to destroy us. How do we make sure the A.I. will do the right thing?”

Um, wait. You gave it capabilities you can’t track? And, you have no independent sensors? Yet, you somehow hope to find out what is really going on, in case the A.I. is lying… Well. You *set up* an impossible problem.

It’s easy to demonstrate that, IF this A.I. problem has some PROTOCOL X that solves it, then that *same* protocol can be used to solve related problems:

Take a scientist with a HYPOTHESIS that might be false. Using PROTOCOL X, that scientist would no longer need to resort to experimental data. Instead of an “A.I. that might lie,” we have a “hypothesis that might lie.” And, instead of “you don’t have access to sensors which could detect the A.I.’s manipulations,” we have “you don’t have access to the experimental data.” They are equivalent problems: an untrusted reporter, an unobservable reality, and no independent channel to check the claims against. So, IF there is a solution to the A.I. problem, such that you can achieve certainty about the A.I. without access to the sensors, then it is entailed that you also have a way to validate scientific hypotheses without resorting to experimental evidence.

Similarly, in the realms of politics, business regulation, and nuclear disarmament: we hold parties accountable by means of verification. Without transparency, we cannot expect to hold them accountable. Yet, IF a solution exists to the A.I. problem, then that PROTOCOL X would also let you trust a politician without transparency, and guarantee nuclear disarmament without inspectors.
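To make the claimed equivalence concrete, here is a minimal sketch in Python (all names are mine, hypothetical, and not from the ELK report) of the shared structure that each of these problems instantiates:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class VerificationProblem:
    """An untrusted reporter, a hidden reality, and a missing independent check."""
    reporter: str         # the party whose claims might be false
    hidden_state: str     # the reality we cannot observe directly
    missing_channel: str  # the independent verification we lack

# The ELK set-up from the opening quote.
elk = VerificationProblem(
    reporter="an A.I. that might lie",
    hidden_state="what the A.I. actually did",
    missing_channel="sensors not routed through the A.I.",
)

# The scientist's problem, under the substitution described above.
science = VerificationProblem(
    reporter="a hypothesis that might lie",
    hidden_state="how the world actually behaves",
    missing_channel="experimental data",
)

# The accountability problems: politics and disarmament.
politics = VerificationProblem(
    reporter="a politician who might lie",
    hidden_state="what was actually done in office",
    missing_channel="transparency",
)
disarmament = VerificationProblem(
    reporter="a state that might cheat",
    hidden_state="the real stockpile",
    missing_channel="independent inspectors",
)

# A hypothetical PROTOCOL X would be one function of this shape:
# from the reporter's claims alone, decide whether to trust them.
ProtocolX = Callable[[List[str]], bool]

# The reduction: every instance above presents the same inputs (claims)
# and demands the same output (a trustworthy verdict), so any ProtocolX
# that solves `elk` also "solves" `science`, `politics`, and
# `disarmament` -- without data, transparency, or inspectors.
```

The sketch is only meant to show that the type signature is identical across domains; nothing about the A.I. case gives PROTOCOL X any extra information to work with.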

I claim that, since we have *not* found a way to do without experimental evidence, transparency, or verification, there must also *not* be a solution to the Eliciting Latent Knowledge (ELK) problem. Reductio ad absurdum: you won’t find a way to avoid doing science.
