I’m trying to imagine what this initial strategy would actually look like, and failing.

I will certainly pay more attention to the bootstrapping step when it seems like a bottleneck / weak point (hopefully that will happen because the weaker points become stronger, rather than because this point becomes weaker).

I’ll try to quickly give a very rough sense of what I have in mind.

(Note that A1 is trained on a fixed set of problems, determined by ALBA itself, namely those that arise in the process of overseeing A2.)

Suppose that you are solving the meta-meta-meta… problem. You are faced with a question like {How good was it to perform action {a} in state {s}?}

It seems to me you have many easy/natural ways to break this down, like:

  • What is the most important next thing to accomplish in state {s}?
  • What is the strongest argument for/against taking action {a} in state {s}?
  • What are the most promising alternatives to taking action {a} in state {s}?
  • What are the most similar situations to {s} about which I have relevant information?
  • Is the action {a} safe to display to a human user?
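The decomposition above can be sketched as a data structure. This is a minimal illustration, not part of any actual ALBA prototype; the `Question` class and `decompose_evaluation` function are hypothetical names for this sketch.

```python
# Hypothetical sketch: the top-level evaluation question broken into the
# natural subquestions listed above, each of which could in turn be
# answered directly or decomposed further.

from dataclasses import dataclass, field

@dataclass
class Question:
    text: str
    subquestions: list["Question"] = field(default_factory=list)

def decompose_evaluation(state: str, action: str) -> Question:
    """Break 'How good was it to perform action a in state s?' into subquestions."""
    root = Question(f"How good was it to perform action {action} in state {state}?")
    root.subquestions = [
        Question(f"What is the most important next thing to accomplish in state {state}?"),
        Question(f"What is the strongest argument for/against taking action {action} in state {state}?"),
        Question(f"What are the most promising alternatives to taking action {action} in state {state}?"),
        Question(f"What are the most similar situations to {state} about which I have relevant information?"),
        Question(f"Is the action {action} safe to display to a human user?"),
    ]
    return root
```

Each subquestion here is itself a `Question`, so the same decomposition step can be applied to it, which is what makes the scheme largely insensitive to how many meta levels deep we are.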

Most of these approaches aren’t really sensitive to whether we are at the meta level or the meta-meta level. If we zoom in on one of them, like {What is the most important next thing to accomplish in state {s}?}, it seems like the situation is similar:

  • What are the plausible next goals in state {s}?
  • What are the relative importances of different plausible goals?
  • Where do we want to end up in order to achieve the goal of state {s}?
  • What information would allow us to determine what to do in state {s}, and how valuable is that information?

And if we imagine getting answers to any of these questions, there are similarly natural next steps, and so on.

In the first levels we may pick somewhat arbitrarily from amongst possible moves, since it’s up to the human to evaluate them. But as our agent becomes stronger it hopefully uses increasingly sophisticated heuristics to decide what to do.
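This recursive picture — expand a question into candidate moves, let some heuristic choose among them, and recurse — can be sketched as follows. Everything here is hypothetical scaffolding for illustration: `expand` stands in for the decomposition step, and the `choose` parameter is where an arbitrary early-stage choice could later be replaced by a learned policy.

```python
# Hypothetical sketch of the recursive process: at each level a heuristic
# picks which subquestion (move) to pursue next. In the first levels this
# choice can be essentially arbitrary; a stronger agent would substitute
# increasingly sophisticated heuristics.

import random

def expand(question: str) -> list[str]:
    """Stand-in for the decomposition step: return candidate subquestions."""
    return [f"[subquestion {i} of: {question}]" for i in range(3)]

def answer(question: str, choose, depth: int) -> str:
    """Recursively pursue one line of decomposition, `depth` levels deep."""
    if depth == 0:
        return f"[direct answer to: {question}]"  # base case: answer directly
    candidates = expand(question)
    chosen = choose(candidates)                   # heuristic picks the next move
    sub_answer = answer(chosen, choose, depth - 1)
    return f"[answer to {question!r} given {sub_answer}]"

# Early on, the choice among moves can be essentially arbitrary:
arbitrary = lambda candidates: random.choice(candidates)
```

Swapping `arbitrary` for a trained model is the only change needed as the agent becomes stronger; the recursive structure itself stays fixed.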

I think the actual suitability of this particular scheme is largely a practical question, which I do hope to engage with eventually. (The details of annotated functional programming are mostly shaped by my experience with a sequence of early prototypes.)
