The intention is to completely re-implement translation, though (1) the hope is to accomplish it over the course of many iterations of meta-execution so that no single iteration is too expensive, and (2) you aren’t actually doing that at runtime; you are just using it to train an ML model.
You seem to be implicitly claiming that when re-implementing translation the agent is not helpful, except perhaps as a dictionary; that’s the claim with which I am disagreeing. I think you can make use of the agent at every step of the implementation/decomposition.
(It is certainly possible to construct agents which can translate but are insufficiently reflective to help in any way other than by translating. I don’t mean for security amplification to be possible given such an agent. I also think that for such agents it is basically not meaningful to talk about some input as being a “vulnerability”; the notion of vulnerability is only meaningful in the context of having some ability to reason about your own behavior. If the agent is literally just a giant table of translations from one language to another, it’s not even clear what it means to call the table “wrong” if we look at the agent in isolation.)
It seems to me that the tasks you mention are all probably decomposable enough for meta-execution to work in bite-sized pieces. Fortunately, this claim is also possible to test over the short term.
In the translation example, we can break the problem down into [source language → meaning] and then [meaning → target language] (and perhaps take further steps to identify and carry over any important non-meaning features of the text).
I think both parts can easily be broken down further. For example, we could break [source language → meaning] down further via a series of steps like “What are the possible meanings of the word [x] and how plausible are they?” and “If we juxtapose a subphrase meaning [x] and one meaning [y], what are the possible meanings of the result and how likely are they?” And so forth.
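As a toy illustration of this style of decomposition, here is a minimal sketch in Python. It is not meant as a real translation system; the two-entry dictionaries, function names, and the single juxtaposition rule are all hypothetical stand-ins for subtasks that would actually be posed to the agent (or to humans) during meta-execution.

```python
# Toy sketch of the decomposition above: translation split into
# [source -> meaning] and [meaning -> target], with [source -> meaning]
# further reduced to per-word meaning queries plus a juxtaposition step.
# All tables and rules here are hypothetical illustrations.

# Stand-in for the subtask "What are the possible meanings of the word [x]?"
# (here: a tiny Spanish -> English-gloss table, most plausible meaning first).
WORD_MEANINGS = {
    "gato": ["cat"],
    "negro": ["black"],
}

# Stand-in for the [meaning -> target language] subtask (glosses -> German).
TARGET_WORDS = {
    "black": "schwarze",
    "cat": "Katze",
}

def word_meanings(word: str) -> list[str]:
    """Subtask: possible meanings of a single word, ranked by plausibility."""
    return WORD_MEANINGS.get(word, [word])

def juxtapose(x: str, y: str) -> str:
    """Subtask: meaning of a subphrase [x] followed by a subphrase [y].

    Toy rule: Spanish noun-adjective order becomes adjective-noun in the gloss.
    """
    return f"{y} {x}"

def source_to_meaning(phrase: str) -> str:
    """[source language -> meaning], built from the two subtasks above."""
    glosses = [word_meanings(w)[0] for w in phrase.split()]
    meaning = glosses[0]
    for g in glosses[1:]:
        meaning = juxtapose(meaning, g)
    return meaning

def meaning_to_target(meaning: str) -> str:
    """[meaning -> target language], again decomposable word by word."""
    return " ".join(TARGET_WORDS.get(w, w) for w in meaning.split())

def translate(phrase: str) -> str:
    """The full pipeline: each step is small enough to check in isolation."""
    return meaning_to_target(source_to_meaning(phrase))
```

The point is only structural: each function corresponds to a bite-sized question, and none of them requires holding the whole translation task in one step.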
There is a second question about whether the meta-executed versions would have their own sets of equally-easy-to-exploit vulnerabilities, and my conjecture is that they would be harder to exploit (and that subsequent iteration would make them harder still to exploit). This is important, and harder to test. But it doesn’t seem like the key disagreement at this point.
There is a third question about whether the meta-level tasks are not only feasible for humans but also almost as easy to learn (such that if we could train a translation system, we could also train some meta-level implementation). My intuition is that this isn’t a deal-breaker and my impression is that we disagree on the feasibility-for-humans step of the argument. But I do think this is a strong additional claim and it’s a reasonable place to disagree.