I guess the difference is that in the “hard” examples, different levels of performance are possible, and I don’t know how humans are able to perform at the level they do. Therefore I don’t know how to duplicate that level of performance with Hᴬ, or how to determine (aside from actually running Hᴬ) whether a proposed strategy for H would actually cause Hᴬ to match human-level performance.
Maybe my “learning to program” example isn’t the best one to illustrate this since judging/comparing programming performance is non-trivial. Instead consider the task of learning to synthesize speech for a foreign language (that would ideally be indistinguishable from a native speaker). A human learner can do this without much trouble if they start at a young age. I can imagine the following high-level task decomposition for Hᴬ:
- create a table of phonemes that the language uses
- transcribe the input text into a phonetic representation
- using the phonetic representation, string together the phonemes in the text, while trying to match the tempo and intonation of native speakers
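The decomposition above could be sketched in code roughly as follows. This is a minimal toy illustration, not a real synthesis pipeline: the phoneme table, the letter-to-phoneme transcription rule, and the `tempo` parameter are all hypothetical stand-ins for what a real system (or Hᴬ) would have to learn.

```python
# Toy sketch of the three-step decomposition above.
# All names and data are illustrative assumptions.

# Step 1: a table of phonemes the language uses
# (here a trivial letter -> phoneme map for a made-up language).
PHONEME_TABLE = {
    "a": "AH", "e": "EH", "k": "K", "s": "S", "t": "T",
}

def transcribe(text):
    """Step 2: transcribe input text into a phonetic representation."""
    return [PHONEME_TABLE[ch] for ch in text.lower() if ch in PHONEME_TABLE]

def synthesize(phonemes, tempo=1.0):
    """Step 3: string the phonemes together. `tempo` is a crude
    stand-in for the prosody (tempo/intonation) matching that makes
    the real task hard."""
    duration = len(phonemes) / tempo  # placeholder timing model
    return "-".join(phonemes), duration

phones = transcribe("taska")
audio, dur = synthesize(phones, tempo=2.0)
```

The hard part, of course, is precisely what this sketch papers over: steps 2 and 3 hide all of the prosodic and coarticulation knowledge that distinguishes native speech from concatenated phonemes.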
But this is the same general strategy that people who build software speech synthesizers use, and their output is still far from matching that of native speakers. Do you have an argument for why Hᴬ would do much better on this learning task than current software, namely as well as a human child does?
I have another question that’s kind of tangential. Suppose Hᴬ receives a complex query, and figures out a possible way to eventually answer the query but isn’t sure that the whole computation that would result is actually benign. What should it do at that point? More generally, what does H need to do to make sure that nothing malign happens as a result of their actions? Is there a standard guide H can follow, or do they also have to figure it out case by case?