Here are some views, often held in a cluster:
I'm not sure exactly which clusters you're referring to, but I'll just assume that you're pointing to something like "people who aren't very into the sharp left turn and think that iterative, carefully bootstrapped alignment is a plausible strategy." If this isn't what you were trying to highlight, I apologize. The rest of this comment might not be very relevant in that case.
To me, the views you listed here feel like a straw man or weak man of this perspective.
Furthermore, I think the actual crux is more often "prior to having to align systems that are collectively much more powerful than humans, we'll only have to align systems that are somewhat more powerful than humans." This is essentially the crux you highlight in A Case for the Least Forgiving Take On Alignment. I believe disagreements about hands-on experience are quite downstream of this crux: I don't think people with reasonable views (not weak men) believe that "without prior access to powerful AIs, humans will need to align AIs that are vastly, vastly superhuman, but this will be fine because these AIs will need lots of slow, hands-on experience in the world to do powerful stuff (like nanotech)."
So, discussing how well superintelligent AIs can operate from first principles seems mostly irrelevant to this discussion (if by superintelligent AI, you mean something much, much smarter than the human range).
I would be more sympathetic if you made a move like, "I'll accept continuity through the human range of intelligence, and that we'll only have to align systems as collectively powerful as humans, but I still think that hands-on experience is only..." In particular, I think there is a real disagreement about the relative value of experimenting on future dangerous systems instead of working on theory or trying to carefully construct analogous situations today by thinking in detail about alignment difficulties in the future.
I largely agree with the general point that I think this post is making, which I would summarize in my own words as: iteration-and-feedback cycles, experimentation, experience, trial-and-error, etc. (LPE, in your terms) are sometimes overrated in importance and necessity. This over-emphasis is particularly common among those who have an optimistic view on solving the alignment problem through iterative experimentation.
I think the degree to which LPE is actually necessary for solving problems in any given domain, as well as the minimum time and resources required to obtain such LPE and the general tractability of doing so, is an empirical question which people frequently investigate for particular important domains.
Differing intuitions about how important LPE is in general, and how tractable it is to obtain, seem like an important place for identifying cruxes in worldviews. I wrote a bit more about this in a recent post, and commented on one of the empirical investigations to which my post is partially a response. As I said in the comment, I find such investigations interesting and valuable as a matter of furthering scientific understanding about the limits of the possible, but pretty futile as attempts to bound the capabilities of a superintelligence. I think your post is a good articulation of one reason why I find these arguments so uncompelling.
Here are some views, oftentimes held in a cluster:
You can probably see the common theme here. It holds that learning by practical experience (henceforth LPE) is the only process by which a certain kind of cognitive algorithm can be generated: LPE is the only way to become proficient in some domains, and the current AI paradigm works because, and only inasmuch as, it implements this kind of learning.[1]
All in all, it's not totally impossible. I myself have suggested that some capabilities may only be implementable via one algorithm and one algorithm only.
But I think this is false, in this case. And perhaps, when put this way, it already looks false to you as well.
If not, let's dig into the why.[2]
A Toy Formal Model
What is a "heuristic", fundamentally speaking? It's a recorded statistical correlation — the knowledge that if you're operating in some environment E with the intent to achieve some goal G, taking the action A is likely to lead to achieving that goal.
As a toy formality, we can say that it's a structure of the following form:
h: ⟨E, G⟩ → A | E_A → G_E

The question is: what information is necessary for computing h? Clearly you need to know E and G — the structure of the environment and what you're trying to do there. But is there anything else?
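For concreteness, here's one way to read that toy formalism as code. This is just a restatement of the type of h, nothing more; the names are mine:

```python
# A minimal sketch of the toy formalism above, as a type signature only.
# The names here are illustrative, not part of the formalism itself.
from typing import Callable, TypeVar

Environment = TypeVar("Environment")  # a full model of E's structure
Goal = TypeVar("Goal")                # the outcome we want to bring about
Action = TypeVar("Action")            # something we can do in E

# A heuristic h maps a known environment and a goal to an action
# expected to achieve that goal in that environment.
Heuristic = Callable[[Environment, Goal], Action]
```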
The LPE view says yes: you also need a set of "training scenarios" S = {E_A1, ..., E_An}, where the results of taking various actions A_i on the environment are shown. Not because you need to learn the environment's structure — we're already assuming it's known. No, you need them because... because...
Perhaps I'm failing the ITT here, but I think the argument just breaks down at this step, in a way that can't be patched. It seems clear, to me, that E itself is entirely sufficient to compute h, essentially by definition. If heuristics are statistical correlations, it should be sufficient to know the statistical model of the environment to generate them!
Toy-formally, P(h|E⋅S)=P(h|E). Once the environment's structure is known, you gain no additional information from playing around with it.
If your understanding is incomplete, sure, you may gain an additional appreciation of the environment's dynamics by running mental simulations. But that's still just a way of figuring out the environment's structure; it doesn't mean the training set itself is absolutely necessary.
Concretely:
Figuring out good environmental heuristics does not strictly require a training set, only the knowledge of the environment's structure.
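Here's a deliberately brute-force sketch of that claim on the simplest environment I can fully write down: tic-tac-toe. The only inputs are the rules of the game (the structure of E) and the goal (win, or at least don't lose); there is no training set of played games anywhere in sight. The code is purely illustrative.

```python
# A minimal sketch: deriving a tic-tac-toe "heuristic" (which move to take)
# from nothing but the rules of the game - zero games of experience.
# Boards are 9-character strings of 'X', 'O', and ' '.

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals
    for a, b, c in lines:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, to_move):
    """Value of `board` for X (+1 win, 0 draw, -1 loss) under optimal play."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return 0  # draw
    nxt = 'O' if to_move == 'X' else 'X'
    values = [minimax(board[:i] + to_move + board[i + 1:], nxt) for i in moves]
    return max(values) if to_move == 'X' else min(values)

def best_action(board, to_move='X'):
    """The heuristic h, computed directly from the rules (E) and the goal (G)."""
    nxt = 'O' if to_move == 'X' else 'X'
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    value = lambda i: minimax(board[:i] + to_move + board[i + 1:], nxt)
    return max(moves, key=value) if to_move == 'X' else min(moves, key=value)

# X has two in the top row; the derived heuristic completes it immediately.
print(best_action("XX  O    "))  # -> 2
```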
Why Are Humans Tempted to Think Otherwise?
Two reasons:
The first is that in many practical cases, LPE is the most cost-efficient way to learn an environment's structure. Even in my very simple tic-tac-toe example, momentary abstract reasoning only yielded us a "pretty good" move. In practical cases, the situation is even worse: we're not given the game's rules on a silver platter; we can only back-infer them from studying how things tend to play out (see the sketch below).
The second is that our System 1 (which implements quick heuristics) is faster and allocated more compute than System 2 (which does abstract reasoning), owing to the fact that general intelligence is a novel evolutionary adaptation. Thus, "solving" environments abstractly is more time-consuming than just running out and refining our LPE-heuristics against them, and the resultant algorithms work slower. (And that often makes them useless — consider trying to use System 2 to coordinate muscle movements in a brawl.)
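To illustrate the first reason with a toy case of my own: when the environment's payoff structure is hidden, playing it is simply the cheapest way to back-infer that structure. The machines and numbers below are made up for illustration.

```python
# A sketch of experience-as-structure-inference: the plays below are only
# doing one job, namely estimating the hidden payoff probabilities of E.
# Once a model of E is in hand, the heuristic falls out of plain reasoning.
import random

random.seed(0)
HIDDEN_PAYOFFS = {"left": 0.35, "right": 0.6}   # the unknown structure of E

def play(machine):
    """One round of experience: pull a machine, observe a 0/1 payoff."""
    return 1 if random.random() < HIDDEN_PAYOFFS[machine] else 0

# Back-infer E's structure from observed play (the LPE step).
estimated_payoffs = {
    machine: sum(play(machine) for _ in range(500)) / 500
    for machine in ("left", "right")
}

# With the inferred model, choosing the action is ordinary reasoning,
# not further trial-and-error.
best = max(estimated_payoffs, key=estimated_payoffs.get)
print(estimated_payoffs, "->", best)
```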
This creates the illusion that LPE is the only thing that works. It is, however, an illusion:
LPE is a specific method of deriving a certain type of statistical correlation from the environment, and it only works if it's given a set of training examples as an input. But it's not the only method — merely the one that's most applicable in the regime in which we've been operating up to this point.
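As a toy contrast (the environment and numbers are my own invention): below, the same heuristic is reached twice, once by reasoning directly over the known structure of E, and once by the LPE route of tabulating a training set of observed outcomes.

```python
# Two routes to the same heuristic for a one-step stochastic choice.
# Route 1 derives it from the known structure of E; Route 2 is the LPE
# route: generate a training set of outcomes and tabulate them.
import random

random.seed(1)
# Known structure of E: each action's payoff distribution as (probability, payoff).
E = {"a": [(0.8, 10), (0.2, 0)],
     "b": [(0.5, 20), (0.5, 0)]}

# Route 1: plain expected-value reasoning over E. No samples involved.
from_model = {action: sum(p * r for p, r in dist) for action, dist in E.items()}

# Route 2: LPE - draw a training set S of observed plays and average them.
def sample(action):
    (p_hi, r_hi), (_, r_lo) = E[action]
    return r_hi if random.random() < p_hi else r_lo

from_experience = {
    action: sum(sample(action) for _ in range(10_000)) / 10_000 for action in E
}

print("from model:     ", from_model, "->", max(from_model, key=from_model.get))
print("from experience:", from_experience, "->", max(from_experience, key=from_experience.get))
```

Both routes output the same "prefer b" heuristic; the sampling route just pays, in plays, for information the model route already had.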
What about superintelligent AGIs, then? By the definition of being "superintelligent", they'd have more resources allocated to their general-intelligence module/System-2 equivalent. Thus, they'd be natively better at solving environments abstractly, "without experience".
Takeaways
The LPE view holds that merely knowing the structure of some domain is not enough to learn how to navigate it. You also need to do some trial-and-error in it, to arrive at the necessary heuristics.[4]
I claim that this is false, that there are algorithms that allow learning without experience — and indeed, that one such algorithm is the cornerstone of "general intelligence".
If true, this should negate the initial statements:
It is, in fact, possible to make strong predictions about OOD events like AGI Ruin — if you've studied the problem exhaustively enough to infer its structure despite lacking the hands-on experience. By the same token, it should be possible to solve the problem in advance, without creating it first.
And an AGI, by dint of being superintelligent, would be very good at this sort of thing — at generalizing to domains it hasn't been trained on, like social manipulation, or even to entirely novel ones, like nanotechnology, then successfully navigating them on the first try.
Much like the existence vs. nonexistence of general intelligence, the degree of importance ascribed to LPE seems to be one of the main causes of divergence in people's P(doom) estimates.
Put another way, it says that babble-and-prune is the only general-purpose method of planning possible: stochastically generate candidate solutions, prune them, repeat until arriving at a good-enough solution.
Also, here's a John Wentworth post that addresses the babble-and-prune framing in particular.
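For concreteness, here's a bare-bones rendering of that loop on a throwaway numeric problem (my own toy instantiation, not anything from either post): babble candidates at random, prune to the closest few, repeat until one clears a "good enough" bar.

```python
# Babble-and-prune, stripped to its skeleton: stochastically generate
# candidate solutions, prune to the best ones, and repeat until a
# candidate is good enough. The "problem" is just guessing a number.
import random

random.seed(2)
target = 0.0

def score(x):
    """Higher is better: how close is this candidate to the target?"""
    return -abs(x - target)

pool = [random.uniform(-100, 100) for _ in range(20)]   # initial babble
while max(score(x) for x in pool) < -0.1:               # not yet good enough
    survivors = sorted(pool, key=score)[-5:]            # prune to the best 5
    pool = [s + random.gauss(0, 1)                      # babble around survivors
            for s in survivors for _ in range(4)]

print(round(max(pool, key=score), 4))
```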
And it's indeed a pretty good move, much better than random, if not the optimal one.
Indeed, some people ascribe some truly mythical importance to that process.