Even after thinking through these issues in SERI-MATS, and already agreeing with at least most of this post, I was surprised, upon reading it, by how many new-or-newish-to-me ideas and links it contained.
I'm not sure whether that's more a failure on my part, or a failure of the alignment field to notice "things that are common between a diverse array of problems faced". It's kind of related to my hunch that multiple alignment concepts ("goals", "boundaries", "optimization") will turn out to be isomorphic to the same tiny handful of mathematical objects.
What's the quickest way to get up to speed and learn the relevant skills, to the level expected of someone working at OpenAI?
Now I'm immediately going to this thought chain:
Maybe such an IQ test could be designed. --> Ah, but then specialized AIs (debatably "all of them" or "most of the non-LLM ones") would all just fail it equally! --> Maybe that's okay? Like, you don't give a blind kid a typical SAT; you give them a braille SAT or a reader or something (or, often, no accommodation).
The capabilities-in-the-wrong-order point is spot-on.
I could also imagine something that's not so similar to IQ in the details, but that still measures some kind of generic "information processing/retention/[other activities] capability"... but, as noted, we already can't predict capabilities, so such a measure would either need to solve that problem or end up less useful than, e.g., parameter count. --> If we classified and properly understood a taxonomy of capabilities, that would help build such a measure! --> That task, itself, is probably really bloody difficult compared with some hypothetical tactic that attacks the "IQ-style fire-alarm" narrative head-on / in a different way.
(A toy demo of "the IQ-style fire alarm won't come" could be a good subtask of "toy demo of misalignment that actually convinces people"... OR it could end up as a "toy demo of capabilities that just pushes people to work more on capabilities", which is obviously bad.)