Nicholas Kross

Theoretical AI alignment (and relevant upskilling) in my free time. My current view of the field is here (part 1) and here (part 2).

Genderfluid (differs on an hour/day-ish timescale). It's not a multiple-personality thing.

/nickai/


Comments


Now I'm immediately going to this thought chain:

Maybe such an IQ test could be designed. --> Ah, but then specialized AIs (debatably "all of them" or "most of the non-LLM ones") would just fail equally! --> Maybe that's okay? Like, you don't give a blind kid a typical SAT, you give them a braille SAT or a reader or something (or, often, no accommodation).

The capabilities-in-the-wrong-order point is spot-on.

I could also imagine a thing that's not so similar to IQ in the details, but still measures some kind of generic "information processing/retention/[other activities] capability"... but, as noted, we already can't predict capabilities, so such a measure would need to either solve that or else be less useful than e.g. parameter-count. --> If we classified and properly understood a taxonomy of capabilities, that'd help build such a measure! --> That task, itself, is probably really bloody difficult compared with "some hypothetical tactic that attacks the 'IQ-style fire-alarm' narrative head-on / in a different way".

(A toy demo of "the IQ-style fire alarm won't come" could be a good subtask of "toy demo of misalignment that actually convinces people"... OR it could end up as a "toy demo of capabilities that just pushes people to work more on capabilities", which is obviously bad.)

Even after thinking through these issues in SERI-MATS, and already agreeing with at least most of this post, I was surprised, upon reading it, by how many new-or-newish-to-me ideas and links it contained.

I'm not sure if that's more a failure on my part, or a failure of the alignment field to notice "things that are common between a diverse array of problems faced". Kind of related to my hunch that multiple alignment concepts ("goals", "boundaries", "optimization") will turn out to be isomorphic to the same tiny handful of mathematical objects.

What's the quickest way to get up to speed and learn the relevant skills, to the level expected of someone working at OpenAI?