Personal Blog

This is a repository for miscellaneous short things I want to post. Other people are welcome to make top-level comments here if they want. (E.g., questions for me you'd rather discuss publicly than via PM; links you think will be interesting to people in this comment section but not to LW as a whole; etc.)

Collecting all of the quantitative AI predictions I know of MIRI leadership making on Arbital (let me know if I missed any):

Some caveats:

  • Arbital predictions range from 1% to 99%.
  • I assume these are generally ~5 years old. Views may have shifted.
  • By default, I assume that the standard caveats for probabilities like these apply: I treat these as off-the-cuff ass numbers unless stated otherwise, products of 'thinking about the problem on and off for years and then querying my gut about what it expects to actually see', more so than of building Guesstimate models or trying too hard to make sure all the probabilities are perfectly coherent.

Inconsistencies are flags that 'something is wrong here', but ass numbers are vague and unreliable enough that some inconsistency is to be expected. Similarly, ass numbers are often unstable hour-to-hour and day-to-day.

On my model, the point of ass numbers isn't to demand perfection of your gut (e.g., of the sort that would be needed to avoid multiple-stage fallacies when trying to conditionalize a lot), but to:

  1. Communicate with more precision than English-language words like 'likely' or 'unlikely' allow. Even very vague or uncertain numbers will, at least some of the time, be a better guide than natural-language terms that weren't designed to cover the space of probabilities (and that can vary somewhat in meaning from person to person).
  2. At least very vaguely and roughly bring your intuitions into contact with reality, and with each other, so you can more readily notice things like 'I'm miscalibrated', 'reality went differently than I expected', 'these two probabilities don't make sense together', etc.
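
A minimal sketch of what point 2 can look like in practice, assuming Python. The function names and example numbers are hypothetical illustrations, not anything taken from Arbital. The first function checks whether a stated P(A and B) is compatible with the stated P(A) and P(B); the second scores a batch of past forecasts against what actually happened (a Brier score, where lower is better and always guessing 50% scores 0.25).

```python
from typing import Sequence


def conjunction_is_coherent(p_a: float, p_b: float, p_a_and_b: float) -> bool:
    """Return True if P(A and B) is consistent with P(A) and P(B).

    A conjunction can never be more likely than either conjunct, and it can
    never be less likely than P(A) + P(B) - 1 (or 0, whichever is larger).
    """
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    return lower <= p_a_and_b <= upper


def brier_score(forecasts: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Mean squared difference between forecasts and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one forecast")
    total = sum((p - float(o)) ** 2 for p, o in zip(forecasts, outcomes))
    return total / len(forecasts)


if __name__ == "__main__":
    # Hypothetical gut numbers: P(A) = 0.3, P(B) = 0.5, P(A and B) = 0.4.
    # The conjunction is rated above P(A), so these don't make sense together.
    print(conjunction_is_coherent(0.3, 0.5, 0.4))

    # Hypothetical track record: four forecasts and whether each came true.
    print(brier_score([0.9, 0.2, 0.7, 0.5], [True, False, False, True]))
```

A check like this only flags that something is off; it doesn't say which of the numbers to revise, and given how unstable gut numbers are, a small violation may not mean much.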

It may still be a terrible idea to spend too much time generating ass numbers, since "real numbers" are not the native format human brains compute probability with, and spending a lot of time working in a non-native format may skew your reasoning.

(Maybe there's some individual variation here?)

But they're at least a good tool to use sometimes, for the sake of crisper communication, calibration practice (so you can generate non-awful future probabilities when you need to), etc.

In the context of a conversation with Balaji Srinivasan about my AI views snapshot, I asked Nate Soares what sorts of alignment results would impress him, and he said:

example thing that would be relatively impressive to me: specific, comprehensive understanding of models (with the caveat that that knowledge may lend itself more (and sooner) to capabilities before alignment). demonstrated e.g. by the ability to precisely predict the capabilities and quirks of the next generation (before running it)

i'd also still be impressed by simple theories of aimable cognition (i mostly don't expect that sort of thing to have time to play out any more, but if someone was able to come up with one after staring at LLMs for a while, i would at least be impressed)

fwiw i don't myself really know how to answer the question "technical research is more useful than policy research"; like that question sounds to me like it's generated from a place of "enough of either of these will save you" whereas my model is more like "you need both"

tho i'm more like "to get the requisite technical research, aim for uploads" at this juncture

if this was gonna be blasted outwards, i'd maybe also caveat that, while a bunch of this is a type of interpretability work, i also expect a bunch of interpretability work to strike me as fake, shallow, or far short of the bar i consider impressive/hopeful

(which is not itself supposed to be any kind of sideswipe; i applaud interpretability efforts even while thinking it's moving too slowly etc.)