All of Anthony DiGiovanni's Comments + Replies

No, at some point you "jump all the way" to AGI

I'm confused as to what the actual argument for this is. It seems like you've just kinda asserted it. (I realize in some contexts all you can do is offer an "incredulous stare," but this doesn't seem like the kind of context where that suffices.)

I'm not sure if the argument is supposed to be the stuff you say in the next paragraph (if so, the "Also" is confusing).

Daniel Kokotajlo
Great question. You are forcing me to actually think through the argument more carefully. Here goes:

Suppose we defined "t-AGI" as "an AI system that can do basically everything professional humans can do in time t or less, and just as well, while being cheaper." And suppose we said AGI is an AI that can do everything at least as well as professional humans, while being cheaper. Then AGI = t-AGI for t = infinity, because for anything professional humans can do, no matter how long it takes, AGI can do it at least as well.

Now, METR's definition is different. If I understand correctly, they made a dataset of AI R&D tasks, had humans set a baseline for how long the tasks take humans to do, and then had AIs attempt the tasks. They found a nice relationship: AIs tend to be able to do tasks below time t but not above, where t varies from AI to AI and increases as the AIs get smarter.

...I guess the summary is: if you think of horizon lengths as being relative to humans (i.e. the t-AGI definition above), then by definition you eventually "jump all the way to AGI" when you strictly dominate humans. But if you think of horizon length as the length of task the AI can do vs. not do (*not* "as well as humans," just "can do at all"), then it's logically possible for horizon lengths to just smoothly grow for the next billion years and never reach infinity.

So that's the argument-by-definition. There's also an intuition pump about the skills, which was also a pretty handwavy argument, but that's separate.
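The distinction between the two horizon-length definitions can be made concrete with a toy sketch. All the numbers and field names below are hypothetical, purely for illustration: each task records how long it takes a human, whether the AI can do it at all (the METR-style notion), and whether the AI does it at least as well as humans (the t-AGI-style notion).

```python
# Hypothetical task data: human completion time in hours, plus two notions
# of AI success corresponding to the two horizon-length definitions above.
tasks = [
    {"human_hours": 1,   "ai_can_do": True,  "ai_matches_human": True},
    {"human_hours": 8,   "ai_can_do": True,  "ai_matches_human": True},
    {"human_hours": 40,  "ai_can_do": True,  "ai_matches_human": False},
    {"human_hours": 200, "ai_can_do": False, "ai_matches_human": False},
]

def horizon(tasks, success_key):
    """Longest human-time task such that the AI succeeds (per `success_key`)
    on every task up to and including that length."""
    h = 0
    for task in sorted(tasks, key=lambda t: t["human_hours"]):
        if not task[success_key]:
            break
        h = task["human_hours"]
    return h

# METR-style horizon: longest task the AI can do at all.
print(horizon(tasks, "ai_can_do"))          # 40
# t-AGI-style horizon: longest task the AI does as well as humans.
print(horizon(tasks, "ai_matches_human"))   # 8
```

Under the t-AGI reading, once `ai_matches_human` is true for every task, the horizon is bounded only by the dataset, i.e. the AI has "jumped all the way" to dominating humans; under the can-do-at-all reading, `ai_can_do` can keep failing on ever-longer tasks, so the horizon can grow smoothly without ever reaching infinity.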

I think you might be misunderstanding Jan's understanding. A big crux in the whole discussion between Eliezer and Richard seems to be this: Eliezer believes that any AI capable of doing good alignment research—at least good enough to provide a plan that would help humans make an aligned AGI—must be good at consequentialist reasoning in order to generate good alignment plans. (I gather from Nate's notes in that conversation, plus various other posts, that he agrees with Eliezer here, though I'm not certain.) I strongly doubt that Jan just mistook MIRI's focus on understandin... (read more)

Ramana Kumar
I think you're right - thanks for this! It makes sense now that I see the quote was in a section titled "Alignment research can only be done by AI systems that are too dangerous to run".

Basic questions: If the type of Adv(M) is a pseudo-input, as suggested by the above, then what does Adv(M)(x) even mean? What is the event whose probability is being computed? Does the unacceptability checker C also take real inputs as the second argument, not just pseudo-inputs—in which case I should interpret a pseudo-input as a function that can be applied to real inputs, and Adv(M)(x) is the statement "A real input x is in the pseudo-input (a set) given by... (read more)

Evan Hubinger
The idea is that we're thinking of pseudo-inputs as “predicates that constrain X” here, so, for α ∈ X_pseudo, we have α : X → B.

(At the risk of necroposting:) Was this paper ever written? Can't seem to find it, but I'm interested in any developments on this line of research.

Scott Garrabrant
I'm not intending to go back to thinking about this anymore, but Diffractor is the person who was thinking/writing about it: https://www.lesswrong.com/posts/SgkaXQn3xqJkGQ2D8/cooperative-oracles