Anthony DiGiovanni

Researcher at the Center on Long-Term Risk. All opinions my own.

Comments

No, at some point you "jump all the way" to AGI

I'm confused as to what the actual argument for this is. It seems like you've just kinda asserted it. (I realize in some contexts all you can do is offer an "incredulous stare," but this doesn't seem like the kind of context where that suffices.)

I'm not sure if the argument is supposed to be the stuff you say in the next paragraph (if so, the "Also" is confusing).

I think you might be misunderstanding Jan's understanding. A big crux in the discussion between Eliezer and Richard seems to be: Eliezer believes that any AI capable of doing good alignment research (at least good enough to provide a plan that would help humans build an aligned AGI) must itself be good at consequentialist reasoning in order to generate good alignment plans. (I gather from Nate's notes in that conversation, plus various other posts, that he agrees with Eliezer here, but I'm not certain.) I strongly doubt that Jan simply mistook MIRI's focus on understanding consequentialist reasoning for a belief that alignment research requires being a consequentialist reasoner.

Basic questions: If the type of Adv(M) is a pseudo-input, as suggested by the above, then what does Adv(M)(x) even mean? What is the event whose probability is being computed? Does the unacceptability checker C also take real inputs as its second argument, not just pseudo-inputs? If so, should I interpret a pseudo-input as a function that can be applied to real inputs, so that Adv(M)(x) is the statement "the real input x is in the set given by the pseudo-input Adv(M)"?

(I don't know how pedantic this is, but the unacceptability penalty seems pretty important, and I struggle to understand it as long as I'm confused about Adv(M)(x).)
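
To make the reading I'm proposing concrete (this is purely my guess at the intended types, not the post's actual definitions, and X for the set of real inputs and \(\mathcal{M}\) for the set of models is my own notation), I'd type things roughly as

\[
\mathrm{Adv} : \mathcal{M} \to (X \to \{0,1\}), \qquad \mathrm{Adv}(M)(x) \equiv x \in \mathrm{Adv}(M), \qquad C : \mathcal{M} \times X \to \{0,1\},
\]

so that the event whose probability is being computed would be something like \(\exists x.\ \mathrm{Adv}(M)(x) \wedge C(M, x)\), i.e. "some real input x matching the pseudo-input Adv(M) makes M behave unacceptably." If that's not the intended reading, I'd appreciate a correction.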

(At the risk of necroposting:) Was this paper ever written? I can't seem to find it, but I'm interested in any developments on this line of research.