AI ALIGNMENT FORUM

Inverse Reinforcement Learning, AI

[ Question ]

Is CIRL a promising agenda?

by Chris_Leong
23rd Jun 2022

Richard Ngo writes:

Since Stuart Russell's proposed alignment solution in Human Compatible is the most publicly-prominent alignment agenda, I should be more explicit about my belief that it almost entirely fails to address the core problems I expect on realistic pathways to AGI.

Specifying an update rule which converges to a desirable goal is just a reframing of the problem of specifying a desirable goal, with the "uncertainty" part as a red herring. https://arbital.com/p/updated_deference/… In other words, Russell gives a wrong-way reduction.

I originally included CIRL in my curriculum (https://docs.google.com/document/d/1mTm_sT2YQx3mRXQD6J2xD2QJG1c3kHyvX8kQc_IQ0ns/edit?usp=drivesdk…) out of some kind of deferent/catering to academic mainstream instinct. Probably a mistake; my current annoyance about deferential thinking has reminded me to take it out.

Howie writes:

My impression is that ~everyone I know in the alignment community is very pessimistic about SR's agenda. Does it sound right that your view is basically a consensus? (There's prob some selection bias in who I know).

Richard responds:

I think it's fair to say that this is a pretty widespread opinion. Partly it's because Stuart is much more skeptical of deep learning (and even machine learning more generally!) than almost any other alignment researcher, and so he's working in a different paradigm.

Is Richard correct, and if so, why? (I would also like a clearer explanation of why Richard is skeptical of Stuart's agenda. I agree that the reframing doesn't completely solve the problem, but I don't understand why it can't be a useful piece.)
