Draft papers for REALab and Decoupled Approval on tampering

by Jonathan Uesato, Ramana Kumar

28th Oct 2020

Embedded Agency, Reinforcement learning, Reward Functions, Wireheading, AI

Ben Pace, 5y

PSA: You can write comments on PDFs in Google Drive!

There's a button in the top right that says "Add a comment" on hover-over, then you get to click-and-drag to highlight a box in the PDF where your comment goes. I will leave a test comment on the first PDF so everyone can see that.

(I literally just found this out.)

Charlie Steiner, 5y

Very interesting. Naturalizing feedback (as opposed to directly accessing True Reward) seems like it could lead to a lot of desirable emergent behaviors, though I'm somewhat nervous about reliance on a handwritten model of what reliable feedback is.
