A putative new idea for AI control; index here.
I've been posting a lot on value/reward learning recently and, as usual, the process of posting (plus some feedback) means that those posts are already partially superseded, and some of them are overly complex.
So here I'll try to briefly summarise my current insights, with links to the other posts where appropriate (each link covers all the points noted since the previous link):