AI grantmaking at Open Philanthropy.
I used to give careers advice for 80,000 hours.
Thanks for writing this up! I've found this frame to be a really useful way of thinking about GPT-like models since first discussing it.
In terms of future work, I was surprised to see the apparent low priority of discussing pre-trained simulators that were then modified by RLHF (buried in the 'other methods' section of 'Novel methods of process/agent specification'). Please consider this comment a vote for you to write more on this! Discussion seems especially important given e.g. OpenAI's current plans. My understanding is that Conjecture is overall very negative on RLHF, but that makes it seem more useful to discuss how to model the results of the approach, not less, to the extent that you expect this framing to help shed light what might go wrong.
It feels like there are a few different ways you could sketch out how you might expect this kind of training to go. Quick, clearly non-exhaustive thoughts below:
Yeah this is the impression I have of their views too, but I think there are good reasons to discuss what this kind of theoretical framework says about RL anyway, even if you're very against pushing the RL SoTA.