'define a system that will let you press its off-switch without it trying to make you press the off-switch' presents no challenge at all to them...
...building a Thing all of whose designs and strategies will also contain an off-switch, such that you can abort them individually and collectively and then get low impact beyond that point.
This has been proposed before (as their citations indicate), and this particular proposal does not seem to introduce any particularly novel (or good) solutions.
I think the problems with myopic agents (of which this is but a special case) are made clearer by looking at current LLMs like the hobbyist AutoGPT. Most discussions of myopic agents seem to have in mind a simplistic scenario of a single persistent agent located in a single PC running only 1 computation at a time with no self-modification or ML-related programming or change of any parameters; and t... (read more)
I believe this has been proposed before (I'm not sure what the first time was).
The main obstacles is that this still doesn't solve impact regularization, and a more generalized type of shutdownability then you presented.
... (read more)This has been proposed before (as their citations indicate), and this particular proposal does not seem to introduce any particularly novel (or good) solutions.
I think the problems with myopic agents (of which this is but a special case) are made clearer by looking at current LLMs like the hobbyist AutoGPT. Most discussions of myopic agents seem to have in mind a simplistic scenario of a single persistent agent located in a single PC running only 1 computation at a time with no self-modification or ML-related programming or change of any parameters; and t... (read more)