I don't know how to quickly convey why I find this point so helpful, but I think it is a useful pointer to a key problem. The post is quite short, and I hope someone else votes positively on it. +4.
Or you guys could find a 1-2 hour window to show up and live-chat in a LW dialogue, then publish the results :-)
Curated. I thought this was a valuable list of areas, most of which I haven't thought that much about, and I've certainly never seen them brought together in one place before, which I think itself is a handy pointer to the sort of work that needs doing.
You explicitly assume this stuff away, but I believe under this setup that the subagents would be incentivized to murder each other before the button is pressed (to get rid of that annoying veto).
I also note that if one agent becomes way, way smarter than the other, this balance may not work out.
Even if it works, I don't see how to set up the utility functions such that humans aren't disempowered. That's a complicated term!
Overall a very interesting idea.
+9. This is a powerful set of arguments pointing out how humanity will literally go extinct soon due to AI development (or have something similarly bad happen to us). A lot of thought and research went into producing this level of understanding of the problems we face, and I'm extremely glad it was written up.
Someone working full-time on an approach to the alignment problem that they feel optimistic about, and writing annual reflections on their work, is something that has been sorely lacking. +4
This feels amusingly like tricking a child. "Remember kiddo, you can reason out loud about where you're going to hide and I won't hear it. Now let's play hide and seek!"