I'm an admin of LessWrong. Here are a few things about me.
Curated. I thought this was a valuable list of areas, most of which I haven't thought that much about, and I've certainly never seen them brought together in one place before, which I think itself is a handy pointer to the sort of work that needs doing.
I don’t think it applies to safety researchers at AI labs, though; I am shocked at how much those folks can make.
You explicitly assume this stuff away, but I believe under this setup that the subagents would be incentivized to murder each other before the button is pressed (to get rid of that annoying veto).
I also note that if one agent becomes way, way smarter than the other, this balance may not work out.
Even if it works, I don't see how to set up the utility functions such that humans aren't disempowered. That's a complicated term!
Overall a very interesting idea.
+9. This is a powerful set of arguments pointing out how humanity will literally go extinct soon due to AI development (or have something similarly bad happen to us). A lot of thought and research went into producing this level of understanding of the problems we face, and I'm extremely glad it was written up.
Someone working full-time on an approach to the alignment problem that they feel optimistic about, and writing annual reflections on their work, is something that has been sorely lacking. +4
I don't want to double the comment count I submit to Recent Discussion, so I'll just update this comment with the things I've cut.
12/06/2023 Comment on Originality vs. Correctness
It's fun to take the wins of one culture and apply them to the other; people are very shocked that you found some hidden value to be had (though it often isn't competitive value / legible to the culture). And if you manage to avoid some terrible decision, people speak about how wise you are to have noticed.
(Those are the best cases, often of course people are like "this is odd, I'm going to pretend I didn't see this" and then move on.)
For too long, I have erred on the side of writing too much.
The first reason I write is in order to find out what I think.
This often leaves my writing long and not very defensible.
However, editing the whole thing is so much extra work after I already did all the work figuring out what I think.
Sometimes it goes well if I just scrap the whole thing and concisely write my conclusion.
But typically I don't want to spend the marginal time.
Another reason my writing is too long is that I have extra thoughts I know most people won't find useful.
But I've picked up a heuristic that says it's good to share actual thinking because sometimes some people find it surprisingly useful, so I hit publish anyway.
Nonetheless, I endeavor to write shorter.
So I think I shall experiment with cutting the bits off of comments that represent me thinking aloud, but aren't worth the space in the local conversation.
And I will put them here, as the dregs of my cognition. I shall hopefully gather data over the next month or two and find out whether they are in fact worthwhile.
I just gave this a re-read; I forgot what a trip it is to read the thoughts of Eliezer Yudkowsky. It continues to be some of my favorite stuff written on LessWrong in recent years.
It's hard to relate to the world with the level of mastery over basic ideas that Eliezer has. I don't mean by this to vouch that his perspective is certainly correct, but I believe it is at least possible, and so I think he aspires to a knowledge of reality that I rarely if ever aspire to. Reading it inspires me to really think about how the world works, and really figure out what I know and what I don't. +9
(And the smart people dialoguing with him here are good sports for keeping up their side of the argument.)
They are not being treated worse than foot soldiers, because they do not have an enemy army attempting to murder them during the job. (Unless 'foot soldiers' is itself more commonly used as a metaphor for 'grunt work' and I'm not aware of that.)
Or you guys could find a 1-2 hour window to show up and live-chat in a LW dialogue, then publish the results :-)