LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.
That's not really what I had in mind, but what I had in mind was less clear than I thought. The spirit of the question is "can the AI come up with novel concepts?"
I think one reason I consider the current paradigm "general enough, in principle" is that I don't think "novel concepts" is really The Thing. I think creativity / intelligence is mostly about combining concepts; it's just that really smart people:
a) are faster in raw horsepower and can handle more complexity at a time
b) have a better set of building blocks to combine or apply to make new concepts (which includes building blocks for building better building blocks)
c) have a more efficient search for useful/relevant building blocks (both metacognitive and object-level).
Maybe you believe this, and think "well yeah, it's the efficient search that's the important part, and we still don't actually have a real working version of that"?
It seems like the current models have basically all the tools a moderately smart human has with regard to generating novel ideas. The things they're missing are something like "having a good metacognitive loop, such that they notice when they're doing a fake/dumb version of things and course-correct" and "persistently pursuing plans over long time horizons." And they don't seem to have zero of either of those, just not enough to get over some hump.
I don't see what's missing that a ton of training on diverse, multimodal tasks + scaffolding + a data flywheel isn't going to figure out.
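For concreteness, here's a minimal sketch of the kind of scaffolding I have in mind: an outer critique step that checks whether the model is doing a "fake/dumb version" of the task and course-corrects. All of the names here (`metacognitive_loop`, `generate`/critique prompts, `max_rounds`) are hypothetical placeholders, not any real API; `model` is assumed to be any callable that maps a prompt string to a completion string.

```python
# Hypothetical sketch of a metacognitive scaffolding loop: generate an attempt,
# have the model critique it for superficial/"fake" work, and revise until the
# critique passes or we run out of rounds. Illustrative only.

def metacognitive_loop(task: str, model, max_rounds: int = 5) -> str:
    attempt = model(f"Attempt this task:\n{task}")
    for _ in range(max_rounds):
        critique = model(
            "Is the following attempt actually solving the task, or is it a "
            "superficial / fake version of it? Answer 'OK' or list the problems.\n"
            f"Task: {task}\nAttempt: {attempt}"
        )
        if critique.strip().startswith("OK"):
            return attempt  # the metacognitive check passed
        # Course-correct: revise the attempt in light of the critique.
        attempt = model(
            f"Task: {task}\nPrevious attempt: {attempt}\n"
            f"Problems found: {critique}\nProduce a revised attempt."
        )
    return attempt  # best effort after max_rounds of revision
```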
(My guess is you took more like 15-25 minutes per question? It's hard to tell from my notes; you may have finished early, but I don't recall it being crazy early.)
(This seems like more time than Buck was taking. The goal was to not get any wrong, so it wasn't like people were trying to crank through them in 7 minutes.)
The problems I gave were (as listed in the csv for the diamond problems)
I at least attempted to filter the problems I gave you for GPQA diamond, although I am not very confident that I succeeded.
(Update: yes, the problems John did were GPQA diamond. I gave 5 problems to a group of 8 people, and gave them two hours to complete however many they thought they could complete without getting any wrong.)
Note: I plan to extend the Nomination phase through ~Monday; I didn't mean for it to end partway through the weekend.
I haven't had much success articulating why.
I'd be interested in a more in-depth review where you take another pass at this.
A thing unclear to me: is it worth hiding the authors from the Voting page?
On the first LessWrong Review, we deliberately hid authors and randomized the order of the voting results. A few years later, we've mostly shifted towards "help people efficiently sort through the information" rather than "making sure the presentation is random/fair." It's not like people don't know who the posts are by once they start reading them.
Curious what people think.
I would find this post easier to remember and link to if it were called "Serial vs Parallel Research Time", or something like that which points more directly at the particular insight the post provides.
Yeah, the LW team has been doing this sort of thing internally; it's still in the experimental phase. I don't know if we've used all the tricks listed here yet.
My current guess is:
1. This is more relevant for up to the first couple generations of "just barely superintelligent" AIs.
2. I don't really expect it to be the deciding factor after many iterations of end-to-end RSI that get you to "able to generate novel scientific or engineering insights much faster than a human or institution could."
I do think it's plausible that the initial bias towards "evil/hackery AI" could start it off in a bad basin of attraction, but a) even if you completely avoided that, I would still basically expect it to rediscover this on its own as it gained superhuman levels of competence, and b) one of the things I most want to use a slightly-superhuman AI to do is to robustly align massively superhuman AI, and I don't really see how to do that without directly engaging with the knowledge of the failure modes there.
I think there are other plans that route more through "use STEM AI to build an uploader or bioenhancer, and then have an accelerated human psyche do the technical philosophy necessary to handle the unbounded alignment case." I could see that being the right call, and I could imagine the bias from "already knows about deceptive alignment, etc." being large-magnitude enough to matter in the initial process. [edit: In those cases I'd probably want to filter out a lot more than just "unfriendly AI strategies"]
But basically, how this applies depends on what you're trying to do with the AI, what stage/flavor of AI you're working with, and how it's helping.