Related to the role of peer review: a lot stuff on LW/AF is relatively exploratory, feeling out concepts, trying to figure out the right frames, etc. We need to be generally willing to ask discuss incomplete ideas, stuff that hasn't yet had the details ironed out. For that to succeed, we need community discussion standards which tolerate a high level of imperfect details or incomplete ideas. I think we do pretty well with this today.
But sometimes, you want to be like "come at me bro". You've got something that you're pretty highly confident is right, and you want people to really try to shoot it down (partly as a social mechanism to demonstrate that the idea is in fact as solid and useful as you think it is). This isn't something I'd want to be the default kind of feedback, but I'd like for authors to be able to say "come at me bro" when they're ready for it, and I'd like for posts which survive such a review to be perceived as more epistemically-solid/useful.
With that in mind, here's a few of my own AF posts which I'd submit for a "come at me bro" review:
For all of these, things like "this frame is wrong" or "this seems true but not useful" are valid objections. I'm not just claiming that the proofs hold.
But sometimes, you want to be like "come at me bro". You've got something that you're pretty highly confident is right, and you want people to really try to shoot it down (partly as a social mechanism to demonstrate that the idea is in fact as solid and useful as you think it is). This isn't something I'd want to be the default kind of feedback, but I'd like for authors to be able to say "come at me bro" when they're ready for it, and I'd like for posts which survive such a review to be perceived as more epistemically-solid/useful.
Yeah, when I think about implementing a review process for the Alignment Forum, I'm definitely thinking about something you can ask for more polished research, in order to get external feedback and a tag saying this is peer review (for prestige and reference).
Thanks for the suggestions! We'll consider them. :)
Steve's big thoughts on alignment in the brain probably deserve a review. Component posts include https://www.lesswrong.com/posts/diruo47z32eprenTg/my-computational-framework-for-the-brain , https://www.lesswrong.com/posts/DWFx2Cmsvd4uCKkZ4/inner-alignment-in-the-brain , https://www.lesswrong.com/posts/jNrDzyc8PJ9HXtGFm/supervised-learning-of-outputs-in-the-brain
Interestingly, I think there aren't any of my posts I should recommend - basically all of them are speculative. However, I did have a post called Gricean communication and meta-preferences that I think is still fairly interesting speculation, and I've never gotten any feedback on it at all, so I'll happily ask for some: https://www.lesswrong.com/posts/8NpwfjFuEPMjTdriJ/gricean-communication-and-meta-preferences .
Suggestion 1: Utility != reward by Vladimir Mikulik. This post attempts to distill the core ideas of mesa alignment. This kind of distillment increases the surface area of AI Alignment, which is one of the key bottlenecks of the area (that is, getting people familiarized with the field, motivated to work on it and with a handle on some open questions to work on). I would like an in-depth review because it might help us learn how to do it better!
Suggestion 2: me and my coauthor Pablo Moreno would be interested in feedback in our post about quantum computing and AI alignment. We do not think that the ideas of the paper are useful in the sense of getting us closer to AI alignment, but I think it is useful to have signpost explaining why avenues that might seem attractive to people coming into the field are not worth exploring, while introducing them to the field in a familiar way (in this case our audience are quantum computing experts). One thing that confuses me is that some people have approached me after publishing the post asking me why I think that quantum computing is useful for AI alignment, so I'd be interested in feedback on what went wrong on the communication process given the deflationary nature of the article.
Great idea! Thanks for doing this!
Unsurprisingly, I'd love it if you reviewed any of my posts.
Since you said "technical," I suggest this one in particular. It's a big deal IMO because Armstrong & Mindermann's argument has been cited approvingly by many people and seems to be still widely regarded as correct, but if I'm right, it's actually a bad argument. I'd love a third perspective on this that helps me figure out what's going on.
More generally I'd recommend sorting all AF posts by karma and reviewing the ones that got the most, since presumably karma correlates with how much people here like the post and thus it's extra important to find flaws in high-karma posts.
I was indeed expecting you to suggest one of your post. But that's one of the valid reasons I listed, and I didn't know about this one, so it's great!
We'll consider it. :)
I wrote this post as a summary of a paper I published. It didn't get much attention, so I'd be interesting in having you all review it.
To say a little more, I think the general approach I lay out in here for taking towards safety work is worth considering more deeply and points towards a better process for choosing interventions in attempts to build aligned AI. I think what's more important than the specific examples where I apply the method is the method itself, but thus far as best I can tell folks did not much engage with that, so unclear to me if that's because they disagree, think it's too obvious, or what.
I think the generalized insight from Armstrong's no free lunch paper is still underappreciated in that I sometimes see papers that, to me, seem to run up against this and fail to realize there's a free variable in their mechanisms that needs to be fixed if they want them to not go off in random directions.
https://www.lesswrong.com/posts/LRYwpq8i9ym7Wuyoc/other-versions-of-no-free-lunch-in-value-learning
Another post of mine I'll recommend you:
https://www.lesswrong.com/posts/k8F8TBzuZtLheJt47/deconfusing-human-values-research-agenda-v1
This is the culmination of a series of post on "formal alignment", where I start out saying "what it would mean to formally state what it would mean to build aligned AI" and then from that try to figure out what we'd have to figure out in order to achieve that.
Over the last year I've gotten pulled in other directions so not pushed this line of research forward much, plus I reached a point with it where it was clear it required different specialization than I have to make additional progress, but I still think it presents a different approach to what others are doing in the space of work towards AI alignment and think you might find it interesting to review (along with the preceding posts in the series) for that reason.
Thanks for the suggestion!
We want to go through the different research agendas (and I already knew about yours), as they give different views/paradigms on AI Alignment. Yet I'm not sure how relevant a review of such posts are. In a sense, the "reviewable" part is the actual research that underlies the agenda, right?
How does one write a good and useful review of a technical post on the Alignment Forum?
I don’t know. Like many people, I tend to comment and give feedback on posts closely related to my own research, or to write down my own ideas when reading the paper. Yet this is quite different from the quality peer-review that you can get (if you’re lucky) in more established fields. And from experience, such quality reviews can improve the research dramatically, give some prestige to it, and help people navigate the field.
In an attempt to understand what makes a good review for the Alignment Forum, Joe Collman, Jérémy Perret (Gyrodiot on LW) and me are launching a project to review many posts in depth. The goal is to actually write reviews of various posts, get feedback on their usefulness from authors and readers alike, and try to extract from them some knowledge about how to go about doing such reviews for the field. We hope to have enough insights to eventually write some guidelines that could be used in an official AF review process.
On that note, despite the support of members of the LW team, this project isn’t official. It’s just the three of us trying out something.
Now, the reason for the existence of this post (and why it is a question) is that we’re looking for posts to review. We already have some in mind, but they are necessarily biased towards what we’re more comfortable about. This is where you come in, to suggest a more varied range of posts.
Anything posted on the AF goes, although we will not take into account things that are clearly not “research outputs” (like transcripts of podcasts or pointers to surveys). This means that posts about specific risks, about timelines, about deconfusion, about alignment schemes, and more, are all welcome.
We would definitely appreciate it if you add a reason to your suggestion, to help us decide whether to include the post on our selection. Here is a (non-exhaustive) list of possible reasons:
Thanks in advance, and we’re excited about reading your suggestions!