All of Larks's Comments + Replies

This post seems like it was quite influential. This is basically a trivial review to allow the post to be voted on.

Larks 118

Alignment research: 30

Could you share some breakdown of what these people work on? Does this include things like the 'anti-bias' prompt engineering?

It includes the people working on the kinds of projects I listed under the first misconception. It does not include people working on things like the mitigation you linked to. OpenAI distinguishes internally between research staff (who do ML and policy research) and applied staff (who work on commercial activities), and my numbers count only the former.

Larks 40

Is your argument about personnel overlap that one could do some sort of mixed-effects regression, with location as the primary independent variable and controls for individual productivity? If so, I'm somewhat skeptical about the tractability: the sample size is not that big, the data seems messy, and I'm not sure it would necessarily capture the fundamental thing we care about. I'd be interested in the results if you wanted to give it a go, though!

More importantly, I'm not sure this analysis would be that useful. Geography-based priors only really seem us...
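For concreteness, a minimal sketch of the kind of mixed-effects regression the comment describes, in Python with pandas/statsmodels on synthetic data. Everything here is invented for illustration: the column names (`quality`, `in_bay`, `researcher`), the effect sizes, and the data itself; no such dataset exists in the post, and the comment is precisely skeptical that a real version would be tractable.

```python
# Hypothetical sketch: quality of research output regressed on location,
# with a random intercept per researcher. All data below is synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_researchers, outputs_each = 20, 5

# Each researcher produces several outputs; location is fixed per researcher.
researcher = np.repeat(np.arange(n_researchers), outputs_each)
in_bay = np.repeat(rng.integers(0, 2, n_researchers), outputs_each)

# Individual 'productivity' modeled as a per-researcher random intercept.
ability = np.repeat(rng.normal(0, 1, n_researchers), outputs_each)
quality = 3.0 + 0.2 * in_bay + ability + rng.normal(0, 0.5, researcher.size)

df = pd.DataFrame({"quality": quality, "in_bay": in_bay, "researcher": researcher})

# Fixed effect: location. Grouping variable: researcher.
result = smf.mixedlm("quality ~ in_bay", df, groups=df["researcher"]).fit()
print(result.summary())
```

A random intercept per researcher is one simple way to 'control for individual productivity'; a real analysis would also need a defensible quality metric, which is part of the tractability worry raised above.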

2 Owain Evans
I agree with most of this -- and my original comment should have been clearer. I'm wondering if the past five years of direct observations lead you to update the geography-based prior (which has been included in your alignment review since 2018). How much do you expect the quality of alignment work to differ between a new organization based in the Bay and one based somewhere else? (No need to answer: I realize this is probably a small consideration and I don't want to start an unproductive thread on this topic.)
Larks 30
  • I prioritized posts by named organizations.
    • Diffractor does not list any institutional affiliations on his user page.
    • No institution I noticed listed the post/sequence on their 'research' page.
    • No institution I contacted mentioned the post/sequence.
  • No post in the sequence was that high in the list of 2021 Alignment Forum posts, sorted by karma.
  • Several other filtering methods also did not identify the post.

However, upon reflection it does seem to be MIRI-affiliated, so perhaps it should have been included; if I have time I may review it and edit it in later.

Notice that in MIRI's summary of 2020 they wrote "From our perspective, our most interesting public work this year is Scott Garrabrant’s Cartesian frames model and Vanessa Kosoy’s work on infra-Bayesianism."

Larks 30

Hey Daniel, thanks very much for the comment. In my database I have you down as class of 2020, hence out of scope for that analysis, which covered the class of 2018 only. I didn't include the 2019 or 2020 classes because I figured it takes time to find your footing, do research, write it up, etc., so absence of evidence would not be very strong evidence of absence. So please don't consider this as any reflection on you. Ironically, I actually did review one of your papers in the above - this one - which I did indeed think was pretty relevant! (Ctrl-F 'Hendrycks' to find the paragraph in the article.) Sorry if this was not clear from the text.