Research request (alignment strategy): Deep dive on "making AI solve alignment for us"

by JanB
1st Dec 2022
1 min read

We might be able to train AI alignment assistants that massively accelerate and improve the alignment research that gets done. These assistants need not have (strongly) superhuman capabilities or be highly agentic; they just need to be capable and aligned enough to let us offload most of the work on alignment, safety, and related problems to them.
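For concreteness, here is a deliberately toy Python sketch, not from the post, of the division of labour described above: an assistant drafts alignment-research artifacts and a human researcher remains the final filter. Every name in it (AlignmentAssistant, human_review, the task strings) is hypothetical.

```python
# Toy sketch only: nothing here is a real API. It illustrates the loop the
# paragraph above describes, with a human researcher as the final filter.

from dataclasses import dataclass

@dataclass
class Draft:
    task: str
    content: str

class AlignmentAssistant:
    """Stand-in for a trained assistant model (e.g. an LLM fine-tuned with
    human feedback); here it just returns a placeholder draft."""

    def propose(self, task: str) -> Draft:
        return Draft(task=task, content=f"[model-generated draft for: {task}]")

def human_review(draft: Draft) -> bool:
    """Stand-in for the human researcher's judgment. Because a human vets
    every draft, the assistant only needs to be capable and aligned enough
    for its drafts to be worth reviewing, not superhuman."""
    print(f"reviewing: {draft.task}")
    return True  # accept everything in this toy example

assistant = AlignmentAssistant()
tasks = [
    "summarise known failure modes of reward modelling",
    "propose an experiment to measure oversight quality",
]
accepted = []
for task in tasks:
    draft = assistant.propose(task)
    if human_review(draft):
        accepted.append(draft)
print(f"{len(accepted)}/{len(tasks)} drafts accepted for further human work")
```

The only point of the sketch is where the assistant sits: inside a human-reviewed loop, doing the bulk of the drafting while humans keep final judgment.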

This seems to be a key part of OpenAI's alignment strategy. The best explanation of this strategy that I've seen is perhaps "A minimal viable product for alignment", which has an excellent discussion in its comment section.

If this strategy is promising, it likely recommends fairly different prioritisation from what the alignment community is currently doing (see, e.g., Beth Barnes's ideas here, or my upcoming post on "non-scalable oversight", i.e. pragmatic improvements to oversight that would help with training alignment assistants but which cannot directly scale to oversight of superhuman systems).
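To illustrate the parenthetical definition, here is a minimal, hypothetical sketch (nothing in it comes from the post) of why direct human oversight is "non-scalable": it produces a training signal only for outputs a human can actually evaluate.

```python
# Hypothetical sketch of non-scalable oversight: a human directly judges each
# output. This yields a usable signal only for outputs a human can evaluate,
# which is why it cannot directly scale to oversight of superhuman systems.
from typing import Optional

def human_judge(output: str, human_can_evaluate: bool) -> Optional[float]:
    """Direct human judgment: returns a score, or None when the output is
    beyond the human's ability to assess (the non-scalable step)."""
    if not human_can_evaluate:
        return None  # no signal here scales past human level
    return 1.0 if "cites evidence" in output else 0.0

# Human-checkable outputs produce a usable training signal...
print(human_judge("draft that cites evidence", human_can_evaluate=True))
# ...but outputs beyond human evaluation produce none.
print(human_judge("novel proof too complex to verify", human_can_evaluate=False))
```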

I haven't seen any deep treatment of the viability of this strategy and its implications. I think such an analysis would be pretty useful.

I could potentially provide funding for such an analysis, or help with obtaining it.

Comments

JanB:
There is now also this write-up by Jan Leike: https://www.lesswrong.com/posts/FAJWEfXxws8pMp8Hk/link-why-i-m-optimistic-about-openai-s-alignment-approach

Mentioned in
Take 2: Building tools to help build FAI is a legitimate strategy, but it's dual-use.