My recent posts

by paulfchristiano
29th Nov 2016

Over at Medium, I'm continuing to write about AI control; here's a roundup from the last month.

Many of these seem like interesting things to discuss here; would it be better to post each one as a link when I write it?

# Strategy

  • Prosaic AI control argues that AI control research should first consider the case where AI involves no "unknown unknowns."
  • Handling destructive technology tries to explain the upside of AI control, if we live in a universe where we eventually need to build a singleton anyway.
  • Hard-core subproblems explains a concept I find helpful for organizing research.

# Building blocks of ALBA

  • Security amplification and reliability amplification are complements to capability amplification. Ensembling for reliability is now implemented in ALBA on GitHub; a rough sketch of the voting idea follows this list.
  • Meta-execution is my current leading contender for security and capability amplification. It’s totally unclear how well it can work (some relevant speculation); a toy sketch of its recursive structure also follows this list.
  • Thoughts on reward engineering discusses a bunch of prosaic but important issues when designing reward functions.
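
To make the ensembling-for-reliability idea concrete: if copies of an agent err independently with probability below 1/2 on a query, a majority vote over several copies errs with probability that shrinks exponentially in the number of copies. The sketch below is mine, not the ALBA code; `reliable_answer` and the agent representation are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, List

def reliable_answer(
    agents: List[Callable[[str], Hashable]],
    question: str,
) -> Hashable:
    """Ensembling for reliability: query every agent and return the
    plurality answer. If agents err independently with probability
    eps < 1/2, the vote's error probability falls exponentially in
    the ensemble size."""
    answers = [agent(question) for agent in agents]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

# Hypothetical usage: three noisy copies of the same underlying agent.
if __name__ == "__main__":
    copies = [lambda q: "yes", lambda q: "yes", lambda q: "no"]
    print(reliable_answer(copies, "Is the plan acceptable?"))  # -> "yes"
```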
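
And a heavily simplified sketch of meta-execution's recursive shape: the meta-level agent never confronts the whole task, only small steps of it (split a task, answer an atomic piece, combine sub-answers). Real meta-execution also has the agent manipulate opaque pointers to data rather than the data itself, which this sketch omits; all names here (`MetaAgent`, `meta_execute`, the budget parameter) are illustrative assumptions, not ALBA's API.

```python
from typing import List, Protocol

class MetaAgent(Protocol):
    """Interface an illustrative meta-level agent would expose."""
    def is_atomic(self, task: str) -> bool: ...
    def answer_directly(self, task: str) -> str: ...
    def decompose(self, task: str) -> List[str]: ...
    def combine(self, task: str, sub_answers: List[str]) -> str: ...

def meta_execute(agent: MetaAgent, task: str, budget: int = 64) -> str:
    """Answer `task` by recursive decomposition. At every call the agent
    sees only one small step: split the task, solve an atomic piece, or
    combine sub-answers. The budget bounds total recursion."""
    if budget <= 1 or agent.is_atomic(task):
        return agent.answer_directly(task)
    subtasks = agent.decompose(task)
    sub_answers = [
        meta_execute(agent, sub, budget // max(len(subtasks), 1))
        for sub in subtasks
    ]
    return agent.combine(task, sub_answers)
```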

# Terminology and concepts

  • Clarifying the distinction between safety, control and alignment.
  • Benignity may be a useful invariant when designing aligned AI.