Bottle Caps Aren't Optimisers

[-]evhub6y90Nomination for 2018 Review

Daniel Filan's bottle cap example was featured prominently in "Risks from Learned Optimization" for good reason. I think it is a really clear and useful example of why you might want to care about the internals of an optimization algorithm and not just its behavior, and it helped motivate that framing in the "Risks from Learned Optimization" paper.

[-]DanielFilan6y50

Daniel Filan's bottle cap example

Note that Abram Demski deserves a large part of the credit for that specific example (somewhere between 'half' and 'all'), as noted in the final sentence of the post.

[-]Raemon6y10

A reminder, since this looks like it has a few upvotes from AF users: posts need 2 nominations to proceed to the review round.

[-]DanielFilan6y60Review for 2018 Review

Review by the author:

I continue to endorse the contents of this post.

I don't really think about the post that much, but the post expresses a worldview that shapes how I do my research - that agency is a mechanical fact about the workings of a system.

To me, the main contribution of the post is setting up a question: what's a good definition of optimisation that avoids the counterexamples of the post? Ideally, this definition would refer or correspond to the mechanistic properties of the system, so that people could somehow statically determine whether a given controller was an optimiser. To the best of my knowledge, no such definition has been developed. As such, I see the post as not having kicked off a fruitful public conversation, and its value, if any, lies in how it has changed the way other people think about optimisation.

[-]Li peng Ye2y00

Yes, I kind of agree with you

[-]orthonormal6y50

I'm surprised nobody has yet replied that the two examples are both products of significant optimizers with relevant optimization targets, and that the naive definition seems to work with one modification:

A system is downstream from an optimizer of some objective function to the extent that that objective function attains much higher values than would be attained if the system didn't exist, or were doing some other random thing.
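One way to read the "to the extent that" clause is as a counterfactual gap: the objective's value with the system in place, minus its average value over worlds where the system is absent or doing something random. The sketch below is only a toy illustration of that reading. The world dictionaries, the `water_retained` objective, and all the numbers are made up for the example and do not come from the post.

```python
import random
from statistics import mean

def counterfactual_gap(objective, with_system, baselines):
    """Objective value with the system, minus its mean over baseline worlds."""
    return objective(with_system) - mean(objective(w) for w in baselines)

# Hypothetical objective: litres of water left in a bottle after it is knocked over.
def water_retained(world):
    return world["water"]

capped = {"water": 1.0}   # the cap is on the bottle (assumed value)
no_cap = {"water": 0.1}   # the cap does not exist (assumed value)

# "Doing some other random thing": the cap ends up somewhere arbitrary,
# so most of the water is lost. Seeded so the run is repeatable.
rng = random.Random(0)
random_cap = [{"water": rng.uniform(0.0, 0.2)} for _ in range(100)]

gap = counterfactual_gap(water_retained, capped, [no_cap] + random_cap)
print(f"counterfactual gap: {gap:.2f} litres")
```

A large gap is what the modified definition asks for. Note that the function only compares outcomes; it says nothing about how the system came to exist or what it is causally connected to.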

[-]DanielFilan6y20

I'm surprised nobody has yet replied that the two examples are both products of significant optimizers with relevant optimization targets.

Yes, this seems pretty important and relevant.

That being said, I think that that definition suggests that natural selection and/or the earth's crust are downstream from an optimiser of the number of Holiday Inns, or that my liver is downstream from an optimiser of my income, neither of which is right.

Probably it's important to relate 'natural subgoals' to some ideal definition - which offers some hope, since 'subgoal' is really a computational notion, so maybe investigation along these lines would offer a more computational characterisation of optimisation.

[EDIT: I made this comment longer and more contentful]

[-]orthonormal6y10

Okay, so another necessary condition for being downstream from an optimizer is being causally downstream. I'm sure there are other conditions, but the claim still feels like an important addition to the conversation.

[-]Stuart_Armstrong7y30

I think my syntax/semantics idea is relevant to this question - especially the idea of different sets of environments. https://www.lesswrong.com/posts/EEPdbtvW8ei9Yi2e8/bridging-syntax-and-semantics-empirically

For example, suppose we have a super-intelligent bottle cap, dedicated to staying on the bottle (and with some convenient manufacturing arms and manufacturing capability). This seems to be exactly an optimiser, one that we mere humans cannot expect to be able to get off the bottle.

In contrast, the standard bottle cap will only remain on the bottle in a much narrower set of circumstances (though the superintelligent bottle cap will also remain on in those circumstances).

So it seems that what distinguishes the standard bottle cap from a genuine optimiser is that the genuine optimiser will accomplish its role in a much larger set of (possibly antagonistic) environments, while the standard bottle cap will only do so in a much smaller set of circumstances.
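A rough way to make the "set of environments" comparison concrete is to score each cap by the fraction of environments in which it stays on. This is a minimal sketch under strong assumptions: the environments are reduced to a single hypothetical "removal force" number, and both cap models and the threshold are invented for illustration.

```python
def success_rate(stays_on, environments):
    """Fraction of environments in which the cap stays on the bottle."""
    return sum(stays_on(env) for env in environments) / len(environments)

# Hypothetical environments: the force (arbitrary units) trying to remove the cap.
environments = [0, 1, 2, 5, 10, 50, 100, 1000]

def standard_cap(force):
    # Stays on only below a fixed friction threshold (assumed value of 3).
    return force < 3

def superintelligent_cap(force):
    # Assumed to counteract every force in this toy set, e.g. by rebuilding itself.
    return True

narrow = success_rate(standard_cap, environments)
broad = success_rate(superintelligent_cap, environments)
print(narrow, broad)

# The superintelligent cap also succeeds wherever the standard cap does.
covers = all(superintelligent_cap(f) for f in environments if standard_cap(f))
```

On this picture the two caps behave identically in the gentle environments and only come apart in the adversarial ones, which is where the distinction is claimed to live.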

[-]Stuart_Armstrong6y10Nomination for 2018 Review

It's helped me hone my thinking on what is and isn't an optimiser (and a wireheader, and so on, for associated concepts).

AI ALIGNMENT FORUM
AF
