Two senses of “optimizer”

[-]evhub6y60

The implicit assumption seems to be that an optimizer_1 could turn into an optimizer_2 unexpectedly if it becomes sufficiently powerful. It is not at all clear to me that this is the case – I have not seen any good argument to support this, nor can I think of any myself.

I think that this is one of the major ways in which old discussions of optimization daemons would often get confused. I think the confusion was coming from the fact that, while it is true in isolation that an optimizer_1 won't generally self-modify into an optimizer_2, there is a pretty common case in which this is a possibility: the presence of a training procedure (e.g. gradient descent) which can perform the modification from the outside. In particular, it seems very likely to me that there will be many cases where you'll get an optimizer_1 early in training and then an optimizer_2 later in training.

That being said, while having an optimizer_2 seems likely to be necessary for deceptive alignment, I think you only need an optimizer_1 for pseudo-alignment: every search procedure has an objective, and if that objective is misaligned, it raises the possibility of capability generalization without objective generalization.

Also, as a terminological note, I've taken to using "optimizer" for optimizer_1 and "agent" for something closer to optimizer_2, where I've been defining an agent as an optimizer that is performing a search over what its own action should be. I prefer that definition to your definition of optimizer_2, as I prefer mechanistic definitions over behavioral definitions since I generally find them more useful, though I think your notion of optimizer_2 is also a useful concept.

[-]Joar Skalse6y20

I agree with everything you have said.

[-]Ofer6y10

Also, as a terminological note, I've taken to using "optimizer" for optimizer_1 and "agent" for something closer to optimizer_2, where I've been defining an agent as an optimizer that is performing a search over what its own action should be.

I'm confused about this part. According to this definition, is "agent" a special case of optimizer_1? If so it doesn't seem close to how we might want to define a "consequentialist" (which I think should capture some programs that do interesting stuff other than just implementing [a Turing Machine that performs well on a formal optimization problem and does not do any other interesting stuff]).

[-]abramdemski6y50

I pretty strongly think this is the same distinction as I am pointing at with selection vs control, although perhaps I focus on a slightly broader cluster-y distinction while you have a more focused definition.

I think this distinction is something which people often conflate in computer science more broadly, too. Often, for example, a method will be initially intended for the control case, and people will make 'improvements' to it which only make sense in a selection context. It's easy for things to slide in that direction, because control-type algorithms will often be tested out in computer-simulated environments; but then, you have access to the environment, and can optimize it in more direct ways.

I'm more annoyed by this sort of mix-up than I probably should be.

[-]Rohin Shah6y40

Planned summary:

The first sense of "optimizer" is an optimization algorithm, that given some formally specified problem computes the solution to that problem, e.g. a SAT solver or linear program solver. The second sense is an algorithm that acts upon its environment to change it. Joar believes that people often conflate the two in AI safety.

Planned opinion:

I agree that this is an important distinction to keep in mind. It seems to me that the distinction is whether the optimizer has knowledge about the environment: in canonical examples of the first kind of optimizer, it does not. If we somehow encoded the dynamics of the world as a SAT formula and asked a super-powerful SAT solver to solve for the actions that accomplish some goal, it would look like the second kind of optimizer.

[-]Steven Byrnes6y30

If the super-powerful SAT solver thing finds the plans but doesn't execute them, would you still lump it with optimizer_2? (I know it's just terminology and there's no right answer, but I'm just curious about what categories you find natural.)

(BTW this is more-or-less a description of my current Grand Vision For AGI Safety, where the "dynamics of the world" are discovered by self-supervised learning, and the search process (and much else) is TBD.)

[-]Rohin Shah6y20

Hmm, idk, it feels more like an optimizer_1 in that situation. Now that you've posed this question, the super-powerful SAT solver that acts in the world feels like both an optimizer_1 and an optimizer_2.

[-]David Scott Krueger (formerly: capybaralet)6y20

It seems to me that the distinction is whether the optimizer has knowledge about the environment

Alternatively, you could say the distinction is whether the optimizer cares about the environment. I think there's a sense (or senses?) in which these things can be made/considered equivalent. I don't feel like I totally understand or am satisfied with either way of thinking about it, though.

[-]Gordon Seidoh Worley6y30

This looks like a Cartesian distinction that exists only by virtue of not fully considering the embeddedness of the optimizer.

It only seems that the domain of an optimizer_1 cannot optimize or affect the environment like an optimizer_2 because you are thinking of it as operating in mathematical, ideal terms, rather than as a real system that runs on a computer by doing physics and interacting with the world. An optimizer_1 can smoothly turn into an optimizer_2 in at least two ways. One is via unintended side effects. Another is via scope creep. There is no clean, bright line separating the domain of the optimizer_1 from the rest of reality, and in fact it was always an optimizer_2, just only looking at a narrow slice of the world because you put up some guardrails to keep it there.

The worry is what happens when it jumps the guardrails, or the guardrails fail.

[-]Joar Skalse6y20

I don't think that I'm assuming the existence of some sort of Cartesian boundary, and the distinction between these two senses of "optimizer" does not seem to disappear if you think of a computer as an embedded, causal structure. Could you state more precisely why you think that this is a Cartesian distinction?

[-]Gordon Seidoh Worley6y40

Sure, let's be super specific about it.

Let's say we have something you consider an optimizer_1, a SAT solver. It operates over a set of variables V arranged in predicts P using an algorithm A. Since this is a real SAT solver that is computed rather than a purely mathematical one we think about, it runs on some computer C and thus for each of V, P, and A there is some C(V), C(P), and C(A) that is the manifestation of each on the computer. We can conceptualize what C does to V, P, and A in different ways: it turns them into bytes, it turns A into instructions, it uses C(A) to operate on C(V) and C(P) to produce a solution for V and P.

Now the intention is that the algorithm A is an optimizer_1 that only operates on V and P, but in fact A is never run, properly speaking, C(A) is, and we can only say A is run to the extent C(A) does something to reality that we can set up an isomorphism to A with. So C(A) is only an optimizer_1 to the extent the isomorphism holds and it is, as you defined optimizer_1, "solving a computational optimization problem". But properly speaking C(A) doesn't "know" it's an algorithm: it's just matter arranged in a way that is isomorphic, via some transformation, to A.

So what is C(A) doing then to produce a solution? Well, I'd say it "optimizes its environment", that is literally the matter and its configuration that it is in contact with, so it's an optimizer_2.

You might object that there's something special going on here such that C(A) is still an optimizer_1 because it was set up in a way that isolates it from the broader environment so it stays within the isomorphism, but that's not a matter of classification, that's an engineering problem of making an optimizer_2 behave as if it were an optimizer_1. And a large chunk of AI safety (mostly boxing) is dealing with ways in which, even if we can make something safe in optimizer_1 terms, it may still be dangerous as an optimizer_2 because of unexpected behavior where it "breaks" the isomorphism and does something that might still keep the isomorphism in tact but also does other things you didn't think it would do if the isomorphism were strict.

Put pithily, there's no free lunch when it comes to isomorphisms that allow you to manifest your algorithms to compute them, so you have to worry about the way they are computed.

[-]Ofer6y20

It seems useful to have a quick way of saying:

"The quarks in this box implement a Turing Machine that [performs well on the formal optimization problem P and does not do any other interesting stuff]. And the quarks do not do any other interesting stuff."

(which of course does not imply that the box is safe)

[-]Gordon Seidoh Worley6y10

Sure. Not making the distinction seems important, though, because this post seems to be leaning towards rejecting arguments that depend on noticing that the distinction is leaky. Making it is okay so long as you understand it as "optimizer_1 is a way of looking at things that screens off many messy details of the world so I can focus on only the details I care about right now", but if it becomes conflated with "and if something is an optimizer_1 I don't have to worry about the way it is also an optimizer_2" then that's dangerous.

The author of the post suggests it's a problem that "some arguments related to AI safety that seem to conflate these two concepts". I'd say they don't conflate them, but understand that every optimizer_1 is an optimizer_2.

[-]Ofer6y30

Maybe we're just not using the same definitions, but according to the definitions in the OP as I understand them, a box might indeed contain an arbitrarily strong optimizer_1 while not containing an optimizer_2.

For example, suppose the box contains an arbitrarily large computer that runs a brute-force search for some formal optimization problem. [EDIT: for some optimization problems, the evaluation of a solution might result in the execution of an optimizer_2]

[-]Gordon Seidoh Worley6y10

Yes, and I'm saying that's not possible. Every optimizer_1 is an optimizer_2.

[-]Joar Skalse6y*10

I have already (sort of) addressed this point at the bottom of the post. There is a perspective from which any optimizer_1 can (kind of) be thought of as an optimizer_2, but its unclear how informative this is. It is certainly at least misleading in many cases. Whether or not the distinction is "leaky" in a given case is something that should be carefully examined, not something that should be glossed over.

I also agree with what ofer said.

"even if we can make something safe in optimizer_1 terms, it may still be dangerous as an optimizer_2 because of unexpected behavior where it "breaks" the isomorphism and does something that might still keep the isomorphism in tact but also does other things you didn't think it would do if the isomorphism were strict"

I agree. Part of the reason why it's valuable to make the distinction is to enable more clear thinking about these sorts of issues.

[-]Gordon Seidoh Worley6y10

I think there is only a question of how leaky, but it is always non-zero amounts of leaky, which is the reason Bostrom and others are concerned about it for all optimizers and don't bother to make this distinction.

[-]Ofer6y10

Second, a system can be an “optimizer” in the sense that it optimizes its environment. A human is an optimizer in this sense, because we robustly take actions that push our environment in a certain direction. A reinforcement learning agent can also be thought of as an optimizer in this sense, but confined to whatever environment it is run in.

This definition of optimizer_2 depends on the definition of "environment". It seems that for an RL agent you use the word "environment" to mean the formal environment as defined in RL. How do you define "environment", for this purpose, in non-RL settings?

What should be considered the environment of a SAT solver, or an arbitrary mesa-optimizer that was optimized to be a SAT solver?

[-]David Scott Krueger (formerly: capybaralet)6y10

Yep. Good post. Important stuff. I think we're still struggling to understand all of this fully, and work on indifference seems like the most relevant stuff.

My current take is that as long as there is any "black-box" part of the algorithm which is optimizing for performance, then it may end up behaving like an optimizer_2, since the black box can pick up on arbitrary effective strategies.

(in partial RE to Rohin below): I wouldn't necessarily say that such an algorithm knows about its environment (i.e. has a good model), it may simply have stumbled upon an effective strategy for interacting with it (i.e. have a good policy).

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

20

20