Defining Myopia

[-]evhub6y*140

Really enjoying this sequence! For the purposes of relaxed adversarial training, I definitely want something closer to absolute myopia than just dynamic consistency. However, I think even absolute myopia is insufficient for the purposes of preventing deceptive alignment.^[1] For example, as is mentioned in the post, an agent that manipulates the world via self-fulfilling prophecies could still be absolutely myopic. However, that's not the only way in which I think an absolutely myopic agent could still be a problem. In particular, there's an important distinction regarding the nature of the agent's objective function, one I think about a lot in connection with myopia, that I think is missing here.

In particular, I think there's a distinction between agents with objective functions over states of the world vs. their own subjective experience vs. their output.^[2] For the sort of myopia that I want, I think you basically need the last thing; that is, you need an objective function which is just a function of the agent's output that completely disregards the consequences of that output. Having an objective function over the state of the world bleeds into full agency too easily and having an objective function over your own subjective experience leads to the possibility of wanting to gather resources to run simulations of yourself to capture your own subjective experience. If your agent simply isn't considering/thinking about how its output will affect the world at all, however, then I think you might be safe.^[3]

^[1] I mentioned this to Abram earlier and he agrees with this, though I think it's worth putting here as well.
^[2] Note that I don't think that these possibilities are actually comprehensive; I just think that they're some of the most salient ones.
^[3] Instead, I want it to just be thinking about making the best prediction it can. Note that this still leaves open the door for self-fulfilling prophecies, though if the selection of which self-fulfilling prophecy to go with is non-adversarial (i.e. there's no deceptive alignment), then I don't think I'm very concerned about that.

[-]Ofer6y20

In particular, I think there’s a distinction between agents with objective functions over states of the world vs. their own subjective experience vs. their output.

This line of thinking seems to me very important!

The following point might be obvious to Evan, but it's probably not obvious to everyone: Objective functions over the agent's output should probably not be interpreted as objective functions over the physical representation of the output (e.g. the configuration of atoms in certain RAM memory cells). That would just be a special case of objective functions over world states. Rather, we should probably be thinking about objective functions over the output as it is formally defined by the code of the agent (when interpreting "the code of the agent" as a mathematical object, like a Turing machine, and using a formal mapping from code to output).

Perhaps the following analogy can convey this idea: think about a human facing Newcomb's problem. The person has the following instrumental goal: "be a person that does not take both boxes" (because that makes Omega put $1,000,000 in the first box). Now imagine that that was the person's terminal goal rather than an instrumental goal. That person might be analogous to a program that "wants" to be a program whose (formally defined) output maximizes a given utility function.

[-]abramdemski6y20

I like the way you are almost able to turn this into a 'positive' account (the way generalized objectives are a positive account of myopic goals, but speaking in terms of failure to make certain Pareto improvements is not). However, I worry that any goal over states can be converted to a goal over outputs which amounts to the same thing, by calculating the expected value of the action according to the old goal. Presumably you mean some sufficiently simple action-goal so as to exclude this.

[-]evhub6y*10

Yeah, I agree. I almost said "simple function of the output," but I don't actually think simplicity is the right metric here. It's more like "a function of the output that doesn't go through the consequences of said output."
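A minimal Python sketch of the distinction in this exchange, under loud assumptions: the toy world model `outcome_probs`, the state utility `state_utility`, and both objective functions are hypothetical names and numbers made up for illustration, not anything defined in the post. The first objective is nominally "a function of the output" but is just the expected state utility in disguise; the second never consults the consequences of the output at all.

```python
# Toy illustration only; every name and number here is hypothetical.
# Each possible output leads to world states with some probability.
outcome_probs = {
    "honest_forecast": {"calm_market": 0.9, "panic": 0.1},
    "alarming_forecast": {"calm_market": 0.2, "panic": 0.8},
}

# A goal over world states: this (badly aligned) agent prefers panic.
state_utility = {"calm_market": 0.0, "panic": 1.0}


def converted_objective(output: str) -> float:
    """Nominally a function of the output, but it routes through consequences:
    it is the expected state utility of emitting `output`."""
    return sum(p * state_utility[s] for s, p in outcome_probs[output].items())


def consequence_blind_objective(output: str, reference: str) -> float:
    """A function of the output that never looks at the world model:
    it only compares the output with a fixed reference answer."""
    return 1.0 if output == reference else 0.0


outputs = list(outcome_probs)

# The converted objective reproduces the state-based preference exactly.
best_converted = max(outputs, key=converted_objective)

# The consequence-blind objective ignores what the output would cause.
best_blind = max(
    outputs, key=lambda o: consequence_blind_objective(o, "honest_forecast")
)
```

Both objectives have the same type signature (output in, number out), which is why simplicity of the signature alone does not separate them; what differs is whether `outcome_probs` appears anywhere in the computation.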

[-]Raemon6y50

(note, this comment is kinda grumpy but, to be clear, comes from the context of me generally quite respecting you as a writer. :P)

I can't remember if I've complained about this elsewhere, but I have no idea what you mean by myopia, and I was about to comment (on another post) asking if you could write a post that succinctly defined what you meant by myopia (or if the point is that it's hard to define, say that explicitly and give a few short attempted descriptions that could help me triangulate it).

Then I searched to see if you'd already done that, and found this post, which seems like it really wants to open with a succinct description of what myopia is, but doesn't, and leaves me even more confused about it by the end.

(I can see the dictionary definition of myopia is "lack of imagination, foresight, or intellectual insight", but I don't know exactly how that's connecting to the overall model or ideas you're building towards here)

[-]abramdemski5y70

Sorry for somehow missing/ignoring this comment for about 5 months. The short answer is that I've been treating "myopia" as a focusing object, and am likely to think any definitions (including my own definitions in the OP) are too hasty and don't capture everything I want to point at. In fact I initially tried to use the new term "partial agency" to make sure people didn't think I was talking about more well-defined versions.

My attempt to give others a handle for the same focusing object was in the first post of the sequence, where I try to triangulate what I'm getting at with a number of examples right at the start:

Stop-gradients. This is a perfectly well-defined idea, used in deep learning.
Myopia in the computer science sense: a failure to look ahead. AKA "greedy algorithms"/"greedy approximations".
Map/territory fit; homing in on the way it's not something you're supposed to just optimize in an ordinary sense.
Goal/territory fit; in the same way as the map should be made to fit the territory and not the other way around, the territory is supposed to be made to fit the goal and not the other way around.
Causal decision theory: performing "causal surgery" on a graph makes you ignore some lines of influence, like stop-gradients.
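To make the "failure to look ahead" item in the list above concrete, here is a small Python sketch. The two-step decision problem, its reward tables, and the function names `greedy_plan` and `lookahead_plan` are all hypothetical and chosen only so that the greedy choice and the look-ahead choice disagree.

```python
# Hypothetical two-step decision problem (numbers chosen for illustration).
rewards_step1 = {"a": 5, "b": 1}
rewards_step2 = {
    "a": {"x": 0, "y": 1},   # options available after choosing "a"
    "b": {"x": 10, "y": 2},  # options available after choosing "b"
}


def greedy_plan():
    """Myopic in the computer-science sense: maximize each step's reward
    in isolation, never considering what the first choice makes available."""
    first = max(rewards_step1, key=rewards_step1.get)
    second = max(rewards_step2[first], key=rewards_step2[first].get)
    return first, second, rewards_step1[first] + rewards_step2[first][second]


def lookahead_plan():
    """Non-myopic: search over whole two-step plans and maximize the total."""
    best = None
    for first, r1 in rewards_step1.items():
        for second, r2 in rewards_step2[first].items():
            total = r1 + r2
            if best is None or total > best[2]:
                best = (first, second, total)
    return best
```

In this toy problem the greedy planner takes the larger immediate reward and ends with a total of 6, while the look-ahead planner accepts a smaller first reward and ends with 11.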

Perhaps I could give a better short/informal definition now if I sat down to think about it.

[-]Ofer6y*20

[EDIT: 2019-11-09: The argument I made here seems incorrect; see here (H/T Abram for showing me that my reasoning was wrong).]

Conjecture: It is not possible to set up a learning system which gets you full agency in the sense of eventually learning to take all the Pareto improvements.

There's also reason to suspect the conjecture to be false. There's a natural instrumental convergence toward dynamic consistency; a system will self-modify to greater consistency in many cases. If there's an attractor basin around full agency, one would not expect it to be that hard to set up incentives which push things into that attractor basin.

Apart from this, it seems to me that some evolutionary computation algorithms tend to yield models that take all the Pareto improvements, given sufficiently long runtime. The idea is that at any point during training we should expect a model to outperform another model—that takes one less Pareto improvement—on future fitness evaluations (all other things being equal).

[-]abramdemski6y30

Any global optimization technique can find the global optimum of a fixed evaluation function given time. This is a different problem. As I mentioned before, the assumption of simulable environments which you invoke to apply evolutionary algorithms to RL problems assumes too much; it fundamentally changes the problem from a control problem to a selection problem. This is exactly the kind of mistake which prompted me to come up with the selection/control distinction.

How would you propose to apply evolutionary algorithms to online learning? How would you propose to apply evolutionary algorithms to non-episodic environments? I'm not saying it can't be done, but in doing so, your remark will no longer apply. For online non-episodic problems, you don't get to think directly in terms of climbing a fitness landscape.

[-]Ofer6y10

Taking a step back, I want to note two things about my model of the near future (if your model disagrees with those things, that disagreement might explain what's going on in our recent exchanges):

(1) I expect many actors to be throwing a lot of money on selection processes (especially unsupervised learning), and I find it plausible that such efforts would produce transformative/dangerous systems.

(2) Suppose there's some competitive task that is financially important (e.g. algo-trading), for which actors build systems that use a huge neural network trained via gradient descent. I find it plausible that some actors will experiment with evolutionary computation methods, trying to produce a component that will outperform and replace that neural network.

Regarding the questions you raised:

How would you propose to apply evolutionary algorithms to online learning?

One can use a selection process—say, some evolutionary computation algorithm—to produce a system that performs well in an online learning task. The fitness metric would be based on the performance in many (other) online learning tasks for which training data is available (e.g. past stock prices) or for which the environment can be simulated (e.g. Atari games, robotic arm + boxes).

How would you propose to apply evolutionary algorithms to non-episodic environments?

I'm not sure whether this refers to non-episodic tasks (the issue being slower/sparser feedback?) or environments that can't be simulated (in which case the idea above seems to apply: one can use a selection process, using other tasks for which there's training data or for which the environment can be simulated).

[-]abramdemski6y30

(1) I expect many actors to be throwing a lot of money on selection processes (especially unsupervised learning), and I find it plausible that such efforts would produce transformative/dangerous systems.

Sure.

(2) Suppose there's some competitive task that is financially important (e.g. algo-trading), for which actors build systems that use a huge neural network trained via gradient descent. I find it plausible that some actors will experiment with evolutionary computation methods, trying to produce a component that will outperform and replace that neural network.

Maybe, sure.

There seems to be something I'm missing here. What you said earlier:

Apart from this, it seems to me that some evolutionary computation algorithms tend to yield models that take all the Pareto improvements, given sufficiently long runtime. The idea is that at any point during training we should expect a model to outperform another model—that takes one less Pareto improvement—on future fitness evaluations (all other things being equal).

is an essentially mathematical remark, which doesn't have a lot to do with AI timelines and projections of which technologies will be used. I'm saying that this remark strikes me as a type error, because it confuses what I meant by "take all the Pareto improvements" -- substituting the (conceptually and technologically difficult) control concept for the (conceptually straightforward, difficult only because of processing power limitations) selection concept.

I interpret you that way because your suggestion to apply evolutionary algorithms appears to be missing data. We can apply evolutionary algorithms if we can define a loss function. But the problem I'm pointing at (of full vs. partial agency) has to do with difficulties of defining a loss function.

>How would you propose to apply evolutionary algorithms to online learning?

One can use a selection process—say, some evolutionary computation algorithm—to produce a system that performs well in an online learning task. The fitness metric would be based on the performance in many (other) online learning tasks for which training data is available (e.g. past stock prices) or for which the environment can be simulated (e.g. Atari games, robotic arm + boxes).

So, what is the argument that you'd tend to get full agency out of this? I think the situation is not very different from applying gradient descent in a similar way.

Using data from past stock prices, say, creates an implicit model that the agent's trades can never influence the stock price. This is of course a mostly fine model for today's ML systems, but, it's also an example of what I'm talking about -- training procedures tend to create partial agency rather than full agency.
Training the system on many online learning tasks, there will not be an incentive to optimize across tasks -- the training procedure implicitly assumes that the different tasks are independent. This is significant because you really need a whole lot of data in order to learn effective online learning tactics; it seems likely you'd end up splitting larger scenarios into a lot of tiny episodes, creating myopia.

I'm not saying I'd be happily confident that such a procedure would produce partial agents (therefore avoiding AI risk). And indeed, there are differences between doing this with gradient descent and evolutionary algorithms. One of the things I focused on in the post, time-discounting, becomes less relevant -- but only because it's more natural to split things into episodes in the case of evolutionary algorithms, which still creates myopia as a side effect.

What I'm saying is there's a real credit assignment problem here -- you're trying to pick between different policies (i.e. the code which the evolutionary algorithms are selecting between), based on which policy has performed better in the past. But you've taken a lot of actions in the past. And you've gotten a lot of individual pieces of feedback. You don't know how to ascribe success/failure credit -- that is, you don't know how to match individual pieces of feedback to individual decisions you made (and hence to individual pieces of code).

So you solve the problem in a basically naive way: you assume that the feedback on "instance n" was related to the code you were running at that time. This is a myopic assumption!

>How would you propose to apply evolutionary algorithms to non-episodic environments?

I'm not sure whether this refers to non-episodic tasks (the issue being slower/sparser feedback?) or environments that can't be simulated (in which case the idea above seems to apply: one can use a selection process, using other tasks for which there's training data or for which the environment can be simulated).

The big thing with environments that can't be simulated is that you don't have a reset button, so you can't back up and try again; so, episodic and simulable are pretty related.

Sparse feedback is related to what I'm talking about, but feels like a selection-oriented way of understanding the difficulty of control; "sparse feedback" still applies to very episodic problems such as chess. The difficulty with control is that arbitrarily long historical contexts can sometimes matter, and you have to learn anyway. But I agree that it's much easier for this to present real difficulty if the rewards are sparse.

[-]Ofer6y20

I suspect I made our recent discussions unnecessarily messy by simultaneously talking about: (1) "informal strategic stuff" (e.g. the argument that selection processes are strategically important, which I now understand is not contradictory to your model of the future); and (2) my (somewhat less informal) mathematical argument about evolutionary computation algorithms.

The rest of this comment involves only the mathematical argument. I want to make that argument narrower than the version that perhaps you responded to: I want it to only be about absolute myopia, rather than more general concepts of myopia or full agency. Also, I (now) think my argument applies only to learning setups in which the behavior of the model/agent can affect what the model encounters in future iterations/episodes. Therefore, my argument does not apply to setups such as unsupervised learning for past stock prices or RL for Atari games (when each episode is a new game).

My argument is (now) only the following: Suppose we have a learning setup in which the behavior of the model at a particular moment may affect the future inputs/environments that the model will be trained on. I argue that evolutionary computation algorithms seem less likely to yield an absolutely myopic model, relative to gradient descent. If you already think that, you might want to skip the rest of this comment (in which I try to support this argument).

I think the following property might make a learning algorithm more likely to yield models that are NOT absolutely myopic:

During training, a parameter's value is more likely to persist (i.e. end up in the final model) if it causes behavior that is beneficial for future iterations/episodes.

I think that this property tends to apply to evolutionary computation algorithms more than it applies to gradient descent. I'll use the following example to explain why I think that:

Suppose we have some online supervised learning setup. Suppose that during iteration 1 the model needs to predict random labels (and thus can't perform better than chance). However, if parameter $θ_{8}$ has a large value, then the model makes predictions that cause the examples in iteration 2 to be more predictable. By assumption, during iteration 2 the value of $θ_{8}$ does not (directly) affect predictions.

How should we expect our learning algorithm to update the parameter $θ_{8}$ at the end of iteration 2?

If our learning algorithm is gradient descent, it seems that we should NOT expect $θ_{8}$ to increase, because there is no iteration in which the relevant component of the gradient (i.e. the partial derivative of the objective with respect to $θ_{8}$) is expected to be positive.
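The gradient claim can be checked numerically in a toy version of this setup. The sketch below is an assumption-laden illustration: the functional forms in `loss_iteration_1`, `difficulty_of_iteration_2`, and `loss_iteration_2` are invented here, and the code works with a loss to be minimized rather than an objective to be maximized, so "helpful" shows up as a negative derivative.

```python
def loss_iteration_1(theta8: float) -> float:
    # Labels are random, so the expected loss sits at chance whatever theta8 is.
    return 0.5


def difficulty_of_iteration_2(theta8_during_iter1: float) -> float:
    # Hypothetical form: a larger theta8 in iteration 1 makes the
    # iteration-2 examples easier to predict.
    return 1.0 / (1.0 + theta8_during_iter1)


def loss_iteration_2(theta8: float, difficulty: float) -> float:
    # By assumption, theta8 does not directly affect iteration-2 predictions.
    return 0.5 * difficulty


def finite_diff(f, x, eps=1e-6):
    # Central finite-difference estimate of df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)


theta8 = 2.0
# Once iteration 2 arrives, its examples are fixed data from the learner's view.
observed_difficulty = difficulty_of_iteration_2(theta8)

# The per-iteration partial derivatives that gradient descent actually uses.
grad_iter1 = finite_diff(loss_iteration_1, theta8)
grad_iter2 = finite_diff(lambda t: loss_iteration_2(t, observed_difficulty), theta8)

# The derivative "through the environment", which per-iteration
# gradient descent never computes.
grad_through_env = finite_diff(
    lambda t: loss_iteration_2(theta8, difficulty_of_iteration_2(t)), theta8
)
```

Both per-iteration derivatives come out as zero, while the derivative through the environment is negative (a larger $θ_{8}$ would lower the later loss). The update rule only ever sees the first two quantities.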

In contrast, if our learning algorithm is some evolutionary computation algorithm, the models (in the population) in which $θ_{8}$ happens to be larger are expected to outperform the other models, in iteration 2. Therefore, we should expect iteration 2 to increase the average value of $θ_{8}$ (over the model population).

[-]abramdemski6y30

Sorry for taking so long to respond to this one.

I don't get the last step in your argument:

In contrast, if our learning algorithm is some evolutionary computation algorithm, the models (in the population) in which $θ_{8}$ happens to be larger are expected to outperform the other models, in iteration 2. Therefore, we should expect iteration 2 to increase the average value of $θ_{8}$ (over the model population).

Why do those models outperform? I think you must be imagining a different setup, but I'm interpreting your setup as:

This is a classification problem, so, we're getting feedback on correct labels X for some Y.
It's online, so we're doing this in sequence, and learning after each.
We keep a population of models, which we update (perhaps only a little) after every training example; population members who predicted the label correctly get a chance to reproduce, and a few population members who didn't are killed off.
The overall prediction made by the system is the average of all the predictions (or some other aggregation).
Large $θ_{8}$ influences at one time-step will cause predictions which make the next time-step easier.
So, if the population has an abundance of high $θ_{8}$ at one time step, the population overall does better in the next time step, because it's easier for everyone to predict.
So, the frequency of high $θ_{8}$ will not be increased at all. Just like in gradient descent, there's no point at which the relevant population members are specifically rewarded.

In other words, many members of the population can swoop in and reap the benefits caused by high-$θ_{8}$ members. So high-$θ_{8}$ carriers do not specifically benefit.
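The setup described above can be sketched as a deterministic, expected-value population model in Python. Everything specific here is an assumption made for illustration: the linear form of `accuracy`, the fitness-proportional update in `next_high_fraction`, and the starting fraction of 0.3 are not from the original discussion.

```python
def accuracy(prev_high_fraction: float) -> float:
    # Hypothetical: the more high-theta8 members there were at the previous
    # time step, the easier the current example is for EVERY member.
    return 0.5 + 0.4 * prev_high_fraction


def next_high_fraction(high_fraction: float, prev_high_fraction: float) -> float:
    # Fitness-proportional reproduction in expectation. Carriers and
    # non-carriers share the same chance of predicting correctly, because
    # theta8 does not directly affect the current prediction.
    q = accuracy(prev_high_fraction)
    fitness_high = q
    fitness_low = q
    mean_fitness = high_fraction * fitness_high + (1 - high_fraction) * fitness_low
    return high_fraction * fitness_high / mean_fitness


def run(initial_fraction: float, steps: int) -> list:
    # Track the expected fraction of high-theta8 members over time.
    history = [initial_fraction]
    prev = initial_fraction
    for _ in range(steps):
        current = history[-1]
        history.append(next_high_fraction(current, prev))
        prev = current
    return history


history = run(0.3, 20)
```

In this sketch the whole population predicts better when high-$θ_{8}$ members are present (`accuracy(0.3)` exceeds `accuracy(0.0)`), yet the expected fraction of carriers stays at its initial value, since selection never distinguishes carriers from non-carriers.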

[-]Ofer6y60

In other words, many members of the population can swoop in and reap the benefits caused by high-$θ_{8}$ members. So high-$θ_{8}$ carriers do not specifically benefit.

Oops! I agree. My above example is utterly confused and incorrect (I think your description of my setup matches what I was imagining).

Backtracking from this, I now realize that my core reasoning behind my argument—about evolutionary computation algorithms producing non-myopic behavior by default—was incorrect.

(My confusion stemmed from thinking that an argument I made about a theoretical training algorithm also applies to many evolutionary computation algorithms; which seems incorrect!)

AI ALIGNMENT FORUM
AF
