All of owencb's Comments + Replies

I think the double decrease effect kicks in with uncertainty, but not with confident expectation of a smaller network.

Stuart Armstrong
I think it does do the double decrease for the known smaller network. Take three agents A1, A2, and A3, with utilities u1, u2, and u3. Assume the indices i, j, and k are always distinct. Each Ai can boost uj at the cost described above in terms of ui. What I haven't really specified is the three-way synergy: can Ai boost uj+uk more efficiently than simply boosting uj and uk independently? In general yes (the two utilities uj and uk are synergistic with each other, after all), but let's first assume there is zero three-way synergy.

Then each agent Ai will sacrifice 1/2+1/2=1 in ui to boost uj and uk each by 1. Overall, each utility function goes up by 1+1−1=1. This scales linearly with the size of the trade network each agent sees (excluding themselves): if there were two agents total, each utility would go up by 1/2, as in the top post example. And if there were n+1 agents, each utility would go up by n/2.

However, if there are any three-way, four-way, ..., or n-way synergies, then the trade network is more efficient than that. So there is a double decrease (or double increase, from the other perspective), as long as there are higher-order synergies between the utilities.
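
To make the arithmetic concrete, here is a minimal numeric sketch of the zero-higher-order-synergy case: the cost of 1/2 per unit boosted and the two- and three-agent figures are taken from the reply above, while the function itself is just an illustration.

```python
# Minimal check of the no-higher-order-synergy trade network described above.
# Assumption (from the reply): each agent pays 1/2 in its own utility to boost
# one other agent's utility by 1, and only these pairwise trades are available.

def net_gain_per_utility(num_agents: int) -> float:
    """Net change in each agent's utility after all pairwise trades."""
    n = num_agents - 1          # number of *other* agents each agent trades with
    gain = n * 1.0              # each of the n other agents boosts my utility by 1
    cost = n * 0.5              # I pay 1/2 for each of the n utilities I boost
    return gain - cost          # = n/2

print(net_gain_per_utility(2))   # 0.5  (the two-agent example from the top post)
print(net_gain_per_utility(3))   # 1.0  (the three-agent example above)
print(net_gain_per_utility(11))  # 5.0  (n + 1 agents -> each utility gains n/2)
```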

I'm not sure I've fully followed, but I'm suspicious that you seem to be getting something for nothing in your shift from a type of uncertainty that we don't know how to handle to a type we do.

It seems to me like you must be making an implicit assumption somewhere. My guess is that it's in the particular matching you chose. If you'd instead chosen a different matching (i.e. pre-composed with some other permutation of S), then you'd have a different uncertainty about which element should be matched with which. My guess is that generically this gives different recommendations from your approach.

Stuart Armstrong
Nope! That gives the same recommendation (as does pre-composing with any other permutation of S). I thought about putting that fact in, but it took up space. The recommendation in both cases is just to normalise each utility function individually, using any of the methods that we know (which will always produce equivalent utility classes in this situation).
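
A minimal sketch of the invariance being claimed here, using min-max normalisation as a stand-in for "any of the methods that we know" (the choice of method and the toy numbers are assumptions for illustration, not part of the original discussion): the normalisation constants don't depend on how the elements of S are labelled, so normalising commutes with pre-composition by a permutation of S.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=6)       # a toy utility function over 6 elements of S
perm = rng.permutation(6)    # a permutation of S

def normalise(v: np.ndarray) -> np.ndarray:
    """Min-max normalise a utility function to the range [0, 1]."""
    return (v - v.min()) / (v.max() - v.min())

# min and max are unchanged by relabelling S, so the result is the same
# whether we permute first and then normalise, or normalise and then permute.
assert np.allclose(normalise(u[perm]), normalise(u)[perm])
print("per-function normalisation is unaffected by pre-composing with a permutation of S")
```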

Seems to me like there are a bunch of challenges. For example, you need extra structure on your space to add things or to tell what's small; and you really want to keep track of long-term impact, not just the impact at the next time-step. The long-term one in particular seems thorny (for low-impact approaches in general, not just for this one).

Nevertheless I think this idea looks promising enough to explore further; I would also like to hear David's reasons.
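
The long-term-impact point above can be made concrete with a toy sketch. Everything here is an assumption for illustration (states as real numbers so "small" has an obvious meaning, a discounted deviation penalty, a hand-picked drift trajectory): an action whose next-step effect is tiny can still produce a large cumulative change.

```python
# Toy illustration of why measuring impact only at the next time-step can
# miss long-term impact.  States are real numbers purely so that distances
# are easy to compute; a real proposal would need that extra structure.

def one_step_impact(trajectory):
    """Impact measured only at the next time-step."""
    return abs(trajectory[1] - trajectory[0])

def long_term_impact(trajectory, discount=0.99):
    """Discounted sum of deviations from the starting state over the whole trajectory."""
    return sum(discount**t * abs(s - trajectory[0]) for t, s in enumerate(trajectory))

# An action whose immediate effect is tiny but which compounds over time:
slow_drift = [1.0 * (1.01 ** t) for t in range(200)]
print(one_step_impact(slow_drift))   # ~0.01: looks "low impact" at the next step
print(long_term_impact(slow_drift))  # ~100+: the long-run change is substantial
```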

For #5, OK, there's something to this. But:

  • It's somewhat plausible that stabilising pivotal acts will be available before world-destroying ones;
  • Actually there's been a supposition smuggled in already with "the first AI systems capable of performing pivotal acts". Perhaps there will at no point be a system capable of a pivotal act. I'm not quite sure whether it's appropriate to talk about the collection of systems that exist being together capable of pivotal acts if they will not act in concert. Perhaps we'll have a collection of systems which if aligned …
Jessica Taylor
I agree that things get messier when there is a collection of AI systems rather than a single one. "Pivotal acts" mostly make sense in the context of local takeoff. In nonlocal takeoff, one of the main concerns is that goal-directed agents not aligned with human values are going to find a way to cooperate with each other.

Thanks for the write-up; this is helpful for me (Owen).

My initial takes on the five steps of the argument as presented, in approximately decreasing order of how much I am on board:

  • Number 3 is a logical entailment; no quarrel here.
  • Number 5 is framed as "therefore", but adds the assumption that this will lead to catastrophe. I think this is quite likely if the systems in question are extremely powerful, but less likely if they are of modest power.
  • Number 4 splits my intuitions. I begin with some intuition that selection pressure would significantly constra…
Jessica Taylor
  • For #5, it seems like "capable of pivotal acts" is doing the work of implying that the systems are extremely powerful.
  • For #4, I think that selection pressure does not constrain the goal much, since different terminal goals produce similar convergent instrumental goals. I'm still uncertain about this, though; it seems at least plausible (though not likely) that an agent's goals are going to be aligned with a given task if e.g. their reproductive success is directly tied to performance on the task.
  • Agree on #2; I can kind of see it both ways too.
  • I'm also somewhat skeptical of #1. I usually think of it in terms of "how much of a competitive edge does general consequentialist reasoning give an AI project" and "how much of a competitive edge will safe AI projects have over unsafe ones, e.g. due to having more resources".