KatjaGrace

I wrote an AI Impacts page summary of the situation as I understand it. If anyone feels like looking, I'm interested in corrections/suggestions (either here or in the AI Impacts feedback box).  

A few quick thoughts on reasons for confusion:

I think maybe one thing going on is that I already took the coherence arguments to apply only to the step from weakly having goals to strongly having goals, so since you were arguing against their applicability, I thought you were talking about that weak-to-strong step. (I’m not sure what arguments people use to get from 1 to 2 though, so maybe you are right that it also has something to do with coherence, at least implicitly.)

It also seems natural to think of ‘weakly has goals’ as something other than ‘goal directed’, and ‘goal directed’ as referring only to ‘strongly has goals’, so that ‘coherence arguments do not imply goal directed behavior’ (in combination with expecting coherence arguments to be in the weak->strong part of the argument) sounds like ‘coherence arguments do not get you from “weakly has goals” to “strongly has goals”’.

I also think it might help with clarity to separate out the step from no goal direction to weak goal direction, and the step from weak to strong. It sounded to me like you were considering an argument from ‘any kind of agent’ to ‘strongly goal directed’ and finding it lacking, and I was like ‘but any kind of agent includes a mix of those that this force will work on and those it won’t, so shouldn’t it be a partial/probabilistic move toward goal direction?’, whereas you were just meaning to talk about what fraction of existing things are weakly goal directed.

Thanks. Let me check if I understand you correctly:

You think I take the original argument to be arguing from ‘has goals’ to ‘has goals’, essentially, and agree that that holds, but don’t find it very interesting/relevant.

What you disagree with is an argument from ‘anything smart’ to ‘has goals’, which seems to be what is needed for the AI risk argument to apply to any superintelligent agent.

Is that right?

If so, I think it’s helpful to distinguish between ‘weakly has goals’ and ‘strongly has goals’:

  1. Weakly has goals: ‘has some sort of drive toward something, at least sometimes’ (e.g. aspects of outcomes are taken into account in decisions in some way)
  2. Strongly has goals: ‘pursues outcomes consistently and effectively’ (i.e. decisions maximize expected utility)

So the full argument I currently take you to be responding to is closer to:

  1. By hypothesis, we will have superintelligent machines
  2. They will weakly have goals (for various reasons, e.g. they will do something, and maybe that means ‘weakly having goals’ in the relevant way? Probably other arguments go in here.)
  3. Anything that weakly has goals has reason to reform to become an EU maximizer, i.e. to strongly have goals
  4. Therefore we will have superintelligent machines that strongly have goals

In that case, my current understanding is that you are disagreeing with 2, and that you agree that if 2 holds in some case, then the argument goes through. That is, creatures that are weakly goal directed are liable to become strongly goal directed (e.g. an agent that twitches because it has various flickering and potentially conflicting urges toward different outcomes is liable to become an agent that more systematically seeks to bring about some such outcomes). Does that sound right?

If so, I think we agree. (In my intuition I characterize the situation as ‘there is roughly a gradient of goal directedness, and a force pulling less goal directed things into being more goal directed. This force probably doesn’t exist out at the zero goal directedness edges, but it is unclear how strong it is in the rest of the space, i.e. whether it becomes substantial as soon as you move out from zero goal directedness, or is weak until you are in a few specific places right next to “maximally goal directed”.’)

I meant: conditional on the economy growing faster, why expect that to be attributable to a small number of technologies, given that when growth accelerated previously it was not like that (if I understand correctly)?

If growth rates have been gradually increasing throughout most of history, I don't follow why you would expect one technology to cause the economy to grow much faster, if growth goes back to accelerating.

1) Even if it counts as a DSA (decisive strategic advantage), I claim that it is not very interesting in the context of AI. DSAs of something already almost as large as the world are commonplace. For instance, in the extreme, the world minus any particular person could take over the world if they wanted to. The concern with AI is that an initially tiny entity might take over the world.

2) My important point is rather that your '30-year' number is specific to the starting size of the thing, and not just a general number for getting a DSA. In particular, it does not apply to smaller things.

3) Agree that income doesn't equal taking over, though in the modern world, where so much is accomplished through purchasing, it is closer. It is not clear to me that AI companies do better as a fraction of the world in terms of military power than they do in terms of spending.

The time it takes to get a DSA by growing bigger depends on how big you are to begin with. If I understand correctly, you take your 30 years from considering the largest countries, which are not far from being the size of the world, and then use it when talking about AI projects that are much smaller (e.g. a billion dollars a year suggests about 1/100,000 of the world). If you start from a situation of an AI project being, say, three doublings from taking over the world, then most of the question of how it came to have a DSA seems to be the question of how it grew the other seventeen doublings. (Perhaps you are thinking of an initially large country growing fast via AI? Do we then have to imagine that all of the country's resources are going into AI?)
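For a rough sense of the scale gap, here is a back-of-envelope sketch of that arithmetic (the ~$100 trillion figure for world output is an assumption for illustration, not a number taken from your post):

```python
import math

# Back-of-envelope: how many doublings separate an AI project's annual
# spending from the size of the world economy? Figures are illustrative.
world_output = 100e12       # assumed ~$100 trillion/year of world output
project_spending = 1e9      # ~$1 billion/year AI project, as above

ratio = world_output / project_spending   # ~100,000
doublings = math.log2(ratio)              # ~16.6

print(f"project is ~1/{ratio:,.0f} of the world")
print(f"doublings needed to reach world scale: {doublings:.1f}")
```

So a project at roughly 1/100,000 of the world is on the order of seventeen doublings short of world scale, which is why the earlier doublings seem like most of the question.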