All of MichaelA's Comments + Replies

Thanks for this! I found it interesting and useful. 

I don't have much specific feedback, partly because I listened to this via Nonlinear Library while doing other things rather than reading it, but I'll share some thoughts anyway since you indicated being very keen for feedback.

  • I in general think this sort of distillation work is important and under-supplied
  • This seems like a good example of what this sort of distillation work should be like - broken into different posts that can be read separately, starting with an overall overview, each post is broke
... (read more)

Interesting (again!).

So you've updated your unconditional estimate from ~5% (1 in 20) to ~9%? If so, people may have to stop citing you as an "optimist"... (which was already perhaps a tad misleading, given what the 1 in 20 was about)

(I mean, I know we're all sort-of just playing with incredibly uncertain numbers about fuzzy scenarios anyway, but still.)

3Rohin Shah
I wouldn't be surprised if the median number from MIRI researchers was around 50%. I think the people who cite me as an optimist are people with those background beliefs. I think even at 5% I'd fall on the pessimistic side at FHI (though certainly not the most pessimistic, e.g. Toby is more pessimistic than I am).

Quite interesting. Thanks for that response.

And yes, this does seem quite consistent with Ord's framing. E.g., he writes "my estimates above incorporate the possibility that we get our act together and start taking these risks very seriously." So I guess I've seen it presented this way at least that once, but I'm not sure I've seen it made explicit like that very often (and doing so seems useful and retrospectively-obvious).

But if we just exerted a lot more effort (i.e. "surprisingly much action"), the extra effort p
... (read more)
3Rohin Shah
More like (b) than (a). In particular, I'm thinking of lots of additional effort by longtermists, which probably doesn't result in lots of additional effort by everyone else, which already means that we're scaling sublinearly. In addition, you should then expect diminishing marginal returns to more research, which lessens it even more. Also, I was thinking about this recently, and I am pretty pessimistic about worlds with discontinuous takeoff, which should maybe add another ~5 percentage points to my risk estimate conditional on no intervention by longtermists, and ~4 percentage points to my unconditional risk estimate.
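(To make the arithmetic behind that update explicit, here is a rough sketch, assuming the ~1 in 10 conditional and ~1 in 20 unconditional baselines cited elsewhere in this thread; the exact figures are of course very approximate. This is presumably where the "~5% to ~9%" reading further up the thread comes from.)

```python
# Rough reconstruction of the update described above (numbers are illustrative,
# anchored on the ~1 in 10 / ~1 in 20 estimates cited elsewhere in this thread).

baseline_conditional = 0.10    # risk conditional on no intervention by longtermists
baseline_unconditional = 0.05  # risk once longtermist action roughly halves it

# Pessimism about discontinuous takeoff adds a few percentage points:
updated_conditional = baseline_conditional + 0.05      # -> 0.15
updated_unconditional = baseline_unconditional + 0.04  # -> 0.09

print(f"conditional on no longtermist action: {updated_conditional:.0%}")   # 15%
print(f"unconditional:                        {updated_unconditional:.0%}")  # 9%
```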

Thanks for this reply!

Perhaps I should've been clear that I didn't expect what I was saying was things you hadn't heard. (I mean, I think I watched an EAG video of you presenting on 80k's ideas, and you were in The Precipice's acknowledgements.)

I guess I was just suggesting that your comments there, taken by themselves/out of context, seemed to ignore those important arguments, and thus might seem overly optimistic. Which seemed mildly potentially important for someone to mention at some point, as I've seen this cited as an e... (read more)

3Rohin Shah
Sure, that seems reasonable.

Just adversarial optimization / misalignment. See the comment thread with Wei Dai below, especially this comment.

Oh yeah, definitely. (Toby does the same in The Precipice; his position is that it's clearer not to condition on anything, because it's usually unclear what exactly you are conditioning on, though in person he did like the operationalization of "without action from longtermists".) Like, my model of the world is that for any sufficiently important decision like the development of powerful AI systems, there are lots of humans bringing many perspectives to the table, which usually ends up with most considerations being brought up by someone, and an overall high level of risk aversion. On this model, longtermists are one of the many groups that argue for being more careful than we otherwise would be.

Yeah, presumably. The 1 in 20 number was very made up, even more so than the 1 in 10 number. I suppose if our actions were very successful, I could see us getting down to 1 in 1000? But if we just exerted a lot more effort (i.e. "surprisingly much action"), the extra effort probably doesn't help much more than the initial effort, so maybe... 1 in 25? 1 in 30? (All of this is very anchored on the initial 1 in 10 number.)
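(The diminishing-returns point can be made concrete by looking at the risk-reduction factors those made-up numbers imply. The sketch below just divides the baseline risk by each scenario's risk; the 1 in 27 is a stand-in for the "1 in 25? 1 in 30?" range, not a number from the thread.)

```python
# Implied risk-reduction factors behind the made-up numbers above, to make the
# diminishing-returns point concrete (all anchored on the initial 1 in 10).

baseline = 1 / 10  # conditional on no action from longtermists

scenarios = {
    "expected longtermist action": 1 / 20,    # roughly halves the risk
    "surprisingly much action":    1 / 27,    # stand-in for "1 in 25? 1 in 30?"
    "very successful actions":     1 / 1000,
}

for name, risk in scenarios.items():
    print(f"{name}: risk divided by ~{baseline / risk:.1f}x")
```

Going from expected to "surprisingly much" effort only moves the divisor from 2 to roughly 2.5-3, whereas the "very successful" world would need a divisor of 100, which is the sublinearity being described.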
You could imagine a situation where for some reason the US and China are like, “Whoever gets to AGI first just wins the universe.” And I think in that scenario maybe I’m a bit worried, but even then, it seems like extinction is just worse, and as a result, you get significantly less risky behavior? But I don’t think you get to the point where people are just literally racing ahead with no thought to safety for the sake of winning.

My interpretation of what Rohin is saying there is:

  • 1) Extinction is an extremely bad outcome.
  • 2) It
... (read more)
3Rohin Shah
I agree that actors will focus on x-risk far less than they "should" -- that's exactly why I work on AI alignment! This doesn't mean that x-risk is high in an absolute sense, just higher than it "should" be from an altruistic perspective. Presumably from an altruistic perspective x-risk should be very low (certainly below 1%), so my 10% estimate is orders of magnitude higher than what it "should" be.

Also, re: Precipice, it's worth noting that Toby and I don't disagree much -- I estimate 1 in 10 conditioned on no action from longtermists; he estimates 1 in 5 conditioned on AGI being developed this century. Let's say that action from longtermists can halve the risk; then my unconditional estimate would be 1 in 20, and would be very slightly higher if we condition on AGI being developed this century (because we'd have less time to prepare), so overall there's a 4x difference, which given the huge uncertainty is really not very much.
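(Spelling that comparison out, under the stated assumption that longtermist action roughly halves the risk:)

```python
# Spelling out the comparison with The Precipice, assuming longtermist action
# roughly halves the risk (as stated above).

rohin_conditional_no_longtermists = 1 / 10
rohin_unconditional = rohin_conditional_no_longtermists / 2  # 1 in 20
toby_conditional_agi_this_century = 1 / 5                    # Ord's estimate

# Conditioning Rohin's number on AGI arriving this century would push it slightly
# above 1 in 20 (less time to prepare), so the gap is roughly:
ratio = toby_conditional_agi_this_century / rohin_unconditional
print(f"~{ratio:.0f}x difference")  # ~4x
```

That ~4x ratio is the whole disagreement, which, as noted, is small relative to the uncertainty in either estimate.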

Interesting interview, thanks for sharing it!

Asya Bergal: It seems like people believe there’s going to be some kind of pressure for performance or competitiveness that pushes people to try to make more powerful AI in spite of safety failures. Does that seem untrue to you or like you’re unsure about it?
Rohin Shah: It seems somewhat untrue to me. I recently made a comment about this on the Alignment Forum. People make this analogy between AI x-risk and risk of nuclear war, on mutually assured destruction. That particular analogy seems off to m
... (read more)
2Rohin Shah
MAD-style strategies happen when:

1. There are two (or more) actors that are in competition with each other.
2. There is a technology such that if one actor deploys it and the other actor doesn't, the first actor remains the same and the second actor is "destroyed".
3. If both actors deploy the technology, then both actors are "destroyed".

(I just made these up right now; you could probably get better versions from papers about MAD.) Condition 2 doesn't hold for accident risk from AI: if any actor deploys an unaligned AI, then both actors are destroyed. I agree I didn't explain this well in the interview -- when I said [...] I should have said something like [...], which is not true for nuclear weapons (deploying a nuke doesn't affect you in and of itself).
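(A minimal sketch of that distinction, using a toy outcome table; the encoding of actors and outcomes is purely illustrative, not anything from the interview.)

```python
# Toy outcome tables for the two cases contrasted above. Each entry maps
# (A deploys?, B deploys?) -> (A destroyed?, B destroyed?). Purely illustrative;
# the point is only which of the three conditions hold.

nukes = {
    (True, False):  (False, True),   # unilateral use destroys only the other actor
    (False, True):  (True, False),
    (True, True):   (True, True),    # mutual use destroys both (condition 3)
    (False, False): (False, False),
}

unaligned_ai = {
    (True, False):  (True, True),    # any deployment destroys both actors
    (False, True):  (True, True),
    (True, True):   (True, True),
    (False, False): (False, False),
}

def condition_2_holds(outcomes):
    """Condition 2: when exactly one actor deploys, only the other actor is destroyed."""
    return (outcomes[(True, False)] == (False, True)
            and outcomes[(False, True)] == (True, False))

print("nuclear weapons:", condition_2_holds(nukes))         # True  -> MAD logic can apply
print("unaligned AI:   ", condition_2_holds(unaligned_ai))  # False -> the analogy breaks
```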