I think one reason machine learning researchers don't think AI x-risk is a problem is that they haven't given it the time of day. And on some level, they may be right in not doing so!

We all need to do meta-level reasoning about what to spend our time and effort on. Even giving an idea or argument the time of day requires it to cross a somewhat high bar, if you value your time. Ultimately, in evaluating whether it's worth considering a putative issue (like the extinction of humanity at the hands (graspers?) of a rogue AI), one must rely on heuristics; by giving the argument the time of day, you've already conceded a significant amount of resources to it! Moreover, you risk privileging the hypothesis or falling victim to Pascal's Mugging.

Unfortunately, the case for x-risk from out-of-control AI systems seems to fail many powerful and accurate heuristics. This can put proponents of this issue in a similar position to flat-earth conspiracy theorists at first glance. My goal here is to enumerate heuristics that arguments for AI takeover scenarios fail.

Ultimately, I think machine learning researchers should not refuse to consider AI x-risk when presented with a well-made case by a person they respect or have a personal relationship with, but I'm ambivalent as to whether they have an obligation to consider the case if they've only seen a few headlines about Elon. I do find it a bit hard to understand how one doesn't end up thinking about the consequences of super-human AI, since it seems obviously impactful and fascinating. But I'm a very curious (read "distractable") person...


A list of heuristics that say not to worry about AI takeover scenarios:

  • Outsiders not experts: This concern is being voiced exclusively by non-experts like Elon Musk, Stephen Hawking, and the talkative crazy guy next to you on the bus.
  • Ludditism has a poor track record: For every new technology, there's been a pack of alarmist naysayers and doomsday prophets. And then instead of falling apart, the world got better.
  • EtA: No concrete threat model: When someone raises a hypothetical concern, but can't give you a good explanation for how it could actually happen, it's much less likely to actually happen. Is the paperclip maximizer the best you can do?
  • It's straight out of science fiction: AI researchers didn't come up with this concern, Hollywood did. Science fiction is constructed based on entertaining premises, not realistic capabilities of technologies.
  • It's not empirically testable: There's no way to falsify the belief that AI will kill us all. It's purely a matter of faith. Such beliefs don't have good track records of matching reality.
  • It's just too extreme: Whenever we hear an extreme prediction, we should be suspicious. To the extent that extreme changes happen, they tend to be unpredictable. While extreme predictions sometimes contain a seed of truth, reality tends to be more mundane and boring.
  • It has no grounding in my personal experience: When I train my AI systems, they are dumb as doorknobs. You're telling me they're going to be smarter than me? In a few years? So smart that they can outwit me, even though I control the very substrate of their existence?
  • It's too far off: It's too hard to predict the future and we can't really hope to anticipate specific problems with future AI systems; we're sure to be surprised! We should wait until we can envision more specific issues, scenarios, and threats, not waste our time on what comes down to pure speculation.

I'm pretty sure this list is incomplete, and I plan to keep adding to it as I think of or hear new suggestions! Suggest away!!

Also, to be clear, I am writing these descriptions from the perspective of someone who has had very limited exposure to the ideas underlying concerns about AI takeover scenarios. I think a lot of these reactions indicate significant misunderstandings about what people working on mitigating AI x-risk believe, as well as matters of fact (e.g. a number of experts have voiced concerns about AI x-risk, and a significant portion of the research community seems to agree that these concerns are at least somewhat plausible and important).


Another important improvement I should make: rephrase these to have the type signature of "heuristic"!

I pushed this post out since I think it's good to link to it in this other post. But there are at least 2 improvements I'd like to make and would appreciate help with:

I helped make this list in 2016 for a post by Nate, partly because I was dissatisfied with Scott's list (which includes people like Richard Sutton, who thinks worrying about AI risk is carbon chauvinism):

Stuart Russell’s Cambridge talk is an excellent introduction to long-term AI risk. Other leading AI researchers who have expressed these kinds of concerns about general AI include Francesca Rossi (IBM), Shane Legg (Google DeepMind), Eric Horvitz (Microsoft), Bart Selman (Cornell), Ilya Sutskever (OpenAI), Andrew Davison (Imperial College London), David McAllester (TTIC), and Jürgen Schmidhuber (IDSIA).

These days I'd probably make a different list, including people like Yoshua Bengio. AI risk stuff is also sufficiently in the Overton window that I care more about researchers' specific views than about "does the alignment problem seem nontrivial to you?". Even if we're just asking the latter question, I think it's more useful to list the specific views and arguments of individuals (e.g., note that Rossi is more optimistic about the alignment problem than Russell), list the views and arguments of the similarly prominent CS people who think worrying about AGI is silly, and let people eyeball which people they think tend to produce better reasons.

Is there a better reference for "a number of experts have voiced concerns about AI x-risk"? I feel like there should be by now...

I hope someone actually answers your question, but FWIW, the Asilomar principles were signed by an impressive list of prominent AI experts. Five of the items are related to AGI and x-risk. The statements aren't really strong enough to declare that those people "voiced concerns about AI x-risk", but it's a data-point for what can be said about AI x-risk while staying firmly in the mainstream.

My experience in casual discussions is that it's enough to just name one example to make the point, and that example is of course Stuart Russell. When talking to non-ML people—who don't know the currently-famous AI people anyway—I may also mention older examples like Alan Turing, Marvin Minsky, or Norbert Wiener.

Thanks for this nice post. :-)

Yeah I've had conversations with people who shot down a long list of concerned experts, e.g.:

  • Stuart Russell is GOFAI ==> out-of-touch
  • Shane Legg doesn't do DL, does he even do research? ==> out-of-touch
  • Ilya Sutskever (and everyone at OpenAI) is crazy, they think AGI is 5 years away ==> out-of-touch
  • Anyone at DeepMind is just marketing their B.S. "AGI" story or drank the koolaid ==> out-of-touch

But then, even the big 5 of deep learning have all said things that can be used to support the case....

So it kind of seems like there should be a compendium of quotes somewhere, or something.

Sounds like their problem isn't just misleading heuristics, it's motivated cognition.

Oh sure, in some special cases. I don't think this experience was particularly representative.

Here's another: AI being x-risky makes me the bad guy.

That is, if I'm an AI researcher and someone tells me that AI poses x-risks, I might react by seeing this as someone telling me I'm a bad person for working on something that makes the world worse. This is bad for me because I derive important parts of my sense of self from being an AI researcher: it's my profession, my source of income, my primary source of status, and a huge part of what makes my life meaningful to me. If what I am doing is bad or dangerous, that threatens to take much of that away (if I also want to think of myself as a good person, I either have to stop doing AI work to avoid being bad or stop thinking of myself as good), and an easy solution to that is to dismiss the arguments.

This is more generally a kind of motivated cognition or rationalization, but I think it's worth considering a specific mechanism because it better points towards ways you might address the objection.

This doesn't seem like it belongs on a "list of good heuristics", though!

Flo's summary for the Alignment Newsletter:

Because human attention is limited and a lot of people try to convince us of the importance of their favourite cause, we cannot engage with everyone’s arguments in detail. Thus we have to rely on heuristics to filter out insensible arguments. Depending on the form of exposure, the case for AI risks can fail on many of these generally useful heuristics, eight of which are detailed in this post. Given this outside view perspective, it is unclear whether we should actually expect ML researchers to spend time evaluating the arguments for AI risk.

Flo's opinion:

I can remember being critical of AI risk myself for similar reasons and think that it is important to be careful with the framing of pitches to keep these heuristics from firing. This is not to say that we should avoid criticism of the idea of AI risk, but criticism is a lot more helpful if it comes from people who have actually engaged with the arguments.

My opinion:

Even after knowing the arguments, I find six of the heuristics quite compelling: technology doomsayers have usually been wrong in the past, there isn't a concrete threat model, it's not empirically testable, it's too extreme, it isn't well grounded in my experience with existing AI systems, and it's too far off to do useful work now. All six make me distinctly more skeptical of AI risk.

Sort of related to a couple points you already brought up (not in personal experience, outsiders not experts, science fiction), but worrying about AI x-risk is also weird, i.e. it's not a thing everyone else is worrying about, so you use some of your weirdness-points to publicly worry about it, and most people have very low weirdness budgets (because of not enough status to afford more weirdness, low psychological openness, etc.).