A shift in arguments for AI risk

Richard_Ngo

This is a linkpost for https://fragile-credences.github.io/prioritising-ai/

The linked post is work done by Tom Adamczewski while at FHI. I think this sort of expository and analytic work is very valuable, so I'm cross-posting it here (with his permission). Below is an extended summary; for the full document, see his linked blog post.

Many people now work on ensuring that advanced AI has beneficial consequences. But members of this community have made several quite different arguments for prioritising AI.

Early arguments, and in particular Superintelligence, identified the “alignment problem” as the key source of AI risk. In addition, the book relies on the assumption that superintelligent AI is likely to emerge through a discontinuous jump in the capabilities of an AI system, rather than through gradual progress. This assumption is crucial to the argument that a single AI system could gain a “decisive strategic advantage”, that the alignment problem cannot be solved through trial and error, and that there is likely to be a “treacherous turn”. Hence, the discontinuity assumption underlies the book’s conclusion that existential catastrophe is a likely outcome.

The argument in Superintelligence combines three features: (i) a focus on the alignment problem, (ii) the discontinuity assumption, and (iii) the resulting conclusion that an existential catastrophe is likely.

Arguments that abandon some of these features have recently become prominent. They have also generally been made in less detail than the early arguments.

One line of argument, promoted by Paul Christiano and Katja Grace, drops the discontinuity assumption, but continues to view the alignment problem as the source of AI risk. Even under more gradual scenarios, they argue that, unless we solve the alignment problem before advanced AIs are widely deployed in the economy, these AIs will cause human values to eventually fade from prominence. They appear to be agnostic about whether these harms would warrant the label “existential risk”.

Moreover, others have proposed AI risks that are unrelated to the alignment problem. I discuss three of these: (i) the risk that AI might be misused, (ii) that it could make war between great powers more likely, and (iii) that it might lead to value erosion from competition. These arguments don’t crucially rely on a discontinuity, and the risks are rarely existential in scale.

It’s not always clear which of the arguments actually motivates members of the beneficial AI community. It would be useful to clarify which of these arguments (or yet other arguments) are crucial for which people. This could help with evaluating the strength of the case for prioritising AI, deciding which strategies to pursue within AI, and avoiding costly misunderstandings with sympathetic outsiders or sceptics.

Note: This post was written in February 2019 while at the Governance of AI Programme, within the Future of Humanity Institute. I’m publishing it as it stood in February, despite significant flaws, since I’m starting a new job and anticipate I won’t have time to update it. I thank Markus Anderljung, Max Daniel, Jeffrey Ding, Eric Drexler, Carrick Flynn, Richard Ngo, Cullen O’Keefe, Stefan Schubert, Rohin Shah, Toby Shevlane, Matt van der Merwe and Remco Zwetsloot for help with previous versions of this document. Ben Garfinkel was especially generous with his time and many of the ideas in this document were originally his.

I guess this may have been one of those Google docs that people had a lot of private discussions in. This makes me rather discouraged from commenting, knowing that anything I write may have been extensively discussed already and the author just didn't have time to or didn't feel like incorporating those comments/viewpoints into the published document. (Some of the open questions listed seem to have fairly obvious answers. Did no one suggest such answers to the author? Or were they found wanting in some way?) Also it seems like the author is not here to participate in a public discussion, or may not have the time to do so (given his new job situation).

However I did write a bunch of comments under one of your posts (you = ricraz = Richard Ngo, if I remember correctly), which appears to cover roughly the same topic (shifts in AI risk arguments over time), and those comments may also be somewhat relevant here. Beyond that, I wonder if you could summarize what is new or different in this document compared to yours, and whether you think there's anything in it that would be especially valuable to have a public discussion about (even absent participation of the author).

Planned summary:

Early arguments for AI safety focus on existential risk caused by a failure of alignment combined with a sharp, discontinuous jump in AI capabilities. The discontinuity assumption is needed in order to argue for a treacherous turn, for example: without a discontinuity, we would presumably see less capable AI systems fail to hide their misaligned goals from us, or attempt to deceive us without success. Similarly, in order for an AI system to obtain a decisive strategic advantage, it would need to be significantly more powerful than all the other AI systems already in existence, which requires some sort of discontinuity.

Now, there are several other arguments for AI risk, though none of them has been made in great detail, and they are spread out over a few blog posts. This post analyzes several of them and points out some open questions.

First, even without a discontinuity, a failure of alignment could lead to a bad future: since the AIs have more power and intelligence, their values will determine what happens in the future, rather than ours. (Here **it is the difference between AIs and humans that matters**, whereas for a decisive strategic advantage it is the difference between the most intelligent agent and the next-most intelligent agents that matters.) See also More realistic tales of doom and Three impacts of machine intelligence. However, it isn't clear why we wouldn't be able to fix the misalignment at the early stages when the AI systems are not too powerful.

Even if we ignore alignment failures, there are other AI risk arguments. In particular, since AI will be a powerful technology, it could be used by malicious actors; it could help ensure robust totalitarian regimes; it could increase the likelihood of great-power war, and it could lead to stronger competitive pressures that erode value. With all of these arguments, it's not clear why they are specific to AI in particular, as opposed to any important technology, and the arguments for risk have not been sketched out in detail.

The post ends with an exhortation to AI safety researchers to clarify which sources of risk motivate them, because it will influence what safety work is most important, it will help cause prioritization efforts that need to determine how much money to allocate to AI risk, and it can help avoid misunderstandings with people who are skeptical of AI risk.

Planned opinion:

I'm glad to see more work of this form; it seems particularly important to gain more clarity on what risks we actually care about, because it strongly influences what work we should do. In the particular scenario of an alignment failure without a discontinuity, I'm not satisfied with the solution "we can fix the misalignment early on", because early on even if the misalignment is apparent to us, it likely will not be easy to fix, and the misaligned AI system could still be useful because it is "aligned enough", at least at this low level of capability.

Personally, the argument that motivates me most is "AI will be very impactful, and it's worth putting effort into making sure that that impact is positive". I think the scenarios involving alignment failures without a discontinuity are a particularly important subcategory of this argument: while I do expect we will be able to handle this issue if it arises, this is mostly because of meta-level faith in humanity to deal with the problem. We don't currently have a good object-level story for why the issue _won't_ happen, or why it will be fixed when it does happen, and it would be good to have such a story in order to be confident that AI will in fact be beneficial for humanity.

I know less about the non-alignment risks, and my work doesn't really address any of them. They seem worth more investigation; currently my feeling towards them is "yeah, those could be risks, but I have no idea how likely the risks are".

An alternate framing could be about changing group boundaries rather than changing demographics in an isolated group.

There were surely people in 2010 who thought that the main risk from AI was it being used by bad people. The difference might not be that these people have popped into existence or only recently started talking - it's that they're inside the fence more than before.

And of course, reality is always complicated. One of the concerns in the "early LW" genre is value stability and self-trust under self-modification, which has nothing to do with sudden growth. And one of the "recent" genre concerns is arms races, which are predicated on people expecting sudden capability growth to give them a first mover advantage.

Thanks for making the cross-post. Do you know if the author is likely to see comments posted here, or if he prefers to receive comments another way?
