Here's another argument that I've been pushing since the early days (apparently not very successfully since it didn't make it to this list :) which might be called "argument from philosophical difficulty". It appears that achieving a good long term future requires getting a lot of philosophical questions right that are hard for us to answer. Given this, initially I thought there are only three ways for AI to go right in this regard (assuming everything else goes well with the AI):
Since then people have come up with a couple more scenarios (which did make me slightly more optimistic about this problem):
The overall argument is that, given human safety problems, realistic competitive pressures, difficulties with coordination, etc., it seems hard to end up in any of these scenarios and not have something go wrong along the way. Maybe another way to put this is, given philosophical difficulties, the target we'd have to hit with AI is even smaller than it might otherwise appear.
This [Maximisers are dangerous] was the main thesis advanced by Yudkowsky and Bostrom when founding the field of AI safety. [...] And this proliferation of arguments is evidence against their quality: if your conclusions remain the same but your reasons for holding those conclusions change, that’s a warning sign for motivated cognition (especially when those beliefs are considered important in your social group).
I think many of the other arguments did appear in early discussions of AI safety, but perhaps later didn't get written up clearly or get emphasized as much as "maximisers are dangerous". I'd cite CEV as an AI safety idea that clearly took "human safety problems" strongly into consideration, and even before that, Yudkowsky wrote about the SysOp Scenario which would essentially replace physics with a different set of rules that would (in part) eliminate the potential vulnerabilities of actual physics. The early focus on creating a Singleton wasn't just due to thinking that a local intelligence explosion is highly likely, but also because, for reasons like the ones in the "prosaic alignment problem" argument, people (including me) thought a competitive multi-polar scenario might lead to unavoidably bad outcomes.
So I don't think "your conclusions remain the same but your reasons for holding those conclusions change" is fair if it was meant to apply to Yudkowsky and Bostrom and others who have been involved in AI safety from the early days.
(I still think it's great that you're doing this work of untangling and explicating the different threads of argument for the importance of AI safety, but this part seems a bit unfair or at least could be interpreted that way.)
Apologies if this felt like it was targeted specifically at you and other early AI safety advocates; I have nothing but the greatest respect for your work. I'll rewrite to clarify my intended meaning, which is more an attempt to evaluate the field as a whole. This is obviously a very vaguely-defined task, but let me take a stab at fleshing out some changes over the past decade:
1. There's now much more concern about argument 2, the target loading problem (as well as inner optimisers, insofar as they're distinct).
2. There's now less focus on recursive self-improvement as a key reason why AI will be dangerous, and more focus on what happens when hardware scales up. Relatedly, I think a greater percentage of safety researchers believe that there'll be a slow takeoff than used to be the case.
3. Argument 3 (prosaic AI alignment) is now considered more important and more tractable.
4. There's now been significant criticism of coherence arguments as a reason to believe that AGI will pursue long-term goals in an insatiable maximising fashion.
I may be wrong about these shifts - I'm speaking as a newcomer to the field who has a very limited perspective on how it's evolved over time. If so, I'd be happy to be corrected. If they have in fact occurred, here are some possible (non-exclusive) reasons why:
A. None of the proponents of the original arguments have changed their minds about the importance of those arguments, but new people came into the field because of those arguments, then disagreed with them and formulated new perspectives.
B. Some of the proponents of the original arguments have changed their minds significantly.
C. The proponents of the original arguments were misinterpreted, or overemphasised some of their beliefs at the expense of others, and actually these shifts are just a change in emphasis.
I think none of these options reflect badly on anyone involved (getting everything exactly right the first time is an absurdly high standard), but I think A and B would be weak evidence against the importance of AI safety (assuming you've already conditioned on the size of the field, etc). I also think that it's great when individual people change their minds about things, and definitely don't want to criticise that. But if the field as a whole does so (whatever that means), the dynamics of such a shift are worth examination.
I don't have strong beliefs about the relative importance of A, B and C, although I would be rather surprised if any one of them were primarily responsible for all the shifts I mentioned above.
I think none of these options reflect badly on anyone involved (getting everything exactly right the first time is an absurdly high standard), but I think A and B would be weak evidence against the importance of AI safety (assuming you’ve already conditioned on the size of the field, etc).
That depends on how much A and B there is. Even if a field were actually important, it would have some nonzero amount of A and B, so A and B would constitute (even weak) evidence only if there were more of them than you'd expect conditional on the field being important. I think the changes you described in the parent comment are real changes and are not entirely due to C, but they're not more than the changes I'd expect to see conditional on AI safety being actually important. Do you have a different sense?
I don't think it depends on how much A and B, because the "expected amount" is not a special point. In this context, the update that I made personally was "There are more shifts than I thought there were, therefore there's probably more of A and B than I thought there was, therefore I should weakly update against AI safety being important." Maybe (to make A and B more concrete) there being more shifts than I thought downgrades my opinion of the original arguments from "absolutely incredible" to "very very good", which slightly downgrades my confidence that AI safety is important.
As a separate issue, conditional on the field being very important, I might expect the original arguments to be very very good, or I might expect them to be very good, or something else. But I don't see how that expectation can prevent a change from "absolutely incredible" to "very very good" from downgrading my confidence.
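One way to make the disagreement in this exchange concrete is a toy Bayes calculation. This is only a sketch: the function name `posterior` and every probability below are made-up assumptions for illustration, not numbers anyone in this thread has proposed. It shows that whether observing "many shifts" should lower confidence in "AI safety is important" depends on whether that observation is more or less likely under the hypothesis than under its negation.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule for a binary hypothesis H after observing evidence E."""
    joint_h = prior * p_e_given_h
    joint_not_h = (1.0 - prior) * p_e_given_not_h
    return joint_h / (joint_h + joint_not_h)


# All numbers are hypothetical and chosen only to illustrate the direction of the update.
PRIOR = 0.9  # assumed prior probability that the field is important

# E = "the field's arguments shifted a lot".
# Case 1: E is less likely if the field is important than if it is not.
lowered = posterior(PRIOR, p_e_given_h=0.4, p_e_given_not_h=0.6)

# Case 2: E is equally likely either way, so it carries no information.
unchanged = posterior(PRIOR, p_e_given_h=0.5, p_e_given_not_h=0.5)

# Case 3: E is more likely if the field is important.
raised = posterior(PRIOR, p_e_given_h=0.6, p_e_given_not_h=0.4)

print(f"lowered={lowered:.3f} unchanged={unchanged:.3f} raised={raised:.3f}")
```

On this framing, the question of "how much A and B" becomes a question about the two likelihoods, while a personal update on discovering more shifts than previously believed corresponds to moving from one's earlier estimate of the evidence to a new one.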
C. The proponents of the original arguments were misinterpreted, or overemphasised some of their beliefs at the expense of others, and actually these shifts are just a change in emphasis.
My interpretation of what happened here is that more narrow AI successes made it more convincing that one could reach ASI by building all of the components of it directly, rather than necessitating building an AI that can do most of the hard work for you. If it only takes 5 cognitive modules to take over the world instead of 500, then one no longer needs to posit an extra mechanism by which a buildable system is able to reach the ability to take over the world. And so from my perspective it's mostly a shift in emphasis, with small amounts of A and B as well.
Promoted to curated: I think this classification is good and useful, both to refer to in conversation and to help people navigate the broader alignment space. And I think the post is presented in a clear and relatively concise way.
I do think there would have been value in connecting it more to past writing about similar topics, though I recognize that this might have easily doubled the effort of writing this post.
Thanks! I agree that more connection to past writings is always good, and I'm happy to update it appropriately - although, upon thinking about it, there's nothing which really comes to mind as an obvious omission (except perhaps citing sections of Superintelligence?) Of course I'm pretty biased, since I already put in the things which I thought were most important - so I'd be glad to hear any additional suggestions you have.
One place that comes to mind that had a bunch of related writing is Arbital.
I was also thinking about linking to a bunch of related taxonomies. The "Disjunctive AI Risk" paper comes to mind. I will think about other examples.
This is great, but 4 and 5 seem to be aspects of the same problem to me (i.e., that humans aren't safe agents) and I'm not sure how you're proposing to draw the line between them. For example:
It’s also possible that we invent some technology which destroys us unexpectedly, either through unluckiness or carelessness.
If this was caused entirely by an AI pursuing an otherwise beneficial goal, it would certainly count as a failure of AI safety (and is currently studied under "safe exploration") so it seems to make sense to call the analogous human problem "human safety". Similarly coordination between AIs is considered a safety problem and studied under decision theory and game theory for AIs.
Can you explain a bit more the difference you see between 4 and 5?
To me the difference is that when I read 5 I'm thinking about people being careless or malevolent, in an everyday sense of those terms, whereas when I read 4 I'm thinking about how maybe there's no such thing as a human who's not careless or malevolent, if given enough power and presented with a weird enough situation.
I endorse ESRogs' answer. If the world were a singleton under the control of a few particularly benevolent and wise humans, with an AGI that obeys the intention of practical commands (in a somewhat naive way, say, so it'd be unable to help them figure out ethics) then I think argument 5 would no longer apply, but argument 4 would. Or, more generally: argument 5 is about how humans might behave badly under current situations and governmental structures in the short term, but makes no claim that this will be a systemic problem in the long term (we could probably solve it using a singleton + mass surveillance); argument 4 is about how we don't know of any governmental(/psychological?) structures which are very likely to work well in the long term.
Having said that, your ideas were the main (but not sole) inspiration for argument 4, so if this isn't what you intended, then I may need to rethink its inclusion.
I think this division makes sense on a substantive level, and I guess I was confused by the naming and the ordering between 4 and 5. I would define "human safety problems" to include both short term and long term problems (just like "AI safety problems" includes short term and long term problems) so I'd put both 4 and 5 under "human safety problems" instead of just 4. I guess in my posts I mostly focused on long term problems since short term problems have already been widely recognized, but as far as naming, it seems strange to exclude short term problems from "human safety problems". Also you wrote "They are listed roughly from most specific and actionable to most general" and 4 feels like a more general problem than 5 to me, although perhaps that's arguable.
I struggle to understand the difference between #2 and #3. The prosaic AI alignment problem only exists because we don't know how to make an agent that tries to do what we want it to do. Would you say that #3 is a concrete scenario for how #2 could lead to a catastrophe?
I think #3 could occur because of #2 (which I now mostly call "inner misalignment"), but it could also occur because of outer misalignment.
Broadly speaking, though, I think you're right that #2 and #3 are different types of things. Because of that and other issues, I no longer think that this post disentangles the arguments satisfactorily; I'll make a note of this at the top of the document.
Note: my views have shifted significantly since writing this post. I now consider items 1, 2, 3, and 6.2 to be different facets of one core argument, which I call the "second species" argument, and which I explore in depth in this report. And I don't really think of 4 as an AI safety problem any more.
I recently attended the 2019 Beneficial AGI conference organised by the Future of Life Institute. I’ll publish a more complete write-up later, but I was particularly struck by how varied attendees' reasons for considering AI safety important were. Before this, I’d observed a few different lines of thought, but interpreted them as different facets of the same idea. Now, though, I’ve identified at least 6 distinct serious arguments for why AI safety is a priority. By distinct I mean that you can believe any one of them without believing any of the others - although of course the particular categorisation I use is rather subjective, and there’s a significant amount of overlap. In this post I give a brief overview of my own interpretation of each argument (note that I don’t necessarily endorse them myself). They are listed roughly from most specific and actionable to most general. I finish with some thoughts on what to make of this unexpected proliferation of arguments. Primarily, I think it increases the importance of clarifying and debating the core ideas in AI safety.
What should we think about the fact that there are so many arguments for the same conclusion? As a general rule, the more arguments support a statement, the more likely it is to be true. However, I’m inclined to believe that quality matters much more than quantity - it’s easy to make up weak arguments, but you only need one strong one to outweigh all of them. And this proliferation of arguments is (weak) evidence against their quality: if the conclusions of a field remain the same but the reasons given for holding those conclusions change, that’s a warning sign for motivated cognition (especially when those beliefs are considered socially important). This problem is exacerbated by a lack of clarity about which assumptions and conclusions are shared between arguments, and which aren’t.
On the other hand, superintelligent AGI is a very complicated topic, and so perhaps it’s natural that there are many different lines of thought. One way to put this in perspective (which I credit to Beth Barnes) is to think about the arguments which might have been given for worrying about nuclear weapons, before they had been developed. Off the top of my head, there are at least four:
And there are probably more which would have been credible at the time, but which seem silly now due to hindsight bias. So if there’d been an active anti-nuclear movement in the 1930s or early 1940s, the motivations of its members might well have been as disparate as those of AI safety advocates today. Yet the overall concern would have been (and still is) totally valid and reasonable.
I think the main takeaway from this post is that the AI safety community as a whole is still confused about the very problem we are facing. The only way to dissolve this tangle is to have more communication and clarification of the fundamental ideas in AI safety, particularly in the form of writing which is made widely available. And while it would be great to have AI safety researchers explaining their perspectives more often, I think there is still a lot of explicatory work which can be done regardless of technical background. In addition to analysis of the arguments discussed in this post, I think it would be particularly useful to see more descriptions of deployment scenarios and corresponding threat models. It would also be valuable for research agendas to highlight which problem they are addressing, and the assumptions they require to succeed.
This post has benefited greatly from feedback from Rohin Shah, Alex Zhu, Beth Barnes, Adam Marblestone, Toby Ord, and the DeepMind safety team. All opinions are my own.