tl;dr: I know a bunch of EA/rationality-adjacent people who argue — sometimes jokingly and sometimes seriously — that the only way, or the best way, to reduce existential risk is to enable an “aligned” AGI development team to forcibly (even if nonviolently) shut down all other AGI projects, using safe AGI.  I find that the arguments for this conclusion are flawed, and that the conclusion itself causes harm to institutions that espouse it.  Fortunately (according to me), successful AI labs do not seem to espouse this "pivotal act" philosophy.

[This post is also available on the EA Forum.]

How to read this post

Please read Part 1 first if you’re very impact-oriented and want to think about the consequences of various institutional policies more than the arguments that lead to the policies; then Parts 2 and 3.

Please read Part 2 first if you mostly want to evaluate policies based on the arguments behind them; then Parts 1 and 3.

I think all parts of this post are worth reading, but depending on who you are, I think you could be quite put off if you read the wrong part first and start feeling like I’m basing my argument too much on kinds-of-thinking that policy arguments should not be based on.

Part 1: Negative Consequences of Pivotal Act Intentions

Imagine it’s 2022 (it is!), and your plan for reducing existential risk is to build or maintain an institution that aims to find a way for you — or someone else you’ll later identify and ally with — to use AGI to forcibly shut down all other AGI projects in the world.  By “forcibly” I mean methods that violate or threaten to violate private property or public communication norms, such as by using an AGI to engage in…

  • cyber sabotage: hacking into competitors’ computer systems and destroying their data;
  • physical sabotage: deploying tiny robotic systems that locate and destroy AI-critical hardware without (directly) harming any humans;
  • social sabotage: auto-generating mass media campaigns to shut down competitor companies by legal means, or
  • threats: demonstrating powerful cyber or physical or social threats, and bargaining with competitors to shut down “or else”.

Hiring people for your pivotal act project is going to be tricky.  You’re going to need people who are willing to take on, or at least tolerate, a highly adversarial stance toward the rest of the world.  I think this is very likely to have a number of bad consequences for your plan to do good, including the following:

  1. (bad external relations)  People on your team will have a low-trust and/or adversarial stance towards neighboring institutions and collaborators, and will have a hard time engaging in good-faith collaboration.  This will alienate other institutions and make them not want to work with you or be supportive of you.
  2. (bad internal relations)  As your team grows, not everyone will know each other very well.  The “us against the world” attitude will be hard to maintain, because there will be an ever-weakening sense of “us”, especially as people quit and move to other institutions and vice versa.  Sometimes, new hires will express opinions that differ from the dominant institutional narrative, which might pattern-match as “outsidery” or “norm-y” or “too caught up in external politics”, triggering internal distrust that some people might defect on the plan to forcibly shut down other projects.  This will cause your team to get along poorly internally, and make it hard to manage people.
  3. (risky behavior)  In the fortunate-according-to-you event that your team manages to someday wield a powerful technology, there will be pressure to use it to “finally make a difference”, or some other argument that boils down to acting quickly before competitors have a chance to shut you down or at least defend themselves.  This will make it hard to stop your team from doing rash things that would actually increase existential risk.

Overall, building an AGI development team with the intention to carry out a “pivotal act” of the form “forcibly shut down all other A(G)I projects” is probably going to be a rough time, I predict.

Does this mean no institution in the world can have the job of preparing to shut down runaway technologies?  No; see “Part 3: It Matters Who Does Things”.

Part 2: Fallacies in Justifying Pivotal Acts

For pivotal acts of the form “shut down all (other) AGI projects”, there’s an argument  that I’ve heard repeatedly from dozens of people, which I claim has easy-to-see flaws if you slow down and visualize the world that the argument is describing.

This is not an argument that successful AI research groups (e.g., OpenAI, DeepMind, Anthropic) seem to espouse.  Nonetheless, I hear the argument frequently enough to want to break it down and refute it.

Here is the argument:

  1. AGI is a dangerous technology that could cause human extinction if not super-carefully aligned with human values.

    (My take: I agree with this point.)
     
  2. If the first group to develop AGI manages to develop safe AGI, but the group allows other AGI projects elsewhere in the world to keep running, then one of those other projects will likely eventually develop unsafe AGI that causes human extinction.

    (My take: I also agree with this point, except that I would bid to replace “the group allows” with “the world allows”, for reasons that will hopefully become clear in Part 3: It Matters Who Does Things.)
     
  3. Therefore, the first group to develop AGI, assuming they manage to align it well enough with their own values that they believe they can safely issue instructions to it, should use their AGI to build offensive capabilities for targeting and destroying the hardware resources of other AGI development groups, e.g., nanotechnology targeting GPUs, drones carrying tiny EMP charges, or similar.

    (My take: I do not agree with this conclusion, I do not agree that (1) and (2) imply it, and I feel relieved that every successful AI research group I talk to is also not convinced by this argument.)

The short reason why (1) and (2) do not imply (3) is that when you have AGI, you don’t have to use the AGI directly to shut down other projects.  

In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning.  In other words, outsiders can start to help you implement helpful regulatory ideas, rather than you planning to do it all on your own by force at the last minute using a super-powerful AI system.  

To be clear, I’m not arguing for leaving regulatory efforts entirely in the hands of governments with no help or advice or infrastructural contributions from the tech sector.  I’m just saying that there are many viable options for regulating AI technology without requiring one company or lab to do all the work or even make all the judgment calls.

Q: Surely they must be joking or this must be straw-manning... right?

A: I realize that lots of EA/R folks are thinking about AI regulation in a very nuanced and politically measured way, which is great.  And, I don't think the argument (1-3) above represents a majority opinion among the EA/R communities.  Still, some people mean it, and more people joke about it in an ambiguous way that doesn't obviously distinguish them from meaning it:

  • (ambiguous joking) I've numerous times met people at EA/R events who were saying extreme-sounding things like "[AI lab] should just melt all the chip fabs as soon as they get AGI", who when pressed about the extremeness of this idea will respond with something like "Of course I don't actually mean I want [some AI lab] to melt all the chip fabs".  Presumably, some of those people were actually just using hyperbole to make conversations more interesting or exciting or funny.  

    Part of my motivation in writing this post is to help cut down on the amount of ambiguous joking about such proposals.  As the development of more and more advanced AI technologies is becoming a reality, ambiguous joking about such plans has the potential to really freak people out if they don't realize you're exaggerating.
     
  • (meaning it) I have met at least a dozen people who were not joking when advocating for invasive pivotal acts along the lines of the argument (1-3) above.  That is to say, when pressed after saying something like (1-3), their response wasn't "Geez, I was joking", but rather, "Of course AGI labs should shut down other AGI labs; it's the only morally right thing for them to do, given that AGI labs are bad.  And of course they should do it by force, because otherwise it won't get done."

    In most cases, folks with these viewpoints seemed not to have thought about the cultural consequences of AGI research labs harboring such intentions over a period of years (Part 1), or the fallacy of assuming technologists will have to do everything themselves (Part 2), or the future possibility of making evidence available to support global regulatory efforts from a broader base of consensual actors (see Part 3).

    So, part of my motivation in writing this post is as a genuine critique of a genuinely expressed position.

Part 3: It Matters Who Does Things

I think it’s important to separate the following two ideas:

  • Idea A (for “Alright”): Humanity should develop hardware-destroying capabilities — e.g., broadly and rapidly deployable non-nuclear EMPs — to be used in emergencies to shut down potentially-out-of-control AGI situations, such as an AGI that has leaked onto the internet, or an irresponsible nation developing AGI unsafely.
  • Idea B (for “Bad”): AGI development teams should be the ones planning to build the hardware-destroying capabilities in Idea A.

For what it’s worth, I agree with Idea A, but disagree with Idea B:

Why I agree with Idea A

It’s indeed much nicer to shut down runaway AI technologies (if they happen) using hardware-specific interventions than with attacks that have big splash effects, like explosives or brainwashing campaigns.  I think this is the main reason well-intentioned people end up arriving at this idea, and at Idea B; but I think Idea B has some serious problems.

Why I disagree with Idea B

A few reasons!  First, there’s:

  • Action Consequence 1: the action of having an AGI carry out or even prescribe such a large intervention on the world — invading others’ private property to destroy their hardware — is risky and legitimately scary.  Invasive behavior is risky and threatening enough as it is; using AGI to do it introduces a whole range of other uncertainties, not least because the AGI could be deceptive or otherwise misaligned with humanity in ways that we don’t understand.

Second, before even reaching the point of taking the action prescribed in Idea B, merely harboring the intention of Idea B has bad consequences; echoing similar concerns as Part 1:

  • Intention Consequence 1: Racing.  Harboring Idea B creates an adversarial winner-takes-all relationship with other AGI companies racing to maintain
    • a degree of control over the future, and
    • the ability to implement their own pet theories on how safety/alignment should work, leading to more desperation, more risk-taking, and less safety overall.
  • Intention Consequence 2: Fear.  Via staff turnover and other channels, harboring Idea B signals to other AGI companies that you are willing to violate their property boundaries to achieve your goals, which will cause them to fear for their physical safety (e.g., because your incursion to invade their hardware might go awry and end up harming them personally as well).  This kind of fear leads to more desperation, more winner-takes-all mentality, more risk-taking, and less safety.

Summary

In Part 1, I argued that there are negative consequences to AGI companies harboring the intention to forcibly shut down other AGI companies.  In Part 2, I analyzed a common argument in favor of that kind of “pivotal act”, and found a pretty simple flaw stemming from fallaciously assuming that the AGI company has to do everything itself (rather than enlisting help from neutral outsiders, using evidence).  In Part 3, I elaborated more on the nuance regarding who (if anyone) should be responsible for developing hardware-shutdown technologies to protect humanity from runaway AI disasters, and why in particular AGI companies should not be the ones planning to do this, mostly echoing points from Part 1.

Fortunately, successful AI labs like DeepMind, OpenAI, and Anthropic do not seem to espouse this “pivotal act” philosophy for doing good in the world.  One of my hopes in writing this post is to help more EA/R folks understand why I agree with their position.





 

Comments (26)

In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning.  In other words, outsiders can start to help you implement helpful regulatory ideas...

It is not for lack of regulatory ideas that the world has not banned gain-of-function research.

It is not for lack of demonstration of scary gain-of-function capabilities that the world has not banned gain-of-function research.

What exactly is the model by which some AI organization demonstrating AI capabilities will lead to world governments jointly preventing scary AI from being built, in a world which does not actually ban gain-of-function research?

(And to be clear: I'm not saying that gain-of-function research is a great analogy. Gain-of-function research is a much easier problem, because the problem is much more legible and obvious. People know what plagues look like and why they're scary. In AI, it's the hard-to-notice problems which are the central issue. Also, there's no giant economic incentive for gain-of-function research.)

Various thoughts that this inspires:

Gain of Function Ban as Practice-Run/Learning for relevant AI Bans

I have heard vague-musings-of-plans in the direction of "get the world to successfully ban Gain of Function research, as a practice-case for getting the world to successfully ban dangerous AI." 

I have vague memories of the actual top bio people around not being too focused on this, because they thought there were easier ways to make progress on biosecurity.  (I may be conflating a few different statements – they might have just been critiquing a particular strategy I mentioned for banning Gain of Function research.)

A few considerations for banning Gain of Function research for AI-related reasons:

  • because you will gain skills / capacities that transfer into banning relevant AI systems. (i.e. "you'll learn what works")
  • you won't learn what works, but you'll hit a bunch of brick walls that teach you what doesn't work.
  • A gain of function ban is "lower stakes" (biorisk is way less likely to kill everyone than AI), and (hopefully?) won't have many side effects that specifically make it harder to ban AI-stuff later.  (By contrast, if you try ineffectually to regulate AI in some way, you will cause the AI industry to raise its hackles, maybe cause one political party to get a reputation as "the anti-AI party" and the other party to become "anti-anti-AI" in response, or maybe get everyone's epistemics about what's worth banning all clouded.)

Distinction between "Regulating AI is possible/impossible" vs "pivotal act framing is harmful/unharmful".

I currently believe John's point of "man, sure seems real hard to actually usefully regulate things even when it's comparatively easy, I don't feel that hopeful about regulation processes working."

But, that doesn't necessarily contradict the point that it's really hard to build an organization capable of unilaterally implementing a pivotal act, and that the process of doing so is likely to create enemies, erode coordination-fabric, make people fearful, etc.

It seems obvious to me that the arguments in this post are true-to-some-degree.  There's some actual math / hashing-out that I haven't seen to my satisfaction of how all the arguments actually balance against each other.

Something feels off about the way people relate to "everyone else's ideas seeming more impossible than my own, even if we all agree it's pretty impossible."

+1 to the distinction between "Regulating AI is possible/impossible" vs "pivotal act framing is harmful/unharmful".

I'm sympathetic to a view that says something like "yeah, regulating AI is Hard, but it's also necessary because a unilateral pivotal act would be Bad". (TBC, I'm not saying I agree with that view, but it's at least coherent and not obviously incompatible with how the world actually works.) To properly make that case, one has to argue some combination of:

  • A unilateral pivotal act would be so bad that it's worth accepting a much higher chance of human extinction in order to avoid it, OR
  • Aiming for a unilateral pivotal act would not reduce the chance of human extinction much more than aiming for a multilateral pivotal act

I generally expect people opposed to the pivotal act framing to have the latter in mind rather than the former. The obvious argument that aiming for a unilateral pivotal act does reduce the chance of human extinction much more than aiming for a multilateral pivotal act is that it's much more likely that someone could actually perform a unilateral pivotal act; it is a far easier problem, even after accounting for the problems the OP mentions in Part 1. That, I think, is the main view one would need to argue against in order to make the case for multilateral over unilateral pivotal act as a goal. The OP doesn't really make that case at all; it argues that aiming for unilateral introduces various challenges, but it doesn't even attempt to argue that those challenges would be harder than (or even comparably hard to) getting all the major world governments to jointly implement an actually-effective pivotal act.

John, it seems like you're continuing to make the mistake-according-to-me of analyzing the consequences of a pivotal act without regard for the consequences of the intentions leading up to the act.  The act can't come out of a vacuum, and you can't build a project compatible with the kind of invasive pivotal acts I'm complaining about without causing a lot of problems leading up to the act, including triggering a lot of fear and panic for other labs and institutions.  To summarize from the post title: pivotal act intentions directly have negative consequences for x-safety, and people thinking about the acts alone seem to be ignoring the consequences of the intentions leading up to the act, which is a fallacy.

I see the argument you're making there. I still think my point stands: the strategically relevant question is not whether unilateral pivotal act intentions will cause problems, the question is whether aiming for a unilateral pivotal act would or would not reduce the chance of human extinction much more than aiming for a multilateral pivotal act. The OP does not actually attempt to compare the two, it just lists some problems with aiming for a unilateral pivotal act.

I do think that aiming for a unilateral act increases the chance of successfully executing the pivotal act by multiple orders of magnitude, even accounting for the part where other players react to the intention, and that completely swamps the other considerations.

Just as a related idea, in my mind, I often do a kind of thinking that HPMOR!Harry would call “Hufflepuff Bones”, where I look for ways a problem is solvable in physical reality at all, before considering ethical and coordination and even much in the way of practical concerns.

it's much more likely that someone could actually perform a unilateral pivotal act; it is a far easier problem, even after accounting for the problems the OP mentions in Part 1.

What I've never understood about the pivotal act plan is exactly what the successful AGI team is supposed to do after melting the GPUs or whatever. Every government on Earth will now consider them their enemy; they will immediately be destroyed unless they can defend themselves militarily, and then countries will simply rebuild the GPU factories and continue on as before (except now in a more combative, disrupted, AI-race-encouraging geopolitical situation). So any pivotal act seems to require, at a minimum, an AI capable of militarily defeating all countries' militaries. Then in order to not have society collapse, you probably need to become the government yourself, or take over or persuade existing governments to go along with your agenda. But an AGI that would be capable of doing all this safely seems...not much easier to create than a full-on FAI? It's not like you could get by with an AI that was freakishly skilled at designing nanomachines but nothing else; you'd need something much more general. But isn't the whole idea of the pivotal act plan that you don't need to solve alignment in full generality to execute a pivotal act? For these reasons, executing a unilateral pivotal act (that actually results in an x-risk reduction) does not seem obviously easier than convincing governments to me.

Oh, melting the GPUs would not actually be a pivotal act. There would need to be some way to prevent new GPUs from being built in order for it to be a pivotal act.

Military capability is not strictly necessary; a pivotal act need not necessarily piss off world governments. AGI-driven propaganda, for instance, might avoid that.

Alternatively, an AGI could produce nanomachines which destroy GPUs, are extremely hard to eradicate, but otherwise don't do much of anything.

(Note that these aren't intended to be very good/realistic suggestions, they're just meant to point to different dimensions of the possibility space.)

Oh, melting the GPUs would not actually be a pivotal act

Well yeah, that's my point. It seems to me that any pivotal act worthy of the name would essentially require the AI team to become an AGI-powered world government, which seems pretty darn difficult to pull off safely. The superpowered-AI-propaganda plan falls under this category. The long-lasting nanomachines idea is cute, but I bet people would just figure out ways to evade the nanomachines' definition of 'GPU'.

Note that these aren't intended to be very good/realistic suggestions, they're just meant to point to different dimensions of the possibility space

Fair enough...but if the pivotal act plan is workable, there should be some member of that space which actually is good/seems like it has a shot of working out in reality (and which wouldn't require a full FAI). I've never heard any and am having a hard time thinking of one. Now it could be that MIRI or others think they have a workable plan which they don't want to share the details of due to infohazard concerns. But as an outside observer, I have to assign a certain amount of probability to that being self-delusion.

Well yeah, that's my point. It seems to me that any pivotal act worthy of the name would essentially require the AI team to become an AGI-powered world government, which seems pretty darn difficult to pull off safely. The superpowered-AI-propaganda plan falls under this category.

Yeah. I think this sort of thing is why Eliezer thinks we're doomed – getting humanity to coordinate collectively seems doomed (i.e. see Gain of Function Research), and there are no weak pivotal acts that aren't basically impossible to execute safely.

The nanomachine gpu-melting pivotal act is meant to be a gesture at the difficulty / power level, not an actual working example. The other gestured-example I've heard is "upload aligned people who think hard for 1000 subjective years and hopefully figure something out." I've heard someone from MIRI argue that one is also unworkable but wasn't sure on the exact reasons.

Yeah. I think this sort of thing is why Eliezer thinks we're doomed

Hmm, interesting...but wasn't he more optimistic a few years ago, when his plan was still "pull off a pivotal act with a limited AI"? I thought the thing that made him update towards doom was the apparent difficulty of safely making even a limited AI, plus shorter timelines.

other gestured-example I've heard is "upload aligned people who think hard for 1000 subjective years and hopefully figure something out."

Ah, that actually seems like it might work. I guess the problem is that an AI that can competently do neuroscience well enough to do this would have to be pretty general. Maybe a more realistic plan along the same lines might be to try using ML to replicate the functional activity of various parts of the human brain and create 'pseudo-uploads'. Or just try to create an AI with similar architecture and roughly-similar reward function to us, hoping that human values are more generic than they might appear.

It seems relatively plausible that you could use a Limited AGI to build a nanotech system capable of uploading a diverse assortment of (non-brain, or maybe only very small brains) living tissue without damaging them, and that this system would learn how to upload tissue in a general way. Then you could use the system (not the AGI) to upload humans (tested on increasingly complex animals). It would be a relatively inefficient emulation, but it doesn't seem obviously doomed to me.

Probably too late once hardware is available to do this though.

Followup point on the Gain-of-Function-Ban as practice-run for AI:

My sense is that the biorisk people who were thinking about Gain-of-Function-Ban were not primarily modeling it as a practice run for regulating AGI. This may result in them not really prioritizing it.

I think biorisk is significantly lower than AGI risk, so if it's tractable and useful to regulate Gain of Function research as a practice run for regulating AGI, it's plausible this is actually much more important than business-as-usual biorisk. 

BUT I think smart people I know seem to disagree about how any of this works, so the "if tractable and useful" conditional is pretty non-obvious to me.

If bio-and-AI-people haven't had a serious conversation about this where they mapped out the considerations in more detail, I do think that should happen.


There are/could be crucial differences between GoF and some AGI examples. 

E.g., a convincing demonstration of the ability to overthrow the government. States are also agents, and also have convergent instrumental goals. GoF research seems much more threatening to individual humans, but not nearly as threatening to states or governments.

I agree it's not necessarily a good idea to go around founding the Let's Commit A Pivotal Act AI Company.

But I think there's room for subtlety somewhere like "Conditional on you being in a situation where you could take a pivotal act, which is a small and unusual fraction of world-branches, maybe you should take a pivotal act."

That is, if you are in a position where you have the option to build an AI capable of destroying all competing AI projects, the moment you notice this you should update heavily in favor of short timelines (zero in your case, but everyone else should be close behind) and fast takeoff speeds (since your AI has these impressive capabilities). You should also update on existing AI regulation being insufficient (since it was insufficient to prevent you)

Somewhere halfway between "found the Let's Commit A Pivotal Act Company" and "if you happen to stumble into a pivotal act, take it", there's an intervention to spread a norm of "if a good person who cares about the world happens to stumble into a pivotal-act-capable AI, take the opportunity". I don't think this norm would necessarily accelerate a race. After all, bad people who want to seize power can take pivotal acts whether we want them to or not. The only people who are bound by norms are good people who care about the future of humanity. I, as someone with no loyalty to any individual AI team, would prefer that (good, norm-following) teams take pivotal acts if they happen to end up with the first superintelligence, rather than not doing that.

Another way to think about this is that all good people should be equally happy with any other good person creating a pivotal AGI, so they won't need to race among themselves. They might be less happy with a bad person creating a pivotal AGI, but in that case you should race and you have no other option. I realize "good" and "bad" are very simplistic but I don't think adding real moral complexity changes the calculation much.

I am more concerned about your point where someone rushes into a pivotal act without being sure their own AI is aligned. I agree this would be very dangerous, but it seems like a job for normal cost-benefit calculation: what's the risk of your AI being unaligned if you act now, vs. someone else creating an unaligned AI if you wait X amount of time? Do we have any reason to think teams would be systematically biased when making this calculation?

That is, if you are in a position where you have the option to build an AI capable of destroying all competing AI projects, the moment you notice this you should update heavily in favor of short timelines (zero in your case, but everyone else should be close behind) and fast takeoff speeds (since your AI has these impressive capabilities). You should also update on existing AI regulation being insufficient (since it was insufficient to prevent you)

A functioning Bayesian should probably have updated to that position long before they actually have the AI.

Destroying all competing AI projects might mean that the AI took a month to find a few bugs in linux and tensorflow and create something that's basically the next stuxnet. This doesn't sound like that fast a takeoff to me. 

The regulation is basically non-existant and will likely continue to be so. 

I mean, making superintelligent AI probably breaks a bunch of laws, technically, as interpreted in a pedantic and literal-minded way. But breathing probably technically breaks a bunch of laws. Some laws are just overbroad, technically ban everything, and are generally ignored.

Any enforced rule that makes it pragmatically hard to make AGI would basically have to be a ban on computers (or at least programming) 

This mostly seems to be an argument for: "It'd be nice if no pivotal act is necessary", but I don't think anyone disagrees with that.

As for "Should an AGI company be doing this?" the obvious answer is "It depends on the situation". It's clearly nice if it's not necessary. Similarly, if [the world does the enforcement] has higher odds of success than [the AGI org does the enforcement] then it's clearly preferable - but it's not clear that would be the case.

I think it's rather missing the point to call it a "pivotal act philosophy", as if anyone values pivotal acts for their own sake. Some people just think they're plausibly necessary - as are many unpleasant and undesirable acts. Obviously this doesn't imply they should be treated lightly, or that the full range of more palatable options shouldn't be carefully considered.

I don't buy that an intention to perform pivotal acts is a significant race-dynamic factor: incentives to race seem over-determined already. If we could stop the existing race, I imagine most pivotal-act advocates would think a pivotal act were much less likely to be necessary.

Depending on the form an aligned AGI takes, it's also not clear that the developing organisation gets to decide/control what it does. Given that special-casing avoidance of every negative side-effect is a non-starter, an aligned AGI will likely need a very general avoids-negative-side-effects mechanism. It's not clear to me that an aligned AGI that knowingly permits significant avoidable existential risk (without some huge compensatory upside) is a coherent concept.

If you're allowing a [the end of the world] side-effect, what exactly are you avoiding, and on what basis? As soon as your AGI takes on any large-scale long-term task, then [the end of the world] is likely to lead to a poor outcome on that task, and [prevent the end of the world] becomes an instrumental goal.

Forms of AGI that just do the pivotal act, whatever the creators might think about it, are at least plausible.
I assume this will be an obvious possibility for other labs to consider in planning.

This mostly seems to be an argument for: "It'd be nice if no pivotal act is necessary", but I don't think anyone disagrees with that.

It's arguing that, given that your organization has scary (near) AGI capabilities, it is not so much harder (to get a legitimate authority to impose an off-switch on the world's compute) than (to 'manufacture your own authority' to impose that off-switch) such that it's worth avoiding the cost of (developing those capabilities while planning to manufacture authority). Obviously there can be civilizations where that's true, and civilizations where that's not true.

Idea A (for “Alright”): Humanity should develop hardware-destroying capabilities — e.g., broadly and rapidly deployable non-nuclear EMPs — to be used in emergencies to shut down potentially-out-of-control AGI situations, such as an AGI that has leaked onto the internet, or an irresponsible nation developing AGI unsafely.

Sounds obviously impossible in real life, so how about you go do that and then I'll doff my hat in amazement and change how I speak of pivotal acts. Go get gain-of-function banned, even, that should be vastly simpler. Then we can talk about doing the much more difficult thing. Otherwise it seems to me like this is just a fairytale about what you wouldn't need to do in a brighter world than this.

Eliezer, from outside the universe I might take your side of this bet.  But I don't think it's productive to give up on getting mainstream institutions to engage in cooperative efforts to reduce x-risk.

A propos, I wrote the following post in reaction to positions-like-yours-on-this-issue, but FYI it's not just you (maybe 10% you though?):
https://www.lesswrong.com/posts/5hkXeCnzojjESJ4eB 

In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning. In other words, outsiders can start to help you implement helpful regulatory ideas, rather than you planning to do it all on your own by force at the last minute using a super-powerful AI system.

This all seems like it would be good news. For the record I think that the necessary evidence to start acting has been around for decades if not longer (humans, evolution, computers, etc) and I don’t bet on a future such turning-point (there is no fire alarm). Would be happy to see a credible argument to the contrary.

Also all the cool shit you can do with AI feels like it will apply orders of magnitude more pressure on the “economic forces are pushing our civilization to make more” side than the “oh some weird chance this shit FOOMs and ends the world so let’s regulate this really well” side.

To be clear, I’m not arguing for leaving regulatory efforts entirely in the hands of governments with no help or advice or infrastructural contributions from the tech sector. I’m just saying that there are many viable options for regulating AI technology without requiring one company or lab to do all the work or even make all the judgment calls.

You think there are many viable options; I would be interested in hearing three.

The synthesis of these options would be an AGI research group whose plan consists of:

  • Develop safe AGI.
  • Try to convince world governments to perform some such pivotal act (Idea A) - note that per current institutions this needs consensus and strong implementation across all major and medium tech powers.
  • Have a back-up plan, if AGI research is proliferating without impending shutdown, to shut down world research unilaterally (Idea B).

What do you think of such a plan?

I think this would be reasonable, but if the plan is taken up then it becomes a cost-benefit analysis of when Idea B should be deployed, which plausibly could be very aggressive, so it could easily boil down to just Idea B. 

It's also worth noting that a research group with an AGI who want world governments to perform a pivotal act would need to be incredibly effective and persuasive. Their options would run a spectrum from normal public-channel and lobbying efforts to AGI-takes-over-the-world-behind-the-scenes (depending on sufficient capability), with a variety of AGI-assisted persuasion techniques in between. At some degree of AI/research group control over government, it's not clear if this would be an improvement over the original act. Demonstrating the power of AGI in a way that would force governments to listen would need to at least threaten a transformative act (self-driving cars, solving protein folding, passing normal Turing tests clearly aren't enough) and so the necessary levels of influence and demonstrated capability would be large (and demonstrating capability has obvious potential drawbacks in sparking arms races).

When this post came out, I left a comment saying:

It is not for lack of regulatory ideas that the world has not banned gain-of-function research.

It is not for lack of demonstration of scary gain-of-function capabilities that the world has not banned gain-of-function research.

What exactly is the model by which some AI organization demonstrating AI capabilities will lead to world governments jointly preventing scary AI from being built, in a world which does not actually ban gain-of-function research?

Given how the past year has gone, I should probably lose at least some Bayes points for this. Not necessarily very many Bayes points; notably there is still not a ban on AI capabilities research, and it doesn't look like there will be. But the world has at least moved marginally closer to world governments actually stopping AI capabilities work, over the past year.

In Part 2, I analyzed a common argument in favor of that kind of “pivotal act”, and found a pretty simple flaw stemming from fallaciously assuming that the AGI company has to do everything itself (rather than enlisting help from neutral outsiders, using evidence).

For the record this does seem like the cruxy part of the whole discussion, and I think more concrete descriptions of alternatives would help assuage my concerns here.

Suppose you develop the first AGI. It fooms. The AI tells you that it is capable of gaining total cosmic power by hacking physics in a millisecond. (Being an aligned AI, it's waiting for your instructions before doing that.) It also tells you that the second AI project is only 1 day behind, and that they have screwed up alignment.

Options.

  1. Do nothing. Unfriendly AI gains total cosmic power tomorrow.
  2. Lightspeed bubble of hedonium. All humans are uploaded into a virtual utopia by femtobots. The sun is fully disassembled for raw materials within 10 minutes of you giving the order.
  3. Subtly break their AI. A cyberattack that stops their AI from doing anything, and otherwise has no effect. 
  4. Use the total cosmic power to do something powerful and scary. Randomly blow up mars. Tell the world that you did this using AI, and therefore AI should be regulated. Watch 20 hours of headless chicken flailing before the world ends. 
  5. Blow up mars and then use your amazing brainwashing capabilities to get regulations passed and enforced within 24 hours. 
  6. Something else.

Personally I think that 2 and 3 would be the options to consider. 

In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning. 

Which neutral but influential observers? Politicians that only know how to play signalling games and are utterly mentally incapable of engaging with objective reality in any way? There is no cabal of powerful people who will start acting competently and benevolently the moment they get unambiguous evidence that "intelligence is powerful". A lot of the smart people who know about AI have already realized this. The people who haven't realized will often not be very helpful. Sure, you can get a bit of a boost. You could get MIRI a bit of extra funding.

 

Let's work our way backwards. Let's imagine the future contains a utopia that lasts billions of years, and contains many many humanlike agents. Why doesn't superintelligent AI created by the humans destroy this utopia?

  1. Every single human capable of destroying the world chooses not to. Requires at least a good bit of education. Quite possibly advanced brain nanotech to stop even one person going mad.
  2. Unfriendly Superintelligence won't destroy the world, our friendly superintelligence will keep it in check. Sure, possible. The longer you leave the unfriendly superintelligence on, the more risk and collateral. Best time to stop it is before it turns on.
  3. FAI in all your computers. Try it and you just get an "oops, that code is an unfriendly superintelligence" error.
  4. Some earlier step doesn't work, e.g. there are no human-programmable computers in this world. And some force stops humans from making them.
  5. All the humans are too dumb. The innumerate IQ 80 humans have no chance of making AI.
  6. Government of humans. Building ASI would take a lot of tech development. The human run government puts strict limits on that. Building any neural net is very illegal. Somehow the government doesn't get replaced by a pro AI government even on these timescales.

Imagine a contract. "We the undersigned agree that AGI is a powerful, useful, but also potentially dangerous technology. To help avoid needlessly taking the same risk twice, we agree that upon development of the world's first AI, we will stop all attempts to create our own AI at the request of the first AI or its creators. In return, the creator of the first AI will be nice with it."

Then you aren't stopping all competitors. You're stopping the few people that can't cooperate.