Pivotal outcomes and pivotal processes

Andrew_Critch

Pivotal outcomes and pivotal processes — AI Alignment Forum


17th Jun 2022


tl;dr: If you think humanity is on a dangerous path, and needs to "pivot" toward a different future in order to achieve safety, consider how such a pivot could be achieved by multiple acts across multiple persons and institutions, rather than a single act. Engaging more actors in the process is more costly in terms of coordination, but in the end may be a more practicable social process involving less extreme risk-taking than a single "pivotal act".

Preceded by: “Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

[This post is also available on the EA Forum.]

In the preceding post, I argued for the negative consequences of the intention to carry out a pivotal act, i.e., a single, large world-changing act sufficient to 'pivot' humanity off of a dangerous path onto a safer one. In short, there are negative side effects of being the sort of institution aiming or willing to carry out a pivotal act, and those negative side effects alone might outweigh the benefit of the act, or prevent the act from even happening.

In this post, I argue that it's still a good idea for humanity-as-a-whole to make a large / pivotal change in its developmental trajectory in order to become safer. In other words, my main concern is not with the "pivot", but with trying to get the whole "pivot" from a single "act", i.e., from a single agent-like entity, such as a single human person, institution, or AI system.

Pivotal outcomes and processes

To contrast with pivotal acts, here's a simplified example of a pivotal outcome that one could imagine making a big positive difference to humanity's future, which in principle could be brought about by a multiplicity of actors:

(the "AI immune system") The whole internet — including space satellites and the internet-of-things — becomes way more secure, and includes a distributed network of non-nuclear electromagnetic pulse emitters that will physically shut down any tech infrastructure appearing to be running rogue AI agents.

(For now, let's set aside debate about whether this outcome on its own would be pivotal, in the sense of pivoting humanity onto a safe developmental trajectory... it needs a lot more details and improvements to be adequate for that! My goal in this post is to focus on how the outcome comes about. So for the sake of argument I'm asking to take the "pivotality" of the outcome for granted.)

If a single institution imposed the construction of such an AI immune system on its own, that would constitute a pivotal act. But if a distributed network of several states and companies separately instituted different parts of the change — say, designing and building the EMP emitters, installing them in various jurisdictions, etc. — then I'd call that a pivotal distributed process, or pivotal process for short.

In summary, a pivotal outcome can be achieved through a pivotal (distributed) process without a single pivotal act being carried out by any one institution. Of course, the "can" there is very difficult, and involves solving a ton of coordination problems that I'm not saying humanity will succeed in solving. However, aiming for a pivotal outcome via a pivotal distributed process definitely seems safer to me, in terms of the dynamics it would create between labs and militaries, compared to a single lab planning to do it all on their own.

Revisiting the consequences of pivotal act intentions

In AGI Ruin, Eliezer writes the following, I believe correctly:

The reason why nobody in this community has successfully named a 'pivotal weak act' where you do something weak enough with an AGI to be passively safe, but powerful enough to prevent any other AGI from destroying the world a year later - and yet also we can't just go do that right now and need to wait on AI - is that nothing like that exists. There's no reason why it should exist. There is not some elaborate clever reason why it exists but nobody can see it. It takes a lot of power to do something to the current world that prevents any other AGI from coming into existence; nothing which can do that is passively safe in virtue of its weakness.

I think the above realization is important. The un-safety of trying to get a single locus of action to bring about a pivotal outcome all on its own is important, and it pretty much covers my rationale for why we (humanity) shouldn't advocate for unilateral actors doing that sort of thing.

Less convincingly-to-me, Eliezer then goes on to (seemingly) advocate for using AI to carry out a pivotal act, which he acknowledges would be quite a forceful intervention on the world:

If you can't solve the problem right now (which you can't, because you're opposed to other actors who don't want [it] to be solved and those actors are on roughly the same level as you) then you are resorting to some cognitive system that can do things you could not figure out how to do yourself, that you were not close to figuring out because you are not close to being able to, for example, burn all GPUs. Burning all GPUs would actually stop Facebook AI Research from destroying the world six months later; weaksauce Overton-abiding stuff about 'improving public epistemology by setting GPT-4 loose on Twitter to provide scientifically literate arguments about everything' will be cool but will not actually prevent Facebook AI Research from destroying the world six months later, or some eager open-source collaborative from destroying the world a year later if you manage to stop FAIR specifically. There are no pivotal weak acts.

I'm not entirely sure if the above is meant to advocate for AGI development teams planning to use their future AGI to burn other people's GPUs, but it could certainly be read that way, and my counterargument to that reading has already been written, in “Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments. Basically, a lab X with the intention to burn all the world's GPUs will create a lot of fear that lab X is going to do something drastic that ends up destroying the world by mistake, which in particular drives up the fear and desperation of other AI labs to "get there first" to pull off their own version of a pivotal act. Plus, it requires populating the AGI lab with people willing to do some pretty drastically invasive things to other companies, in particular violating private property laws and state boundaries. From the perspective of a tech CEO, it's quite unnerving to employ and empower AGI developers who are willing to do that sort of thing. You'd have to wonder if they're going to slip out with a thumb drive to try deploying an AGI against you, because they have their own notion of the greater good that they're willing to violate your boundaries to achieve.

So, thankfully-according-to-me, no currently-successful AGI labs are oriented on carrying out pivotal acts, at least not all on their own.

Back to pivotal outcomes

Again, my critique of pivotal acts is not meant to imply that humanity has to give up on pivotal outcomes. Granted, it's usually harder to get an outcome through a distributed process spanning many actors, but in the case of a pivotal outcome for humanity, I argue that:

it's safer to aim for a pivotal outcome to be carried out by a distributed process spanning multiple institutions and states, because the process can happen in a piecemeal fashion that doesn't change the whole world at once, and
it's easier as well, because
1. you won't be constantly setting off alarm bells of the form "Those people are going to try to unilaterally change the whole world in a drastic way", and
2. you won't be trying to populate a lab with AGI developers who, in John Wentworth's terms, think like "villains" (source).

I'm not arguing that we (humanity) are going to succeed in achieving a pivotal outcome through a distributed process; only that it's a safer and more practical endeavor than aiming for a single pivotal act from a single institution.




11 comments

Eliezer Yudkowsky

(the "AI immune system") The whole internet — including space satellites and the internet-of-things — becomes way more secure, and includes a distributed network of non-nuclear electromagnetic pulse emitters that will physically shut down any tech infrastructure appearing to be running rogue AI agents.

Define "way more secure". Like, superhuman-at-security AGIs rewrote the systems to be formally unhackable even taking into account hardware vulnerabilities like Rowhammer that violate the logical chip invariants?

Can you talk a bit about the world global dictatorship running the electromagnetic pulse emitters, and how they monitor every computer in the world? What sort of violence do you envision being inflicted on any countries who don't want to submit their computers for monitoring? Is part of the plan to use AI drones to kill any political leaders who oppose this plan, so as to minimize civilian casualties? Who controls these AI drones, are we quite sure this world dictatorship stays friendly to its citizens? A lot of political processes leading to such a thing sound like they could potentially be scary.

I said "burn all GPUs" to be frank about these things being scary. It's easy for things to sound less scary when they're vague and the processes leading up to them are left vague. See also, George Orwell, "Politics and the English Language". We can't evaluate whether you have a less scary proposal until you make a less vague one.

VojtaKovarik

An attempted paraphrase, to hopefully-disentangle some claims:

Eliezer, list of AGI lethalities: pivotal acts are (necessarily?) "outside of the Overton window, or something"[1].

Critch, preceding post: Strategies involving non-Overton elements are not worth it

Critch, this post: there are pivotal outcomes you can achieve via a strategy with no non-Overton elements

Eliezer, this comment: the "AI immune system" example is not an example of a strategy with no non-Overton elements

Possible reading: Critch/the reader/Eliezer currently wouldn't be able to name a strategy towards a pivotal outcome, with no non-Overton elements

Extreme version of this: Any practical-in-our-world strategy towards a pivotal outcome necessarily contains some non-Overton elements

[1] Substitute your better characterization of the undesirable property here. I will just use "non-Overton" for the purposes of this comment.

Rob Bensinger

An example of a possible "pivotal act" I like that isn't "melt all GPUs" is:

Use AGI to build fast-running high-fidelity human whole-brain emulations. Then run thousands of very-fast-thinking copies of your best thinkers. Seems to me this plausibly makes it realistic to keep tabs on the world's AGI progress, and locally intervene before anything dangerous happens, in a more surgical way rather than via mass property destruction of any sort.

Looking for pivotal acts that are less destructive (and, more importantly for humanity's sake, less difficult to align) than "melt all GPUs" seems like a worthy endeavor to me. But I prefer the framing 'let's discuss the larger space of pivotal acts, brainstorm new ideas, and try to find options that are easier to achieve, because that particular toy proposal seems suboptimally dangerous and there just hasn't been very much serious analysis and debate about pathways'. In the course of that search, if it then turns out that the most likely-to-succeed option is a process, then we should obviously go with a process.

But I don't like constraining that search to 'processes only, not acts', because:

(a) I'm guessing something more local, discrete, and act-like will be necessary, even if it's less extreme than "melt all GPUs";
(b) insofar as I'm uncertain about which paths will be viable and think the problem is already extremely hard and extremely constrained, I don't want to further narrow the space of options that humanity can consider and reason through;
(c) I worry that the "processes" framing will encourage more Rube-Goldberg-machine-like proposals, where the many added steps and layers and actors obscure the core world-saving cognition and action, making it harder to spot flaws and compare tradeoffs;
and (d) I worry that the extra steps, layers, and actors will encourage "design by committee" and slow-downs that doom otherwise-promising projects.

I suspect we also have different intuitions about pivotal acts because we have different high-level pictures of the world's situation.

I think that humanity as it exists today is very far off from thinking like a serious civilization would about these issues. As a consequence, our current trajectory has a negligible chance of producing good long-run outcomes. Rather than trying to slightly nudge the status quo toward marginally better thinking, we have more hope if we adopt a heuristic like speak candidly and realistically about things, as though we lived on the Earth that does take these issues seriously, and hope that this seriousness and sanity might be infectious.

On my model, we don't have much hope if we continue to half-say-the-truth, and continue to make small steady marginal gains, and continue to talk around the hard parts of the problem; but we do have the potential within us to just drop the act and start fully sharing our models and being real with each other, including being real about the parts where there will be harsh disagreements.

I think that a large part of the reason humanity is currently endangering itself is that everyone is too focused on 'what's in the Overton window?', and is too much trying to finesse each other's models and attitudes, rather than blurting out their actual views and accepting the consequences.

This makes the situation I described in The inordinately slow spread of good AGI conversations in ML much stickier: very little of the high-quality / informed public discussion of AGI is candid and honest, and people notice this, so updating and epistemic convergence is a lot harder; and everyone is dissembling in the same direction, toward 'be more normal', 'treat AGI more like business-as-usual', 'pretend that the future is more like the past'.

All of this would make me less eager to lean into proposals like "yes, let's rush into establishing a norm that large parts of the strategy space are villainous and not to be talked about" even if I agreed that pivotal processes are a better path to long-run good outcomes than pivotal acts. This is inviting even more of the central problem with current discourse, which is that people don't feel comfortable even talking about their actual views.

You may not think that a pivotal act is necessary, but there are many who disagree with you. Of those, I would guess that most aren't currently willing to discuss their thoughts, out of fear that the resultant discussion will toss norms of scholarly discussion out the window. This seems bad to me, and not like the right direction for a civilization to move into if it's trying to emulate 'the kind of civilization that handles AGI successfully'. I would rather a world where humanity's best and brightest were debating this seriously, doing scenario analysis, assigning probabilities and considering specific mainline and fallback plans, etc., over one where we prejudge 'discrete pivotal acts definitely won't be necessary' and decide at the outset to roll over and die if it does turn out that pivotal acts are necessary.

My alternative proposal would be: Let's do scholarship at the problem, discuss it seriously, and not let this topic be ruled by 'what is the optimal social-media soundbite?'.

If the best idea sounds bad in soundbite form, then let's have non-soundbite-length conversations about it. It's an important enough topic, and a complex enough one, that this would IMO be a no-brainer in a world well-equipped to handle developments like AGI.

it's safer to aim for a pivotal outcome to be carried out by a distributed process spanning multiple institutions and states, because the process can happen in a piecemeal fashion that doesn't change the whole world at once

We should distinguish "safer" in the sense of "less likely to cause a bad outcome" from "safer" in the sense of "less likely to be followed by a bad outcome".

E.g., the FDA banning COVID-19 testing in the US in the early days of the pandemic was "safer" in the narrow sense that they legitimately reduced the risk that COVID-19 tests would cause harm. But the absence of testing resulted in much more harm, and was "unsafe" in that sense.

Similarly: I'm mildly skeptical that humanity refusing to attempt any pivotal acts makes us safer from the particular projects that enact this norm. But I'm much more skeptical that humanity refusing to attempt any pivotal acts makes us safer from harm in general. These two versions of "safer" need to be distinguished and argued for separately.

Any proposal that adds red tape, inefficiencies, slow-downs, process failures, etc. will make AGI projects "safer" in the first sense, inasmuch as it cripples the project or slows it down to the point of irrelevance.

As someone who worries that timelines are probably way too short for us to solve enough of the "pre-AGI alignment prerequisites" to have a shot at aligned AGI, I'm a big fan of sane, non-adversarial ideas that slow down the field's AGI progress today.

But from my perspective, the situation is completely reversed when you're talking about slowing down a particular project's progress when they're actually building, aligning, and deploying their AGI.

At some point, a group will figure out how to build AGI. When that happens, I expect an AGI system to destroy the world within just a few years, if no pivotal act or process finishes occurring first. And I expect safety-conscious projects to be at a major speed disadvantage relative to less safety-conscious projects.

Adding any unnecessary steps to the process—anything that further slows down the most safety-conscious groups—seems like suicide to me, insofar as it either increases the probability that the project fails to produce a pivotal outcome in time, or increases the probability that the project cuts more corners on safety because it knows that it has that much less time.

I obviously don't want the first AGI projects to rush into a half-baked plan and destroy the world. First and foremost, do not destroy the world by your own hands, or commit the fallacy of "something must be done, and this is something!".

But I feel more worried about AGI projects insofar as they don't have a lot of time to carefully align their systems (so I'm extremely reluctant to tack on any extra hurdles that might slow them down and that aren't crucial for alignment), and also more worried insofar as they haven't carefully thought about stuff like this in advance. (Because I think a pivotal act is very likely to be necessary, and I think disaster is a lot more likely if people don't feel like they can talk candidly about it, and doubly so if they're rushing into a plan like this at the last minute rather than having spent decades prior carefully thinking about and discussing it.)

Jan_Kulveit

In my view, in practice, the pivotal acts framing actually pushes people to consider a more narrow space of discrete powerful actions, "sharp turns", "events that have a game-changing impact on astronomical stakes".

As I understand it, the definition of "pivotal acts" explicitly forbids considering things like "this process would make 20% per year of AI developers actually take safety seriously with 80% chance" or "what class of small shifts would in aggregate move the equilibrium?". (Where things in this category get straw-manned as "Rube-Goldberg-machine-like")

As often, one of the actual cruxes is in continuity assumptions, where basically you have a low prior on "smooth trajectory changes by many acts" and high prior on "sharp turns left or right".

Second crux, as you note, is doom-by-default probability: if you have a very high doom probability, you may be in favour of variance-increasing acts, whereas people who are a few bits more optimistic may be much less excited about them, in particular if all the plans for such acts they know of have very unclear shapes of impact distributions.

Given these deep prior differences, it seems reasonable to assume this discussion will lead nowhere in particular. (I've a draft with a more explicit argument why.)

Rob Bensinger

In my view, in practice, the pivotal acts framing actually pushes people to consider a more narrow space of discrete powerful actions, "sharp turns", "events that have a game-changing impact on astronomical stakes".

My objection to Critch's post wasn't 'you shouldn't talk about pivotal processes, just pivotal acts'. On the contrary, I think bringing in pivotal processes is awesome.

My objection (more so to "Pivotal Act" Intentions, but also to the new one) is specifically to the idea that we should socially shun the concept of "pivotal acts", and socially shun people who say they think humanity needs to execute a pivotal act, or people who say positive things about some subset of pivotal acts.

This seems unwise to me, because it amounts to giving up on humanity's future in the worlds where it turns out humanity does need to execute a pivotal act. Suppose you have this combination of beliefs:

1. Humanity probably won't need to execute any pivotal acts in order to avoid existential catastrophe.
2. ... But there's a non-tiny chance (e.g., 10%) that at least one pivotal act will in fact be necessary.
3. A decent number of people currently misunderstand the idea of "pivotal acts" as evil/adversarial/"villainous", in spite of the fact that there's a decent chance humanity will need someone to commit this "villainy" in order to prevent the death of every human on Earth.

I personally think that a large majority of humanity's hope lies in someone executing a pivotal act. But I assume Critch disagrees with this, and holds a view closer to 1+2+3.

If so, then I think he shouldn't go "well, pivotal acts sound weird and carry some additional moral hazards, so I will hereby push for pivotal acts to become more stigmatized and hard to talk about, in order to slightly increase our odds of winning in the worlds where pivotal acts are unnecessary".

Rather, I think hypothetical-Critch should promote the idea of pivotal processes, and try to reduce any existing stigma around the idea of pivotal acts, so that humanity is better positioned to evade destruction if we do end up needing to do a pivotal act. We should try to set ourselves up to win in more worlds.

(Where things in this category get straw-manned as "Rube-Goldberg-machine-like")

If you're referring to my comment, then this is itself straw-manning me!

Rube-Goldberg-ishness is a matter of degree: as you increase the complexity of a plan, it becomes harder to analyze, and tends to accumulate points of failure that reduce the probability of success. This obviously doesn't mean we should pick the simplest possible plan with no consideration for anything else; but it's a cost to keep in mind, like any other.
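As a rough illustration of that cost, assume (a simplification not made in the comment) that a plan has n steps and that step i succeeds independently with probability p_i. The plan's overall success probability is then the product of the step probabilities, so even reliable steps compound:

```latex
P(\text{success}) = \prod_{i=1}^{n} p_i,
\qquad \text{e.g. } 0.95^{3} \approx 0.86, \quad 0.95^{10} \approx 0.60
```

Real plan steps are rarely independent, so this sketch only shows the direction of the effect, not its size.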

I mentioned this as a quantitative cost to keep in mind; "things in this category get straw-manned as 'Rube-Goldberg-machine-like'" seems to either be missing the fact that this is a real cost, or treating me as making some stronger and more specific claim.

As often, one of the actual cruxes is in continuity assumptions, where basically you have a low prior on "smooth trajectory changes by many acts" and high prior on "sharp turns left or right".

This seems wrong to me, in multiple respects:

Continuity assumptions are about what's likely to happen, not about what's desirable. It would be a separate assumption to say "continuity is always good", and I worry that a reasoning error is occurring if this is being conflated with "continuity tends to occur".

Why this matters here: My claim is that pivotal acts are likely to be necessary for good outcomes, not that they're necessarily likely to occur. If your choices are "execute a pivotal act, or die", then insofar as you're confident this is the case, the base rate of continuous events just isn't relevant.
The primary argument for hard takeoff isn't "stuff tends to be discontinuous"; it's "AGI is a powerful invention, and e.g. GPT-3 isn't a baby AGI". The discontinuity of hard takeoff is not a primitive; it's an implication of the claim that AGI is different from current AI tech, that it contains a package of qualitatively new kinds of cognition that aren't just 'what GPT-3 is currently doing, but scaled up'.

No one claims that AlphaGo needs to be continuous with theorem-proving AI systems, or that a washing machine needs to be continuous with a chariot. The core disagreement here is about whether X and Y are the same kind of thing, not about whether incremental tweaks to a given kind of thing tend to produce small improvements.

I think you should be more of a fox with respect to continuity, and less of a hedgehog. The reason hard takeoff is very likely true isn't some grand, universal Discontinuity Narrative. It's just that different things work differently. Sometimes you get continuities; sometimes you don't. To figure out which is which, you need to actually analyze the specific phenomenon under discussion, not just consult the universal cosmic base rate of continuity.

(And indeed, I think Paul is doing a lot more 'analyze the specific phenomenon under discussion' than you seem to give him credit for. I think it's straw-manning Paul and Eliezer to reduce their disagreement to a flat 'we have different priors about how many random things tend to be continuous'.)

Second crux, as you note, is doom-by-default probability: if you have a very high doom probability, you may be in favour of variance-increasing acts

I agree with this in general, but I think this is a wrong lens for thinking about pivotal acts. On my model, a pivotal act isn't a hail mary that you attempt because you want to re-roll the dice; it's more like a very specific key that is needed in order to open a very specific lock. Achieving good outcomes is a very constrained problem, and you need to do a lot of specific things in order to make things go well.

We may disagree about variance-increasing tactics in other domains, but our disagreement about pivotal acts is about whether some subset of the specific class of keys called 'pivotal acts' is necessary and/or sufficient to open the lock.

Given these deep prior differences, it seems reasonable to assume this discussion will lead nowhere in particular. (I've a draft with a more explicit argument why.)

I'm feeling much more optimistic than you about trying to resolve these points, in part because I feel that you've misunderstood almost every aspect of my view and of my comment above! If you're that far from passing my ITT, then there's a lot more hope that we may converge in the course of incrementally changing that.

(Or non-incrementally changing that. Sometimes non-continuous things do happen! 'Gaining understanding of a topic' being a classic example of a domain with many discontinuities.)

Jan_Kulveit

With the last point: I think I can roughly pass your ITT - we can try that, if you are interested.

So, here is what I believe your beliefs are:

With pretty high confidence, you expect a sharp left turn to happen (in almost all trajectories)
This is to a large extent based on the belief that at some point "systems start to work really well in domains really far beyond the environments of their training", which is roughly the same as "discovering a core of generality" and a few other formulations. These systems will be in some meaningful sense fundamentally different from e.g. Gato
From your perspective, this is based on thinking deeply about the nature of such systems (note that this is mostly based on hypothetical systems, and an analogy with evolution)
My claim, roughly, is that this is only part of what's going on, and that the actual thing is: people start with a deep prior on "continuity in the space of intelligent systems". Looking into a specific question about hypothetical systems, their search in argument space is guided by this prior, and they end up mostly sampling arguments supporting their prior. (This is not to say the arguments are wrong.)
You probably don't agree with the above point, but notice the correlations:
- You expect a sharp left turn due to discontinuity in the "architectures" dimension (which is the crux according to you)
- But you also expect jumps in capabilities of individual systems (at least I think so)
- Also, you expect majority of hope in a "sharp right turn" histories (in contrast to smooth right turn histories)
- And more
In my view, your (or rather MIRI-esque) views on the above dimensions are correlated more than expected, which suggests the existence of a hidden variable/hidden model explaining the correlation.

I personally think that a large majority of humanity's hope lies in someone executing a pivotal act. But I assume Critch disagrees with this, and holds a view closer to 1+2+3.
If so, then I think he shouldn't go "well, pivotal acts sound weird and carry some additional moral hazards, so I will hereby push for pivotal acts to become more stigmatized and hard to talk about, in order to slightly increase our odds of winning in the worlds where pivotal acts are unnecessary".
Rather, I think hypothetical-Critch should promote the idea of pivotal processes, and try to reduce any existing stigma around the idea of pivotal acts, so that humanity is better positioned to evade destruction if we do end up needing to do a pivotal act. We should try to set ourselves up to win in more worlds.

Can't speak for Critch, but my view is that pivotal acts planned as pivotal acts, in the way most people in the LW community think about them, have only a very small chance of being the solution (my guess is one or two bits more extreme, more like 2-5% than 10%).
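Reading "bits" here as doublings of the odds against (an interpretation assumed for illustration, not spelled out in the comment), the quoted 2-5% range follows directly from the 10% baseline:

```latex
\begin{aligned}
p = 0.10 &\;\Rightarrow\; \text{odds} = 1:9 \\
\text{one bit more extreme: } 1:18 &\;\Rightarrow\; p = \tfrac{1}{19} \approx 5.3\% \\
\text{two bits more extreme: } 1:36 &\;\Rightarrow\; p = \tfrac{1}{37} \approx 2.7\%
\end{aligned}
```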

I'm not sure if I agree with you re: the stigma. My impression is that while the broader world doesn't think in terms of pivotal acts, if it paid more attention, yes, many proposals would be viewed with suspicion. On the other hand, I think on LW it's the opposite: many people share the orthodox views about sharp turns, pivotal acts, etc., and proposals to steer the situation more gently are viewed as unworkable or engaging in thinking with "too optimistic assumptions" etc.

Note that I advocate for considering much weirder solutions, and also thinking about much weirder world states, when talking with the "general world". In contrast, on LW and AF, I'd like to see more discussion of various "boring" solutions on which the world can roughly agree.

Continuity assumptions are about what's likely to happen, not about what's desirable. It would be a separate assumption to say "continuity is always good", and I worry that a reasoning error is occurring if this is being conflated with "continuity tends to occur".

Basically, no. Continuity assumptions are about what the space looks like. Obviously forecasting questions ("what's likely to happen") often depend on ideas about what the space looks like.

My claim is that pivotal acts are likely to be necessary for good outcomes, not that they're necessarily likely to occur. If your choices are "execute a pivotal act, or die", then insofar as you're confident this is the case, the base rate of continuous events just isn't relevant.

Yes, but your other claim is that a "sharp left turn" is likely and leads to bad outcomes. So if we partition the space of outcomes into good and bad, in both branches you assume the outcome very likely comes about through a sharp turn.

The primary argument for hard takeoff isn't "stuff tends to be discontinuous"; it's "AGI is a powerful invention, and e.g. GPT-3 isn't a baby AGI". The discontinuity of hard takeoff is not a primitive; it's an implication of the claim that AGI is different from current AI tech, that it contains a package of qualitatively new kinds of cognition that aren't just 'what GPT-3 is currently doing, but scaled up'.

This is maybe becoming repetitive, but I'll try to paraphrase again. Consider the possibility that the "continuity assumptions" I'm talking about are grounded not in "takeoff scenarios", but in "how you think about hypothetical points in the abstract space of intelligent systems".

Thinking about features of this highly abstract space, in regions which don't exist yet, is epistemically tricky (I hope we can at least agree on that).

It probably seems to you that you have many strong arguments giving you reliable insights about how the space works somewhere around "AGI".

My claim is: "Yes, but the process which generated the arguments is based on a black-box neural net, which has a strong prior on things like 'stuff like math is discontinuous'." (I suspect this "taste and intuition" box is located more in Eliezer's mind, and some other people updated "on the strength of arguments".) This isn't to imply that various people haven't done a lot of thinking and generated a lot of arguments and intuitions about this. Unfortunately, given other epistemic constraints, in my view the "taste and intuition" differences sort of "propagate" into "conclusion" differences.

Rob Bensinger

With pretty high confidence, you expect a sharp left turn to happen (in almost all trajectories).
This is to a large extent based on the belief that at some point "systems start to work really well in domains really far beyond the environments of their training", which is roughly the same as "discovering a core of generality" and a few other formulations. These systems will be in some meaningful sense fundamentally different from e.g. Gato.

That's right, though the phrasing "discovering a core of generality" here sounds sort of mystical and mysterious to me, which makes me wonder whether you can see the perspective from which this is a very obvious and normal belief. I get a similar vibe when people talk about a "secret sauce" and say they can't understand why MIRI thinks there might be a secret sauce—treating generalizability as a sort of occult property.

The way I would phrase it is in very plain, concrete terms:

If a machine can multiply two-digit numbers together as well as four-digit numbers together, then it can probably multiply three-digit numbers together. The structure of these problems is similar enough that it's easier to build a generalist that can handle 'multiplication' than to solve two-digit and four-digit multiplication using fundamentally different techniques.
Similarly, it's easier to teach a human or AI how to navigate physical environments in general, than to teach them how to navigate all physical environments except parking garages. Parking garages aren't different enough from other physical environments, and the techniques for modeling and navigating physical spaces work too well, when they work at all.
Similarly, it's easier to build an AI that is an excellent physicist and has the potential to be a passable or great chemist and/or biologist, than to build an excellent physicist that just can't do chemistry or biology, no matter how many chemistry experiments or chemistry textbooks it sees. The problems have too much overlap.

We can see that the latter is true just by reflecting on what kinds of mental operations go into generating hypotheses about ontologies/carvings on the world, generating hypotheses about the state of the world given some ontology, fitting hypotheses about different levels/scales into a single cohesive world-model, calculating value of information, strategically directing attention toward more fruitful directions of thought, coming up with experiments, thinking about possible experimental outcomes, noticing anomalies, deducing implications and logical relationships, coming up with new heuristics and trying them out, etc. These clearly overlap enormously across the relevant domains.

We can also observe that this is in fact what happened with humans. We have zero special-purpose brain machinery for any science, or indeed for science as a category; we just evolved to be able to model physical environments well, and this generalized to all sciences once it generalized to any.

For things to not go this way would be quite weird.

From your perspective, this is based on thinking deeply about the nature of such systems (note that this is mostly based on hypothetical systems, and an analogy with evolution).

Doesn't seem to pass my ITT. Like, it's true in a sense that I'm 'thinking about hypothetical systems', because I only care about human cognition inasmuch as it seems likely to generalize to AGI cognition. But this still seems like it's treating generality as a mysterious occult property, and not as something coextensive with all our observations of general intelligences.

My claim, roughly, is that this is only part of what's going on, and the actual thing is: people start with a deep prior on "continuity in the space of intelligent systems". Looking into a specific question about hypothetical systems, their search in argument space is guided by this prior, and they end up mostly sampling arguments supporting their prior. (This is not to say the arguments are wrong.)

Seems to me that my core intuition is about there being common structure shared between physics research, biology research, chemistry research, etc.; plus the simple observation that humans don't have specialized evolved modules for chemistry vs physics vs biology. Discontinuity is an implication of those views, not a generator of those views.

Like, sure, if I had a really incredibly strong prior in favor of continuity, then maybe I would try really hard to do a mental search for reasons not to accept those prima facie sources of discontinuity. And since I don't have a super strong prior like that, I guess you could call my absence of a super-continuity assumption a 'discontinuity assumption'.

But it seems like a weird and unnatural way of trying to make sense of my reasoning: I don't have an extremely strong prior that everything must be continuous, but I also don't have an extremely strong prior that everything must be spherical, or that everything must be purple. I'm not arriving at any particular conclusions via a generator that keeps saying 'not everything is spherical!' or 'not everything is purple!'; I'm not a non-sphere-ist or an anti-purple-ist; the deep secret heart and generator for all my views is not that I have a deep and abiding faith in "there exist non-spheres". And putting me in a room with some weird person who does think everything is a sphere doesn't change any of that.

You probably don't agree with the above point, but notice the correlations:
- You expect a sharp left turn due to discontinuity in the "architectures" dimension (which is the crux according to you)
- But you also expect jumps in capabilities of individual systems (at least I think so)
- Also, you place the majority of hope in "sharp right turn" histories (in contrast to smooth right turn histories)

I would say that there are two relevant sources of discontinuity here:

1. AGI is an invention, and inventions happen at particular times. This inherently involves a 0-to-1 transition when the system goes from 'not working' to 'working'. Paul and I believe equally in discontinuities like this, though we may disagree about whether AGI has already been 'invented' (such that we just need to iterate and improve on it), vs. whether the invention lies in the future.
2. General intelligence is powerful and widely applicable. This is another category of discontinuity Paul believes can happen (e.g., washing machines are allowed to have capabilities that non-washing-machines lack; nukes are allowed to have capabilities that non-nukes lack), though Paul may be somewhat less impressed than me with general intelligence overall (resulting in a smaller gap/discontinuity). Separately, Paul's belief in AGI development predictability, AI research efficiency, and 'AGI is already solved' (see 1, above), each serve to reduce the importance of this discontinuity.

'AGI is an invention' and 'General intelligence is powerful' aren't weird enough beliefs, I think, to call for some special explanation like 'Rob B thinks the world is very discontinuous'. Those are obvious first-pass beliefs to have about the domain, regardless of whether they shake out as correct on further analysis.

'We need a pivotal act' is a consequence of 1 and 2, not a separate discontinuity. If AGI is a sudden huge dangerous deal (because 1 and 2 are true), then we'll need to act fast or we'll die, and there are viable paths to quickly ending the acute risk period. The discontinuity in the one case implies the discontinuity in this new case. There's no need for a further explanation.

Rob Bensinger

Note that when talking with the "general world", I advocate for considering much weirder solutions and thinking about much weirder world states. In contrast, on LW and AF, I'd like to see more discussion of various "boring" solutions on which the world can roughly agree.

Can I get us all to agree to push for including pivotal acts and pivotal processes in the Overton window, then? :) I'm happy to publicly talk about pivotal processes and encourage people to take them seriously as options to evaluate, while flagging that I'm ~2-5% on them being how the future is saved, if it's saved. But I'll feel more hopeful about this saving the future if you, Critch, etc. are simultaneously publicly talking about pivotal acts and encouraging people to take them seriously as options to evaluate, while flagging that you're ~2-5% on them being how the future is saved.

Joe Collman

This still seems to somewhat miss the point (as I pointed out last time):
Conditional on org X having an aligned / corrigible AGI, we should expect:

1. If the AGI is an aligned sovereign, it'll do the pivotal act (PA) unilaterally if that's best, and do it in distributed fashion if that's best (according to whatever it's aligned to).
2. If the AGI is more like a corrigible tool, we should expect X to ask 'their' AGI what would be best to do (or equivalent), and we're pretty much back to case 1.

The question isn't what the humans in X would do, but what the [AGI + humans] would do, given that the humans have access to that AGI.

If org X is initially pro-unilateral-PAs, then we should expect an aligned AGI to talk them out of it if it's not best.
If org X is initially anti-unilateral-PAs, then we should expect an aligned AGI to talk them into it if it is best.

X will only be favouring/disfavouring PAs for instrumental reasons - and we should expect the AGI to correct them as appropriate.

For these reasons, I'd expect the initial attitude of org X to be largely irrelevant.
Since this is predictable, I don't expect it to impact race dynamics: what will matter is whether the unilateral PA seems more/less likely to succeed than the distributed approach to the AGI.

Jan_Kulveit

I think you are missing the possibility that the outcomes of the pivotal process could be:
- no one builds autonomous AGI
- autonomous AGI is built only in post-pivotal outcome states, where the condition for building it is that alignment has been solved

Joe Collman

Sure, that's true - but in that case the entire argument should be put in terms of:
We can (aim to) implement a pivotal process before a unilateral AGI-assisted pivotal act is possible.

And I imagine the issue there would be all about the feasibility of implementation. I think I'd give a Manhattan Project to solve the technical problem much higher chances than a pivotal process. (Of course people should think about it; I just wouldn't expect them to come up with anything viable.)

Once it's possible, the attitude of the creating org before interacting with their AGI is likely to be irrelevant.

So e.g. this just seems silly to me:

So, thankfully-according-to-me, no currently-successful AGI labs are oriented on carrying out pivotal acts, at least not all on their own.

They won't be on their own: they'll have an AGI to set them straight on what will/won't work.


Eliezer Yudkowsky

(the "AI immune system") The whole internet — including space satellites and the internet-of-things — becomes way more secure, and includes a distributed network of non-nuclear electromagnetic pulse emitters that will physically shut down any tech infrastructure appearing to be running rogue AI agents.

VojtaKovarik

An attempted paraphrase, to hopefully-disentangle some claims:

Eliezer, list of AGI lethalities: pivotal acts are (necessarily?) "outside of the Overton window, or something"[1].

Critch, preceding post: Strategies involving non-Overton elements are not worth it.

Critch, this post: there are pivotal outcomes you can reach via a strategy with no non-Overton elements.

Eliezer, this comment: the "AI immune system" example is not an example of a strategy with no non-Overton elements.

Possible reading: Critch/the reader/Eliezer currently wouldn't be able to name a strategy towards a pivotal outcome with no non-Overton elements.

Extreme version of this: Any practical-in-our-world strategy towards a pivotal outcome necessarily contains some non-Overton elements.

[1] Substitute your better characterization of the undesirable property here. I will just use "non-Overton" for the purposes of this comment.

Rob Bensinger

An example of a possible "pivotal act" I like that isn't "melt all GPUs" is:

Use AGI to build fast-running high-fidelity human whole-brain emulations. Then run thousands of very-fast-thinking copies of your best thinkers. Seems to me this plausibly makes it realistic to keep tabs on the world's AGI progress, and locally intervene before anything dangerous happens, in a more surgical way rather than via mass property destruction of any sort.

But I don't like constraining that search to 'processes only, not acts', because:

(a) I'm guessing something more local, discrete, and act-like will be necessary, even if it's less extreme than "melt all GPUs";
(b) insofar as I'm uncertain about which paths will be viable and think the problem is already extremely hard and extremely constrained, I don't want to further narrow the space of options that humanity can consider and reason through;
(c) I worry that the "processes" framing will encourage more Rube-Goldberg-machine-like proposals, where the many added steps and layers and actors obscure the core world-saving cognition and action, making it harder to spot flaws and compare tradeoffs;
and (d) I worry that the extra steps, layers, and actors will encourage "design by committee" and slow-downs that doom otherwise-promising projects.

I suspect we also have different intuitions about pivotal acts because we have different high-level pictures of the world's situation.

My alternative proposal would be: Let's do scholarship at the problem, discuss it seriously, and not let this topic be ruled by 'what is the optimal social-media soundbite?'.

it's safer to aim for a pivotal outcome to be carried out by a distributed process spanning multiple institutions and states, because the process can happen in a piecemeal fashion that doesn't change the whole world at once

We should distinguish "safer" in the sense of "less likely to cause a bad outcome" from "safer" in the sense of "less likely to be followed by a bad outcome".

Jan_Kulveit

Rob Bensinger

In my view, in practice, the pivotal acts framing actually pushes people to consider a more narrow space of discrete powerful actions, "sharp turns", "events that have a game-changing impact on astronomical stakes".

My objection to Critch's post wasn't 'you shouldn't talk about pivotal processes, just pivotal acts'. On the contrary, I think bringing in pivotal processes is awesome.

1. Humanity probably won't need to execute any pivotal acts in order to avoid existential catastrophe.
2. ... But there's a non-tiny chance (e.g., 10%) that at least one pivotal act will in fact be necessary.
3. A decent number of people currently misunderstand the idea of "pivotal acts" as evil/adversarial/"villainous", in spite of the fact that there's a decent chance humanity will need someone to commit this "villainy" in order to prevent the death of every human on Earth.

I personally think that a large majority of humanity's hope lies in someone executing a pivotal act. But I assume Critch disagrees with this, and holds a view closer to 1+2+3.

(Where things in this category get straw-manned as "Rube-Goldberg-machine-like")

If you're referring to my comment, then this is itself straw-manning me!

As often, one of the actual cruxes is in continuity assumptions, where basically you have a low prior on "smooth trajectory changes by many acts" and high prior on "sharp turns left or right".

This seems wrong to me, in multiple respects:

Continuity assumptions are about what's likely to happen, not about what's desirable. It would be a separate assumption to say "continuity is always good", and I worry that a reasoning error is occurring if this is being conflated with "continuity tends to occur".

Why this matters here: My claim is that pivotal acts are likely to be necessary for good outcomes, not that they're necessarily likely to occur. If your choices are "execute a pivotal act, or die", then insofar as you're confident this is the case, the base rate of continuous events just isn't relevant.
The primary argument for hard takeoff isn't "stuff tends to be discontinuous"; it's "AGI is a powerful invention, and e.g. GPT-3 isn't a baby AGI". The discontinuity of hard takeoff is not a primitive; it's an implication of the claim that AGI is different from current AI tech, that it contains a package of qualitatively new kinds of cognition that aren't just 'what GPT-3 is currently doing, but scaled up'.

No one claims that AlphaGo needs to be continuous with theorem-proving AI systems, or that a washing machine needs to be continuous with a chariot. The core disagreement here is about whether X and Y are the same kind of thing, not about whether incremental tweaks to a given kind of thing tend to produce small improvements.

Second crux, as you note, is doom-by-default probability: if you have a very high doom probability, you may be in favour of variance-increasing acts

Given these deep prior differences, it seems reasonable to assume this discussion will lead nowhere in particular. (I have a draft with a more explicit argument why.)

(Or non-incrementally changing that. Sometimes non-continuous things do happen! 'Gaining understanding of a topic' being a classic example of a domain with many discontinuities.)

Jan_Kulveit

On the last point: I think I can roughly pass your ITT; we can try that, if you are interested.

So, here is what I believe your beliefs are:

With pretty high confidence, you expect a sharp left turn to happen (in almost all trajectories).
This is to a large extent based on the belief that at some point "systems start to work really well in domains really far beyond the environments of their training", which is roughly the same as "discovering a core of generality" and a few other formulations. These systems will be in some meaningful sense fundamentally different from e.g. Gato.
From your perspective, this is based on thinking deeply about the nature of such systems (note that this is mostly based on hypothetical systems, and an analogy with evolution).
My claim, roughly, is that this is only part of what's going on, and the actual thing is: people start with a deep prior on "continuity in the space of intelligent systems". Looking into a specific question about hypothetical systems, their search in argument space is guided by this prior, and they end up mostly sampling arguments supporting their prior. (This is not to say the arguments are wrong.)
You probably don't agree with the above point, but notice the correlations:
- You expect a sharp left turn due to discontinuity in the "architectures" dimension (which is the crux according to you)
- But you also expect jumps in capabilities of individual systems (at least I think so)
- Also, you place the majority of hope in "sharp right turn" histories (in contrast to smooth right turn histories)
- And more
In my view, your (or rather MIRI-esque) views on the above dimensions are more correlated than expected, which suggests the existence of a hidden variable/hidden model explaining the correlation.

I personally think that a large majority of humanity's hope lies in someone executing a pivotal act. But I assume Critch disagrees with this, and holds a view closer to 1+2+3.
If so, then I think he shouldn't go "well, pivotal acts sound weird and carry some additional moral hazards, so I will hereby push for pivotal acts to become more stigmatized and hard to talk about, in order to slightly increase our odds of winning in the worlds where pivotal acts are unnecessary".