Planned summary for the Alignment Newsletter:
This post argues that the debate over takeoff speeds is over a smaller issue than you might otherwise think: people seem to be arguing for either discontinuous progress, or continuous but fast progress. Both camps agree that once AI reaches human-level intelligence, progress will be extremely rapid; the disagreement is primarily about whether there is already quite a lot of progress _before_ that point. As a result, these differences don't constitute a "shift in arguments on AI safety", as some have claimed.
The post also goes through some of the arguments and claims that people have made in the past, which I'm not going to summarize here.
Planned opinion:
While I agree that the debate about takeoff speeds is primarily about the path by which we get to powerful AI systems, that seems like a pretty important question to me with <@many ramifications@>(@Clarifying some key hypotheses in AI alignment@).
Suppose that different tasks require different levels of AI capability to do better than humans.
First AI can do arithmetic, then play chess, then drive cars, etc. Let's also assume that AI is much faster than humans. So imagine that AI research ability was rising from almost nothing to superhuman over the course of a year. A few months in and it's inventing stuff like linear regression: impressive, but not as good as current human work on AI. There are a few months where the AI is worse than a serious team of top researchers, but better than an intern. So if you have a niche use for AI, that use can be automatically automated. The AI research AI designs a widget-building AI. The humans could have made a widget-building AI themselves, but so few widgets are produced that it wasn't worth it.
Then the AI becomes as good as a top human research team and FOOM. How crazy the world gets before FOOM depends on how much other stuff is automated first. Is it easier to make an AI teacher, or an AI AI researcher? Also remember that bureaucratic delays are a thing: there is a difference between having an AI that does medical diagnosis in a lab, and it being used in every hospital.
That's a good point; if a research group develops a more promising approach to AI, recursive self-improvement / capability enhancement might be one of the first things they do, before going for immediate money-making applications, because the programmers know that application area already, and they can just do it internally without going through the rigmarole of marketing, product design, etc. etc.
The objective time taken for progress in AI is more significant than whether that progress is continuous or discontinuous
I don't think I agree? Presence of discontinuities determines which research agendas will or won't work, and so is extremely decision-relevant; in contrast, the objective time taken has very little influence on research agendas.
Objective time might be more important for e.g. AI strategy and policy; I haven't thought about it that much, though my immediate guess is that even there the presence/absence of discontinuities will be a key crux in what sorts of institutions we should aim for.
I am not convinced that the distinction between continuous and discontinuous approaches is a feature of the territory. Zoom in in sufficient detail, and you see the continuous wavefunction of an electron interacting with the continuous wavefunction of a silicon atom. Zoom out to evolutionary timescales, and the jump from hominids with pointy sticks to ASI is almost instant. The mathematical definition of continuity relies on your function being mathematically formal. Is the distance from the Earth to the Moon an even number of Planck lengths? Well, there are a huge number of slightly different measurements you could make; depending on when you measure, exactly which points you measure between, and how you deal with relativistic length contraction, the answer will be different. In a microsecond during which the code that is a fooming AI is doing garbage collection, is AI progress happening? You have identified an empirical variable called AI progress, but whether or not it is continuous depends on exactly how you fill in the details.
Imagine superintelligence has happened and we are discussing the details afterwards. We were in a world basically like this one, and then someone ran some code, which gained total cosmic power within a millisecond. Someone tries to argue that this was a continuous improvement, just a very fast one. What evidence would convince you one way or the other on this?
This post grew out of conversations with several people, including Daniel Kokotajlo, grue_slinky and Linda Lisefors, and is based in large part on a collection of scattered comments and blog posts across LessWrong, along with some podcast interviews - e.g. here. The in-text links near quotes will take you to my sources.
I am attempting to distinguish two possibilities which are often run together: that progress in AI towards AGI (‘takeoff’) will be discontinuous, and that it will be fast but continuous. Resolving this distinction also addresses the claim that there has been a significant shift in arguments for AI presenting an existential risk: from older arguments discussing an ultra-fast intelligence explosion occurring in a single ‘seed AI’ to more moderate scenarios.
I argue that the ‘shift in arguments on AI safety’ is not a total change in basic assumptions (which some observers have claimed) but just a reduction in confidence about a specifically discontinuous takeoff. Finally, I try to explicitly operationalize the practical differences between discontinuous takeoff and fast, continuous takeoff.
Further Reading
Summary: Why AI risk might be solved without additional intervention from Longtermists
Paul Christiano’s original post
MIRI's Thoughts on Discontinuous Takeoff
Misconceptions about continuous takeoff
AI Impacts original post
Soft Takeoff can still lead to Decisive Strategic Advantage
Defining Discontinuous Progress
What do I mean by ‘discontinuous’? If we were to graph world GDP over the last 10,000 years, it would fit a hyperbolic growth pattern. We could call this ‘continuous’ since it is following a single trend, or we could call it ‘discontinuous’ because, on the scale of millennia, the industrial revolution exploded out of nowhere. I will call these sorts of hyperbolic trends ‘continuous, but fast’, in line with Paul Christiano, who argued for continuous takeoff, defining it this way:
I’ll be using Paul’s understanding of ‘discontinuous’ and ‘fast’ here. For progress in AI to be discontinuous, we need a switch to a new growth mode, which will show up as a step function in the capability of AI, or in the rate of change of that capability, over time. For takeoff to be fast, it is enough that there is a single growth mode that is hyperbolic, or follows some other very fast-growing function.
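To make the two definitions concrete, here is a minimal numerical sketch (my own illustration, not from Christiano's post; all the numbers are arbitrary). The first curve is a single hyperbolic trend, so it counts as ‘continuous but fast’; the second has a step change in its growth rate, which is what I am calling discontinuous.

```python
import numpy as np

t = np.linspace(0, 10, 1001)
dt = t[1] - t[0]

# 'Continuous but fast': one smooth hyperbolic trend, no switch in growth mode.
t_singularity = 10.5
continuous_fast = 1.0 / (t_singularity - t)

# 'Discontinuous': a step change in the growth rate at t_switch,
# i.e. two qualitatively different growth modes stitched together.
t_switch = 8.0
rate = np.where(t < t_switch, 0.05, 5.0)
discontinuous = np.cumsum(rate) * dt

# Both end up growing very fast; the difference is whether the rate of
# progress jumps, or rises smoothly along a single trend.
for name, curve in [("continuous", continuous_fast), ("discontinuous", discontinuous)]:
    r = np.gradient(curve, t)
    print(f"{name}: early rate {r[0]:.3f}, late rate {r[-1]:.2f}, "
          f"largest one-step change in rate {np.max(np.abs(np.diff(r))):.3f}")
```

Both toy curves reach a high rate of progress by the end; only the second ever shows a large jump in that rate from one step to the next.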
The view that progress in AI will be discontinuous, not merely very fast by normal human standards, was popular and is still held by many. Here is a canonical explanation of the view, from Eliezer Yudkowsky in 2008. Compare this to the more recent ‘what failure looks like’ to understand the intuitive force of the claim that views on AI risk have totally changed since 2008.
Also see these quotes, one summarizing the view by Paul Christiano and one recent remark from Rob Besinger, summarizing the two key reasons given above for expecting discontinuous takeoff, recursive self-improvement and a discontinuity in capability.
I will argue that the more recent reduction of confidence in discontinuous takeoff is correct, but at the same time many of the ‘original’ arguments (e.g. those given in 2008 by Yudkowsky) for fast, discontinuous takeoff are not mistaken and can be seen as also supporting fast, continuous takeoff.
We should seriously investigate the continuous/discontinuous distinction specifically by narrowing our focus onto arguments that actually distinguish between the two: conceptual investigation about the nature of future AGI, and the practical consequences for alignment work of continuous/discontinuous takeoff.
I have tried to present the arguments in order from least to most controversial, starting with the outside view on technological progress.
There have been other posts discussing arguments for discontinuous progress with approximately this framing. I am not going to repeat their good work here by running over every argument and counterargument (See Further Reading). What I’m trying to do here is get at the underlying assumptions of either side.
The Outside View
There has recently been a switch from talking about AI progress as fast vs slow to talking about it as continuous vs discontinuous. Paul Christiano explains what continuous progress means:
It is not an essential part of the definition that the gap be years; even if the gap is rather short, we still call it continuous takeoff if AI progress increases without sudden jumps, with capability increases following a series of logistic curves that merge together as different component technologies are invented.
This way of understanding progress as being ‘discontinuous’ isn’t uncontroversial, but it was developed because calling the takeoff ‘slow’ instead of ‘fast’ could be seen as a misnomer. Continuous takeoff is a statement about what happens before we reach the point where a fast takeoff is supposed to happen, and is perfectly consistent with the claim that given the stated preconditions for fast takeoff, fast takeoff will happen. It’s a statement that serious problems, possibly serious enough to pose an existential threat, will show up before the window where we expect fast takeoff scenarios to occur.
Outside view: Technology
The starting point for the argument that progress in AI should be continuous is just the observation that this is usually how things work with a technology, especially in a situation where progress is being driven by many actors working on a problem from different angles. If you can do something well in 1 year, it is usually possible to do it slightly less well in 0.9 years. Why is it ‘usually possible’? Because nearly all technologies involve numerous smaller innovations, and it is usually possible to get somewhat good results without some of them. That is why, even if each individual component innovation follows a logistic success curve that jumps from ‘doesn't work at all’ to ‘works’ when you plot results/usefulness against effort, overall progress looks continuous. Note this is what is explicitly rejected by Yudkowsky-2008.
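As a toy illustration of this aggregation story (my own sketch, with made-up numbers): model each component innovation as a sharp logistic ‘doesn't work / works’ curve arriving at its own time, and sum them. Each component individually is close to a step function, but the aggregate trend changes only a little per time step.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 100, 1001)

def logistic(t, arrival, sharpness=20.0):
    """One component innovation: near-useless before `arrival`, works after it."""
    x = np.clip(sharpness * (t - arrival), -60.0, 60.0)  # clip to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-x))

# Fifty component innovations, each arriving at a different (random) time.
arrivals = rng.uniform(10, 90, size=50)
components = np.array([logistic(t, a) for a in arrivals])
aggregate = components.sum(axis=0)

def largest_jump_fraction(series):
    """Largest one-timestep change, as a fraction of the series' total range."""
    return np.max(np.abs(np.diff(series))) / (series.max() - series.min())

print(f"one component: {largest_jump_fraction(components[0]):.1%} of its range in a single step")
print(f"aggregate:     {largest_jump_fraction(aggregate):.1%} of its range in a single step")
```

Each component covers a large fraction of its own range in a single step, while the aggregate capability never moves more than a percent or two of its range at once.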
When this doesn’t happen and we get discontinuous progress, it is because of one of two reasons - either there is some fundamental reason why the technology cannot work at all without all the pieces lining up in place, or there is just not much effort being put in, so a few actors can leap ahead of the rest of the world and make several of the component breakthroughs in rapid succession.
I’ll go through three illustrative examples, one for each situation: the normal case, the low-effort case and the fundamental-reasons case.
Guns
Guns followed the continuous progress model. They started out worse than competing ranged weapons like crossbows. By the 15th century, the arquebus had arrived, which had some advantages and disadvantages compared to the crossbow (easier to use, more damaging, but slower to fire and much less accurate). Then came the musket, and later rifles. There were many individual inventions that went from ‘not existing at all’ to ‘existing’ in relatively short intervals, but the overall progress looked roughly continuous. However, the speed of progress still increased dramatically during the industrial revolution and continued to increase, without ever being ‘discontinuous’.
Aircraft
Looking specifically at heavier-than-air flight, it seems clear enough that we went from ‘not being able to do this at all’ to being able to do it in a relatively short time - discontinuous progress. The Wright brothers' research drew on a few other pioneers like Otto Lilienthal, but they still made a rapid series of breakthroughs in a fairly short period: using a wind tunnel to rapidly prototype wing shapes, building a sufficiently lightweight engine, working out a method of control. This was possible because at the time, unlike with guns, only a very small fraction of human research effort was going into developing flight. It was also possible for another reason - the nature of the problem implied that success was always going to be discontinuous. While there are a few intermediate steps, like gliders, there aren't many between ‘not being able to fly’ and ‘being able to fly’, so progress was unexpected and discontinuous. I think we can attribute most of the flight case to the low degree of effort on a global scale.
Nuclear Weapons
Nuclear weapons are a purer case of fundamental physical facts causing a discontinuity. A fission chain reaction simply will not occur without many breakthroughs all being brought together, so even if a significant fraction of all the world's design effort is devoted to research, we still get discontinuous progress. If the Manhattan Project had uranium centrifuges but didn't have the Monte Carlo algorithms needed to properly simulate the dynamics of the sphere's implosion, it would just have had some mostly-useless metal. It's worth noting here that a lot of early writing about an intelligence explosion explicitly compares it to a nuclear chain reaction.
AI Impacts has done a far more thorough and in-depth investigation of progress along various metrics, confirming the intuition that discontinuous progress usually occurs where fundamental physical facts imply it - e.g. as we switch between different methods of travel or communication.
We should expect, prima facie, a random example of technological progress to be continuous, and we need specific, good reasons to think that progress will be discontinuous. On the other hand, there are more than a few examples of discontinuous progress caused either by the nature of the problem or differential effort, so there is not a colossal burden of proof on discontinuous progress. I think that both the MIRI people (who argue for discontinuous progress) and Christiano (continuous) pretty much agree about this initial point.
A shift in basic assumptions?
The argument has recently been made (by Will MacAskill on the 80,000 hours podcast) that there has been a switch from the ‘old’ arguments focussing on a seed AI leading to an intelligence explosion to new arguments:
And that these arguments don’t have very much in common with the older Bostrom/Yudkowsky scenario (of a single AGI undergoing an intelligence explosion) - except the conclusion that AI presents a uniquely dangerous existential risk. If true, this would be a cause for concern as it would suggest we haven’t become much less confused about basic questions over the last decade. MacAskill again:
If you look back to the section on Defining Discontinuous Progress, this will seem plausible - the very rapid annihilation caused by a single misaligned AGI going FOOM and the accumulation of compounding ‘you get what you measure’ errors in complex systems seem like totally different concerns.
Despite this, I will show later how the old arguments for discontinuous progress and the new arguments for continuous progress share a lot in common.
I claim that the Bostrom/Yudkowsky argument for an intelligence explosion establishes a sufficient condition for very rapid growth, and the current disagreement is about what happens between now and that point. This should raise our confidence that some basic issues related to AI timelines are resolved. However, the fact that this claim, if true, has not been recognized and that discussion of these issues is still as fragmented as it is should be a cause for concern more generally.
I will now turn to inside-view arguments for discontinuous progress, beginning with the Intelligence explosion, to justify what I have just claimed.
Failed Arguments for Discontinuity
If you think AI progress will be discontinuous, it is generally because you think AI is a special case like Nuclear Weapons where several breakthroughs need to combine to produce a sudden gain in capability, or that one big breakthrough produces nearly all the increase on its own. If you think we will create AGI at all, it is generally because you think it will be hugely economically valuable, so the low-effort case like flight does not apply - if there are ways to produce and deploy slightly worse transformative AIs sooner, they will probably be found.
The arguments for AI being a special case either invoke specific evidence from how current progress in Machine Learning looks or from human evolutionary history, or are conceptual. I believe (along with Paul Christiano) the evidential arguments aren’t that useful, and my conclusion will be that we’re left trying to assess the conceptual arguments about the nature of intelligence, which are hard to judge. I’ll offer some ways to attempt that, but not any definitive answer.
But first, what relevance does the old Intelligence Explosion Hypothesis have to this question - is it an argument for discontinuous progress? No, not on its own.
Recursive self-improvement
People don’t talk about recursive self-improvement as much as they used to, because since the Machine Learning revolution a full recursive self-improvement process has seemed less necessary for creating an AGI that is dramatically superior to humans (by analogy with how gradient descent is able to produce capability gains in current AI). Instead, the focus is more on the general idea of ‘rapid capability gain’. From Chollet vs Yudkowsky:
However, the argument that given a certain level of AI capabilities, the rate of capability gain will be very high doesn’t by itself argue for discontinuity. It does mean that the rate of progress in AI has to increase between now and then, but doesn’t say how it will increase.
For a discontinuity, there needs to be an initial asymmetry in the situation: an AI beyond a certain level of sophistication can experience rapid capability gain, while one below it experiences the current (fairly unimpressive) capability gains from increased optimization power and resources.
The original ‘intelligence explosion’ argument puts an upper bound or sufficient condition on when we enter a regime of very rapid growth - if we have something ‘above human level’ in all relevant capabilities, it will be capable of improving its capabilities. And we know that at the current level, capability gains with increased optimization are (usually) not too impressive.
The general question has to be asked: will we see a sudden increase in the rate of capability gain somewhere between now and the ‘human level’, where the rate must be very high?
The additional claim needed to establish a discontinuity is that recursive self-improvement suddenly goes from ‘not being possible at all’ (our current situation) to possible, so there is a discontinuity as we enter a new growth mode and the graph abruptly ‘folds back in on itself’.
In Christiano’s graph, the gradient at the right end where highly capable AI is already around is pretty much the same in both scenarios, reflecting the basic recursive self-improvement argument.
Here I have taken a diagram from Superintelligence and added a red curve to represent a fast but continuous takeoff scenario.
In Bostrom’s scenario, there are two key moments that represent discontinuities in rate, though not necessarily in absolute capabilities. The first is around the ‘human baseline’: when an AI can complete all cognitive tasks a human can, we enter a new growth mode much faster than the one before, because the AI can recursively self-improve. The gradient is fairly constant up until that point. The next discontinuity is at the ‘crossover’, where AI is performing the majority of capability improvement itself.
As in Christiano’s diagram, rates of progress in the red, continuous scenario are very similar to the black scenario after we have superhuman AI, but the difference is that there is a steady acceleration of progress before ‘human level’. This is because before we have AI that is able to accelerate progress to a huge degree, we have AI that is able to accelerate progress to a lesser extent, and so on until we have current AI, which is only slightly able to accelerate growth. Recursive self-improvement, like most other capabilities, does not suddenly go from ‘not possible at all’ to ‘possible’.
The other thing to note is that, since we get substantial acceleration of progress before the ‘human baseline’, the overall timeline is shorter in the red scenario, holding other assumptions about the objective difficulty of AI research constant.
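A toy dynamical sketch of the two curves (my own illustration; every parameter here is made up): capability grows from a constant human research contribution plus an AI contribution that is either zero below the human baseline and large above it (the discontinuous, black-curve scenario) or smoothly increasing with capability (the continuous, red-curve scenario).

```python
DT = 0.01            # timestep in 'years' (arbitrary units)
HUMAN_EFFORT = 1.0   # constant contribution of human researchers to progress
BASELINE = 100.0     # 'human baseline' capability, on an arbitrary scale

def ai_help_discontinuous(c):
    # No self-improvement at all below the human baseline, a lot above it.
    return 0.0 if c < BASELINE else 50.0

def ai_help_continuous(c):
    # Weak self-improvement even from today's systems, growing with capability;
    # chosen so it matches the discontinuous contribution at the baseline.
    return 50.0 * (c / BASELINE) ** 2

def years_to_reach(target, ai_help, c0=1.0, max_years=500.0):
    c, t = c0, 0.0
    while c < target and t < max_years:
        c += DT * (HUMAN_EFFORT + ai_help(c))  # Euler step of dC/dt
        t += DT
    return t

for name, f in [("discontinuous", ai_help_discontinuous),
                ("continuous", ai_help_continuous)]:
    print(f"{name}: human baseline at t = {years_to_reach(BASELINE, f):.0f}, "
          f"10x baseline at t = {years_to_reach(10 * BASELINE, f):.0f}")
```

In both toy runs progress explodes once capability passes the baseline, but the continuous version reaches the baseline much sooner, because earlier, weaker AI is already speeding things up.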
The reason this continuous scenario might not occur is if there is an initial discontinuity which means that we cannot get a particular kind of recursive self-improvement with AIs that are slightly below some level of capability, but can get it with AIs that are slightly above that level.
This would be the case if AI is in a nuclear-chain-reaction-like situation, where we need to reach a criticality threshold for the rate to suddenly experience a discontinuous jump. We return to the original claim, which we now see needs an independent justification:
Connecting old and new arguments
With less confidence than in the previous section, I think this is a point that most people agree on: for there to be two qualitatively different growth regimes - that is, for there to be discontinuous progress - there needs to be an initial discontinuity in the rate of return on cognitive investment.
This also answers MacAskill's objection that too much has changed in basic assumptions. The old recursive self-improvement argument, by giving a sufficient condition for fast growth that seems feasible (human-baseline AI), leads naturally to an investigation of what will happen in the course of reaching that fast growth regime. Christiano's and other current notions of continuous takeoff are perfectly consistent with the counterfactual claim that, if an already superhuman ‘seed AI’ were dropped into a world empty of other AI, it would undergo recursive self-improvement.
This in itself, in conjunction with other basic philosophical claims like the orthogonality thesis, is sufficient to promote AI alignment to attention. Then, following on from that, we developed different models of how progress will look between now and AGI.
So it is not quite right to say, ‘many of the most prominent people are no longer pushing the Bostrom arguments’. From Paul Christiano’s 80,000 hours interview:
The above diagram represents our uncertainty - we know the rate of progress now is relatively slow and that when AI capability is at the human baseline, it must be very fast. What remains to be seen is what happens in between.
The remainder of this post is about the areas where there is not much agreement in AI safety - other reasons that there may be a threshold for generality or recursive self-improvement, or a significance to the ‘human level’.
Evidence from Evolution
We do have an example of optimization pressure being applied to produce gains in intelligence: human evolution. The historical record certainly looks discontinuous, in that relatively small changes in human brains over relatively short timescales produced dramatic visible changes in our capabilities. However, this is misleading. The very first thing we should understand is that evolution is a continuous process - it cannot redesign brains from scratch.
Evolution was optimizing for fitness, and driving increases in intelligence only indirectly and intermittently by optimizing for winning at social competition. What happened in human evolution is that it briefly switched to optimizing for increased intelligence, and as soon as that happened our intelligence grew very rapidly but continuously.
I have a theory about why this didn't get discussed earlier - it unfortunately sounds similar to the famous bad argument against AGI being an existential risk: the 'intelligence isn't a superpower' argument. From Chollet vs Yudkowsky:
I worry that in arguing against the claim that general intelligence isn't a meaningful concept or can't be used to compare different animals, some people have been implicitly assuming that evolution has been putting a decent amount of effort into optimizing for general intelligence all along. Alternatively, it may be that arguing for one sounds like arguing for the other, or that a lot of people have argued for both together without distinguishing between them.
Claiming that you can meaningfully compare evolved minds on the generality of their intelligence needs to be distinguished from claiming that evolution has been optimizing for general intelligence reasonably hard for a long time, and that consistent pressure ‘pushing up the scale’ hits a point near humans where capabilities suddenly explode despite a constant optimization effort. There is no evidence that evolution was putting roughly constant effort into increasing human intelligence. We could analogize the development of human intelligence to the aircraft case in the last section - where there was relatively little effort put into its development until a sudden burst of activity led to massive gains.
So what can we infer from evolutionary history? We know that human minds can be produced by an incremental and relatively simple optimization process operating over time. Moreover, the difference between ‘produced by an incremental process’ and ‘developed continuously’ is small. If intelligence had required the development of a good deal of complex capabilities that were costly or detrimental on their own and only useful when finally brought together, evolution would not have produced it.
Christiano argues that this is a reason to think progress in AI will be continuous, but this seems to me to be a weak reason. Clearly, there exists a continuous path to general intelligence, but that does not mean it is the easiest path and Christiano’s other argument suggests that the way that we approach AGI will not look much like the route evolution took.
Evolution also suggests that, in some absolute sense, the amount of effort required to produce increases in intelligence is not that large. Especially if you started out with the model that intelligence was being optimized all along, you should update to believing AGI is much easier to create than previously expected.
The Conceptual Arguments
We are left with the difficult-to-judge conceptual question of whether we should expect a discontinuous jump in capabilities when a set list of AI developments is brought together. Christiano, in his original essay, just states that there don't seem to be any independent reasons to expect this to be the case. Coming up with any reasons for or against discontinuous progress essentially requires us to predict how AGI will work before building it. Rob Besinger told me something similar.
Rob Besinger also said in one post that MIRI’s reasons for predicting discontinuous takeoff boil down to different ideas about what AGI will be like - suggesting that this constitutes the fundamental disagreement.
I suggest we actually try to enumerate the new developments we will need to produce AGI, ones which could arrive discretely in the form of paradigm shifts. We might try to imagine or predict which skills must be combined to reach the ability to do original AI research. Stuart Russell provided a list of these capacities in Human Compatible.
Stuart Russell’s List
For reference, I’ve included two capabilities we already have that I imagine would have been on a similar list in 1960:
AI Impacts List
Note that discontinuities may consist of sudden increases in capability (e.g. if a sudden breakthrough lets us build AI with fully human-like cumulative learning), sudden increases in the rate of improvement (e.g. something that takes us over the purported ‘recursive self-improvement threshold’), or sudden increases in the ability to make use of a hardware or knowledge overhang (e.g. suddenly producing an AI with human-like language comprehension, which would be able to read all existing books). Perhaps the disagreement looks like this:
An AI with (e.g.) good perception and object recognition, language comprehension, cumulative learning capability and ability to discover new action sets but a merely adequate or bad ability to manage its mental activity would be (Paul thinks) reasonably capable compared to an AI that is good at all of these things, but (MIRI thinks) it would be much less capable. MIRI has conceptual arguments (to do with the nature of general intelligence) and empirical arguments (comparing human/chimp brains and pragmatic capabilities) in favour of this hypothesis, and Paul thinks the conceptual arguments are too murky and unclear to be persuasive and that the empirical arguments don't show what MIRI thinks they show.
Adjudicating this disagreement is a matter for another post - for now, I will simply note that an AI significantly lacking in one of the capabilities on Stuart Russell’s list but proficient in the others intuitively seems like it would be much more capable than current AI, but still less capable than very advanced AI. How seriously to take this intuition, I don’t know.
Summary
Relevance to AI Safety
If we have a high degree of confidence in discontinuous progress, we more-or-less know that we’ll get a sudden takeoff where superintelligent AI appears out of nowhere and forms a singleton. On the other hand, if we expect continuous progress then the rate of capability gain is much more difficult to judge. That is already a strong reason to care about whether progress will be continuous or not.
Decisive Strategic Advantage (DSA) leading to Singletons is still possible with continuous progress - we just need a specific reason for a gap to emerge (like a big research program). An example scenario for a decisive strategic advantage leading to a singleton with continuous, relatively fast progress:
Let’s factor out the question of fast or slow takeoff, and try to compare two AI timelines that are similarly fast in objective time, but one contains a discontinuous leap in capability and the other doesn’t. What are the relevant differences with respect to AI Safety? In the discontinuous scenario, we do not require the classic ‘seed AI’ that recursively self-improves - that scenario is too specific and more useful as a thought experiment. Instead, in the discontinuous scenario it is merely a fact of the matter that at a certain point returns on optimization explode and capability gain becomes very rapid where before it was very slow. In the other case, progress is continuous but fast, though presumably not quite as fast.
Any approach to alignment that relies on a less advanced agent supervising a more advanced agent will probably not work in the discontinuous case, since the difference between agents on one side and another side of the discontinuity would be too great. An iterated approach that relies on groups of less intelligent agents supervising a more intelligent agent could work even in a very fast but continuous takeoff, because even if the process took a small amount of objective time, continuous increments of increased capability would still be possible, and less intelligent agents could supervise more intelligent agents.
Discontinuous takeoff suggests ambitious value learning approaches, while continuous takeoff suggests iterated approaches like IDA.
See my earlier post for a discussion of value learning approaches.
In cases where the takeoff is both slow and continuous, we might expect AI to be an outgrowth of modern ML, particularly deep reinforcement learning, in which case the best approach might be a very narrow approach to AI alignment.
The objective time taken for progress in AI is more significant than whether that progress is continuous or discontinuous, but the presence of discontinuities is significant for two key reasons. First, discontinuities imply much faster objective time. Second, big discontinuities will probably thwart iterative approaches to value learning, requiring one-shot, ambitious approaches.