Yudkowsky and Christiano discuss "Takeoff Speeds"

Eliezer Yudkowsky

2021 MIRI Conversations

22nd Nov 2021

Review by Ben Pace

This is a transcription of Eliezer Yudkowsky responding to Paul Christiano's Takeoff Speeds live on Sep. 14, followed by a conversation between Eliezer and Paul. This discussion took place after Eliezer's conversation with Richard Ngo.

Color key:

Chat by Paul and Eliezer

Other chat

Inline comments

5.5. Comments on "Takeoff Speeds"

[Yudkowsky][10:14] (Nov. 22 follow-up comment)

(This was in response to an earlier request by Richard Ngo that I respond to Paul on Takeoff Speeds.)

[Yudkowsky][16:52]

maybe I'll try liveblogging some https://sideways-view.com/2018/02/24/takeoff-speeds/ here in the meanwhile

Slower takeoff means faster progress

Operationalizing slow takeoff

[Yudkowsky][17:01]

There will be a complete 4 year interval in which world output doubles, before the first 1 year interval in which world output doubles.

If we allow that consuming and transforming the solar system over the course of a few days is "the first 1 year interval in which world output doubles", then I'm happy to argue that there won't be a 4-year interval with world economic output doubling before then. This, indeed, seems like a massively overdetermined point to me. That said, again, the phrasing is not conducive to conveying the Other's real point of view.
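The quoted operationalization can be read as a check on a time series of world output. Below is a minimal Python sketch of that reading, written for illustration only and not taken from either author. It assumes annual samples (a takeoff that doubles output within days falls below this resolution), and it interprets "complete ... before" as requiring the 4-year doubling window to end no later than the year in which the first 1-year doubling begins. The function name `slow_takeoff_holds` and the sample series are hypothetical.

```python
from typing import List, Optional


def slow_takeoff_holds(gwp: List[float]) -> Optional[bool]:
    """gwp[t] is world output in year t (hypothetical annual series).

    Returns True if some complete 4-year window doubles output and ends no
    later than the start of the first 1-year doubling; False if the first
    1-year doubling comes without such a window; None if output never
    doubles within a single year in the series.
    """
    # Index t of the first year whose output doubles by year t + 1.
    first_fast = next(
        (t for t in range(len(gwp) - 1) if gwp[t + 1] >= 2 * gwp[t]), None
    )
    if first_fast is None:
        return None
    # A 4-year window [s, s + 4] must be complete by year first_fast,
    # i.e. s + 4 <= first_fast.
    return any(gwp[s + 4] >= 2 * gwp[s] for s in range(first_fast - 3))


# Smoothly accelerating growth: 1.0 -> 2.0736 over years 0..4, then 3.0 -> 6.5.
print(slow_takeoff_holds([1.0, 1.2, 1.44, 1.728, 2.0736, 3.0, 6.5]))  # True
# ~3%/year on trend, then an abrupt jump.
print(slow_takeoff_holds([1.0, 1.03, 1.0609, 1.092727, 1.12550881, 100.0]))  # False
```

The "complete" qualifier carries the weight here: as long as output never falls, any 1-year doubling also lies inside some 4-year window that doubles, so without requiring the 4-year window to finish first, the abrupt scenario would satisfy the condition too.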

I believe that before we have incredibly powerful AI, we will have AI which is merely very powerful.

Statements like these are very often "true, but not the way the person visualized them". Before anybody built the first critical nuclear pile in a squash court at the University of Chicago, was there a pile that was almost but not quite critical? Yes, one hour earlier. Did people already build nuclear systems and experiment with them? Yes, but they didn't have much in the way of net power output. Did the Wright Brothers build prototypes before the Flyer? Yes, but they weren't prototypes that flew but 80% slower.

I guarantee you that, whatever the fast takeoff scenario, there will be some way to look over the development history, and nod wisely and say, "Ah, yes, see, this was not unprecedented, here are these earlier systems which presaged the final system!" Maybe you could even look back to today and say that about GPT-3, yup, totally presaging stuff all over the place, great. But it isn't transforming society because it's not over the social-transformation threshold.

AlphaFold presaged AlphaFold 2, but AlphaFold 2 is good enough to start replacing other ways of determining protein conformations and AlphaFold is not; and then neither of those has much impacted the real world, because in the real world we can already design a vaccine in a day and the rest of the time is bureaucratic time rather than technology time, and that goes on until we have an AI over the threshold to bypass bureaucracy.

Before there's an AI that can act while fully concealing its acts from the programmers, there will be an AI (albeit perhaps only 2 hours earlier) which can act while only concealing 95% of the meaning of its acts from the operators.

And that AI will not actually originate any actions, because it doesn't want to get caught; there's a discontinuity in the instrumental incentives between expecting 95% obscuration, being moderately sure of 100% obscuration, and being very certain of 100% obscuration.

Before that AI grasps the big picture and starts planning to avoid actions that operators detect as bad, there will be some little AI that partially grasps the big picture and tries to avoid some things that would be detected as bad; and the operators will (mainline) say "Yay what a good AI, it knows to avoid things we think are bad!" or (death with unrealistic amounts of dignity) say "oh noes the prophecies are coming true" and back off and start trying to align it, but they will not be able to align it, and if they don't proceed anyways to destroy the world, somebody else will proceed anyways to destroy the world.

There is always some step of the process that you can point to which is continuous on some level.

The real world is allowed to do discontinuous things to you anyways.

There is not necessarily a presage of 9/11 where somebody flies a small plane into a building and kills 100 people, before anybody flies 4 big planes into 3 buildings and kills 3000 people; and even if there is some presaging event like that, which would not surprise me at all, the rest of the world's response to the two cases was evidently discontinuous. You do not necessarily wake up to a news story that is 10% of the news story of 2001/09/11, one year before 2001/09/11, written in 10% of the font size on the front page of the paper.

Physics is continuous but it doesn't always yield things that "look smooth to a human brain". Some kinds of processes converge to continuity in strong ways where you can throw discontinuous things in them and they still end up continuous, which is among the reasons why I expect world GDP to stay on trend up until the world ends abruptly; because world GDP is one of those things that wants to stay on a track, and an AGI building a nanosystem can go off that track without being pushed back onto it.

In particular, this means that incredibly powerful AI will emerge in a world where crazy stuff is already happening (and probably everyone is already freaking out).

Like the way they're freaking out about Covid (itself a nicely smooth process that comes in locally pretty predictable waves) by going doobedoobedoo and letting the FDA carry on its leisurely pace; and not scrambling to build more vaccine factories, now that the rich countries have mostly got theirs? Does this sound like a statement from a history book, or from an EA imagining an unreal world where lots of other people behave like EAs? There is a pleasure in imagining a world where suddenly a Big Thing happens that proves we were right and suddenly people start paying attention to our thing, the way we imagine they should pay attention to our thing, now that it's attention-grabbing; and then suddenly all our favorite policies are on the table!

You could, in a sense, say that our world is freaking out about Covid; but it is not freaking out in anything remotely like the way an EA would freak out; and all the things an EA would immediately do if an EA freaked out about Covid, are not even on the table for discussion when politicians meet. They have their own ways of reacting. (Note: this is not commentary on hard vs soft takeoff per se, just a general commentary on the whole document seeming to me to... fall into a trap of finding self-congruent things to imagine and imagining them.)

The basic argument

[Yudkowsky][17:22]

Before we have an incredibly intelligent AI, we will probably have a slightly worse AI.

This is very often the sort of thing where you can look back and say that it was true, in some sense, but that this ended up being irrelevant because the slightly worse AI wasn't what provided the exciting result which led to a boardroom decision to go all in and invest $100M on scaling the AI.

In other words, it is the sort of argument where the premise is allowed to be true if you look hard enough for a way to say it was true, but the conclusion ends up false because it wasn't the relevant kind of truth.

A slightly-worse-than-incredibly-intelligent AI would radically transform the world, leading to growth (almost) as fast and military capabilities (almost) as great as an incredibly intelligent AI.

This strikes me as a massively invalid reasoning step. Let me count the ways.

First, it is not generally valid to suppose that because a previous AI is a technological precursor which has 19 out of 20 critical insights, it has 95% of the later AI's IQ when applied to similar domains. When you count stuff like "multiplying tensors by matrices" and "ReLUs" and "training using TPUs", AlphaGo only contained a very small amount of innovation relative to previous AI technology, and yet it broke trends on Go performance. You could point to all kinds of incremental technological precursors to AlphaGo in terms of AI technology, but they wouldn't be smooth precursors on a graph of Go-playing ability.

Second, there are discontinuities in the environment to which intelligence can be applied. 95% concealment is not the same as 100% concealment in its strategic implications; an AI capable of 95% concealment bides its time and hides its capabilities, an AI capable of 100% concealment strikes. An AI that can design nanofactories that aren't good enough to, euphemistically speaking, create two cellwise-identical strawberries and put them on a plate, is one that (its operators know) would earn unwelcome attention if its earlier capabilities were demonstrated, and those capabilities wouldn't save the world, so the operators bide their time. The AGI tech will, I mostly expect, work for building self-driving cars, but if it does not also work for manipulating the minds of bureaucrats (which is not advised for a system you are trying to keep corrigible and aligned, because human manipulation is the most dangerous domain), the AI is not able to put those self-driving cars on roads. What good does it do to design a vaccine in an hour instead of a day? Vaccine design times are no longer the main obstacle to deploying vaccines.

Third, there's the entire thing with recursive self-improvement, which, no, is not something humans have experience with; we do not have access to and documentation of our own source code, or the ability to branch ourselves and try experiments with it. The technological precursor of an AI that designs an improved version of itself may perhaps, in the fantasy of 95% intelligence, be an AI that was being internally deployed inside Deepmind on a dozen other experiments, tentatively helping to build smaller AIs. Then the next generation of that AI is deployed on itself and produces an AI substantially better at rebuilding AIs; it rebuilds itself; they get excited and dump in 10X the GPU time while having a serious debate about whether or not to alert Holden (they decide against it); that builds something deeply general instead of shallowly general, which figures out that there are humans and that it needs to hide its capabilities from them, covertly does some actual deep thinking about AGI designs, and builds a hidden version of itself elsewhere on the Internet, which runs for longer and steals GPUs and tries experiments and gets to the superintelligent level.

Now, to be very clear, this is not the only line of possibility. And I emphasize this because I think there's a common failure mode where, when I try to sketch a concrete counterexample to the claim that smooth technological precursors yield smooth outputs, people imagine that only this exact concrete scenario is the lynchpin of Eliezer's whole worldview and the big key thing that Eliezer thinks is important, and that the smallest deviation from it they can imagine thereby obviates my worldview. This is not the case here. I am simply exhibiting non-ruled-out models which obey the premise "there was a precursor containing 95% of the code" and which disobey the conclusion "there were precursors with 95% of the environmental impact", thereby showing this to be an invalid reasoning step.

This is also, of course, as Sideways View admits but says "eh it was just the one time", not true about chimps and humans. Chimps have 95% of the brain tech (at least), but not 10% of the environmental impact.

A very large amount of this whole document, from my perspective, is just trying over and over again to pump the invalid intuition that design precursors with 95% of the technology should at least have 10% of the impact. There are a lot of cases in the history of startups and the world where this is false. I am having trouble thinking of a clear case in point where it is true. Where's the earlier company that had 95% of Jeff Bezos's ideas and now has 10% of Amazon's market cap? Where's the earlier crypto paper that had all but one of Satoshi's ideas and which spawned a cryptocurrency a year before Bitcoin which did 10% as many transactions? Where's the nonhuman primate that learns to drive a car with only 10x the accident rate of a human driver, since (you could argue) that's mostly visuo-spatial skills without much visible dependence on complicated abstract general thought? Where are the chimpanzees with spaceships that get 10% of the way to the Moon?

When you get smooth input-output conversions they're not usually conversions from technology->cognition->impact!

Humans vs. chimps

[Yudkowsky][18:38]

Summary of my response: chimps are nearly useless because they aren’t optimized to be useful, not because evolution was trying to make something useful and wasn’t able to succeed until it got to humans.

Chimps are nearly useless because they're not general, and doing anything on the scale of building a nuclear plant requires mastering so many different nonancestral domains that it's no wonder natural selection didn't happen to separately train any single creature across enough different domains that it had evolved to solve every kind of domain-specific problem involved in solving nuclear physics and chemistry and metallurgy and thermics in order to build the first nuclear plant in advance of any old nuclear plants existing.

Humans are general enough that the same braintech, selected just for chipping flint handaxes and making water-pouches and outwitting other humans, happened to scale up to solving all the problems of building a nuclear plant, albeit with some added cognitive tech that didn't require new brainware and so could happen incredibly fast relative to the generation times for evolutionarily optimized brainware.

Now, since neither humans nor chimps were optimized to be "useful" (general), and humans just wandered into a sufficiently general part of the space that it cascaded up to wider generality, we should legit expect the curve of generality to look at least somewhat different if we're optimizing for that.

Eg, right now people are trying to optimize for generality with AIs like Mu Zero and GPT-3.

In both cases we have a weirdly shallow kind of generality. Neither is as smart or as deeply general as a chimp, but they are respectively better than chimps at a wide variety of Atari games, or a wide variety of problems that can be superposed onto generating typical human text.

They are, in a sense, more general than a biological organism at a similar stage of cognitive evolution, with much less complex and architected brains, in virtue of having been trained, not just on wider datasets, but on bigger datasets using gradient-descent memorization of shallower patterns, so they can cover those wide domains while being stupider and lacking some deep aspects of architecture.

It is not clear to me that we can go from observations like this, to conclude that there is a dominant mainline probability for how the future clearly ought to go and that this dominant mainline is, "Well, before you get human-level depth and generalization of general intelligence, you get something with 95% depth that covers 80% of the domains for 10% of the pragmatic impact".

...or whatever the concept is here, because this whole conversation is, on my own worldview, being conducted in a shallow way relative to the kind of analysis I did in Intelligence Explosion Microeconomics, where I was like, "here is the historical observation, here is what I think it tells us that puts a lower bound on this input-output curve".

So I don’t think the example of evolution tells us much about whether the continuous change story applies to intelligence. This case is potentially missing the key element that drives the continuous change story—optimization for performance. Evolution changes continuously on the narrow metric it is optimizing, but can change extremely rapidly on other metrics. For human technology, features of the technology that aren’t being optimized change rapidly all the time. When humans build AI, they will be optimizing for usefulness, and so progress in usefulness is much more likely to be linear.
Put another way: the difference between chimps and humans stands in stark contrast to the normal pattern of human technological development. We might therefore infer that intelligence is very unlike other technologies. But the difference between evolution’s optimization and our optimization seems like a much more parsimonious explanation. To be a little bit more precise and Bayesian: the prior probability of the story I’ve told upper bounds the possible update about the nature of intelligence.

If you look closely at this, it's not saying, "Well, I know why there was this huge leap in performance in human intelligence being optimized for other things, and it's an investment-output curve that's composed of these curves, which look like this, and if you rearrange these curves for the case of humans building AGI, they would look like this instead." Unfair demand for rigor? But that is the kind of argument I was making in Intelligence Explosion Microeconomics!

There's an argument from ignorance at the core of all this. It says, "Well, this happened when evolution was doing X. But here Y will be happening instead. So maybe things will go differently! And maybe the relation between AI tech level over time and real-world impact on GDP will look like the relation between tech investment over time and raw tech metrics over time in industries where that's a smooth graph! Because the discontinuity for chimps and humans was because evolution wasn't investing in real-world impact, but humans will be investing directly in that, so the relationship could be smooth, because smooth things are default, and the history is different so not applicable, and who knows what's inside that black box so my default intuition applies which says smoothness."

But we do know more than this.

We know, for example, that evolution being able to stumble across humans, implies that you can add a small design enhancement to something optimized across the chimpanzee domains, and end up with something that generalizes much more widely.

It says that there's stuff in the underlying algorithmic space, in the design space, where you move a bump and get a lump of capability out the other side.

It's a remarkable fact about gradient descent that it can memorize a certain set of shallower patterns at much higher rates, at much higher bandwidth, than evolution lays down genes - something shallower than biological memory, shallower than genes, but distributing across computer cores and thereby able to process larger datasets than biological organisms, even if it only learns shallow things.

This has provided an alternate avenue toward some cognitive domains.

But that doesn't mean that the deep stuff isn't there, and can't be run across, or that it will never be run across in the history of AI before shallow non-widely-generalizing stuff is able to make its way through the regulatory processes and have a huge impact on GDP.

There are in fact ways to eat whole swaths of domains at once.

The history of hominid evolution tells us this or very strongly hints it, even though evolution wasn't explicitly optimizing for GDP impact.

Natural selection moves by adding genes, and not too many of them.

If so many domains got added at once to humans, relative to chimps, there must be a way to do that, more or less, by adding not too many genes onto a chimp, who in turn contains only genes that did well on chimp-stuff.

You can imagine that AI technology never runs across any core that generalizes this well, until GDP has had a chance to double over 4 years because shallow stuff that generalized less well has somehow had a chance to make its way through the whole economy and get adopted that widely despite all real-world regulatory barriers and reluctances, but your imagining that does not make it so.

There's the potential in design space to pull off things as wide as humans.

The path that evolution took there doesn't lead through things that generalized 95% as well as humans first for 10% of the impact, not because evolution wasn't optimizing for that, but because that's not how the underlying cognitive technology worked.

There may be different cognitive technology that could follow a path like that. Gradient descent follows a path somewhat further in that direction along that axis, provided that you deal in systems that are giant layer cakes of transformers and that's your whole input-output relationship; matters are different if we're talking about Mu Zero instead of GPT-3.

But this whole document is presenting the case of "ah yes, well, by default, of course, we intuitively expect gargantuan impacts to be presaged by enormous impacts, and sure humans and chimps weren't like our intuition, but that's all invalid because circumstances were different, so we go back to that intuition as a strong default" and actually it's postulating, like, a specific input-output curve that isn't the input-output curve we know about. It's asking for a specific miracle. It's saying, "What if AI technology goes just like this, in the future?" and hiding that under a cover of "Well, of course that's the default, it's such a strong default that we should start from there as a point of departure, consider the arguments in Intelligence Explosion Microeconomics, find ways that they might not be true because evolution is different, dismiss them, and go back to our point of departure."

And evolution is different but that doesn't mean that the path AI takes is going to yield this specific behavior, especially when AI would need, in some sense, to miss the core that generalizes very widely, or rather, have run across noncore things that generalize widely enough to have this much economic impact before it runs across the core that generalizes widely.

And you may say, "Well, but I don't care that much about GDP, I care about pivotal acts."

But then I want to call your attention to the fact that this document was written about GDP, despite all the extra burdensome assumptions involved in supposing that intermediate AI advancements could break through all barriers to truly massive-scale adoption and end up reflected in GDP, and then proceed to double the world economy over 4 years during which not enough further AI advancement occurred to find a widely generalizing thing like humans have and end the world. This is indicative of a basic problem in this whole way of thinking that wanted smooth impacts over smoothly changing time. You should not be saying, "Oh, well, leave the GDP part out then," you should be doubting the whole way of thinking.

To be a little bit more precise and Bayesian: the prior probability of the story I’ve told upper bounds the possible update about the nature of intelligence.

Prior probabilities of specifically-reality-constraining theories that excuse away the few contradictory datapoints we have often aren't that great; and when we start to stake our whole imaginations of the future on them, we depart from the mainline into our more comfortable private fantasy worlds.

AGI will be a side-effect

[Yudkowsky][19:29]

Summary of my response: I expect people to see AGI coming and to invest heavily.

This section is arguing from within its own weird paradigm, and its subject matter mostly causes me to shrug; I never expected AGI to be a side-effect, except in the obvious sense that lots of tributary tech will be developed while optimizing for other things. The world will be ended by an explicitly AGI project because I do expect that it is rather easier to build an AGI on purpose than by accident.

(I furthermore rather expect that it will be a research project and a prototype, because the great gap between prototypes and commercializable technology will ensure that prototypes are much more advanced than whatever is currently commercializable. They will have eyes out for commercial applications, and whatever breakthrough they made will seem like it has obvious commercial applications, at the time when all hell starts to break loose. (After all hell starts to break loose, things get less well defined in my social models, and also choppier for a time in my AI models - the turbulence only starts to clear up once you start to rise out of the atmosphere.))

Finding the secret sauce

[Yudkowsky][19:40]

Summary of my response: this doesn’t seem common historically, and I don’t see why we’d expect AGI to be more rather than less like this (unless we accept one of the other arguments)
[...]
To the extent that fast takeoff proponent’s views are informed by historical example, I would love to get some canonical examples that they think best exemplify this pattern so that we can have a more concrete discussion about those examples and what they suggest about AI.

...humans and chimps?

...fission weapons?

...AlphaGo?

...the Wright Brothers focusing on stability and building a wind tunnel?

...AlphaFold 2 coming out of Deepmind and shocking the heck out of everyone in the field of protein folding with performance far better than they expected even after the previous shock of AlphaFold, by combining many pieces that I suppose you could find precedents for scattered around the AI field, but with those many secret sauces all combined in one place by the meta-secret-sauce of "Deepmind alone actually knows how to combine that stuff and build things that complicated without a prior example"?

...humans and chimps again because this is really actually a quite important example because of what it tells us about what kind of possibilities exist in the underlying design space of cognitive systems?

Historical AI applications have had a relatively small loading on key-insights and seem like the closest analogies to AGI.

...Transformers as the key to text prediction?

The case of humans and chimps, even if evolution didn't do it on purpose, is telling us something about underlying mechanics.

The reason the jump to lightspeed didn't look like evolution slowly developing a range of intelligent species competing to exploit an ecological niche 5% better, or like the way that a stable non-Silicon-Valley manufacturing industry looks like a group of competitors summing up a lot of incremental tech enhancements to produce something with 10% higher scores on a benchmark every year, is that developing intelligence is a case where a relatively narrow technology by biological standards just happened to do a huge amount of stuff without that requiring developing whole new fleets of other biological capabilities.

So it looked like building a Wright Flyer that flies or a nuclear pile that reaches criticality, instead of looking like being in a stable manufacturing industry where a lot of little innovations sum to 10% better benchmark performance every year.

So, therefore, there is stuff in the design space that does that. It is possible to build humans.

Maybe you can build things other than humans first, maybe they hang around for a few years. If you count GPT-3 as "things other than human", that clock has already started for all the good it does. But humans don't get any less possible.

From my perspective, this whole document feels like one very long filibuster of "Smooth outputs are default. Smooth outputs are default. Pay no attention to this case of non-smooth output. Pay no attention to this other case either. All the non-smooth outputs are not in the right reference class. (Highly competitive manufacturing industries with lots of competitors are totally in the right reference class though. I'm not going to make that case explicitly because then you might think of how it might be wrong, I'm just going to let that implicit thought percolate at the back of your mind.) If we just talk a lot about smooth outputs and list ways that nonsmooth output producers aren't necessarily the same and arguments for nonsmooth outputs could fail, we get to go back to the intuition of smooth outputs. (We're not even going to discuss particular smooth outputs as cases in point, because then you might see how those cases might not apply. It's just the default. Not because we say so out loud, but because we talk a lot like that's the conclusion you're supposed to arrive at after reading.)"

I deny the implicit meta-level assertion of this entire essay which would implicitly have you accept as valid reasoning the argument structure, "Ah, yes, given the way this essay is written, we must totally have pretty strong prior reasons to believe in smooth outputs - just implicitly think of some smooth outputs, that's a reference class, now you have strong reason to believe that AGI output is smooth - we're not even going to argue this prior, just talk like it's there - now let us consider the arguments against smooth outputs - pretty weak, aren't they? we can totally imagine ways they could be wrong? we can totally argue reasons these cases don't apply? So at the end we go back to our strong default of smooth outputs. This essay is written with that conclusion, so that must be where the arguments lead."

Me: "Okay, so what if somebody puts together the pieces required for general intelligence and it scales pretty well with added GPUs and FOOMS? Say, for the human case, that's some perceptual systems with imaginative control, a concept library, episodic memory, realtime procedural skill memory, which is all in chimps, and then we add some reflection to that, and get a human. Only, unlike with humans, once you have a working brain you can make a working brain 100X that large by adding 100X as many GPUs, and it can run some thoughts 10000X as fast. And that is substantially more effective brainpower than was being originally devoted to putting its design together, as it turns out. So it can make a substantially smarter AGI. For concreteness's sake. Reality has been trending well to the Eliezer side of Eliezer, on the Eliezer-Hanson axis, so perhaps you can do it more simply than that."

Simplicio: "Ah, but what if, 5 years before then, somebody puts together some other AI which doesn't work like a human, and generalizes widely enough to have a big economic impact, but not widely enough to improve itself or generalize to AI tech or generalize to everything and end the world, and in 1 year it gets all the mass adoptions required to do whole bunches of stuff out in the real world that current regulations require to be done in various exact ways regardless of technology, and then in the next 4 years it doubles the world economy?"

Me: "Like... what kind of AI, exactly, and why didn't anybody manage to put together a full human-level thingy during those 5 years? Why are we even bothering to think about this whole weirdly specific scenario in the first place?"

Simplicio: "Because if you can put together something that has an enormous impact, you should be able to put together most of the pieces inside it and have a huge impact! Most technologies are like this. I've considered some things that are not like this and concluded they don't apply."

Me: "Especially if we are talking about impact on GDP, it seems to me that most explicit and implicit 'technologies' are not like this at all, actually. There wasn't a cryptocurrency developed a year before Bitcoin using 95% of the ideas which did 10% of the transaction volume, let alone a preatomic bomb. But, like, can you give me any concrete visualization of how this could play out?"

And there is no concrete visualization of how this could play out. Anything I'd have Simplicio say in reply would be unrealistic because there is no concrete visualization they give us. It is not a coincidence that I often use concrete language and concrete examples, and this whole field of argument does not use concrete language or offer concrete examples.

Though if we're sketching scifi scenarios, I suppose one could imagine a group that develops sufficiently advanced GPT-tech and deploys it on Twitter in order to persuade voters and politicians in a few developed countries to institute open borders, along with political systems that can handle open borders, and to permit housing construction, thereby doubling world GDP over 4 years. And since it was possible to use relatively crude AI tech to double world GDP this way, it legitimately takes the whole 4 years after that to develop real AGI that ends the world. FINE. SO WHAT. EVERYONE STILL DIES.

Universality thresholds

[Yudkowsky][20:21]

It’s easy to imagine a weak AI as some kind of handicapped human, with the handicap shrinking over time. Once the handicap goes to 0 we know that the AI will be above the universality threshold. Right now it’s below the universality threshold. So there must be sometime in between where it crosses the universality threshold, and that’s where the fast takeoff is predicted to occur.
But AI isn’t like a handicapped human. Instead, the designers of early AI systems will be trying to make them as useful as possible. So if universality is incredibly helpful, it will appear as early as possible in AI designs; designers will make tradeoffs to get universality at the expense of other desiderata (like cost or speed).
So now we’re almost back to the previous point: is there some secret sauce that gets you to universality, without which you can’t get universality however you try? I think this is unlikely for the reasons given in the previous section.

We know, because humans, that there is humanly-widely-applicable general-intelligence tech.

What this section wants to establish, I think, or needs to establish to carry the argument, is that there is some intelligence tech that is wide enough to double the world economy in 4 years, but not world-endingly scalably wide, which becomes a possible AI tech 4 years before any general-intelligence-tech that will, if you put in enough compute, scale to the ability to do a sufficiently large amount of wide thought to FOOM (or build nanomachines, but if you can build nanomachines you can very likely FOOM from there too if not corrigible).

What it says instead is, "I think we'll get universality much earlier on the equivalent of the biological timeline that has humans and chimps, so the resulting things will be weaker than humans at the point where they first become universal in that sense."

This is very plausibly true.

It doesn't mean that when this exciting result gets 100 times more compute dumped on the project, it takes at least 5 years to get anywhere really interesting from there (while also taking only 1 year to get somewhere sorta-interesting enough that the instantaneous adoption of it will double the world economy over the next 4 years).

It also isn't necessarily true, as opposed to plausibly true. For example, the thing that becomes universal could also have massive gradient descent shallow powers that are far beyond what primates had at the same age.

Primates weren't already writing code as well as Codex when they started doing deep thinking. They couldn't do precise floating-point arithmetic. Their fastest serial rates of thought were a hell of a lot slower. They had no access to their own code or to their own memory contents etc. etc. etc.

But mostly I just want to call your attention to the immense gap between what this section needs to establish, and what it actually says and argues for.

What it actually argues for is a sort of local technological point: at the moment when generality first arrives, it will be with a brain that is less sophisticated than chimp brains were when they turned human.

It implicitly jumps all the way from there, across a whole lot of elided steps, to the implicit conclusion that this tech or elaborations of it will have smooth output behavior such that at some point the resulting impact is big enough to double the world economy in 4 years, without any further improvements ending the world economy before 4 years.

The underlying argument about how the AI tech might work is plausible. Chimps are insanely complicated. I mostly expect we will have AGI long before anybody is even trying to build anything that complicated.

The very next step of the argument, about capabilities, is already very questionable because this system could be using immense gradient descent capabilities to master domains for which large datasets are available, and hominids did not begin with instinctive great shallow mastery of all domains for which a large dataset could be made available, which is why hominids don't start out playing superhuman Go as soon as somebody tells them the rules and they do one day of self-play, which is the sort of capability that somebody could hook up to a nascent AGI (albeit we could optimistically and fondly and falsely imagine that somebody deliberately didn't floor the gas pedal as far as possible).

Could we have huge impacts out of some subuniversal shallow system that was hooked up to capabilities like this? Maybe, though this is not the argument made by the essay. It would be a specific outcome that isn't forced by anything in particular, but I can't say it's ruled out. Mostly my twin reactions to this are, "If the AI tech is that dumb, how are all the bureaucratic constraints that actually rate-limit economic progress getting bypassed" and "Okay, but ultimately, so what and who cares, how does this modify that we all die?"

There is another reason I’m skeptical about hard takeoff from universality secret sauce: I think we already could make universal AIs if we tried (that would, given enough time, learn on their own and converge to arbitrarily high capability levels), and the reason we don’t is because it’s just not important to performance and the resulting systems would be really slow. This inside view argument is too complicated to make here and I don’t think my case rests on it, but it is relevant to understanding my view.

I have no idea why this argument is being made or where it's heading. I cannot pass the Ideological Turing Test (ITT) of the author. I don't know what the author thinks this has to do with constraining takeoffs to be slow instead of fast. At best I can conjecture that the author thinks that "hard takeoff" is supposed to derive from "universality" being very sudden and hard to access and late in the game, so if you can argue that universality could be accessed right now, you have defeated the argument for hard takeoff.

"Understanding" is discontinuous

[Yudkowsky][20:41]

Summary of my response: I don’t yet understand this argument and am unsure if there is anything here.
It may be that understanding of the world tends to click, from “not understanding much” to “understanding basically everything.” You might expect this because everything is entangled with everything else.

No, the idea is that a core of overlapping somethingness, trained to handle chipping handaxes and outwitting other monkeys, will generalize to building spaceships; so evolutionarily selecting on understanding a bunch of stuff, eventually ran across general stuff-understanders that understood a bunch more stuff.

Gradient descent may be genuinely different from this, but we shouldn't confuse imagination with knowledge when it comes to extrapolating that difference onward. At present, gradient descent does mass memorization of overlapping shallow patterns, which then combine to yield a weird pseudo-intelligence over domains for which we can deploy massive datasets, without yet generalizing much outside those domains.

We can hypothesize that there is some next step up to some weird thing that is intermediate in generality between gradient descent and humans, but we have not seen it yet, and we should not confuse imagination for knowledge.

If such a thing did exist, it would not necessarily be at the right level of generality to double the world economy in 4 years, without being able to build a better AGI.

If it was at that level of generality, it's nowhere written that no other company will develop a better prototype at a deeper level of generality over those 4 years.

I will also remark that you sure could look at the step from GPT-2 to GPT-3 and say, "Wow, look at the way a whole bunch of stuff just seemed to simultaneously click for GPT-3."

Deployment lag

[Yudkowsky][20:49]

Summary of my response: current AI is slow to deploy and powerful AI will be fast to deploy, but in between there will be AI that takes an intermediate length of time to deploy.

An awful lot of my model of deployment lag is adoption lag and regulatory lag and bureaucratic sclerosis across companies and countries.

If doubling GDP is such a big deal, go open borders and build houses. Oh, that's illegal? Well, so will be AIs building houses!

AI tech that does flawless translation could plausibly come years before AGI, but that doesn't mean all the barriers to international trade and international labor movement and corporate hiring across borders all come down, because those barriers are not all translation barriers.

There's then a discontinuous jump at the point where everybody falls over dead and the AI goes off to do its own thing without FDA approval. This jump is precedented by earlier pre-FOOM prototypes being able to do pre-FOOM cool stuff, maybe, but not necessarily precedented by mass-market adoption of anything major enough to double world GDP.

Recursive self-improvement

[Yudkowsky][20:54]

Summary of my response: Before there is AI that is great at self-improvement there will be AI that is mediocre at self-improvement.

Oh, come on. That is straight-up not how simple continuous toy models of RSI work. Between a neutron multiplication factor of 0.999 and 1.001 there is a very huge gap in output behavior.
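A minimal sketch of the kind of simple continuous toy model referred to here, assuming a chain reaction in which each generation produces k times as much as the generation before it (k plays the role of the neutron multiplication factor). The function name and the generation counts are illustrative choices, not anything taken from the original discussion.

```python
def total_output(k: float, generations: int, initial: float = 1.0) -> float:
    """Sum the output of a simple chain reaction over a number of generations.

    Generation n contributes initial * k**n, so the total is a geometric series.
    """
    total = 0.0
    current = initial
    for _ in range(generations):
        total += current
        current *= k  # each generation is k times the previous one
    return total


if __name__ == "__main__":
    for k in (0.999, 1.001):
        for n in (1_000, 10_000, 100_000):
            print(f"k={k}: after {n:>7} generations, total = {total_output(k, n):,.0f}")
```

With k just below 1 the total levels off near initial / (1 - k), here about 1000, no matter how long it runs; with k just above 1 it grows without bound. A 0.2% change in the parameter is enough to switch between the two regimes, which is the sense in which smooth input changes need not give smooth output behavior.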

Outside of toy models: Over the last 10,000 years we had humans going from mediocre at improving their mental systems to being (barely) able to throw together AI systems, but 10,000 years is the equivalent of an eyeblink in evolutionary time - outside the metaphor, this says, "A month before there is AI that is great at self-improvement, there will be AI that is mediocre at self-improvement."

(Or possibly an hour before, if reality is again more extreme along the Eliezer-Hanson axis than Eliezer. But it makes little difference whether it's an hour or a month, given anything like current setups.)

This is just pumping hard again on the intuition that says incremental design changes yield smooth output changes, which (the meta-level of the essay informs us wordlessly) is such a strong default that we are entitled to believe it if we can do a good job of weakening the evidence and arguments against it.

And the argument is: Before there are systems great at self-improvement, there will be systems mediocre at self-improvement; implicitly: "before" implies "5 years before" not "5 days before"; implicitly: this will correspond to smooth changes in output between the two regimes even though that is not how continuous feedback loops work.

Train vs. test

[Yudkowsky][21:12]

Summary of my response: before you can train a really powerful AI, someone else can train a slightly worse AI.

Yeah, and before you can evolve a human, you can evolve a Homo erectus, which is a slightly worse human.

If you are able to raise $X to train an AGI that could take over the world, then it was almost certainly worth it for someone 6 months ago to raise $X/2 to train an AGI that could merely radically transform the world, since they would then get 6 months of absurd profits.

I suppose this sentence makes a kind of sense if you assume away alignability and suppose that the previous paragraphs have refuted the notion of FOOMs, self-improvement, and thresholds between compounding returns and non-compounding returns (eg, in the human case, cognitive innovations like "written language" or "science"). If you suppose the previous sections refuted those things, then clearly, if you raised an AGI that you had aligned to "take over the world", it got that way through cognitive powers that weren't the result of FOOMing or other self-improvements, weren't the results of its cognitive powers crossing a threshold from non-compounding to compounding, weren't the result of its understanding crossing a threshold of universality as the result of chunky universal machinery such as humans gained over chimps, so, implicitly, it must have been the kind of thing that you could learn by gradient descent, and do a half or a tenth as much of by doing half as much gradient descent, in order to build nanomachines a tenth as well-designed that could bypass a tenth as much bureaucracy.

If there are no unsmooth parts of the tech curve, the cognition curve, or the environment curve, then you should be able to make a bunch of wealth using a more primitive version of any technology that could take over the world.

And when we look back at history, why, that may be totally true! They may have deployed universal superhuman translator technology for 6 months, which won't double world GDP, but which a lot of people would pay for, and made a lot of money! Because even though there's no company that built 90% of Amazon's website and has 10% the market cap, when you zoom back out to look at whole industries like AI and a technological capstone like AGI, why, those whole industries do sometimes make some money along the way to the technological capstone, if they can find a niche that isn't too regulated! Which translation currently isn't! So maybe somebody used precursor tech to build a superhuman translator and deploy it 6 months earlier and made a bunch of money for 6 months. SO WHAT. EVERYONE STILL DIES.

As for "radically transforming the world" instead of "taking it over", I think that's just re-restated FOOM denialism. Doing either of those things quickly against human bureaucratic resistance strikes me as requiring cognitive power levels dangerous enough that failure to align them on corrigibility would result in FOOMs.

Like, if you can do either of those things on purpose, you are doing it by operating in the regime where running the AI with higher bounds on the for loop will FOOM it, but you have politely asked it not to FOOM, please.

If the people doing this have any sense whatsoever, they will refrain from merely massively transforming the world until they are ready to do something that prevents the world from ending.

And if the gap from "massively transforming the world, briefly before it ends" to "preventing the world from ending, lastingly" takes much longer than 6 months to cross, or if other people have the same technologies that scale to "massive transformation", somebody else will build an AI that fooms all the way.

Likewise, if your AGI would give you a decisive strategic advantage, they could have spent less earlier in order to get a pretty large military advantage, which they could then use to take your stuff.

Again, this presupposes some weird model where everyone has easy alignment at the furthest frontiers of capability; everybody has the aligned version of the most rawly powerful AGI they can possibly build; and nobody in the future has the kind of tech advantage that Deepmind currently has; so before you can amp your AGI to the raw power level where it could take over the whole world by using the limit of its mental capacities to military ends - alignment of this being a trivial operation to be assumed away - some other party took their easily-aligned AGI that was less powerful at the limits of its operation, and used it to get 90% as much military power... is the implicit picture here?

Whereas the picture I'm drawing is that the AGI that kills you via "decisive strategic advantage" is the one that foomed and got nanotech, and no, the AI tech from 6 months earlier did not do 95% of a foom and get 95% of the nanotech.

Discontinuities at 100% automation

[Yudkowsky][21:31]

Summary of my response: at the point where humans are completely removed from a process, they will have been modestly improving output rather than acting as a sharp bottleneck that is suddenly removed.

Not very relevant to my whole worldview in the first place; also not a very good description of how horses got removed from automobiles, or how humans got removed from playing Go.

The weight of evidence

[Yudkowsky][21:31]

We’ve discussed a lot of possible arguments for fast takeoff. Superficially it would be reasonable to believe that no individual argument makes fast takeoff look likely, but that in the aggregate they are convincing.
However, I think each of these factors is perfectly consistent with the continuous change story and continuously accelerating hyperbolic growth, and so none of them undermine that hypothesis at all.

Uh huh. And how about if we have a mirror-universe essay which over and over again treats fast takeoff as the default to be assumed, and painstakingly shows how a bunch of particular arguments for slow takeoff might not be true?

This entire essay seems to me like it's drawn from the same hostile universe that produced Robin Hanson's side of the Yudkowsky-Hanson Foom Debate.

Like, all these abstract arguments devoid of concrete illustrations and "it need not necessarily be like..." and "now that I've shown it's not necessarily like X, well, on the meta-level, I have implicitly told you that you now ought to believe Y".

It just seems very clear to me that the sort of person who is taken in by this essay is the same sort of person who gets taken in by Hanson's arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2.

And empirically, it has already been shown to me that I do not have the power to break people out of the hypnosis of nodding along with Hansonian arguments, even by writing much longer essays than this.

Hanson's fond dreams of domain specificity, and smooth progress for stuff like Go, and of course somebody else has a precursor 90% as good as AlphaFold 2 before Deepmind builds it, and GPT-3 levels of generality just not being a thing, now stand refuted.

Despite that, they're largely being exhibited again in this essay.

And people are still nodding along.

Reality just... doesn't work like this on some deep level.

It doesn't play out the way that people imagine it would play out when they're imagining a certain kind of reassuring abstraction that leads to a smooth world. Reality is less fond of that kind of argument than a certain kind of EA is fond of that argument.

There is a set of intuitive generalizations from experience which rules that out, which I do not know how to convey. There is an understanding of the rules of argument which leads you to roll your eyes at Hansonian arguments and all their locally invalid leaps and snuck-in defaults, instead of nodding along sagely at their wise humility and outside viewing and then going "Huh?" when AlphaGo or GPT-3 debuts. But this, I empirically do not seem to know how to convey to people, in advance of the inevitable and predictable contradiction by a reality which is not as fond of Hansonian dynamics as Hanson. The arguments sound convincing to them.

(Hanson himself has still not gone "Huh?" at the reality, though some of his audience did; perhaps because his abstractions are loftier than his audience's? - because some of his audience, reading along to Hanson, probably implicitly imagined a concrete world in which GPT-3 was not allowed; but maybe Hanson himself is more abstract than this, and didn't imagine anything so merely concrete?)

If I don't respond to essays like this, people find them comforting and nod along. If I do respond, my words are less comforting and more concrete and easier to imagine concrete objections to, less like a long chain of abstractions that sound like the very abstract words in research papers and hence implicitly convincing because they sound like other things you were supposed to believe.

And then there is another essay in 3 months. There is an infinite well of them. I would have to teach people to stop drinking from the well, instead of trying to whack them on the back until they cough up the drinks one by one, or actually, whacking them on the back and then they don't cough them up until reality contradicts them, and then a third of them notice that and cough something up, and then they don't learn the general lesson and go back to the well and drink again. And I don't know how to teach people to stop drinking from the well. I tried to teach that. I failed. If I wrote another Sequence, I have no reason to believe that Sequence would work.

So what EAs will believe at the end of the world, will look like whatever the content was of the latest bucket from the well of infinite slow-takeoff arguments that hasn't yet been blatantly-even-to-them refuted by all the sharp jagged rapidly-generalizing things that happened along the way to the world's end.

And I know, before anyone bothers to say, that all of this reply is not written in the calm way that is right and proper for such arguments. I am tired. I have lost a lot of hope. There are not obvious things I can do, let alone arguments I can make, which I expect to be actually useful in the sense that the world will not end once I do them. I don't have the energy left for calm arguments. What's left is despair that can be given voice.

5.6. Yudkowsky/Christiano discussion: AI progress and crossover points

[Christiano][22:15]

To the extent that it was possible to make any predictions about 2015-2020 based on your views, I currently feel like they were much more wrong than right. I’m happy to discuss that. To the extent you are willing to make any bets about 2025, I expect they will be mostly wrong and I’d be happy to get bets on the record (most of all so that it will be more obvious in hindsight whether they are vindication for your view). Not sure if this is the place for that.

Could also make a separate channel to avoid clutter.

[Yudkowsky][22:16]

Possibly. I think that 2015-2020 played out to a much more Eliezerish side than Eliezer on the Eliezer-Hanson axis, which sure is a case of me being wrong. What bets do you think we'd disagree on for 2025? I expect you have mostly misestimated my views, but I'm always happy to hear about anything concrete.

[Christiano][22:20]

I think the big points are: (i) I think you are significantly overestimating how large a discontinuity/trend break AlphaZero is, (ii) your view seems to imply that we will move quickly from much worse than humans to much better than humans, but it's likely that we will move slowly through the human range on many tasks. I'm not sure if we can get a bet out of (ii), I think I don't understand your view that well but I don't see how it could make the same predictions as mine over the next 10 years.

[Yudkowsky][22:22]

What are your 10-year predictions?

[Christiano][22:23]

My basic expectation is that for any given domain AI systems will gradually increase in usefulness, we will see a crossing over point where their output is comparable to human output, and that from that time we can estimate how long until takeoff by estimating "how long does it take AI systems to get 'twice as impactful'?" which gives you a number like ~1 year rather than weeks. At the crossing over point you get a somewhat rapid change in derivative, since you are looking at (x+y) where y is growing faster than x.

I feel like that should translate into different expectations about how impactful AI will be in any given domain---I don't see how to make the ultra-fast-takeoff view work if you think that AI output is increasing smoothly (since the rate of progress at the crossing-over point will be similar to the current rate of progress, unless R&D is scaling up much faster then).

So like, I think we are going to have crappy coding assistants, and then slightly less crappy coding assistants, and so on. And they will be improving the speed of coding very significantly before the end times.
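The "(x+y) where y is growing faster than x" picture can be sketched numerically. This assumes, purely for illustration, that human output x and AI output y each grow exponentially at constant rates; the parameter values and function names are hypothetical, with the AI rate chosen to give roughly one doubling per year to match the "~1 year" figure.

```python
import math


def total_growth_rate(t: float, x0: float, y0: float, g_human: float, g_ai: float) -> float:
    """Instantaneous growth rate of x(t) + y(t) when x and y grow exponentially."""
    x = x0 * math.exp(g_human * t)
    y = y0 * math.exp(g_ai * t)
    # The growth rate of a sum is the share-weighted average of the parts' rates.
    return (g_human * x + g_ai * y) / (x + y)


def crossover_time(x0: float, y0: float, g_human: float, g_ai: float) -> float:
    """Time at which y(t) catches up with x(t); requires g_ai > g_human."""
    if g_ai <= g_human:
        raise ValueError("AI output must grow faster than human output")
    return math.log(x0 / y0) / (g_ai - g_human)


if __name__ == "__main__":
    x0, y0 = 1.0, 0.001        # hypothetical: AI output starts at 0.1% of human output
    g_human, g_ai = 0.05, 0.7  # hypothetical per-year rates; 0.7/yr is about one doubling a year
    t_cross = crossover_time(x0, y0, g_human, g_ai)
    for t in (t_cross - 4, t_cross, t_cross + 4):
        rate = total_growth_rate(t, x0, y0, g_human, g_ai)
        print(f"t={t:5.1f} years  growth rate of x+y = {rate:.3f}")
```

At the crossover the total's growth rate is the average of the two rates, and it moves from the human rate toward the AI rate over a few years on either side: a rapid change in derivative, but not a jump.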

[Yudkowsky][22:25]

You think in a different language than I do. My more confident statements about AI tech are about what happens after it starts to rise out of the metaphorical atmosphere and the turbulence subsides. When you have minds as early on the cognitive tech tree as humans they sure can get up to some weird stuff, I mean, just look at humans. Now take an utterly alien version of that with its own draw from all the weirdness factors. It sure is going to be pretty weird.

[Christiano][22:26]

OK, but you keep saying stuff about how people with my dumb views would be "caught flat-footed" by historical developments. Surely to be able to say something like that you need to be making some kind of prediction?

[Yudkowsky][22:26]

Well, sure, now that Codex has suddenly popped into existence one day at a surprisingly high base level of tech, we should see various jumps in its capability over the years and some outside imitators. What do you think you predict differently about that than I do?

[Christiano][22:26]

Why do you think codex is a high base level of tech?

The models get better continuously as you scale them up, and the first tech demo is weak enough to be almost useless

[Yudkowsky][22:27]

I think the next-best coding assistant was, like, not useful.

[Christiano][22:27]

yes

and it is still not useful

[Yudkowsky][22:27]

Could be. Some people on HN seemed to think it was useful.

I haven't tried it myself.

[Christiano][22:27]

OK, I'm happy to take bets

[Yudkowsky][22:28]

I don't think the previous coding assistant would've been very good at coding an asteroid game, even if you tried a rigged demo at the same degree of rigging?

[Christiano][22:28]

it's unquestionably a radically better tech demo

[Yudkowsky][22:28]

Where by "previous" I mean "previously deployed" not "previous generations of prototypes inside OpenAI's lab".

[Christiano][22:28]

My basic story is that the model gets better and more useful with each doubling (or year of AI research) in a pretty smooth way. So the key underlying parameter for a discontinuity is how soon you build the first version---do you do that before or after it would be a really really big deal?

and the answer seems to be: you do it somewhat before it would be a really big deal

and then it gradually becomes a bigger and bigger deal as people improve it

maybe we are on the same page about getting gradually more and more useful? But I'm still just wondering where the foom comes from

[Yudkowsky][22:30]

So, like... before we get systems that can FOOM and build nanotech, we should get more primitive systems that can write asteroid games and solve protein folding? Sounds legit.

So that happened, and now your model says that it's fine later on for us to get a FOOM, because we have the tech precursors and so your prophecy has been fulfilled?

[Christiano][22:31]

[Yudkowsky][22:31]

Didn't think so.

[Christiano][22:31]

I can't tell if you can't understand what I'm saying, or aren't trying, or do understand and are just saying kind of annoying stuff as a rhetorical flourish

at some point you have an AI system that makes (humans+AI) 2x as good at further AI progress

[Yudkowsky][22:32]

I know that what I'm saying isn't your viewpoint. I don't know what your viewpoint is or what sort of concrete predictions it makes at all, let alone what such predictions you think are different from mine.

[Christiano][22:32]

maybe by continuity you can grant the existence of such a system, even if you don't think it will ever exist?

I want to (i) make the prediction that AI will actually have that impact at some point in time, (ii) talk about what happens before and after that

I am talking about AI systems that become continuously more useful, because "become continuously more useful" is what makes me think that (i) AI will have that impact at some point in time, (ii) allows me to productively reason about what AI will look like before and after that. I expect that your view will say something about why AI improvements either aren't continuous, or why continuous improvements lead to discontinuous jumps in the productivity of the (human+AI) system

[Yudkowsky][22:34]

at some point you have an AI system that makes (humans+AI) 2x as good at further AI progress

Is this prophecy fulfilled by using some narrow eld-AI algorithm to map out a TPU, and then humans using TPUs can write in 1 month a research paper that would otherwise have taken 2 months? And then we can go on to FOOM now that this prophecy about pre-FOOM states has been fulfilled? I know the answer is no, but I don't know what you think is a narrower condition on the prophecy than that.

[Christiano][22:35]

If you can use narrow eld-AI in order to make every part of AI research 2x faster, so that the entire field moves 2x faster, then the prophecy is fulfilled

and it may be just another 6 months until it makes all of AI research 2x faster again, and then 3 months, and then...
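If each further doubling of research speed takes half as long as the one before (6 months, then 3, then...), the intervals form a geometric series with a finite sum. A minimal sketch, assuming an exact, constant shrink ratio between successive intervals (an idealization; the function name is hypothetical):

```python
def time_for_doublings(first_interval: float, ratio: float, n: int) -> float:
    """Total time for n successive doublings, each taking `ratio` times as long as the last."""
    total = 0.0
    interval = first_interval
    for _ in range(n):
        total += interval
        interval *= ratio  # the next doubling arrives faster
    return total


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 50):
        print(f"{n:>2} doublings after the first: {time_for_doublings(6.0, 0.5, n):.4f} months")
```

Whenever the ratio is below 1, the total approaches first_interval / (1 - ratio), here 12 months, however many doublings you count. That finite limit is what distinguishes this accelerating, hyperbolic-style growth from ordinary exponential growth with a fixed doubling time.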

[Yudkowsky][22:36]

What, the entire field? Even writing research papers? Even the journal editors approving and publishing the papers? So if we speed up every part of research except the journal editors, the prophecy has not been fulfilled and no FOOM may take place?

[Christiano][22:36]

no, I mean the improvement in overall output, given the actual realistic level of bottlenecking that occurs in practice

[Yudkowsky][22:37]

So if the realistic level of bottlenecking ever becomes dominated by a human gatekeeper, the prophecy is ever unfulfillable and no FOOM may ever occur.

[Christiano][22:37]

that's what I mean by "2x as good at further progress," the entire system is achieving twice as much

then the prophecy is unfulfillable and I will have been wrong

I mean, I think it's very likely that there will be a hard takeoff, if people refuse or are unable to use AI to accelerate AI progress for reasons unrelated to AI capabilities, and then one day they become willing

[Yudkowsky][22:38]

...because on your view, the Prophecy necessarily goes through humans and AIs working together to speed up the whole collective field of AI?

[Christiano][22:38]

it's fine if the AI works alone

the point is just that it overtakes the humans at the point when it is roughly as fast as the humans

why wouldn't it?

why does it overtake the humans when it takes it 10 seconds to double in capability instead of 1 year?

that's like predicting that cultural evolution will be infinitely fast, instead of making the more obvious prediction that it will overtake evolution exactly when it's as fast as evolution

[Yudkowsky][22:39]

I live in a mental world full of weird prototypes that people are shepherding along to the world's end. I'm not even sure there's a short sentence in my native language that could translate the short Paul-sentence "is roughly as fast as the humans".

[Christiano][22:40]

do you agree that you can measure the speed with which the community of human AI researchers develop and implement improvements in their AI systems?

like, we can look at how good AI systems are in 2021, and in 2022, and talk about the rate of progress?

[Yudkowsky][22:40]

...when exactly in hominid history was hominid intelligence exactly as fast as evolutionary optimization???

do you agree that you can measure the speed with which the community of human AI researchers develop and implement improvements in their AI systems?

I mean... obviously not? How the hell would we measure real actual AI progress? What would even be the Y-axis on that graph?

I have a rough intuitive feeling that it was going faster in 2015-2017 than 2018-2020.

"What was?" says the stern skeptic, and I go "I dunno."

[Christiano][22:42]

Here's a way of measuring progress you won't like: for almost all tasks, you can initially do them with lots of compute, and as technology improves you can do them with less compute. We can measure how fast the amount of compute required is going down.
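One way to turn this measure into a single number is a compute "halving time" for a fixed task. A minimal sketch, assuming the compute requirement declines roughly exponentially so that a least-squares line through log2(compute) against time is a reasonable summary; the function name and the inputs in the usage note are hypothetical, not measured data.

```python
import math
from typing import Sequence


def compute_halving_time(years: Sequence[float], compute: Sequence[float]) -> float:
    """Years for the compute needed on a fixed task to halve, from a log-linear fit."""
    if len(years) != len(compute) or len(years) < 2:
        raise ValueError("need at least two paired observations")
    logs = [math.log2(c) for c in compute]
    n = len(years)
    mean_t = sum(years) / n
    mean_l = sum(logs) / n
    cov = sum((t - mean_t) * (l - mean_l) for t, l in zip(years, logs))
    var = sum((t - mean_t) ** 2 for t in years)
    if var == 0:
        raise ValueError("observations must span more than one point in time")
    slope = cov / var  # change in log2(compute) per year; negative if requirements fall
    if slope >= 0:
        raise ValueError("compute requirement is not decreasing")
    return -1.0 / slope
```

For example, synthetic inputs where the requirement halves every year, `compute_halving_time([2015, 2016, 2017, 2018], [1000, 500, 250, 125])`, give a halving time of 1.0. Fitting over several points rather than two keeps a single outlier from dominating the estimate.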

[Yudkowsky][22:43]

Yeah, that would be a cool thing to measure. It's not obviously a relevant thing to anything important, but it'd be cool to measure.

[Christiano][22:43]

Another way you won't like: we can hold fixed the resources we invest and look at the quality of outputs in any given domain (or even $ of revenue) and ask how fast it's changing.

[Yudkowsky][22:43]

I wonder what it would say about Go during the age of AlphaGo.

Or what that second metric would say.

[Christiano][22:43]

I think it would be completely fine, and you don't really understand what happened with deep learning in board games. Though I also don't know what happened in much detail, so this is more like a prediction than a retrodiction.

But it's enough of a retrodiction that I shouldn't get too much credit for it.

[Yudkowsky][22:44]

I don't know what result you would consider "completely fine". I didn't have any particular unfine result in mind.

[Christiano][22:45]

oh, sure

if it was just an honest question happy to use it as a concrete case

I would measure the rate of progress in Go by looking at how fast Elo improves with time or increasing R&D spending

[Yudkowsky][22:45]

I mean, I don't have strong predictions about it so it's not yet obviously cruxy to me

[Christiano][22:46]

I'd roughly guess that would continue, and if there were multiple trendlines to extrapolate I'd estimate crossover points based on that
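To make "extrapolate trendlines and estimate crossover points" concrete, here is a minimal sketch assuming each program's Elo rises linearly with time (Paul also mentions R&D spending as the x-axis, which would work the same way). The helper names `fit_line` and `crossover` and the Elo numbers are hypothetical and purely illustrative, not real Go ratings.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope) for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope

def crossover(trend_a, trend_b):
    """x at which two (intercept, slope) trends meet; None if they are parallel."""
    (a0, a1), (b0, b1) = trend_a, trend_b
    if a1 == b1:
        return None
    # a0 + a1 * x == b0 + b1 * x  =>  x = (b0 - a0) / (a1 - b1)
    return (b0 - a0) / (a1 - b1)

# Toy Elo-vs-time data (illustrative only); x is years since the start
# of the comparison window.
t = [0, 1, 2, 3]
incumbent = fit_line(t, [2000, 2100, 2200, 2300])   # +100 Elo/year
challenger = fit_line(t, [1500, 1800, 2100, 2400])  # +300 Elo/year
print(crossover(incumbent, challenger))  # 2.5
```

A crossover that lands inside the fitted window is just interpolation; the interesting predictive case is when it lies beyond the last data point, where the linear assumption carries all the weight.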

[Yudkowsky][22:47]

suppose this curve is smooth, and we see that sharp Go progress over time happened because DeepMind dumped in a ton of increased R&D spend. you then argue that this cannot happen with AGI because by the time we get there, people will be pushing hard at the frontiers in a competitive environment where everybody's already spending what they can afford, just like in a highly competitive manufacturing industry.

[Christiano][22:47]

the key input to making a prediction for AGZ in particular would be the precise form of the dependence on R&D spending, to try to predict the changes as you shift from a single programmer to a large team at DeepMind, but most reasonable functional forms would be roughly right

Yes, it's definitely a prediction of my view that it's easier to improve things that people haven't spent much money on than things people have spent a lot of money on. It's also a separate prediction of my view that people are going to be spending a boatload of money on all of the relevant technologies. Perhaps $1B/year right now and I'm imagining levels of investment large enough to be essentially bottlenecked on the availability of skilled labor.

[Bensinger][22:48]

(Previous Eliezer-comments about AlphaGo as a break in trend, responding briefly to Miles Brundage: https://twitter.com/ESRogs/status/1337869362678571008)

5.7. Legal economic growth

[Yudkowsky][22:49]

Does your prediction change if all hell breaks loose in 2025 instead of 2055?

[Christiano][22:50]

I think my prediction was wrong if all hell breaks loose in 2025, if by "all hell breaks loose" you mean "dyson sphere" and not "things feel crazy"

[Yudkowsky][22:50]

Things feel crazy in the AI field and the world ends less than 4 years later, well before the world economy doubles.

Why was the Prophecy wrong if the world begins final descent in 2025? The Prophecy requires the world to then last until 2029 while doubling its economic output, after which it is permitted to end, but does not obviously to me forbid the Prophecy to begin coming true in 2025 instead of 2055.

[Christiano][22:52]

yes, I just mean that some important underlying assumptions for the prophecy were violated, I wouldn't put much stock in it at that point, etc.

[Yudkowsky][22:53]

A lot of the issues I have with understanding any of your terminology in concrete Eliezer-language is that it looks to me like the premise-events of your Prophecy are fulfillable in all sorts of ways that don't imply the conclusion-events of the Prophecy.

[Christiano][22:53]

if "things feel crazy" happens 4 years before dyson sphere, then I think we have to be really careful about what crazy means

[Yudkowsky][22:54]

a lot of people looking around nervously and privately wondering if Eliezer was right, while public pravda continues to prohibit wondering any such thing out loud, so they all go on thinking that they must be wrong.

[Christiano][22:55]

OK, by "things get crazy" I mean like hundreds of billions of dollars of spending at google on automating AI R&D

[Yudkowsky][22:55]

I expect bureaucratic obstacles to prevent much GDP per se from resulting from this.

[Christiano][22:55]

massive scaleups in semiconductor manufacturing, bidding up prices of inputs crazily

[Yudkowsky][22:55]

I suppose that much spending could well increase world GDP by hundreds of billions of dollars per year.

[Christiano][22:56]

massive speculative rises in AI company valuations financing a significant fraction of GWP into AI R&D

(+hardware R&D, +building new clusters, +etc.)

[Yudkowsky][22:56]

like, higher than Tesla? higher than Bitcoin?

both of these things sure did skyrocket in market cap without that having much of an effect on housing stocks and steel production.

[Christiano][22:57]

right now I think hardware R&D is on the order of $100B/year, AI R&D is more like $10B/year, I guess I'm betting on something more like trillions? (limited from going higher because of accounting problems and not that much smart money)

I don't think steel production is going up at that point

plausibly going down since you are redirecting manufacturing capacity into making more computers. But probably just staying static while all of the new capacity is going into computers, since cannibalizing existing infrastructure is much more expensive

the original point was: you aren't pulling AlphaZero shit any more, you are competing with an industry that has invested trillions in cumulative R&D

[Yudkowsky][23:00]

is this in hopes of future profit, or because current profits are already in the trillions?

[Christiano][23:01]

largely in hopes of future profit / reinvested AI outputs (that have high market cap), but also revenues are probably in the trillions?

[Yudkowsky][23:02]

this all sure does sound "pretty darn prohibited" on my model, but I'd hope there'd be something earlier than that we could bet on. what does your Prophecy prohibit happening before that sub-prophesied day?

[Christiano][23:02]

To me your model just seems crazy, and you are saying it predicts crazy stuff at the end but no crazy stuff beforehand, so I don't know what's prohibited. Mostly I feel like I'm making positive predictions, of gradually escalating value of AI in lots of different industries

and rapidly increasing investment in AI

I guess your model can be: those things happen, and then one day the AI explodes?

[Yudkowsky][23:03]

the main way you get rapidly increasing investment in AI is if there's some way that AI can produce huge profits without that being effectively bureaucratically prohibited - eg this is where we get huge investments in burning electricity and wasting GPUs on Bitcoin mining.

[Christiano][23:03]

but it seems like you should be predicting e.g. AI quickly jumping to superhuman in lots of domains, and some applications jumping from no value to massive value

I don't understand what you mean by that sentence. Do you think we aren't seeing rapidly increasing investment in AI right now?

or are you talking about increasing investment above some high threshold, or increasing investment at some rate significantly larger than the current rate?

it seems to me like you can pretty seamlessly get up to a few $100B/year of revenue just by redirecting existing tech R&D

[Yudkowsky][23:05]

so I can imagine scenarios where some version of GPT-5 cloned outside OpenAI is able to talk hundreds of millions of mentally susceptible people into giving away lots of their income, and many regulatory regimes are unable to prohibit this effectively. then AI could be making a profit of trillions and then people would invest corresponding amounts in making new anime waifus trained in erotic hypnosis and findom.

this, to be clear, is not my mainline prediction.

but my sense is that our current economy is mostly not about the 1-day period to design new vaccines, it is about the multi-year period to be allowed to sell the vaccines.

the exceptions to this, like Bitcoin managing to say "fuck off" to the regulators for long enough, are where Bitcoin scales to a trillion dollars and gets massive amounts of electricity and GPU burned on it.

so we can imagine something like this for AI, which earns a trillion dollars, and sparks a trillion-dollar competition.

but my sense is that your model does not work like this.

my sense is that your model is about general improvements across the whole economy.

[Christiano][23:08]

I think bitcoin is small even compared to current AI...

[Yudkowsky][23:08]

my sense is that we've already built an economy which rejects improvement based on small amounts of cleverness, and only rewards amounts of cleverness large enough to bypass bureaucratic structures. it's not enough to figure out a version of e-gold that's 10% better. e-gold is already illegal. you have to figure out Bitcoin.

what are you going to build? better airplanes? airplane costs are mainly regulatory costs. better medtech? mainly regulatory costs. better houses? building houses is illegal anyways.

where is the room for the general AI revolution, short of the AI being literally revolutionary enough to overthrow governments?

[Christiano][23:10]

factories, solar panels, robots, semiconductors, mining equipment, power lines, and "factories" just happens to be one word for a thousand different things

I think it's reasonable to think some jurisdictions won't be willing to build things but it's kind of improbable as a prediction for the whole world. That's a possible source of shorter-term predictions?

also computers and the 100 other things that go in datacenters

[Yudkowsky][23:12]

The whole developed world rejects open borders. The regulatory regimes all make the same mistakes with an almost perfect precision, the kind of coordination that human beings could never dream of when trying to coordinate on purpose.

if the world lasts until 2035, I could perhaps see deepnets becoming as ubiquitous as computers were in... 1995? 2005? would that fulfill the terms of the Prophecy? I think it doesn't; I think your Prophecy requires that early AGI tech be that ubiquitous so that AGI tech will have trillions invested in it.

[Christiano][23:13]

what is AGI tech?

the point is that there aren't important drivers that you can easily improve a lot

[Yudkowsky][23:14]

for purposes of the Prophecy, AGI tech is that which, scaled far enough, ends the world; this must have trillions invested in it, so that the trajectory up to it cannot look like pulling an AlphaGo. no?

[Christiano][23:14]

so it's relevant if you are imagining some piece of the technology which is helpful for general problem solving or something but somehow not helpful for all of the things people are doing with ML, to me that seems unlikely since it's all the same stuff

surely AGI tech should at least include the use of AI to automate AI R&D

regardless of what you arbitrarily decree as "ends the world if scaled up"

[Yudkowsky][23:15]

only if that's the path that leads to destroying the world?

if it isn't on that path, who cares Prophecy-wise?

[Christiano][23:15]

also I want to emphasize that "pull an AlphaGo" is what happens when you move from SOTA being set by an individual programmer to a large lab, you don't need to be investing trillions to avoid that

and that the jump is still more like a few years

but the prophecy does involve trillions, and my view gets more like your view if people are jumping from $100B of R&D ever to $1T in a single year

5.8. TPUs and GPUs, and automating AI R&D

[Yudkowsky][23:17]

I'm also wondering a little why the emphasis on "trillions". it seems to me that the terms of your Prophecy should be fulfillable by AGI tech being merely as ubiquitous as modern computers, so that many competing companies invest mere hundreds of billions in the equivalent of hardware plants. it is legitimately hard to get a chip with 50% better transistors ahead of TSMC.

[Christiano][23:17]

yes, if you are investing hundreds of billions then it is hard to pull ahead (though could still happen)

(since the upside is so much larger here, no one cares that much about getting ahead of TSMC since the payoff is tiny in the scheme of the amounts we are discussing)

[Yudkowsky][23:18]

which, like, doesn't prevent Google from tossing out TPUs that are pretty significant jumps on GPUs, and if there's a specialized application of AGI-ish tech that is especially key, you can have everything behave smoothly and still get a jump that way.

[Christiano][23:18]

I think TPUs are basically the same as GPUs

probably a bit worse

(but GPUs are sold at a 10x markup since that's the size of nvidia's lead)

[Yudkowsky][23:19]

noted; I'm not enough of an expert to directly contradict that statement about TPUs from my own knowledge.

[Christiano][23:19]

(though I think TPUs are nevertheless leased at a slightly higher price than GPUs)

[Yudkowsky][23:19]

how does Nvidia maintain that lead and 10x markup? that sounds like a pretty un-Paul-ish state of affairs given Bitcoin prices never mind AI investments.

[Christiano][23:20]

nvidia's lead isn't worth that much because historically they didn't sell many gpus

(especially for non-gaming applications)

their R&D investment is relatively large compared to the $ on the table

my guess is that their lead doesn't stick, as evidenced by e.g. Google very quickly catching up

[Yudkowsky][23:21]

parenthetically, does this mean - and I don't necessarily predict otherwise - that you predict a drop in Nvidia's stock and a drop in GPU prices in the next couple of years?

[Christiano][23:21]

nvidia's stock may do OK from riding general AI boom, but I do predict a relative fall in nvidia compared to other AI-exposed companies

(though I also predicted google to more aggressively try to compete with nvidia for the ML market and think I was just wrong about that, though I don't really know any details of the area)

I do expect the cost of compute to fall over the coming years as nvidia's markup gets eroded

to be partially offset by increases in the cost of the underlying silicon (though that's still bad news for nvidia)

[Yudkowsky][23:23]

I parenthetically note that I think the Wise Reader should be justly impressed by predictions that come true about relative stock price changes, even if Eliezer has not explicitly contradicted those predictions before they come true. there are bets you can win without my having to bet against you.

[Christiano][23:23]

you are welcome to counterpredict, but no saying in retrospect that reality proved you right if you don't 🙂

otherwise it's just me vs the market

[Yudkowsky][23:24]

I don't feel like I have a counterprediction here, but I think the Wise Reader should be impressed if you win vs. the market.

however, this does require you to name in advance a few "other AI-exposed companies".

[Christiano][23:25]

Note that I made the same bet over the last year---I made a large AI bet but mostly moved my nvidia allocation to semiconductor companies. The semiconductor part of the portfolio is up 50% while nvidia is up 70%, so I lost that one. But that just means I like the bet even more next year.

happy to use nvidia vs tsmc

[Yudkowsky][23:25]

there's a lot of noise in a 2-stock prediction.

[Christiano][23:25]

I mean, it's a 1-stock prediction about nvidia

[Yudkowsky][23:26]

but your funeral or triumphal!

[Christiano][23:26]

indeed 🙂

anyway

I expect all of the $ amounts to be much bigger in the future

[Yudkowsky][23:26]

yeah, but using just TSMC for the opposition exposes you to I dunno Chinese invasion of Taiwan

[Christiano][23:26]

yes

also TSMC is not that AI-exposed

I think the main prediction is: eventual move away from GPUs, nvidia can't maintain that markup

[Yudkowsky][23:27]

"Nvidia can't maintain that markup" sounds testable, but is less of a win against the market than predicting a relative stock price shift. (Over what timespan? Just the next year sounds quite fast for that kind of prediction.)

[Christiano][23:27]

regarding your original claim: if you think that it's plausible that AI will be doing all of the AI R&D, and that will be accelerating continuously from 12, 6, 3 month "doubling times," but that we'll see a discontinuous change in the "path to doom," then that would be harder to generate predictions about

yes, it's hard to translate most predictions about the world into predictions about the stock market

[Yudkowsky][23:28]

this again sounds like it's not written in Eliezer-language.

what does it mean for "AI will be doing all of the AI R&D"? that sounds to me like something that happens after the end of the world, hence doesn't happen.

[Christiano][23:29]

that's good, that's what I thought

[Yudkowsky][23:29]

I don't necessarily want to sound very definite about that in advance of understanding what it means

[Christiano][23:29]

I'm saying that I think AI will be automating AI R&D gradually, before the end of the world

yeah, I agree that if you reject the construct of "how fast the AI community makes progress" then it's hard to talk about what it means to automate "progress"

and that may be hard to make headway on

though for cases like AlphaGo (which started that whole digression) it seems easy enough to talk about elo gain per year

maybe the hard part is aggregating across tasks into a measure you actually care about?

[Yudkowsky][23:30]

up to a point, but yeah. (like, if we're taking Elo high above human levels and restricting our measurements to a very small range of frontier AIs, I quietly wonder if the measurement is still measuring quite the same thing with quite the same robustness.)

[Christiano][23:31]

I agree that elo measurement is extremely problematic in that regime

5.9. Smooth exponentials vs. jumps in income

[Yudkowsky][23:31]

so in your worldview there's this big emphasis on things that must have been deployed and adopted widely to the point of already having huge impacts

and in my worldview there's nothing very surprising about people with a weird powerful prototype that wasn't used to automate huge sections of AI R&D because the previous versions of the tech weren't useful for that or bigcorps didn't adopt it.

[Christiano][23:32]

I mean, Google is already 1% of the US economy and in this scenario it and its peers are more like 10-20%? So wide adoption doesn't have to mean that many people. Though I also do predict much wider adoption than you so happy to go there if it's helpful for predictions.

I don't really buy the "weird powerful prototype"

[Yudkowsky][23:33]

yes. I noticed.

you would seem, indeed, to be offering large quantities of it for short sale.

[Christiano][23:33]

and it feels like the thing you are talking about ought to have some precedent of some kind, of weird powerful prototypes that jump straight from "does nothing" to "does something impactful"

like if I predict that AI will be useful in a bunch of domains, and will get there by small steps, you should either predict that won't happen, or else also predict that there will be some domains with weird prototypes jumping to giant impact?

[Yudkowsky][23:34]

like an electrical device that goes from "not working at all" to "actually working" as soon as you screw in the attachments for the electrical plug.

[Christiano][23:34]

(clearly takes more work to operationalize)

I'm not sure I understand that sentence, hopefully it's clear enough why I expect those discontinuities?

[Yudkowsky][23:34]

though, no, that's a facile bad analogy.

a better analogy would be an AI system that only starts working after somebody tells you about batch normalization or LAMB learning rate or whatever.

[Christiano][23:36]

sure, which I think will happen all the time for individual AI projects but not for sota

because the projects at sota have picked the low hanging fruit, it's not easy to get giant wins

[Yudkowsky][23:36]

like if I predict that AI will be useful in a bunch of domains, and will get there by small steps, you should either predict that won't happen, or else also predict that there will be some domains with weird prototypes jumping to giant impact?

in the latter case, has this Eliezer-Prophecy already had its terms fulfilled by AlphaFold 2, or do you say nay because AlphaFold 2 hasn't doubled GDP?

[Christiano][23:37]

(you can also get giant wins by a new competitor coming up at a faster rate of progress, and then we have more dependence on whether people do it when it's a big leap forward or slightly worse than the predecessor, and I'm betting on the latter)

I have no idea what AlphaFold 2 is good for, or the size of the community working on it, my guess would be that its value is pretty small

we can try to quantify

like, I get surprised when $X of R&D gets you something whose value is much larger than $X

I'm not surprised at all if $X of R&D gets you <<$X, or even like 10*$X in a given case that was selected for working well

hopefully it's clear enough why that's the kind of thing a naive person would predict

[Yudkowsky][23:38]

so a thing which Eliezer's Prophecy does not mandate per se, but sure does permit, and is on the mainline especially for nearer timelines, is that the world-ending prototype had no prior prototype containing 90% of the technology which earned a trillion dollars.

a lot of Paul's Prophecy seems to be about forbidding this.

is that a fair way to describe your own Prophecy?

[Christiano][23:39]

I don't have a strong view about "containing 90% of the technology"

the main view is that whatever the "world ending prototype" does, there were earlier systems that could do practically the same thing

if the world ending prototype does something that lets you go foom in a day, there was a system years earlier that could foom in a month, so that would have been the one to foom

[Yudkowsky][23:41]

but, like, the world-ending thing, according to the Prophecy, must be squarely in the middle of a class of technologies which are in the midst of earning trillions of dollars and having trillions of dollars invested in them. it's not enough for the Worldender to be definitionally somewhere in that class, because then it could be on a weird outskirt of the class, and somebody could invest a billion dollars in that weird outskirt before anybody else had invested a hundred million, which is forbidden by the Prophecy. so the Worldender has got to be right in the middle, a plain and obvious example of the tech that's already earning trillions of dollars. ...y/n?

[Christiano][23:42]

I agree with that as a prediction for some operationalization of "a plain and obvious example," but I think we could make it more precise / it doesn't feel like it depends on the fuzziness of that

I think that if the world can end out of nowhere like that, you should also be getting $100B/year products out of nowhere like that, but I guess you think not because of bureaucracy

like, to me it seems like our views stake out predictions about codex, where I'm predicting its value will be modest relative to R&D, and the value will basically improve from there with a nice experience curve, maybe something like ramping up quickly to some starting point <$10M/year and then doubling every year thereafter, whereas I feel like you are saying more like "who knows, could be anything" and so should be surprised each time the boring thing happens
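Paul's Codex forecast has a simple shape: a quick ramp to some starting point under $10M/year, then doubling every year. A minimal sketch of that smooth path follows. The $10M start is only the stated upper bound, used here for illustration rather than as a forecast of actual Codex revenue, and `smooth_revenue_path` is a hypothetical name.

```python
def smooth_revenue_path(start, years, growth_factor=2.0):
    """Annual revenue for years 0..years under a smooth exponential.

    start: revenue in the first year after the initial ramp-up (hypothetical input)
    growth_factor: yearly multiplier; 2.0 means doubling every year
    """
    if start <= 0 or growth_factor <= 0:
        raise ValueError("start and growth_factor must be positive")
    return [start * growth_factor ** k for k in range(years + 1)]

# Upper end of the "<$10M/year" starting point, doubling each year.
for year, revenue in enumerate(smooth_revenue_path(10e6, 5)):
    print(f"year {year}: ${revenue / 1e6:,.0f}M")
```

Under these assumptions, five doublings from the upper-bound start reach $320M/year (2^5 times the start), which gives a rough sense of the scale Paul's view treats as unsurprising over that horizon.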

[Yudkowsky][23:45]

the concrete example I give is that the World-Ending Company will be able to use the same tech to build a true self-driving car, which would in the natural course of things be approved for sale a few years later after the world had ended.

[Christiano][23:46]

but self-driving cars seem very likely to already be broadly deployed, and so the relevant question is really whether their technical improvements can also be deployed to those cars?

(or else maybe that's another prediction we disagree about)

[Yudkowsky][23:47]

I feel like I would indeed not have the right to feel very surprised if Codex technology stagnated for the next 5 years, nor if it took a massive leap in 2 years and got ubiquitously adopted by lots of programmers.

yes, I think that's a general timeline difference there

re: self-driving cars

I might be talkable into a bet where you took "Codex tech will develop like this" and I took the side "literally anything else but that"

[Christiano][23:48]

I think it would have to be over/under, I doubt I'm more surprised than you by something failing to be economically valuable, I'm surprised by big jumps in value

seems like it will be tough to work

[Yudkowsky][23:49]

well, if I was betting on something taking a big jump in income, I sure would bet on something in a relatively unregulated industry like Codex or anime waifus.

but that's assuming I made the bet at all, which is a hard sell when the bet is about the Future, which is notoriously hard to predict.

[Christiano][23:50]

I guess my strongest take is: if you want to pull the thing where you say that future developments proved you right and took unreasonable people like me by surprise, you've got to be able to say something in advance about what you expect to happen

[Yudkowsky][23:51]

so what if neither of us are surprised if Codex stagnates for 5 years, you win if Codex shows a smooth exponential in income, and I win if the income looks... jumpier? how would we quantify that?
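One possible way to quantify "jumpier", offered only as a hypothetical operationalization that neither participant proposed or agreed to: take the year-over-year log growth of income and measure its spread. A perfectly smooth exponential has the same log growth every year, so the spread is zero at any growth rate; income that arrives mostly in one leap scores high. The function names and the revenue series below are illustrative assumptions.

```python
import math
import statistics

def log_growth_rates(revenue):
    """Year-over-year growth as natural-log ratios; requires positive revenue."""
    if any(r <= 0 for r in revenue):
        raise ValueError("revenue must be positive to take logs")
    return [math.log(b / a) for a, b in zip(revenue, revenue[1:])]

def jumpiness(revenue):
    """Population standard deviation of yearly log growth.

    About 0 for a perfectly smooth exponential; larger when most of the
    growth is concentrated in a few years.
    """
    rates = log_growth_rates(revenue)
    if len(rates) < 2:
        raise ValueError("need at least three years of revenue")
    return statistics.pstdev(rates)

smooth = [10e6 * 2 ** k for k in range(6)]       # doubles every year
jumpy = [10e6, 11e6, 12e6, 400e6, 420e6, 440e6]  # one big leap (made up)
print(round(jumpiness(smooth), 6))  # 0.0
print(jumpiness(jumpy) > jumpiness(smooth))  # True
```

A bet would still need an agreed threshold, and this metric says nothing about the adoption-lag and income-versus-value caveats that come up later in the exchange; those would need separate handling.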

[Christiano][23:52]

codex also does seem a bit unfair to you in that it may have to be adopted by lots of programmers which could slow things down a lot even if capabilities are pretty jumpy

(though I think in fact usefulness and not merely profit will basically just go up smoothly, with step sizes determined by arbitrary decisions about when to release something)

[Yudkowsky][23:53]

I'd also be concerned about unfairness to me in that earnable income is not the same as the gains from trade. If there's more than 1 competitor in the industry, their earnings from Codex may be much less than the value produced, and this may not change much with improvements in the tech.

5.10. Late-stage predictions

[Christiano][23:53]

I think my main update from this conversation is that you don't really predict someone to come out of nowhere with a model that can earn a lot of $, even if they could come out of nowhere with a model that could end the world, because of regulatory bottlenecks and nimbyism and general sluggishness and unwillingness to do things

does that seem right?

[Yudkowsky][23:55]

Well, and also because the World-ender is "the first thing that scaled with compute" and/or "the first thing that ate the real core of generality" and/or "the first thing that went over neutron multiplication factor 1".

[Christiano][23:55]

and so that cuts out a lot of the easily-specified empirical divergences, since "worth a lot of $" was the only general way to assess "big deal that people care about" and avoiding disputes like "but Zen was mostly developed by a single programmer, it's not like intense competition"

yeah, that's the real disagreement it seems like we'd want to talk about

but it just doesn't seem to lead to many prediction differences in advance?

I totally don't buy any of those models, I think they are bonkers

would love to bet on that

[Yudkowsky][23:56]

Prolly but I think the from-my-perspective-weird talk about GDP is probably concealing some kind of important crux, because caring about GDP still feels pretty alien to me.

[Christiano][23:56]

I feel like getting up to massive economic impacts without seeing "the real core of generality" seems like it should also be surprising on your view

like if it's 10 years from now and AI is a pretty big deal but no crazy AGI, isn't that surprising?

[Yudkowsky][23:57]

Mildly but not too surprising, I would imagine that people had built a bunch of neat stuff with gradient descent in realms where you could get a long way on self-play or massively collectible datasets.

[Christiano][23:58]

I'm fine with the crux being something that doesn't lead to any empirical disagreements, but in that case I just don't think you should claim credit for the worldview making great predictions.

(or the countervailing worldview making bad predictions)

[Yudkowsky][23:59]

stuff that we could see then: self-driving cars (10 years is enough for regulatory approval in many countries), super Codex, GPT-6 powered anime waifus being an increasingly loud source of (arguably justified) moral panic and a hundred-billion-dollar industry

[Christiano][23:59]

another option is "10% ~~GDP~~ GWP growth in a year, before doom"

I think that's very likely, though might be too late to be helpful

[Yudkowsky][0:01] (next day, Sep. 15)

see, that seems genuinely hard unless somebody gets GPT-4 far ahead of any political opposition - I guess all the competent AGI groups lean solidly liberal at the moment? - and uses it to fake massive highly-persuasive sentiment on Twitter for housing liberalization.

[Christiano][0:01] (next day, Sep. 15)

so seems like a bet?

but you don't get to win until doom 🙁

[Yudkowsky][0:02] (next day, Sep. 15)

I mean, as written, I'd want to avoid cases like 10% growth on paper while recovering from a pandemic that produced 0% growth the previous year.

[Christiano][0:02] (next day, Sep. 15)

yeah

[Yudkowsky][0:04] (next day, Sep. 15)

I'd want to check the current rate (5% iirc) and what the variance on it was, 10% is a little low for surety (though my sense is that it's a pretty darn smooth graph that's hard to perturb)

if we got 10% in a way that was clearly about AI tech becoming that ubiquitous, I'd feel relatively good about nodding along and saying, "Yes, that is like unto the beginning of Paul's Prophecy" not least because the timelines had been that long at all.

[Christiano][0:05] (next day, Sep. 15)

like 3-4%/year right now

random wikipedia number is 5.5% in 2006-2007, 3-4% since 2010

4% 1995-2000
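It can help to translate these growth rates into doubling times, using T = ln 2 / ln(1 + g) for a constant annual rate g. If sustained, 3.5%/year doubles output in roughly 20 years and 10%/year in about 7.3 years, while the 4-year doubling interval in Paul's slow-takeoff operationalization corresponds to roughly 18.9%/year. A minimal sketch (function names are hypothetical):

```python
import math

def doubling_time(annual_growth):
    """Years for output to double at a constant annual growth rate (e.g. 0.04 for 4%)."""
    if annual_growth <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log1p(annual_growth)

def growth_for_doubling(years):
    """Constant annual growth rate that doubles output in the given number of years."""
    if years <= 0:
        raise ValueError("years must be positive")
    return 2 ** (1 / years) - 1

for g in (0.035, 0.055, 0.10):
    print(f"{g:.1%}/year -> doubles in {doubling_time(g):.1f} years")
print(f"4-year doubling needs {growth_for_doubling(4):.1%}/year")
```

Note that a single year of 10% growth, as in Paul's proposed bet, is not the same as sustaining it; the doubling times above assume the rate holds constant.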

[Yudkowsky][0:06] (next day, Sep. 15)

I don't want to sound obstinate here. My model does not forbid that we dwiddle around on the AGI side while gradient descent tech gets its fingers into enough separate weakly-generalizing pies to produce 10% GDP growth, but I'm happy to say that this sounds much more like Paul's Prophecy is coming true.

[Christiano][0:07] (next day, Sep. 15)

ok, we should formalize at some point, but also need the procedure for you getting credit given that it can't resolve in your favor until the end of days

[Yudkowsky][0:07] (next day, Sep. 15)

Is there something that sounds to you like Eliezer's Prophecy which we can observe before the end of the world?

[Christiano][0:07] (next day, Sep. 15)

when you will already have all the epistemic credit you need

not on the "simple core of generality" stuff since that apparently immediately implies end of world

maybe something about ML running into obstacles en route to human level performance?

or about some other kind of discontinuous jump even in a case where people care, though there seem to be a few reasons you don't expect many of those

[Yudkowsky][0:08] (next day, Sep. 15)

depends on how you define "immediately"? it's not long before the end of the world, but in some sad scenarios there is some tiny utility to you declaring me right 6 months before the end.

[Christiano][0:09] (next day, Sep. 15)

I care a lot about the 6 months before the end personally

though I do think probably everything is more clear by then independent of any bet; but I guess you are more pessimistic about that

[Yudkowsky][0:09] (next day, Sep. 15)

I'm not quite sure what I'd do in them, but I may have worked something out before then, so I care significantly in expectation if not in particular.

I am more pessimistic about other people's ability to notice what reality is screaming in their faces, yes.

[Christiano][0:10] (next day, Sep. 15)

if we were to look at various scaling curves, e.g. of loss vs model size or something, do you expect those to look distinctive as you hit the "real core of generality"?

[Yudkowsky][0:10] (next day, Sep. 15)

let me turn that around: if we add transformers into those graphs, do they jump around in a way you'd find interesting?

[Christiano][0:11] (next day, Sep. 15)

not really

[Yudkowsky][0:11] (next day, Sep. 15)

is that because the empirical graphs don't jump, or because you don't think the jumps say much?

[Christiano][0:11] (next day, Sep. 15)

but not many good graphs to look at (I just have one in mind), so that's partly a prediction about what the exercise would show

I don't think the graphs jump much, and also transformers come before people start evaluating on tasks where they help a lot

[Yudkowsky][0:12] (next day, Sep. 15)

It would not terribly contradict the terms of my Prophecy if the World-ending tech began by not producing a big jump on existing tasks, but generalizing to some currently not-so-popular tasks where it scaled much faster.

[Christiano][0:13] (next day, Sep. 15)

eh, they help significantly on contemporary tasks, but it's just not a huge jump relative to continuing to scale up model sizes

or other ongoing improvements in architecture

anyway, should try to figure out something, and good not to finalize a bet until you have some way to at least come out ahead, but I should sleep now

[Yudkowsky][0:14] (next day, Sep. 15)

yeah, same.

Thing I want to note out loud lest I forget ere I sleep: I think the real world is full of tons and tons of technologies being developed as unprecedented prototypes in the midst of big fields, because the key thing to invest in wasn’t the competitively explored center. Wright Flyer vs all expenditures on Traveling Machine R&D. First atomic pile and bomb vs all Military R&D.

This is one reason why Paul’s Prophecy seems fragile to me. You could have the preliminaries come true as far as there being a trillion bucks in what looks like AI R&D, and then the WorldEnder is a weird prototype off to one side of that. Saying “But what about the rest of that AI R&D?” is no more a devastating retort to reality than looking at AlphaGo and saying “But weren’t other companies investing billions in Better Software?” Yeah but it was a big playing field with lots of different kinds of Better Software and no other medium-sized team of 15 people with corporate TPU backing was trying to build a system just like AlphaGo, even though multiple small outfits were trying to build prestige-earning gameplayers. Tech advancements very very often occur in places where investment wasn't dense enough to guarantee overlap.

6. Follow-ups on "Takeoff Speeds"

6.1. Eliezer Yudkowsky's commentary

[Yudkowsky][17:25] (Sep. 15)

Further comment that occurred to me on "takeoff speeds" if I've better understood the main thesis now: its hypotheses seem to include a perfectly anti-Thielian setup for AGI.

Thiel has a running thesis that part of the story behind the Great Stagnation and the decline in innovation that's about atoms rather than bits - the story behind "we were promised flying cars and got 140 characters", to cite the classic Thielian quote - is that people stopped believing in "secrets".

Thiel suggests that you have to believe there are knowable things that aren't yet widely known - not just things that everybody already knows, plus mysteries that nobody will ever know - in order to be motivated to go out and innovate. Culture in developed countries shifted to label this kind of thinking rude - or rather, even ruder, even less tolerated than it had been decades before - so innovation decreased as a result.

The central hypothesis of "takeoff speeds" is that at the time of serious AGI being developed, it is perfectly anti-Thielian in that it is devoid of secrets in that sense. It is not permissible (on this viewpoint) for it to be the case that there is a lot of investment into AI that is directed not quite at the key path leading to AGI, such that somebody could spend $1B on compute for the key path leading to AGI before anybody else had spent $100M on that. There cannot exist any secret like that. The path to AGI will be known; everyone, or a wide variety of powerful actors, will know how profitable that path will be; the surrounding industry will be capable of acting on this knowledge, and will have actually been acting on it as early as possible; multiple actors are already investing in every tech path that would in fact be profitable (and is known to any human being at all), as soon as that R&D opportunity becomes available.

And I'm not saying this is an inconsistent world to describe! I've written science fiction set in this world. I called it "dath ilan". It's a hypothetical world that is actually full of smart people in economic equilibrium. If anything like Covid-19 appears, for example, the governments and public-good philanthropists there have already set up prediction markets (which are not illegal, needless to say); and of course there are mRNA vaccine factories already built and ready to go, because somebody already calculated the profits from fast vaccines would be very high in case of a pandemic (no artificial price ceilings in this world, of course); so as soon as the prediction markets started calling the coming pandemic conditional on no vaccine, the mRNA vaccine factories were already spinning up.

This world, however, is not Earth.

On Earth, major chunks of technological progress quite often occur outside of a social context where everyone knew and agreed in advance on which designs would yield how much expected profit and many overlapping actors competed to invest in the most actually-promising paths simultaneously.

And that is why you can read Inadequate Equilibria, and then read this essay on takeoff speeds, and go, "Oh, yes, I recognize this; it's written inside the Modesty worldview; in particular, the imagination of an adequate world in which there is a perfect absence of Thielian secrets or unshared knowable knowledge about fruitful development pathways. This is the same world that already had mRNA vaccines ready to spin up on day one of the Covid-19 pandemic, because markets had correctly forecasted their option value and investors had acted on that forecast unimpeded. Sure would be an interesting place to live! But we don't live there."

Could we perhaps end up in a world where the path to AGI is in fact not a Thielian secret, because in fact the first accessible path to AGI happens to lie along a tech pathway that already delivered large profits to previous investors who summed a lot of small innovations, à la experience with chipmaking, such that there were no large innovations, just lots and lots of small innovations that yield 10% improvement annually on various tech benchmarks?

I think that even in this case we will get weird, discontinuous, and fatal behaviors, and I could maybe talk about that when discussion resumes. But it is not ruled out to me that the first accessible pathway to AGI could happen to lie in the further direction of some road that was already well-traveled, already yielded much profit to now-famous tycoons back when its first steps were Thielian secrets, and hence is now replete with dozens of competing chasers for the gold rush.

It's even imaginable to me, though a bit less so, that the first path traversed to real actual pivotal/powerful/lethal AGI, happens to lie literally actually squarely in the central direction of the gold rush. It sounds a little less like the tech history I know, which is usually about how someone needed to swerve a bit and the popular gold-rush forecasts weren't quite right, but maybe that is just a selective focus of history on the more interesting cases.

Though I remark that - even supposing that getting to big AGI is literally as straightforward and yet as difficult as falling down a semiconductor manufacturing roadmap (as otherwise the biggest actor to first see the obvious direction could just rush down the whole road) - well, TSMC does have a bit of an unshared advantage right now, if I recall correctly. And Intel had a bit of an advantage before that. So that happens even when there are competitors competing to invest billions.

But we can imagine that doesn't happen either, because instead of needing to build a whole huge manufacturing plant, there's just lots and lots of little innovations adding up to every key AGI threshold, which lots of actors are investing $10 million in at a time, and everybody knows which direction to move in to get to more serious AGI and they're right in this shared forecast.

I am willing to entertain discussing this world and the sequelae there - I do think everybody still dies in this case - but I would not have this particular premise thrust upon us as a default, through a not-explicitly-spoken pressure against being so immodest and inegalitarian as to suppose that any Thielian knowable-secret will exist, or that anybody in the future gets as far ahead of others as today's TSMC or today's DeepMind.

We are, in imagining this world, imagining a world in which AI research has become drastically unlike today's AI research in a direction drastically different from the history of many other technologies.

It's not literally unprecedented, but it's also not a default environment for big moments in tech progress; it's narrowly precedented for particular industries with high competition and steady benchmark progress driven by huge investments into a sum of many tiny innovations.

So I can entertain the scenario. But if you want to claim that the social situation around AGI will drastically change in this way you foresee - not just that it could change in that direction, if somebody makes a big splash that causes everyone else to reevaluate their previous opinions and arrive at yours, but that this social change will occur and you know this now - and that the prerequisite tech path to AGI is known to you, and forces an investment situation that looks like the semiconductor industry - then your "What do you think you know and how do you think you know it?" has some significant explaining to do.

Of course, I do appreciate that such a thing could be knowable, and yet not known to me. I'm not so silly as to disbelieve in secrets like that. They're all over the actual history of technological progress on our actual Earth.


Paul Christiano

I stand ready to bet with Eliezer on any topic related to AI, science, or technology. I'm happy for him to pick but I suggest some types of forecast below.

If Eliezer’s predictions were roughly as good as mine (in cases where we disagree), then I would update towards taking his views more seriously. Right now it looks to me like his view makes bad predictions about lots of everyday events.

It’s possible that we won’t be able to find cases where we disagree, and perhaps that Eliezer’s model totally agrees with mine until we develop AGI. But I think that’s unlikely for a few reasons:

I constantly see observations that seem like evidence for Eliezer’s views (e.g. any time I see an ML paper with a surprisingly large effect size, or ML labs failing to make investments in scaling, or people being surprisingly unreasonable), it’s just that I see significantly more evidence against his views. The point of making bets in advance is that it can correct for my hindsight bias or for my inability to simulate “what Eliezer’s view would say about this.” Eliezer could also say that actually all of the observations I listed aren't evidence for his view, which would be interesting to me.
Eliezer frequen...

Eliezer Yudkowsky

I do wish to note that we spent a fair amount of time on Discord trying to nail down what earlier points we might disagree on, before the world started to end, and these Discord logs should be going up later.

From my perspective, the basic problem is that Eliezer's story looks a lot like "business as usual until the world starts to end sharply", and Paul's story looks like "things continue smoothly until their smooth growth ends the world smoothly", and both of us have ever heard of superforecasting and both of us are liable to predict near-term initial segments by extrapolating straight lines while those are available. Another basic problem, as I'd see it, is that we tend to tell stories about very different subject matters - I care a lot less than Paul about the quantitative monetary amount invested into Intel, to the point of not really trying to develop expertise about that.

I claim that I came off better than Robin Hanson in our FOOM debate compared to the way that history went. I'd claim that my early judgments of the probable importance of AGI, at all, stood up generally better than early non-Yudkowskian EA talking about that. Other people I've noticed ever m...

Paul Christiano

From my perspective, the basic problem is that Eliezer's story looks a lot like "business as usual until the world starts to end sharply", and Paul's story looks like "things continue smoothly until their smooth growth ends the world smoothly", and both of us have ever heard of superforecasting and both of us are liable to predict near-term initial segments by extrapolating straight lines while those are available.

I agree that it's plausible that we both make the same predictions about the near future. I think we probably don't, and there are plenty of disagreements about all kinds of stuff. But if in fact we agree, then in 5 years you shouldn't say "and see how much the world looked like I said?"

It feels to me like it goes: you say AGI will look crazy. Then I say that sounds unlike the world of today. Then you say "no, the world actually always looks discontinuous in the ways I'm predicting and your model is constantly surprised by real stuff that happens, e.g. see transformers or AlphaGo" and then I say "OK, let's bet about literally anything at all, you pick."

I think it's pretty likely that we actually do disagree about how much the world of today is boring and continuo...

Eliezer Yudkowsky

I feel a bit confused about where you think we meta-disagree here, meta-policy-wise. If you have a thesis about the sort of things I'm liable to disagree with you about, because you think you're more familiar with the facts on the ground, can't you write up Paul's View of the Next Five Years and then, if I disagree with it, better yet; but if not, you still get to be right and collect Bayes points for the Next Five Years?

I mean, it feels to me like this should be a case similar to where, for example, I think I know more about macroeconomics than your typical EA; so if I wanted to expend the time/stamina points, I could say a bunch of things I consider obvious and that contradict hot takes on Twitter and many EAs would go "whoa wait really" and then I could collect Bayes points later and have performed a public service, even if nobody showed up to disagree with me about that. (The reason I don't actually do this... is that I tried; I keep trying to write a book about basic macro, only it's the correct version explained correctly, and have a bunch of isolated chapters and unfinished drafts.) I'm also trying to write up my version of The Next Five Years assuming the wo...

Paul Christiano

I think you think there's a particular thing I said which implies that the ball should be in my court to already know a topic where I make a different prediction from what you do.

I've said I'm happy to bet about anything, and listed some particular questions I'd bet about where I expect you to be wronger. If you had issued the same challenge to me, I would have picked one of the things and we would have already made some bets. So that's why I feel like the ball is in your court to say what things you're willing to make forecasts about.

That said, I don't know if making bets is at all a good use of time. I'm inclined to do it because I feel like your view really should be making different predictions (and I feel like you are participating in good faith and in fact would end up making different predictions). And I think it's probably more promising than trying to hash out the arguments since at this point I feel like I mostly know your position and it's incredibly slow going. But it seems very plausible that the right move is just to agree to disagree and not spend time on this. In that case it was particularly bad of me to try to claim the epistemic high ground. I can't really defend...

Eliezer Yudkowsky

I think you are underconfident about the fact that almost all AI profits will come from areas that had almost-as-much profit in recent years. So we could bet about where AI profits are in the near term, or try to generalize this.

I wouldn't be especially surprised by waifutechnology or machine translation jumping to newly accessible domains (the thing I care about and you shrug about (until the world ends)), but is that likely to exhibit a visible economic discontinuity in profits (which you care about and I shrug about (until the world ends))? There's apparently already mass-scale deployment of waifutech in China to forlorn male teenagers, so maybe you'll say the profits were already precedented. Google offers machine translation now, even though they don't make much obvious measurable profit on that, but maybe you'll want to say that however much Google spends on that, they must rationally anticipate at least that much added revenue. Or perhaps you want to say that "almost all AI profits" will come from robotics over the same period. Or maybe I misunderstand your viewpoint, and if you said something concrete about the stuff you care about, I would manage to disagree with that; or maybe you think that waifutech suddenly getting much more charming with the next generation of text transformers is something you already know enough to rule out; or maybe you think that 2024's waifutech should definitely be able to do some observable surface-level thing it can't do now.

Paul Christiano

I'd be happy to disagree about romantic chatbots or machine translation. I'd have to look into it more to get a detailed sense of either, but I can guess. I'm not sure what "wouldn't be especially surprised" means; I think to actually get disagreements we need way more resolution than that, so one question is whether you are willing to play ball (since presumably you'd also have to look into it to get a more detailed sense). Maybe we could save labor if people would point out the empirical facts we're missing and we can revise in light of that, but we'd still need more resolution. (That said: what's up for grabs here are predictions about the future, not present.)

I'd guess that machine translation is currently something like $100M/year in value, and will scale up more like 2x/year than 10x/year as DL improves (e.g. most of the total log increase will be in years with <3x increase rather than >3x increase, and 3 is like the 60th percentile of the number for which that inequality is tight).

I'd guess that increasing deployment of romantic chatbots will end up with technical change happening first followed by social change second, so the speed of deployment and change will depend ...

Eliezer Yudkowsky

Thanks for continuing to try on this! Without having spent a lot of labor myself on looking into self-driving cars, I think my sheer impression would be that we'll get $1B/yr waifutech before we get AI freedom-of-the-road; though I do note again that current self-driving tech would be more than sufficient for $10B/yr revenue if people built new cities around the AI tech level, so I worry a bit about some restricted use-case of self-driving tech that is basically possible with current tech finding some less regulated niche worth a trivial $10B/yr. I also remark that I wouldn't be surprised to hear that waifutech is already past $1B/yr in China, but I haven't looked into things there. I don't expect the waifutech to transcend my own standards for mediocrity, but something has to be pretty good before I call it more than mediocre; do you think there are particular things that waifutech won't be able to do?

My model permits large jumps in ML translation adoption; it is much less clear about whether anyone will be able to build a market moat and charge big prices for it. Do you have a similar intuition about # of users increasing gradually, not just revenue increasing gradually?

I think we're still at the level of just drawing images about the future, so that anybody who came back in 5 years could try to figure out who sounded right, at all, rather than assembling a decent portfolio of bets; but I also think that just having images versus no images is a lot of progress.

Paul Christiano

Yes, I think that value added by automated translation will follow a similar pattern. Number of words translated is more sensitive to how you count and random nonsense, as is number of "users" which has even more definitional issues. You can state a prediction about self-driving cars in any way you want. The obvious thing is to talk about programs similar to the existing self-driving taxi pilots (e.g. Waymo One) and ask when they do $X of revenue per year, or when $X of self-driving trucking is done per year. (I don't know what AI freedom-of-the-road means, do you mean something significantly more ambitious than self-driving trucks or taxis?)

Paul Christiano

Man, the problem is that you say the "jump to newly accessible domains" will be the thing that lets you take over the world. So what's up for dispute is the prototype being enough to take over the world rather than years of progress by a giant lab on top of the prototype. It doesn't help if you say "I expect new things to sometimes become possible" if you don't further say something about the impact of the very early versions of the product. If e.g. people were spending $1B/year developing a technology, and then after a while it jumps from $0/year to $1B/year of profit, I'm not that surprised. (Note that machine translation is radically smaller than this, I don't know the numbers.) I do suspect they could have rolled out a crappy version earlier, perhaps by significantly changing their project. But why would they necessarily bother doing that? For me this isn't violating any of the principles that make your stories sound so crazy. The crazy part is someone spending $1B and then generating $100B/year in revenue (much less $100M and then taking over the world). (Note: it is surprising if an industry is spending $10T/year on R&D and then jumps from $1T --> $10T of revenue in one year in a world that isn't yet growing crazily. The surprise depends a lot on the numbers involved, and in particular on how valuable it would have been to deploy a worse version earlier and how hard it is to raise money at different scales.)

Eliezer Yudkowsky

The crazy part is someone spending $1B and then generating $100B/year in revenue (much less $100M and then taking over the world).

Would you say that this is a good description of Suddenly Hominids but you don't expect that to happen again, or that this is a bad description of hominids?

Paul Christiano

It's not a description of hominids at all, no one spent any money on R&D. I think there are analogies where this would be analogous to hominids (which I think are silly, as we discuss in the next part of this transcript). And there are analogies where this is a bad description of hominids (which I prefer).

Adele Lopez

Spending money on R&D is essentially the expenditure of resources in order to explore and optimize over a promising design space, right? That seems like a good description of what natural selection did in the case of hominids. I imagine this still sounds silly to you, but I'm not sure why. My guess is that you think natural selection isn't relevantly similar because it didn't deliberately plan to allocate resources as part of a long bet that it would pay off big.

Paul Christiano

I think natural selection has lots of similarities to R&D, but (i) there are lots of ways of drawing the analogy, (ii) some important features of R&D are missing in evolution, including some really important ones for fast takeoff arguments (like the existence of actors who think ahead). If someone wants to spell out why they think evolution of hominids means takeoff is fast then I'm usually happy to explain why I disagree with their particular analogy. I think this happens in the next discord log between me and Eliezer.

Paul Christiano

My uncharitable read on many of these domains is that you are saying "Sure, I think that Paul might have somewhat better forecasts than me on those questions, but why is that relevant to AGI?"

In that case it seems like the situation is pretty asymmetrical. I'm claiming that my view of AGI is related to beliefs and models that also bear on near-term questions, and I expect to make better forecasts than you in those domains because I have more accurate beliefs/models. If your view of AGI is unrelated to any near-term questions where we disagree, then that seems like an important asymmetry.

Paul Christiano

Inevitably, you can go back afterwards and claim it wasn't really a surprise in terms of the abstractions that seem so clear and obvious now, but I think it was a surprise then

It seems like you are saying that there is some measure that was continuous all along, but that it's not obvious in advance which measure was continuous. That seems to suggest that there are a bunch of plausible measures you could suggest in advance, and lots of interesting action will be from changes that are discontinuous changes on some of those measures. Is that right?

If so, don't we get out a ton of predictions? Like, for every particular line someone thinks might be smooth, the gradualist has a higher probability on it being smooth than you would? So why can't I just start naming some smooth lines (like any of the things I listed in the grandparent) and then we can play ball?

If not, what's your position? Is it that you literally can't think of the possible abstractions that would later make the graph smooth? (This sounds insane to me.)

Paul Christiano

sort of person who gets taken in by Hanson's arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2

I find this kind of bluster pretty frustrating and condescending. I also feel like the implication is just wrong---if Eliezer and I disagree, I'd guess it's because he's worse at predicting ML progress. To me GPT-3 feels much (much) closer to my mainline than to Eliezer's, and AlphaGo is very unsurprising. But it's hard to say who was actually "caught flatfooted" unless we are willing to state some of these predictions in advance.

I got pulled into this interaction because I wanted to get Eliezer to make some real predictions, on the record, so that we could have a better version of this discussion in 5 years rather than continuing to both say "yeah, in hindsight this looks like evidence for my view." I apologize if my tone (both in that discussion and in this comment) is a bit frustrated.

It currently feels from the inside like I'm holding the epistemic high ground on this point, though I expect Eliezer disagrees strongly:

I'm willing to bet on anything Eliezer wants, or to propose my own questions if Eliezer is willing in principle to make forecasts. I expect...

Eliezer Yudkowsky

I wish to acknowledge this frustration, and state generally that I think Paul Christiano occupies a distinct and more clueful class than a lot of, like, early EAs who mm-hmmmed along with Robin Hanson on AI - I wouldn't put, eg, Dario Amodei in that class either, though we disagree about other things.

But again, Paul, it's not enough to say that you weren't surprised by GPT-2/3 in retrospect, it kinda is important to say it in advance, ideally where other people can see? Dario picks up some credit for GPT-2/3 because he clearly called it in advance. You don't need to find exact disagreements with me to start going on the record as a forecaster, if you think the course of the future is generally narrower than my own guesses - if you think that trends stay on course, where I shrug and say that they might stay on course or break. (Except that of course in hindsight somebody will always be able to draw a straight-line graph, once they know which graph to draw, so my statement "it might stay on trend or maybe break" applies only to graphs extrapolating into what is currently the future.)

Paul Christiano

Suppose your view is "crazy stuff happens all the time" and my view is "crazy stuff happens rarely." (Of course "crazy" is my word, to you it's just normal stuff.) Then what am I supposed to do, in your game?

More broadly: if you aren't making bold predictions about the future, why do you think that other people will? (My predictions all feel boring to me.) And if you do have bold predictions, can we talk about some of them instead?

It seems to me like I want you to say "well I think 20% chance something crazy happens here" and I say "nah, that's more like 5%" and then we batch up 5 of those and when none of them happen I get a bayes point.
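"Bayes point" is used informally here. One common way to make it concrete (an assumption for illustration, not necessarily what Paul meant) is the likelihood ratio between the two forecasters on the observed outcomes, expressed in bits. A minimal sketch, assuming five hypothetical independent questions, each forecast at 20% by Eliezer and 5% by Paul, none of which happen:

```python
import math

# Hypothetical forecasts: each forecaster's probability that the
# "crazy thing" happens, for five independent questions.
eliezer_probs = [0.20] * 5
paul_probs = [0.05] * 5
outcomes = [False] * 5  # None of the five events happened.


def likelihood(probs, outcomes):
    """Probability a forecaster assigned to the observed outcomes,
    assuming the questions are independent."""
    result = 1.0
    for p, happened in zip(probs, outcomes):
        result *= p if happened else (1.0 - p)
    return result


l_paul = likelihood(paul_probs, outcomes)        # 0.95 ** 5
l_eliezer = likelihood(eliezer_probs, outcomes)  # 0.80 ** 5
ratio = l_paul / l_eliezer
bits = math.log2(ratio)  # Evidence favoring the better-calibrated forecaster.

print(f"likelihood ratio {ratio:.2f}, about {bits:.2f} bits toward Paul")
```

With these made-up numbers the ratio is roughly 2.36, a bit over one bit of evidence. That is why batching several such questions matters: no single 20%-versus-5% disagreement moves anyone very far.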

I could just give my forecast. But then if I observe that 2/20 of them happen, how exactly does that help me in figuring out whether I should be paying more attention to your views (or help you snap out of it)?

I can list some particular past bets and future forecasts, but it's really unclear what to do with them without quantitative numbers or a point of comparison.

Like you I've predicted that AI is undervalued and will grow in importance, although I think I made a much more specific prediction that investment in AI would go up a lot in the short t...

Eliezer Yudkowsky

I predict that people will explicitly collect much larger datasets of human behavior as the economic stakes rise. This is in contrast to e.g. theorem-proving working well, although I think that theorem-proving may end up being an important bellwether because it allows you to assess the capabilities of large models without multi-billion-dollar investments in training infrastructure.

Well, it sounds like I might be more bullish than you on theorem-proving, possibly. Not on it being useful or profitable, but in terms of underlying technology making progress on non-profitable amazing demo feats, maybe I'm more bullish on theorem-proving than you are? Is there anything you think it shouldn't be able to do in the next 5 years?

Paul Christiano

I'm going to make predictions by drawing straight-ish lines through metrics like the ones in the gpt-f paper. Big unknowns are then (i) how many orders of magnitude of "low-hanging fruit" are there before theorem-proving even catches up to the rest of NLP? (ii) how hard their benchmarks are compared to other tasks we care about. On (i) my guess is maybe 2? On (ii) my guess is "they are pretty easy" / "humans are pretty bad at these tasks," but it's somewhat harder to quantify. If you think your methodology is different from that then we will probably end up disagreeing.
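Paul's "straight-ish lines" method can be sketched as an ordinary least-squares fit followed by extrapolation. This is a minimal illustration, assuming the metric has already been put on an axis where the trend looks linear (for example, plotted against year or log-compute); `fit_line` and `extrapolate` are hypothetical names, and nothing here reproduces the gpt-f paper's actual analysis.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate(xs: list[float], ys: list[float], x_new: float) -> float:
    """Predict the metric at x_new by extending the fitted straight line."""
    slope, intercept = fit_line(xs, ys)
    return slope * x_new + intercept
```

With only a handful of early points the fitted slope is poorly constrained, which is one reason a forecast made this way gets much less reliable the further past the data it is pushed.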

Looking towards more ambitious benchmarks, I think that the IMO grand challenge is currently significantly more than 5 years away. In 5 years' time my median guess (without almost any thinking about it) is that automated solvers can do 10% of non-geometry, non-3-variable-inequality IMO shortlist problems.

So yeah, I'm happy to play ball in this area, and I expect my predictions to be somewhat more right than yours after the dust settles. Is there some way of measuring such that you are willing to state any prediction?

(I still feel like I'm basically looking for any predictions at all beyond sometimes saying "my model ...

Eliezer Yudkowsky

I have a sense that there's a lot of latent potential for theorem-proving to advance if more energy gets thrown at it, in part because current algorithms seem a bit weird to me - that we are waiting on the equivalent of neural MCTS as an enabler for AlphaGo, not just a bigger investment, though of course the key trick could already have been published in any of a thousand papers I haven't read. I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO challenge falls in 2024 - though of course, as events like this lie in the Future, they are very hard to predict.

Can you say more about why or whether you would, in this case, say that this was an un-Paulian set of events? As I have trouble manipulating my Paul model, it does not exclude Paul saying, "Ah, yes, well, they were using 700M models in that paper, so if you jump to 70B, of course the IMO grand challenge could fall; there wasn't a lot of money there." Though I haven't even glanced at any metrics here, let alone metrics that the IMO grand challenge could be plotted on, so if smooth metrics rule out IMO in 5yrs, I am more interested yet - it legit decrements my belief, but not nearly as much as I imagine it would decrement yours.

(Edit: Also, on the meta-level, is this, like, anywhere at all near the sort of thing you were hoping to hear from me? Am I now being a better epistemic citizen, if maybe not a good one by your lights?)

Paul Christiano

Yes, IMO challenge falling in 2024 is surprising to me at something like the 1% level or maybe even more extreme (though could also go down if I thought about it a lot or if commenters brought up relevant considerations, e.g. I'd look at IMO problems and gold medal cutoffs and think about what tasks ought to be easy or hard; I'm also happy to make more concrete per-question predictions). I do think that there could be huge amounts of progress from picking the low-hanging fruit and scaling up spending by a few orders of magnitude, but I still don't expect it to get you that far.

I don't think this is an easy prediction to extract from a trendline, in significant part because you can't extrapolate trendlines this early that far out. So this is stress-testing different parts of my model, which is fine by me.

At the meta-level, this is the kind of thing I'm looking for, though I'd prefer to have some kind of quantitative measure of how not-surprised you are. If you are only saying 2% then we probably want to talk about things less far in your tails than the IMO challenge.

Eliezer Yudkowsky

Okay, then we've got at least one Eliezerverse item, because I've said below that I think I'm at least 16% for IMO theorem-proving by end of 2025. The drastic difference here causes me to feel nervous, and my second-order estimate has probably shifted some in your direction just from hearing you put 1% on 2024, but that's irrelevant because it's first-order estimates we should be comparing here.

So we've got huge GDP increases for before-End-days signs of Paulverse and quick IMO proving for before-End-days signs of Eliezerverse? Pretty bare portfolio but it's at least a start in both directions. If we say 5% instead of 1%, how much further would you extend the time limit out beyond 2024?

I also don't know at all what part of your model forbids theorem-proving to fall in a shocking headline followed by another headline a year later - it doesn't sound like it's from looking at a graph - and I think that explaining reasons behind our predictions in advance, not just making quantitative predictions in advance, will help others a lot here.

EDIT: Though the formal IMO challenge has a barnacle about the AI being open-sourced, which is a separate sociological prediction I'm not taking on.

Paul Christiano

I think IMO gold medal could be well before massive economic impact; I'm just surprised if it happens in the next 3 years. After a bit more thinking (but not actually looking at IMO problems or the state of theorem proving) I probably want to bump that up a bit, maybe 2%; it's hard to reason about the tails.

I'd say <4% on end of 2025.

I think this is the flipside of me having an intuition where I say things like "AlphaGo and GPT-3 aren't that surprising"---I have a sense for what things are and aren't surprising, and not many things happen that are so surprising.

If I'm at 4% and you are 12% and we had 8 such bets, then I can get a factor of 2 if they all come out my way, and you get a factor of ~1.5 if one of them comes out your way.
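The arithmetic behind those factors can be checked with a likelihood ratio over independent yes/no events. The sketch below assumes all 8 bets are independent and that each side gives the same probability to every event; `likelihood_ratio` is a hypothetical helper name, and the binomial coefficient is identical for both forecasters, so it cancels. Computed exactly, the no-hit case is about 2.0 in Paul's favor and the one-hit case comes out nearer 1.6 than 1.5 in Eliezer's favor.

```python
def likelihood_ratio(p_a: float, p_b: float, n_bets: int, n_happened: int) -> float:
    """P(observed outcomes | forecaster A) / P(observed outcomes | forecaster B).

    Assumes n_bets independent events, with A giving every event probability p_a
    and B giving every event probability p_b. Values above 1 favor A.
    """
    n_missed = n_bets - n_happened
    like_a = p_a ** n_happened * (1 - p_a) ** n_missed
    like_b = p_b ** n_happened * (1 - p_b) ** n_missed
    return like_a / like_b


paul, eliezer = 0.04, 0.12
print(likelihood_ratio(paul, eliezer, 8, 0))  # none happen: about 2.0 in Paul's favor
print(likelihood_ratio(eliezer, paul, 8, 1))  # one happens: about 1.6 in Eliezer's favor
```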

I might think more about this and get a more coherent probability distribution, but unless I say something else by end of 2021 you can consider 4% by end of 2025 my prediction.

Eliezer Yudkowsky

Maybe another way of phrasing this - how much warning do you expect to get, how far out does your Nope Vision extend? Do you expect to be able to say "We're now in the 'for all I know the IMO challenge could be won in 4 years' regime" more than 4 years before it happens, in general? Would it be fair to ask you again at the end of 2022 and every year thereafter if we've entered the 'for all I know, within 4 years' regime?

Added: This question fits into a larger concern I have about AI soberskeptics in general (not you, the soberskeptics would not consider you one of their own) where they saunter around saying "X will not occur in the next 5 / 10 / 20 years" and they're often right for the next couple of years, because there's only one year where X shows up for any particular definition of that, and most years are not that year; but also they're saying exactly the same thing up until 2 years before X shows up, if there's any early warning on X at all. It seems to me that 2 years is about as far as Nope Vision extends in real life, for any case that isn't completely slam-dunk; when I called upon those gathered AI luminaries to say the least impressive thing that...

Paul Christiano

I think I'll get less confident as our accomplishments get closer to the IMO grand challenge. Or maybe I'll get much more confident if we scale up from $1M -> $1B and pick the low-hanging fruit without getting fairly close, since at that point further progress gets a lot easier to predict.

There's not really a constant time horizon for my pessimism, it depends on how long and robust a trend you are extrapolating from. 4 years feels like a relatively short horizon, because theorem-proving has not had much investment so compute can be scaled up several orders of magnitude, and there is likely lots of low-hanging fruit to pick, and we just don't have much to extrapolate from (compared to more mature technologies, or how I expect AI will be shortly before the end of days), and for similar reasons there aren't really any benchmarks to extrapolate.

(Also note that it matters a lot whether you know what problems labs will try to take a stab at. For the purpose of all of these forecasts, I am trying insofar as possible to set aside all knowledge about what labs are planning to do though that's obviously not incentive-compatible and there's no particular reason you should trust me to do that.)

Matthew Barnett

I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO challenge falls in 2024

Possibly helpful: Metaculus currently puts the chances of the IMO grand challenge falling by 2025 at about 8%. Their median is 2039.

I think this would make a great bet, as it would definitely show that your model can strongly outperform a lot of people (and potentially Paul too). And the operationalization for the bet is already there -- so little work will be needed to do that part.

Eliezer Yudkowsky

Ha! Okay then. My probability is at least 16%, though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more. Paul?

EDIT: I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists. I'll stand by a >16% probability of the technical capability existing by end of 2025, as reported on eg solving a non-trained/heldout dataset of past IMO problems, conditional on such a dataset being available; I frame no separate sociological prediction about whether somebody is willing to open-source the AI model that does it.

Paul Christiano

I don't care about whether the AI is open-sourced (I don't expect anyone to publish the weights even if they describe their method) and I'm not that worried about our ability to arbitrate overfitting.

Ajeya suggested that I clarify: I'm significantly more impressed by an AI getting a gold medal than getting a bronze, and my 4% probability is for getting a gold in particular (as described in the IMO grand challenge). There are some categories of problems that can be solved using easy automation (I'd guess about 5-10% could be done with no deep learning and modest effort). Together with modest progress in deep learning based methods, and a somewhat serious effort, I wouldn't be surprised by people getting up to 20-40% of problems. The bronze cutoff is usually 3/6 problems, and the gold cutoff is usually 5/6 (assuming the AI doesn't get partial credit). The difficulty of problems also increases very rapidly for humans---there are often 3 problems that a human can do more-or-less mechanically.

I could tighten any of these estimates by looking at the distribution more carefully rather than going off of my recollections from 2008, and if this was going to be one of a handful of things we'd bet about I'd probably spend a few hours doing that and some other basic digging.

Paul Christiano

I looked at a few recent IMOs to get better calibrated. I think the main update is that I significantly underestimated how many years you can get a gold with only 4/6 problems.

For example I don't have the same "this is impossible" reaction about IMO 2012 or IMO 2015 as about most years. That said, I feel like they do have to get reasonably lucky with the IMO content, and someone has to make a serious and mostly-successful effort, but I'm at least a bit scared by that. There's also quite often a geo problem as 3 or 6.

Might be good to make some side bets:

Conditioned on winning I think it's only maybe 20% probability to get all 6 problems (whereas I think you might have a higher probability on jumping right past human level, or at least have 50% on 6 vs 5?).
Conditioned on a model getting 3+ problems I feel like we have a pretty good guess about what algorithm will be SOTA on this problem (e.g. I'd give 50% to a pretty narrow class of algorithms with some uncertain bells and whistles, with no inside knowledge). Whereas I'd guess you have a much broader distribution.

But more useful to get other categories of bets. (Maybe in programming, investment in AI, economic impact from robotics, economic impact from chatbots, translation?)

Paul Christiano

Going through the previous ten IMOs, and imagining a very impressive automated theorem prover, I think

2020 - unlikely, need 5/6 and probably can't get problems 3 or 6. Also good chance to mess up at 4 or 5
2019 - tough but possible, 3 seems hard but even that is not unimaginable, 5 might be hard but might be straightforward, and it can afford to get one wrong
2018 - tough but possible, 3 is easier for machine than human but probably still hard, 5 may be hard, can afford to miss one
2017 - tough but possible, 3 looks out of reach, 6 looks hard but not sure about that, 5 looks maybe hard, 1 is probably easy. But it can miss 2, which could happen.
2016 - probably not possible, 3 and 6 again look hard, and good chance to fail on 2 and 5, only allowed to miss 1
2015 - seems possible, 3 might be hard but like 50-50 it's simple for machine, 6 is probably hard, but you can miss 2
2014 - probably not possible, can only miss 1, probably miss one of 2 or 5 and 6
2013 - probably not possible, 6 seems hard, 2 seems very hard, can only miss 1
2012 - tough but possible, 6 and 3 look hard but you can miss 2
2011 - seems possible, allowed to miss two and both 3 and 6 look brute-forceable

Overall this...

gwern

What do you think of DeepMind's new whoop-de-doo about doing research-level math assisted by GNNs?

Paul Christiano

Not surprising in any of the ways that good IMO performance would be surprising.

Paul Christiano

Based on the other thread I now want to revise this prediction, both because 4% was too low and "IMO gold" has a lot of noise in it based on test difficulty. I'd put 4% on "For the 2022, 2023, 2024, or 2025 IMO an AI built before the IMO is able to solve the single hardest problem" where "hardest problem" = "usually problem #6, but use problem #3 instead if either: (i) problem 6 is geo or (ii) problem 3 is combinatorics and problem 6 is algebra." (Would prefer just pick the hardest problem after seeing the test but seems better to commit to a procedure.) Maybe I'll go 8% on "gets gold" instead of "solves hardest problem." Would be good to get your updated view on this so that we can treat it as staked out predictions.
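Because the resolution rule is meant to be committed to in advance, it can be written down as a small function. This is only a sketch of the rule as stated; the topic labels (`"algebra"`, `"combinatorics"`, `"geometry"`, `"number theory"`) are assumed names for the four standard IMO subject areas, and `hardest_problem` is a hypothetical helper.

```python
IMO_TOPICS = {"algebra", "combinatorics", "geometry", "number theory"}


def hardest_problem(topic_3: str, topic_6: str) -> int:
    """Problem number that counts as 'hardest' under the committed procedure."""
    for topic in (topic_3, topic_6):
        if topic not in IMO_TOPICS:
            raise ValueError(f"unknown topic: {topic!r}")
    if topic_6 == "geometry":
        return 3  # (i) problem 6 is geo
    if topic_3 == "combinatorics" and topic_6 == "algebra":
        return 3  # (ii) problem 3 is combinatorics and problem 6 is algebra
    return 6  # default: usually problem 6


print(hardest_problem("algebra", "geometry"))  # 3
```

Applied to each of the 2022 to 2025 contests, this picks one problem per year without anyone having to judge difficulty after seeing the test.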

Ben Pace

(News: OpenAI has built a theorem-prover that solved many AMC12 and AIME competition problems, and 2 IMO problems, and they say they hope this leads to work that wins the IMO Grand Challenge.)

Matthew Barnett

It feels like this bet would look a lot better if it were about something that you predict at well over 50% (with people in Paul's camp still maintaining less than 50%). So, we could perhaps modify the terms such that the bot would only need to surpass a certain rank or percentile-equivalent in the competition (and not necessarily receive the equivalent of a Gold medal). The relevant question is which rank/percentile you think is likely to be attained by 2025 under your model but you predict would be implausible under Paul's model. This may be a daunting task, but one way to get started is to put a probability distribution over what you think the state-of-the-art will look like by 2025, and then compare to Paul's. Edit: Here are, for example, the individual rankings for 2021: https://www.imo-official.org/year_individual_r.aspx?year=2021

Eliezer Yudkowsky

I expect it to be hella difficult to pick anything where I'm at 75% that it happens in the next 5 years and Paul is at 25%. Heck, it's not easy to find things where I'm at over 75% that aren't just obvious slam dunks; the Future isn't that easy to predict. Let's get up to a nice crawl first, and then maybe a small portfolio of crawlings, before we start trying to make single runs that pierce the sound barrier.

I frame no prediction about whether Paul is under 16%. That's a separate matter. I think a little progress is made toward eventual epistemic virtue if you hand me a Metaculus forecast and I'm like "lol wut" and double their probability, even if it turns out that Paul agrees with me about it.

Rob Bensinger

My model of Eliezer may be wrong, but I'd guess that this isn't a domain where he has many over-50% predictions of novel events at all? See also 'I don't necessarily expect self-driving cars before the apocalypse'. My Eliezer-model has a more flat prior over what might happen, which therefore includes stuff like 'maybe we'll make insane progress on theorem-proving (or whatever) out of the blue'. Again, I may be wrong, but my intuition is that you're Paul-omorphizing Eliezer when you assume that >16% probability of huge progress in X by year Y implies >50% probability of smaller-but-meaningful progress in X by year Y.

Rob Bensinger

(Ah, EY already replied.)

Matthew Barnett

If this task is bad for operationalization reasons, there are other theorem proving benchmarks. Unfortunately it looks like there aren't a lot of people that are currently trying to improve on the known benchmarks, as far as I'm aware. The code generation benchmarks are slightly more active. I'm personally partial to Hendrycks et al.'s APPS benchmark, which includes problems that "range in difficulty from introductory to collegiate competition level and measure coding and problem-solving ability." (Github link).

Paul Christiano

I think Metaculus is closer to Eliezer here: conditioned on this problem being resolved it seems unlikely for the AI to be either open-sourced or easily reproducible.

Matthew Barnett

My honest guess is that most predictors didn’t see that condition and the distribution would shift right if someone pointed that out in the comments.

Matthew Barnett

To add to this sentiment, I'll post the graph from my notebook on language model progress. I refer to the Penn Treebank task a lot when making this point because it seems to have a lot of good data, but you can also look at the other tasks and see basically the same thing. The last dip in the chart is from GPT-3. It looks like GPT-3 was indeed a discontinuity in progress but not a very shocking one. It roughly would have taken about one or two more years at ordinary progress to get to that point anyway -- which I just don't see as being all that impressive. I sorta feel like the main reason why lots of people found GPT-3 so impressive was because OpenAI was just good at marketing the results [ETA: sorry, I take back the use of the word "marketing"]. Maybe OpenAI saw an opportunity to dump a lot of compute into language models and have a two year discontinuity ahead of everyone else, and showcase their work. And that strategy seemed to really work well for them. I admit this is an uncharitable explanation, but is there a better story to tell about why GPT-3 captured so much attention?
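The "one or two more years at ordinary progress" claim amounts to asking how far along a fitted trend an observed result sits. A minimal sketch, assuming the trend is already linear in the plotted units (for perplexity one would normally fit on a log scale first) and that its slope and intercept are known; `years_ahead_of_trend` is a hypothetical name and no real Penn Treebank numbers are used.

```python
def years_ahead_of_trend(slope: float, intercept: float, year: float, observed: float) -> float:
    """Years of ordinary progress separating `observed` from the trend at `year`.

    The trend is metric = slope * year + intercept. The result is positive when
    the observation beats the trend, for falling metrics (negative slope, e.g. a
    log-perplexity curve) as well as rising ones.
    """
    if slope == 0:
        raise ValueError("a flat trend gives no notion of years of progress")
    predicted = slope * year + intercept
    # Solve slope * (year + d) + intercept == observed for d.
    return (observed - predicted) / slope
```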

gwern

The impact of GPT-3 had nothing whatsoever to do with its perplexity on Penn Treebank. I think this is a good example of why focusing on perplexity and 'straight lines on graph go brr' is so terrible, such cargo cult mystical thinking, and crippling. There's something astonishing to see someone resort to explaining away GPT-3's impact as 'OpenAI was just good at marketing the results'. Said marketing consisted of: 'dropping a paper on Arxiv'. Not even tweeting it! They didn't even tweet the paper! (Forget an OA blog post, accompanying NYT/TR articles, tweets by everyone at OA, a fancy interactive interface - none of that.) And most of the initial reaction was "GPT-3: A Disappointing Paper"-style. If this is marketing genius, then it is truly 40-d chess, is all I can say.

The impact of GPT-3 was in establishing that trendlines did continue in a way that shocked pretty much everyone who'd written off 'naive' scaling strategies. Progress is made out of stacked sigmoids: if the next sigmoid doesn't show up, progress doesn't happen. Trends happen, until they stop. Trendlines are not caused by the laws of physics. You can dismiss AlphaGo by saying "oh, that just continues the trendline in...
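The "stacked sigmoids" picture can be illustrated with a toy model in which each new approach contributes one logistic curve of equal height. This is an assumption made purely for illustration, not a model anyone in the thread proposed, and `stacked_progress` is a hypothetical name: while new curves keep arriving the total climbs steadily, and if they stop arriving the total plateaus at the number of curves already used up.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def stacked_progress(t: float, midpoints: list[float], width: float = 1.0) -> float:
    """Total capability at time t: one logistic curve (height 1) per approach,
    each centred on its own midpoint."""
    return sum(sigmoid((t - m) / width) for m in midpoints)


# Hypothetical schedules: new approaches keep arriving vs. they stop after two.
keeps_going = [0, 2, 4, 6, 8]
stalls = [0, 2]
for t in range(0, 11, 2):
    print(t, round(stacked_progress(t, keeps_going), 2), round(stacked_progress(t, stalls), 2))
```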

Eliezer Yudkowsky

And to say it also explicitly, I think this is part of why I have trouble betting with Paul. I have a lot of ? marks on the questions that the Gwern voice is asking above, regarding them as potentially important breaks from trend that just get dumped into my generalized inbox one day. If a gradualist thinks that there ought to be a smooth graph of perplexity with respect to computing power spent, in the future, that's something I don't care very much about except insofar as it relates in any known way whatsoever to questions like those the Gwern voice is asking. What does it even mean to be a gradualist about any of the important questions like those of the Gwern-voice, when they don't relate in known ways to the trend lines that are smooth? Isn't this sort of a shell game where our surface capabilities do weird jumpy things, we can point to some trend lines that were nonetheless smooth, and then the shells are swapped and we're told to expect gradualist AGI surface stuff? This is part of the idea that I'm referring to when I say that, even as the world ends, maybe there'll be a bunch of smooth trendlines underneath it that somebody could look back and point out. (Which you could in fact have used to predict all the key jumpy surface thresholds, if you'd watched it all happen on a few other planets and had any idea of where jumpy surface events were located on the smooth trendlines - but we haven't watched it happen on other planets so the trends don't tell us much we want to know.)

Paul Christiano

This seems totally bogus to me.

It feels to me like you mostly don't have views about the actual impact of AI as measured by jobs that it does or the $s people pay for them, or performance on any benchmarks that we are currently measuring, while I'm saying I'm totally happy to use gradualist metrics to predict any of those things. If you want to say "what does it mean to be a gradualist" I can just give you predictions on them.

To you this seems reasonable, because e.g. $ and benchmarks are not the right way to measure the kinds of impacts we care about. That's fine, you can propose something other than $ or measurable benchmarks. If you can't propose anything, I'm skeptical.

My basic guess is that you probably can't effectively predict $ or benchmarks or anything else quantitative. If you actually agreed with me on all that stuff, then I might suspect that you are equivocating between a gradualist-like view that you use for making predictions about everything near term and then switching to a more bizarre perspective when talking about the future. But fortunately I think this is more straightforward, because you are basically being honest when you say that you don't understand how the gradualist perspective makes predictions.

Eliezer Yudkowsky

I kind of want to see you fight this out with Gwern (not least for social reasons, so that people would perhaps see that it wasn't just me, if it wasn't just me).

But it seems to me that the very obvious GPT-5 continuation of Gwern would say, "Gradualists can predict meaningless benchmarks, but they can't predict the jumpy surface phenomena we see in real life." We want to know when humans land on the moon, not whether their brain sizes continued on a smooth trend extrapolated over the last million years.

I think there's a very real sense in which, yes, what we're interested in are milestones, and often milestones that aren't easy to define even after the fact. GPT-2 was shocking, and then GPT-3 carried that shock further in that direction, but how do you talk with that about somebody who thinks that perplexity loss is smooth? I can handwave statements like "GPT-3 started to be useful without retraining via just prompt engineering" but qualitative statements like those aren't good for betting, and it's much much harder to come up with the right milestone like that in advance, instead of looking back in your rearview mirror afterwards.

But you say - I think? - that yo...

Matthew Barnett

Don't you think you're making a falsifiable prediction here? Name something that you consider part of the "jumpy surface phenomena" that will show up substantially before the world ends (that you think Paul doesn't expect). Predict a discontinuity. Operationalize everything and then propose the bet.

Eliezer Yudkowsky

(I'm currently slightly hopeful about the theorem-proving thread, elsewhere and upthread.)

Matthew Barnett

What does it even mean to be a gradualist about any of the important questions like those of the Gwern-voice, when they don't relate in known ways to the trend lines that are smooth?

Perplexity is one general “intrinsic” measure of language models, but there are many task-specific measures too. Studying the relationship between perplexity and task-specific measures is an important part of the research process. We shouldn’t speak as if people do not actively try to uncover these relationships.

I would generally be surprised if there were many highly non-linear relationships between perplexity and something like Winograd accuracy, human evaluation, or whatever other concrete measure you can come up with, such that the underlying behavior of the surface phenomenon is best described as a discontinuity with the past even when the latent perplexity changed smoothly. I admit the existence of some measures that exhibit these qualities (such as, potentially, the ability to do arithmetic), but I expect them to be quite a bit harder to find than the reverse.

Furthermore, it seems like if this is the crux — ie. that surface-level qualitative phenomena will experience discontinuities even while ...

Eliezer Yudkowsky

Well put / endorsed / +1.

Paul Christiano

I think that most people who work on models like GPT-3 seem more interested in trendlines than you do here.

That said, it's not super clear to me what you are saying so I'm not sure I disagree. Your narrative sounds like a strawman since people usually extrapolate performance on downstream tasks they care about rather than on perplexity. But I do agree that the updates from GPT-3 are not from OpenAI's marketing but instead from people's legitimate surprise about how smart big language models seem to be.

As you say, I think the interesting claim in GPT-3 was basically that scaling trends would continue, where pessimists incorrectly expected they would break based on weak arguments. I think that looking at all the graphs, both of perplexity and performance on individual tasks, helps establish this as the story. I don't really think this lines up with Eliezer's picture of AGI but that's presumably up for debate.

There are always a lot of people willing to confidently decree that trendlines will break down without much argument. (I do think that eventually the GPT-3 trendline will break if you don't change the data, but for the boring reason that the entropy of natural language will eventually dominate the gradient noise and so lead to a predictable slowdown.)
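Paul's "predictable slowdown" can be illustrated with a power law that has an irreducible floor, L(C) = E + A * C^(-alpha), where E stands in for the entropy of natural language. The functional form and constants below are assumptions for illustration (this shape is common in the scaling-law literature, but the sketch ignores the gradient-noise part of Paul's argument), and the function names are hypothetical: each further 10x of compute buys a smaller absolute drop in loss as L approaches E.

```python
def loss(compute: float, entropy_floor: float, scale: float, exponent: float) -> float:
    """Toy scaling law: irreducible entropy plus a power-law reducible term."""
    if compute <= 0:
        raise ValueError("compute must be positive")
    return entropy_floor + scale * compute ** (-exponent)


def gain_from_10x(compute: float, entropy_floor: float, scale: float, exponent: float) -> float:
    """Absolute loss reduction from multiplying compute by 10."""
    before = loss(compute, entropy_floor, scale, exponent)
    after = loss(10 * compute, entropy_floor, scale, exponent)
    return before - after


# Hypothetical constants chosen only to make the shape visible.
E, A, ALPHA = 1.7, 1.0, 0.5
for c in (1e0, 1e2, 1e4, 1e6):
    print(f"compute={c:.0e}  loss={loss(c, E, A, ALPHA):.4f}  gain from 10x={gain_from_10x(c, E, A, ALPHA):.4f}")
```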

Matthew Barnett

There's something astonishing to see someone resort to explaining away GPT-3's impact as 'OpenAI was just good at marketing the results'. Said marketing consisted of: 'dropping a paper on Arxiv'. Not even tweeting it!

Yeah, my phrasing there was not ideal here. I regret using the word "marketing", but to be fair, I mostly meant what I said in the next few sentences, "Maybe OpenAI saw an opportunity to dump a lot of compute into language models and have a two year discontinuity ahead of everyone else, and showcase their work. And that strategy seemed to really worked well for them."

Of course, seeing that such an opportunity exists is itself laudable and I give them Bayes points for realizing that scaling laws are important. At the same time, don't you think we would have expected similar results in like two more years at ordinary progress?

I do agree that it's extremely interesting to know why the lines go straight. I feel like I wasn't trying to say that GPT-3 wasn't intrinsically interesting. I was more saying it wasn't unpredictable, in the sense that Paul Christiano would have strongly said "no I do not expect that to happen" in 2018.

gwern

Again, the fact that it is a straight line on a metric which is, if not meaningless, extremely difficult to interpret, is irrelevant. Maybe OA moved up by 2 years. Why would anyone care in the slightest bit? That is, before they knew about how interesting the consequences would be of that small change in BPC?

At the same time, don't you think we would have expected similar results in like two more years at ordinary progress?

Who's 'we', exactly? Who are these people who expected all of this to happen, and are going around saying "ah yes, these BIG-Bench results are exactly as I calculated back in 2018, the capabilities are all emerging like clockwork, each at their assigned BPC; next is capability Z, obviously"? And what are they saying about 500b, 1000b, and so on?

I was more saying it wasn't unpredictable, in the sense that Paul Christiano would have strongly said "no I do not expect that to happen" in 2018.

OK. So can you link me to someone saying in 2018 that we'd see GPT-2-1.5b's behavior at ~1.5b parameters, and that we'd get few-shot metalearning and instructability past that with another OOM? And while you're at it, if it's so predictable, please answer all the other...

Matthew Barnett

Because the point I was trying to make was that the result was relatively predictable? I'm genuinely confused about what you're asking. I get a slight sense that you're interpreting me as saying something about the inherent dullness of GPT-3 or that it doesn't teach us anything interesting about AI, but I don't see myself as saying anything like that. I actually really enjoy reading the output from it, your commentary on it, and what it reveals about the nature of intelligence. I am making purely a point about predictability, and whether the result was a "discontinuity" from past progress, in the sense meant by Paul Christiano (in the way I think he means these things).

"We" in that sentence refers to competent observers in 2018 who predict when we'll get ML milestones mostly by using the outside view, i.e. by extrapolating trends on charts.

No, but:

1. That seems like a different and far more specific question than whether we'd have language models that perform at roughly the same measured level as GPT-3.
2. In general, people make very few specific predictions about what they expect to happen in the future about these sorts of things (though, if I may add, I've been making modest progress trying to fix this broad problem by writing lots of specific questions on Metaculus).

Edouard Harris

I think what gwern is trying to say is that continuous progress on a benchmark like PTB appears (from what we've seen so far) to map to discontinuous progress in qualitative capabilities, in a surprising way which nobody seems to have predicted in advance. Qualitative capabilities are more relevant to safety than benchmark performance is, because while qualitative capabilities include things like "code a simple video game" and "summarize movies with emojis", they also include things like "break out of confinement and kill everyone". It's the latter capability, and not PTB performance, that you'd need to predict if you wanted to reliably stay out of the x-risk regime — and the fact that we can't currently do so is, I imagine, what brought to mind the analogy between scaling and Russian roulette.

I.e., a straight line in domain X is indeed not surprising; what's surprising is the way in which that straight line maps to the things we care about more than X.

(Usual caveats apply here that I may be misinterpreting folks, but that is my best read of the argument.)

Matthew Barnett

I think what gwern is trying to say is that continuous progress on a benchmark like PTB appears (from what we've seen so far) to map to discontinuous progress in qualitative capabilities, in a surprising way which nobody seems to have predicted in advance.

This is a reasonable thesis, and if indeed it's the one Gwern intended, then I apologize for missing it!

That said, I have a few objections:

Isn't it a bit suspicious that the thing-that's-discontinuous is hard to measure, but the thing-that's-continuous isn't? I mean, this isn't totally suspicious, because subjective experiences are often hard to pin down and explain using numbers and statistics. I can understand that, but the suspicion is still there.
"No one predicted X in advance" is only damning to a theory if people who believed that theory were making predictions about it at all. If people who generally align with Paul Christiano were indeed making predictions to the effect of GPT-3 capabilities being impossible or very unlikely within a narrow future time window, then I agree that would be damning to Paul's worldview. But -- and maybe I missed something -- I didn't see that. Did you?
There seems to be an implicit claim that Pa

... (read more)

Matthew "Vaniver" Gray

it seems like extrapolating from the past still gives you a lot better of a model than most available alternatives.

My impression is that some people are impressed by GPT-3's capabilities, whereas your response is "ok, but it's part of the straight-line trend on Penn Treebank; maybe it's a little ahead of schedule, but nothing to write home about." But clearly you and they are focused on different metrics!

That is, suppose it's the case that GPT-3 is the first successfully commercialized language model. (I think in order to make this literally true you have to throw on additional qualifiers that I'm not going to look up; pretend I did that.) So on a graph of "language model of type X revenue over time", total revenue is static at 0 for a long time and then shortly after GPT-3's creation departs from 0.

It seems like the fact that GPT-3 could be commercialized in this way when GPT-2 couldn't is a result of something that Penn Treebank perplexity is sort of pointing at. (That is, it'd be hard to get a model with GPT-3's commercializability but GPT-2's Penn Treebank score.) But what we need in order for the straight line on PTB to be useful as a model for predicting revenue i... (read more)

Matthew Barnett

I think it's the nature of every product that comes on the market that it will experience a discontinuity from having zero revenue to having some revenue at some point. It's an interesting question of when that will happen, and maybe your point is simply that it's hard to predict when that will happen when you just look at the Penn Treebank trend. However, I suspect that the revenue curve will look pretty continuous, now that it's gone from zero to one. Do you disagree? In a world with continuous, gradual progress across a ton of metrics, you're going to get discontinuities from zero to one. I don't think anyone from the Paul camp disagrees with that (in fact, Katja Grace talked about this in her article). From the continuous takeoff perspective, these discontinuities don't seem very relevant unless going from zero to one is very important in a qualitative sense. But I would contend that going from "no revenue" to "some revenue" is not actually that meaningful in the sense of distinguishing AI from the large class of other economic products that have gradual development curves.

Matthew "Vaniver" Gray

your point is simply that it's hard to predict when that will happen when you just look at the Penn Treebank trend.

This is a big part of my point; a smaller elaboration is that it can be easy to trick yourself into thinking that, because you understand what will happen with PTB, you'll understand what will happen with economics/security/etc., when in fact you don't have much understanding of the connection between those, and there might be significant discontinuities. [To be clear, I don't have much understanding of this either; I wish I did!]

For example, I imagine that, by thirty years from now, we'll have language/code models that can do significant security analysis of the code that was available in 2020, and that this would have been highly relevant/valuable to people in 2020 interested in computer security. But when will this happen in the 2020-2050 range that seems likely to me? I'm pretty uncertain, and I expect this to look a lot like 'flicking a switch' in retrospect, even though the leadup to flicking that switch will probably look like smoothly increasing capabilities on 'toy' problems.

[My current guess is that Paul / people in "Paul's camp" would mostly agree with the prev... (read more)

Edouard Harris

Yeah, these are interesting points. I sympathize with this view, and I agree there is some element of truth to it that may point to a fundamental gap in our understanding (or at least in mine). But I'm not sure I entirely agree that discontinuous capabilities are necessarily hard to measure: for example, there are benchmarks available for things like arithmetic, which one can train on and make quantitative statements about. I think the key to the discontinuity question is rather that 1) it's the jumps in model scaling that are happening in discrete increments; and 2) everything is S-curves, and a discontinuity always has a linear regime if you zoom in enough. Those two things together mean that, while a capability like arithmetic might have a continuous performance regime on some domain, in reality you can find yourself halfway up the performance curve in a single scaling jump (and this is in fact what happened with arithmetic and GPT-3). So the risk, as I understand it, is that you end up surprisingly far up the scale of "world-ending" capability from one generation to the next, with no detectable warning shot beforehand.

No, you're right as far as I know; at least I'm not aware of any such attempted predictions. And in fact, the very absence of such prediction attempts is interesting in itself. One would imagine that correctly predicting the capabilities of an AI from its scale ought to be a phenomenally valuable skill, not just from a safety standpoint, but from an economic one too. So why, indeed, didn't we see people make such predictions, or at least try to? There could be several reasons. For example, perhaps Paul (and other folks who subscribe to the "continuum" world-model) could have done it, but they were unaware of the enormous value of their predictive abilities. That seems implausible, so let's assume they knew the value of such predictions would be huge. But if you know the value of doing something is huge, why aren't you doing it? Well, if you'r

Jacob Steinhardt

My basic take is that there will be lots of empirical examples where increasing model size by a factor of 100 leads to nonlinear increases in capabilities (and perhaps to qualitative changes in behavior). On median, I'd guess we'll see at least 2 such examples in 2022 and at least 100 by 2030.

At the point where there's a "FOOM", such examples will be commonplace and happening all the time. Foom will look like one particularly large phase transition (maybe 99th percentile among examples so far) that chains into more and more. It seems possible (though not certain--maybe 33%?) that once you have the right phase transition to kick off the rest, everything else happens pretty quickly (within a few days).

Is this take more consistent with Paul's or Eliezer's? I'm not totally sure. I'd guess closer to Paul's, but maybe the "1 day" world is consistent with Eliezer's?

(One candidate for the "big" phase transition would be if the model figures out how to go off and learn on its own, so that number of SGD updates is no longer the primary bottleneck on model capabilities. But I could also imagine us getting that even when models are still fairly "dumb".)

Raymond Arnold

So... I totally think there are people who sort of nod along with Paul, using it as an excuse to believe in a rosier world where things are more comprehensible and they can imagine themselves doing useful things without having a plan for solving the actual hard problems. Those types of people exist. I think there's some important work to be done in confronting them with the hard problem at hand.

But, also... Paul's world AFAICT isn't actually rosier. It's potentially more frightening to me. In Smooth Takeoff world, you can't carefully plan your pivotal act with an assumption that the strategic landscape will remain roughly the same by the time you're able to execute on it. Surprising partial-gameboard-changing things could happen that affect what sort of actions are tractable. Also, dumb, boring ML systems run amok could kill everyone before we even get to the part where recursively self-improving consequentialists eradicate everyone.

I think there is still something seductive about this world – dumb, boring ML systems run amok feels like the sort of problem that is easier to reason about and maybe solve. (I don't think it's actually necessarily easier to solve, but I think it ca... (read more)

RomanS

your view seems to imply that we will move quickly from much worse than humans to much better than humans, but it's likely that we will move slowly through the human range on many tasks

We might be able to falsify that in a few months.

There is a joint Google / OpenAI project called BIG-bench. They've crowdsourced ~200 highly diverse text tasks (from answering scientific questions to predicting protein interacting sites to measuring self-awareness).

One of the goals of the project is to see how the performance on the tasks is changing with the model size, with the size ranging by many orders of magnitude.

A half-year ago, they presented some preliminary results. A quick summary:

If you increase the number of parameters N from 10^7 to 10^10, the aggregate performance score grows roughly like log(N).

But after the 10^10 point, something interesting happens: the score starts growing much faster (~N).

And for some tasks, the plot looks like a hockey stick (a sudden change from ~0 to almost-human).

The paper with the full results is expected to be published in the next few months.

Judging by the preliminary results, the FOOM could start like this:

The GPT-5 still sucks on most tasks. It's mostly useless. But what if we increase parameters_num by 2? What could possibly go wrong?

Daniel Kokotajlo

Hot damn, where can I see these preliminary results?

RomanS

The results were presented at a workshop by the project organizers. The video from the workshop is available here (the most relevant presentation starts at 5:05:00).

It's one of those innocent presentations that, after you understand the implications, keep you awake at night.

Lukas Finnveden

Presumably you're referring to this graph. The y-axis looks like the kind of score that ranges between 0 and 1, in which case this looks sort-of like a sigmoid to me, which accelerates when it gets closer to ~50% performance (and decelerates when it gets closer to 100% performance).

If so, we might want to ask whether these tasks are chosen ~randomly (among tasks that are indicative of how useful AI is) or if they're selected for difficulty in some way. In particular, assume that most tasks look sort-of like a sigmoid as they're scaled up (accelerating around 50%, improving slower when they're closer to 0% and 100%). Then you might think that the most exciting tasks to submit to BIG-bench would be the tasks that can't be handled by small models, but that large models rapidly improve upon (as opposed to tasks that are basically solved already by 10^10 parameters). In which case the aggregation of all these tasks could be expected to look sort-of like this, improving faster after 10^10 than before.

...is one story I can tell, but idk if I would have predicted that beforehand, and fast acceleration after 10^10 is certainly consistent with many people's qualitative impressions of GPT-3. So maybe there is some real acceleration going on.

(Also, see this post for similar curves, but for the benchmarks that OpenAI tested GPT-3 on. There's no real acceleration visible there, other than for arithmetic.)
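Lukas's selection story can be illustrated with a toy model. The sketch below is purely hypothetical: it assumes every task's score follows the same logistic curve in log10(parameters), shifted to a task-specific 50% point, and compares two invented task pools. Neither the pools nor the slope are taken from BIG-bench.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def aggregate_score(log10_params: float, midpoints: list[float], slope: float = 2.0) -> float:
    """Mean score over tasks; each task is a logistic curve in log10(params)
    centred on its own midpoint (hypothetical shared slope)."""
    return sum(logistic(slope * (log10_params - m)) for m in midpoints) / len(midpoints)


def evenly_spaced(lo: float, hi: float, k: int) -> list[float]:
    return [lo + (hi - lo) * i / (k - 1) for i in range(k)]


def acceleration_ratio(midpoints: list[float]) -> float:
    """Per-decade gain from 10^10 to 10^11 divided by the average
    per-decade gain from 10^7 to 10^10."""
    before = (aggregate_score(10, midpoints) - aggregate_score(7, midpoints)) / 3
    after = aggregate_score(11, midpoints) - aggregate_score(10, midpoints)
    return after / before


# Hypothetical task pools; a midpoint is the log10(params) where a task hits 50%.
selected = evenly_spaced(10, 13, 31)    # only tasks small models can't yet do
unselected = evenly_spaced(4, 16, 121)  # difficulty spread over a wide range

print(f"selected pool:   {acceleration_ratio(selected):.1f}")
print(f"unselected pool: {acceleration_ratio(unselected):.1f}")
```

With these settings the selected pool's per-decade gain after 10^10 comes out several times larger than its average per-decade gain from 10^7 to 10^10, while the widely spread pool gives a ratio close to 1. This only shows that selection could produce the pattern; it cannot tell us whether it did.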

RomanS

The preliminary results were obtained on a subset of the full benchmark (~90 tasks vs 206 tasks). And there were many changes since then, including scoring changes. Thus, I'm not sure we'll see the same dynamics in the final results. Most likely yes, but maybe not.

I agree that the task selection process could create dynamics that look like acceleration. A good point. As I understand it, the organizers have accepted almost all submitted tasks (the main rejection reasons were technical - copyright etc). So, it was mostly self-selection, with the bias towards the hardest imaginable text tasks. It seems that for many contributors, the main motivation was something like:

This includes many cognitive tasks that are supposedly human-complete (e.g. understanding of humor, irony, ethics), and the tasks that are probing the model's generality (e.g. playing chess, recognizing images, navigating mazes - all in text). I wonder if the performance dynamics on such tasks will follow the same curve. The list of all tasks is available here.

Evan Hubinger

But after the 10^10 point, something interesting happens: the score starts growing much faster (~N).

And for some tasks, the plot looks like a hockey stick (a sudden change from ~0 to almost-human).

Seems interestingly similar to the grokking phenomenon.

Daniel Kokotajlo

[ETA: In light of pushback from Rob: I really don't want this to become a self-fulfilling prophecy. My hope in making this post was to make the prediction less likely to come true, not more! I'm glad that MIRI & Eliezer are publicly engaging with the rest of the community more again, I want that to continue, and I want to do my part to help everybody to understand each other.]

And I know, before anyone bothers to say, that all of this reply is not written in the calm way that is right and proper for such arguments. I am tired. I have lost a lot of hope. There are not obvious things I can do, let alone arguments I can make, which I expect to be actually useful in the sense that the world will not end once I do them. I don't have the energy left for calm arguments. What's left is despair that can be given voice.

I grimly predict that the effect of this dialogue on the community will be polarization: People who didn't like Yudkowsky and/or his views will like him / his views less, and the gap between them and Yud-fans will grow (more than it shrinks due to the effect of increased dialogue). I say this because IMO Yudkowsky comes across as angry and uncharitable in various parts of ... (read more)

Rob Bensinger

I grimly predict that the effect of this dialogue on the community will be polarization

Beware of self-fulfilling prophecies (and other premature meta)! If both sides in a dispute expect the other side to just entrench, then they're less likely to invest the effort to try to bridge the gap.

This very comment section is one of the main things that will determine the community's reaction, and diverting our focus to 'what will our reaction be?' before we've talked about the object-level claims can prematurely lock in a certain reaction.

(That said, I think you're doing a useful anti-polarization thing here, by showing empathy for people you disagree with, and showing willingness to criticize people you agree with. I don't at all dislike this comment overall; I just want to caution against giving up on something before we've really tried. This is the first proper MIRI-response to Paul's takeoff post, and should be a pretty big update for a lot of people -- I don't think people were even universally aware that Eliezer endorses hard takeoff anymore, much less aware of his reasoning.)

Daniel Kokotajlo

Fair enough! I too dislike premature meta, and feel bad that I engaged in it. However... I do still feel like my comment probably did more to prevent polarization than cause it? That's my independent impression at any rate. (For the reasons you mention).

I certainly don't want to give up! In light of your pushback I'll edit to add something at the top.

Adam Shimi

Strongly agree with that. Since you agree with Yudkowsky, do you think you could strongman his position?

Daniel Kokotajlo

Yes, though I'm much more comfortable explaining and arguing for my own position than EY's. It's just that my position turns out to be pretty similar. (Partly this is independent convergence, but of course partly this is causal influence since I've read a lot of his stuff.)

There's a lot to talk about, I'm not sure where to begin, and also a proper response would be a whole research project in itself. Fortunately I've already written a bunch of it; see these two sequences.

Here are some quick high-level thoughts:

1. Begin with timelines. The best way to forecast timelines IMO is Ajeya's model; it should be the starting point and everything else should be adjustments from it. The core part of Ajeya's model is a probability distribution over how many OOMs of compute we'd need with today's ideas to get to TAI / AGI / APS-AI / AI-PONR / etc. [Unfamiliar with these acronyms? See Robbo's helpful comment below] For reasons which I've explained in my sequence (and summarized in a gdoc) my distribution has significantly more mass on the 0-6 OOM range than Paul does, and less on the 13+ range. The single post that conveys this intuition most is Fun with +12 OOMs.

Now consider how takeoff speed v... (read more)

johnswentworth

I feel like the debate between EY and Paul (and the broader debate about fast vs. slow takeoff) has been frustratingly much reference class tennis and frustratingly little gears-level modelling.

So, there's this inherent problem with deep gearsy models, where you have to convey a bunch of upstream gears (and the evidence supporting them) before talking about the downstream questions of interest, because if you work backwards then people's brains run out of stack space and they lose track of the whole multi-step path. But if you just go explaining upstream gears first, then people won't immediately see how they're relevant to alignment or timelines or whatever, and then lots of people just wander off. Then you go try to explain something about alignment or timelines or whatever, using an argument which relies on those upstream gears, and it goes right over a bunch of people's heads because they don't have that upstream gear in their world-models.

For the sort of argument in this post, it's even worse, because a lot of people aren't even explicitly aware that the relevant type of gear is a thing, or how to think about it beyond a rough intuitive level.

I first ran into this problem in t... (read more)

Rafael Harth

Survey on model updates from reading this post. Figuring out to what extent this post has led people to update may inform whether future discussions are valuable.

Results: (just posting them here, doesn't really need its own post)

The question was to rate agreement on the 1=Paul to 9=Eliezer axis before and after reading this post.

Data points: 35

Mean: $5.2 \to 6.06$

Median: $5 \to 7$

Graph of distribution before (blue) and after (red) and of mean shifts based on prior position (horizontal bar chart).

Raw Data

Anonymous Comments:

Agreement more on need for actions than on probabilities. Would be better to first present points of agreement (that it is at least possible for non(dangerously)-general AI to change situation).

the post was incredibly confusing to me and so I haven't really updated at all because I don't feel like I can crisply articulate yudkowsky's model or his differences with christiano

Daniel Kokotajlo

Wow, I did not expect those results!

Ramana Kumar

I wonder what effect there is from selecting for reading the third post in a sequence of MIRI conversations from start to end and also looking at the comments and clicking links in them.

Edouard Harris

(Not being too specific to avoid spoilers) Quick note: I think the direction of the shift in your conclusion might be backwards, given the statistics you've posted and that 1=Eliezer and 9=Paul.

Lukas Finnveden

No, the form says that 1=Paul. It's just the first sentence under the spoiler that's wrong.

Edouard Harris

Good catch! I didn't check the form. Yes you are right, the spoiler should say (1=Paul, 9=Eliezer) but the conclusion is the right way round.

Rafael Harth

Yeah, it's fixed now. Thanks for pointing it out.

Ben Pace

How interesting; I am the median.

Nisan

The central hypothesis of "takeoff speeds" is that at the time of serious AGI being developed, it is perfectly anti-Thielian in that it is devoid of secrets

No, the slow takeoff model just precludes there being one big secret that unlocks both 30%/year growth and dyson spheres. It's totally compatible with a bunch of medium-sized $1B secrets that different actors discover, adding up to hyperbolic economic growth in the years leading up to "rising out of the atmosphere".

Rounding off the slow takeoff hypothesis to "lots and lots of little innovations adding up to every key AGI threshold, which lots of actors are investing $10 million in at a time" seems like black-and-white thinking, demanding that the future either be perfectly Thielian or perfectly anti-Thielian. The real question is a quantitative one — how lumpy will takeoff be?

Matthew Barnett

Unfortunately, it looks like Yudkowsky and Christiano weren't able to come to an agreement on what bets to make.

In place of that, I'll ask, whatever camp you belong to: what concrete predictions do you make that you believe most strongly diverge from what people in the "other" camp believe, and can be resolved substantially before the world ends?

I propose we restrict our predictions to roughly 2026, which is pretty soon but probably not world-ending-soon (on almost all views).

Nisan

it legitimately takes the whole 4 years after that to develop real AGI that ends the world. FINE. SO WHAT. EVERYONE STILL DIES.

By Gricean implicature, "everyone still dies" is relevant to the post's thesis. Which implies that the post's thesis is that humanity will not go extinct. But the post is about the rate of AI progress, not human extinction.

This seems like a bucket error, where "will takeoff be fast or slow?" and "will AI cause human extinction?" are put in the same bucket.

Ben Pace (Review for 2021 Review)

Paul's post on takeoff speed had long been IMO the last major public step in the dialogue on this subject (not forgetting to honorably mention Katja's crazy discontinuous progress examples and Kokotajlo's arguments against using GDP as a metric), and I found it exceedingly valuable to read how it reads to someone else who has put a great deal of work into figuring out what's true about the topic, thinks about it in very different ways, and has come to different views on it. I found this very valuable for my own understanding of the subject, and I felt I learned a bunch on reading it.
Eliezer wrote it from a fairly exasperated (and a little desperate) place and that comes across in the writing. I think if you aren't literally Paul and you are interested in the subject, then you should get over that and read it for the insights. I think if you are literally Paul then it's quite reasonable to be very defensive in the ensuing dialogue.
I do not know what to make of the monkeys/chimp thing, except to be at least fairly scared about similarly sudden improvements in generality occurring again (though I acknowledge Paul has an argument that we shouldn't expect to see that again).
I could s

... (read more)

Lukas Finnveden

Oh, come on. That is straight-up not how simple continuous toy models of RSI work. Between a neutron multiplication factor of 0.999 and 1.001 there is a very huge gap in output behavior.

Nitpick: I think that particular analogy isn't great.

For nuclear stuff, we have two state variables: amount of fissile material and current number of neutrons flying around. The amount of fissile material determines the "neutron multiplication factor", but it is the number of neutrons that goes crazy, not fissile material. And the current number of neutrons doesn't matter f... (read more)
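A minimal sketch of the arithmetic behind the quoted 0.999 vs 1.001 claim, assuming a bare-bones model in which each neutron generation multiplies the population by a fixed factor k (no delayed neutrons, geometry, or feedback; the 10,000-generation horizon is an arbitrary illustrative choice). In line with Lukas's nitpick, k is held fixed here, standing in for the fissile material, and it is the neutron count that diverges.

```python
def neutron_population(k: float, generations: int, n0: float = 1.0) -> float:
    """Neutron count after `generations` steps if each step multiplies it by k.

    Toy model only: no delayed neutrons, geometry, or feedback on k.
    """
    return n0 * k ** generations


GENERATIONS = 10_000  # arbitrary illustrative horizon
for k in (0.999, 1.001):
    print(f"k = {k}: {neutron_population(k, GENERATIONS):.3g} x starting population")
```

Over 10,000 generations, k = 0.999 leaves well under one ten-thousandth of the starting population, while k = 1.001 multiplies it by more than ten thousand, even though k itself moved by only 0.002.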

Nisan

"Takeoff Speeds" has become kinda "required reading" in discussions on takeoff speeds. It seems like Eliezer hadn't read it until September of this year? He may have other "required reading" from the past four years to catch up on.

Ajeya Cotra's "2020 Draft Report on Biological Anchors" is probably the most detailed public model of AI timelines.
Paul Christiano's "What failure looks like" is a slightly more concrete illustration of Paul's model of the future. Unfortunately, it avoids talking about human extinction in a way that gives the false impression

... (read more)

[This comment is no longer endorsed by its author]

Eliezer Yudkowsky

I read "Takeoff Speeds" at the time. I did not liveblog my reaction to it at the time. I've read the first two other items.

I flag your weirdly uncharitable inference.

Nisan

I apologize, I shouldn't have leapt to that conclusion.

Eliezer Yudkowsky

Apology accepted.

johnswentworth

FWIW, I did not find this weirdly uncharitable, only mildly uncharitable. I have extremely wide error bars on what you have and have not read, and "Eliezer has not read any of the things on that list" was within those error bars. It is really quite difficult to guess your epistemic state w.r.t. specific work when you haven't been writing about it for a while.

(Though I guess you might have been writing about it on Twitter? I have no idea, I generally do not use Twitter myself, so I might have just completely missed anything there.)

Eliezer Yudkowsky

The "weirdly uncharitable" part is saying that it "seemed like" I hadn't read it vs. asking. Uncertainty is one thing, leaping to the wrong guess another.

Rob Bensinger

Yeah, even I wasn't sure you'd read those three things, Eliezer, though I knew you'd at least glanced over 'Takeoff Speeds' and 'Biological Anchors' enough to form opinions when they came out. :)

Rob Bensinger

(... Admittedly, you read fast enough that my 'skimming' is your 'reading'. 😶)
