Note: As usual, Rob Bensinger helped me with editing. I recently discussed this model with Alex Lintz, who might soon post his own take on it (edit: here).
Some people seem to be under the impression that I believe AGI ruin is a small and narrow target to hit. This is not so. My belief is that most of the outcome space is full of AGI ruin, and that avoiding it is what requires navigating a treacherous and narrow course.
So, to be clear, here is a very rough model of why I think AGI ruin is likely. (>90% likely in our lifetimes.)[1]
My real models are more subtle, take into account more factors, and are less articulate. But people keep coming to me saying "it sounds to me like you think humanity will somehow manage to walk a tightrope, traverse an obstacle course, and thread a needle in order to somehow hit the narrow target of catastrophe, and I don't understand how you're so confident about this". (Even after reading Eliezer's AGI Ruin post—which I predominantly agree with, and which has a very disjunctive character.)
Hopefully this sort of toy model will at least give you some vague flavor of where I’m coming from.
Simplified Nate-model
The short version of my model is this: from the current position on the game board, a lot of things need to go right, if we are to survive this.
In somewhat more detail, the following things need to go right:
(I could also add a list of possible disasters from misuse, conditional on us successfully navigating all of the above problems. But conditional on us clearing all of the above hurdles, I feel pretty optimistic about the relevant players’ reasonableness, such that the remaining risks seem much more moderate and tractable to my eye. Thus I’ll leave out misuse risk from my AGI-ruin model in this post; e.g., the ">90% likely in our lifetimes" probability is just talking about misalignment risk.)
One way that this list is a toy model is that it's assuming we have an actual alignment problem to face, under some amount of time pressure. Alternatives include things like getting (fast, high-fidelity) whole-brain emulation before AGI (which comes with a bunch of its own risks, to be clear). The probability that we somehow dodge the alignment problem in such a way puts a floor on how far models like the above can drive down the probability of success (though I'm pessimistic enough about the known-to-me non-AGI strategies that my unconditional p(ruin) is nonetheless >90%).
Some of these bullets trade off against each other: sufficiently good technical solutions might obviate the need for good AGI-team dynamics or good global-scale coordination, and so on. So these factors aren't totally disjunctive. But this list hopefully gives you a flavor for how it looks to me like a lot of separate things need to go right, simultaneously, in order for us to survive, at this point. Saving the world requires threading the needle; destroying the world is the default.
Correlations and general competence
You may object: "But Nate, you've warned of the multiple-stage fallacy; surely here you're guilty of the dual fallacy? You can't say that doom is high because three things need to go right, and multiply together the lowish probabilities that all three go right individually, because these are probably correlated."
Yes, they are correlated. They're especially correlated through the fact that the world is derpy.
This is the world where the US federal government's response to COVID was to ban private COVID testing, confiscate PPE bought by states, and warn citizens not to use PPE. It's a world where most of the focus on technical AGI alignment comes from our own local community and takes up a tiny fraction of the field, and where most of it doesn't seem to me to be even trying, by its own lights, to engage with what look to me like the lethal problems.
Some people like to tell themselves that surely we'll get an AI warning shot and that will wake people up; but this sounds to me like wishful thinking from the world where humanity mounted a competent response to the pandemic warning shot we just got.
So yes, these points are correlated. The ability to solve one of these problems is evidence of ability to solve the others, and the good news is that no amount of listing out more problems can drive my probability lower than the probability that I'm simply wrong about humanity's (future) competence. Our survival probability is greater than the product of the probability of solving each individual challenge.
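(As a minimal sketch of the arithmetic here, with made-up numbers that are mine for illustration and not part of the model above: if success at each hurdle is driven largely by a shared latent "competence" variable, then the joint probability of clearing them all sits well above the naive product of the marginals, and listing more hurdles can't push it much below the chance that the competence variable comes up high.)

```python
import random

# Toy illustration with made-up numbers (not actual estimates):
# several hurdles must all be cleared, but success at each is driven
# largely by a shared latent "competence" variable, so they're correlated.
N_HURDLES = 5            # hypothetical number of things that must go right
P_COMPETENT = 0.05       # hypothetical chance humanity gets much more competent
P_IF_COMPETENT = 0.9     # per-hurdle success probability if it does
P_OTHERWISE = 0.2        # per-hurdle success probability if it doesn't
TRIALS = 200_000

def world_survives() -> bool:
    competent = random.random() < P_COMPETENT
    p = P_IF_COMPETENT if competent else P_OTHERWISE
    return all(random.random() < p for _ in range(N_HURDLES))

joint = sum(world_survives() for _ in range(TRIALS)) / TRIALS

# The naive move: multiply the marginal per-hurdle probabilities together,
# ignoring the correlation induced by the shared competence variable.
marginal = P_COMPETENT * P_IF_COMPETENT + (1 - P_COMPETENT) * P_OTHERWISE
naive_product = marginal ** N_HURDLES

# The floor: even with many hurdles, survival probability can't drop
# much below the chance that competence comes up high.
floor = P_COMPETENT * P_IF_COMPETENT ** N_HURDLES

print(f"joint survival (correlated hurdles): {joint:.4f}")          # ~0.03
print(f"naive product of marginals:          {naive_product:.4f}")  # ~0.0007
print(f"competence-driven floor:             {floor:.4f}")          # ~0.03
```

(The exact numbers are arbitrary; the point is only the structure. The correlation through a shared competence variable is what keeps the survival probability above the naive product, and what floors it at roughly the probability that I'm wrong about that competence.)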
The bad news is that we seem pretty deep in the competence-hole. We are not one mere hard shake away from everyone snapping to our sane-and-obvious-feeling views. You shake the world, and it winds up in some even stranger state, not in your favorite state.
(In the wake of the 2012 US presidential elections, it looked to me like there was clearly pressure in the US electorate that would need to be relieved, and I was cautiously optimistic that maybe the pressure would force the left into some sort of atheistish torch-of-the-enlightenment party and the right into some sort of libertarian individual-rights party. I, uh, wasn't wrong about there being pressure in the US electorate, but the 2016 US presidential elections were not exactly what I was hoping for. But I digress.)
Regardless, there's a more general sense that a lot of things need to go right, from here, for us to survive; hence all the doom. And, lest you wonder what sort of single correlated already-known-to-me variable could make my whole argument and confidence come crashing down around me, it's whether humanity's going to rapidly become much more competent about AGI than it appears to be about everything else.
(This seems to me to be what many people imagine will happen to the pieces of the AGI puzzle other than the piece they’re most familiar with, via some sort of generalized Gell-Mann amnesia: the tech folk know that the technical arena is in shambles, but imagine that policy has the ball, and vice versa on the policy side. But whatever.)
So that's where we get our remaining probability mass, as far as I can tell: there's some chance I'm wrong about humanity's overall competence (in the nearish future); there's some chance that this whole model is way off-base for some reason; and there's a teeny chance that we manage to walk this particular tightrope, traverse this particular obstacle course, and thread this particular needle.
And again, I stress that the above is a toy model, rather than a full rendering of all my beliefs on the issue. Though my real model does say that a bunch of things have to go right, if we are to succeed from here.
This is in stark contrast to the multiple people I've talked to recently who thought I was arguing that there's a small chance of ruin, but that the expected harm is so large as to be worth worrying about. No.