Review

Summary: Yudkowsky argues that an unaligned AI will figure out a way to create self-replicating nanobots, and that merely having internet access is enough to bring them into existence. Because of this, it can very quickly free itself of any dependence on humans for its existence and expansion, and thus pursue an unaligned goal, e.g. making paperclips, which will most likely end in the extinction of humanity.

Below, however, I argue that this description massively underestimates the difficulty of creating self-replicating nanobots (even assuming that they are physically possible): doing so requires focused research in the physical domain and is not possible today without the involvement of top-tier human-run labs.

Why does this matter? Some of the assumptions of pessimistic AI alignment researchers, especially Yudkowsky, rest fundamentally on the claim that an AI will quickly find ways to replace the humans it needs in order to exist and expand.

  • We have to get AI alignment right the first time we build a Super-AI, because there is no way to make corrections after we've built it
    • As long as the AI does not have a way to replace humans outright, it can pursue proximate goals that are aligned and safe to carry out, even if its ultimate goal is unaligned. Alignment research can continue and can attempt to make the AI fully aligned, or shut it down, before it can create nanobots.
  • The first time we build a Super-AI, we don't just have to make sure it's aligned; we also need it to perform a pivotal act, like creating nanobots to destroy all GPUs
    • I argue below that this framing may be harmful, because it means performing one of the most dangerous steps first (creating nanobots), a step best left to an AI that is much more aligned than a first attempt is likely to be

What this post is not about: I make no argument about the feasibility of (non-biological) self-replicating nanobots. There may be fundamental reasons why they are impossible, or difficult even for superintelligent AIs, or unable to outcompete biological life; this interesting question is explored further by bhaut. I also don't claim that AI alignment doesn't matter. I think it is extremely important, but I think it is unlikely that (1) a one-shot process will achieve it and (2) that a one-shot process is necessary; in fact, I think this kind of thinking increases risk.

Finally, I don't claim that there aren't easier ways to kill all, or almost all, humans, for example pandemics or triggering nuclear war. However, most of these scenarios leave an AI no good path to expand, because there would be no way to obtain more of its substrate (e.g. GPUs) or power supply.

Core argument

Building something like a completely new type of nanobot is a massive research undertaking. Even under the assumption that an AI is much more intelligent than humans and can learn and infer from much less data, it cannot do so from no data.

Building a new type of nanobot (one not based on biological life) requires not just designing from existing capabilities, but performing completely new experiments on how the nanomachinery involved interacts with itself and with the external world. It is not possible to cut all experiments out of the design process, because at least some of them are about how the physical world works. If you know nothing about physics, you clearly cannot design any kind of machine, and I am fairly certain that we currently do not know enough about nanomachines to design a new kind of non-biological self-replicating nanobot that works out of the box on the first attempt.

To build one, the AI would need high-quality labs to run precisely specified experiments, build prototypes in later stages, and report detailed information on how they failed, until a first working sample of a self-replicating nanobot exists; only at that point might the AGI be in a position to replace all humans.

Counterargument 1: We can build some complex machines from blueprints, and they work the first time. As an example, we can certainly design a complex electronics product, manufacture the PCB and add all the chips and other parts. If an experienced engineer does this, there is a good chance it will work the first time. However, new nanomachines would be different, because they cannot be assembled from parts that are already extremely well studied in isolation. We make chips such that when they are used in their specified way, their behaviour is extremely predictable, but no such components currently exist in the world of nanomachines.

Counterargument 2: The AI can simply simulate everything instead of performing any physical experiments. All of the required laws of physics are known: the Standard Model describes the microscopic world extremely well, and (microscopic) gravity is irrelevant for constructing nanobots, so no (currently unknown) theory unifying all the laws would be required. While it is indeed possible, or even likely, that the Standard Model in principle describes all details of a working nanobot with the required precision, the problem is that in practice it is impossible to simulate large physical systems with it. Many complex physical systems are still largely modelled empirically (ad-hoc models validated by experiments) rather than derived from first principles. While physicists sometimes claim to derive things from first principles, in practice these derivations often ignore a lot of details which still have to be justified using experiments. An AI can also make progress on better simulation methods, but simulating complex nanomachines outright is exceedingly unlikely.
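
As a rough illustration of the scale problem, consider a toy calculation (my own sketch, not an argument about any particular simulation method): an exact quantum state of n two-level particles requires 2^n complex amplitudes, so even storing the state, let alone evolving it, becomes impossible well below the size of any useful nanomachine.

```python
# Toy sketch: memory needed just to *store* an exact quantum state vector
# of n two-level particles (2**n complex amplitudes, complex128 = 16 bytes).
BYTES_PER_AMPLITUDE = 16

def state_vector_bytes(n_particles: int) -> int:
    """Bytes required for one exact state vector of n two-level particles."""
    return (2 ** n_particles) * BYTES_PER_AMPLITUDE

for n in (20, 50, 100):
    gib = state_vector_bytes(n) / 2 ** 30
    print(f"{n:3d} particles -> {gib:.2e} GiB")
# Roughly 0.016 GiB for 20 particles, ~16 PiB for 50, and a physically
# unreachable amount for 100 -- still far smaller than any nanomachine.
```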

Counterargument 3: Nanobots already exist; the AI will just use existing biology. Existing biology is indeed good at making self-replicating nanobots, but at least two difficult problems remain: making effective use of a network of such nanobots requires building a communication network out of cells that lets them coordinate, and getting them to collectively execute more complex software, at a minimum enough to connect to the internet (and thus back to the AI). That is still a monumental task to achieve with biological systems and would still require a lot of research.

Counterargument 4: The AI can do the experiments in secret, or hide the true nature of the experiments inside work that seems aligned. This could certainly be relevant in the long run, especially if we want the AI to solve complex problems. But on shorter timescales, most of what the AI would need to learn is going to be extremely specific to nanomachines. You do not get data about this by running completely unrelated experiments that do not involve nanotechnology.

Significance for AI alignment

I don't claim that this means we don't need alignment, or that an AI won't eventually be able to build nanobots (if that is feasible at all), just that it seems highly possible to delay this step by years, if the operator intends to do so (and has some minimal cooperation on this from the rest of the world).

This means that it is possible to study AIs whose capabilities potentially far exceed those of humans. Alignment is likely an iterative process with no one-shot solution, but that is probably OK, because well-enough-aligned AIs can coexist with humans, be studied, and be improved for the next iteration, without immediately seeing human bodies merely as bags of atoms to be harvested for other purposes.

Pivotal acts

I think pivotal acts may be a bad idea in general. The arguments for this have been spelled out before, for example by Andrew_Critch. However, even if one believes that (a) nanobots are feasible and (b) a pivotal act is necessary, using nanobots to carry out that act might still be a really bad idea.

If someone decides that a pivotal act should be carried out using nanobots (either on their own or because an AI suggests it as the best option), they might be inclined to do whatever it takes, including carrying out any physical acts the AI needs, to achieve this, making the AI much more dangerous if it is not perfectly aligned (which may itself be an impossible requirement). Pivotal acts that do not require giving an AI human-equivalent or better physical capabilities would be much safer (though probably still a bad idea).

How this argument could change in the future

I think my argument that nanobots cannot be built without massive help from first-tier human labs holds now and will keep holding for at least several more years. However, over the coming decades some things might change substantially, for example:
1. Production processes could become much more automated than they are now. If factories existed that could make new, complex machines without major retooling, it would be much simpler for an AI to perform completely new tasks in the physical world with minimal human interaction.
2. Robotics could advance. Humanoid robots that can perform many physical human tasks may make it possible for an AI to build completely human-independent labs.
3. More research into building nanomachines could eliminate more of the unknowns.
4. More biotech research could also allow more control over the physical world, for example if networks of cells can be built to perform certain tasks.
5. Quantum computers may turn out to be powerful enough to simulate much more complex physical processes than classical computers can, in which case an AI with access to a quantum computer might be able to massively reduce the number of experiments needed to construct nanobots. (This feels unlikely to me, but cannot be excluded a priori.)

So whether an AI can obtain nanobots via internet access alone will have to be re-evaluated in the future as one or more of these developments materialise.

What this shouldn't be taken as

I am not arguing that alignment is unimportant; in fact, I think it is very important.

1. Regardless of the feasibility of nanobots, I think there are probably vastly easier ways to kill all humans; however, they would leave an AI without a practical way to continue existing or expanding.
2. Many scenarios may also exist in which an AI does (1) by accident.
3. AIs don't need nanobots to take control of humans and human institutions. There are many other ways that involve using humans against each other and are probably exploitable by much less powerful AIs. (Crucially, however, they do depend on some humans and might require different tools to control AGI risk.)
4. I don't think this makes AI alignment trivial to solve, nor do I claim it gives a recipe for solving it. I just think it may be fruitful to look into research that starts from moderately aligned AIs and figures out how to make them more aligned, rather than having to perform a very risky one-shot experiment.

Comments

Lacking time right now for a long reply:  The main thrust of my reaction is that this seems like a style of thought which would have concluded in 2008 that it's incredibly unlikely for superintelligences to be able to solve the protein folding problem.  People did, in fact, claim that to me in 2008.  It furthermore seemed to me in 2008 that protein structure prediction by superintelligence was the hardest or least likely step of the pathway by which a superintelligence ends up with nanotech; and in fact I argued only that it'd be solvable for chosen special cases of proteins rather than biological proteins because the special-case proteins could be chosen to have especially predictable pathways.  All those wobbles, all those balanced weak forces and local strange gradients along potential energy surfaces!  All those nonequilibrium intermediate states, potentially with fragile counterfactual dependencies on each interim stage of the solution!  If you were gonna be a superintelligence skeptic, you might have claimed that even chosen special cases of protein folding would be unsolvable.  The kind of argument you are making now, if you thought this style of thought was a good idea, would have led you to proclaim that probably a superintelligence could not solve biological protein folding and that AlphaFold 2 was surely an impossibility and sheer wishful thinking.

If you'd been around then, and said, "Pre-AGI ML systems will be able to solve general biological proteins via a kind of brute statistical force on deep patterns in an existing database of biological proteins, but even superintelligences will not be able to choose special cases of such protein folding pathways to design de novo synthesis pathways for nanotechnological machinery", it would have been a very strange prediction, but you would now have a leg to stand on.  But this, I most incredibly doubt you would have said - the style of thinking you're using would have predicted much more strongly, in 2008 when no such thing had been yet observed, that pre-AGI ML could not solve biological protein folding in general, than that superintelligences could not choose a few special-case solvable de novo folding pathways along sharper potential energy gradients and with intermediate states chosen to be especially convergent and predictable.

When you describe the "emailing protein sequences -> nanotech" route, are you imagining an AGI with computers on which it can run code (like simulations)?  Or do you claim that the AGI could design the protein sequences without writing simulations, by simply thinking about it "in its head"?

At the superintelligent level there's not a binary difference between those two clusters.  You just compute each thing you need to know efficiently.

Counterargument 2 still seems correct to me.

Techniques like Density Functional Theory give pretty-accurate results for molecular systems in an amount of time far less than a full quantum mechanical calculation would take. While in theory quantum computing is a complexity class beyond what classical computers can handle, in practice it seems that it's possible to get quite good results, even on a classical computer. The hardness of simulating atoms and molecules looks like it depends heavily on progress in algorithms, rather than being based on the hardness of simulating arbitrary quantum circuits. Even we humans are continuing to research and come up with improved techniques for molecular simulation. Look up "Ferminet" for one relatively recent AI-based advance. The concern is that a superintelligence may be able to skip ahead in this timeline of algorithmic improvements.
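
To make the kind of calculation being referred to concrete, here is a minimal sketch using the open-source PySCF package (the choice of tool and molecule are my own illustrative assumptions; the comment names no specific code):

```python
# Minimal DFT sketch with PySCF (pip install pyscf); illustrative only.
from pyscf import gto, dft

# A small molecule (water) with a modest basis set; coordinates in Angstrom.
mol = gto.M(
    atom="O 0 0 0; H 0 0.757 0.587; H 0 -0.757 0.587",
    basis="6-31g",
)

# Kohn-Sham DFT with the B3LYP functional: an approximate method that is
# vastly cheaper than an exact many-electron quantum calculation.
mf = dft.RKS(mol)
mf.xc = "b3lyp"
energy = mf.kernel()  # converged total electronic energy in Hartree
print(f"B3LYP/6-31G total energy: {energy:.6f} Ha")
```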

While physicists sometimes claim to derive things from first principles, in practice these derivations often ignore a lot of details which still have to be justified using experiments.

Other ways that approximations can be justified:

  • Using a known-to-be-accurate but more expensive simulation technique to validate a newer, less expensive technique (see the toy sketch after this list).
  • Proving bounds on the error.
  • Comparing with known quantities and results. Plenty of experimental data is already on the internet, including things like the heating curve of water, which depends on the detailed interactions of a very large number of water molecules.
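
A toy sketch of the first bullet above, with stand-in functions of my own (the "expensive reference" is a placeholder for a trusted simulation, not a real code):

```python
# Toy sketch: validate a cheap surrogate against a trusted but expensive reference.
import numpy as np

def expensive_reference(x: np.ndarray) -> np.ndarray:
    """Stand-in for a trusted, costly simulation (e.g. a high-level QM method)."""
    return np.sin(x) + 0.1 * x ** 2

# Fit a cheap surrogate (a cubic polynomial) to a handful of reference runs.
x_train = np.linspace(0.0, 3.0, 8)
surrogate = np.poly1d(np.polyfit(x_train, expensive_reference(x_train), deg=3))

# Validate on points the surrogate never saw and report the worst-case error.
x_test = np.linspace(0.0, 3.0, 200)
max_error = np.max(np.abs(surrogate(x_test) - expensive_reference(x_test)))
print(f"max |surrogate - reference| on validation grid: {max_error:.4f}")
```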

simulating complex nanomachines outright is exceedingly unlikely.

As you mentioned above, once you have a set of basic components in your toolbox that are well understood by you, the process of designing things becomes much easier. So you only really need the expensive physics simulations for designing your basic building blocks. After that, you can coarse-grain these blocks in the larger design you're building. When designing transistors, engineers have to worry about the geometry of the transistor and use detailed simulations of how charge carriers will flow in the semiconductor. In a circuit simulator like LTSpice that's all abstracted away.
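
A minimal sketch of that workflow (my own toy example; the "detailed" resistor ladder stands in for an expensive device-level simulation, and the voltage divider for the abstracted circuit-level model):

```python
# Toy sketch: characterise a block once with a "detailed" model, then reuse
# the single coarse-grained parameter inside a larger, cheaper model.

def detailed_block_resistance(resistors: list[float]) -> float:
    """'Expensive' fine-grained model: alternating series/parallel resistor ladder."""
    r_total = 0.0
    for i, r in enumerate(resistors):
        if i % 2 == 0:
            r_total += r  # series element
        else:
            r_total = 1.0 / (1.0 / r_total + 1.0 / r)  # parallel element
    return r_total

# Step 1: characterise the block once with the detailed model.
r_block = detailed_block_resistance([100.0, 220.0, 47.0, 330.0])

# Step 2: in the larger design the block is just one number (coarse-grained),
# here reused in a simple 5 V voltage divider.
r_load = 1000.0
v_out = 5.0 * r_load / (r_block + r_load)
print(f"effective block resistance: {r_block:.1f} ohm, divider output: {v_out:.2f} V")
```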