In control theory, an open-loop (or non-feedback) system is one where inputs are independent of outputs. A closed-loop (or feedback) system is one where outputs are fed back into the system as inputs.
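To make the distinction concrete, here is a toy sketch of my own (the scenario, names, and constants are all invented for illustration): a heater driven open-loop by a fixed schedule versus closed-loop by a simple proportional rule that reacts to the measured temperature.

```python
# Toy illustration of open-loop vs. closed-loop control of a heater.
# Open-loop: the input (heater power) ignores the output (room temperature).
# Closed-loop: the input is computed from the measured output.

def open_loop_step(t):
    """Heater power follows a fixed schedule; the output never feeds back."""
    return 0.8 if t < 50 else 0.2

def closed_loop_step(measured_temp, target_temp=21.0, gain=0.5):
    """Heater power is a function of the observed error (a simple proportional rule)."""
    error = target_temp - measured_temp
    return max(0.0, min(1.0, gain * error))

temp = 15.0
for t in range(100):
    power = closed_loop_step(temp)             # swap in open_loop_step(t) to compare
    temp += 0.3 * power - 0.05 * (temp - 10)   # toy room dynamics: heating minus heat loss
```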

In theory, open-loop systems exist. In reality, no system is truly open-loop because systems are embedded in the physical world where isolation of inputs from outputs cannot be guaranteed. Yet in practice we can build systems that are effectively open-loop by making them ignore weak and unexpected input signals.

Open-loop systems execute plans, but they definitionally can't change their plans based on the results of their actions. An open-loop system can be designed or trained to be good at achieving a goal, but it can't actually do any optimization itself. This ensures that some other system, like a human, must be in the loop to make it better at achieving its goals.

A closed-loop system has the potential to self-optimize because it can observe how effective its actions are and change its behavior based on those observations. For example, an open-loop paperclip-making machine can't make itself better at making paperclips if it's not producing as many paperclips as possible. A closed-loop paperclip-making machine can, assuming it's designed with circuits that allow it to respond to the feedback in a useful way.
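Here's a hedged sketch of that difference in code: a toy "paperclip machine" with a single tuning knob. The production function and the hill-climbing rule are made up for illustration; the point is only that the closed-loop version uses its own observed output to change its future behavior, while the open-loop version cannot.

```python
import random

def paperclips_made(knob):
    """Toy production function with an optimum (unknown to the machine) near knob = 0.7."""
    return max(0.0, 100 - 300 * (knob - 0.7) ** 2 + random.gauss(0, 2))

knob = 0.2  # both machines start with the same suboptimal setting

# Open-loop: fixed plan; output is produced but never used to adjust anything.
open_loop_total = sum(paperclips_made(knob) for _ in range(100))

# Closed-loop: simple hill climbing on its own observed output.
best = paperclips_made(knob)
for _ in range(100):
    candidate = knob + random.uniform(-0.05, 0.05)
    output = paperclips_made(candidate)
    if output > best:            # feedback: keep changes that improve production
        knob, best = candidate, output
```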

AIs are control systems, and thus can be either open- or closed-loop. I posit that open-loop AIs are less likely to pose an existential threat than closed-loop AIs. Why? Because open-loop AIs require someone to make them better, and that creates an opportunity for a human to apply judgement based on what they care about. For comparison, a nuclear dead hand device is potentially much more dangerous than a nuclear response system where a human must make the final decision to launch.

This suggests a simple policy to reduce existential risks from AI: restrict the creation of closed-loop AI. That is, restrict the right to produce AI that can modify its behavior (e.g. self-improve) without going through a training process with a human in the loop.

There are several obvious problems with this proposal:

  • No system is truly open-loop.
  • A closed-loop system can easily be created by combining 2 or more open-loop systems into a single system.
  • Systems may look like they are open-loop at one level of abstraction but really be closed-loop at another (e.g. an LLM that doesn't modify its model, but does use memory/context to modify its behavior; see the sketch after this list).
  • Closed-loop AIs can easily masquerade as open-loop AIs until they've already optimized towards their target enough to be uncontrollable.
  • Open-loop AIs are still going to be improved. They're part of closed-loop systems with a human in the loop, and can still become dangerous maximizers.
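As a sketch of the third point above (the function names are hypothetical, not any real LLM API): the model's weights are fixed, so it looks open-loop at the model level, but the surrounding agent loop feeds each output back in as context, which makes the combined system closed-loop.

```python
# The "model" is frozen -- its parameters never change -- but the surrounding
# system accumulates memory across turns, so behavior still adapts to outcomes.

def frozen_model(prompt: str) -> str:
    """Stand-in for an LLM with fixed weights: output depends only on the prompt."""
    return f"response to: {prompt.splitlines()[-1]}"

memory: list[str] = []          # persistent context accumulated across turns

def agent_step(observation: str) -> str:
    prompt = "\n".join(memory + [observation])
    action = frozen_model(prompt)   # open-loop at the model level
    memory.append(action)           # closed-loop at the system level
    return action

for obs in ["task failed", "task failed again", "task succeeded"]:
    print(agent_step(obs))
```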

Despite these issues, I still think that, if I were designing a policy to regulate the development of AI, I would include something to place limits on closed-loop AI. A likely form would be a moratorium on autonomous systems that don't include a human in the loop, and especially on AIs that are used either to improve themselves or to train other AIs. I don't expect such a moratorium to eliminate existential risks from AI, but I do think it could meaningfully reduce the risk of runaway scenarios where humans get cut out before we have a chance to apply our judgement to prevent undesirable outcomes. If I had to put a number on it, such a moratorium perhaps makes us 20% safer.


Author's note: None of this is especially original. I've been saying some version of what's in this post to people for 10 years, but I realized I've never written it down. Most similar arguments I've seen don't use the generic language of control theory and instead are expressed in terms of specific implementations, like online vs. offline learning, or in terms of recursive self-improvement, and I think it's worth writing down the general argument without regard to the specifics of how any particular AI works.
