The ground of optimization

This work was supported by OAK, a monastic community in the Berkeley hills. This document could not have been written without the daily love of living in this beautiful community. The work involved in writing this cannot be separated from the sitting, chanting, cooking, cleaning, crying, correcting, fundraising, listening, laughing, and teaching of the whole community.


What is optimization? What is the relationship between a computational optimization process — say, a computer program solving an optimization problem — and a physical optimization process — say, a team of humans building a house?

We propose the concept of an optimizing system: a physically closed system containing both that which is being optimized and that which is doing the optimizing, defined by a tendency to evolve from a broad basin of attraction towards a small set of target configurations despite perturbations to the system. We compare our definition to that proposed by Yudkowsky, and place our work in the context of Demski and Garrabrant’s Embedded Agency and Drexler’s Comprehensive AI Services. We show that our definition resolves difficult cases proposed by Daniel Filan. We work through numerous examples of biological, computational, and simple physical systems, showing how our definition relates to each.

Introduction

In the field of computer science, an optimization algorithm is a computer program that outputs the solution, or an approximation thereof, to an optimization problem. An optimization problem consists of an objective function to be maximized or minimized, and a feasible region within which to search for a solution. For example, we might take the objective function $(x^2 - 2)^2$ as a minimization problem and the whole real number line as the feasible region. A solution then would be $x = \sqrt{2}$, and a working optimization algorithm for this problem is one that outputs a close approximation to this value.

In the field of operations research and engineering more broadly, optimization involves improving some process or physical artifact so that it is fit for a certain purpose or fulfills some set of requirements. For example, we might choose to measure a nail factory by the rate at which it outputs nails, relative to the cost of production inputs. We can view this as a kind of objective function, with the factory as the object of optimization just as the variable x was the object of optimization in the previous example.

There is clearly a connection between optimizing the factory and optimizing for x, but what exactly is this connection? What is it that identifies an algorithm as an optimization algorithm? What is it that identifies a process as an optimization process?

The answer proposed in this essay is: an optimizing system is a physical process in which the configuration of some part of the universe moves predictably towards a small set of target configurations from any point in a broad basin of attraction, despite perturbations during the optimization process.

We do not imagine that there is some engine or agent or mind performing optimization, separately from that which is being optimized. We consider the whole system jointly — engine and object of optimization — and ask whether it exhibits a tendency to evolve towards a predictable target configuration. If so, then we call it an optimizing system. If the basin of attraction is deep and wide then we say that this is a robust optimizing system.

An optimizing system as defined in this essay is known in dynamical systems theory as a dynamical system with one or more attractors. In this essay we show how this framework can help to understand optimization as manifested in physically closed systems containing both engine and object of optimization.

In this way we find that optimizing systems are not so much designed as discovered. The configuration space of the world contains countless pockets shaped like small and large basins, such that if the world should crest the rim of one of these pockets then it will naturally evolve towards the bottom of the basin. We care about these basins because we can use our own agency to tip the world into one of them and then let go, knowing that from there on things will evolve towards the target region.

All optimization basins have a finite extent. A ball may roll to the center of a valley if initially placed anywhere within the valley, but if it is placed outside the valley then it will roll somewhere else entirely, or perhaps will not roll at all. Similarly, even a very robust optimizing system has an outer rim to its basin of attraction, such that if the configuration of the system is perturbed beyond that rim then the system no longer evolves towards the target that it once did. When an optimizing system deviates beyond its own rim, we say that it dies. An existential catastrophe is when the optimizing system of life on Earth moves beyond its own outer rim.

Example: computing the square root of two

Say I ask my computer to compute the square root of two, for example by opening a python interpreter and typing:

>>> import math
>>> print(math.sqrt(2))
1.4142135623730951

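A value like this can be calculated by solving an optimization problem (library routines such as math.sqrt typically use faster specialized methods, but the optimization view is the instructive one here). It works roughly as follows. First we set up an objective function that attains its minimum at the square root of two. One function we could use is

$$f(x) = (x^2 - 2)^2$$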

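Next we pick an initial estimate for the square root of two. Almost any positive number will do, provided it is not so large that a fixed-size gradient step overshoots; zero is excluded because the slope of this objective vanishes there, and a negative starting point leads to the negative square root instead. Let’s take 1.0 as our initial guess. Then we take a gradient step: we compute the slope of the objective function at our initial estimate and move the estimate a small distance in the downhill direction.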
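As a worked instance of this first step, assume the objective $f(x) = (x^2 - 2)^2$ used in the code in this section and a step size of $\eta = 10^{-3}$. The symbol $\eta$ is introduced here only for illustration; its value matches the step size in the code.

$$
\begin{aligned}
f'(x) &= 4x\,(x^2 - 2), & f'(1.0) &= 4 \cdot 1.0 \cdot (1.0 - 2) = -4, \\
x_1 &= x_0 - \eta\, f'(x_0) & &= 1.0 - 10^{-3} \cdot (-4) = 1.004 .
\end{aligned}
$$

The slope at 1.0 is negative, so the step moves the estimate upwards, towards the square root of two.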

Then we repeat this process of computing the slope and updating our estimate over and over, and our optimization algorithm quickly converges to the square root of two.

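This is gradient descent, and it can be implemented in a few lines of Python code:

	current_estimate = 1.0  # initial guess
	step_size = 1e-3
	while True:
		# Objective is minimized where current_estimate**2 == 2 (computed for reference only)
		objective = (current_estimate**2 - 2) ** 2
		# Derivative of the objective with respect to current_estimate
		gradient = 4 * current_estimate * (current_estimate**2 - 2)
		# Stop once the slope is essentially flat
		if abs(gradient) < 1e-8:
			break
		# Step downhill, against the gradient
		current_estimate -= gradient * step_size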

But this program has the following unusual property: we can modify the variable that holds the current estimate of the square root of two at any point while the program is running, and, so long as the new value stays within the basin of attraction, the algorithm will still converge to the square root of two. That is, while the code above is running, if I drop in with a debugger and overwrite the current estimate while the loop is still executing, the next gradient step will start correcting for this perturbation, pushing the estimate back towards the square root of two.

If we give the algorithm time to converge to within machine precision of the actual square root of two then the final output will match the result we would have gotten without the perturbation, up to floating-point rounding in the last few digits.
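The debugger experiment described above can be imitated without a debugger. The following is a minimal sketch, not the exact program from this section: the function name estimate_sqrt2 and the parameters perturb_at and perturb_to are hypothetical, and the overwrite is simulated by assigning a new value at a chosen iteration.

```python
import math

def estimate_sqrt2(start=1.0, step_size=1e-3, tolerance=1e-10,
                   perturb_at=None, perturb_to=None, max_iterations=100_000):
    """Gradient descent on (x**2 - 2)**2, optionally overwriting x once mid-run."""
    x = start
    for iteration in range(max_iterations):
        if iteration == perturb_at:
            x = perturb_to  # stand-in for a debugger overwriting the variable
        gradient = 4 * x * (x**2 - 2)
        if abs(gradient) < tolerance:
            break
        x -= step_size * gradient
    return x

undisturbed = estimate_sqrt2()
perturbed = estimate_sqrt2(perturb_at=200, perturb_to=3.0)

# Both runs should land on (approximately) the same value.
print(undisturbed, perturbed, math.sqrt(2))
```

The perturbed run takes a different path and a different number of iterations, yet both runs finish very close to the same value.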

Consider this for a moment. For most kinds of computer code, overwriting a variable while the code is running will either have no effect because the variable isn’t used, or it will have a catastrophic effect and the code will crash, or it will simply cause the code to output the wrong answer. If I use a debugger to drop in on a webserver servicing an http request and I overwrite some variable with an arbitrary value just as the code is performing a loop in which this variable is used in a central way, bad things are likely to happen! Most computer code is not robust to arbitrary in-flight data modifications.

But this code that computes the square root of two is robust to in-flight data modifications, or at least the "current estimate" variable is. It’s not that our perturbation has no effect: if we change the value, the next iteration of the algorithm will compute the objective function and its slope at a completely different point, and each iteration after that will be different from how it would have been if we hadn’t intervened. The perturbation may change the total number of iterations before convergence is reached. But ultimately the algorithm will still output an estimate of the square root of two, and, given time to fully converge, it will output the same answer it would have output without the perturbation, up to floating-point rounding. This is an unusual breed of computer program indeed!

What is happening here is that we have constructed a physical system consisting of a computer and a python program that computes the square root of two, such that:

  • for a set of starting configurations (in this case the configurations in which the "current estimate" variable holds any positive floating point number small enough that the fixed step size does not overshoot),

  • the system exhibits a tendency to evolve towards a small set of target configurations (in this case just the single configuration in which the "current estimate" variable is set to the square root of two),

  • and this tendency is robust to in-flight perturbations to the system’s configuration (in this case robustness is limited to just the dimensions corresponding to changes in the "current estimate" variable).
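One way to probe the extent of the set of starting configurations is to sweep over initial values and see where each run ends up. The sketch below is illustrative only: the function name run_gradient_descent and the particular starting values are assumptions, and the objective and step size are taken from the code earlier in this section.

```python
def run_gradient_descent(start, step_size=1e-3, tolerance=1e-10,
                         max_iterations=100_000):
    """Run gradient descent on (x**2 - 2)**2 from the given starting value."""
    x = start
    for _ in range(max_iterations):
        gradient = 4 * x * (x**2 - 2)
        if abs(gradient) < tolerance:
            break
        x -= step_size * gradient
    return x

# Sweep a handful of starting values and report where each run settles.
finals = {start: run_gradient_descent(start)
          for start in [0.1, 1.0, 5.0, 10.0, -1.0, 0.0]}
for start, final in finals.items():
    print(f"start={start:5.1f} -> final={final:.6f}")
```

With these settings the positive starting values settle near the square root of two, the negative one settles near its negative, and a start of exactly zero never moves because the slope is zero there. The basin leading to the positive root is therefore wide but not unlimited.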

In this essay I argue that systems that converge to some target configuration, and will do so despite perturbations to the system, are the systems we should rightly call "optimizing systems".

Example: building a house

Consider a group of humans building a house. Let us consider the humans together with the building materials and construction site as a single physical system. Let us imagine that we assemble this system inside a completely closed chamber, including food and sleeping quarters for the humans, lighting, a power source, construction materials, construction blueprint, as well as the physical humans with appropriate instructions and incentives to build the house. If we just put these physical elements together we get a system that has a tendency to evolve under the natural laws of physics towards a configuration in which there is a house matching the blueprint.

We could perturb the system while the house is being built — say by dropping in at night and removing some walls or moving some construction materials about — and this physical system will recover. The team of humans will come in the next day and find the construction materials that were moved, put in new walls to replace the ones that were removed, and so on.

Just like the square root of two example, here is a physical system with:

  • A basin of attraction (all the possible arrangements of viable humans and building materials)

  • A target configuration set that is small relative to the basin of attraction (those in which the building materials have been arranged into a house matching the design)

  • A tendency to evolve towards the target configurations when starting from any point within the basin of attraction, despite in-flight perturbations to the system

Now this system is not infinitely robust. If we really scramble the arrangement of atoms within this system then we’ll quickly wind up with a configuration that does not contain any humans, or in which the building materials are irrevocably destroyed, and then we will have a system without the tendency to evolve towards any small set of final configurations.

In the physical world we are not surprised to find systems that have this tendency to evolve towards a small set of target configurations. If I pick up my dog while he is sleeping and move him by a few inches, he still finds his way to his water bowl when he wakes up. If I pull a piece of bark off a tree, the tree continues to grow in the same upward direction. If I make a noise that surprises a friend working on some math homework, the math homework still gets done. Systems that contain living beings regularly exhibit this tendency to evolve towards target configurations, and tend to do so in a way that is robust to in-flight perturbations. As a result we are familiar with physical systems that have this property, and we are not surprised when they arise in our lives.

But physical systems in general do not have the tendency to evolve towards target configurations. If I move a billiard ball a few inches to the left while a bunch of billiard balls are energetically bouncing around a billiard table, the balls are likely to come to rest in a very different position than if I had not moved the ball. If I change the trajectory of a satellite a little bit, the satellite does not have any tendency to move back into its old orbit.

The computer systems that we have built are still, by and large, more primitive than the living systems that we inhabit, and most computer systems do not have the tendency to evolve robustly towards some set of target configurations. Optimization algorithms of the kind discussed in the previous section, which do have this property, are therefore somewhat unusual.

Defining optimization

An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.

Some systems may have a single target configuration towards which they inevitably evolve. Examples are a ball in a steep valley with a single local minimum, and a computer computing the square root of two. Other systems may have a set of target configurations and perturbing the system may cause it to evolve towards a different member of this set. Examples are a ball in a valley with multiple local minima, or a tree growing upwards (perturbing the tree by, for example, cutting off some branches while it is growing will probably change its final shape, but will not change its tendency to grow towards one of the configurations in which it has reached its maximum size).
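The definition can be turned into a rough empirical check for toy systems. The sketch below is an assumption-laden illustration, not a formal test: the name evolves_to_target, the discrete-time step function, and the one-dimensional toy systems are all hypothetical. It starts a system from several states, perturbs it a few times along the way, and asks whether it still ends in the target set.

```python
import random

def evolves_to_target(step, in_target, start_states, perturb,
                      n_steps=2000, n_perturbations=3, seed=0):
    """Return True if every start state ends in the target set despite perturbations."""
    rng = random.Random(seed)
    for state in start_states:
        # Perturb only during the first half so the system has time to recover.
        perturb_times = set(rng.sample(range(n_steps // 2), n_perturbations))
        for t in range(n_steps):
            if t in perturb_times:
                state = perturb(state, rng)
            state = step(state)
        if not in_target(state):
            return False
    return True

def nudge(x, rng):
    """Displace the state by a random amount of up to one unit."""
    return x + rng.uniform(-1.0, 1.0)

def near_origin(x):
    return abs(x) < 1e-6

starts = [-5.0, -1.0, 0.5, 4.0]

# Toy "valley": each step moves the state 10% of the way toward 0.
valley_result = evolves_to_target(lambda x: 0.9 * x, near_origin, starts, nudge)
# Toy "flat table": the state stays wherever it was last put.
flat_result = evolves_to_target(lambda x: x, near_origin, starts, nudge)
print(valley_result, flat_result)
```

The first system satisfies the check and the second does not, which mirrors the distinction the definition is meant to draw.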

We can quantify optimizing systems in the following ways.

Robustness. Along how many dimensions can we perturb the system without altering its tendency to evolve towards the target configuration set? What magnitude perturbation can the system absorb along these dimensions? A self-driving car navigating through a city may be robust to perturbations that involve physically moving the car to a different position on the road in the city, but not to perturbations that involve changing the state of physical memory registers that contain critical bits of computer code in the car’s internal computer.

Duality. To what extent can we identify subsets of the system corresponding to "that which is being optimized" and "that which is doing the optimization", that is, draw a line between engine and object of optimization, or between agent and world? Highly dualistic systems may be robust to perturbations of the object of optimization, but brittle with respect to perturbations of the engine of optimization. For example, a system containing a 2020s-era robot moving a vase around is a dualistic optimizing system: there is a clear subset of the system that is the engine of optimization (the robot) and a clear subset that is the object of optimization (the vase). Furthermore, the robot may be able to deal with a wide variety of perturbations to the environment and to the vase, but there are likely to be numerous small perturbations to the robot itself that will render it inert. In contrast, a tree is a non-dualistic optimizing system: the tree does grow towards a set of target configurations, but it makes no sense to ask which part of the tree is "doing" the optimization and which part is "being" optimized. This latter example is discussed further below.

Retargetability. Is it possible, using only a microscopic perturbation to the system, to change the system such that it is still an optimizing system but with a different target configuration set? A system containing a robot with the goal of moving a vase to a certain location can be modified by making just a small number of microscopic perturbations to key memory registers such that the robot holds the goal of moving the vase to a different location and the whole vase/robot system now exhibits a tendency to evolve towards a different target configuration. In contrast, a system containing a ball rolling towards the bottom of a valley cannot generally be modified by any microscopic perturbation such that the ball will roll to a different target location. A tree is an intermediate example: to cause the tree to evolve towards a different target configuration set — say, one in which its leaves were of a different shape — one would have to modify the genetic code simultaneously in all of the tree’s cells.
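To make the contrast in retargetability concrete, here is a minimal sketch in a one-dimensional toy world. The names run_robot, run_ball, and goal_register are hypothetical, and the proportional update rule is an assumption chosen for simplicity, not a model of any real robot. Overwriting the single goal_register value stands in for the microscopic perturbation to key memory registers.

```python
def run_robot(vase_position, goal_register, n_steps=200, gain=0.2):
    """Move the vase a fraction of the remaining distance to the goal each step."""
    for _ in range(n_steps):
        vase_position += gain * (goal_register - vase_position)
    return vase_position

def run_ball(ball_position, n_steps=200, rate=0.2):
    """A ball in a valley whose floor is at 0; the target is fixed by the terrain."""
    for _ in range(n_steps):
        ball_position -= rate * ball_position
    return ball_position

# Changing one stored number retargets the whole vase/robot system.
vase_a = run_robot(vase_position=0.0, goal_register=3.0)
vase_b = run_robot(vase_position=0.0, goal_register=-7.0)
# The ball has no such register: its destination is set by the valley itself.
ball = run_ball(ball_position=5.0)
print(round(vase_a, 6), round(vase_b, 6), round(ball, 6))
```

In this sketch the robot system remains an optimizing system after the change, only with a different target, whereas retargeting the ball would require reshaping the valley.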

Relationship to Yudkowsky’s definition of optimization

In Measuring Optimization Power, Eliezer Yudkowsky defines optimization as a process in which some part of the world ends up in a configuration that is high in an agent’s preference ordering, yet has low probability of arising spontaneously. Yudkowsky’s definition asks us to look at a patch of the world that has already undergone optimization by an agent or mind, and draw conclusions about the power or intelligence of that mind by asking how unlikely it would be for a configuration of equal or greater utility (to the agent) to arise spontaneously.

Our definition differs from this in the following ways:

  • We look at whole systems that evolve naturally under physical laws. We do not assume that we can decompose these systems into some engine and object of optimization, or into mind and environment. We do not look at systems that are "being optimized" by some external entity but rather at "optimizing systems" that exhibit a natural tendency to evolve towards a target configuration set. These optimizing systems may contain subsystems that have the properties of agents, but as we will see there are many instances of optimizing systems that do not contain dualistic agentic subsystems.

  • When discerning the boundary between optimization and non-optimization, we look principally at robustness — whether the system will continue to evolve towards its target configuration set in the face of perturbations — whereas Yudkowsky looks at the improbability of the final configuration.
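For comparison, the improbability-based measure described above can be sketched as a count in bits: the negative base-2 logarithm of the fraction of outcomes that are at least as good as the one achieved. The function name optimization_power_bits is hypothetical, and the sketch assumes a finite list of outcomes that are all equally likely to arise spontaneously, which is a simplification.

```python
import math

def optimization_power_bits(outcome_utilities, achieved_utility):
    """Bits of optimization: -log2 of the fraction of outcomes at least as good."""
    at_least_as_good = sum(1 for u in outcome_utilities if u >= achieved_utility)
    if at_least_as_good == 0:
        raise ValueError("achieved_utility exceeds every listed outcome")
    return -math.log2(at_least_as_good / len(outcome_utilities))

# 1024 equally likely outcomes with utilities 0..1023 (an assumed toy world).
utilities = list(range(1024))
best_case_bits = optimization_power_bits(utilities, achieved_utility=1023)
median_case_bits = optimization_power_bits(utilities, achieved_utility=512)
print(best_case_bits, median_case_bits)
```

Note that this measure looks only at the final configuration, whereas the robustness criterion used in this essay looks at how the system responds to perturbations along the way.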

Relationship to Drexler’s Comprehensive AI Services

Eric Drexler has written about the need to consider AI systems that are not goal-directed agents. He points out that the most economically important AI systems today are not constructed within the agent paradigm, and that in fact agents represent just a tiny fraction of the design space of intelligent systems. For example, a system that identifies faces in images would be an intelligent system but not an agent according to Drexler’s taxonomy. This perspective is highly relevant to our discussion here since we seek to go beyond the narrow agent model in which intelligent systems are conceived of as unitary entities that receive observations from the environment, send actions back into the environment, but are otherwise separate from the environment.

Our perspective is that there is a specific class of intelligent systems — which we call optimizing systems — that are worthy of special attention and study due to their potential to reshape the world. The set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems.

Figure: relationship between our optimizing system concept and Drexler’s taxonomy of AI systems

Examples of systems that lie in each of these three tiers are as follows:

  • A system that identifies faces in images by evaluating a feed-forward neural network is an AI system but not an optimizing system.

  • A tree is an optimizing system but not a goal-directed agent system (see section below analyzing a tree as an optimizing system).

  • A robot with the goal of moving a ball to a specific destination is a goal-directed agent system.

Relationship to Garrabrant and Demski’s Embedded Agency

Scott Garrabrant and Abram Demski have written about the many ways that a dualistic view of agency in which one conceives of a hard separation between agent and environment fails to capture the reality of agents that are reducible to the same basic building-blocks as the environments in which they are embedded. They show that if one starts from a dualistic view of agency then it is difficult to design agents capable of reflecting on and making improvements to their own cognitive processes, since the dualistic view of agency rests on a unitary agent whose cognition does not affect the world except via explicit actions. They also show that reasoning about counterfactuals becomes nonsensical if starting from a dualistic view of agency, since the agent’s cognitive processes are governed by the same physical laws as those that govern the environment, and the agent can come to notice this fact, leading to confusion when considering the consequences of actions that are different from the actions that the agent will, in fact, output.

One could view the Embedded Agency work as enumerating the many logical pitfalls one falls into if one takes the "optimizer" concept as the starting point for designing intelligent systems, rather than "optimizing system" as we propose here. The present work is strongly inspired by Garrabrant and Demski’s work. Our hope is to point the way to a view of optimization and agency that captures reality sufficiently well to avoid the logical pitfalls identified in the Embedded Agency work.

Example: ball in a valley

Consider a physical ball rolling around in a small valley. According to our definition of optimization, this is an optimizing system:

Configuration space. The system we are studying consists of the physical valley plus the ball.

Basin of attraction. The ball could initially be placed anywhere in the valley (these are the configurations comprising the basin of attraction).

Target configuration set. The ball will roll until it ends up at the bottom of the valley (the set of local minima are the target configurations).

We can perturb the ball while it is "in flight", say by changing its position or velocity, and the ball will still ultimately end up at one of the target configurations. This system is robust to perturbations along dimensions corresponding to the spatial position and velocity of the ball, but there are many more dimensions along which this system is not robust. If we change the shape of the ball to a cube, for example, then the ball will not continue rolling to the bottom of the valley.
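The in-flight perturbation described above can be simulated with a simple model. The sketch below makes several assumptions that are not in the text: the valley is a one-dimensional parabola, friction is proportional to velocity, and the names roll_ball, kick_at, and kick are hypothetical.

```python
def roll_ball(position, velocity, n_steps=20_000, dt=0.01,
              stiffness=1.0, friction=0.5, kick_at=None, kick=(0.0, 0.0)):
    """Simulate a ball in a parabolic valley with friction; optionally kick it once."""
    for step in range(n_steps):
        if step == kick_at:
            # In-flight perturbation to both position and velocity.
            position += kick[0]
            velocity += kick[1]
        acceleration = -stiffness * position - friction * velocity
        velocity += acceleration * dt   # semi-implicit Euler update
        position += velocity * dt
    return position, velocity

undisturbed = roll_ball(position=3.0, velocity=0.0)
kicked = roll_ball(position=3.0, velocity=0.0, kick_at=500, kick=(2.0, -3.0))
print(undisturbed, kicked)
```

Both runs end with the ball essentially at rest at the valley floor, which is located at position 0 in this model.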

Example: ball in valley with robot

Consider now a ball in a valley as above, but this time with the addition of an intelligent robot holding the goal of ensuring that the ball reaches the bottom of the valley.

Configuration space. The system we are studying now consists of the physical valley, the ball, and the robot. We consider the evolution of and perturbations to this whole joint system.

Target configuration set. As before, the target configuration is the ball being at the bottom of the valley.

Basin of attraction. As before, the basin of attraction consists of all the possible spatial locations that the ball could be placed in the valley.

We can now perturb the system along many more dimensions than in the case where there was no robot. For example, we could introduce a barrier that prevents the ball from rolling downhill past a certain point, and we can then expect a sufficiently intelligent robot to move the ball over the barrier. We can expect a sufficiently well-designed robot to be able to overcome a wide variety of hurdles that gravity would not overcome on its own. Therefore we say that this system is more robust than the system without the robot.

There is a sequence of systems spanning the gap between a ball rolling in a valley and a robot with the goal of moving a ball around in a valley. The former is robust to a narrow set of perturbations, so we say it exhibits a weak degree of optimization; the latter is robust to a much wider set of perturbations, so we say it exhibits a stronger degree of optimization. The difference between systems that do and do not undergo optimization is therefore not a binary distinction but a continuous gradient of increasing robustness to perturbations.

By introducing the robot to the system we have also introduced new dimensions along which the system is fragile: the dimensions corresponding to modifications to the robot itself, and in particular the dimensions corresponding to modifications to the code running on the robot (i.e. physical perturbations to the configuration of the memory cells in which the code is stored). There are two types of perturbation we might consider:

  • Perturbations that destroy the robot. There are numerous ways we could cut wires or scramble computer code that would leave the robot completely non-operational. Many of these would be physically microscopic, such as flipping a single bit in a memory cell containing some critical computer code. In fact there are now more ways to break the system via microscopic perturbations compared to when we were considering a ball in a valley without a robot, since there are few ways to cause a ball not to reach the bottom of a valley by making only a microscopic perturbation to the system, but there are many ways to break modern computer systems via a microscopic perturbation.

  • Perturbations that change the target configurations. We could also make physically microscopic perturbations to this system that change the robot’s goal. For example we might flip the sign on some critical computations in the robot’s code such that the robot works to place the ball at the highest point rather than the lowest. This is still a physical perturbation to the valley/ball/robot system: it is one that affects the configuration of the memory cells containing the robot’s computer code. These kinds of perturbations may point to a concept with some similarity to that of an agent. If we have a system that can be perturbed in a way that preserves the robustness of the basin of attraction but changes the target configuration towards which the system tends to evolve, and if we can find perturbations that cause the target configurations to match our own goals, then we have a way to navigate between basins of attraction.

Example: computer performing gradient descent

Consider now a computer running an iterative gradient descent algorithm in order to solve an optimization problem. For concreteness let us imagine that the objective function being optimized is globally convex, in which case the algorithm, run with a suitable step size, will converge to the global optimum given sufficient time. Let us further imagine that the computer stores its current best estimate of the location of the global optimum (which we will henceforth call the "optimizand") at some known memory location, and updates this after every iteration of gradient descent.

Since this is a purely computational process, it may be tempting to define the configuration space at the computational level — for example by taking the configuration space to be the domain of the objective function. However, it is of utmost importance when analyzing any optimizing system to ground our analysis in a physical system evolving according to the physical laws of nature, just as we have for all previous examples. The reason this is important is to ensure that we always study complete systems, not just some inert part of the system that is "being optimized" by something external to the system. Therefore we analyze this system as follows.

Configuration space. The system consists of a physical computer running some code that performs gradient descent. The configurations of the system are the physical configurations of the atoms comprising the computer.

Target configuration set. The target configuration set consists of the set of physical configurations of the computer in which the memory cells that store the optimizand contain the true location of the global optimum (or the closest floating point representation of it).

Basin of attraction. The basin of attraction consists of the set of physical configurations in which there is a viable computer and it is running the gradient descent algorithm.

Example: billiard balls

Let us now examine a system that is not an optimizing system according to our definition. Consider a billiard table with some billiard balls that are currently bouncing around in motion. Left alone, the balls will eventually come to rest in some configuration. Is this an optimizing system?

In order to qualify as an optimizing system, a system must (1) have a tendency to evolve towards a set of target configurations that is small relative to the basin of attraction, and (2) continue to evolve towards the same set of target configurations if perturbed.

If we reach in while the billiard balls are bouncing around and move one of the balls that is in motion, the system will now come to rest in a different configuration. Therefore this is not an optimizing system, because there is no set of target configurations towards which the system evolves despite perturbations. A system does not need to be robust along all dimensions in order to be an optimizing system, but a billiard table exhibits no such robust dimensions at all, so it is not an optimizing system.
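A drastically simplified model shows the lack of any restoring tendency. The sketch below assumes a single ball moving to the right on a one-dimensional table with constant frictional deceleration and perfectly elastic cushions; the function name resting_position and all numerical values are hypothetical. A real table with several colliding balls is far more sensitive to displacement than this model suggests.

```python
def resting_position(start, speed, table_length=2.0, deceleration=0.5):
    """Resting point of one ball on a 1D table with friction and elastic cushions."""
    travel = speed**2 / (2 * deceleration)      # distance covered before stopping
    unfolded = (start + travel) % (2 * table_length)
    # Fold the straight-line path back onto the table to account for bounces.
    return unfolded if unfolded <= table_length else 2 * table_length - unfolded

original = resting_position(start=0.5, speed=3.0)
displaced = resting_position(start=0.8, speed=3.0)   # ball moved by hand first
print(original, displaced)
```

Moving the ball changes where it comes to rest, and nothing in the dynamics pulls it back towards the resting point it would otherwise have reached.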

Example: satellite in orbit

Consider a second example of a system that is not an optimizing system: a satellite in orbit around Earth. Unlike the billiard balls, there is no chaotic tendency for small perturbations to lead to large deviations in the system’s evolution, but neither is there any tendency for the system to come back to some target configuration when perturbed. If we perturb the satellite’s velocity or position, then from that point on it is in a different orbit and has no tendency to return to its previous orbit. There is no set of target configurations towards which the system evolves despite perturbations, so this is not an optimizing system.
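The absence of a restoring tendency can be seen from the vis-viva relation of two-body orbital mechanics, which ties the size of an orbit to the satellite's speed at a given distance. The sketch below assumes an idealized two-body model with no drag and no thrust; the starting radius and the size of the velocity change are arbitrary illustrative values, and the function name semi_major_axis is hypothetical.

```python
import math

MU_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2

def semi_major_axis(radius, speed):
    """Semi-major axis from the vis-viva relation: v^2 = mu * (2/r - 1/a)."""
    return 1.0 / (2.0 / radius - speed**2 / MU_EARTH)

radius = 7.0e6                                 # metres from Earth's centre
circular_speed = math.sqrt(MU_EARTH / radius)  # speed for a circular orbit

axis_before = semi_major_axis(radius, circular_speed)
axis_after = semi_major_axis(radius, circular_speed + 10.0)  # small speed-up
print(axis_before, axis_after)
```

In this idealized model orbital energy is conserved after the perturbation, so the satellite stays on the new, larger orbit rather than drifting back to the old one.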

Example: a tree

Consider a patch of fertile ground with a tree growing in it. Is this an optimizing system?

Configuration space. For the sake of concreteness let us take a region of space that is sealed off from the outside world — say 100m x 100m x 100m. This region is filled at the bottom with fertile soil and at the top with an atmosphere conducive to the tree’s growth. Let us say that the region contains a single tree.

We will analyze this system in terms of the arrangement of atoms inside this region of space. Out of all the possible configurations of these atoms, the vast majority consist of a uniform hazy gas. An astronomically tiny fraction of configurations contain a non-trivial mass of complex biological nutrients making up soil. An even tinier fraction of configurations contain a viable tree.

Target configuration set. A tree has a tendency to grow taller over time, to sprout more branches and leaves, and so on. Furthermore, trees can only grow so tall due to the physics of transporting sugars up and down the trunk. So we can identify a set of target configurations in which the atoms in our region of space are arranged into a tree that has grown to its maximum size (has sprouted as many branches and leaves as it can support given the atmosphere, the soil that it is growing in, and the constraints of its own biology). There are many topologies in which the tree’s branches could divide, many positions that leaves could sprout in, and so on, so there are many configurations within the target configuration set. But this set is still tiny compared to all the ways that the same atoms could be arranged without the constraint of forming a viable tree.

Basin of attraction. This system will evolve towards the target configuration set starting from any configuration in which there is a viable tree. This includes configurations in which there is just a seed in the ground, as well as configurations in which there is a tree of small, medium, or large size. Starting from any of these configurations, if we leave the system to evolve under the natural laws of physics then the tree will grow towards its maximum size, at which point the system will be in one of the target configurations.

Robustness to perturbations. This system is highly robust to perturbations. Consider perturbing the system in any of the following ways:

  • Moving soil from one place to another

  • Removing some leaves from the tree

  • Cutting a branch off the tree

These perturbations might change which particular target configuration is eventually reached — the particular arrangement of branches and leaves in the tree once it reaches its maximum size — but they will not stop the tree from growing taller and evolving towards a target configuration. In fact we could cut the tree right at the base of the trunk and it would continue to evolve towards a target configuration by sprouting a new trunk and growing a whole new tree.

Duality. A tree is a non-dualistic optimizing system. There is no subsystem that is responsible for "doing" the optimization, separately from that which is "being" optimized. Yet the tree does exhibit a tendency to evolve towards a set of target configurations, and can overcome a wide variety of perturbations in order to do so. There are no man-made systems in existence today that are capable of gathering and utilizing resources as flexibly as a tree, from so broad a variety of environments, and there are certainly no man-made systems that can recover from being physically dismembered to the extent that a tree can recover from being cut at the trunk.

At this point it may be tempting to say that the engine of optimization is natural selection. But recall that we are studying just a single tree growing from seed to maximum size. Can you identify a physical subset of our 100m x 100m x 100m region of space that is this engine of optimization, analogous to how we identified a physical subset of the robot-and-ball system as the engine of optimization (i.e. the physical robot)? Natural selection might be the process by which the initial system came into existence, but it is not the process that drives the growth of the tree towards a target configuration.

It may then be tempting to say that it is the tree’s DNA that is the engine of optimization. It is true that the tree’s DNA exhibits some characteristics of an engine of optimization: it remains unchanged throughout the life of the tree, and physically microscopic perturbations to it can disable the tree. But a tree replicates its DNA in each of its cells, and perturbing just one or a small number of these is not likely to affect the tree’s overall growth trajectory. More importantly, a single strand of DNA does not really have agency on its own: it requires the molecular machinery of the whole cell to synthesize proteins based on the genetic code in the DNA, and the physical machinery of the whole tree to collect and deploy energy, water, and nutrients. Just as it would be incorrect to identify the memory registers containing computer code within a robot as the "true" engine of optimization separate from the rest of the computing and physical machinery that brings this code to life, it is not quite accurate to identify DNA as an engine of optimization. A tree simply does not decompose into engine and object of optimization.

It may also be tempting to ask whether the tree can "really" be said to be undergoing optimization in the absence of any "intention" to reach one of the target configurations. But this expectation of a centralized mind with centralized intentions is really an artifact of us projecting our view of our self onto the world: we believe that we have a centralized mind with centralized intentions, so we focus our attention on optimizing systems with a similar structure. But this turns out to be misguided on two counts: first, the vast majority of optimizing systems do not contain centralized minds, and second, our own minds are actually far less centralized than we think! For now we put aside this question of whether optimization requires intentions and instead just work within our definition of optimizing systems, which a tree definitely satisfies.

Example: bottle cap

Daniel Filan has pointed out that some definitions of optimization would nonsensically classify a bottle cap as an optimizer, since a bottle cap causes water molecules in a bottle to stay inside the bottle, and the set of configurations in which the molecules are inside a bottle is much smaller than the set of configurations in which the molecules are each allowed to take a position either inside or outside the bottle.

In our framework we have the following:

  • The system consists of a bottle, a bottle cap, and water molecules. The configuration space consists of all the possible spatial arrangements of water molecules, either inside or outside the bottle.

  • The basin of attraction is the set of configurations in which the water molecules are inside the bottle.

  • The target configuration set is the same as the basin of attraction.

This is not an optimizing system for two reasons.

First, the target configuration set is no smaller than the basin of attraction. To be an optimizing system there must be a tendency to evolve from any configuration within a basin of attraction towards a smaller target configuration set, but in this case the system merely remains within the set of configurations in which the water molecules are inside the bottle. This is no different from a rock sitting on a beach: due to basic chemistry there is a tendency to remain within the set of configurations in which the molecules comprising the rock are physically bound to one another, but it has no tendency to evolve from a wide basin of attraction towards a small set of target configurations.

Second, the bottle cap system is not robust to perturbations since if we perturb the position of a single water molecule so that it is outside the bottle, there is no tendency for it to move back inside the bottle. This is really just the first point above restated, since if there were a tendency for water molecules moved outside the bottle to evolve back towards a configuration in which all the water molecules were inside the bottle, then we would have a basin of attraction larger than the target configuration set.
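The contrast between the bottle cap and a genuine optimizing system can be made concrete with a toy model. The following is only a minimal sketch under strong simplifying assumptions: configurations are integer positions of a single particle, and the names `basin_of`, `bottle_step`, and `valley_step` are hypothetical helpers introduced here for illustration. The basin of attraction is computed as the set of starting configurations that end up inside a given target set.

```python
def basin_of(target, step, states, horizon=50):
    """Return the states whose trajectory under `step` lies in `target` after `horizon` steps."""
    basin = set()
    for start in states:
        x = start
        for _ in range(horizon):
            x = step(x)
        if x in target:
            basin.add(start)
    return basin

# Toy configuration space (an assumption): one particle at an integer position.
states = range(-10, 11)

# Capped bottle: the particle stays wherever it is; nothing pulls it inward.
def bottle_step(x):
    return x

bottle_target = {x for x in states if abs(x) <= 3}  # "inside the bottle"

# Ball in a valley: the ball rolls one unit toward the bottom at position 0.
def valley_step(x):
    return x - (x > 0) + (x < 0)

valley_target = {0}

bottle_basin = basin_of(bottle_target, bottle_step, states)
valley_basin = basin_of(valley_target, valley_step, states)

print(len(bottle_basin), len(bottle_target))  # basin is no larger than the target
print(len(valley_basin), len(valley_target))  # basin is much larger than the target
```

In this sketch the bottle's basin coincides exactly with its target set, while the valley's basin covers every state and its target set contains only one, which is the asymmetry the definition requires.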

Example: the human liver

Filan also asks whether one’s liver should be considered an optimizer. Suppose we observe a human working to make money. If this person were deprived of a liver, or if their liver stopped functioning, they would presumably be unable to make money. So are we then to view the liver as an optimizer working towards the goal of making money? Filan asks this question as a challenge to Yudkowsky’s definition of optimization, since it seems absurd to view one’s liver as an optimizer working towards the goal of making money, yet Yudkowsky’s definition of optimization might classify it as such.

In our framework we have the following:

  • The system consists of a human working to make money, together with the whole human economy and world.

  • The basin of attraction consists of the configurations in which there is a healthy human (with a healthy liver) having the goal of making money.

  • The target configurations are those in which this person’s bank balance is high. (Interestingly there is no upper bound here, so there is no fixed point but rather a continuous gradient.)

We can expect that this person is capable of overcoming a reasonably broad variety of obstacles in pursuit of making money, so we recognize that this overall system (the human together with the whole economy) is an optimizing system. But Filan would surely agree on this point and his question is more specific: he is asking whether the liver is an optimizer.

In general we cannot expect to decompose optimizing systems into an engine of optimization and object of optimization. We can see that the system has the characteristics of an optimizing system, and we may identify parts, including in this case the person’s liver, that are necessary for these characteristics to exist, but we cannot in general identify any crisp subset of the system as that which is doing the optimization. And picking various subcomponents of the system (such as the person’s liver) and asking "is this the part that is doing the optimization?" does not in general have an answer.

By analogy, suppose we looked at a planet orbiting a star and asked: "which part here is doing the orbiting?" Is it the planet or the star that is the "engine of orbiting"? Or suppose we looked at a car and noticed that the fuel pump is a complex piece of machinery without which the car’s locomotion would cease. We might ask: is this fuel pump the true "engine of locomotion"? These questions don’t have answers because they mistakenly presuppose that we can identify a subsystem that is uniquely responsible for the orbiting of the planet or the locomotion of the car. Asking whether a human liver is an "optimizer" is similarly mistaken: we can see that the liver is a complex piece of machinery that is necessary in order for the overall system to exhibit the characteristics of an optimizing system (robust evolution towards a target configuration set), but beyond this it makes no more sense to ask whether the liver is a true "locus of optimization".

So rather than answering Filan’s question in either the positive or the negative, the appropriate move is to dissolve the concept of an optimizer, and instead ask whether the overall system is an optimizing system.

Example: the universe as a whole

Consider the whole physical universe as a single closed system. Is this an optimizing system?

The second law of thermodynamics tells us that the universe is evolving towards a maximally disordered thermodynamic equilibrium in which it cycles through various maxentropy configurations. We might then imagine that the universe is an optimizing system in which the basin of attraction is all possible configurations of matter and energy, and the target configuration set consists of the maxentropy configurations.

However, this is not quite accurate. Out of all possible configurations of the universe, the vast majority of configurations are at or close to maximum entropy. That is, if we sample a configuration of the universe at random, we have only an astronomically tiny chance of finding anything other than a close-to-uniform gas of basic particles. If we define the basin of attraction as all possible configurations of matter in the universe and the target configuration set as the set of maxentropy configurations, then the target configuration set actually contains almost the entirety of the basin of attraction, with the only configurations that are in the basin of attraction but not the target configuration set being the highly unusual configurations of matter containing stars, galaxies, and so on.

For this reason the universe as a whole does not qualify as an optimizing system under our definition. (Or perhaps it would be more accurate to say that it qualifies as an extremely weak optimizing system.)
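The claim that almost all configurations are close to maximum entropy can be illustrated with a simple counting argument. The sketch below assumes a deliberately crude model that is not part of the original analysis: n distinguishable particles, each sitting in either the left or the right half of a box, with "near-uniform" meaning that the left-half count is within a chosen tolerance of n/2. The function name `near_uniform_fraction` and the 5% tolerance are illustrative choices.

```python
from math import comb

def near_uniform_fraction(n, tolerance=0.05):
    """Fraction of the 2**n left/right arrangements whose left-half
    count is within tolerance*n of an even split."""
    window = tolerance * n
    count = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) <= window)
    return count / 2**n

# The near-uniform share of configuration space grows rapidly with particle count.
fractions = {n: near_uniform_fraction(n) for n in (100, 1000, 10000)}
for n, f in fractions.items():
    print(n, f)
```

Even at these tiny particle counts the near-uniform configurations come to dominate, so a "target set" of this kind is barely smaller than the whole configuration space.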

Power sources and entropy

The second law of thermodynamics tells us that any closed system will eventually tend towards a maximally disordered state in which matter and energy are spread approximately uniformly through space. So if we were to isolate one of the systems explored above inside a sealed chamber and leave it for a very long period, then eventually whatever power source we put inside the sealed chamber would become depleted, after which every complex material or compound in the system would degrade into its base products, and finally we would be left with a chamber filled with a uniform gaseous mixture of whatever base elements we originally put in.

So in this sense there are no optimizing systems at all, since any of the systems above evolve towards their target configuration sets only for a finite period of time, after which they degrade and evolve towards a maxentropy configuration.

This is not a very serious challenge to our definition of optimization since it is common throughout physics and computer science to study various "steady-state" or "fixed point" systems even though the same objection could be made about any of them. We say that a thermometer can be used to build a heat regulator that will keep the temperature of a house within a desired range, and we do not usually need to add the caveat that eventually the house and regulator will degrade into a uniform gaseous mixture due to the heat death of the universe.

Nevertheless, two possible ways to refine our definition are:

  1. We could stipulate that some power source is provided externally to each system we analyze, and then perform our analysis conditional on the existence of that power source.

  2. We could specify a finite time horizon and say that "a system is an optimizing system if it tends towards a target configuration set up to time T".

Connection to dynamical systems theory

The concept of "optimizing system" in this essay is very close to that of a dynamical system with one or more attractors. We offer the following remarks on this connection.

  • A general dynamical system is any system with a state that evolves over time as a function of the state itself. This encompasses a very broad range of systems indeed!

  • In dynamical system theory, an attractor is the term used for what we have called the target configuration set. A fixed point attractor is, in our language, a target configuration set with just one element, such as when computing the square root of two. A limit cycle is, in our language, a system that eventually stably loops through a sequence of states all of which are in the target configuration set, such as a satellite in orbit.

  • We have discussed systems that evolve towards target configurations along some dimensions but not others (e.g. ball in a valley). We have not yet discovered whether dynamical systems theory explicitly studies attractors that operate along a subset of the system’s dimensions.

  • There is a concept of "well-posedness" in dynamical systems theory that justifies the identification of a mathematical model with a physical system. The conditions for a model to be well-posed are (1) that a solution exists (i.e. the model is not self-contradictory), (2) that there is a unique solution (i.e. the model contains enough information to pick out a single system trajectory), and (3) that the solution changes continuously with the initial conditions (the behavior of the system is not too chaotic). This third condition seems related to, but not quite equivalent to, our notion of robustness, since robustness as we define it additionally requires that the system continue to evolve towards the same attractor state despite perturbations. Exploring this connection may present an interesting avenue for future investigation.
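The fixed point attractor mentioned in the remarks above can be sketched in a few lines. This example uses Newton's iteration for the square root of two as a stand-in dynamical system; that choice, the perturbation schedule, and the names `newton_sqrt2_step` and `run` are assumptions made for illustration, not necessarily the algorithm the essay has in mind. The basin of attraction here is every positive starting value, and the target configuration set is the single value √2.

```python
import random

def newton_sqrt2_step(x):
    """One step of Newton's iteration for solving x*x = 2."""
    return 0.5 * (x + 2.0 / x)

def run(x0, steps=60, perturb_at=(), rng=None):
    """Iterate from x0, knocking the state to a random positive value
    at the given time steps (a hypothetical perturbation)."""
    x = x0
    for t in range(steps):
        if t in perturb_at:
            x = rng.uniform(0.1, 100.0)
        x = newton_sqrt2_step(x)
    return x

rng = random.Random(0)
starts = [0.001, 1.0, 7.5, 1e6]
unperturbed = [run(x0) for x0 in starts]
perturbed = [run(x0, perturb_at=(10, 30), rng=rng) for x0 in starts]
print(unperturbed)
print(perturbed)
```

Every run ends at the same value regardless of the starting point or the mid-run perturbations, which is the behaviour that the attractor vocabulary and the essay's "robustness" both describe. Perturbing the state to a negative value would instead send the iteration to −√2, a different attractor, so that perturbation lies outside this basin.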

Conclusion

We have proposed a concept that we call "optimizing systems" to describe systems that have a tendency to evolve towards a narrow target configuration set when started from any point within a broader basin of attraction, and continue to do so despite perturbations.

We have analyzed optimizing systems along three dimensions:

  • Robustness, which measures the number of dimensions along which the system is robust to perturbations, and the magnitude of perturbation along these dimensions that the system can withstand.

  • Duality, which measures the extent to which an approximate "engine of optimization" subsystem can be identified.

  • Retargetability, which measures the extent to which the system can be transformed via microscopic perturbations into an equally robust optimizing system but with a different target configuration set.

We have argued that the "optimizer" concept rests on an assumption that optimizing systems can be decomposed into engine and object of optimization (or agent and environment, or mind and world). We have described systems that do exhibit optimization yet cannot be decomposed this way, such as the tree example. We have also pointed out that, even among those systems that can be decomposed approximately into engine and object of optimization (for example, a robot moving a ball around), we will not in general be able to meaningfully answer the question of whether an arbitrary subcomponent of the agent is an optimizer or not (cf. the human liver example).

Therefore, while the "optimizer" concept clearly still has much utility in designing intelligent systems, we should be cautious about taking it as a primitive in our understanding of the world. In particular we should not expect questions of the form "is X an optimizer?" to always have answers.

50 comments

This is excellent, it feels way better as a definition of optimization than past attempts :) Thanks in particular for the academic style, specifically relating it to previous work, it made it much more accessible for me.

Let's try to build up some core AI alignment arguments with this definition.

Task: A task is simply an “environment” along with a target configuration set. Whenever I talk about a “task” below, assume that I mean an “interesting” task, i.e. something like “build a chair”, as opposed to “have the air molecules be in one of these particular configurations”.

Solving a task: An object O solves a task T if adding O to T’s environment transforms it into an optimizing system for T’s target configuration set.

Performance on the task: If O solves task T, its performance is quantified by how quickly it reaches the target configuration set, and how robust it is to perturbations.

Generality of intelligence: The generality of O’s intelligence is a function of the number and diversity of tasks T that it can solve, as well as its performance on those tasks.

Optimizing AI: A computer program for which there exists an interesting task such that the computer program solves that task.

This isn’t exactly right, as it includes e.g. accounting programs or video games, which when paired with a human form an optimizing system for correct financials and winning the game, respectively. You might be able to fix this by saying that the optimizing system has to be robust to perturbations in any human behavior in the environment.

AGI: An optimizing AI whose generality of intelligence is at least as great as that of humans.

Argument for AI risk: As optimizing AIs become more and more general, we will apply them to more economically useful tasks T. However, they also become more and more robust to perturbations, possibly including perturbations such as “we try to turn off the AI”. As a result, we might eventually have AIs that form strong optimizing systems for some task T that isn’t the one we actually wanted, which tends to be bad due to fragility of value.

Deep learning AGI implies mesa optimization: Since deep learning is so sample inefficient, it cannot reach human levels of performance if we apply deep learning directly to each possible task T. (For example, it has to relearn how the world works separately for each task T.) As a result, if we do get AGI primarily via deep learning, it must be that we used deep learning to create a new optimizing AI system, and that system was the AGI.

Argument for mesa optimization: Due to the complexity and noise in the real world, most economically useful tasks require setting up a robust optimizing system, rather than directly creating the target configuration state. (See also the importance of feedback for more on this intuition.) It seems likely that humans will find it easier to create algorithms that then find AGIs that can create these robust optimizing systems, rather than creating an algorithm that is directly an AGI.

(The previous argument also applies: this is basically just a generalization of the previous point to arbitrary AI systems, instead of only deep learning.)

I want to note that under this approach the notion of “search” and “mesa objective” are less natural, which I see as a pro of this approach (see also here): the argument is that we’ll get a general inner optimizing AI, but it doesn’t say much about what task that AI will be optimizing for (and it could be an optimizing AI that is retargetable by human instructions).

Outer alignment: ??? Seems hard to formalize in this framework. This makes me feel like outer alignment is less important as a concept. (I also don’t particularly like formalizations outside of this framework.)

Inner alignment: Ensuring that (conditional on mesa optimization occurring) the inner AGI is aligned with the operator / user, that is, combined with the user it forms an optimizing system for “doing what the user wants”. (Note that this is explicitly not intent alignment, as it is hard to formalize intent alignment in this framework.)

Intent alignment: ??? As mentioned above, it’s hard to formalize in this framework, as intent alignment really does require some notion of “motivation”, “goals”, or “trying”, which this framework explicitly leaves out. I see this as a con of this framework.

Expected utility maximization: One particular architecture that could qualify as an AGI (if the utility function is treated as part of the environment, and not part of the AGI). I see the fact that EU maximization is no longer highlighted as a pro of this approach.

Wireheading: Special case of the argument for AI risk with a weird task of “maximize the number in this register”. Unnatural in this framing of the AI risk problem. I see this as a pro of this framing of the problem, though I expect people disagree with me on this point.

Thanks for the very thoughtful comment Rohin. I was on retreat last week after I published the article and upon returning to computer usage I was delighted by the engagement from you and others.

Generality of intelligence: The generality of O’s intelligence is a function of the number and diversity of tasks T that it can solve, as well as its performance on those tasks.

I like this.

We'll presumably need to give O some information about the goal / target configuration set for each task. We could say that a robot capable of moving a vase around is a little bit general since we can have it solve the tasks of placing the vase at many different locations by inputting some latitude/longitude into some appropriate memory location. But this means we're actually pasting in a different object O for each task T -- each of the objects differs in those memory locations into which we're pasting the latitude/longitude. It might be helpful to think of an "agent schema" function that maps goals to objects, so we take the goal part of the task, compute the object O for that goal, then paste this object into the environment.

It's also important that O be able to solve the task for a reasonably broad range of environments.

Inner alignment

Perhaps we could look at it this way: take a system containing a human that is trying to get something done. This is presumably an optimizing system as humans often robustly move their environment towards some desired target configuration set. Then an inner-aligned AI is an object O such that adding it to this environment does not change the target configuration set, but does change the speed and/or robustness of convergence to that target configuration set.

Intent alignment

Yup very difficult to say much about intentions using the pure outside view approach of this framework. Perhaps we could say that an intent-aligned AI is an inner-aligned AI modulo less robustness. Or perhaps we could say that an intent-aligned AI is an AI that would achieve the goal in a large set of benign environments, but might not achieve it in the presence of unlikely mistakes, unlikely environmental conditions, or the presence of other powerful basins of attraction.

But this doesn't really get at the spirit of Paul's idea, which I think is about really looking inside the AI and understanding its goals.

+1 to all of this.

We'll presumably need to give O some information about the goal / target configuration set for each task.

I was imagining that the tasks can come equipped with some specification, but some sort of counterfactual also makes sense. This also gets around issues of the AI system not being appropriately "motivated" -- e.g. I might be capable of performing the task "lock up puppies in cages", but I wouldn't do it, and so if you only look at my behavior you couldn't say that I was capable of doing that task.

But this doesn't really get at the spirit of Paul's idea, which I think is about really looking inside the AI and understanding its goals.

+1 especially to this

Deep learning AGI implies mesa optimization: Since deep learning is so sample inefficient, it cannot reach human levels of performance if we apply deep learning directly to each possible task T. (For example, it has to relearn how the world works separately for each task T.) As a result, if we do get AGI primarily via deep learning, it must be that we used deep learning to create a new optimizing AI system, and that system was the AGI.

I don't quite understand what this is saying.

Suppose we train a giant deep learning model via self-supervised learning on a ton of real-world data (like GPT-N, but w/ other sensory modalities besides text), and then we build a second system designed to provide a nice interface to the giant model.

We'd give task specifications to the interface, and it would have some smarts about how to consult the model to figure out what to do. (The interface might also be learned, via reinforcement or supervised learning, or it might be hand-coded.)

It seems plausible to me that a system comprising these two pieces, the model and the interface, could be an AGI according to the definition here, in that when combined with a very wide variety of environments (including the task specification in the environment), it could perform at least as well as a human.

And since most of the smarts seem like they'd be in the model rather than the interface, I'd count it as getting AGI "primarily via deep learning", even if the interface was hand-coded.

But it's not clear to me whether that would count as using deep learning to "create a new optimizing AI system", which is itself the AGI. The whole system is an Optimizing AI, according to the definition given above, but neither of the two parts is by itself, and it doesn't seem to have the flavor of mesa-optimization, as I understand it. So it seems like a contradiction to the quoted claim.

Have I misunderstood what you're saying here, or do you disagree with the characterization I gave of the hypothetical model + interface system? (Or have I perhaps misunderstood mesa-optimization?)

The whole system is an Optimizing AI, according to the definition given above, but neither of the two parts is by itself

Yeah, I'm talking about the whole system.

it doesn't seem to have the flavor of mesa-optimization

Yeah, I agree it doesn't fit the explanation / definition in Risks from Learned Optimization. I don't like that definition, and usually mean something like "running the model parameters instantiates a computation that does 'reasoning'", which I think does fit this example. I mentioned this a bit later in the comment:

I want to note that under this approach the notion of “search” and “mesa objective” are less natural, which I see as a pro of this approach [...]: the argument is that we’ll get a general inner optimizing AI, but it doesn’t say much about what task that AI will be optimizing for (and it could be an optimizing AI that is retargetable by human instructions).

Mild optimization: the easiest way to solve hard tasks may be to specify a proxy, which an AI maximizes. The AI steers into configurations which maximize the proxy function. Simple proxies don't usually have target sets which we like, because human value is complex. However, maybe we just want the AI to randomly select a configuration which satisfies the proxy, instead of finding the maximally-proxy-ness configuration, which may be bad due to extremal Goodhart. 

Quantilization tries to solve this by randomly selecting a target configuration from some top quantile, but this is sensitive to how world states are individuated. 

This makes sense, but I think you'd need a different notion of optimizing systems than the one used in this post. (In particular, instead of a target configuration set, you want a continuous notion of goodness, like a utility function / reward function.)

I'm saying the target set for non-mild optimization is the set of configurations which maximize proxy-ness. Just take the argmax. By contrast, we might want to sample uniformly randomly from the set of satisficing configurations, which is much larger. 

(This is assuming a fixed initial state)
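The distinction between the argmax target set, the satisficing target set, and a quantilizer can be sketched as follows. This is only an illustration under assumptions not stated in the comments above: a finite configuration space, a uniform base distribution over configurations, and a made-up proxy function. The names `argmax_target`, `satisficing_target`, and `quantilize` are hypothetical.

```python
import random

def argmax_target(configs, proxy):
    """Configurations that maximize the proxy (non-mild optimization)."""
    best = max(proxy(c) for c in configs)
    return [c for c in configs if proxy(c) == best]

def satisficing_target(configs, proxy, threshold):
    """Configurations whose proxy value is at least `threshold`."""
    return [c for c in configs if proxy(c) >= threshold]

def quantilize(configs, proxy, q, rng):
    """Sample uniformly from the top q fraction of configurations by proxy
    value (assumes a uniform base distribution over `configs`)."""
    ranked = sorted(configs, key=proxy, reverse=True)
    top = ranked[:max(1, int(q * len(ranked)))]
    return rng.choice(top)

# Toy configuration space and proxy (both assumptions for illustration).
configs = list(range(100))
proxy = lambda c: -(c - 70) ** 2

maximal = argmax_target(configs, proxy)
satisficing = satisficing_target(configs, proxy, threshold=-100)
sample = quantilize(configs, proxy, q=0.1, rng=random.Random(0))
print(len(maximal), len(satisficing), sample)
```

Because the quantilizer ranks and counts individual configurations, merging or splitting entries in `configs` changes which ones land in the top fraction, which is one way to see the sensitivity to how world states are individuated.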

It sounds like you're assuming that the target configuration set is built into the AI system. According to me, a major point of this post / framework is to avoid that assumption altogether, and only describe problems in terms of the actual observed system behavior.

(This is why within this framework I couldn't formalize outer alignment, and why wireheading and the search / mesa-objective split is unnatural.)

I see the tension you're pointing at. I think I had in mind something like "an AI is reliably optimizing utility function u over the configuration space (but not necessarily over universe-histories!) if it reliably moves into high-rated configurations", and you could draw different epsilon-neighborhoods of optimality in configuration space. It seems like you should be able to talk about dog-maximizers without requiring that the agent robustly end up in the maximum-dog configurations (and not in max-minus-one-dog configs). 

I'm still confused about parts of this.

In this post, the author proposes a semiformal definition of the concept of "optimization". This is potentially valuable since "optimization" is a word often used in discussions about AI risk, and much confusion can follow from sloppy use of the term or from different people understanding it differently. While the definition given here is a useful perspective, I have some reservations about the claims made about its relevance and applications.

The key paragraph, which summarizes the definition itself, is the following:

An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.

In fact, "continues to exhibit this tendency with respect to the same target configuration set despite perturbations" is redundant: clearly as long as the perturbation doesn't push the system out of the basin, the tendency must continue.

This is what is known as an "attractor" in dynamical systems theory. For comparison, here is the definition of "attractor" from Wikipedia:

In the mathematical field of dynamical systems, an attractor is a set of states toward which a system tends to evolve, for a wide variety of starting conditions of the system. System values that get close enough to the attractor values remain close even if slightly disturbed.

The author acknowledges this connection, although he also makes the following remark:

We have discussed systems that evolve towards target configurations along some dimensions but not others (e.g. ball in a valley). We have not yet discovered whether dynamical systems theory explicitly studies attractors that operate along a subset of the system’s dimensions.

I find this remark confusing. An attractor that operates along a subset of the dimensions is just an attractor submanifold. This is completely standard in dynamical systems theory.

Given that the definition itself is not especially novel, the post's main claim to value is via the applications. Unfortunately, some of the proposed applications seem to me poorly justified. Specifically, I want to talk about two major examples: the claimed relationship to embedded agency and the claimed relations to comprehensive AI services.

In both cases, the main shortcoming of the definition is that there is an essential property of AI that this definition doesn't capture at all. The author does acknowledge that "goal-directed agent system" is a distinct concept from "optimizing systems". However, he doesn't explain how they are distinct.

One way to formulate the difference is as follows: agency = optimization + learning. An agent is not just capable of steering a particular universe towards a certain outcome, it is capable of steering an entire class of universes, without knowing in advance in which universe it was placed. This underlies all of RL theory, this is implicit in the Shane-Legg definition of intelligence and my own[1], this is what Yudkowsky calls "cross domain".

The issue of learning is not just nitpicking, it is crucial to delineate the boundary around "AI risk", and delineating the boundary is crucial to constructively think of solutions. If we ignore learning and just talk about "optimization risks" then we will have to include the risk of pandemics (because bacteria are optimizing for infection), the risk of false vacuum collapse in particle accelerators (because vacuum bubbles are optimizing for expanding), the risk of runaway global warming (because it is optimizing for increasing temperature) et cetera. But, these are very different risks that require very different solutions.

There is another, less central, difference: the author requires a particular set of "target states" whereas in the context of agency it is more natural to consider utility functions, which means there is a continuous gradation of states rather than just "good states" and "bad states". This is related to the difference the author points out between his definition and Yudkowsky's:

When discerning the boundary between optimization and non-optimization, we look principally at robustness — whether the system will continue to evolve towards its target configuration set in the face of perturbations — whereas Yudkowsky looks at the improbability of the final configuration.

The improbability of the final configuration is a continuous metric, whereas just arriving or not arriving at a particular set is discrete.

Let's see how this shortcoming affects the conclusions. About embedded agency, the author writes:

One could view the Embedded Agency work as enumerating the many logical pitfalls one falls into if one takes the "optimizer" concept as the starting point for designing intelligent systems, rather than "optimizing system" as we propose here.

The correct starting point is "agent", defined in the way I gestured at above. If instead we start with "optimizing system" then we throw away the baby with the bathwater, since the crucial aspect of learning is ignored. This is an essential property of the embedded agency problem: arguably the entire difficulty is about how we can define learning without introducing unphysical dualism (indeed, I have recently addressed this problem, and "optimizing system" doesn't seem very helpful there).

About comprehensive AI services:

Our perspective is that there is a specific class of intelligent systems — which we call optimizing systems — that are worthy of special attention and study due to their potential to reshape the world. The set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems.

What is an example of an optimizing AI system that is not agentic? The author doesn't give such an example and instead talks about trees, which are not AIs. I agree that the class of dangerous systems is substantially wider than the class of systems which were explicitly designed with agency in mind. However, this is precisely because agency can arise from such systems even when not explicitly designed, and moreover this is hard to avoid if the system is to be powerful enough for pivotal acts. This is not because there is some class of "optimizing AI systems" which are intermediate between "agentic" and "non-agentic".

To summarize, I agree with and encourage the use of tools from dynamical systems theory to study AI. However, one must acknowledge the correct scope of these tools and what they don't do. Moreover, more work is needed before truly novel conclusions can be obtained by these means.


  1. Modulo issues with traps which I will not go into atm. ↩︎

My biggest objection to this definition is that it inherently requires time. At a bare minimum, there needs to be an "initial state" and a "final state" within the same state space, so we can talk about the system going from outside the target set to inside the target set.

One class of cases which definitely seem like optimization but do not satisfy this property at all: one-shot non-iterative optimization. For instance, I could write a convex function optimizer which works by symbolically differentiating the objective function and then algebraically solving for a point at which the gradient is zero.
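A minimal sketch of such a one-shot optimizer, restricted to a single-variable convex quadratic so that the "algebraic solve" is one division. The coefficient-list representation and the function names `differentiate` and `minimize_convex_quadratic` are assumptions made for illustration, not a general-purpose implementation.

```python
def differentiate(coeffs):
    """Symbolically differentiate a polynomial.

    `coeffs` is [c0, c1, c2, ...] meaning c0 + c1*x + c2*x**2 + ...
    Returns the coefficient list of the derivative.
    """
    return [k * c for k, c in enumerate(coeffs)][1:]


def minimize_convex_quadratic(coeffs):
    """Return the minimizer of c0 + c1*x + c2*x**2 with c2 > 0.

    No iteration: differentiate once, then solve gradient = 0 in closed form.
    """
    if len(coeffs) != 3 or coeffs[2] <= 0:
        raise ValueError("expected a convex quadratic [c0, c1, c2] with c2 > 0")
    g0, g1 = differentiate(coeffs)  # gradient is g0 + g1*x
    return -g0 / g1


# f(x) = (x - 3)**2 + 1 = 10 - 6x + x**2
x_star = minimize_convex_quadratic([10.0, -6.0, 1.0])
print(x_star)  # 3.0
```

There is no sequence of intermediate candidate solutions here, so there is no trajectory through the space of x values that could be said to approach the target.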

Is there an argument that I should not consider this to be an optimizer?

My biggest objection to this definition is that it inherently requires time

Fascinating - but why is this an objection? Is it just the inelegance of not being able to look at a single time slice and answer the question of whether optimization is happening?

One class of cases which definitely seem like optimization but do not satisfy this property at all: one-shot non-iterative optimization.

Yes this is a fascinating case! I'd like to write a whole post about it. Here are my thoughts:

  • First, just as a fun fact, note that it's actually extremely rare to see any non-iterative optimization in practical usage. When we solve linear equations, we could use Gaussian elimination but it's so unstable that in practice we use, most likely, the SVD, which is iterative. When we solve a system of polynomial equations we could use something like a Gröbner basis or the resultant, but it's so unstable that in practice we use something like a companion matrix method, which comes down to an eigenvalue decomposition, which is again iterative.
  • Consider finding the roots of a simple quadratic equation (i.e. solving a cubic optimization problem). We can use the quadratic formula to do this. But ultimately this comes down to computing a square root, which is typically (though not necessarily) solved with an iterative method.
  • That these methods (for solving linear systems, polynomial systems, and quadratic equations) have at their heart an iterative optimization algorithm is not accidental. The iterative methods involved are not some small or sideline part of what's going on. In fact when you solve a system of polynomial equations using a companion matrix, you spend a lot of energy rearranging the system into a form where it can be solved via an eigenvalue decomposition, and then the eigenvalue decomposition itself is very much operating on the full problem. It's not some unimportant side operation. I find this fascinating.
  • Nevertheless it is possible to solve linear systems, polynomial systems etc with non-iterative methods.
  • These methods are definitely considered "optimization" by any normal use of that term. So in this way my definition doesn't quite line up with the common language use of the word "optimization".
  • But these non-iterative methods actually do not have the core property that I described in the square-root-of-two example. If I reach in and flip a bit while a Gaussian elimination is running, the algorithm does not in any sense recover. Since the algorithm is just performing a linear sequence of steps, the error just grows and grows as the computation unfolds. This is the opposite of what happens if I reach in and flip a bit while an SVD is being computed: in this case the error will be driven back to zero by the iterative optimization algorithm.
  • You might say that my focus on error-correction simply doesn't capture the common language use of the term optimization, as demonstrated by the fact that non-iterative optimization algorithms do not have this error-correcting property. You would be correct!
  • But perhaps my real response is that fundamentally I'm interested in these processes that somewhat mysteriously drive the state of the world towards a target configuration, and keep doing so despite perturbations. I think these are central to what AI and agency are. The term "optimizing system" might not be quite right, but it seems close enough to be compelling.
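The contrast between a direct method and an iterative one under perturbation can be sketched on a tiny example. This is an illustration only: Jacobi iteration stands in for the iterative solver (it is not the SVD mentioned above), the 2x2 system is made up, and the perturbation is modelled as adding a number to an intermediate value rather than as a literal bit flip.

```python
def solve_direct(perturb=0.0):
    """Gaussian elimination on the system  2x + y = 5,  x + 3y = 10.

    `perturb` is added to an intermediate value after the elimination step,
    simulating a corrupted intermediate result.
    """
    a11, a12, b1 = 2.0, 1.0, 5.0
    a21, a22, b2 = 1.0, 3.0, 10.0
    factor = a21 / a11
    a22 -= factor * a12
    b2 -= factor * b1
    b2 += perturb  # corruption happens here; nothing later corrects it
    y = b2 / a22
    x = (b1 - a12 * y) / a11
    return x, y


def solve_jacobi(perturb=0.0, n_steps=100, perturb_at=10):
    """Jacobi iteration on the same system, perturbed once at step `perturb_at`."""
    x, y = 0.0, 0.0
    for k in range(n_steps):
        if k == perturb_at:
            y += perturb  # corruption happens here; later sweeps shrink it
        x, y = (5.0 - y) / 2.0, (10.0 - x) / 3.0
    return x, y


# The exact solution is x = 1, y = 3.
print(solve_direct())             # unperturbed direct solve
print(solve_direct(perturb=1.0))  # the error persists in the output
print(solve_jacobi(perturb=1.0))  # the error is driven back towards zero
```

In this sketch the direct solver returns a wrong answer after the perturbation, while the iterative solver still ends up at the solution, which is the error-correcting property the bullets above are pointing at.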

Thanks for the question - I clarified my own thinking while writing up this response.

Fascinating - but why is this an objection? Is it just the inelegance of not being able to look at a single time slice and answer the question of whether optimization is happening?

No, the issue is that the usual definition of an optimization problem (e.g. ) has no built-in notion of time, and the intuitive notion of optimization (e.g. "the system makes Y big") has no built-in notion of time (or at least linear time). It's this really fundamental thing that isn't present in the "original problem", so to speak; it would be very surprising and interesting if time had to be involved when it's not present from the start. 

If I specifically try to brainstorm things-which-look-like-optimization-but-don't-involve-objective-improvement-over-time, then it's not hard to come up with examples:

  • Rather than a function-value "improving" along linear time, I could think about a function value improving along some tree or DAG - e.g. in a heap data structure, we have a tree where the "function value" always "improves" as we move from any leaf toward the root. There, any path from a leaf to the root could be considered "time" (but the whole set of nodes at the "same level" can't be considered a time-slice, because we don't have a meaningful way to compare whole sets of values; we could invent one, but it wouldn't actually reflect the tree structure).
  • The example from the earlier comment: a one-shot non-iterative optimizer
  • A distributed optimizer: the system fans out, tests a whole bunch of possible choices in parallel, then selects the best of those.
  • Various flavors of constraint propagation, e.g. the simplex algorithm (and markets more generally)
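The heap example in the first bullet can be checked directly. The sketch below uses Python's standard `heapq` min-heap, so "improving" means "getting smaller"; the helper `path_to_root` and the sample values are assumptions made for illustration.

```python
import heapq


def path_to_root(heap, index):
    """Return the values on the path from heap[index] up to the root."""
    path = [heap[index]]
    while index > 0:
        index = (index - 1) // 2  # parent position in the array layout
        path.append(heap[index])
    return path


values = [9, 4, 7, 1, 8, 2, 6]
heap = list(values)
heapq.heapify(heap)

# In the array layout, positions len(heap)//2 onward are the leaves.
leaf_paths = [path_to_root(heap, i) for i in range(len(heap) // 2, len(heap))]

for path in leaf_paths:
    print(path)  # each path is non-increasing from leaf to root
```

Each leaf-to-root path plays the role of a "time" axis along which the value improves, but values in different branches at the same depth are not ordered relative to one another.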

Another big thing to note in examples like e.g. iteratively computing a square root for the quadratic formula or iteratively computing eigenvalues to solve a matrix: the optimization problems we're solving are subproblems, not the original full problem. These crucially differ from most of the examples in the OP in that the system's objective function (in your sense) does not match the objective function (in the usual intuitive sense). They're iteratively optimizing a subproblem's objective, not the "full" problem's objective.

That's potentially an issue for thinking about e.g. AI as an optimizer: if it's using iterative optimization on subproblems, but using those results to perform some higher-level optimization in a non-iterative manner, then aligning the subproblem-optimizers may not be synonymous with aligning the full AI. Indeed, I think a lot of reasoning works very much like this: we decompose a high-dimensional problem into coupled low-dimensional subproblems (i.e. "gears"), then apply iterative optimizers to the subproblems. That's exactly how eigenvalue algorithms work, for instance: we decompose the full problem into a series of optimization subproblems in narrower and narrower subspaces, while the "high-level" part of the algorithm (i.e. outside the subproblems) doesn't look like iterative optimization.

I think this is covered in my view of optimization via selection, where "direct solution" is the third option. Any one-shot optimizer is implicitly relying on an internal model completely for decision making, rather than iterating, as I explain there. I think that is compatible with the model here, but it needs to be extended slightly to cover what I was trying to say there.

I shared this essay with a colleague where I work (Johns Hopkins University Applied Physics Lab). Here are her comments, which she asked me to share:

This essay proposes a very interesting definition of optimization as the manifestation of a particular behavior of a closed, physical system. I haven’t finished thinking this over, but I suspect it will be (as is suggested in the essay) a useful construct. The reasoning leading to the definition is clearly laid out (thank you!), with examples that are very useful in understanding the concept. The downside of being clearly laid out, however, is that it makes critique easier. I have a few thoughts about the reasoning in the essay.

The first thing I will note is that the essay gives three definitions for an optimizing system. These definitions are close, but not exactly equivalent. The nuances can be important. For example, that the target configuration set and the basin of attraction cannot be equal is obvious; that is made explicit in definition 3, but only implied in definitions 1 and 2. A bigger issue is that there are no criteria or rationale for their extent and relative size.

For example, the essay offers two reasons why the poster child of non-optimizers - the bottle with a cap - is not an optimizing system; they both arise from the rather arbitrary definition of the basin of attraction as equal to the target configuration set. I see no necessary reason why the basin of attraction couldn’t be defined as the set of all configurations of water molecules both inside and outside the bottle. That way, the definitional requirement of a target configuration set smaller than the basin of attraction is met. The important point is: will water molecules in this new, larger basin of attraction tend to the target configuration set?

Let’s suppose that the capped bottle is in a sealed room (not necessary but easier to think about), and that the cap is made of a special material that allows water molecules to pass through it in only one direction: from outside the bottle to inside. The water molecules inside the bottle stay inside the bottle, as for any cap. The water molecules inside the room, but outside the bottle, are zooming about (thermodynamic energy), bouncing off the walls, each other, and the bottle. Although it will take some time, sooner or later all the molecules outside the bottle will hit the bottle cap, go through, and be trapped in the bottle. Voila!
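A toy simulation of this thought experiment, under strong simplifying assumptions that are not in the original comment: each molecule outside the bottle independently passes through the cap with a fixed probability per time step, and a molecule inside never leaves. The function name, the parameter values, and the seed are hypothetical.

```python
import random


def simulate_one_way_cap(n_molecules=50, p_enter=0.1, n_steps=500, seed=0):
    """Return the number of molecules inside the bottle after each step."""
    rng = random.Random(seed)
    inside = [False] * n_molecules
    history = []
    for _ in range(n_steps):
        for i in range(n_molecules):
            # Outside molecules may enter; inside molecules are trapped.
            if not inside[i] and rng.random() < p_enter:
                inside[i] = True
        history.append(sum(inside))
    return history


history = simulate_one_way_cap()
print(history[0], history[-1])
```

The count inside can only go up, and moving a molecule back outside by hand would just delay the same end state. That is the sense in which this version of the system tends towards "all molecules in the bottle" from the larger basin.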

Originally, the bottle-with-a-cap system was a non-optimizing system by definition; the bottle cap type was irrelevant and could have been the rather special one I described. Simply by changing the definition of the basin of attraction, we could turn it into an optimizing system. Further, the original, “non-optimizing” system (with the original definitions of the basin of attraction and target set) would have behaved exactly the same as my optimizing system. On the other hand, changing the bottle cap from our special one to a regular cap will change the system into a non-optimizing system, regardless of the definitions of the basin of attraction and the target configuration set. Perhaps, we should insist that a properly formed system description has a basin of attraction that is larger than the target set, and count on the system behavior to make the optimizing/non-optimizing distinction.

Definitions 1 and 2 both contain the phrase “a small set of target configurations” which implies that the target set << the basin of attraction. This is a problem for the notion of the universe as a system with maximum entropy as the target configuration set because the target set is most of the possible configurations. For this reason, the essay’s author concludes that the universe-with-entropy system is not an optimizing system, or at best, a weak one. Stars, galaxies, black holes – there are strong forces that pull matter into these structures. I would say that any system that has succeeded in getting nearly everything within the basin of attraction into the target configuration is a strong optimizer!

Regardless of the way we choose to think about strong or weak, the universe is a system that tends to a set of configurations smaller than the set of possible configurations despite perturbations (the occasional house-building project for example!). Personally, I see no value in a definitional limitation. The behavior of the system (tending toward a smaller set of configurations out of a larger set) should govern the definition of an optimizing system, regardless of relative sizes of the sets.

Between the universe-with-entropy and bottle-with-a-cap systems, I question the utility of the “all configurations >= basin of attraction >> target set configuration” structure in the definition of optimizing systems. I believe it is worth thinking about what the necessary relationships among these configurations are, and how they are chosen.

The example of the billiards system raised another (to me) interesting question. The essay did not offer a system description but says “Consider a billiard table with some billiard balls that are currently bouncing around in motion. Left alone, the balls will eventually come to rest in some configuration…. If we reach in while the billiard balls are bouncing around and move one of the balls that is in motion, the system will now come to rest in a different configuration. Therefore this is not an optimizing system, because there is no set of target configurations towards which the system evolves despite perturbations.”

This example has some odd features. Friction between the balls and the table surface, along with the loss of energy during non-elastic collisions, cause the balls to slow down and stop. The minutiae of their travels determine where they stop. The final arrangement is unpredictable (ok, it could be modeled given complete information, but let’s skip that as beside the point), and any arrangement is as likely as another. This suggests that the billiards system is a non-optimizing system even without the proposed perturbation of moving the balls around while the balls are in motion.

Looked at another way, the billiards system does tend to a certain target configuration set, while friction and the non-elasticity of the collisions are perturbations. If we make the surface frictionless and the collisions perfectly elastic, the balls will bounce around the table without stopping. Much like the water molecules in the bottle-with-a-cap example, each will eventually fall into one pocket or another during its travels. Once in the pocket, the ball cannot get out, and thus eventually all will end up in the pockets. So, this system tends to a target configuration set of all balls in pockets.

Adding back in the perturbing friction and energy loss does not mean that this system is not tending to the target configuration set. Reaching in and moving a ball to a different point, or even redirecting any ball heading for a pocket, will not keep this system from tending towards the target configuration. It seems as though the billiards system was an optimizing system all along! The larger point is that it seems, by definition, an optimizing system is an optimizing system even if there is a set of perturbations that prevent it from ever reaching the target configuration! “Tending toward”, not “reaching”, a target configuration set is in all three definitions. It is worth thinking about an optimizing system that never actually optimizes. This may have some bearing on the AGI question.

[And for you readers who, like me, would say, whoa - it is possible that the balls will enter some repeating pattern of motion where some do not enter pockets. Maybe we need a robot to move the balls around randomly if they seem stuck, just like the ball-in-valley+robot system where the robot moves the ball over barriers. I maintain that the point is the same.]

The satellite system illustrates (perhaps an obvious point) that the definition of the target configuration set can change a single system from optimizing to non-optimizing. What is a little more subtle is that the definition of the system boundaries is essential to the characterization of the system as optimizing or non-optimizing, even if the behavior of the system is the same under both definitions. In particular, what we consider to be part of the system and what is considered to be a perturbation can flip a system between characterizations. [This latter point is illustrated by the billiards system as well, as I will explain below.]

The essay says that a satellite in orbit is a non-optimizing system because if its position or velocity is perturbed, it has no tendency to return to its original orbit; that is, the author defines the target configuration as a particular orbit. With respect to another target configuration that may be described as “a scorched pile of junk on the surface of the Earth”, a satellite in orbit is an optimizing system exactly like a ball in a valley. As soon as the launch rocket stops firing, a satellite starts falling to the center of the earth because atmospheric drag and solar radiation pressure continuously decrease the component of the satellite’s velocity perpendicular to the force of gravity. So, unless a perturbation is big enough to send it out of orbit altogether, a satellite tends towards a target configuration of junk located on Earth’s surface.

Since a particular orbit is usually the desired target configuration (!), many satellites incorporate a rocket system to force them to stay in a chosen orbit. If a rocket system is included in the system definition, then the satellite is an optimizing system relative to the desired orbit. What is a little more interesting is that, with respect to the junk-on-the-Earth target set, drag and solar pressure are part of the optimizing system; an orbit correction system is a perturbation. If the target set is the particular orbit the satellite started in, these definitions swap.

This observation has bearing on the billiards system example. If we include drag and non-elastic collisions as part of the billiards system, then the system is non-optimizing. If we see them as perturbations outside the system, then the billiards system is optimizing. I find this flexibility a little curious, although I haven’t completely thought through the implications.

A completely different sort of question is suggested by the section on Drexler. There the essay sets out a hierarchy of all AI systems, optimizing systems, and goal-directed agent systems. This makes sense with respect to AI systems, but I do not see how optimizing systems, as defined, can be wholly contained within the category of AI systems, unless you define AI systems pretty broadly. For example, I think that pretty much any control system is an optimizing system by the definition in the essay. If we accept this definition of optimizing system, and hold that all optimizing systems are a subset of AI systems, do we have to accept our thermostats as AI systems? What about the program that determined the square root of 2? Is that AI? Is this an issue for this definition, or does its broadness matter in an AI context?

And a nitpick: The first example of an optimizing system offered in the essay is a program calculating the square root of 2. It meets the definition of an optimizing system, but it seems to contradict the earlier assertion that “… optimizing systems are not something that are designed but are discovered.” The algorithm and the program were both designed. I’m not sure why this point is necessary. Either I do not understand something fundamental, or the only purpose of the statement of discovery is to give people like me something to argue about!

In summary, the definition in the essay suggests a few questions that could have a bearing on its application:

  • How do we choose the basin of attraction relative to the target configuration set, if our choice can change the status of the system from optimizing to non-optimizing and vice versa?
  • Is it an issue that an optimizing system may never actually optimize?
  • How do we choose what is part of the system versus a perturbation outside the system when our choice changes the status of the system as optimizing or non-optimizing?
  • All control systems are optimizing systems by the definition, but are all control systems AI systems? Does it matter? If it does matter, how do we tell the difference?
  • For any of these, how do they affect our thinking for AI?

Finally, it might be better to have one, consistent definition that covers all the possibilities, including (in my opinion) that perturbations may be confined to certain dimensions.

This was actually part of a conversation I was having with this colleague regarding whether or not evolution can be viewed as an optimization process. Here are some follow-up comments to what she wrote above related to the evolution angle:

We could define the natural selection system as:

All configurations = all arrangements of matter on a planet (both arrangements that are living and those that are non-living)

Basin of attraction = all arrangements of matter on a planet that meet the definition of a living thing

Target configuration set = all arrangements of living things where the type and number of living things remains approximately stable.

I think that this system meets the definition of an optimizing system given in the Ground for Optimization essay. For example, predator and prey co-evolve to be about “equal” in survival ability. If a predator becomes so much better than its prey that it eats them all, the predator will die out along with its prey; the remaining animals will be in balance. I think this works for climate perturbations, etc. too.

HOWEVER, it should be clear that there are numerous ways in which this can happen – like the ball on bumpy surface with a lot of convex “valleys” (local minima), there is not just one way that living things can be in balance. So, to say that “natural selection optimized for intelligence” is not quite right – it just fell into a “valley” where intelligence happened. FURTHER, it’s not clear that we have reached the local minimum! Humans may be that predator that is going to fall “prey” to its own success. If that happened (and any intelligent animals remain at all), I guess we could say that natural selection optimized for less-than-human intelligence!

Further, this definition of optimization has no connotation of “best” or even better – just equal to a defined set. The word “optimize” is loaded. And its use in connection with natural selection has led to a lot of trouble in terms of human races, and humans v. animal rights.

Finally, in the essay’s definition, there is no imperative that the target set be reached. As long as the set of living things is “tending” toward intelligence, then the system is optimizing. So even if natural selection was optimizing for intelligence there is no guarantee that it will be achieved (in its highest manifestation). Like a billiards system where the table is slick (but not frictionless) and the collisions are close to elastic, the balls may come to rest with some of the balls outside the pockets. The reason I think this is important for AI research, especially AGI and ASI, is that perhaps we should be looking for those perturbations to prevent us from ever reaching what we may think of as the target configuration, despite our best efforts.

Curated. Come on dude, stop writing so many awesome posts so quickly, it's too much.

This is a central question in the science of agency and optimization. The proposal is simple, you connected it to other ideas from Drexler and Demski+Garrabrant, and you gave a ton of examples of how to apply the idea. I generally get scared by the academic style, worried that the authors will fill out the text and make it really hard to read, but this was all highly readable, and set its own context (re-explaining the basic ideas at the start). I'm looking forward to you discussing it in the comments with Ricraz, Rohin and John.

Please keep writing these posts!

Thank you Ben. Reading this really filled me with joy and gives me energy to write more. Thank you for your curation work - it's a huge part of why there is this place for such high quality discussion of topics like this, for which I'm very grateful.

You’re welcome :-)

Thanks for the post, this is my favourite formalisation of optimisation so far!

One concern I haven't seen raised so far, is that the definition seems very sensitive to the choice of configuration space. As an extreme example, for any given system, I can always augment the configuration space with an arbitrary number of dummy dimensions, and choose the dynamics such that these dummy dimensions always get set to all zero after each time step. Now, I can make the basin of attraction arbitrarily large, while the target configuration set remains a fixed size. This can then make any such dynamical system seem to be an arbitrarily powerful optimiser.

This could perhaps be solved by demanding the configuration space be selected according to Occam's razor, but I think the outcome still ends up being prior dependent. It'd be nice for two observers who model optimising systems in a systematically different way to always agree within some constant factor, akin to Kolmogorov complexity's invariance theorem, although this may well be impossible.

As a less facetious example, consider a computer program that repeatedly sets a variable to 0. It seems again we can make the optimising power arbitrarily large by making the variable's size arbitrarily large. But this doesn't quite map onto the intuitive notion of the "difficulty" of an optimisation problem. Perhaps including some notion of how many other optimising systems would have the same target set would resolve this.
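Both examples in this comment can be made concrete with a small calculation. The measure used here, log2 of (basin size / target size), is an assumption chosen for illustration and not a definition taken from the post; the function names are hypothetical.

```python
import math


def optimisation_bits(basin_size, target_size):
    """Assumed measure of optimising power: log2(|basin| / |target|)."""
    return math.log2(basin_size / target_size)


def with_dummy_bits(basin_size, target_size, k):
    """Power after adding k dummy binary dimensions that the dynamics reset to 0.

    Each dummy bit doubles the number of starting configurations, while the
    target set keeps its size because every dummy bit ends at zero.
    """
    return optimisation_bits(basin_size * 2 ** k, target_size)


for k in (0, 10, 100):
    print(k, with_dummy_bits(8, 2, k))
```

Under this measure the apparent power grows linearly in the number of dummy bits with no change to the interesting part of the dynamics. The "set a variable to 0" program is the special case where the basin is all 2**k values of a k-bit variable and the target set has size 1.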

This seems great, I'll read and comment more thoroughly later. Two quick comments:

It didn't seem like you defined what it meant to evolve towards the target configuration set. So it seems like either you need to commit to the system actually reaching one of the target configurations to call it an optimiser, or you need some sort of metric over the configuration space to tell whether it's getting closer to or further away from the target configuration set. But if you're ranking all configurations anyway, then I'm not sure it adds anything to draw a binary distinction between target configurations and all the others. In other words, can't you keep the definition in terms of a utility function, but just add perturbations?

Also, you don't cite Dennett here, but his definition has some important similarities. In particular, he defines several different types of perturbation (such as random perturbations, adversarial perturbations, etc) and says that a system is more agentic when it can withstand more types of perturbations. Can't remember exactly where this is from - perhaps The Intentional Stance?

It didn't seem like you defined what it meant to evolve towards the target configuration set.

+1 for swapping out the target configuration set with a utility function, and looking for a robust tendency for the utility function to increase. This would also let you express mild optimization (see this thread).

Would this work for highly non-monotonic utility functions? 

It would work at least as well as the original proposal, because your utility function could just be whatever metric of "getting closer to the target states" would be used in the original proposal.
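A minimal sketch of that construction: given a target set and some notion of distance on configurations, take the utility of a state to be the negative distance to the nearest target state. The distance function, the integer state space, and the sample trajectory below are all assumptions made for illustration.

```python
def utility_from_target_set(target_set, distance):
    """Build a utility function that is 0 on the target set and negative elsewhere."""
    def utility(state):
        return -min(distance(state, t) for t in target_set)
    return utility


def robustly_increasing(trajectory, utility):
    """True if utility never decreases along the trajectory."""
    values = [utility(s) for s in trajectory]
    return all(a <= b for a, b in zip(values, values[1:]))


# Toy configuration space: integers, with absolute difference as distance.
utility = utility_from_target_set({10, 11}, lambda a, b: abs(a - b))
trajectory = [3, 5, 8, 10]

print([utility(s) for s in trajectory])
print(robustly_increasing(trajectory, utility))
```

Any system that "tends towards the target set" in the sense of the original proposal will show a tendency for this utility to increase, so the utility formulation is at least as expressive.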

Two examples which I'd be interested in your comments on:

1. Consider adding a big black hole in the middle of a galaxy. Does this turn the galaxy into a system optimising for a really big black hole in the middle of the galaxy? (Credit for the example goes to Ramana Kumar).

2. Imagine that I have the goal of travelling as fast as possible. However, there is no set of states which you can point to as the "target states", since whatever state I'm in, I'll try to go even faster. This is another argument for, as I argue below, defining an optimising system in terms of increasing some utility function (rather than moving towards target states).

On the topic of the black hole...

There’s a way of viewing the world as a series of “forces”, each trying to control the future. Eukaryotic life is one. Black holes are another. We humans build many things, from chairs to planes to AIs. Of those three, turning on the AI feels the most like “a new force has entered the game”.

All these forces are fighting over the future, and while it’s odd to think of a black hole as an agent, sometimes when I look at it, it does feel natural to think of physics as another optimisation force that’s playing the game with us.

Great examples! Thank you.

  1. Consider adding a big black hole in the middle of a galaxy. Does this turn the galaxy into a system optimising for a really big black hole in the middle of the galaxy?

Yes this would qualify as an optimizing system by my definition. In fact just placing a large planet close to a bunch of smaller planets would qualify as an optimizing system if the eventual result is to collapse the mass of the smaller planets into the larger planet.

This seems to me to be a lot like a ball rolling down a hill: a black hole doesn't seem alive or agentic, and it doesn't really respond in any meaningful way to hurdles put in its way, but yes it does qualify as an optimizing system. For this reason my definition isn't yet a very good definition of what agency is, or what post-agency concept we should adopt. I like Rohin's comment on how we might view agency in this framework.

  2. Imagine that I have the goal of travelling as fast as possible. However, there is no set of states which you can point to as the "target states", since whatever state I'm in, I'll try to go even faster. This is another argument for, as I argue below, defining an optimising system in terms of increasing some utility function (rather than moving towards target states).

Yes it's true that using a set of target states rather than an ordering over states means that we can't handle cases where there is a direction of optimization but not a "destination". But if we use an ordering over states then we run into the following problem: how can we say whether a system is robust to perturbations? Is it just that the system continues to climb the preference gradient despite perturbations? But now every system is an optimizing system, because we can always come up with some preference ordering that explains a system as an optimizing system. So then we can say "well it should be an ordering over states with a compact representation" or "it should be more compact than competing explanations". This may be okay but it seems quite dicey to me.

It actually seems quite important to me that the definition point to systems that "get back on track" even when you push them around. It may be possible to do this with an ordering over states and I'd love to discuss this more.
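The worry that some preference ordering can always be found can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `arbitrary_step` is a made-up dynamics with no goal in mind, and the "utility" is just the time at which each state is first visited. The construction only works for a trajectory that never revisits a state, and it says nothing about getting back on track after a perturbation, which is exactly the property the comment above wants the definition to capture.

```python
def trajectory(step, state, n):
    """Run the dynamics for n steps and return every state visited."""
    states = [state]
    for _ in range(n):
        state = step(state)
        states.append(state)
    return states

def utility_from_trajectory(states):
    """Assign each state the time of its first visit as its 'utility'."""
    utility = {}
    for t, s in enumerate(states):
        utility.setdefault(s, t)
    return utility

def arbitrary_step(x):
    # Hypothetical dynamics chosen arbitrarily; not from the post.
    return (5 * x + 3) % 64

states = trajectory(arbitrary_step, 1, 20)
u = utility_from_trajectory(states)

# After the fact, the system appears to climb this made-up utility at every step.
climbs = all(u[a] < u[b] for a, b in zip(states, states[1:]))
```

The ordering is read off the trajectory rather than predicting it, which is why a requirement such as compactness of the ordering comes up.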

> But now every system is an optimizing system, because we can always come up with some preference ordering that explains a system as an optimizing system.

Hmmm, I'm a little uncertain about whether this is the case. E.g. suppose you have a box with a rock in it, in an otherwise empty universe. Nothing happens. You perturb the system by moving the rock outside the box. Nothing else happens in response. How would you describe this as an optimising system? (I'm assuming that we're ruling out the trivial case of a constant utility function; if not, we should analogously include the trivial case of all states being target states).

As a more general comment: I suspect that what starts to happen after you start digging into what "perturbation" means, and what counts as a small or big perturbation, is that you run into the problem that a *tiny* perturbation can transform a highly optimising system to a non-optimising system (e.g. flicking the switch to turn off the AGI). In order to quantify size of perturbations in an interesting way, you need the pre-existing concept of which subsystems are doing the optimisation.

My preferred solution to this is just to stop trying to define optimisation in terms of *outcomes*, and start defining it in terms of *computation* done by systems. E.g. a first attempt might be: an agent is an optimiser if it does planning via abstraction towards some goal. Then we can zoom in on what all these words mean, or what else we might need to include/exclude (in this case, we've ruled out evolution, so we probably need to broaden it). The broad philosophy here is that it's better to be vaguely right than precisely wrong. Unfortunately I haven't written much about this approach publicly - I briefly defend it in a comment thread on this post though.

> suppose you have a box with a rock in it, in an otherwise empty universe [...]

Yes you're right, this system would be described by a constant utility function, and yes this is analogous to the case where the target configuration set contains all configurations, and yes this should not be considered optimization. In the target set formulation, we can measure the degree of optimization by the size of the target set relative to the size of the basin of attraction. In your rock example, the sets have the same size, so it would make sense to say that the degree of optimization is zero.
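One way to make "size of the target set relative to the size of the basin of attraction" concrete is sketched below. The logarithmic form (bits of narrowing) is an assumption of this sketch, not something the comment specifies, and the function name and the set sizes are hypothetical.

```python
import math

def degree_of_optimization(target_size, basin_size):
    """Bits of narrowing from basin to target: log2(basin_size / target_size).

    Assumes finite set sizes with 0 < target_size <= basin_size.
    """
    if not 0 < target_size <= basin_size:
        raise ValueError("need 0 < target_size <= basin_size")
    return math.log2(basin_size / target_size)

# Rock in a box: every configuration is a target, so nothing is narrowed.
rock_in_box = degree_of_optimization(1000, 1000)

# A system that funnels 1024 starting configurations into a single target.
narrow_target = degree_of_optimization(1, 1024)
```

Under this assumed measure the rock example scores zero, matching the intuition in the comment, while a system with a small target set and a large basin scores high.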

This discussion is updating me in the direction that a preference ordering formulation is possible, but that we need some analogy for "degree of optimization" that captures how "tight" or "constrained" the system's evolution is relative to the size of the basin of attraction. We need a way to say that a constant utility function corresponds to a degree of optimization equal to zero. We also need a way to handle the case where our utility function assigns utility proportional to entropy, so again we can describe all physical systems as optimizing systems and thermodynamics ensures that we are correct. This utility function would be extremely flat and wide, with most configurations receiving near-identical utility (since the high entropy configurations constitute the vast majority of all possible configurations). I'm sure there is some way to quantify this - do you know of any appropriate measure?

The challenge here is that in order to actually deal with the case you mentioned originally -- the goal of moving as fast as possible -- we need a measure that is not based on the size or curvature of some local maxima of the utility function. If we are working with local maxima then we are really still working with systems that evolve towards a specific destination (although there still may be advantages to thinking this way rather than in terms of a binary set).

> My preferred solution to this is just to stop trying to define optimisation in terms of outcomes, and start defining it in terms of computation done by systems

Nice - I'd love to hear more about this

I think this is great.

I would want to relate it to a few key points which I tried to address in a few earlier posts. Principally, I discussed selection versus control, which is about the difference between what optimization does externally, and how it uses models and testing. This relates strongly to your conception of an optimizing system, but focused on how much of the optimization process occurs in the system versus in the agent itself. This is principally important because of how it relates to misalignment and Goodharting of various types.

I had hopes to further apply that conceptual model to mesa-optimization, but I was a bit unsure how to think about it, and have been working on other projects. At this point, I think your discussion is probably a better conceptual model than the one I was trying to build there - it just needs to be slightly extended to cover the points I was trying to work out in those posts. I'd like to think about how it relates to mesa-optimization as well, but I'm unlikely to actually work on that.

This post reminds me of thinking from the 1950s, when people taking inspiration from Wiener's work on cybernetics tried to operationalize "purposeful behavior" in terms of robust convergence to a goal state:

https://heinonline.org/HOL/Page?collection=journals&handle=hein.journals/josf29&id=48&men_tab=srchresults

> When an optimizing system deviates beyond its own rim, we say that it dies. An existential catastrophe is when the optimizing system of life on Earth moves beyond its own outer rim.

I appreciate the direct attention to this process as an important instance of optimization.  The first talk I ever gave in the EECS department at UC Berkeley (to the full EECS faculty) included a diagram of Earth drifting out of the region of phase space where humans would exist.  Needless to say, I'd like to see more explicit consideration of this type of scenario.

Very good. A lot of potential there, I feel.

This is excellent! Very well done, I would love to see more work like this.

I have a whole bunch of things to say along separate directions so I'll break them into separate comments. This first one is just a couple minor notes:

  • For the universe section, the universe doesn't push "toward" maxent, it just wanders around and usually ends up in maxent states because that's most of the states. The basin of attraction includes all states.
  • Regarding "whether dynamical systems theory explicitly studies attractors that operate along a subset of the system’s dimensions", I believe there's an old theorem that the long-term behavior of dynamical systems on a compact space is always ergodic on some manifold within the space. That manifold has a name which I don't remember, which is probably what you want to look for.

Planned summary for the Alignment Newsletter:

Many arguments about AI risk depend on the notion of “optimizing”, but so far it has eluded a good definition. One natural approach is to say that an optimizer causes the world to have higher values according to some reasonable utility function, but this seems insufficient, as then a <@bottle cap would be an optimizer@>(@Bottle Caps Aren't Optimisers@) for keeping water in the bottle.
This post provides a new definition of optimization, by taking a page from <@Embedded Agents@> and analyzing a system as a whole instead of separating the agent and environment. An **optimizing system** is then one which tends to evolve toward some special configurations (called the **target configuration set**), when starting anywhere in some larger set of configurations (called the **basin of attraction**), _even if_ the system is perturbed.
For example, in gradient descent, we start with some initial guess at the parameters θ, and then continually compute loss gradients and move θ in the appropriate direction. The target configuration set is all the local minima of the loss landscape. Such a program has a very special property: while it is running, you can change the value of θ (e.g. via a debugger), and the program will probably _still work_. This is quite impressive: certainly most programs would not work if you arbitrarily changed the value of one of the variables in the middle of execution. Thus, this is an optimizing system that is robust to perturbations in θ. Of course, it isn’t robust to arbitrary perturbations: if you change any other variable in the program, it will probably stop working. In general, we can quantify how powerful an optimizing system is by how robust it is to perturbations, and how small the target configuration set is.
The bottle cap example is _not_ an optimizing system because there is no broad basin of configurations from which we get to the bottle being full of water. The bottle cap doesn’t cause the bottle to be full of water when it didn’t start out full of water.
Optimizing systems are a superset of goal-directed agentic systems, which require a separation between the optimizer and the thing being optimized. For example, a tree is certainly an optimizing system (the target is to be a fully grown tree, and it is robust to perturbations of soil quality, or if you cut off a branch, etc). However, it does not seem to be a goal-directed agentic system, as it would be hard to separate into an “optimizer” and a “thing being optimized”.
This does mean that we can no longer ask “what is doing the optimization” in an optimizing system. This is a feature, not a bug: if you expect to always be able to answer this question, you typically get confusing results. For example, you might say that your liver is optimizing for making money, since without it you would die and fail to make money.
The full post has several other examples that help make the concept clearer.
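The gradient descent example in the summary can be sketched in a few lines of Python. This is a toy under stated assumptions: the quadratic loss, the learning rate, the step count, and the perturbation value are all made up for illustration, and overwriting `theta` inside the loop stands in for changing it via a debugger.

```python
def grad(theta):
    # Gradient of the illustrative loss f(theta) = (theta - 3)^2.
    return 2.0 * (theta - 3.0)

def gradient_descent(theta, steps=200, lr=0.1, perturb_at=None, perturb_to=None):
    """Plain gradient descent, optionally overwriting theta at one step."""
    for step in range(steps):
        if step == perturb_at:
            theta = perturb_to  # simulate an external edit to theta mid-run
        theta -= lr * grad(theta)
    return theta

unperturbed = gradient_descent(10.0)
perturbed = gradient_descent(10.0, perturb_at=50, perturb_to=-40.0)
```

Both runs end very close to the minimum at 3, even though the second run had `theta` moved far away partway through. Overwriting the loop counter or the learning rate instead would not be absorbed in the same way, which is the asymmetry the summary describes.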

Planned opinion:

I’ve <@previously argued@>(@Intuitions about goal-directed behavior@) that we need to take generalization into account in a definition of optimization or goal-directed behavior. This definition achieves that by primarily analyzing the robustness of the optimizing system to perturbations. While this does rely on a notion of counterfactuals, it still seems significantly better than any previous attempt to ground optimization.
I particularly like that the concept doesn’t force us to have a separate agent and environment, as that distinction does seem quite leaky upon close inspection. I gave a shot at explaining several other concepts from AI alignment within this framework in this comment, and it worked quite well. In particular, a computer program is a goal-directed AI system if there is an environment such that adding the computer program to the environment transforms it into an optimizing system for some “interesting” target configuration states (with one caveat explained in the comment).

This seems like a good definition of optimization for algorithmic systems, but I don't see how it works for physical systems. Going by the primary definition,

> An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction.

But in the physical world, there are literally zero closed systems with this property. Entropy always increases*, and the target configuration set will never be smaller than the basin of attraction. The dirt-plus-seed-plus-sunlight system has a vastly smaller configuration space than the dirt-plus-tree-plus-heat system. Perhaps one could object that one should discount the incoming sunlight and outgoing heat since the system isn't really closed, but then consider a very similar system consisting of only dirt, air, and fungal spores. Surely if a growing tree is an optimizing system, then a growing mushroom in a closed system is an optimizer too. But the entropy increase in the latter case is unambiguous: the number of ways to arrange atoms into a fully grown mushroom is again vastly larger than the number of ways to configure atoms into dirt without mushrooms but with the nutrients to grow them.

It may be possible to get around this by redefining configuration spaces that better match our intuition (it does seem like a mushroom is more special than dirt), but I don't see any way to do this rigorously.

*or, at least, entropy always tends to increase.

> The set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems.
>
> ...
>
> A tree is an optimizing system but not a goal-directed agent system.

I'm not sure this is true, at least not in the sense that we usually think about "goal-directed agent systems".

You make a case that there's no distinct subsystem of the tree which is "doing the optimizing", but this isn't obviously relevant to whether the tree is agenty. For instance, the tree presumably still needs to model its environment to some extent, and "make decisions" to optimize its growth within the environment - e.g. new branches/leaves growing toward sunlight and roots growing toward water, or the tree "predicting" when the seasons are turning and growing/dropping leaves accordingly.

One way to think about whether "the set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems" is that it's equivalent to Scott's (open) question: does agent-like behavior imply agent-like architecture?

At first I particularly liked the idea of identifying systems with "an optimizer" as those which are robust to changes in the object of optimization, but brittle with respect to changes in the engine of optimization.

On reflection, it seems like a useful heuristic but not a reliable definition. A counterexample: suppose we do manage to build a robust AI which maximizes some utility function. One desirable property of such an AI is that it's robust to e.g. one of its servers going down or corrupted data on a hard drive; the AI itself should be robust to as many interventions as possible. Ideally it would even be robust to minor bugs in its own source code. Yet it still seems like the AI is the "engine", and it optimizes the rest of the world.

Yeah I agree that duality is not a good measure of whether a system contains something like an AI. There is one kind of AI that we can build that is highly dualistic. Most present-day AI systems are quite dualistic, because they are predicated on having some robust compute infrastructure that is separate from and mostly unperturbed by the world around it. But there is every reason to go beyond these dualistic designs, for precisely the reason you point to: such systems do tend to be somewhat brittle.

I think it's quite feasible to build highly robust AI systems, although doing so will likely require more than just hardening (making it really unlikely for the system to be perturbed). What we really want is an AI system where the core AI itself tends to evolve back to a stable configuration despite perturbations to its core infrastructure. My sense is that this will actually require a significant shift in how we think about AI -- specifically moving from the agent model to something that captures what is good and helpful in the agent model but discards the dualistic view of things.

Is a metal bar an optimizer?  Looking at the temperature distribution, there is a clear set of target states (states of uniform temperature) with a much larger basin of attraction (all temperature distributions that don't vaporize the bar).
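The metal bar example can be made concrete with a toy simulation. This is a minimal sketch, assuming a 1-D bar split into ten cells, insulated ends, and explicit finite-difference diffusion; the coefficient `alpha`, the step count, and the starting temperatures are illustrative choices, not physical values.

```python
def diffuse(temps, alpha=0.25, steps=5000):
    """Explicit heat diffusion on a 1-D bar with insulated ends."""
    t = list(temps)
    n = len(t)
    for _ in range(steps):
        new = t[:]
        for i in range(n):
            # Insulated ends: a missing neighbour mirrors the cell itself.
            left = t[i - 1] if i > 0 else t[i]
            right = t[i + 1] if i < n - 1 else t[i]
            new[i] = t[i] + alpha * (left - 2 * t[i] + right)
        t = new
    return t

# Two very different starting distributions with the same total heat.
hot_end = diffuse([100.0] + [0.0] * 9)
hot_middle = diffuse([0.0] * 4 + [50.0, 50.0] + [0.0] * 4)

spread_end = max(hot_end) - min(hot_end)
spread_middle = max(hot_middle) - min(hot_middle)
```

Both starting distributions flatten out to the same uniform temperature, so in this toy model the target set (uniform profiles) is far smaller than the set of starting profiles that reach it.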

I suppose we could consider the second law of thermodynamics to be the true optimizer in this case.  The consequence is that any* closed physical system is trivially an optimizing system towards higher entropy.

In general, it seems like this optimization criterion is very easy to satisfy if we don't specify what exactly we care about as a meaningful aspect of the system.  Even the bottle cap 'optimizes' for trivial things like maintaining its shape (against the perturbation of elastic deformation).  

Do you think this will become a problem when using this definition for AI?  For example, we might find that a particular program incidentally tends to 'optimize' certain simple measures such as the average magnitude of network weights, or some other functions of weights, loss, policy, etc. to a set point/range.  We may then find slightly more complex things being optimized that look like sub-goals (which could in a certain context be unwanted or dangerous).  How would we know where to draw the line?  It seems like the definition would classify lots of things as optimization, and it would be up to us to decide which kinds are interesting or concerning and which ones are as trivial as the bottle cap maintaining its shape.

That being said, I really like this definition.  I just think it should be extended to classify the interestingness of a given optimization.  An AI agent which competently pursues complex goals is a much more interesting optimizer than a metal bar, even though the bar seems more robust (deleting a tiny piece of metal won't stop it from conducting; deleting a tiny piece in the AI's computer could totally disable it).

Also a nitpick on the section about whether the universe is an optimizing system:

I don't think it is correct to say that the target space is almost as big as the basin of attraction.  Either:

  • We use area to represent the number of macroscopic states -- in this case, the target space is extremely small (one state only(?) -- an ultra-low-density bath of particles with uniform temperature).  The universe is an extremely powerful optimizer from this perspective, with the caveat that it takes almost forever to achieve its target.
  • We use area to represent the number of microscopic states (as I think you intended).  In this case, I think the target space is exactly identical to the basin of attraction.  Low entropy microstates are not any less likely than high entropy microstates -- there just happen to be astronomically fewer of them.  There is no 'optimizing force' pushing the universe out of these states.  From the microstate perspective, there is no reason to exclude them from the target zone, since any small and unremarkable subset of the target space will display the property that the system tends to stumble out of it at random.

I would say that the first lens is almost always better than the second, since macro-states are what we actually care about and how we naturally divide the configuration space of a system.
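The difference between the two lenses can be seen by counting. The sketch below uses 100 coin flips as a stand-in for particles, which is purely an illustrative assumption: a macrostate is the number of heads, and a microstate is one specific sequence of flips.

```python
from math import comb

N = 100  # number of two-state "particles" (illustrative assumption)
total_microstates = 2 ** N

def microstates(heads):
    """Number of microstates belonging to the macrostate with this many heads."""
    return comb(N, heads)

# The lowest-entropy macrostate (all heads) contains a single microstate.
all_heads = microstates(N)

# Macrostates within 10 heads of an even split hold most of the microstates.
near_half = sum(microstates(k) for k in range(40, 61))
fraction_near_half = near_half / total_microstates
```

Every individual microstate is equally likely, but the near-even macrostates contain the overwhelming majority of them, so a system wandering at random is usually found there without any force pushing it.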

Finally, just want to say this is an amazing post!  I love the style as well as the content.  The diagrams make it really easy to get an intuitive picture.

 

*Unsure about the existence of exceptions (can an isolated system be contrived that fails to reach the global max for entropy?)

> systems that have a tendency to evolve towards a narrow target configuration set when started from any point within a broader basin of attraction, and continue to do so despite perturbations.


When determining whether a system "optimizes" in practice, the heavy lifting is done by the degree to which the set of states that the system evolves toward -- the suspected "target set" -- feels like it forms a natural class to the observer.

The issue here is that what the observer considers "a natural class" is informed by the data-distribution that the observer has previously been exposed to.

It's worse, even, in a certain way, than that: the existence of optimizing systems organized around a certain idea of "natural class" feeds back into more observers observing data that is distributed according to this idea of "natural class", leading to more optimizing systems being built around that idea of "natural class", and so on.

Once a certain idea of "natural class" gains a foothold somewhere, observers will make real changes in the world that further suggest this particular idea of "natural class" to others, and this forms a feedback loop.

> But Filan would surely agree on this point and his question is more specific: he is asking whether the liver is an optimizer.

FYI, it seems pretty clear to me that a liver should be considered an optimiser: as an organ in the human body, it performs various tasks mostly reliably, achieves homeostasis, etc. The question I was rhetorically asking was whether it is an optimiser of one's income, and the answer (I claim) is 'no'.

Truly a joy to read! Thank you.

> To what extent can we identify subsets of the system corresponding to "that which is being optimized" and "that which is doing the optimization"?

The information theoretic measure of individuality attempts to answer exactly this type of question.

From this view, a set of components (the system) is decomposed into two subsets (subsystem + environment). The proposed subsystem is assigned a degree of individuality by measuring the amount of information it shares with its future state, optionally conditioned on its environment. This leads to 2 types of individuality. The first type says that a proposed subsystem is individualistic to the degree that the subsystem is predictive of its future state after accounting for the information in the environment. The second type captures the notion of inseparability by assigning a high degree of individuality to subsystems that are strongly coupled with their environment in such a way that neither the subsystem nor environment alone are predictive of the next state of the subsystem.

For example, considering the set of atoms making up the space containing the robot-optimizer and vase, the set of robot-atoms retains the desired properties of an optimizer, and is also highly individualistic in the first sense since knowing the state of the robot atoms tells you a lot about their next state, but knowing about the set of non-robot atoms tells you very little about the state of the robot. On the other hand, considering the set of atoms making up the tree, the system as a whole is an optimizing system, but no individual subset of atoms accomplishes the target of the larger optimizing system.
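As a rough illustration of the basic ingredient here, the sketch below estimates the mutual information between a subsystem's current state and its next state from observed pairs. It is not the individuality measure itself: conditioning on the environment is left out, and the two sample datasets are hypothetical.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information in bits over a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

# Hypothetical subsystem whose next state always copies its current state.
persistent = [(0, 0), (1, 1)] * 50

# Hypothetical subsystem whose next state is unrelated to its current state.
unrelated = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25

mi_persistent = mutual_information(persistent)
mi_unrelated = mutual_information(unrelated)
```

The persistent subsystem shares one full bit with its own future and the unrelated one shares none. The first type of individuality described above would additionally subtract whatever the environment already explains.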

Thank you for the pointer to this terminology. It seems relevant and I wasn't aware of the terminology before.

> the exact same answer it would have output without the perturbation.

It always gives the same answer for the last digit?

Well we could always just set the last digit to 0 as a post-processing step to ensure perfect repeatability. But point taken, you're right that most numerical algorithms are not quite as perfectly stable as I claimed.

You said your definition would not classify a bottle cap with water in it as an optimizer. This might be really nit-picky, but I'm not sure it's generally true.

I say this because the water in the bottle cap could evaporate. Thus, supposing there is no rain, from a wide range of possible states of the bottle cap, it would tend towards no longer having water in it.

I know you said you make an exception for tendencies towards increased entropy being considered optimizers. However, this does not increase the entropy of the bottle cap. It could potentially increase the entropy of the water that was in the bottle cap, but this is not necessarily the case. For example, if the bottle cap is kept in a sealed container, the water vapor could potentially condense into a small puddle with the same entropy as it had in the bottle cap.

If my memory of physics is correct, water evaporating would still increase the total entropy of the total system in which the bottle cap is located, by virtue of releasing some heat into the environment. However, note that humans and robots also, merely by doing mechanical work and thus forming heat which is then dispersed into the environment, result in increased entropy of the system they're in. So you can't rule out any system that makes its environment tend towards increased entropy from being an optimizer, because that's what humans and robots do, too.

That said, if you clarify that the bottle cap is not in any such contained system, I think the water would result in a higher-entropy state.

Thank you for this comment Chantiel. Yes, a container that was engineered to evaporate water poured anywhere into it and condense it into a central area would be an optimizing system by my definition. That is a bit like a ball rolling down a hill, which is also an optimizing system and also has nothing resembling agency. I am

The bottle cap example was actually about putting a bottle cap onto a bottle and asking whether, since the water now stays inside the bottle, it should be considered an optimizer. I pointed out that this would not qualify as an optimizing system because if you moved a water molecule from the bottle and placed it outside the bottle, the bottle cap would not act to put it back inside.