Let’s start with the simplest coherence theorem: suppose I’ll pay to upgrade pepperoni pizza to mushroom, pay to upgrade mushroom to anchovy, and pay to upgrade anchovy to pepperoni. This does not bode well for my bank account balance. And the only way to avoid having such circular preferences is if there exists some “consistent preference ordering” of the three toppings - i.e. some ordering such that I will only pay to upgrade to a topping later in the order, never earlier. That ordering can then be specified as a utility function: a function which takes in a topping, and gives the topping’s position in the preference order, so that I will only pay to upgrade to a topping with higher utility.
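To make that concrete, here's a minimal sketch (Python; the function and variable names are mine, not from the post) which checks whether a set of "will pay to upgrade" judgments contains a circular chain like the one above, and reads off a utility function as position-in-ordering when it doesn't:

```python
# graphlib is in the Python standard library (3.9+).
from graphlib import TopologicalSorter, CycleError

# Each pair (a, b) means "I will pay to upgrade topping a to topping b".
paid_upgrades = [
    ("pepperoni", "mushroom"),
    ("mushroom", "anchovy"),
    ("anchovy", "pepperoni"),  # drop this line and the preferences become coherent
]

def utility_from_upgrades(paid_upgrades):
    """Return {topping: utility}, where utility is the topping's position in a
    consistent preference ordering, or None if the judgments form a money-pump cycle."""
    # TopologicalSorter takes {node: set of predecessors}; to put b *after* a in the
    # ordering (I pay to go from a to b), record a as a predecessor of b.
    graph = {}
    for a, b in paid_upgrades:
        graph.setdefault(b, set()).add(a)
        graph.setdefault(a, set())
    try:
        ordering = list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None  # circular preferences: no consistent ordering exists
    return {topping: position for position, topping in enumerate(ordering)}

print(utility_from_upgrades(paid_upgrades))  # None -- the three trades above form a cycle
```

Nothing depends on there being exactly three toppings: any set of pairwise "will pay to upgrade" judgments either contains such a cycle or admits an ordering that works as a utility function in the sense above.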
More advanced coherence theorems remove a lot of implicit assumptions (e.g. I could learn over time, and I might just face various implicit tradeoffs in the world rather than explicit offers to trade), and add more machinery (e.g. we can incorporate uncertainty and derive expected utility maximization and Bayesian updates). But they all require something-which-works-like-money.
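The simplest uncertainty-flavored version of "don't throw away money" is the classic Dutch book argument (my example, not one spelled out in the post): if my betting prices don't come from a single probability distribution, someone can sell me a bundle of bets that loses money in every possible world.

```python
# Prices and payoffs in cents, to keep the arithmetic exact.
price_rain    = 60   # what I'll pay for a ticket worth 100 if it rains
price_no_rain = 60   # what I'll pay for a ticket worth 100 if it doesn't rain

cost = price_rain + price_no_rain   # 120 spent up front
payoff_if_rain    = 100             # only the rain ticket pays out
payoff_if_no_rain = 100             # only the no-rain ticket pays out

# A sure loss of 20 either way -- the same "throwing away money" structure as the
# circular topping upgrades, now driven by incoherent probabilities.
print(cost - payoff_if_rain, cost - payoff_if_no_rain)  # 20 20
```

Money (or something that works like it) is what makes "sure loss" a meaningful verdict in arguments of this shape.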
Money has two key properties in this argument:

- All else equal, more money is better: holding everything else about the world fixed, the system (at least weakly) prefers to have more money rather than less.
- Money is additive: amounts spent or gained across different trades add up, so any sequence of trades has a well-defined total cost.

These are the conditions which make money a “measuring stick of utility”: more money is better (all else equal), and money adds. (Indeed, these are also the key properties of a literal measuring stick: distances measured by the stick along a straight line add, and bigger numbers indicate more distance.)
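Stated a bit more formally (notation mine, not from the post): writing $\succeq$ for the system's preferences and $(x, m)$ for a world-state with non-money facts $x$ and money holdings $m$, the first property says

$$m_1 \ge m_2 \implies (x, m_1) \succeq (x, m_2),$$

and the second says that a sequence of trades costing $m_1, \dots, m_n$ individually costs $\sum_{i=1}^n m_i$ in total, so expenditures along a whole history tally up like lengths laid end to end.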
Why does this matter?
There’s a common misconception that every system can be interpreted as a utility maximizer, so coherence theorems don’t say anything interesting. After all, we can always just pick some “utility function” which is maximized by whatever the system actually does. It’s the measuring stick of utility which makes coherence theorems nontrivial: if I spend $3 trading anchovy -> pepperoni -> mushroom -> anchovy, then it implies that either (1) I don’t have a utility function over toppings (though I could still have a utility function over some other silly thing, like e.g. my history of topping-upgrades), or (2) more money is not necessarily better, given the same toppings. Sure, there are ways for that system to “maximize a utility function”, but it can’t be a utility function over toppings which is measured by our chosen measuring stick.
Another way to put it: coherence theorems assume the existence of some resources (e.g. money), and talk about systems which are pareto optimal with respect to those resources - e.g. systems which “don’t throw away money”. Implicitly, we're assuming that the system generally "wants" more resources (instrumentally, not necessarily as an end goal), and we derive the system's "preferences" over everything else (including things which are not resources) from that. The agent "prefers" X over Y if it expends resources to get from Y to X. If the agent reaches a world-state which it could have reached with strictly less resource expenditure in all possible worlds, then it's not an expected utility maximizer - it "threw away money" unnecessarily. We assume that the resources are a measuring stick of utility, and then ask whether the system maximizes any utility function over the given state-space measured by that measuring stick.
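As a toy version of that check (Python; the policy names, world names, and dollar amounts are invented for illustration): represent each policy by how much it spends and where it ends up in each possible world, and flag a policy as having "thrown away money" if some alternative reaches the same end states with strictly less expenditure in every world.

```python
# Toy model: each policy maps each possible world to (money_spent, final_state).
policies = {
    "buy_anchovy_directly": {"rainy": (1, "anchovy"), "sunny": (1, "anchovy")},
    "upgrade_in_a_loop":    {"rainy": (4, "anchovy"), "sunny": (4, "anchovy")},
}

def throws_away_money(name, policies):
    """True if some other policy reaches the same final state in every possible
    world while spending strictly less money in every possible world."""
    mine = policies[name]
    for other_name, other in policies.items():
        if other_name == name:
            continue
        same_outcomes    = all(other[w][1] == mine[w][1] for w in mine)
        strictly_cheaper = all(other[w][0] < mine[w][0] for w in mine)
        if same_outcomes and strictly_cheaper:
            return True
    return False

print(throws_away_money("upgrade_in_a_loop", policies))     # True: dominated by buying directly
print(throws_away_money("buy_anchovy_directly", policies))  # False
```

The coherence theorems then say, roughly, that a system which never fails this kind of dominance check (plus further technical conditions) behaves as if it maximizes expected utility measured by that resource.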
Ok, but what about utility functions which don’t increase with resources?
As a general rule, we don’t actually care about systems which are “utility maximizers” in some trivial sense, like the rock which “optimizes” for sitting around being a rock. These systems are not very useful to think of as optimizers. We care about things which steer some part of the world into a relatively small state-space.
To the extent that we buy instrumental convergence, using resources as a measuring stick is very sensible. There are various standard resources in our environment, like money or energy, which are instrumentally useful for a very wide variety of goals. We expect a very wide variety of optimizers to “want” those resources, in order to achieve their goals. Conversely, we intuitively expect that systems which throw away such resources will not be very effective at steering parts of the world into a relatively small state-space. They will be limited to fewer bits of optimization than systems which use those same resources pareto optimally.
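The post doesn't quantify “bits of optimization” at this point, so take this as one common convention rather than the author's definition: if a system reliably steers the world into a target set of states which would have had probability $p$ under some baseline distribution, credit it with

$$\log_2 \frac{1}{p}$$

bits, so that hitting a target half as probable counts for one more bit. On that reading, the intuitive claim is that pareto-suboptimal use of resources caps how improbable a target the system can reliably hit.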
So there’s an argument to be made that we don’t particularly care about systems which “maximize utility” in some sense which isn’t well measured by resources. That said, it’s an intuitive, qualitative argument, not really a mathematical one. What would be required in order to make it into a formal argument, usable for practical quantification and engineering?
The Measuring Stick Problem
The main problem is: how do we recognize a “measuring stick of utility” in the wild, in situations where we don’t already think of something as a resource? If somebody hands me a simulation of a world with some weird physics, what program can I run on that simulation to identify all the “resources” in it? And how does that notion of “resources” let me say useful, nontrivial things about the class of utility functions for which those resources are a measuring stick? These are the sorts of questions we need to answer if we want to use coherence theorems in a physically-reductive theory of agency.
If we could answer that question, in a way derivable from physics without any “agenty stuff” baked in a priori, then the coherence theorems would give us a nontrivial sense in which some physical systems do contain embedded agents, and other physical systems don’t. It would, presumably, allow us to bound the number of bits-of-optimization which a system can bring to bear, with more-coherent-as-measured-by-the-measuring-stick systems able to apply more bits of optimization, all else equal.