AI ALIGNMENT FORUM
AF

Counting-down vs. counting-up coherence — AI Alignment Forum

[Metadata: crossposted from https://tsvibt.blogspot.com/2022/10/counting-down-vs-counting-up-coherence.html. First completed 25 October 2022.]

Counting-down coherence is the coherence of a mind viewed as the absence of deviation downward in capability from ideal, perfectly efficient agency: the utility left on the table, the waste, the exploitability.

Counting-up coherence is the coherence of a mind viewed as the deviation upward in capability from a rock: the elements of the mind, and how they combine to perform tasks.

What determines the effects of a mind?

Supranormally capable minds can have large effects. To control those effects, we'd have to understand what determines the effects of a mind.

Pre-theoretically, we have the idea of "values", "aims", "wants". The more capable a mind is, the more it's the case that what the mind wants is what will happen in the world; so the mind's wants, its values, determine the mind's effect on the world.

A more precise way of describing the situation is: "Coherent decisions imply consistent utilities". A mind whose decisions are coherent in that sense, and which therefore acts as though it maximizes a consistent utility function, is incorrigible: if it knows it will eventually be more competent than any other mind at pushing the world towards high-utility possibilities, then it does not defer to any other mind. So to understand how a mind can be corrigible, some assumptions about minds and their values may have to be loosened.

The question remains, what are values? That is, what determines the effects that a mind has on the world, besides what the mind is capable of doing or understanding? This essay does not address this question, but instead describes two complementary standpoints from which to view the behavior of a mind insofar as it has effects.

Counting-down coherence

Counting-down coherence is the coherence of a mind viewed as the absence of deviation downward in capability from ideal, perfectly efficient agency: the utility left on the table, the waste, the exploitability.

Counting-down coherence could also be called anti-waste coherence, since it has a flavor of avoiding visible waste, or universal coherence, since it has a flavor of tracking how much a mind everywhere conforms to certain patterns of behavior.

Some overlapping ways of describing counting-down incoherence:

Exploitable, Dutch bookable, pumpable for resources. That is, someone could make a set of trades with the mind that leaves the mind worse off, and could do so repeatedly to pump the mind for resources. See Garrabrant induction.
VNM violating. Choosing between different outcomes, or different probabilities of different outcomes, in a way that doesn't satisfy the Von Neumann–Morgenstern axioms, leaves a mind open to being exploited by Dutch books. See related LessWrong posts.
Doesn't maximize expected utility. A mind that satisfies the VNM axioms behaves as though it maximizes the expected value of a fixed utility function over atomic (not probabilistic) outcomes. So deviating from that policy exposes a mind to Dutch books.
Missed opportunities. Leaving possible gains on the table; failing to pick up a $20 bill lying on the sidewalk.
Opposing pushes. Working at cross-purposes to oneself; starting to do X one day, and then undoing X the next day; pushing and pulling on the door handle at the same time.
Internal conflict. At war with oneself; having elements of oneself that try to harm each other or interfere with each other's functioning.
Inconsistent beliefs, non-Bayesian beliefs. Sometimes acting as though X and sometimes acting as though not-X, where X is something that is either true or false. Or some more complicated inconsistency, or more generally failing to act as though one has a Bayesian belief state and belief revisions. Any of these also open one up to being Dutch booked.
Inefficient allocation. Choosing to invest resources in something that gives less benefit than another available investment.
Burning resources. Using up resources without getting any benefit. E.g. lighting a pile of negentropy on fire for no gain.
Suffering, spinning wheels. Pouring resources into strategies that aren't working; running through thoughts over and over without making progress.
Waste, inefficiency. Doing something that could be done with the same outputs / benefits and fewer inputs / costs. E.g. thrashing: elements of the mind switching back and forth between tasks in a way that incurs switching costs with no benefit over just finishing one task first.

The universal absence of all of these constitutes (one might argue) an ideal of coherence, and the presence of any of these constitutes a deviation from that ideal, counting down from it.
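To make the first item in the list concrete, here is a minimal sketch of a money pump. It assumes a hypothetical agent with cyclic strict preferences (A over B, B over C, C over A) that will pay a small fee for any swap to an item it prefers. The names `money_pump`, `CYCLIC_PREFERENCES`, and `TRANSITIVE_PREFERENCES`, and the fee scheme, are illustrative assumptions, not anything from the literature linked above.

```python
# Preferences are (held, offered) pairs: the agent strictly prefers `offered` to `held`.
# Hypothetical cyclic agent: prefers A to B, B to C, and C to A.
CYCLIC_PREFERENCES = [("B", "A"), ("C", "B"), ("A", "C")]
# Hypothetical transitive agent: prefers A to B, B to C, and A to C.
TRANSITIVE_PREFERENCES = [("C", "A"), ("C", "B"), ("B", "A")]


def money_pump(start_item, wealth, fee, n_offers, preferences):
    """Repeatedly offer the agent an item it prefers to the one it holds.

    The agent pays `fee` for each swap it accepts. Returns the item held
    at the end and the agent's remaining wealth.
    """
    item = start_item
    for _ in range(n_offers):
        # The exploiter looks for any item the agent prefers to its current one.
        better = [offered for (held, offered) in preferences if held == item]
        if not better or wealth < fee:
            break  # nothing the agent would pay for, or it cannot pay
        item = better[0]
        wealth -= fee
    return item, wealth
```

With these assumptions, `money_pump("A", 100, 1, 3, CYCLIC_PREFERENCES)` returns `("A", 97)`: the agent holds the same item it started with and is three fees poorer, and the cycle can be repeated. The transitive agent started on C pays once to reach A and then accepts no further offers, so it cannot be pumped in this way.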

Counting-up coherence

Counting-up coherence is the coherence of a mind viewed as the deviation upward in capability from a rock: the elements of the mind, and how they combine to perform tasks.

Counting-up coherence could also be called adding-up coherence, since it's about how the incremental changes to the world made by a mind's (internal or external) actions add up to push the world far in some directions. Or it could be called existential coherence, since it talks about the existence within a mind of capabilities for specific tasks or contexts. Not necessarily narrow tasks, just specific, rather than non-specific like the counting-down "be indistinguishable from maximally capable for all tasks and contexts". (And more poetically, "existential" because counting-up coherence starts from the ground of entities coping with the world to keep existing, accreting capabilities as novel challenges demand, like evolution by natural selection.)

Some overlapping instances and descriptions of counting-up coherence:

Elements of a mind. See here. E.g. skills, concepts, knowledge.
Capabilities. E.g. the ability to type, to lift objects, to describe something, to sing.
Machines. Elements that operate on something (convert, extract, synthesize, direct, inhibit, store, retrieve, send). E.g. a digestive system that extracts glucose from food, muscle that consumes ATP to forcefully contract, and retinal parasol cells that pool signals from a couple dozen rods and cones.
Pushing in a direction. Having many effects that push the world in one direction. There's some feature of the world that is affected by many actions; those actions might follow many causal routes through the world to ultimately affect the feature, but mostly affect the feature in the same direction. E.g. a farmer cycles crops, rearranges irrigation schedules, breaks up the soil, spreads fertilizer, sprays pesticide, and so on, each of which increases crop yields compared to not doing so.
Potentiation, temporal interoperability, design. The mind takes multiple actions across time that individually don't do anything, but together have some large effect. That is, the mind incrementally pushes the world towards a state by potentiating future actions and effects, i.e. decreasing the remaining optimization power needed to bring about that world state, in some space of counterfactually possible actions. That is, the mind possibilizes for its future self.
- E.g. a Risk player attacks to discourage one opponent from breaking their continent, which allows them to build up a large force over a few turns; and then manipulates another opponent to get out of the way, exposing a third opponent; and then waits a turn to collect another card, so that when they eliminate the exposed opponent and collect their forfeited cards, it will force another big trade-in on the same turn, which chains into eliminating the other opponents without giving them an opportunity to strike back.
- E.g. a construction crew digs foundation trenches, pours concrete, nails down studs and crossbeams, and so on, moving up to the second floor, then to the roof with its shingles, finally creating a habitable home, whereas most intermediate stages were mostly useless as a home.
Non-contingently causing something. Making something happen in the world across different environments and mental states. E.g. a gust of wind might cause a branch to snap and fall, but this is contingent, because if the gust were positioned or directed slightly differently, or the branch were slightly less rotted, the branch wouldn't break. On the other hand, a human can somewhat non-contingently cause the branch to break, even if it's solidly connected to the trunk; even if sleep-deprived, hungry, mocked, and pelted by water balloons, the human can home in on the tree's location using eyes and legs and direction and size (or if blindfolded then by the sound of the leaves in the wind and then by the feel of the tree's roots), reach the branch using dynamically-shifted set points to homeostatically decrease the hand-branch distance, and pull on the branch, or hang their weight on it, or wiggle back and forth to iteratively loosen it, or even recurse and find a saw.
Interoperability. Components work well together, are tuned to make good use of each other and be useful to each other, give output and take input in formats that have reliable reference in some contexts. E.g. the visual object detectors take input from the 2.5D sketch constructors.
Orchestration, organization. The mind selects and combines elements in a manner such that the combination performs tasks that process and produce the information and action patterns that make something cool happen.
- E.g. the hand and the cortex talk to each other to manipulate the object while also tracking where it is and whether something unexpected is happening, and the two hand controllers talk to each other to push at the same time in the same direction, so that the pushes add up rather than canceling out or creating torque or dropping the object.
- E.g. a mathematician, when in a mathematical situation with a mental state including questions, examples, propositions, ideas, definitions, images, etc., applies heuristic behaviors to the mental state--distilling the basic intuition, sanity checks against other facts, careful logic, analogizing, guessing, checking simple cases in detail, checking extreme cases, reproving different ways, tweaking the definitions and assumptions, seeing if too much is proven, asking simpler questions--to understand the mathematical situation, e.g. to construct definitions and propositions and to verify conclusions.
- E.g. literally an orchestra and literally an organism, with specialized elements ordered and connected.
- E.g. bone hardens, muscle fibers attach to tendons, nerves and blood vessels connect to muscle, tendons pass through bone tunnels, ATP flows, control systems keep shoulders in place, set points flicker from the home row to target keys and back, all of which leads to typing.

In between counting-up and counting-down

Deduction

In Peirce's scheme, we have abduction, deduction, and induction. Abduction is counting-up flavored: it involves creating novel ideas, which constitute newly potentiated capabilities. Induction (narrowly construed as updating probabilities on hypotheses) is counting-down flavored: it reallocates weight between world-models, which is interpretable as avoiding waste, misallocation, non-Bayesian belief revision, and exploitability.

Deduction, on the other hand, is inter-mediate, in that it mediates between abduction and induction. To do induction, i.e. to update probabilities of hypotheses based on observation, a mind has to know what observation supports what hypotheses how much, which by Bayes's theorem is the same as asking what each hypothesis predicts about the observation. To know what a hypothesis predicts about observations, the mind has to deduce the consequences of the hypothesis. For example, in a Solomonoff inductor, computing out what each hypothesis predicts is the deduction component.
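The division of labor in the paragraph above can be sketched in a few lines. This is a toy Bayesian updater over three hand-written hypotheses about a bit sequence, not a Solomonoff inductor; the hypothesis functions and the names `deduce_likelihood`, `bayes_update`, and `run` are illustrative assumptions. The deduction step computes what a hypothesis predicts about the observation; the induction step reweights the hypotheses by Bayes's theorem.

```python
from fractions import Fraction

# Each hypothesis maps the history so far to P(next bit = 1). All three are hypothetical.
def mostly_ones(history):
    return Fraction(9, 10)

def alternating(history):
    if not history:
        return Fraction(1, 2)
    # Expect the opposite of the previous bit.
    return Fraction(1, 10) if history[-1] == 1 else Fraction(9, 10)

def fair_coin(history):
    return Fraction(1, 2)

HYPOTHESES = {"mostly_ones": mostly_ones, "alternating": alternating, "fair_coin": fair_coin}


def deduce_likelihood(hypothesis, history, observed_bit):
    """Deduction: work out what the hypothesis says about this observation."""
    p_one = hypothesis(history)
    return p_one if observed_bit == 1 else 1 - p_one


def bayes_update(prior, history, observed_bit):
    """Induction: reallocate weight between hypotheses using the deduced likelihoods."""
    unnormalized = {
        name: weight * deduce_likelihood(HYPOTHESES[name], history, observed_bit)
        for name, weight in prior.items()
    }
    total = sum(unnormalized.values())
    return {name: weight / total for name, weight in unnormalized.items()}


def run(observations):
    """Start from a uniform prior and update on each observed bit in turn."""
    posterior = {name: Fraction(1, len(HYPOTHESES)) for name in HYPOTHESES}
    history = []
    for bit in observations:
        posterior = bayes_update(posterior, history, bit)
        history.append(bit)
    return posterior
```

In this sketch the hypotheses are only comparable because `deduce_likelihood` makes each of them say something about the same observed bit; without that step, `bayes_update` would have nothing to reweight on.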

(Aside: In e.g. a universal Garrabrant inductor, in addition to computing out what trading strategy each trader plays, there's the feature of the setup whereby the bitstrings each trader bets on are all identified with each other, allowing traders' "revealed beliefs" to be compared. This identification is also present in Solomonoff induction and any hypothetico-deductive scheme. But the role of the identification in making hypothesis-like elements comparable is most foregrounded in a universal inductor. In Solomonoff induction we might intuitively say that hypotheses are comparable because they all make predictions "about the observations"; likewise a Bayesian's hypotheses make predictions "about observations"; a Levin search's hypothesis-like elements make "attempts at solving a problem"; and a logical inductor's traders make predictions about "logical propositions". In a universal inductor, however, there is no a priori interpretation for the "objects of belief" at all; it's just a bitstring. There's something like maximum ambiguity between all computable languages, or dually, something like a minimal amount of common language (just positions in a sequence).)

So deduction bridges abduction and induction, on the one hand filling out abducted concepts and hypotheses so that they are more applicable, more counting-up effective; and on the other hand making the hypotheses comparable to each other and hence subject to induction, more interpretable as being in a counting-down context.

Internal sharing of elements

A concept talked about a bit a few years ago around MIRI is "internal sharing of logical facts" (ISOLF). More generally, to what extent does a mind share / connect / interface / translate elements with other elements when that would be useful?

Examples:

If a mind is holding in its hand an object with an asymmetric weight distribution, it has some familiarity with that fact because it has to modulate its flexor muscle contractions to keep the object from turning and slipping. But that doesn't preclude the mind from later pushing the object, on the basis of its symmetrical visual appearance, too far out off the edge of the table, allowing the protruding concentration of weight to pull the object over and down to its demise. The mind can't be accurately summarized either as knowing or not knowing that the object has an asymmetric weight distribution; what knowledge the mind has, is not made use of in all contexts where it would be suitably used.
If a mind knows how to picture a linear map that undoes another given linear map (x-expand = un-x-contract, y-expand = un-y-contract, rotate clockwise = unrotate counterclockwise, shear left = unshear right), does that imply that when the mind is trying to solve a linear equation like $A x - \lambda x = b$, it will know how to compute $(A - \lambda I)^{-1}$?
Does Aynsley know where she's standing? Does Madeleine know that the toy car is tiny? She knows she can pick it up with one hand, but still tries to get into it. (A commenter claims that in the actual study, many infants make similar errors without the verbal prompt ("Why don't you pretend like it's the big one?") we hear from the experimenter in this video. Here is one of the studies from Judy DeLoache's lab. Does the infant girl know how big the tiny slide is? She knows where it is, but tries to slide down it.)
AFAIK when the immune system finds an effective antibody for a pathogen, the stored antibodies don't also confer the ability to smell the complementary antigen; this fails to share the "knowledge" that some antibody binds well to the pathogen. As another example, we are not born knowing what a liver or spleen is, or what the mitral heart valve does. To elaborate on the second example: in a developing embryo, the pattern of regulatory gene activations, epigenomic markings, and diffusing signaling molecules interacting in activator-inhibitor-like dynamics generates the spatially articulated structure of organs and tissues and the locally differentiated cell types with specialized capabilities for molecular processing and other work. This pattern is determined in some way by the genome, so in some sense the genome contains a map, however implicit, of the adult organism's body and the capabilities and interactions of its organs. This map, however, is unlike mental maps; it is not explicit, and it can't be reused in other places. In particular, most of this information isn't installed into the developing brain (and what is installed, is installed in an indirect way, e.g. relying on the "territory itself" in the case of wiring up the motor cortex to muscles across the body).

From one side, internal sharing of elements is like interoperability, i.e. making elements useful and connected to each other, and orchestrating their combinations to perform tasks; that's a kind of counting-up coherence. From the other side, internal sharing of elements is a kind of counting-down coherence, i.e. avoiding incoherence: avoiding wasting opportunities to apply knowledge and skills, avoiding believing one thing in one context and believing another contradictory thing in another context.

Analogies are bridges for transferring understanding between contexts. As precursors to internal sharing of elements, analogies can be viewed from above as counting-down coherence: they avoid waste by taking the available opportunities to reuse skills, and they avoid inconsistency by taking beliefs acted on in one context and also acting on them in the other context. Analogies can also be viewed from below as counting-up coherence: they unlock the transfer of capabilities, thus gaining capabilities, and they aggregate multiple borders of single nexi, thus gaining understanding of things.

Search

Where does search fit? Trial and error, SGD, beam search, and evolution by natural selection are counting-up flavored, in that they start with something that doesn't work and mutate it locally to increase capabilities, and they can stop the search when something good enough has been found. On the other hand, Ariadne's thread is counting-down flavored (at least within a search space), in that you're exhausting the search space and finding all solutions, or the best solution, or verifying the absence of a solution.
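A minimal sketch of the contrast, on a made-up one-dimensional landscape: the local search starts from something that doesn't work, improves it step by step, and stops as soon as the result is good enough, while the exhaustive search covers the whole space and can therefore also certify that no solution exists. The landscape values and the names `hill_climb`, `best_solution`, and `all_solutions` are illustrative assumptions.

```python
# Toy landscape over positions 0..15 (hypothetical values).
LANDSCAPE = [0, 2, 4, 5, 3, 1, 0, 1, 3, 5, 7, 8, 9, 6, 2, 0]
SEARCH_SPACE = range(len(LANDSCAPE))


def score(x):
    return LANDSCAPE[x]


def hill_climb(start, good_enough):
    """Counting-up flavored: mutate locally, stop once the score is good enough."""
    x, steps = start, 0
    while score(x) < good_enough:
        neighbors = [n for n in (x - 1, x + 1) if n in SEARCH_SPACE]
        best = max(neighbors, key=score)
        if score(best) <= score(x):
            break  # no local improvement available; give up here
        x, steps = best, steps + 1
    return x, steps


def best_solution():
    """Counting-down flavored: exhaust the space and return the best point."""
    return max(SEARCH_SPACE, key=score)


def all_solutions(threshold):
    """Exhaust the space; an empty result verifies that no solution exists."""
    return [x for x in SEARCH_SPACE if score(x) >= threshold]
```

Here `hill_climb(0, good_enough=4)` stops at position 2 after two steps, without looking at the rest of the space. `best_solution()` returns position 12, and `all_solutions(10)` returns an empty list, which is a guarantee about the whole space that the local search cannot give.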

Searches can be viewed as lying on a spectrum of how powerful they are. But that picture doesn't account for the idea of a "search space": a search can be near optimal within its own search space, yet very far from optimal in an expanded, richer search space.

For example, evolution is a very strong search process in the following sense. Consider any single change to the modal genome of a species. Suppose the change is a type of mutation that occasionally happens in the course of reproduction, and suppose the change would significantly increase expected inclusive genetic fitness in carriers. Then, that variant will likely soon be by far the most common variant in the population. This is a kind of optimality, and evolution has a kind of coherence in that sense: even with perfect understanding, there might be few mutations, relatively speaking, that could occur in the wild and also significantly improve the expected inclusive genetic fitness of a carrier in the environment of evolutionary adaptedness. (There are some limitations to this: if the fitness advantage is too weak, noise from random mutation mostly drowns out the selective signal, so an observer might be able to select that advantaged variant even though it doesn't become fixed. If fitness depends on the population frequency of the variant and other genes, then there can be dynamic equilibria, i.e. no time-independent notion of optimality.) However, there are changes to a genome that would never evolve but would be very fit. Famously, God could rewrite the giraffe's genome so that its laryngeal nerve didn't go all the way down its neck, around aortic arches, and back up the neck to the larynx. Further, God could give the lions retractable steel claws distilled from ingested dirt, give the humans built-in silicon mental calculators, and so on. Within a larger search space, evolution is limited, and can't be understood as coherent in the counting-down sense.
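The same point in a toy model, which is not a model of population genetics: a four-bit "genome" with a made-up fitness function, a greedy selection process that can only take single-bit mutations, and a "rewrite" that may choose any genome at all. The fitness function and the names `evolve` and `rewrite` are illustrative assumptions.

```python
from itertools import product

GENOME_LENGTH = 4


def fitness(genome):
    """Hypothetical fitness: more zeros is better, except that all ones is best of all."""
    if all(genome):
        return GENOME_LENGTH + 1
    return genome.count(0)


def single_mutations(genome):
    """All genomes reachable by flipping exactly one bit."""
    for i in range(len(genome)):
        yield genome[:i] + (1 - genome[i],) + genome[i + 1:]


def evolve(genome):
    """Greedy selection over single mutations: adopt the fittest one while it strictly improves."""
    while True:
        best = max(single_mutations(genome), key=fitness)
        if fitness(best) <= fitness(genome):
            return genome  # no single mutation improves fitness
        genome = best


def rewrite():
    """Search the expanded space: every genome, not just single-mutation neighbors."""
    return max(product((0, 1), repeat=GENOME_LENGTH), key=fitness)
```

Starting from `(0, 1, 1, 0)`, `evolve` ends at `(0, 0, 0, 0)` with fitness 4, and no single mutation from there does better, so within its own search space the process is optimal. `rewrite()` returns `(1, 1, 1, 1)` with fitness 5, a genome that this selection process would not reach from that starting point.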

In some theoretical sense at least, we can expand the search space to a maximum by considering all possible computations. See universal computation, Solomonoff induction, logical induction, Levin search. So we can put search on a spectrum of how much of the space of algorithms is covered, with counting-up and counting-down at opposite ends. (Though, this still makes some assumptions about the form of what's being searched for; e.g., what Solomonoff induction most explicitly finds is only computations with a certain type signature.)

Since the full meaning of the "search space" involves the implications for the rest of the mind that come along with the things being searched for, it may not be sensible to globally compare searches: a fixed search space may be cut off from novelty, in the full sense of novelty that includes implication of structure for action according to the criteria provided by the context that called the search. That is, the relevant search space isn't just determined by what possible computations are admitted, but by what is drawn from those computations for use by the rest of the mind. For example, a Solomonoff inductor can't simply be taken as an ideal predictor, just because it searches over a maximally rich space to find good predictors; if it's used to make decisions, it might give malignant outputs.

Why call both of these "coherence"?

See also the discussion of "plans that lase" here.

Efficiency

The more counting-up coherence a mind has--the more machines and strategies it brings to bear to tasks, the more it succeeds at pushing the world in specific directions--the fewer errors it makes, and the more efficient it is: the harder it is for an onlooker to point out errors and inefficiencies. So counting-down coherence could be stated as the absence of missed opportunities for counting-up coherence.

Potentiation, optimization

The more counting-up coherence a mind has, the more that further skills are made potentially graspable. This constitutes pushing the world (including the mind) further along whatever directions the mind would push the world if it grasped those further skills. In other words, increasing either counting-up or counting-down coherence constitutes optimizing the world.

Growing towards integrated unity

A mind begins like an octopus. (That is, a maybe-fictional octopus as described by some pop-sci.) A mind begins as a creature with many separate threads of attention and activity and pursuit, tentacles dealing with their own local world, only communicating when necessary to perform some special larger task or to resolve greatly conflicting pulls. As the mind grows and gains skills and understanding, it unifies: it draws analogies between activities, objects, and ideas; it unifies conceptual Doppelgängers; it integrates novel understanding into its activity; it follows multiple borders of a nexus of reference deeper into the inductively tightly connected nexus, uncovering the elephant. Progress towards integration is coherentifying, from beginning to end.

Intentional stance

Both kinds of coherence contribute to behavior that is usefully viewed as intentional, goal-directed, in pursuit of something, by gemini modeling.

Induction on "finds coherences"

Difficult tasks require more counting-up coherence. If a mind can perform extremely difficult tasks, then it must have gained a lot of counting-up coherence. It would be a coincidence if the methods the mind used to gain that coherence were to stop working right at the level of capability to perform just that extremely difficult task. Rather, those methods can probably keep producing more counting-up coherence, so that a strong mind continues its trajectory of creativity. Enough counting-up coherence starts to look like counting-down coherence.
