This is part of a series covering my current research agenda. Refer to the linked post for additional context.
Suppose we have a dataset consisting of full low-level histories of various well-abstracting universes. For simplicity, imagine them as high-dimensional vectors, corresponding e. g. to the position of each particle at a given moment, or to the coordinates of that state in a classical state-space.
Suppose we wanted to set up a system which maps any such sample to its minimal-description-length representation; i. e., to a well-structured world-model that takes advantage of its abstract regularities. How would it work? What are the algorithms that implement search for such representations?
I'll be building the model piece by piece, incrementally introducing complications and proposed solutions to them.
Part 1 covers the following problem: if we're given a set of variables over which an abstraction hierarchy is defined, what is the type signature of that hierarchy, and how can we learn it?
Suppose we have a set of random variables $X = \{X_1, \dots, X_n\}$. What is the most compact way to represent it without loss of information?
In the starting case, let's also assume there's no synergistic information.
Formally, we can define the task as follows: define a set of deterministic functions $f_1, \dots, f_m$ of the variables such that $\sum_k H(f_k(X))$ is minimized under the constraint of $H(f_1(X), \dots, f_m(X)) = H(X)$.
Intuitively, we need to remove all redundancies. If two variables have some redundant information, we need to factor it out into a separate variable, and only leave them with the information unique to them. But if two different redundant-information variables, produced by two different subsets of our variables, also have some shared information, we need to factor it out into an even "higher-level" redundant-information variable as well, leaving the two of them with only whatever information is "uniquely shared" within their respective subsets. This already hints at a kind of hierarchy...
A natural algorithm for the general case is:
1. Consider every subset of the variables in $X$ (including one-element subsets).
2. For each subset, define a redundant-information variable: a deterministic function of that subset's variables containing all the information shared by every variable in it.
3. Factor out ("extract"): remove, from each such variable, everything that's also contained in the redundant-information variables of larger subsets, so that each is left with only the information uniquely shared by exactly its subset.
4. Delete the variables that end up empty.
Intuitively, each subset becomes associated with a variable containing all information uniquely shared among the variables in that subset – information that is present in all of them, but nowhere else. Many subsets would have no such information, their associated variables deleted.
Those redundant-information variables would then form a natural hierarchy/partial ordering. The highest-level ones would contain information redundant across most $X$-variables, which was initially also present in most intermediate-level redundant-information variables. The lowest-level ones would contain information unique to a given $X$-variable.
A way to think about it is to imagine each $X$-variable as a set of "atomic" random variables – call them "abstraction factors" – with some of those factors reoccurring in other $X$-variables. What the above procedure does is separate out those factors. (The term "abstraction factors" is chosen to convey the idea that those factors, in separation, may not actually constitute independent abstractions. I'll elaborate on that later.)
(All this assumes that we can just decompose the variables like this via deterministic functions... But that assumption is already largely baked into the idea that this can be transformed into a well-structured world-model. In practice, we'd use terms like "approximately extract", "approximately deterministic", et cetera.)
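To make that concrete, here's a toy version of the procedure in code. It assumes the idealized setting from the paragraph above, where each variable literally is a set of independent "atomic" factors; real variables would need approximate, learned extractors instead of literal set operations, and the function names here are just for illustration.

```python
from itertools import combinations

def factor_out(variables: dict[str, frozenset]) -> dict[frozenset, frozenset]:
    """Toy factorization: each variable is a set of 'atoms'. The factor associated with
    a subset S is the set of atoms present in every variable of S and in no variable
    outside S ('uniquely shared'); subsets with no such atoms get no factor."""
    names = list(variables)
    factors = {}
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            inside = set.intersection(*(set(variables[v]) for v in subset))
            outside = set().union(*(variables[v] for v in names if v not in subset))
            uniquely_shared = frozenset(inside - outside)
            if uniquely_shared:  # "emptied" factors are simply never kept
                factors[frozenset(subset)] = uniquely_shared
    return factors

# Atom "a" is shared by all three variables, "b" only by X1 and X2, "c" is unique to X3:
toy = {"X1": frozenset("ab"), "X2": frozenset("ab"), "X3": frozenset("ac")}
print(factor_out(toy))
# -> {X1, X2, X3}: {a},  {X1, X2}: {b},  {X3}: {c}
```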
Example: Suppose that we have six variables $X_1, \dots, X_6$ whose shared information has some specific structure. The resultant hierarchy would be:
Consider picking any given node and "restoring" all the information from the factors that are its ancestors. This procedure recovers the redundant-information variables that existed prior to the factorization/"extraction" procedure (Step 3 in the above algorithm).
To me, this structure already vaguely feels like the right "skeleton" of an abstraction hierarchy, although it's of course rudimentary so far. Each "restored" node would represent a valid abstract object, and the factors are the "concepts" it's made of, from the most abstract (e. g., "what animal this is") to the most concrete (e. g., "details about this specific animal").
Further, there's a simple connection to natural latents! If you take any given factor (diagram 1), and condition the "restored" variables corresponding to its children-factors (diagram 2) on all of their other ancestral factors, this factor becomes the natural latent for those variables. It makes them independent (because variables are independent if conditioned on all of their abstraction factors, since those represent all shared information between them), and they all tell the same information about it (namely, they all contain that factor).
(Specifically, what we do here is: pick a node on the graph, go down to its children, then iteratively follow the arrows pointing to them upwards, and condition on all of the variables you encounter except the one you picked.)
In fact, this works with any set of factors with a total ordering (i. e., without "siblings"/incomparable elements); or, equivalently stated, for any subset of the factors present in a given "restored" variable. By contrast, a set containing incomparable "sibling" factors is not a natural latent for any set of variables: relative to any candidate set, the extra sibling factor is effectively just random noise, and the actual latent is the totally ordered part alone.
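Here's that conditioning recipe as code, over a hypothetical parent-pointer representation of the factor graph (all the node names are made up for the example):

```python
def conditioning_set(parents: dict[str, set[str]], chosen: str) -> tuple[set[str], set[str]]:
    """Given a child -> set-of-parents map and a chosen factor, return the chosen
    factor's children and the set of their other ancestral factors to condition on."""
    children = {node for node, ps in parents.items() if chosen in ps}
    ancestors = set()
    frontier = set(children)
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, set()):
            if parent not in ancestors:
                ancestors.add(parent)
                frontier.add(parent)
    return children, ancestors - {chosen}

# Hypothetical toy hierarchy: F_top sits above F_mid, and both feed the leaves.
toy_parents = {
    "F_mid": {"F_top"},
    "X1": {"F_mid", "F_top"},
    "X2": {"F_mid", "F_top"},
    "X3": {"F_top"},
}
children, cond = conditioning_set(toy_parents, "F_mid")
print(children)  # {'X1', 'X2'} -- F_mid should be a natural latent over these...
print(cond)      # {'F_top'}    -- ...conditional on this set
```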
Some examples of how it all might work in concrete terms, highlighting important aspects of the picture. All assume the hierarchy from the example above.
1. Toy store. Imagine that we have a list of products sold at a toy store. The top-level factor is "a product", below it is "a toy", below that are "a car toy" and "a constructor toy", and the low-level $X$-variables specify the details of individual products. One of them is a LEGO car, and another is some non-toy product being sold on the side.
Note that it's specifically the "restored" variables that allow this kind of interpretation, not the individual abstraction factors. It's the restored node that is "a toy", not the bare factor beneath it. That factor is some "essence of toy-ness" such that, when combined with the "product" factor above it, it transforms an unspecified "product" into a "toy product". Taken in isolation, the factor's values may not actually have a clear interpretation! They're "correction terms", which may be meaningless without knowing what we're correcting from. This is the sense in which abstraction factors function as "ingredients" for abstractions, rather than as full abstractions in themselves.
Note also the "uneven stacking": there are no "globally synchronized" abstraction levels, some variables/regions have more layers of organization than others, like versus all other variables. This makes broader intuitive sense (the tower of abstractions on Earth is taller than on Mars, since Earth has cells/organisms/populations/societies/civilization).
2. Photo. Imagine that we're looking at a photo. One factor is "the lighting conditions", another is "specific objects in the photo", with "dogs in the photo" and "sculptures in the photo" below it. Of the low-level variables, two are dogs, one is a dog sculpture, two are some non-dog sculptures, and one is the background.
Consider the idea of "abstract edits". If we have this decomposition, we could: (1) decompose the picture into factors, (2) modify a specific factor, e. g. "lighting conditions", then (3) re-compose the picture, but now with modified lighting.
3. Machine-Environment System. Consider a system involving the interactions between some machine and the environment. One factor is the machine's state (e. g., "releasing energy" or "consuming energy"), two factors below it add specific details about the states of its two modules, five of the $X$-variables contain specifics about the states of the machine's individual components, a sixth is information about the state of the external environment, and the top-level factor is the system's overall highest-level state (e. g., "there's currently a positive-feedback loop between the machine and the environment"). One component serves as the interface between the two modules; the other components belong to the modules separately.
Notes:
4. Machine-Environment System, #2. Consider a machine-environment system made of gears and pulleys. The higher-level nodes here are "a gear", "a pulley", "human manufacturing practices", "this external environment", and "the overall type of this system". Two of the low-level variables are specific gears, two are specific pulleys, and one is a toothed pulley.
This system can be the same system as in example (3): it's just that here, we focus on a different subgraph in our partial-order abstraction diagram, the "sibling" of the one in (3). (3) focuses on the system as an abstraction over its constituent parts, while (4) focuses on representing the constituents as instances of commonly reoccurring types of objects.
Note how all of the tools described above can be combined. We can take this system, decompose it into the representation in (4), make some abstract edit to the gears' manufacturing practices, re-assemble the system, re-disassemble it into the (3) representation, take the two module nodes, condition on the environment variable (to remove interactions with the environment), and restore the machine's overall state to them (to recover the dynamics between the modules). Intuitively, this corresponds to evaluating how the interactions between the modules change if we switch up our gear manufacturing practices; perhaps running the corresponding simulation.
5. People. Suppose that the $X$-variables are six different people living in the same city. The factors are "the city's state", "the state of the city's socioeconomic sphere", "the state of a specific company", and "the state of a popular political movement". Three people work at the company, three people participate in the movement, one person is part of both the company and the movement, and one person is unemployed and not part of the movement.
Focus specifically on the person that's part of both the corporate dynamics and the political dynamics. Note that (a) those dynamics can be largely non-interacting, and (b) both of those dynamics "share" the substrate on which they run: that same person. This is "polysemanticity" of a sort: a given low-level system can simultaneously implement several independent high-level systems. (This is essentially just another way of looking at "sibling subgraphs", but it conveys different intuitions.)
Distilling, some notable features are:
I think it's pretty neat that all of this expressivity falls out of the simple algorithm described at the beginning. Granted, there's a fair amount of "creative interpretation" going on on my end, but I think it's a promising "skeleton". However, major pieces are still missing.
We'd like to get rid of the "no synergistic information" assumption. But first: what is synergistic information?
Usually, it's described as "the information the set of variables $X$ gives us about some target variable $Y$, which we can't learn by inspecting any strict subset of $X$ in isolation". The typical example is an XOR gate: if $Y = X_1 \oplus X_2$, and $X_1$ and $X_2$ are independent random bits, looking at either bit tells us nothing about $Y$, while both bits taken together let us compute $Y$ exactly.
But note that synergistic information can be defined by referring purely to the system we're examining, with no "external" target variable. If we have a set of variables $X = \{X_1, \dots, X_n\}$, we can define the variable $s$ such that $I(s; X)$ is maximized under the constraint of $I(s; S) = 0$ for all $S \in \mathcal{P}^-(X)$. (Where $\mathcal{P}^-(X)$ is the set of all subsets of $X$ except $X$ itself.)
That is: $s$ conveys information about the overall state of $X$ without saying anything about any specific variable (or set of variables) in it.
The trivial example is two independent bits: $s$ is their XOR.
A more complicated toy example: Suppose our random variable $X$ is a 100-by-100 grid of binary variables $X_{i,j}$, and each sample of $X$ is a picture where some 8 adjacent variables are set to 1, and all others to 0. $s$ can then return the shape the activated variables make. Across all realizations, it tells us approximately zero information about any given subset (because the number of active variables is always the same), but we still learn something about $X$'s overall state.
This is the "true nature" of synergistic information: it tells us about the "high-level" features of the joint samples, and it ignores which specific low-level variables implement that feature.
Another example: emergent dynamics. Consider the difference between the fundamental laws of physics and Newtonian physics: the fundamental laws hold for each particle individually, regardless of what the rest of the system is doing, while "this bunch of particles behaves like a Newtonian-mechanical system" is a fact about how all of them are jointly arranged.
I. e.: unlike with the fundamental laws, "does this system approximately implement Newtonian physics, yes/no?" depends on synergistic information in large sets of fundamental particles, and many conditions need to be met simultaneously for the answer to be "yes".
Note also that synergistic information is effectively the "opposite" of redundant information. Conditioning lower-level variables on synergistic information creates, not removes, mutual information between them. (Consider conditioning the XOR setup on the value of the input $X_1$. Suddenly, there's mutual information between the other input $X_2$ and the output $Y$! Or: consider conditioning a bunch of fundamental particles on "this is a Newtonian-physics system". Suddenly, we know they have various sophisticated correspondences!)
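A quick numerical check of both claims, using the XOR setup and a plug-in mutual-information estimate over the four equally likely joint samples:

```python
from itertools import product
from collections import Counter
from math import log2

def mutual_information(pairs):
    """I(A;B) in bits, for a list of equally likely (a, b) samples."""
    n = len(pairs)
    joint, pa, pb = Counter(pairs), Counter(a for a, _ in pairs), Counter(b for _, b in pairs)
    return sum((c / n) * log2((c / n) / ((pa[a] / n) * (pb[b] / n))) for (a, b), c in joint.items())

samples = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

print(mutual_information([(x1, y) for x1, x2, y in samples]))             # I(X1; Y) = 0: synergy
print(mutual_information([((x1, x2), y) for x1, x2, y in samples]))       # I(X1, X2; Y) = 1 bit
print(mutual_information([(x2, y) for x1, x2, y in samples]))             # I(X2; Y) = 0...
print(mutual_information([(x2, y) for x1, x2, y in samples if x1 == 0]))  # ...but I(X2; Y | X1=0) = 1
```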
How can we add synergistic information into our model from 1.1?
One obvious approach is to just treat synergistic variables as... variables. That is: compute the synergistic variable for every subset of the low-level variables, add all of those to the pool as ordinary variables, and then run the algorithm from 1.1 on the enlarged pool.
As far as I can tell, this mostly just works. Synergistic variables can have shared information with other synergistic variables, or with individual variables; the algorithm from 1.1 handles them smoothly. They always have zero mutual information with their underlying $X$-variables, and with any synergistic variables defined over subsets of their underlying $X$-variables, but that's not an issue.
Note that no further iteration is needed: we don't need to define synergistic variables over sets of synergistic variables. They would just contain parts of the information contained in a "first-iteration" higher-level synergistic variable, and so the algorithm from 1.1 would empty out and delete them.
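Schematically, in code (`learn_synergistic_variable` is a placeholder name for whatever approximate method would actually produce these variables; `factor_out` is the redundancy-factoring procedure from 1.1):

```python
from itertools import combinations

def learn_synergistic_variable(subset):
    """Placeholder: a function of the subset's joint state carrying maximal information
    about it and none about any strict sub-subset; returns None if there isn't one."""
    raise NotImplementedError  # the hard, unspecified part

def factor_with_synergy(variables, factor_out):
    pool = dict(variables)
    # Compute a synergistic variable for every subset of the low-level variables...
    for size in range(2, len(variables) + 1):
        for subset in combinations(variables, size):
            syn = learn_synergistic_variable({v: variables[v] for v in subset})
            if syn is not None:
                pool[("syn", subset)] = syn
    # ...then treat them as ordinary variables and run the 1.1 algorithm on the pool.
    # (No second round of synergy-extraction is needed, per the note above.)
    return factor_out(pool)
```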
Important caveats:
(3) is kind of very majorly inconvenient. It's clear why it happens: as stated, the synergistic variable does not actually "extract" any entropy/information from the variables on which it's defined (the way we do when we factor out redundant information), so some information ends up double-counted. I do have some ideas for reconciling this with the overall framework...
The current-best one (which may be fundamentally confused, this reasoning is relatively recent) is:
Again, this is a relatively fresh line of argument, and it may be obviously flawed from some angle I haven't yet looked at it from.
As it turns out, the picture we now have cleanly maps to a known concept from information theory: partial information decomposition (PID); or partial entropy decomposition.
This paper provides a basic overview of it. (It uses a very ad-hoc-ish definition of redundant information, but is otherwise fine.)
PID's steps mirror pretty much all of the steps we've gone through so far. Starting from some set of variables $X$ with entropy $H(X)$, PID: defines a measure of the information redundantly shared by each collection of subsets of $X$; arranges those collections into a lattice; and then computes the "partial information atoms" by subtracting, from each node's value, everything already accounted for by the nodes below it, so that the atoms sum back to the total.
There's a clear correspondence between PID's "atoms" and my abstraction factors: the pre-subtraction atoms are equivalent to my "restored" nodes, there's a procedure isomorphic to deleting "emptied" variables, et cetera.
Major difference: PID does not define any variables. It just expands the expression quantifying the total entropy into a sum of the entropies of those atoms. Which is why the entropies of all the atoms are able to add up to the total entropy with no complications, by the way: we have no trouble subtracting synergistic-information entropy.
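For reference, in the standard two-source case (Williams & Beer's target-based formulation, rather than the target-free entropy version gestured at here), the atoms are the redundancy $R$, the unique informations $U_1, U_2$, and the synergy $S$, and they satisfy:

$$I(Y; X_1, X_2) = R + U_1 + U_2 + S, \qquad I(Y; X_1) = R + U_1, \qquad I(Y; X_2) = R + U_2.$$

The "subtraction" step is exactly solving this kind of system for the atoms, given a choice of redundancy measure; the multivariate case does the same thing over a larger lattice.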
One issue with PID worth mentioning is that the field hasn't figured out what measure to use for quantifying multivariate redundant information. It's the same problem we seem to have. But it's probably not a major issue in the setting we're working in (the well-abstracting universes).
And if we're assuming exact abstraction in our universe, I expect we'd get exact correspondence: every PID information atom would correspond to an abstraction factor, and the factor's entropy would be the value of that PID atom.
Overall, the fact that the framework I built up incrementally here turns out to near-perfectly match another extant framework is a good sign that I'm getting at something real. In addition, PID fundamentally feels like the "right" way to do things, rather than being some arbitrary ad-hoc construction.
Also: this offers a new way to look at the problem. The goal is to find a correct "constructive" way to do partial entropy decomposition. The functions for learning the abstract-factor variables may, in fact, directly correspond to the functions defining the "correct" way to do partial entropy decomposition.
Let's see what adding synergistic variables to the broader framework lets us do.
I'll be focusing on natural latents here. As stated in 1.1, in my framework, a natural latent can be defined as any set of abstraction factors with a total order (considered over the variables in the appropriate set conditioned on their other ancestral factors).
Let's consider a dog at the atomic scale. Intuitively, the correct theory of abstraction should be able to define the relationship between the dog and the atoms constituting it. However, there are complications:
What I propose is that, in this context, "a dog" is the value of the natural latent over (functions of) specific synergistic variables. "A DNA string", "the shape of this animal's skull", "the sounds this animal makes", "the way this animal thinks" all let us know what animal we're looking at; and they're exactly the features made independent by our knowing what animal this is; and, intuitively, they contain some synergistic information.
However, that seems to just push the problem one level lower. Isn't "a DNA string" itself in the same position as the dog relative to the atoms (or subatomic particles) constituting it, with all the same complications? I'll get to that.
First, a claim: every natural latent/abstraction $\Lambda$ is a function of the synergistic variable over the set of "subvariables" constituting the variables in the set for which $\Lambda$ is a natural latent. (Degenerate case: the synergistic variable $s$ over a one-variable set $\{X_i\}$ is just $X_i$.)
Let me unpack that one.
We have some set of variables $X = \{X_1, \dots, X_n\}$ over which a natural latent $\Lambda$ is defined – such as a set of $n$ dogs, or a set of $n$ features of some object. $\Lambda$ is a function which can take any $X_i$ as an input, and return some information, such as properties of dogs (or, e. g., nuclear reactors).
But those individual $X_i$ are not, themselves, in the grand scheme of things, necessarily "atomic". Rather, they're themselves (functions of) sets of some other variables. And relative to those lower-level variables – let's dub them "subvariables" $x_{i,j}$, with $X_i$ a function of $\{x_{i,1}, \dots, x_{i,m_i}\}$ – the function $\Lambda$ is a function whose value is dependent on the synergistic variable.
Example: Consider a set of programs which all implement some function $F$, but which implement it in different ways: using different algorithms, different programming languages, etc. The set of programs is independent given $F$: it's a natural latent/abstraction over it. But each program itself consists of a set of lower-level operations, and relative to those, $F$ is a synergistic variable: all of them must be in the correct places for the emergent behavior of "implements $F$" to arise. Simultaneously, the presence of any specific basic operation, especially in a specific place in the program's code, is not required, so $F$ provides little information about them.
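To make the example concrete (a toy illustration; the choice of function and of implementations is of course arbitrary): here are three structurally different programs that all implement "sort a list". "Implements sorting" is a property of each whole program – all the low-level operations have to line up – while telling you almost nothing about which specific operations appear where.

```python
def impl_builtin(xs):            # delegate to the standard library
    return sorted(xs)

def impl_insertion(xs):          # insertion sort, written imperatively
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] < x:
            i += 1
        out.insert(i, x)
    return out

def impl_quicksort(xs):          # recursive quicksort
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    return impl_quicksort([x for x in rest if x < pivot]) + [pivot] + \
           impl_quicksort([x for x in rest if x >= pivot])

test = [3, 1, 2, 5, 4]
assert impl_builtin(test) == impl_insertion(test) == impl_quicksort(test) == [1, 2, 3, 4, 5]
```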
Another example: Consider the difference between "this specific dog" and "the concept of a dog". What we'd been analyzing above is the "this specific dog" one: an abstraction redundantly represented in some synergistic features/subsystems forming a specific physical system.
But "the concept of a dog" is a synergistic variable over all of those features/subsystems. From the broader perspective of "what kind of object is this?", just the presence of a dog's head is insufficient. For example, perhaps we're looking at a dog's statue, or at some sort of biotech-made chimera?
Here's what I think is going on here. Before we went "what type of animal is this?" → "this is a dog", there must have been a step of the form "what type of object is this?" → "this is an ordinary animal". And the mutual information between "dog's head", "dog DNA", and all the other dog-features only appeared after we conditioned on the answer to "what object is this?"; after we conditioned on "this is an ordinary animal".
If we narrow our universe of possibilities to the set of animals, then "a dog" is indeed deducible from any feature of a dog. But in the whole wide world with lots of different weird things, "a dog" is defined as an (almost) exhaustive list of the properties of dog-ness. (Well, approximately, plus/minus a missing leg or tail, some upper/lower bounds on size, etc. I'll explain how that's handled in a bit.)
Those descriptions are fairly rough, but I hope it's clear what I'm pointing at. Conditioning some variables on some latent while treating that latent as encoding synergistic information over those variables may create redundant information corresponding to a different valid natural latent. There's also a natural connection with the idea I outlined at the end of 1.3, about synergistic variables creating probabilistic structures.
This is also how we handle the "downwards ladder". Conditional on "this is an animal", "this is a dog" is redundantly represented in a bunch of lower-level abstract variables like "DNA" or "skull". Whether a given structure we observe qualifies as one of those variables, however, may likewise be either information redundant across some even-lower-level abstractions (e. g., if we observe a distinct part of dog DNA conditional on "this is part of a whole DNA string"), or it may be synergistic information (if the corresponding universe of possibilities isn't narrowed down, meaning we need to observe the whole DNA string to be sure it's dog DNA).
(Note: In (1.1), I described situations where natural latents are valid over some set only conditional on other natural latents. The situation described here is a different way for that to happen: in (1.1), conditioning on other latents removed redundant information and let our would-be natural latent induce independence; here, conditioning on a latent creates redundant information which the new natural latent is then defined with respect to.)
Moving on: I claim this generalizes. Formally speaking, every natural latent/abstraction $\Lambda$ is a function of the synergistic variable over the set of subvariables constituting the variables in the set for which $\Lambda$ is a natural latent. Simultaneously, $\Lambda$ is the redundant information between those variables. (And, again, the synergistic variable over a one-variable set $\{X_i\}$ is just $X_i$.)
General justification: Suppose that $X_i$ contains all the necessary information about $\Lambda$ even without some subvariable $x_{i,j}$. That is, $I(\Lambda; X_i \setminus \{x_{i,j}\}) = I(\Lambda; X_i)$.
In that case... Why is $x_{i,j}$ there? Intuitively, we want $X_i$ to be, itself, the minimal instance of the latent $\Lambda$, and yet we have some random noise added to it. Common-sense check: our mental representation of "a dog" doesn't carry around e. g. random rocks that were lying around.
On the contrary, this would create problems. Suppose that we feel free to "overshoot" regarding what subvariables to put into the $X_i$, such that sometimes we include irrelevant stuff. Then variables in some subset of $X$ may end up with the same unrelated subvariables added to them (e. g., dogs + rocks), and variables in a different subset may have different unrelated subvariables (e. g., blades of grass). "A dog" would then fail to make all variables in $X$ independent of each other.
Indeed, $X$ would not have a natural latent at all. We'd end up with two abstractions, "dog + rocks" and "dog + grass"... Except more likely, there'd be many more different types of unrelated subvariables added in, so we wouldn't be able to form a sufficiently big dataset for forming a "dog" abstraction to begin with.
So the "nuisance" subvariables end up "washed out" at the abstraction-formation stage. Meaning all subvariables constituting each are necessary to compute .
Admittedly, this still leaves the possibility that $\Lambda$ is a function of the unique information in each $x_{i,j}$, rather than of the synergistic information. I think that makes less intuitive sense, however:
Nitpick: Suppose that the set is such that every $k$-sized subset of it has a nontrivial synergistic variable which contains the same information.
Example: Shamir's secret-sharing scheme, where knowing any $k$ shares out of $n$, with $n$ arbitrarily larger than $k$, lets you perfectly reconstruct the secret, but knowing even one share fewer gives you zero information about it. Intuitively, all $n$ shares should be bundled together into one abstraction... but the above argument kicks out all but some random subset of $k$ of them, right?
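For concreteness, here's a minimal toy implementation of Shamir's scheme (illustrative only, not hardened; the field size and parameters are arbitrary):

```python
import random

P = 2**61 - 1  # prime modulus for the toy field

def make_shares(secret, k, n):
    """Split `secret` into n shares; any k of them suffice to reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange-interpolate the hidden polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(secret=42, k=3, n=10)
print(reconstruct(shares[:3]))        # 42: any 3 shares work...
print(reconstruct(shares[4:7]))       # 42: ...regardless of which 3
print(reconstruct(shares[:2]) == 42)  # False (with overwhelming probability): 2 shares reveal nothing
```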
And we can imagine similar situations arising in practice, with some abstractions defined over any small subset of some broader set of e. g. features. See the previous dog example: dogs usually have four legs, but some of them have three; most have tails, but some don't; etc.
But the framework as stated handles this at a different step. Recall that we compute the synergistic variables for all subsets, add them to the pool of variables, and then try defining redundant-information variables for all subsets. Since the synergistic variables for all subsets of Shamir shares contain the same information, they would be bundled up when we're defining redundant-information variables.
Similar with dogs: we'd end up with many different sets of features, each sufficient for "this is a dog", without there necessarily being a minimal set of features which is both (1) sufficient for "this is a dog", and (2) a subset of each of those sets.
You may argue this is clunky. It very much is; a better, more general way will be outlined in the next part.
Clarification: some things which I wasn't necessarily arguing above.
Aside: I think there's also a fairly straightforward way to generalize claims from this section to causality. Causality implies correlation/mutual information between a system's states at different times. The way this information is created is by conditioning a history (a set of time-ordered system-states) on a synergistic variable defined over it, with this synergistic variable having the meaning of e. g. "this is a Newtonian-mechanics system". This likewise naturally connects with/justifies the end of 1.3, about interpreting synergistic variables as information about the distribution, rather than about specific joint samples.
Summing up:
Part 1 outlined my high-level model of how we can learn an abstraction hierarchy given a clean set of low-level variables out of which it "grows".
Namely: we can cast it as a "constructive" cousin of partial information decomposition. The algorithm goes as follows: compute a synergistic variable for every subset of the low-level variables and add those to the pool; for every subset of the pooled variables, define a redundant-information variable containing the information shared by all of its members; factor that information out, leaving each such variable with only the information uniquely shared by exactly its subset; and delete the variables that end up empty.
Each resulting redundant-information variable can then be considered an "abstraction factor". Any set of abstraction factors with a total ordering functions as a natural latent for the lowest-level factor's children, conditional on those children's other ancestral variables.
This setup has a rich set of features and functionalities, in line with what we'd expect the theory of abstraction to provide, and it seems to easily handle a wide variety of thought experiments/toy models/case studies. Notably, it offers a way to deal with situations in which abstractions seem to share the properties of redundant-information variables and synergistic-information variables: by defining abstractions as natural latents of functions of synergistic variables.
One notable unsolved fatal problem/small inconvenience is that the presence of synergistic variables directly opposes the idea of producing the minimal representation of the initial set of variables. I've not dealt with this yet, but I expect there to be a simple conceptual explanation. A promising line of argument involves treating the probabilistic structure as just higher-abstraction-level random variables, which increases the total entropy to which our decomposition should add up.
Another problem is, of course, the computational intractability. The above algorithm features several steps where we learn a specific variable for every subset of a (likely vast) number of low-level variables we're given. Obviously, a practical implementation would use some sort of heuristics-based trick to decide on which variable-groupings to try.
All those caveats aside, the problem of learning an abstraction hierarchy (the way it's defined in my framework) is now potentially reducible to a machine-learning problem, under a bunch of conditions. That is:
(3) is actually a deceptively major problem. The next part is about that.
What I'm particularly interested in for the purposes of the bounties: Is there some better way to handle synergistic information here? A more "correct" definition for synergistic-information variables? A different way to handle the "overcounting" problem?