This is part of a series covering my current research agenda. Refer to the linked post for additional context.
Let's revisit our initial problem. We're given the lowest-level representation of a well-abstracting universe, and we want to transform it into its minimal representation / the corresponding well-structured world-model. The tools introduced in Part 1 are insufficient for that: there are two more problems left. This part focuses on one of them.
Key example: Imagine looking at a glider in Conway's Game of Life. At the first time-step, it may occupy some specific set of cells. As time goes on, it would gradually migrate, moving diagonally in the bottom-right direction at a speed of one cell per four time-steps.
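To make the drift concrete, here's a minimal sketch of that evolution (the grid size and the glider's placement are my own arbitrary choices):

```python
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One step of Conway's Game of Life on a toroidal grid."""
    # Count live neighbors by summing the eight shifted copies of the grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell is live next step iff it has 3 live neighbors, or is live with 2.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)

grid = np.zeros((10, 10), dtype=int)
# A standard glider placed near the top-left corner (arbitrary placement).
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[r, c] = 1

for t in range(9):
    live = sorted((int(r), int(c)) for r, c in np.argwhere(grid == 1))
    print(f"t={t}: {live}")
    grid = life_step(grid)
```

Every four steps, the same five-cell shape reappears shifted one cell down and to the right; at no point is there a fixed set of cells that "is" the glider.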
If we take the individual cells to be the random variables to abstract over, in what sense is the glider an abstraction over them? It's a "virtual object", drifting across them.
But if it's not an abstraction over the cells, then over... what? There aren't really any other elements in the picture! Maybe it's an abstraction over sets of cells, with cells being "subvariables" over which the glider is synergistic? But how exactly do we define those sets?
(Perhaps it's a synergistic variable over the whole grid/world? I've tried that. That approach was instructive, but it got very clunky very fast.)
Once you start looking for it, you start spotting problems of this sort all over the place.
The program example actually hints at what the issue is. The function is very, very agnostic regarding its lower-level implementation. There's an infinite number of programs that implement it: a two-bit adder can be implemented as a bunch of NAND gates, or XOR gates, or Lisp functions, or as transistors implementing NAND gates, or as an analog computer made up of water streams, or as a set of star systems exchanging asteroids, or as the simulation of a vast civilization tasked with summing the bits...
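As a sketch of that multiple realizability, here are two of the listed implementations of a two-bit adder: one wired entirely out of NAND gates, one using plain integer addition. The gate wiring is a standard NAND full-adder construction; the framing as two "lower levels" of the same "higher level" is my own illustration.

```python
def nand(a: int, b: int) -> int:
    return 1 - (a & b)

def full_adder_nand(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder built entirely out of NAND gates."""
    n1 = nand(a, b)
    axb = nand(nand(a, n1), nand(b, n1))    # a XOR b
    n2 = nand(axb, cin)
    s = nand(nand(axb, n2), nand(cin, n2))  # (a XOR b) XOR cin
    cout = nand(n1, n2)                     # majority(a, b, cin)
    return s, cout

def add2_nand(a: int, b: int) -> int:
    """Two-bit adder as a ripple of NAND-built full adders."""
    s0, c0 = full_adder_nand(a & 1, b & 1, 0)
    s1, c1 = full_adder_nand((a >> 1) & 1, (b >> 1) & 1, c0)
    return s0 | (s1 << 1) | (c1 << 2)

def add2_arithmetic(a: int, b: int) -> int:
    """The same function, implemented as plain integer addition."""
    return (a + b) & 0b111

# Different lower levels, identical higher-level behavior.
assert all(add2_nand(a, b) == add2_arithmetic(a, b) for a in range(4) for b in range(4))
```

Nothing about the higher-level function "two-bit addition" picks out one of these substrates over the other.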
Same with abstractions in general. The higher level of abstraction is largely agnostic regarding the nature of the lower level implementing it. As long as the lower level passes some minimal threshold of expressivity, it can implement a dynamic with ~any emergent behavior. Meaning, in a vacuum, there's ~zero mutual information between the higher level and the laws of the lower level.
An "animal" could be represented using a wide variety of "programs" of different compositions; same with "a human mind"; same with "a market". Once we lock down a specific "abstraction operator" – the map from the lower to the higher level – the lower level is fixed, and the higher level is a valid natural latent over it. But if we go top-down, holding the higher-level abstract structure fixed but not receiving any information about the lower level, the lower-level's structure (not just values) isn't fixed; it's only constrained to the infinite number of structures over which the higher level would be a valid natural latent.
Symmetrically, if we go bottom-up, there's no sense in which a specific set of labeled lower-level variables necessarily retains the same relationship with any given higher-level variable, nor any guarantee that a given higher-level variable survives a resampling. Depending on the nature of the resampling (random as in the "god's-eye view", deterministic as in Conway's, or stochastic-but-not-independent as in head movements), the structures may drift, switch representations, or change outright. So what we need is to... somehow... learn new abstractions from one sample?
Summing up: Usually, when we're facing a structure-learning problem, we assume that the structure is fixed and we get many samples for it. Here, the structure itself gets resampled. (See also: the bit in 1.3 regarding how we'll need to figure out how to deal with things that "look like" probabilistic structures from one angle, and like random variables from a different one.)
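A toy sketch of the contrast (the two candidate structures here are placeholders I made up, not anything from the post):

```python
import random

# Two toy "structures": different causal relationships over variables X, Y.
def structure_x_causes_y() -> dict:
    x = random.gauss(0, 1)
    return {"X": x, "Y": 2 * x + random.gauss(0, 0.1)}

def structure_y_causes_x() -> dict:
    y = random.gauss(0, 1)
    return {"X": -y + random.gauss(0, 0.1), "Y": y}

STRUCTURES = [structure_x_causes_y, structure_y_causes_x]

def standard_structure_learning_data(n: int) -> list[dict]:
    """The usual setting: one fixed structure, many samples drawn from it."""
    structure = random.choice(STRUCTURES)
    return [structure() for _ in range(n)]

def structure_resampled_data(n: int) -> list[dict]:
    """The setting here: the structure itself is redrawn on every sample."""
    return [random.choice(STRUCTURES)() for _ in range(n)]
```

In the first setting, more samples pin down the one structure; in the second, each sample is our only evidence about the structure that generated it.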
Let's suppose we do have that hypercomputer after all.
The anthropic prior is a very specific entity. It defines a highly opinionated distribution over Tegmark IV/the set of all possible mathematical structures. (This, incidentally, lets us get out of various no-free-lunch theorems. We're not looking at a uniform distribution over all mathematically possible worlds, nor demanding perfect performance in all of them. We want good-enough performance on the anthropic prior.)
So suppose we compute that prior, and then refactor it into the joint probability distribution over the set of all possible abstractions.
That is: Suppose we've gained the ability to learn, by looking at a specific type of variable, what other abstractions are typically present in a universe containing this type of variable. Each corresponding joint sample would be weighted by the anthropic prior, meaning the abstractions' sheer agnosticism regarding lower levels would be ameliorated by defining a probability distribution over them. Some examples:
Et cetera. By conditioning on "this is a sample from the anthropic prior"/"we are in a universe that is overall well-abstracting", we can derive posterior distributions over probabilistic structures, rather than just the values of fixed variables in fixed structures.
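As a toy sketch of what "a posterior over structures, not just values" could look like, suppose we had a finite list of candidate lower-level structures, each with a weight standing in for the (uncomputable) anthropic prior. All of the structure names and numbers below are invented placeholders:

```python
# Made-up candidate lower-level structures compatible with observing a persistent
# drifting pattern, weighted by placeholder stand-ins for the anthropic prior.
prior_over_structures = {
    "cellular automaton with local update rule": 0.6,
    "coarse-grained fluid simulation":           0.3,
    "pixel noise with no persistent structure":  0.1,
}

# Likelihood of the observation under each structure (again, invented numbers).
likelihood = {
    "cellular automaton with local update rule": 0.9,
    "coarse-grained fluid simulation":           0.4,
    "pixel noise with no persistent structure":  0.01,
}

unnormalized = {s: prior_over_structures[s] * likelihood[s] for s in prior_over_structures}
z = sum(unnormalized.values())
posterior_over_structures = {s: w / z for s, w in unnormalized.items()}

for s, p in posterior_over_structures.items():
    print(f"P({s!r} | observation) = {p:.3f}")
```

The inference target is the identity of the structure itself, with the prior over structures doing the work that the prior over values does in ordinary Bayesian updating.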
Incidentally, this can be viewed as a generalization of the general abstraction machinery, rather than some new addendum. When we condition on "we're looking at a dog", we don't actually deterministically deduce that it has a head, four legs, a heart, and a tail. We merely update our distribution over lower-level structures/features to strongly expect this set of features. We also place a nontrivial amount of probability on "three legs, one tail", yet assign epsilon probability to "no head" (assuming the dog is alive) or "twenty heads". (This is the promised better way to handle the nitpick in 1.5.)
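In toy form (all numbers invented for illustration), the update is over feature bundles rather than a single deduced bundle, and softer inferences fall out as marginals of that distribution:

```python
# Conditioning on "we're looking at a (live) dog" yields a distribution over
# lower-level feature bundles, not one deduced bundle. Probabilities are made up.
p_bundle_given_dog = {
    frozenset({"head", "four legs", "heart", "tail"}): 0.93,
    frozenset({"head", "three legs", "heart", "tail"}): 0.05,
    frozenset({"head", "four legs", "heart"}): 0.02,   # e.g. a docked tail
    frozenset({"twenty heads", "heart"}): 1e-9,
}

# Softer inferences are just marginals over this structure-valued variable:
p_has_tail = sum(p for bundle, p in p_bundle_given_dog.items() if "tail" in bundle)
print(f"P(has a tail | dog) = {p_has_tail:.3f}")  # ~0.98
```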
But there's also a bunch of other, softer inferences we make. For example, if we're searching for a specific type of virus that tends to infect dogs, our probability distribution over the locations of this virus' instances would shift towards the volume the dog's body occupies. We'd also infer some distribution over the number of "dog cell" abstractions present at the lower level, and the atomic contents of the volume in question, et cetera. Those are all, again, best modeled as inferences about structures, not values (or, well, as inferences about structure-valued variables).
In terms of "upwards" inferences, we'd expect the "dog concept" abstraction (not only this-dog) to be present in that world, and e. g. allocate nontrivial amount of probability mass to "Earth-like biosphere and evolutionary history".
In terms of "sideways" inferences (abstractions at the "same level" as the dog), we'd expect more dogs or the dog's owner to be nearby (though, hmm, this might actually route through conditioning on the higher-level "dog concept" variable, which both creates new variables and synergistically creates mutual information between them and the dog).
So, summarizing: In practical cases, it's impossible to deterministically deduce / uniquely constrain the structures you're trying to learn. However, because the "well-abstractibility prior" is a very specific entity, it's possible to define posterior probability distributions over those structures.
Let's take a look at the current problem from a different angle. Consider the following:
Each of those examples involves different representations of what's ultimately the same thing, but transformed in a way such that the similarity is very non-trivial to spot.
What this immediately reminds me of, with all the photo examples, are CNNs.
CNNs' architecture has translation symmetry built-in: their learned features are hooked up to machinery that moves them all across the image, ensuring that their functionality is invariant under changes in the specific position of the features in the image. What we want is a generalization of this trick: a broad "suite of symmetries" which we "attach" to the inputs of our learned abstraction-recognition functions, to ensure they don't get confused by the abstract equivalent of shifting a dog in a picture slightly to the left.
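A minimal sketch of "attaching a suite of symmetries" to a detector: instead of baking in only translations (as convolution does), wrap an arbitrary detector in a loop over a group of candidate transformations and take the best response. The specific transformations and the detector below are placeholders of my own, not anything specified in the post.

```python
import numpy as np
from typing import Callable, Iterable

def symmetrized_detector(
    detector: Callable[[np.ndarray], float],
    transforms: Iterable[Callable[[np.ndarray], np.ndarray]],
) -> Callable[[np.ndarray], float]:
    """Wrap a feature detector so its response is invariant under a suite of transformations."""
    transforms = list(transforms)
    def detect(x: np.ndarray) -> float:
        return max(detector(t(x)) for t in transforms)
    return detect

# Placeholder "suite of symmetries": the eight rotations/reflections of a 2D array.
dihedral_group = [
    lambda x, k=k, f=f: np.rot90(np.flip(x, axis=0) if f else x, k)
    for k in range(4) for f in (False, True)
]

# Placeholder detector: responds to a bright vertical stripe in the middle column.
def stripe_detector(x: np.ndarray) -> float:
    return float(x[:, x.shape[1] // 2].mean())

robust_detector = symmetrized_detector(stripe_detector, dihedral_group)

image = np.zeros((5, 5))
image[2, :] = 1.0  # a *horizontal* stripe: the raw detector misses it, the wrapped one doesn't
print(stripe_detector(image), robust_detector(image))
```

Pooling over a transformation group like this is the same move as a CNN's translation handling, just with the group of transformations left as a free parameter.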
I.e., we want to give our abstraction operators "truesight": empower them to zero-shot the recognition of already-known abstractions under changes in representation.
In the general case, that's impossible. After all, there's a function for mapping anything to anything else, more or less. No free lunch.
But we're not working with the general case: we're working with the samples from the anthropic prior. For any given set of abstractions that we're already seeing, there's a very specific posterior distribution over what abstractions we should expect to discover next. Which corresponds to a probability distribution over what "lens" we should try putting onto our feature-detectors to spot the familiar abstractions in new data.
In other words: we have a probability distribution over sequences of transformations of the data we should try, and we keep sampling from it until we spot a familiar abstraction; until we find a way of looking at the data under which it resolves into a known, simple pattern. (Where "simplicity" is defined relative to our learned library of abstractions.)
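In pseudocode-ish Python, that search loop might look like the sketch below; the sampler over transformation sequences and the "known abstraction" check are stand-ins for machinery the post hasn't specified.

```python
from typing import Any, Callable, Optional, Sequence

def search_for_familiar_abstraction(
    data: Any,
    sample_transform_sequence: Callable[[], Sequence[Callable[[Any], Any]]],
    recognize_abstraction: Callable[[Any], Optional[Any]],
    max_attempts: int = 10_000,
) -> Optional[tuple]:
    """Re-represent the data until it resolves into a known, simple pattern.

    `sample_transform_sequence` draws from the posterior over transformation
    sequences worth trying; `recognize_abstraction` returns a known abstraction
    if the transformed data matches one, else None. Both are assumed given.
    """
    for _ in range(max_attempts):
        transforms = sample_transform_sequence()
        view = data
        for t in transforms:
            view = t(view)
        hit = recognize_abstraction(view)
        if hit is not None:
            # Return both the "lens" and the abstraction seen through it.
            return transforms, hit
    return None  # no familiar pattern found within the budget
```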
Which is to say, we engage in qualitative research. We switch between various representations until finding one in which the compression task looks trivial to us.
I think this problem/process is, indeed, literally the underlying "type signature" of a large fraction of human science and research efforts.
Some other examples of when it happens:
What I'm particularly interested in for the purposes of the bounties: