Epistemic Status: mental model and intuitive story

Scarce Channels vs Scarce Modules

Let’s distinguish between two kinds of system-regimes: “scarce channels” and “scarce modules”.

A prototypical “scarce modules” system would be one of those 19th-century families with 12 people living in a 500 square foot (46 square meter) home. When at home, everyone knows what everyone else is doing all the time; there is zero privacy. Communication channels are highly abundant - everyone has far more information than they want about what everyone else is doing. Indeed, communication channels exist by default. Conversely, though, modules are scarce - it’s hard for one or more family members to carve out a part of the space which is isolated from the rest of the family, and which interacts only through some limited channels.

A prototypical “scarce channels” system, by contrast, would be a few hundred 19th-century fur trappers spread out over half of Montana. Most of the time, none of them are anywhere near each other; nobody has any idea what’s going on with anyone else. Communication channels are scarce - getting information to another person is difficult and expensive. Conversely, though, modules are highly abundant - it’s very easy for one or a few trappers to carve out a space which is isolated from the rest, and which interacts only through some limited channels (like e.g. occasionally visiting the nearest town). Indeed, modules exist by default.

I want to use this as a mental model for complex adaptive systems, like neural nets or brains.

Key hypothesis: neural nets or brains are typically initialized in a “scarce channels” regime. A randomly initialized neural net generally throws out approximately-all information by default (at initialization), as opposed to passing lots of information around to lots of parts of the net. A baby’s brain similarly throws out approximately-all information by default, as opposed to passing lots of information around to lots of parts of the brain. I’m not particularly going to defend that claim here; rather, I raise it as a plausible hypothesis for how such systems might look, and next we’ll move on to an intuitive story for how an adaptive system in the “scarce channels” regime interacts with natural abstractions in its environment.
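As a quick illustration of the “throws out approximately-all information by default” claim, here’s a toy numpy sketch (my own, not from the post; it assumes a small-variance Gaussian init, and the quantitative behavior depends heavily on that choice). Two very different inputs get squashed to nearly identical outputs, so essentially no information about the input survives to the far end of the net:

```python
# Toy sketch: a deep, randomly initialized tanh net with small-variance
# Gaussian weights contracts its inputs, so distinct inputs end up with
# nearly identical outputs - i.e. the net passes almost no information
# about its input through to the other end.
import numpy as np

rng = np.random.default_rng(0)
depth, width = 20, 256

# Sample the weights once, up front, so both inputs see the same net.
Ws = [rng.normal(0.0, 0.5 / np.sqrt(width), size=(width, width))
      for _ in range(depth)]

def random_net(x):
    h = x
    for W in Ws:
        h = np.tanh(h @ W)
    return h

x1 = rng.normal(size=width)
x2 = rng.normal(size=width)              # a completely different input
y1, y2 = random_net(x1), random_net(x2)

print(np.linalg.norm(x1 - x2))           # inputs far apart (~20)
print(np.linalg.norm(y1 - y2))           # outputs nearly identical (tiny)
```

With larger init scales the picture changes (outputs stop collapsing), so this is only meant to illustrate one concrete sense in which a random net can discard its input by default.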

The upshot is that, when an adaptive system is in the “scarce channels” regime, lots of optimization pressure is required to induce an information channel to form. For instance, picture such a system as a bunch of little pieces, which initially don’t talk to each other at all:

[Figure: The system is initially composed of many little parts, which are all approximately independent; approximately zero information flows between them.]

In order for an information channel to form from one end to the other, each of the individual pieces along the line-of-communication needs to be individually optimized to robustly pass along the right information:

[Figure: Optimization pressure causes an information channel to form, by optimizing each piece along the path to pass along the relevant information.]

So, intuitively, the number of bits-of-optimization required to form that information channel should scale roughly with the number of pieces along the line-of-communication.
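A rough back-of-envelope version of that scaling (my framing, with a deliberately crude toy model): suppose there are $n$ pieces along the path, and an unoptimized piece happens to pass the signal along correctly with probability about $2^{-b}$. Then the optimization pressure needed to get the whole path working at once is roughly

$$\text{bits of optimization} \;\approx\; \sum_{i=1}^{n} \log_2 \frac{1}{\Pr[\text{piece } i \text{ works by default}]} \;\approx\; n \cdot b,$$

i.e. linear in the number of pieces along the line-of-communication.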

Furthermore, when information channels do form, they should be approximately as small as possible. Optimization pressure will tend to induce as little information passing as the system can get away with, while still satisfying the optimization criterion.

Abstraction Coupling

Next question: what sort of patterns-in-the-environment could induce communication channels to form?

Well, here’s a situation where communication channels probably won’t form: train a neural net in an environment where the reward/loss its output receives is independent of the input. Or, for a generative net, an environment where the tokens/pixels are all independent.

More generally, suppose our adaptive system interfaces with the environment in two different places (and possibly more, but we’re choosing two to focus on). Think two token or pixel positions for a generative net, or a particular observation and action taken by a human. If those two different parts of the environment are independent, then presumably there won’t be optimization pressure for the adaptive system to form an information channel between the corresponding interface-points.

In other words: in order for an information channel to form inside the system between two interface points, presumably there needs to be some mutual information between the corresponding parts of the environment just outside the two interface points - and presumably it’s that information which the internal channel will be selected to carry.

[Figure: X and X’ are the parts of the environment just outside the two interface points. X and X’ have a lot of mutual information, passed around through the environment. It’s that mutual information (or functions of it) which, intuitively, the internal information channel is selected to carry. If there were no mutual information between the two, then intuitively, the internal information channel wouldn’t be selected for.]
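Here’s a minimal sketch of the precondition itself (my own illustration; the variable names are made up): estimate the mutual information between two “interface points” of a toy environment, once when they’re coupled through a shared latent and once when they’re independent. Only in the coupled case is there anything for an internal channel to be selected to carry:

```python
# Toy environments: in A, the two interface points X and X' are coupled
# through a shared latent (the "natural abstraction"); in B, they are
# independent. A crude histogram estimator of mutual information is enough
# to tell the two cases apart.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

latent = rng.normal(size=n)
X_coupled  = latent + 0.5 * rng.normal(size=n)
Xp_coupled = latent + 0.5 * rng.normal(size=n)
X_indep, Xp_indep = rng.normal(size=n), rng.normal(size=n)

def mutual_info(x, y, bins=30):
    """Crude histogram estimate of I(X;Y), in bits."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

print(mutual_info(X_coupled, Xp_coupled))   # clearly above zero
print(mutual_info(X_indep, Xp_indep))       # approximately zero
```

The exact estimator doesn’t matter here; the only question is whether I(X;X’) is substantially above zero, since that’s what (in this story) there is any selection pressure for an internal channel to carry.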

This is especially interesting insofar as the interface-points are “far apart” in both the environment and the adaptive system - for instance, two pixel-locations far apart in standard-sized images, and also far apart in the internal network topology of a net trained to generate those images.

When the interface-points are “far apart” in the environment, insofar as we buy the information-at-a-distance version of natural abstraction, the only mutual information between those points will be mediated by natural abstractions in the environment.
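One way to write that mediation condition (my notation, not spelled out in the post): let $\Lambda$ be the natural-abstraction summary. “All the mutual information is mediated by natural abstractions” becomes the conditional independence statement

$$X \perp X' \mid \Lambda,$$

which makes $X - \Lambda - X'$ a Markov chain, so by the data processing inequality

$$I(X;X') \;\le\; \min\!\big(I(X;\Lambda),\; I(\Lambda;X')\big).$$

Whatever the long-range internal channel is selected to carry about both interface points, it can only be (a function of) information about $\Lambda$.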

On the internal side, scarce channels means that a lot of optimization pressure is needed to induce an information channel to form over the long internal distance between the two interface points. So, such channels won’t form by accident, and when they do form they’ll tend to be small.

Put those two together, and we get a picture where the only information passed around long-range inside the system is the information which is passed around long-range outside the system. Insofar as information passed around long-range is synonymous with natural abstract summaries: the only abstract summaries inside the system are those which match abstract summaries outside the system.

We could also tell this story in terms of redundant information, instead of information at a distance. Internally, the only information passed around to many interface-points will be information matching that which redundantly appears on the environment side of all those interface-points. High redundancy in the environment is required to select for high redundancy internally. So, information which is redundantly represented in many places inside the system will match information redundantly represented in many places outside. Again, natural abstractions inside the system match those outside.

(Though note that there may still be abstract summaries in the external environment which do not show up internally; the story only says that abstract summaries which do show up internally must also be in the environment, not vice-versa.)

Potential Failure Modes of This Argument

Imagine that my system has two potential-information-channels, and parameters are coupled in such a way that if we select for one of the channels, the other “comes along for the ride”. There’s an “intended” channel, and an “unintended” channel, but if those two channels can structurally only form or not-form together, then selection pressure for one also selects for the other.

More generally: our story in the previous section is all about optimization pressure on the system selecting for a particular information channel to form - the “intended” channel. But the internal structure/parameters of the system might be arranged in such a way that the channel we’re selecting for is “coupled to” other “unintended” channels, so that they’re all selected for at once. For instance, maybe in some evolving biological organism, some information-passing optimization pressure ends up selecting for general-purpose information-passing capability like e.g. neurons, and those neurons pass a bunch of extra information along by default.

Or, worse, there might be some kind of optimization demon inside the system - something which can “notice” the external optimization pressure selecting for the intended channel to form, and couple itself to the intended channel, so that the demon is also selected-for.

In all these cases, I’d intuitively expect that there’s some kind of “derivative” of how-strong-the-intended-channel-is with respect to how-strong-the-unintended-channel/demon is, maybe averaged over typical parameter-values. If that derivative is negative, then the unintended channel/demon should not be selected for when optimization pressure selects for the intended channel. That criterion would, ideally, provide a way to measure whether a given system will form unintended channels/demons. Or, in other words, that criterion would ideally allow us to check whether the “internal abstractions match external abstractions” argument actually applies to the system at hand. I don’t yet know how to operationalize the criterion, but it seems like a useful open problem. Ideally, the right operationalization would allow a proof that optimization on systems which satisfy the criterion will only amplify internal information-channels matching the external abstractions, thereby ruling out demons (or at least demons which pass information over long distances within the system).
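To make the shape of that criterion a bit more concrete (purely my notation, and only one possible sketch of an operationalization, not something the argument above commits to): write $s_{\text{int}}(\theta)$ and $s_{\text{unint}}(\theta)$ for some measures of how strong the intended and unintended channels are at parameters $\theta$. The hoped-for condition is something like

$$\mathbb{E}_{\theta}\!\left[\frac{\partial s_{\text{int}}}{\partial s_{\text{unint}}}\right] \;<\; 0,$$

with the expectation over typical parameter values, and with the derivative taken along whatever directions in parameter space actually strengthen the unintended channel - which is exactly the part that still needs a real operationalization.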

Summary

The basic story:

  • We suppose/claim that most adaptive complex systems (like brains and neural nets) operate in a scarce channels regime, i.e. it takes a bunch of optimization pressure to make the system pass around more information internally.
  • Intuitively, we then expect that the only information passed around over long distances internally will be the information which, externally, is shared between far-apart interface points of the system and environment.
  • Thus, internal abstractions should be selected to match external abstractions.

One major loophole to this argument is that channels might be structurally coupled, such that selecting for one selects for both. But I’d expect that to be measurable (in principle) for any particular system, via some operationalization of the derivative of the strength of one channel with respect to the strength of another. Ideally, operationalizing that would allow us to prove that certain systems will only tend to develop information channels matching natural abstractions in the environment, thereby ruling out both spurious internal abstractions and any demons which pass information over long internal distances.

Thank you to David Lorell for helping develop these ideas.
