So you’ve read some of our previous natural latents posts, and you’re sold on the value proposition. But there are some big foundational questions still unanswered. For example: how do we find these natural latents in some model, if we don’t know in advance what they are? Examples in previous posts conceptually involved picking some latent out of the ether (like e.g. the bias of a die), and then verifying the naturality of that latent.
This post is about one way to calculate natural latents, in principle, when we don’t already know what they are. The basic idea is to resample all the variables once simultaneously, conditional on the others, like a step in an MCMC algorithm. The resampled variables turn out to be a competitively optimal approximate natural latent over the original variables (as we’ll prove in the post). Toward the end, we’ll use this technique to calculate an approximate natural latent for a normal distribution, and quantify the approximations.
The proofs will use the graphical notation introduced in Some Rules For An Algebra Of Bayes Nets.
Some Conceptual Foundations
What Are We Even Computing?
First things first: what even is “a latent”, and what does it even mean to “calculate a natural latent”? If we had a function to “calculate natural latents”, what would its inputs be, and what would its outputs be?
The way we use the term, any conditional distribution
(λ,x↦P[Λ=λ|X=x])
defines a “latent” variable Λ over the “observables” X, given the distribution P[X]. Together P[X] and P[Λ|X] specify the full joint distribution P[Λ,X]. We typically think of the latent variable as some unobservable-to-the-agent “generator” of the observables, but a latent can be defined by any extension of the distribution over X to a distribution over Λ and X.
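To make that concrete, here's a tiny numpy sketch (our own illustration, with arbitrary made-up numbers) of a latent as nothing more than a conditional distribution tacked onto P[X]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observables: an arbitrary joint distribution P[X] over two binary variables X1, X2.
P_X = rng.random((2, 2))
P_X /= P_X.sum()

# A "latent" is just any conditional distribution P[Lambda | X]; here Lambda is also binary.
P_L_given_X = rng.random((2, 2, 2))               # indexed [x1, x2, lambda]
P_L_given_X /= P_L_given_X.sum(axis=-1, keepdims=True)

# Together, P[X] and P[Lambda | X] pin down the full joint P[Lambda, X].
P_joint = P_X[:, :, None] * P_L_given_X           # indexed [x1, x2, lambda]
assert np.isclose(P_joint.sum(), 1.0)
```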
Natural latents are latents which (approximately) satisfy some specific conditions, namely that the distribution P[X,Λ] (approximately) factors over these Bayes nets:
Intuitively, the first says that Λ mediates between the Xi’s, and the second says that any one Xi gives approximately the same information about Λ as all of X. (This is a stronger redundancy condition than we used in previous posts; we’ll talk about that change below.)
So, a function which “calculates natural latents” takes in some representation of a distribution (x↦P[X]) over “observables”, and spits out some representation of a conditional distribution (λ,x↦P[Λ=λ|X=x]), such that the joint distribution (approximately) factors over the Bayes nets above.
For example, in the last section of this post, we’ll compute a natural latent for a normal distribution. The function to compute that latent:
Takes in a covariance matrix ΣXX for X, representing a zero-mean normal distribution P[X].
Spits out a covariance matrix ΣΛΛ for Λ and a cross-covariance matrix ΣΛX, together representing the conditional distribution of a latent Λ which is jointly zero-mean normal with X.
… and the joint normal distribution over Λ,X represented by those covariance matrices approximately factors according to the Bayes nets above.
Why Do We Want That, Again?
Our previous posts talk more about the motivation, but briefly: two different agents could use two different models with totally different internal (i.e. latent) variables to represent the same predictive distribution P[X]. Insofar as they both use natural latents, there’s a correspondence between their internal variables - two latents over the same P[X] which both approximately satisfy the naturality conditions must contain approximately the same information about X. So, insofar as the two agents both use natural latents internally, we have reason to expect that the internal latents of one can be faithfully translated into the internal latents of the other - meaning that things like e.g. language (between two humans) or interpretability (of a net’s internals to a human) are fundamentally possible to do in a robust way. The internal latents of two such agents are not mutually alien or incomprehensible, insofar as they approximately satisfy naturality conditions and the two agents agree predictively.
Approximate “Uniqueness” and Competitive Optimality
There will typically be more than one different latent which approximately satisfies the naturality conditions (i.e. more than one conditional distribution (λ,x↦P[Λ=λ|X=x]) such that the joint distribution of Λ and X approximately factors over the Bayes nets in the previous section). They all “contain approximately the same information about X”, in the sense that any one approximate natural latent approximately mediates between X and any other approximate natural latent. In that sense, we can approximately talk as though the natural latent is unique, for many purposes. But that still leaves room for better or worse approximations.
When calculating, we’d ideally like to find a natural latent which is a “best possible approximate natural latent” in some sense. Really we want a pareto-best approximation, since we want to achieve the best approximation we can on each of the naturality conditions, and those approximations can trade off against each other.
… but there’s a whole pareto surface, and it’s a pain to get an actual pareto optimum. So instead, we’ll settle for the next best thing: a competitively optimal approximate natural latent. Competitive optimality means that the natural latent we’ll calculate approximates the naturality conditions to within some bounds of any pareto-best approximate natural latent; it can only do so much worse than “the best”. Crucially, competitive optimality means that when we don’t find a very good approximate natural latent, we can rule out the possibility of some better approximate natural latent.
Strong Redundancy
Our previous posts on natural latents used a relatively weak redundancy condition: all-but-one Xi gives approximately the same information about Λ as all of X. (Example: 999 rolls of a biased die give approximately the same information about the bias as 1000 rolls.) The upside of this condition is that it’s relatively general; the downside is that it gives pretty weak quantitative bounds, and in practice we’ve found that a stronger redundancy condition is usually more useful. So in this post, we’ll require “strong redundancy”: any one Xi must give approximately the same information about Λ as all of X. (Example: sticking a thermometer into any one part of a bucket of water at equilibrium gives the same information about the water’s temperature.)
If we want to turn weak redundancy into strong redundancy, e.g. to apply the methods of this post to the biased die example, the usual trick is to chunk together the Xi’s into two or three chunks. For instance, with 1000 die rolls, we could chunk together the first 500 and the second 500, and either of those two subsets gives us roughly the same information about the bias (insofar as 500 rolls is enough to get a reasonably-precise estimate of the bias).
Conceptually, with strong redundancy, all of the X-relevant information in a natural latent is represented in every single one of the Xi’s. For purposes of establishing that e.g. natural latents of two different agents contain the same information about X, that means strong redundancy gives us “way more than we need” - we only really need strong redundancy over two or three variables in order to establish that the latents “match”.
The Resampling Construction
We start with a distribution P[X] over the variables X1…Xn. We want to construct a latent which is competitively optimal - i.e. if any latent exists over X1…Xn which satisfies the natural latent conditions to within some approximation, then our latent satisfies the natural latent conditions to within some boundedly-worse approximation (with reasonable bounds). We will call our competitively optimal latent X′ (pronounced “X prime”), for reasons which will hopefully become clear shortly.
Here’s how we construct X′. Take “X”, then add an apostrophe, “‘“, like so -> X’ … and that was how David died. Anyway, to construct X′:
For each i, obtain X′i by sampling from the distribution of Xi given all the other Xj’s.
Stack those up to obtain X′.
Mathematically, that means the defining distribution of the latent X′ is
P[X′=x′|X=x]=∏iP[Xi=x′i|X¯i=x¯i]
Conceptually, we can think of this as a single resample step of the sort one might use for MCMC, in which we resample every variable simultaneously conditional on all other variables.
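Here's a minimal sketch of that formula applied to an explicit probability table (a toy distribution we made up for illustration; variable and function names are ours): it computes P[X′=x′|X=x] coordinate by coordinate and assembles the joint distribution of (X,X′).

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution P[X] over three binary variables, stored as a table p[x1, x2, x3].
p = rng.random((2, 2, 2))
p /= p.sum()
n = p.ndim

def cond_i_given_rest(i, x):
    """P[X_i = . | X_{bar i} = x_{bar i}], as a length-2 vector."""
    idx = list(x)
    idx[i] = slice(None)                      # free up index i, hold the others fixed
    fiber = p[tuple(idx)]
    return fiber / fiber.sum()

def resample_kernel(x, x_new):
    """P[X' = x_new | X = x] = prod_i P[X_i = x_new_i | X_{bar i} = x_{bar i}]."""
    return np.prod([cond_i_given_rest(i, x)[x_new[i]] for i in range(n)])

# Joint distribution of (X, X'), the object the naturality conditions get checked against.
joint = np.zeros(p.shape * 2)                 # first n axes index X, last n index X'
for x in itertools.product(range(2), repeat=n):
    for x_new in itertools.product(range(2), repeat=n):
        joint[x + x_new] = p[x] * resample_kernel(x, x_new)
assert np.isclose(joint.sum(), 1.0)
```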
Example: suppose X1 is 500 rolls of a biased die, X2 is another 500 rolls of the same die, and X3 is yet another 500 rolls of the same die. Then, to calculate X′1, I sample the bias of the die conditional on the 1000 rolls in X2 and X3, then generate 500 new rolls of a die with my newly-sampled bias, and those new rolls are X′1. Likewise for X′2 and X′3 (noting that I’ll need to sample a new bias for each of them). Then, I put all those 1500 rolls together to get X′.
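And a Monte Carlo sketch of that die example (again our own illustration, not code from the post; it assumes a uniform Dirichlet prior over the die's bias and tracks only the face counts in each chunk, not the order of rolls):

```python
import numpy as np

rng = np.random.default_rng(0)
CHUNK = 500

# Three chunks of 500 rolls each from one biased six-sided die, stored as face-count vectors.
true_bias = rng.dirichlet(np.ones(6))
chunks = [rng.multinomial(CHUNK, true_bias) for _ in range(3)]

def resample_chunk(i):
    """X'_i: sample a bias from the posterior given the *other* chunks
    (Dirichlet(1) prior -> Dirichlet(1 + counts) posterior), then generate 500 fresh rolls."""
    other_counts = sum(c for j, c in enumerate(chunks) if j != i)
    sampled_bias = rng.dirichlet(1 + other_counts)
    return rng.multinomial(CHUNK, sampled_bias)

# One simultaneous resampling step: each chunk gets its own freshly sampled bias.
X_prime = [resample_chunk(i) for i in range(3)]
```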
Why would X′ be a competitively optimal natural latent? Intuitively, if there exists a natural latent (with strong redundancy), then each Xi encodes the value of the natural latent (approximately) as well as some “noise” independent of all the other Xi’s. When we resample, the natural latent part is kept the same, but the noise is resampled to be independent of the other Xi’s. So, the only information which X′ contains about X is the value of the natural latent. Of course, that story doesn’t give approximation bounds; that’s what we’ll need all the fancy math for.
In the rest of this section, we’ll show that X′ satisfies the naturality conditions competitively optimally: if there exists any latent Λ which is natural to within some approximation, then X′ is natural to within a boundedly-worse approximation.
Theorem 1: Strong Redundancy => Naturality
Normally, a latent must approximately satisfy two (sets of) conditions in order to be natural: mediation, and redundancy. The latent must encode approximately all the information correlating the Xi’s (mediation), and each Xi must give approximately the same information about the latent (redundancy). Theorem 1 says that, for X′ specifically, the approximation error on the (strong) redundancy conditions upper bounds the approximation error on the mediation condition. So, for X′ specifically, “redundancy is all you need” in order to establish naturality.
Some of the proof will be graphical, but we’ll need to start with one key algebraic step. The key step is this:
DKL(P[X,X′]||P[X′,Xj]P[X¯j|Xj])=E[lnP[X¯j|Xj,X′]−lnP[X¯j|Xj]]
=E[lnP[X¯j|Xj,X′]−lnP[X¯j|X′j]]
=DKL(P[X,X′]||P[X′,Xj]P[X¯j|X′j])
The magic piece is the replacement of E[lnP[X¯j|Xj]] with E[lnP[X¯j|X′j]]; this is allowed because, by construction, (X¯j,Xj) have the exact same joint distribution as (X¯j,X′j), so the conditional distribution P[X¯j|⋅] and the expectation over the pair are both unchanged by the swap. Graphically, that tells us:
Note that the left diagram is the strong redundancy condition for Xi.
The rest of the proof is just a bookkeeping step:
So X′ mediates between Xi and X¯i, for all i.
Theorem 2: Competitive Optimality
To prove competitive optimality, we first assume that there exists some latent Λ over X which satisfies the (strong) natural latent conditions to within some bounds. Using that assumption, we want to prove that X′ satisfies the (strong) natural latent conditions to within some not-much-worse bounds. And since Theorem 1 showed that, for X′ specifically, the strong redundancy approximation error bounds the mediation approximation error, all that’s left is to bound the strong redundancy approximation error for X′.
Outline:
First, assuming a strong approximate natural latent Λ exists, we’ll show that any Xi approximately mediates between any other Xj, Xk. (This is a handy property in its own right!)
Second, we’ll show that X′ satisfies a weak redundancy condition over X.
Third: since a natural latent is the “maximal” latent which satisfies redundancy over X, the previous condition is in-principle sufficient to show that Λ approximately mediates between X′ and X. However, we’ll inline the proof of the maximality condition and tweak it a bit to get slightly better bounds.
Since we can sample Λ equally well with any one Xi (i.e. strong redundancy), and Λ approximately tells us everything X tells us about X′, we can sample X′ equally well with any one Xi: strong redundancy holds for X′.
Step 1: Xi Mediates Between Xj and X′k
The two naturality conditions (using just one of the N redundancy conditions) of Λ over X easily show that Xi mediates between Xj and Xk (for distinct i, j, k). The equality of the joint distributions of (Xi,Xj,Xk) and (Xi,Xj,X′k) (by construction of X′) then lets us replace Xk in the factorization with X′k, which gives the result we were looking for.
Step 2: X′ has Weak Redundancy over X
In the first line, we use the definition of X′ and the result from Step 1 to establish mediation of X1 between X2 and X′3 and so we can remove the outgoing edge X2→X′3. In the second line, we do the same thing for the remaining outgoing edge of X2, establishing X2 as unconditionally independent of X′. Having done so, (X1,X3) trivially mediates between X2 and X′.
Step 3: Λ Mediates between X and X′
The intermediates here are much more easily understood in graphical form, but in words: in lines 1 and 2, we combine the result of Step 2 with the mediation condition of Λ and the definition of X′ to stitch together a combined factorization of the joint distribution of X, X′, and Λ in which X mediates between Λ and X′ - in particular, it’s the components (X1,X2) which mediate, while X3 is independent conditional on Λ. With some minor bookkeeping, we flip the arrow between (X1,X2) and X′, and add an arrow from X′ to Λ. Since this produces no cycles and no colliders, it is a valid move.
In line 3, we use the result of line 2 in all 3 permutations of the X components and Frankenstein the graphs together to show that, since each component of X has Λ mediating between it and X′, Λ mediates between all of X and X′.
Step 4: Strong Redundancy of X′
Using the result from Step 3 along with the strong redundancy of Λ, we can stitch the graphs together and finally obtain our desired result: Strong Redundancy of X′.
The full proof of (Approximate) Natural Latent => (Approximate) Strongly Redundant X′ in one picture:
Can You Do Better?
Note that the bounds derived here are fine in a big-O sense, but a little… unaesthetic. The numbers 9 and 7 are notably not, like, 1 or 2 or even 3. Also, we had to assume a strong approximate natural latent over at least three variables in order for the proof to work; the proof actually doesn’t handle the 2-variable case!
Could we do better? In particular, a proof which works for two variables would likely improve on the bounds considerably. We haven’t figured out how to do that yet, but we haven’t spent that much time on it, and intuitively it seems like it should work.
So if you’re good at this sort of thing, please do improve on our proof!
Empirical Results (Spot Check)
As an empirical check, we coded up relevant calculations for normal distributions. We are not going to go through all the linear algebra in this post, but you can see the code here, if for some reason you want to inflict that upon yourself. The main pieces are:
Methods to calculate covariance matrices for various conditionals and marginals
A method to construct the joint covariance of X and X′
A method to calculate the DKL between a normal distribution and its factorization over a DAG
Methods to calculate the DKL’s for the mediation and redundancy conditions specifically
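To give a flavor of those pieces, here is a minimal numpy sketch of the normal-distribution case - constructing the joint covariance of X and X′ from the resampling construction, then measuring the mediation and strong redundancy DKL's in bits. This is our own reimplementation of the ideas above, not the linked code, and the function names are ours.

```python
import numpy as np

def resampled_latent_cov(Sigma):
    """Covariances of the resampling latent X' for X ~ N(0, Sigma).

    Each X'_i ~ P[X_i | X_{bar i}], independently across i given X. With precision
    K = Sigma^{-1} and D = diag(K), that means X' | X ~ N(A x, D^{-1}) where
    A = I - D^{-1} K. Returns (Cov(X', X'), Cov(X', X)).
    """
    K = np.linalg.inv(Sigma)
    D_inv = np.diag(1.0 / np.diag(K))
    A = np.eye(len(Sigma)) - D_inv @ K
    Sigma_LX = A @ Sigma                        # Cov(X', X)
    Sigma_LL = A @ Sigma @ A.T + D_inv          # Cov(X', X')
    return Sigma_LL, Sigma_LX

def mediation_eps(Sigma, Sigma_LL, Sigma_LX):
    """D_KL(P[X, L] || P[L] prod_i P[X_i | L]) in bits, for (X, L) jointly zero-mean normal."""
    Sigma_X_given_L = Sigma - Sigma_LX.T @ np.linalg.solve(Sigma_LL, Sigma_LX)
    _, logdet = np.linalg.slogdet(Sigma_X_given_L)
    return 0.5 * (np.sum(np.log(np.diag(Sigma_X_given_L))) - logdet) / np.log(2)

def strong_redundancy_eps(Sigma, Sigma_LL, Sigma_LX, i):
    """D_KL(P[X, L] || P[X] P[L | X_i]) in bits: the strong redundancy error for X_i."""
    Sigma_L_given_X = Sigma_LL - Sigma_LX @ np.linalg.solve(Sigma, Sigma_LX.T)
    col = Sigma_LX[:, [i]]                      # Cov(L, X_i), as a column
    Sigma_L_given_Xi = Sigma_LL - col @ col.T / Sigma[i, i]
    _, logdet_i = np.linalg.slogdet(Sigma_L_given_Xi)
    _, logdet_X = np.linalg.slogdet(Sigma_L_given_X)
    return 0.5 * (logdet_i - logdet_X) / np.log(2)
```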
The big thing we want to check here is that X′ in fact yields approximations within the bounds proven above, when we start from a distribution with a known approximate natural latent.
The test system:
θ is a single-dimensional standard normal variable
X consists of N independent noisy measurements of θ, i.e. Xi=θ+αZi for independent standard normal noises Zi.
θ itself is the known approximate natural latent, with strong redundancy when α is relatively small. We compute X′ from the distribution P[X] alone, and the table below shows how the naturality approximation errors for X′ compare to those for our known natural latent θ.
N=24, alpha=0.5

| Approximation Errors (ϵ’s) | Known Latent | X′ |
|---|---|---|
| Mediation | 5.125482e-15 | 0.001046 |
| Strong Redundancy | 2.138992 | 2.130707 |
| Weak Redundancy | 0.030377 | 0.030057 |
(All numbers in the above table are DKL’s, measured in bits.)
Testing the actual summary stats / parameters (Known Latent) which generated the distributions as a natural latent, we see that the mediation condition is satisfied perfectly (numerically zero), while a strong redundancy condition (just for one Xi, randomly chosen) is ~2.139. So it looks like there is indeed at least one approximate natural latent in this system.
Calculating X′ and then testing it against the naturality conditions, we see that the mediation condition is no longer numerically zero but remains small. The strong redundancy condition (again, for one randomly chosen Xi) is ~2.131 which is a hair better than the known latent. Overall, naturality of X′ is well within the bounds given by the theorems. Note that the theorems now allow us to rule out any approximate natural latent for this system with 9 ϵmed + 7 ϵred < 2.13 bits.
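As a quick sanity check on the Known Latent column, the conditional variances of θ in this test system have simple closed forms, so the redundancy numbers can be reproduced in a few lines (our own illustration, numbers in bits):

```python
import numpy as np

# Test system: X_i = theta + alpha * Z_i, with theta and the Z_i independent standard normals.
N, alpha = 24, 0.5

def post_var(k):
    """Posterior variance of theta given any k of the measurements: 1 / (1 + k / alpha^2)."""
    return alpha**2 / (alpha**2 + k)

# Redundancy errors for the known latent theta are entropy differences of Gaussians:
# D_KL = 0.5 * log2( Var[theta | fewer measurements] / Var[theta | all measurements] ).
eps_strong_red = 0.5 * np.log2(post_var(1) / post_var(N))      # ~2.139 bits
eps_weak_red = 0.5 * np.log2(post_var(N - 1) / post_var(N))    # ~0.030 bits
print(eps_strong_red, eps_weak_red)

# Mediation error for theta is exactly zero: given theta, the X_i are independent by construction.
```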
Nice. 😎