AI ALIGNMENT FORUM

All of János Kramár's Comments + Replies

Concise Open Problem in Logical Uncertainty

9y00

Ah, I think I can stymie $M$ with 2 nonconstant advisors. Namely, let $A_{1} (n) = \frac{1}{2} - \frac{1}{n + 3}$ and $A_{2} (n) = \frac{1}{2} + \frac{1}{n + 3}$. We (setting up an adversarial $E$) precommit to setting $E (n) = 0$ if $p (n) \geq A_{2} (n)$ and $E (n) = 1$ if $p (n) \leq A_{1} (n)$; now we can assume that $M$ always chooses $p (n) \in [A_{1} (n), A_{2} (n)]$, since this is better for $M$.
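As a rough illustration of the setup in this paragraph (a Python 3 sketch, not the author's code), the two advisors and the adversary's precommitment can be written down directly. The names `advisor_1`, `advisor_2` and `adversarial_E` are hypothetical, and returning `None` when $p(n)$ lies strictly between the two advisors is an assumption: the comment does not say what $E$ does in that case.

```python
from fractions import Fraction

def advisor_1(n):
    """A_1(n) = 1/2 - 1/(n + 3)."""
    return Fraction(1, 2) - Fraction(1, n + 3)

def advisor_2(n):
    """A_2(n) = 1/2 + 1/(n + 3)."""
    return Fraction(1, 2) + Fraction(1, n + 3)

def adversarial_E(p_n, n):
    """Adversary's precommitted answer to M's prediction p(n)."""
    if p_n >= advisor_2(n):
        return 0
    if p_n <= advisor_1(n):
        return 1
    # Assumption: the comment leaves E(n) unspecified strictly between
    # the advisors, so this sketch returns None there.
    return None
```

Exact fractions are used so that the boundary cases $p(n) = A_{1}(n)$ and $p(n) = A_{2}(n)$ compare without rounding error.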

Now define $b_{i}' (j) = | A_{i} (j) + E (j) - 1 | - | p (j) + E (j) - 1 |$ and $b_{i} (n) = \sum_{j < n} b_{i}' (j)$. Note that if we also define $\mathrm{bad}_{i} (n) = \sum_{j < n} \left( \log | A_{i} (j) + E (j) - 1 | - \log | p (j) + E (j) - 1 | \right)$ then $\sum_{j < n} | 2 b_{i} (j) - \mathrm{bad}_{i} (j) | \leq \sum_{j < n} \left( 2 A_{1} (j) - 1 - \log (2 A_{1} (j)) \right) = \sum_{j < n} O \left( \left( \tfrac{1}{2} - A_{1} (j) \right)^{2} \right)$ is bounded; therefore if we can force $b_{1}$

János Kramár

9y00

I don't yet know whether I can extend it to two nonconstant advisors, but I do know I can extend it to a countably infinite number of constant-prediction advisors. Let $(P_{i})_{i = 0, \dots}$ be an enumeration of their predictions that contains each one an infinite number of times. Then:

def M(p, E, P):
    prev, this, next = 0, 0, 1
    def bad(i):
        return sum(log(abs((E[k] + P[i] - 1) /
                           (E[k] + p[k] - 1)))
                   for k in xrange(prev))
    for k in xrange(this, next): p[k] = 0.5
    prev, this, next = this, next, floor(exp

János Kramár

9y40

def M(p, E):
    p1, p2 = 1./3, 2./3
    prev, this, next = 0, 0, 1

bad1 and bad2 compute log-badnesses of M relative to p1 and p2, on E[:prev]; the goal of M is to ensure neither one goes to $\infty$ . prev, this, next are set in such a way that M is permitted access to this when computing p[this:next].

    def bad(advisor):
        # Parenthesize the body so the lambda can legally span several lines.
        return lambda: sum(
            log(abs((E[i] + advisor(i) - 1) /
                    (E[i] + p[i] - 1)))
            for i in xrange(prev))
    bad1, bad2 = bad(lambda i: p1), bad(lambda i: p2)
    for i in xrange(this, next)


0Tsvi Benson-Tilsen9y

Could you spell out the step "every iteration where mean(E[prev:this]) ≥ 2/5 will cause bound - bad1() to grow exponentially (by a factor of $\frac{11}{10} = 1 + \frac{1}{2} \left( -1 + \frac{2/5}{\mathtt{p1}} \right)$)" a little more? I don't follow. (I think I follow the overall structure of the proof, and if I believed this step I would believe the proof.) We have that eps is about (2/3)(1-exp([bad1() - bound]/(next-this))), or at least half that, but I don't see how to get a lower bound on the decrease of bad1() (as a fraction of bound - bad1()).

0Scott Garrabrant9y

Nice! I think I believe your claim, and I would like to chat with you to verify stuff and talk about future directions. I have thought about algorithms very similar to this, and using such an algorithm got an M which is either good, or bad in the first sense and outputting probabilities converging to 2/3, or bad in the second sense and outputting probabilities converging to 1/3. I had thought that if epsilon was shrinking quickly enough as to not have bad1 go to infinity, it would be shrinking so quickly that you could get locked in the while loop with bad2 increasing. I don't think I actually checked this claim carefully, so I guess maybe I was wrong. If this algorithm works as claimed, I wonder if you can extend it to three advisors (which may not be constant).
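For readers following this thread, the log-badness quantity that bad1 and bad2 track can be sketched in Python 3 as below. This is an illustrative sketch and not the code from the comment: the helper name `log_badness` and the sample values of `E` and `p` are hypothetical. A positive value means the advisor assigned more probability to the observed bits than M did.

```python
from math import log, isclose

def log_badness(E, p, advisor, upto):
    """Sum over i < upto of log(|E[i] + advisor(i) - 1| / |E[i] + p[i] - 1|).

    When E[i] == 1 the ratio is advisor(i) / p[i]; when E[i] == 0 it is
    (1 - advisor(i)) / (1 - p[i]).
    """
    return sum(
        log(abs((E[i] + advisor(i) - 1) / (E[i] + p[i] - 1)))
        for i in range(upto)
    )

E = [1, 0, 1, 1]          # hypothetical outcome bits
p = [0.5, 0.5, 0.5, 0.5]  # hypothetical predictions made by M
bad1 = log_badness(E, p, lambda i: 1 / 3, len(E))
bad2 = log_badness(E, p, lambda i: 2 / 3, len(E))
```

On this sample, `bad1` is negative and `bad2` is positive, since three of the four bits are 1 and the 2/3 advisor predicted them better than the constant 0.5.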

Stationary algorithmic probability

János Kramár

10y10

These results are still a bit unsatisfying.

The first half constructs an invariant measure which is then shown to be unsatisfactory because UTMs can rank arbitrarily high while only being good at encoding variations of themselves. This is mostly the case because the chain is transient; if it was positive recurrent then the measure would be finite, and UTMs ranking high would have to be good at encoding (and being encoded by) the average UTM rather than just a select family of UTMs.

The second half looks at whether we can get better results (ie a probability

János Kramár

10y10

There is a lot more to say about the perspective that isn't relaxed to continuous random variables. In particular, the problem of finding the maximum entropy joint distribution that agrees with particular pairwise distributions is closely related to Markov Random Fields and the Ising model. (The relaxation to continuous random variables is a Gaussian Markov Random Field.) It is easily seen that this maximum entropy joint distribution must have the form $\log \Pr (1_{φ_{1}}, \dots, 1_{φ_{n}}) = \sum_{i < j} θ_{i j} 1_{φ_{i} \land φ_{j}} + \sum_{i} θ_{i} 1_{φ_{i}} - \log Z$ where $\log Z$ is the normalizing constant, or partition funct

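The exponential-family form stated in this comment can be made concrete with a brute-force Python 3 sketch that enumerates all $2^{n}$ truth assignments. It is only practical for small $n$, and the function name `pairwise_distribution` and the dictionary layout for the $θ_{ij}$ are assumptions made for illustration.

```python
from itertools import product
from math import exp, log, isclose

def pairwise_distribution(theta_single, theta_pair):
    """Return {assignment: probability} for the pairwise log-linear form.

    theta_single[i] multiplies the indicator of phi_i; theta_pair[(i, j)]
    with i < j multiplies the indicator of (phi_i and phi_j).
    """
    n = len(theta_single)

    def score(x):
        s = sum(theta_single[i] * x[i] for i in range(n))
        s += sum(t * x[i] * x[j] for (i, j), t in theta_pair.items())
        return s

    assignments = list(product((0, 1), repeat=n))
    # log Z normalizes the exponentiated scores to sum to 1.
    log_Z = log(sum(exp(score(x)) for x in assignments))
    return {x: exp(score(x) - log_Z) for x in assignments}

uniform = pairwise_distribution([0.0, 0.0], {})
coupled = pairwise_distribution([0.0, 0.0], {(0, 1): log(3)})
```

With all parameters zero the distribution is uniform; a single positive coupling $θ_{01}$ shifts mass toward the assignment where both sentences are true.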

Stationary algorithmic probability

János Kramár

10y10

In order to understand what the measure $μ$ that was constructed from $d$ will reward, here's the sort of machine that comes close to $\sup_{M} μ (M) = 3$:

Let $M_{0}$ be an arbitrary UTM. Now consider the function $r (n) = n - 2^{\lfloor \lg n \rfloor}$ (or, really, any function $r : \mathbb{N}^{+} \to \mathbb{N}^{0}$ with $r (n) < n$ that visits every nonnegative integer infinitely many times), and let $L = \{ x \in \{0, 1\}^{*} : | x | > 2, x_{| x | - 1} = x_{r (| x | - 1)}, x_{| x | - 2} = x_{r (| x | - 2)} \}$. (The indices here are zero-based.) Choose $x_{0} \in L$ such that $x_{0}$ has no proper prefix in $L$. Then, construct the UTM $M$ that does:

rep


Stationary algorithmic probability

János Kramár

10y10

Consider the function $a (M_{1}, M_{2}) = 2^{- d (M_{1}, M_{2}) - d (M_{2}, M_{1})}$ where $d (M_{1}, M_{2}) = \min \left( | x | \;\middle|\; x \in \{0, 1\}^{*} : \forall y \in \{0, 1\}^{*} : M_{1} (x y) = M_{2} (y) \text{ unless neither of these halts} \right)$. The reversible Markov chain with transition probabilities $p (M_{1}, M_{2}) = \frac{a (M_{1}, M_{2})}{\sum_{M_{2}'} a (M_{1}, M_{2}')}$ has a bounded positive invariant measure $μ (M) = \sum_{M'} a (M, M')$. Of course, as the post showed, the total measure is infinite. Also, because the chain is reversible and transient, the invariant measure is far from unique - indeed, for any machine $M_{0}$, the measure $μ (M) = p^{(0)} (M, M_{0}) + 2 \sum_{n = 1}^{\infty} p^{(n)} (M, M_{0})$ will be a bounded positive invariant meas


1János Kramár10y

These results are still a bit unsatisfying. The first half constructs an invariant measure which is then shown to be unsatisfactory because UTMs can rank arbitrarily high while only being good at encoding variations of themselves. This is mostly the case because the chain is transient; if it was positive recurrent then the measure would be finite, and UTMs ranking high would have to be good at encoding (and being encoded by) the average UTM rather than just a select family of UTMs. The second half looks at whether we can get better results (ie a probability measure) by restricting our attention to output-free "UTMs" (though I misspoke; these are not actually UTMs but rather universal semidecidable languages (we can call them USDLs)). It concludes that we can't if the measure will be continuous on the given digraph - however, this is an awkward notion of continuity: a low-complexity USDL whose behavior is tweaked very slightly but in a complex way may be very close in the given topology, but should have measure much lower than the starting USDL. So I consider this question unanswered.

1János Kramár10y

In order to understand what the measure $μ$ that was constructed from $d$ will reward, here's the sort of machine that comes close to $\sup_{M} μ (M) = 3$:

Let $M_{0}$ be an arbitrary UTM. Now consider the function $r (n) = n - 2^{\lfloor \lg n \rfloor}$ (or, really, any function $r : \mathbb{N}^{+} \to \mathbb{N}^{0}$ with $r (n) < n$ that visits every nonnegative integer infinitely many times), and let $L = \{ x \in \{0, 1\}^{*} : | x | > 2, x_{| x | - 1} = x_{r (| x | - 1)}, x_{| x | - 2} = x_{r (| x | - 2)} \}$. (The indices here are zero-based.) Choose $x_{0} \in L$ such that $x_{0}$ has no proper prefix in $L$. Then, construct the UTM $M$ that does:

    repeat:
        s := ""
        while s not in L:
            # if there is no next character, halt
            s := s + readchar()
        if s == x0:
            break
    M0()

This $M$ will have μ(M)>3−2−|x0|+d(M0,M)2−|x0|−d(M0,M). $M$ here is optimized for building up internal states (that are then UTMs that are efficiently encoded), while also being very easy to reset from these internal states; in other words being easy to "encode" from the UTMs it efficiently encodes, using at most 2 bits (an average of $\frac{1 + \sqrt{5}}{2}$). This is somewhat interesting, but clearly doesn't capture the kind of computational expressivity we're primarily interested in.
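To make the definitions of $r$ and $L$ in this comment easier to experiment with, here is a small Python 3 sketch. It only covers the index function and the membership test for $L$ on bit strings, not the machine $M$ itself; the helper names `in_L` and `has_proper_prefix_in_L` are hypothetical.

```python
def r(n):
    """r(n) = n - 2**floor(lg n), defined for n >= 1."""
    return n - (1 << (n.bit_length() - 1))

def in_L(x):
    """Membership test for L on a bit string x (zero-based indices)."""
    k = len(x)
    return k > 2 and x[k - 1] == x[r(k - 1)] and x[k - 2] == x[r(k - 2)]

def has_proper_prefix_in_L(x):
    """True if some proper prefix of x is already in L."""
    return any(in_L(x[:k]) for k in range(len(x)))

# A valid choice of x0 is a string in L with no proper prefix in L.
candidates = [x for x in ("000", "0101", "0000")
              if in_L(x) and not has_proper_prefix_in_L(x)]
```

Here `"000"` and `"0101"` both qualify as possible choices of $x_{0}$, while `"0000"` does not because its prefix `"000"` is already in $L$.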

A tractable, interpretable formulation of approximate conditioning for pairwise-specified probability distributions over truth values

János Kramár

10y00

Actually, on further thought, I think the best thing to use here is a log-bilinear distribution over the space of truth-assignments. For these, it is easy to efficiently compute exact normalizing constants, conditional distributions, marginal distributions, and KL divergences; there is no impedance mismatch. KL divergence minimization here is still a convex minimization (in the natural parametrization of the exponential family).

The only shortcoming is that 0 is not a probability, so it won't let you eg say that $Pr (φ_{1} \to φ_{2}) = 1$ ; but this can be remedied using a

János Kramár

10y10

An easy way to get rid of the probabilities-outside-[0,1] problem in the continuous relaxation is to constrain the "conditional"/updated distribution to have $\mathrm{Var} (1_{φ_{i}} \mid \dots) \leq E (1_{φ_{i}} \mid \dots) (1 - E (1_{φ_{i}} \mid \dots))$ (which is a convex constraint; it's equivalent to $\mathrm{Var} (1_{φ_{i}} \mid \dots) + {(E (1_{φ_{i}} \mid \dots) - \frac{1}{2})}^{2} \leq \frac{1}{4}$), and then minimize KL-divergence accordingly.
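The equivalence between the variance bound and a convex constraint can be checked by completing the square. Writing $X$ as shorthand (introduced here only for brevity) for the conditioned indicator $1_{φ_{i}} \mid \dots$:

```latex
\begin{align*}
\mathrm{Var}(X) \le E(X)\,\bigl(1 - E(X)\bigr)
  &\iff \mathrm{Var}(X) + E(X)^{2} - E(X) \le 0 \\
  &\iff \mathrm{Var}(X) + \left(E(X) - \tfrac{1}{2}\right)^{2} \le \tfrac{1}{4}.
\end{align*}
```

The left-hand side of the last line is linear in the variance plus a convex quadratic in the mean, which is why the constraint is convex in those two quantities.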

The two obvious flaws are that the result of updating becomes ordering-dependent (though this may not be a problem in practice), and that the updated distribution will sometimes have $\mathrm{Var} (1_{φ_{i}} \mid \dots) < E (1_{φ_{i}} \mid \dots) (1 - E (1_{φ_{i}} \mid \dots))$, and it's not clear how


Stationary algorithmic probability

János Kramár

10y20

It may still be possible to get a unique (up to scaling) invariant measure (with infinite sum) over the UTMs by invoking something like the Krein-Rutman theorem and applying it to the transition operator. I haven't yet verified that all the conditions hold.

This measure would then be an encoding-invariant way to compare UTMs' "intrinsic complexity" in the sense of "number of bits needed to simulate".

1János Kramár10y

Consider the function $a (M_{1}, M_{2}) = 2^{- d (M_{1}, M_{2}) - d (M_{2}, M_{1})}$ where $d (M_{1}, M_{2}) = \min \left( | x | \;\middle|\; x \in \{0, 1\}^{*} : \forall y \in \{0, 1\}^{*} : M_{1} (x y) = M_{2} (y) \text{ unless neither of these halts} \right)$. The reversible Markov chain with transition probabilities $p (M_{1}, M_{2}) = \frac{a (M_{1}, M_{2})}{\sum_{M_{2}'} a (M_{1}, M_{2}')}$ has a bounded positive invariant measure $μ (M) = \sum_{M'} a (M, M')$. Of course, as the post showed, the total measure is infinite. Also, because the chain is reversible and transient, the invariant measure is far from unique - indeed, for any machine $M_{0}$, the measure $μ (M) = p^{(0)} (M, M_{0}) + 2 \sum_{n = 1}^{\infty} p^{(n)} (M, M_{0})$ will be a bounded positive invariant measure. It seems tempting (to me) to try to get a probability measure by modding out the output-permutations (that the post uses to show this isn't possible for the full set of UTMs). To this end, consider the set of UTMs that have no output. (These will be unaffected by the output-permutations.) We can try to use the induced sub-digraph on these to build a probability measure $μ$. The measure of each UTM should be a function of the rooted edge-labeled digraph $G_{M}$ rooted at that UTM. The most natural topology on rooted edge-labeled infinite digraphs is the one generated by the sets $\{ G : G' \text{ is isomorphic to an induced rooted edge-labeled subgraph of } G \}$ where $G'$ ranges over finite rooted edge-labeled digraphs - we could hope that $μ$ is continuous according to this topology. Unfortunately, this can't work: if $μ (M) > 0$ then $μ^{-1} ((\frac{1}{2} μ (M), \infty))$ must be open, and so it must contain some finite intersection of the generating sets; however, every such intersection that's nonempty (as this one is) contains infinitely many UTMs, so the total measure must be infinite as well.
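The invariance claim at the start of this comment can be sanity-checked on a finite toy example in Python 3. The table `d` below is a made-up stand-in for the description lengths $d(M_{1}, M_{2})$ between three hypothetical "machines"; on a finite state space the total measure is of course finite, so this only illustrates why $μ(M) = \sum_{M'} a(M, M')$ is invariant, not the transience of the real chain.

```python
from fractions import Fraction

# Hypothetical description-length table d[i][j] for three toy "machines".
d = [[0, 1, 2],
     [3, 0, 1],
     [2, 2, 0]]
n = len(d)

# a(M1, M2) = 2^(-d(M1, M2) - d(M2, M1)), symmetric by construction.
a = [[Fraction(1, 2 ** (d[i][j] + d[j][i])) for j in range(n)]
     for i in range(n)]

# mu(M) = sum over M' of a(M, M').
mu = [sum(row) for row in a]

# Transition probabilities p(M1, M2) = a(M1, M2) / mu(M1).
p = [[a[i][j] / mu[i] for j in range(n)] for i in range(n)]

# One step of the chain applied to mu: (mu P)(M') = sum_M mu(M) p(M, M').
mu_next = [sum(mu[i] * p[i][j] for i in range(n)) for j in range(n)]
```

Because $μ(M)\,p(M, M') = a(M, M')$ is symmetric, the chain satisfies detailed balance, and `mu_next` comes out exactly equal to `mu`.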

No Good Logical Conditional Probability

János Kramár

10y20

This is interesting! I would dispute, though, that a good logical conditional probability must be able to condition on arbitrary, likely-non-r.e. sets of sentences.

1Benya Fallenstein10y

Hm; we could add an uninterpreted predicate symbol $Q (n)$ to the language of arithmetic, and let $s \equiv Q (0)$ and $r_{n} \equiv Q (\overline{n + 1})$. Then, it seems like the only barrier to recursive enumerability of $T_{\infty}$ is that $P$'s opinions about $Q (\cdot)$ aren't computable; this seems worrying in practice, since it seems certain that we would like logical uncertainty to be able to reason about the values of computations that use more resources than we use to compute our own probability estimates. But on the other hand, all of this makes this sound like an issue of self-reference, which is its own can of worms (once we have a computable process assigning probabilities to the value of computations, we can consider the sentence saying "I'm assigned probability $< \frac{1}{2}$" etc.).

Modal Bargaining Agents

János Kramár

10y00

What's the harm in requiring prior coordination, considering there's already a prior agreement to follow a particular protocol involving the $A_{i}$s? (And something earlier on in the context about a shared source of randomness to ensure convexity of the feasible set.)

0orthonormal10y

The actual problem we want to work toward is one where all the prior coordination info is in the environment independent of the particular agents (e.g. the existence of Schelling points), and the agents are just deducing things about each other. For instance, two FairBots work in a source code swap Prisoner's Dilemma against one another even if written in different programming languages. I'm willing to accept "accepting a natural ordering on the payoff set" and "accepting a natural set of outcome products" as things that could conceivably be Schelling points in a simple environment, but "know the shape of each others' fairness sets" looks like an infinite pre-exchange of information that cannot be gleaned from the environment. (And "generate mutual random bits" is a cooperative thing that can be viewed as an atomic action in the environment.)

Modal Bargaining Agents

János Kramár

10y10

If the fairness constraints are all pairwise (ie each player has fairness curves for each opponent), then the scheme generalizes directly. Slightly more generally, if each player's fairness set is weakly convex and closed under componentwise max, the scheme still generalizes directly (in effect the componentwise max creates a fairness curve which can be intersected with the $x y z = A_{i}$ surfaces to get the $(x_{i}, y_{i}, z_{i})$ points).

In order to generalize fully, the agents should each precommunicate their fairness sets. In fact, after doing this, the algorithm is very si


0orthonormal10y

I'd like for this to work with as little prior coordination as possible, so I'm not keen on assuming the agents precommunicate their fairness sets. But the generalization with only the $A_{i}$ pre-coordinated is neat.

Modal Bargaining Agents

János Kramár

10y10

You did miss something: namely from PA+2 X wants to show feasibility of $(\frac{m}{2} x_{i_{0}}, y)$ , not $(\frac{m}{2}, y)$ . In your example, $x_{i_{0}} = 3$ , so the Löbian circle you describe will fail.

I'll walk through what will happen in the example.

The $A_{i}$ are just areas (ie $x_{i} y_{i}$ ), not rectangles. In this example, $A_{1} = 6$ is enough to contain $(2, 3)$ and $(3, 2)$ . For conciseness let's have $A_{1} = 6$ , $A_{2} = 4$ , and $m = 3$ (so $A_{3} = 0$ ).

Both X and Y have $i_{0} = 1$ . According to X, $(x_{1}, y_{1}) = (3, 2)$ , $(x_{2}, y_{2}) = (2, 2)$ , and $(x_{3}, y_{3}) = (0, 0)$ .

First the speculative phase will happen:

X will try to prove in PA+1 that $y \leq 2$ and that $(\frac{3}{1} \cdot 3, y)$


0orthonormal10y

OK! You were right, I misinterpreted, and I do like this proposal! I concur that in the limit as $m \to \infty$, this does hit the intersection of the fairness curves if the agents are biased in their own favor, and hits the Pareto curve linear multiple of each agent's request if they're biased in each others' favor. Moreover, both of these behaviors are invariant under rescalings of utility functions, so we're good on that front too! I haven't yet thought about whether this works analogously in three-player games, but it feels like it should...

Modal Bargaining Agents

János Kramár

10y00

How about a gridless scheme like:

The agents agree that they will each output how much utility they will get, and if they fail to choose a feasible point they both get 0.

Now discretize the possible "rectangle areas": let them be $A_{1} > \dots > A_{m} = 0$. (This requires a way to agree on that, but this seems easier than agreeing on grid points; the finer the better, basically. Perhaps the most obvious way to do it is to have these be evenly spaced from $A_{m} = 0$ to $A_{1}$; then only $A_{1}$ and $m$ need to be agreed upon.)
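The evenly spaced choice of areas suggested in this paragraph takes only a couple of lines of Python 3. This is a sketch under the assumption that "evenly spaced" means equal gaps between consecutive areas; the function name `evenly_spaced_areas` is hypothetical.

```python
from fractions import Fraction

def evenly_spaced_areas(A1, m):
    """Return [A_1, ..., A_m], evenly spaced, with A_m = 0 (requires m >= 2)."""
    return [Fraction(A1) * (m - k) / (m - 1) for k in range(1, m + 1)]

areas = evenly_spaced_areas(6, 3)
```

For $A_{1} = 6$ and $m = 3$ this gives the areas 6, 3, 0, so the two agents only need to share the pair $(A_{1}, m)$ to reconstruct the whole list.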

X will do the following:

let $A_{i_{0}}$ be the first area


Modal Bargaining Agents

János Kramár

10y10

How about a gridless scheme like:

The agents agree that they will each output how much utility they will get, and if they fail to choose a feasible point they both get 0.

X will do the following:

let $A_{i_{0}}$ be the first area


1orthonormal10y

I don't think I get this. Let's walk through an example that looks absurd to me, and let me know if I'm misinterpreting something along the way: We start with the feasible set as the convex hull of (0,0), (2,3), and (3,2). X thinks that (3,2) is fair, while Y thinks that (2,3) is fair. By Eliezer's algorithm (and by the modal instantiation), they would end up at (2,2). Let's say that $A_{1}$ includes (3,2), and $A_{2}$ includes (2,3); then $A_{1}$ is the first rectangle considered fair by X, and $A_{2}$ is the first rectangle considered fair by Y. Then X, running the algorithm above, first tries to show in PA+1 that $y \leq 2$ and that (3, y) is in the feasible set; if it succeeds, it outputs 3. Meanwhile, Y tries to show that $x \leq 2$ and that (x, 3) is in the feasible set; if it succeeds, it outputs 3. Neither of these proof searches can succeed (since PA+1 doesn't know its own consistency, each agent thinks the other might output 3 by the Principle of Explosion). Now we move to the next stage. X tries to show in PA+2 that $y \leq 2$ and that (3/2, y) is feasible; if so, it outputs 3/2. Y likewise tries to show in PA+2 that $x \leq 2$ and that (x, 3/2) is feasible; if so, it outputs 3/2. Now we have a Löbian circle; both successfully prove that the other outputs 3/2, and thus output 3/2 themselves. And so the agents coordinate at (3/2, 3/2), rather than at (2,2) or anything better. Did I miss something?
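The feasibility claims in this example can be checked mechanically. The Python 3 sketch below tests whether a point lies in the convex hull of (0,0), (2,3) and (3,2) using edge cross products. It takes "feasible" to mean membership in that hull, boundary included, and the function name `is_feasible` is hypothetical.

```python
from fractions import Fraction

# Vertices of the feasible set, listed in counter-clockwise order.
VERTICES = [(0, 0), (3, 2), (2, 3)]

def is_feasible(point):
    """True if point lies in the convex hull of VERTICES (boundary included)."""
    x, y = point
    edges = zip(VERTICES, VERTICES[1:] + VERTICES[:1])
    for (x1, y1), (x2, y2) in edges:
        # The cross product is >= 0 when the point is on or to the left
        # of the directed edge (x1, y1) -> (x2, y2).
        if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) < 0:
            return False
    return True

half = Fraction(3, 2)
results = {
    "(2, 2)": is_feasible((2, 2)),
    "(3/2, 3/2)": is_feasible((half, half)),
    "(3, 3)": is_feasible((3, 3)),
}
```

Both (2,2) and (3/2, 3/2) are feasible while (3,3) is not, which matches the worry in the comment that coordinating on (3/2, 3/2) would leave a strictly better feasible outcome on the table.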