AI ALIGNMENT FORUM

Logical Uncertainty · Logical Induction · Solomonoff induction
Personal Blog


Asymptotic Logical Uncertainty: Concrete Failure of the Solomonoff Approach

by Scott Garrabrant
22nd Jul 2015


This post is part of the Asymptotic Logical Uncertainty series. In this post, I give a concrete example of how the Solomonoff Induction Inspired Approach (SIA) fails to converge to the correct probabilities when predicting the output of a simple Turing machine.

Let f:N→{0,1} be a function such that (for large n) f(n) can be computed in time 2^n, but there is no algorithm that can predict f(n) in time less than 2^n consistently better than a coin which outputs 1 with probability 1/3.

Construct a Turing machine E as follows:

E(2k)=f(2k), and

E(2k+1)=f(2k+1) XOR f(10^k).
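The construction above can be sketched in code. The stand-in `f` below is purely hypothetical (a seeded pseudorandom bit that is 1 with probability roughly 1/3); the real f must be computable in time 2^n yet unpredictable in less time, which no short Python function actually achieves.

```python
import random

def f(n):
    # Hypothetical stand-in for the hard function f : N -> {0, 1}.
    # The real f is computable in time 2^n but not predictable faster;
    # here we just use a deterministic pseudorandom bit that is 1
    # with probability about 1/3.
    return 1 if random.Random(n).random() < 1 / 3 else 0

def E(n):
    # Even bits expose f directly: E(2k) = f(2k).
    if n % 2 == 0:
        return f(n)
    # Odd bit 2k+1 is entangled with the much later even bit 10^k:
    # E(2k+1) = f(2k+1) XOR f(10^k).
    k = (n - 1) // 2
    return f(2 * k + 1) ^ f(10 ** k)
```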

Imagine trying to predict E(k) with an algorithm which takes time at most k^2. If k is even, then E(k) is determined by an event indistinguishable in time k^2 from a coin which outputs 1 with probability 1/3, so the predictor should output 1 with probability 1/3. If k is odd, then E(k) is the XOR of two bits that are indistinguishable from bits which are independently 1 with probability 1/3, so the predictor should output 1 with probability 4/9.
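The 4/9 comes from the XOR of two independent bits that are each 1 with probability 1/3; a quick exact check:

```python
from fractions import Fraction

p = Fraction(1, 3)  # each bit is independently 1 with probability 1/3

# XOR is 1 exactly when the two bits differ.
p_xor = p * (1 - p) + (1 - p) * p

print(p_xor)  # -> 4/9
```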

Therefore, if we let T(k)=k^3 and R(k)=k^2, then if SIA worked correctly, the probability that SIA(E,T,R) outputs 1 on the even bits would converge to 1/3.

However, this is not the case. In particular, the probability that the 10^k-th bit output by SIA(E,T,R) is a 1 will not converge to 1/3, but will instead oscillate between 4/9 and 5/9.

This is because by the time a predictor needs to predict E(10^k), it will have already output a 1 with probability 4/9 for E(2k+1), and it will have had enough time to compute f(2k+1). Therefore, if f(2k+1)=0, then E(10^k)=E(2k+1), so any predictor that does not make its prediction for E(10^k) match its prediction for E(2k+1) will contradict itself.

The machines that will do well when sampled in SIA(E,T,R) are the ones that output 1 independently with probability 1/3 on all even bits and with probability 4/9 on all odd bits, EXCEPT when predicting a bit whose location is a power of 10, in which case they copy (or negate) a previous bit, and therefore output 1 with probability either 4/9 or 5/9, both of which are greater than the desired 1/3.
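Why 4/9 or 5/9: such a machine's output on bit 10^k is either a copy or a negation of its own earlier sample for E(2k+1), which was a 1 with probability 4/9. A sketch of the arithmetic:

```python
from fractions import Fraction

p_earlier = Fraction(4, 9)  # probability the machine sampled a 1 for E(2k+1)

# If f(2k+1) = 0, the self-consistent move is to copy the earlier sample;
# if f(2k+1) = 1, it is to negate it.
p_copy = p_earlier        # probability of outputting 1 when copying: 4/9
p_negate = 1 - p_earlier  # probability of outputting 1 when negating: 5/9

# Both exceed the correct probability of 1/3.
assert min(p_copy, p_negate) > Fraction(1, 3)
```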

Further, a machine that outputs a 1 with probability 1/3 on E(10^k) for all k will have a likelihood going to 0 relative to the machines which just copy or negate their old answers.

Note that the correct probability with which to believe E(10^k)=1 really is 1/3, not 4/9 or 5/9. The prediction given to E(2k+1) was calculated from a state of ignorance about f(2k+1), and that prediction should have no effect on the prediction for E(10^k).