AI ALIGNMENT FORUM
AF

Personal Blog

4

No Good Logical Conditional Probability

by Scott Garrabrant
2nd Sep 2015
2 min read
6

4

Personal Blog
No Good Logical Conditional Probability
2Scott Garrabrant
2Charlie Steiner
2János Kramár
1Benya_Fallenstein
0orthonormal
New Comment
5 comments, sorted by
top scoring
Click to highlight new comments since: Today at 4:47 PM
[-]Scott Garrabrant10y20

Charlie: Your proposal to remove condition 3 works when T is a finite theory, but you cannot do that when T is infinite. Indeed, we use 3 for the infinite extension form Tn to T∞. I suspect that you cannot remove condition 3.

Reply
[-]Charlie Steiner10y20

It's interesting that this is basically the opposite of the Gaifman condition - clearly there are conflicting intuitions about what makes a 'good' conditional logical probability.

On the open problem; In order to prove 3 from 2, all you need is that P(s∧T|R)=P(s|R) when s proves T - 3 follows from 2 if you do that substitution, and then divide by P(T|R), which is less than or equal 1 (this may assume an extra commonsense axiom that probabilities are positive).

Now consider applying rule 1 to P(s∧T|R), T proven by s. R proves that only one of s∧T,¬s∧T,¬s∧¬T is true, and also proves that only one of s,¬s∧T,¬s∧¬T is true. Thus 3 is derivable from 1 and 2.

Reply
[-]János Kramár10y20

This is interesting! I would dispute, though, that a good logical conditional probability must be able to condition on arbitrary, likely-non-r.e. sets of sentences.

Reply
[-]Benya_Fallenstein10y10

Hm; we could add an uninterpreted predicate symbol Q(n) to the language of arithmetic, and let s≡Q(0) and rn≡Q(¯¯¯¯¯¯¯¯¯¯¯¯¯n+1). Then, it seems like the only barrier to recursive enumerability of T∞ is that P's opinions about Q(⋅) aren't computable; this seems worrying in practice, since it seems certain that we would like logical uncertainty to be able to reason about the values of computations that use more resources than we use to compute our own probability estimates. But on the other hand, all of this makes this sound like an issue of self-reference, which is its own can of worms (once we have a computable process assigning probabilities to the value of computations, we can consider the sentence saying "I'm assigned probability <12" etc.).

Reply
[-]orthonormal10y00

Nice! Basically, it looks like you construct a theory by assembling an infinite quantity of what the prior takes as evidence about s, so that either the prior or the posterior has to take the most extreme odds on s. It's pretty intuitive in that light, and so I'm not dismayed that the "0 and 1 are not probabilities" property can't hold when conditioned on arbitrary theories.

Important typo: P assigns conditional probability 1 to some statement that cannot be proven.

Reply
Moderation Log
More from Scott Garrabrant
View more
Curated and popular this week
5Comments

Fix a theory T over a language L. A coherent probability function is one that satisfies laws of probability theory, each coherent probability function represents a probability distribution on complete logical extensions of T.

One of many equivalent definitions of coherence is that P is coherent if P(s1)+P(s2)+…+P(sk)=1 whenever T can prove that exactly one of s1,…,sk is true.

Another very basic desirable property is that P(s)=1 only when s is provable. There have been many proposals of specific coherent probability assignments that all satisfy this basic requirement. Many satisfy stronger requirements that give bounds on how far P(s) is from 1 when s is not provable.

In this post, I modify the framework slightly to instead talk about conditional probability. Consider a function P which takes in a consistent theory T and a sentence s, and outputs a number P(s|T)∈[0,1], which represents the conditional probability of s given T.

We say that P is coherent if:

  1. P(s1|T)+P(s2|T)+…+P(sk|T)=1 whenever T can prove that exactly one of s1,…,sk is true, and

  2. P(s∧r|T)=P(r|T)⋅P(s|T∪{r}).

  3. If s proves every sentence in T, then P(s|R∪T)≥P(s|R).

Theorem: There is no coherent conditional probability function P such that P(s|T)=1 only when T proves s.

Proof:

This proof will use the notation of log odds ℓ(p)=log2(p1−p) to make things simpler.

Let P be a coherent conditional probability function. Fix a sentence s which is neither provable nor disprovable from the empty theory. Construct an infinite sequences of theories as follows:

  1. T0 is the empty theory.

  2. To construct Tn+1, choose a sentence rn such that neither s→rn nor s→¬rn are provable in Tn. If P(s∧rn|Tn)≤P(s∧¬rn|Tn), then let Tn+1=Tn∪{s→rn}. Otherwise, let Tn+1=Tn∪{s→¬rn}.

Fix an n, and without loss of generality, assume P(s∧rn|Tn)≤P(s∧¬rn|Tn). Since P is coherent we have P(s∧r|Tn)+P(s∧¬r|Tn)=P(s|Tn). In particular, this means that P(s∧r|Tn)≤12P(s|Tn).

Observe that P(s∧(s→r)|Tn)=P(s|Tn+1)P(s→r|Tn), and P(¬s∧(s→r)|Tn)=P(¬s|Tn+1)P(s→r|Tn). Therefore, P(s∧r|Tn)/P(¬s|Tn)=P(s|Tn+1)/P(¬s|Tn+1), so 12P(s|Tn)/P(¬s|Tn)≥P(s|Tn+1)/P(¬s|Tn+1).

In the language of log odds, this means that ℓ(P(s|Tn))−1≥ℓ(P(s|Tn+1)).

Let T∞ be the union of all the Tn. Note that by the third condition of coherence, ℓ(P(¬s|T∞))≥ℓ(P(¬s|Tn)) for all n, so ℓ(P(s|T∞))≤ℓ(P(s|Tn)) for all n

Consider ℓ(P(s|T0)) and ℓ(P(s|T∞)). These numbers cannot both be finite, since ℓ(P(s|T∞))≤ℓ(P(s|Tn))≤ℓ(P(s|T0))−n. Therefore, at least one of P(s|T0) and P(s|T∞) must be 0 or 1. However neither T0 nor T∞ prove or disprove s, so this means that P assigns conditional probability 1 to some statement that cannot be proven.

Open Problem: Does this theorem still hold if we leave condition 3 out of the definition of coherence?