Here's a simple Kripke frame proof of Payor's lemma.
Let be a Kripke frame over our language, i.e. is a set of possible worlds, is an accessibility relation, and judges that a sentence holds in a world. Now, suppose for contradiction that but that , i.e. does not hold in some world .
A bit of De Morganing tells us that the hypothesis on is equivalent to , so . So, there is some world with such that . But again looking at our equivalent form for , we see that , so , a contradiction.
Both this proof and the proof in the post are very simple, but at least for me I feel like this proof tells me a bit more about what's going on, or at least tells me something about what's going on that the other doesn't. Though in a broader sense there's a lot I don't know about what's going on in modal fixed points.
Kripke frame-style semantics are helpful for thinking about lots of modal logic things. In particular, there are cool inductiony interpretations of the Gödel/Löb theorems. These are more complicated, but I'd be happy to talk about them sometime.
Nice, I like this proof also. Maybe there's a clearer way to say thing, but your "unrolling one step" corresponds to my going from to . We somehow need to "look two possible worlds deep".
Also, here's a proof that a bot is never exploited. It only cooperates when its partner provably cooperates.
First, note that , i.e. if cooperates it provably cooperates. (Proof sketch: .)
Now we show that (i.e. if chooses to cooperate, its partner is provably cooperating):
(PS: we can strengthen this to , by noticing that .)
A point of confusion: is it enough to prove that ? What about ? I'm not sure I can say this well, but here goes:
We might not be able to prove in the theory, or even that (which would mean "there are no proofs of inconsistency"). But if we believe our theory is sound, we believe that it can't show that a copy of itself proves something false.
So tells us that if is true, the theory would show that a copy of itself proves is true. And this is enough to convince us that we can't simultaneously have true and false.
This is cool (and fwiw to other readers) correct. I must reflect on what it means for real world cooperation... I especially like the A <-> []X -> [][]X <-> []A trick.
In case this helps folks' intuition, my generating idea was something like: "look at what my opponent is thinking, but assume that whenever they check if I cooperate, they think the answer is yes". This is how we break the loop.
This results in a proof of the lemma like thus:
(EDIT: This is basically the same proof as in the post, but less simple. Maybe the interesting part is the "unroll once then break the loop" intuition.)
Relatedly, with two agents and , we can have:
(In fact, I think "assume they will think I cooperate" turns out to be too strong, and leads to unnecessary defection. I'm still working through the details.)
If I follow what you mean, we can derive:
So there's a Löbian proof, in which the provability is self-fulfilling. But this isn't sufficient to avoid this kind of proof.
(Aside on why I don't like the Löbian method: I moreso want the agents to be doing "correct" counterfactual reasoning about how their actions affect their opponent, and to cooperate because they see that mutual cooperation is possible and then choose it. The Löbian proof style isn't a good model of that, imo.)
I'm working on it :) At this point what I think is true is the following:
If ShortProof(x \leftrightarrow LongProof(ShortProof(x) \to x)), then MediumProof(x).
Apologies that I haven't written out calculations very precisely yet, but since you asked, that's roughly where I'm at :)
It looks like you're investigating an angle that I can't follow, but here's my two cents re bounded agents:
My main idea to port this to the bounded setting is to have a bot that searches for increasingly long proofs, knowing that if it takes longer to find a proof then it is itself a bit harder to reason about.
We can instantiate this like:
The idea is that if there is a short way to prove that the opponent would cooperate back, then it takes just a constant steps more to prove that we cooperate. So it doesn't open us up to exploitation to assume that our own cooperation is provable in steps.
The way in which this works at all is by cutting the loop at the point where the opponent is thinking about our own behaviour. This bot cuts it rather aggressively: it assumes that no matter the context, when thinks about whether cooperates, it's provable that does cooperate. (I think this isn't great and can be improved to a weaker assumption that would lead to more potential cooperation.)
If you construct similarly, I claim that and mutually cooperate if and are large enough, and mutually defect otherwise.
Similarly, I claim can mutually cooperate with other bots like .
Followed By: Payor's Lemma in Natural Language
TL;DR: This post introduces a novel logical approach to achieving group-scale cooperation, based on modal fixpoint theory. This approach is both easier to understand and roughly 3x more efficient than previous approaches that factored through Löb's Theorem, measured in terms of the length / complexity of the proofs involved.
The following theorem was inspired by Scott Garrabrant, and uses a lemma of James Payor in place of Löb's Theorem to prove cooperation between a group of agents. I'll state the theorem for three agents because that's most illustrative of what's going on:
Theorem: Suppose A,B, and C are agents that return "true" to signify cooperation and "false" to signify defection. Let E=A∧B∧C, so E is the statement that "everyone cooperates". Let □A, □B, and □C denote proof systems that extend Peano Arithmetic, let □EX stand for □AX∧□BX∧□CX, and suppose the agents behave in a manner satisfying the following conditions:
Then it follows that ⊢E, i.e., everyone cooperates (provably!).
(Intuitively, the strategy of the agents in this theorem is to check that the group is trustworthy in a certain way before joining (cooperating with) the group. The theorem shows that the collective check on trustworthiness nests inside itself in a way that self-validates and yields cooperation.)
Proof:
[end proof]
The key to the theorem is the following lemma, due to James Payor:
Lemma: If ⊢□(□x→x)→x then ⊢x.
Historical note: Originally this lemma assumed ⊢□(□x→x)↔x, but then I realized only the forward implication is needed, so I rewrote it.
Proof: The proof uses the same modal rules of inference for □ as Löb's theorem, namely, necessitation and distributivity:
[end proof]
Sweet! In comparison to Löb's Theorem, two things are beautiful about Payor's lemma:
Discussion
In the proof of the Theorem, you might be wondering if it really makes sense to be thinking of □E as a logical system of its own. It doesn't need to be, but the answer is yes if □A, □B, and □C are all finite extensions of PA. Then the axioms of □E are just [the conjunction of axioms of □A]∨[the conjunction of axioms of □B] ∨ [the conjunction of axioms of □C].
You also might wonder if an alternative approach to group cooperation might be to instead use the following strategies:
Then you'd be right! Here it also follows that ⊢A∧B∧C. However, the proof involves a lot more nesting, with A thinking about what B's thinking about what C's thinking about (etc.), and it's not as easy or short as the proof of the Theorem above.
Writing implications backwards
If you think of □(□x→x) in Payor's Lemma as "defining" x as a program, it's fun to write all the implication arrows from right to left, so they look a little like variable assignments. (And, under the Curry-Howard correspondence, they do actually correspond to maps.) Here's the lemma and proof again, written that way:
Lemma: If ⊢x←□(x←□x) then ⊢x.
Proof: The same as above, but with the arrows written leftward!
[end proof]
I find the above somewhat more elegant than the version with rightward arrows, albeit a less standard way of writing.
Conclusion
In my opinion, what's great about the lemma and theorem above is that they're both relatively short and simple (relative to proving and using Löb's Theorem), and they allow a proof of unexploitable group cooperation that's roughly three times shorter than than one that starts by proving Löb's Theorem (only ~6 lines of logic, vs ~18).
PS James says his next idea will be even better ;)