Yesterday, I gave a definition of "decision theory" and "fair decision problem" in the context of provability logic, and gave a formal version of drnickbone's argument that no decision theory can perform optimally on every "fair" decision problem. Today, I prove that for any provably "fair" decision problem →P(→u), there is a sound extension of PA by a finite set of formulas, expressible in the language of provability logic, such that Vladimir Slepnev's modal logic version of UDT performs optimally on →P(→u) if we have it search for proofs in this extended proof system.
This is a version of a result that Kenny Easwaran, Nate Soares and I proved in the context of UDT with bounded proof length when Kenny visited MIRI for two days earlier this year. (I'll post about the bounded proof length version some other time; it's more complicated, as you'd expect.)
Prerequisites: A primer on provability logic, "Evil" decision problems in provability logic.
Provability in extensions of PA
In order to carry out the idea above, we need to talk about provability in extensions of PA, not just about the provability in PA expressed by the □(⋅) operator of Gödel-Löb provability logic. Fortunately, the systems we'll need in this post only extend PA by a finite number of axioms which can be expressed as closed formulas in the language of GL, and there's a simple trick to reason about such extensions in GL itself:
Suppose that we extend PA by a single axiom, φ. Then, by the deduction theorem for first-order logic, PA+φ⊢ψ if and only if PA⊢φ→ψ. Thus, if both φ and ψ can be expressed as modal formulas, then we can express the proposition that ψ is provable in PA+φ by the modal formula □(φ→ψ). The same trick works, of course, if we want to add finitely many different modal formulas to PA, since we can simply let φ be their conjunction.
In this post, when I'm using this trick to talk about PA+φ, I'll emphasize this fact by abbreviating □(φ→ψ) to □φ(ψ) (since we often subscript □ by the proof system when it's not clear which system we are referring to).
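As a toy illustration of this bookkeeping (not part of the post's formal machinery; the tuple encoding and all function names are hypothetical), modal formulas can be represented as nested tuples, with the □φ(⋅) abbreviation unfolding exactly as just described:

```python
# Modal formulas as nested tuples; "Box" is PA-provability, as in GL.
def box(psi):
    return ("Box", psi)

def implies(a, b):
    return ("->", a, b)

def conj(formulas):
    """Finitely many extra axioms collapse into one: their conjunction."""
    phi = formulas[0]
    for extra in formulas[1:]:
        phi = ("&", phi, extra)
    return phi

def box_sub(phi, psi):
    """Provability in PA+phi via the deduction theorem:
    Box_phi(psi) is shorthand for Box(phi -> psi)."""
    return box(implies(phi, psi))
```

So `box_sub(phi, psi)` never leaves the language of GL: it is literally the formula □(φ→ψ) in disguise.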
Modal UDT
The version of modal UDT I'm using in this article is equivalent to the one in Vladimir's article, except for using □φ(⋅) instead of □(⋅), for an appropriate φ. Nevertheless, let me give a quick recap, which puts a slightly different spin on the system, and check that this does indeed meet the specification of a "modal decision theory".
As a modal decision theory, m-action, n-preference level modal UDT is a sequence →UDT(→u) of m formulas in n parameters, such that UDTi(→U) is true if and only if UDT takes action i in the modal universe →U. Let's write →UDT(φ)(→u) for the version which uses provability in PA+φ.
I think the easiest way to specify →UDT(φ)(→u) is to say that it's a sequence of modal formulas which describe the behavior of the following algorithm (which needs a halting oracle):

For each j=1,…,n:
  For each i=1,…,m:
    If it's provable in PA+φ that "I take action i" implies "I obtain outcome j", then return i.
If still here, return 1.
In other words, UDT starts with the most preferred outcome, and tries to find an action that provably implies that outcome; if there is such an action, it returns it, breaking ties lexicographically. If there isn't, it tries the second-most-preferred outcome, and so on down to the least preferred one. If no actions imply any outcomes, it returns an arbitrary default action.
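The search loop above can be sketched in Python. The halting oracle is stood in for by a caller-supplied predicate `provable(i, j)`; that predicate, and the explicit action/outcome map used in the toy example, are assumptions of this sketch, not anything computable in general:

```python
def modal_udt(m, n, provable):
    """m-action, n-preference-level modal UDT's proof search.

    Outcomes are numbered 1..n with 1 most preferred; provable(i, j)
    stands in for the oracle answering whether PA+phi proves
    "I take action i" -> "I obtain outcome j".
    """
    for j in range(1, n + 1):          # most preferred outcome first
        for i in range(1, m + 1):      # break ties lexicographically
            if provable(i, j):
                return i
    return 1                           # default action

# Toy oracle: the proof system settles the full action/outcome map,
# here action i -> outcome f[i].
f = {1: 3, 2: 1, 3: 2}
best_action = modal_udt(3, 3, lambda i, j: f[i] == j)   # chooses action 2
```

With this oracle, outcome 1 is provably reachable via action 2, so the search stops there; with an oracle that proves nothing, the loop falls through to the default action 1.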
To actually write out UDT(φ)i, we need to find a formula that describes when the above algorithm returns the action i. Clearly, for actions other than the default action, the condition is that there's some outcome j such that (a) it's provable that i implies j; (b) there's no action i′<i which provably implies j; (c) and there isn't any outcome j′<j which is provably implied by any action i′. The default action 1 is taken if the condition above holds or if no action provably implies any outcome.
Let's agree to order pairs (j,i) lexicographically, i.e., (j,i)<(j′,i′) if j<j′, or j=j′ and i<i′. Then, we can write UDT(φ)i(→u) as follows, for i>1:

UDT(φ)i(→u) :≡ ⋁_{j=1}^{n} ( □φ(UDT(φ)i(→u)→uj) ∧ ⋀_{(j′,i′)<(j,i)} ¬□φ(UDT(φ)i′(→u)→uj′) ),

and for i=1:

UDT(φ)1(→u) :≡ ⋁_{j=1}^{n} ( □φ(UDT(φ)1(→u)→uj) ∧ ⋀_{(j′,i′)<(j,1)} ¬□φ(UDT(φ)i′(→u)→uj′) ) ∨ ⋀_{(j′,i′)=(1,1)}^{(n,m)} ¬□φ(UDT(φ)i′(→u)→uj′).

From this definition, it's straightforward to see that →UDT(φ) is provably mutually exclusive and exhaustive (p.m.e.e.). We work in GL and distinguish the following cases: either there is no pair (j,i) such that □φ(UDT(φ)i(→u)→uj), or some pair (j,i) is the first one. If there is no such pair, then UDT(φ)1(→u) is true, and all other formulas in the sequence are false. If (j,i) is the first such pair, then UDT(φ)i(→u) is true, and all the other formulas are false.
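This case analysis can be checked mechanically for small m and n: treat the oracle's answers as an arbitrary set S of pairs (j,i) for which □φ(UDT(φ)i(→u)→uj) holds, evaluate each formula's defining condition, and confirm that exactly one holds. This is only a finite sanity check of the propositional structure, not the GL proof itself, and all names are hypothetical:

```python
from itertools import product

def udt_condition(i, S, m, n):
    """The defining condition of UDT(phi)_i, given oracle answers S:
    some outcome j is provably reached by action i, with no
    lexicographically earlier pair (j', i') provable."""
    earlier = lambda j, i: [(jp, ip) for jp in range(1, n + 1)
                            for ip in range(1, m + 1) if (jp, ip) < (j, i)]
    main = any((j, i) in S and not any(q in S for q in earlier(j, i))
               for j in range(1, n + 1))
    if i == 1:
        # extra default clause for action 1: no pair at all is provable
        return main or not S
    return main

# exhaustive check over every possible pattern of oracle answers
m, n = 2, 3
pairs = [(j, i) for j in range(1, n + 1) for i in range(1, m + 1)]
for bits in product([False, True], repeat=len(pairs)):
    S = {p for p, b in zip(pairs, bits) if b}
    chosen = [i for i in range(1, m + 1) if udt_condition(i, S, m, n)]
    assert len(chosen) == 1            # mutually exclusive and exhaustive
```

Python's tuple comparison happens to coincide with the lexicographic order on pairs defined above, which keeps the `earlier` helper a one-liner.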
Provably extensional problems
Let's call a closed formula in the language of GL sound if its translation into the language of arithmetic is true on the natural numbers; unsurprisingly, we'll write this as N⊨φ for short. My claim is that for every provably extensional decision problem →P(→a), there is a sound formula φ such that →UDT(φ)(→u) performs optimally on →P(→a), that is, obtains the best outcome that any decision theory can achieve on this problem.
Recall that a decision problem →P(→a) is defined to be provably extensional if GL⊢(→a↔→b)→(→P(→a)↔→P(→b)), where →φ↔→ψ stands for the conjunction (φ1↔ψ1)∧⋯∧(φn↔ψn). (Of course, the two sequences need to be of the same length.)
Intuitively, such a decision problem assigns a single outcome f(i) to every action i, and every agent which chooses action i will obtain outcome f(i); this is a formalization of the idea that we should consider a decision problem "fair" if it only rewards or punishes you for the actions you decide to take, not for the decision process you used to arrive at these decisions.
This doesn't mean that the mapping f(i) is all that matters about the decision problem---for example, any particular version of modal UDT fails on its "evil" decision problem, despite the fact that these "evil" problems are provably extensional. That's because it can be difficult to determine which mapping f(⋅) a particular →P(→a) corresponds to.
The idea of the optimality proof in this post is that for any particular →P(→a), there is some truth about what the corresponding mapping f(⋅) is, even if it's hard to determine; if we run UDT with a proof system strong enough to settle this question, this version of UDT will simply be able to figure out which action is best and then take it.
How do we find a proof system strong enough to determine the action/outcome mapping f(⋅) for a given →P(→a)? The simplest possibility is to just take PA and add to it the information about the true action/outcome mapping.
Let χ(i)i:≡⊤ and χ(i)i′:≡⊥ for i′≠i, and write →χ(i) for the corresponding sequence of length m or n (depending on the context). By the true mapping, I mean the mapping f(⋅) such that for all i∈{1,…,m}, we have N⊨Pf(i)(→χ(i)). This mapping is unique, because →P(→a) is a decision problem, which by definition implies that the formulas in this sequence are p.m.e.e. (i.e., exactly one of them is true, for any particular value of the →a).
Now it's clear how we can encode the fact that this is the true mapping as a formula of GL: we simply set φ :≡ ⋀_{i=1}^{m} Pf(i)(→χ(i)).
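Concretely, under the same toy reading as before (the decision problem handed to us as an explicit truth predicate, all names hypothetical), extracting f and assembling φ looks like:

```python
def true_mapping(m, n, holds):
    """Recover the unique action/outcome map f of a decision problem.

    holds(j, i) stands in for N |= P_j(chi(i)).  Since the P_j are
    p.m.e.e., exactly one j qualifies for each action i, so the
    one-element unpacking below never fails and f is unique.
    """
    f = {}
    for i in range(1, m + 1):
        (f[i],) = [j for j in range(1, n + 1) if holds(j, i)]
    return f

def phi_formula(f):
    """The sound axiom added to PA: the conjunction over all actions i
    of P_{f(i)}(chi(i)), here rendered as a plain string."""
    return " & ".join(f"P_{f[i]}(chi({i}))" for i in sorted(f))

# toy universe: action 1 -> outcome 2, action 2 -> outcome 1
outcome = {1: 2, 2: 1}
f = true_mapping(2, 2, lambda j, i: outcome[i] == j)
```

The one-element unpacking makes the uniqueness argument executable: if the P_j were not p.m.e.e. for some χ(i), the sketch would raise an error instead of returning a mapping.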
Optimality
It's now clear that the best outcome any agent can obtain on the provably extensional decision problem →P(→a) is min_i f(i): an agent →A is a sequence of closed p.m.e.e. formulas, which means that for any agent →A there is some i such that N⊨→A↔→χ(i); by extensionality, this implies N⊨→P(→A)↔→P(→χ(i)), so the outcome obtained by →A is f(i).
Write (→A,→U) for the fixed point of →UDT(φ)(→u) and →P(→a), with φ defined as in the previous section. Then, in order to show that →UDT(φ)(→u) performs optimally on this decision problem, we need to show that N⊨→A↔→χ(i∗) for some i∗ such that f(i∗) = j∗ := min_i f(i).
I claim that this holds for i∗, the smallest i such that f(i)=j∗. Given the way →UDT(φ)(→u) is constructed, it suffices to show that N⊨□φ(UDT(φ)i∗(→U)→Uj∗), plus N⊨¬□φ(UDT(φ)i′(→U)→Uj′) for all (j′,i′)<(j∗,i∗).
The first of these statements is equivalent to

GL⊢φ→(UDT(φ)i∗(→U)→Uj∗),

which, given the properties GL⊢→A↔→UDT(φ)(→U) and GL⊢→U↔→P(→A) defining →A and →U, we can rewrite as

GL⊢φ→(Ai∗→Pj∗(→A)).

Since →A is p.m.e.e., Ai∗ is equivalent to →A↔→χ(i∗), so by provable extensionality of →P(→a), we have GL⊢Ai∗→(→P(→A)↔→P(→χ(i∗))). The definition of φ implies GL⊢φ→(→P(→χ(i∗))↔→χ(j∗)), so we obtain GL⊢(φ∧Ai∗)→(→P(→A)↔→χ(j∗)), and hence GL⊢(φ∧Ai∗)→Pj∗(→A), which is equivalent to the desired result.
For the second statement, we need to show that UDT(φ)i′(→U)→Uj′ is not provable in PA+φ for any (j′,i′)<(j∗,i∗). So suppose it were provable, for some such (j′,i′). Then →UDT(φ)(→U) would take the action i′ corresponding to the smallest such pair; in other words, we would have N⊨UDT(φ)i′. We can again use the defining properties of →A and →U to rewrite this as N⊨Ai′.
Since PA is sound and N⊨φ, the assumption that UDT(φ)i′(→U)→Uj′ is provable in PA+φ implies that it is true on N. We can rewrite this as well, to N⊨Ai′→Pj′(→A). Hence, we have N⊨Ai′∧Pj′(→A).
If j′<j∗, this says that →A achieves an outcome better than j∗, which we've shown above to be the best outcome that any agent can possibly achieve---contradiction. If j′=j∗ and i′<i∗, then we have found an i′<i∗ such that f(i′)=j∗ (since by extensionality, N⊨Ai′ implies N⊨Pf(i′)(→A), and hence N⊨¬Pj′(→A) for j′≠f(i′), as →P(→a) is p.m.e.e.)---contradicting the definition of i∗.
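Under the toy reading used in the sketches above, where a sound φ lets the proof search settle exactly the pairs (i, f(i)), the theorem's conclusion can also be checked exhaustively for small m and n: the search always lands on the least action attaining the best outcome. The function name and the stand-in oracle are hypothetical illustrations, not the modal fixed-point construction itself:

```python
from itertools import product

def udt_choice(m, n, f):
    """Modal UDT's proof search against a sound, settled oracle:
    provable("action i" -> "outcome j") holds exactly when f[i] == j."""
    for j in range(1, n + 1):          # best outcome first
        for i in range(1, m + 1):      # ties broken lexicographically
            if f[i] == j:
                return i
    return 1

# check optimality for every action/outcome mapping with m = n = 3
m = n = 3
for values in product(range(1, n + 1), repeat=m):
    f = dict(zip(range(1, m + 1), values))
    j_star = min(f.values())                       # best achievable outcome
    i_star = min(i for i in f if f[i] == j_star)   # least optimal action
    assert udt_choice(m, n, f) == i_star
```

This is of course only evidence for the propositional skeleton of the argument; the real content of the proof is that the GL fixed point of →UDT(φ)(→u) and →P(→a) behaves like this settled oracle when φ is the sound axiom constructed above.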