Theorem 4: Pseudocausal IPOMDP's: If an infra-POMDP has a transition kernel K:S×Aik→O×S which fulfills the niceness conditions, and there's a starting infradistribution ψ∈□S, unrolling it into an infrakernel via Θ(π):=pr(A×O)ω∗(ψ⋉Kπ:∞) has the infrakernel Θ being pseudocausal and fulfilling all belief function conditions except for normalization.
This takes a whole lot of work to show. The proof breaks down into two phases. The first is showing that the Kπn fulfill the niceness conditions (which splits into four subphases of showing 1-Lipschitzness, lower-semicontinuity, compact-shared CAS, and moving constants up/mapping 1 to 1). This is the easy part to do. The second phase is much longer. It splits into five pieces, where we show our list of conditions for Θ. Our list of conditions to show is:
1:The Θ(π) have a uniform upper bound on their Lipschitz constants.
2: π↦Θ(π)(U) is lower-semicontinuous for all U
3: Θ(π) is supported entirely on histories compatible with π.
4: All the Θ(π) agree on the value of 1 (or infinity).
5: Θ is pseudocausal.
Parts 1 and 4 can be done with little trouble, but parts 2, 3, and 5 split into additional pieces. To show pseudocausality, ie, condition 5, we have to start with our proof goal and keep rewriting it into an easier form, until we get something we can knock out with an induction proof, which splits into a base case and an induction step. To show sensible supports, ie, part 3, we have to split it into three parts, there's one induction proof for the finite Kπ:n, showing sensible supports for Kπ:∞, and then finally extending this to the result we actually want, which is fairly easy.
And then there's lower-semicontinuity, part 2, which is an absolute nightmare. This splits into three parts. One is explicitly constructing a sequence of compact sets to use in the Almost-Monotone Lemma for any policy whatsoever, and then there's trying to show that π↦(ψ⋉Kπ:∞)(f) is lower-semimicontinuous which requires a massive amount of messing around with integrals and limits, and then it's pretty easy to show lower-semicontinuity for Θ from there.
T4.1 Our first order of business is showing the five niceness conditions for all the Kπn infrakernels, so we can take the infinite semidirect product and show that Kπ:∞ inherits the five niceness conditions (from the proof of Theorem 1).
Remember that Kπn is defined as
Kπn(s0,aos1:n):=δπ(ao1:n)⋉(λa.K(sn,a))
Obviously it produces inframeasures, since K does.
T4.1.1 For lower-semicontinuity in the input, we have
liminfm→∞Kπn(sm0,aosm1:n)(f)
=liminfm→∞(δπ(aom1:n)⋉(λa.K(smn,a)))(f)
=liminfm→∞K(smn,π(aom1:n))(λo,s.f(π(aom1:n),o,s))
and then, as aosm1:n limits to aos∞1:n, we have aom1:n=ao∞1:n forever after some finite m, because ∏i=ni=1(A×O) is a finite space. So, it turns into
=liminfm→∞K(smn,π(ao∞1:n))(λo,s.f(π(ao∞1:n),o,s))
and then, by lower-semicontinuity for K (one of the niceness conditions)
≥K(s∞n,π(ao∞1:n))(λo,s.f(π(ao∞1:n),o,s))
and then we pack this back up as
=δπ(aom1:n)⋉(λa.K(s∞n,a))(f)=Kπn(s∞0,aos∞1:n)(f)
and lower-semicontinuity has been shown.
T4.1.2 For 1-Lipschitzness, observe that δπ(ao1:n)⋉(λa.K(sn,a)) is 1-Lipschitz because K is 1-Lipschitz, and this is the semidirect product of a 1-Lipschitz (all probability distributions are 1-Lipschitz, and this is a dirac-delta distribution) infradistribution with a 1-Lipschitz infrakernel.
T4.1.3 For compact-shared compact-almost-support, observe that (s0,aos1:n)↦(sn,π(ao1:n)) is a continuous function, so any compact subset of S×(A×O×S)n (input to Kπn) maps to a compact subset of S×A (corresponding inputs for K). Then, we can apply compact-shared compact-almost-support for K (one of the niceness conditions) to get a compact almost-support on O×S, and take the product of that subset with all of A itself, to get a shared-compact almost-support for Kπn.
T4.1.4 For mapping constants to above constants, we have
Kπn(s0,aos1:n)(c)=(δπ(ao1:n)⋉(λa.K(sn,a)))(c)
=δπ(ao1:n)(λa.K(sn,a)(c))=K(sn,π(ao1:n))(c)≥c
By increasing constants for K (one of the niceness conditions). This same proof works with equality when c=1, because of the "map 1 to 1" condition on the [0,1] type signature. So all the Kπn have the niceness properties, and we can form the infinite semidirect product at all, and the infinite semidirect product inherits these niceness conditions.
T4.2 Now we can get working on showing the belief function conditions.
T4.2.1 For showing a bounded Lipschitz constant, we have
and then, since all the Kπ:∞(s0) are 1-Lipschitz (infinite semidirect product has niceness conditions), we get
≤λ⊙supaos1:∞|U(ao1:∞)−U′(ao1:∞)|=λ⊙⋅d(U,U′)
and we're done, we showed a Lipschitz bound on the Θ(π) that is uniform in π.
T4.2.2 Now for lower-semicontinuity for Θ, which is the extremely hard one.
The way this proof will work is that we'll first show a variant of the Almost-Monotone Lemma that works for all the Kπ:∞ infrakernels simultaneously. Then, using that, we'll show that
π↦(ψ⋉Kπ:∞)(f) is lower-semicontinuous. And then, it's pretty easy to wrap up and get that
π↦Θ(π)(U) is lower-semicontinuous, from that.
T4.2.2.1 Anyways, our first goal is a variant of the Almost-Monotone Lemma. The particular variant we're searching for, specialized to our particular case, is:
We let our CS be an arbitrary compact subset of states, and ϵ be arbitrary.
Now, let's assume there's a sequence of compact sets C⊤i[CS,ϵ] where, regardless of n or π, C⊤1:n+1[CS,ϵ] is a shared ϵ(1−12n+1)-almost-support for all the Kπ:n(s0) inframeasures (where s0∈CS).
The proof of the Almost-Monotone Lemma goes through perfectly fine, the key part where our choice of compact sets matters is in the ability to make the assumption that C⊤1:n+m+1[CS,ϵ] is an ϵ(1−12n+m+1)-almost-support for all the Kπ:n+m(s0) with s0∈CS, as our s0 is. And we assumed our sequence of compact sets fulfilled that property.
So, where we're at is that our variant of the Almost-Monotone-Lemma works, as long as our assumption works out, that there's a sequence of compact sets C⊤i[CS,ϵ] where, regardless of n or π or s0∈CS, C⊤1:n+1[CS,ϵ] is a ϵ(1−12n+1)-almost-support for Kπ:n(s0).
In order to get our variant of the AML to go through, we really need to prove that there is such a sequence, for any ϵ and CS. For this, we'll be looking at a function
Ξ:(0,∞)×A×K(S)→K(O,S)
What this does is that Ξ(ϵ,a,CS) is defined to return a compact subset of O×S which is an ϵ-almost-support for all the K(a,s) inframeasures, when s∈CS. There are always compact subsets like this for any input because of shared almost-compact-support for the infrakernel K (one of the assumed niceness conditions), so there does indeed exist a function Ξ with these properties.
Then, we recursively define the C⊤i[CS,ϵ] as:
C⊤1[CS,ϵ]:=A×⋃aΞ(ϵ2,a,CS)
and
C⊤n+1[CS,ϵ]:=A×⋃aΞ(ϵ2n+1,a,prSn(C⊤1:n[CS,ϵ]))
All these sets are compact, by induction. For the first one, since Ξ always returns compact sets, we're taking a finite union of compact sets (is compact), and taking the product with a finite set (is compact). For induction up, since the product of compact sets is compact, C⊤1:n[CS,ϵ] is compact, and then projections of compact sets are compact, so then Ξ returns a compact set, and again, we take a finite union of compact sets and then the product with a compact set to make another compact set.
Now, let's get started on showing that C^{\top}_{1:n+1}[C^{S},\eps] is an \eps\left(1-\frac{1}{2^{n+1}}\right)-almost-support for K^{\pi}_{:n}(s_0), for arbitrary \pi and s_0\in C^{S}. This proof will proceed by induction.
For the base-case, we want that C⊤1[CS,ϵ] is a ϵ2-almost-support for Kπ:n(s0) with s0∈CS. Well, assume f,f′ are equal on C⊤1[CS,ϵ], and s0∈CS.
and then we observe that since f=f′ on C⊤1[CS,ϵ], and that set is defined as A×⋃aΞ(ϵ2,a,CS) , we have that they agree on {π()}×Ξ(ϵ2,π(),CS)
So that term turns into 0, the two functions are equal on our set of interest, so our net upper bound is
≤ϵ2d(f,f′)
Showing that C⊤1[CS,ϵ] is a ϵ2-almost support for all the Kπ:0(s0), where π is arbitrary and s_0\in C^{S}.
Now for induction. We want to show that C⊤1:n+2[CS,ϵ] is a ϵ(1−12n+2)-almost-support for Kπ:n+1(s0) with s0∈CS, assuming that C⊤1:n+1[CS,ϵ] is a ϵ(1−12n+1)-almost-support for Kπ:n(s0) with s0∈CS.
To begin, assume f,f′ are equal on C⊤1:n+2[CS,ϵ] and s0∈CS. Then we can go:
We apply a Lemma 2 from LBIT decomposition, with C⊤1:n+1[CS,ϵ] as an ϵ(1−12n+1)-almost support for Kπ:n(s0), which works by induction assumption. Since Kπ:n(s0) is also 1-Lipschitz (the niceness conditions we already know), the Lemma 2 upper bound breaks down as
Now, for that top term, we know that sn+1 comes from prSn+1(C⊤1:n+1[CS,ϵ]). We apply another Lemma 2 decomposition on K this time, with the compact set of interest being
Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))
which is a ϵ(12n+2)-almost-support for K(sn+1,π(ao1:n+1)). Pairing this with 1-Lipschitzness of K, we have an upper bound of
which all our aos1:n+1,π(ao1:n+1),o,s we're feeding in lie within. So those functions are identical for those inputs, and our upper-bound just reduces to
=ϵ(1−12n+2)d(f,f′)
Which witnesses that C⊤1:n+2[CS,ϵ] is a ϵ(1−12n+2)-almost-support for any Kπ:n+1(s0) (arbitrary π) as long as s0∈CS. So our induction step goes off without a hitch, and we now know that the C⊤i[CS,ϵ]sequence is suitable to get our modification of the Almost-Monotone-Lemma to work out. We have the result
And remember, the sequence of compact sets only depends on CS and ϵ, not on the policy. This works for any policy. Now that we have this result in the bag, we can move on to phase 2.
T4.2.2.2 For phase 2, we'll show that π↦(ψ⋉Kπ:∞)(f) is lower-semicontinuous, and dear lord this is going to be hard and take a lot of integrals and limits to show. We start with
liminfm→∞(ψ⋉Kπm:∞)(f)
Which unpacks as
=liminfm→∞ψ(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))
If this liminf is infinity, we're just done, infinity beats everything. So we'll be assuming that it settles down to a finite value. We can unpack the ψ as a challenge of picking points in Ψ, so we get
In the [0,1] case, the set of minimal points is precompact. In the R case, it may not be precompact, but the only way it fails to be precompact is having b values running off to infinity. Said λs0.Kπm:∞(s0)(f) function can't be unbounded below, because it is above Kπm:∞(s0)(−||f||), and infinite semidirect products map constants to greater than their usual value, and Kπm:∞(s0) fulfills this condition. Thus, if the b value of a selected minimal point is far higher than the finite liminf value plus λ⊙||f||, the value said minimal point assigns to the function of interest is far higher than the true minimal value. So, when minimizing, we can restrict to a compact subset of Ψ to attain the min, call it CΨ. With this, we get
Now, we can pass to two subsequences. First, we can pass to a subsequence of the m that actually limits to the liminf value. On this subsequence, the policies still limit to π∞. Also, for each member of this subsequence, we can find an true minimizing (m,b) pair for it since we're minimizing the lower-semicontinuous function
(m,b)↦∫S(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dm+b
over a compact set. Said function is lower-semicontinuous because
λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞))
is lower-semicontinuous, it's one of the niceness properties that the infinite infrakernel inherits.
Since we're selecting a-measures from the compact set CΨ, we can find a subsequence of our subsequence where the a-measures converge. Now we'll be using i as our index, and using it to index the (m,b) pairs. The limit a-measure is (m∞,b∞). So, we have a rewrite as
Further, all numbers in this limit are finite, because we're operating in the case where the liminf is finite, and i indexes a subsequence which actually limits to the liminf. For said limit, bi limits to b∞, so we can pull that out of the limit, to get
At this point, since the mi limit to m∞, the set of the measure components {mi}i∈N⊔{∞} is compact in the space of measures on S. In particular, this implies that there is some compact set CS⊆S where all the mi only have ϵ or less measure outside said set (a necessary condition for precompactness of a set of measures)
Now, we split the measures mi into the component of measure on CS, and the component outside of it.
Because CS is a compact subset of S, we can use it as a seed set for our C⊤n[CS,ϵ] sequence that attains almost-monotonicity. To economize on space for the lines to fit, we'll write
infaosn+2:∞∈C⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)
as just
infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)
We unpack the defining limit of the infinite semidirect product with our relevant sequence of compact sets to get
It is exceptionally important to note here that we can't just throw our variant of the Almost-Monotone Lemma at this, because we don't have a promise thats0∈CS.
An easy way to show this is to show it for all n≥n∗. So let n≥n∗. Then, our proof goal is that
Now, to break down the integral on the top. Since the behavior of the interior of that integral only depends on s0∈CS, we can apply our good old Almost-Monotone-Lemma variation, and lower-bound the limit by the n∗ finite stage, minus 4ϵ||f|| (independently of the choice of πi) yielding
Now, we pull that constant piece out of the integral, and since both functions in the integral are now identical, we can stitch the measure components back together to get:
Now, mi,CS is a fragment of mi, which came from a minimal point in Ψ or a limit of such, so the maximal amount of measure present would be λ⊙. This lets us get a lower bound of
For all n∗. Regardless of n∗, this liminf value in i is finite because it's upper-bounded by the liminf in m, which is assumed to be finite (we disposed of the infinite case already). Because the sequence mi and the compact sets C⊤n[CS,ϵ] were fixed before we picked n∗, we can take a limit of the n∗ without worrying about it affecting any of the stuff on the inside of the integral, to get
This liminf in i of the integral is finite for all n. At this point, we can notice something interesting. For fixed n, when we take the limit as i approaches infinity, eventually Kπi:n will become Kπ∞:n and never go back, because of convergence of the policies, and the fact that the infrakernel has a cutoff at time n, so the behavior of the policies before that time will stabilize on a partial policy. So, we can rewrite our lower bound as:
is lower-semicontinuous in s0 (as all the finite stages of the infinite infrakernel have that niceness property). And m↦m(f∗) is lower-semicontinuous when f∗ is. Also, by now, the only thing i is controlling is the measure term. Therefore, that liminf of the integral (in i) is greater than what you'd get if you just passed to m∞. So, we get
(any sequence of compact sets works to define the infinite semidirect product) and for all n, the integrals are finite (this is why we kept restating it over and over!), we can apply Fatou's Lemma to move the liminf into the integral, to get
But because ϵ was arbitrary and ||f|| and λ⊙ are finite, we have our desired net result of
liminfm→∞(ψ⋉Kπm:∞)(f)≥(ψ⋉Kπ∞:∞)(f)
And lower-semicontinuity is shown for π↦(ψ⋉Kπ:∞)(f)
T4.2.2.3 Now for our phase 3, getting lower-semicontinuity of π↦Θ(π)(U). Let πm limit to π∞. We have
liminfm→∞Θ(πm)(U)
and then this unpacks as
=liminfm→∞pr(A×O)ω∗(ψ⋉Kπm:∞)(U)
Reexpressing the projection, we have:
=liminfm→∞(ψ⋉Kπm:∞)(λs0,aos1:∞.U(ao1:∞))
And then, we can apply lower-semicontinuity of π↦(ψ⋉Kπ:∞)(f) which we've shown in phase 2, to get
≥(ψ⋉Kπ∞:∞)(λs0,aos1:∞.U(ao1:∞))
Which then packs up as
=pr(A×O)ω∗(ψ⋉Kπ∞:∞)(U)
and then
=Θ(π∞)(U)
and we have lower-semicontinuity for Θ! Finally!
T4.2.3 Time to return to easier fare. We need that Θ(π) is supported entirely on histories compatible with π. To do this, we proceed in three steps.
First, we do an induction proof that, for any given π and s0, Kπ:n(s0) is supported on the subset of (A×O×S)n+1 where the actions and observations so far are compatible with π.
Second, we use our first result to show that for any two continuous bounded utility functions U,U′ which only depend on the action/observation sequence, and agree on all histories compatible with π,
Finally, we use this result to show that if U,U′ agree on all histories compatible with π, Θ(π)(U)=Θ(π)(U′), showing that Θ(π) is supported on histories compatible with π.
T4.2.3.1 Let's start with the induction proof that Kπ:n(s0) has the subset of (A×O×S)n+1 where the history of actions and observations is compatible with π as a support. The way to do that is to pick two functions that are identical on said subset, and show they have equal expectation values. For the base case of the induction proof, fix an f and f′ which are identical on {π()}×O×S (as that's the subset of A×O×S which is compatible with π), and observe:
Kπ:0(s0)(f)=Kπ0(s0)(λa,o,s.f(a,o,s))
and then by how Kπ0(s0) is defined, we have
=(δπ()⋉(λa.K(s0,a)))(λa′,o,s.f(a′,o,s))
Which, by unpacking the semidirect product and subsituting the dirac-delta value in, we have
=K(s0,π())(λo,s.f(π(),o,s))
and then since f=f′ when the initial action equals π(), we have
=K(s0,π())(λo,s.f′(π(),o,s))
and then this just wraps back up by our sequence of rewrite steps in reverse as
=Kπ:0(s0)(f′)
Now that we have the base case out of the way, we'll attempt to show the induction step. Let f and f′ be identical on the subset of (A×O×S)n+2 which is compatible with π, and we'll show they have equal expectation value. Our job is to unpack
|Kπ:n+1(s0)(f)−Kπ:n+1(s0)(f′)|
this unpacks, by how sequences of semidirect products are defined, as
Now, since, by induction assumption, Kπ:n(s0) is supported on the subset of (A×O×S)n+1 where the history is compatible with π, we just need to show that the inner functions are identical on that subset to get that the difference is 0, so our proof goal is now
at this point, we realize that if aos1:n+1 is compatible with π, then regardless of o and s, aos1:n+1,π(ao1:n+1),o,s is compatible with π, so the functions
λo,s.f(aos1:n+1,π(ao1:n+1),o,s)
and
λo,s.f′(aos1:n+1,π(ao1:n+1),o,s)
are equal, and we get our desired proof goal. So the induction goes through, and the subset of (A×O×S)n+1 where the history is compatible with π will always be a support of Kπ:n for all n.
T4.2.3.2 Now that's in place, we'll show that any two continuous bounded functions U and U′ which only depend on the action/observation sequence and are identical on histories compatible with π will be assigned equal value by Kπ:∞(s0). Assume U and U′ only depend on action/observation sequences and are identical on histories compatible with π. We have a desired proof target of
Now, we can use the Lemma 2 from LBIT decomposition on Kπ:n(s0), since we know that the subset of (A×O×S)n+1 where the history is compatible with π is a support of Kπ:n(s0), to get an upper bound on
For the first part, the 1 is there because all the infrakernels are 1-Lipschitz, and we're assessing distance of the inner functions on the support. For the second part, it's there because we don't care how different the two functions are outside the support, they must be assigned equal value. So anyways, this upper bound reduces to
Now, for all choices of n and aos1:n+1∼π, fix an arbitrary extension ao∼πn+2:∞ which is compatible with π (which can always be done). Then use the triangle inequality twice to get an upper-bound of
We can observe that, since U and U′ are equal on all histories compatible with π, and ao∼πn+2:∞ was picked specifically to have ao1:n+1,ao∼πn+2:∞∼π, that second absolute value term must always be 0, to get
quantity we wanted to show equals 0 to hit our proof target, we just need to show that it equals 0.
Here's what we do now. Since U and U′ are defined over (A×O)ω, a compact space, they must both be uniformly continuous. You can take any ϵ, and find some δ where two histories that only differ by δ are assigned values ϵ or less apart. And for any δ, there's some n where any two histories that agree on the first n steps only differ by δ. So, in the limit, both of these terms become 0, and our result that
for any s0 and U,U′ which agree on all histories compatible with π follows since we hit our proof target.
T4.2.3.3 Finally, it's time to apply this result to get that Θ(π) is supported only on histories compatible with π. Fix any π, and U,U′ which are identical on histories compatible with π. We can start unpacking.
And for the [0,1] type signature, any infinite semidirect product fulfilling the niceness conditions (which is the case) must map 1 to 1, so this turns into
=|ψ(1)−ψ(1)|=0
And we're done, and in fact get the stronger result that Θ(π)(1)=1 regardless of π because ψ(1)=1 because we assumed ψ was an infradistribution. For the R type signature, fixing any π and π′, we have
Regardless of type signature, an infinite semidirect product fulfilling the niceness conditions (which we're assuming) must map constants to an equal or greater constant, and infinity can be regarded as the supremum of putting constants in, so we get
=|ψ(∞)−ψ(∞)|=0
Importantly, this does not mean that Θ(π)(∞) must be ∞.
T4.2.5 Now for the very last condition, checking that Θ is pseudocausal. This takes a whole lot of work (though less than lower-semicontinuity), and we'll do it by repeatedly showing equivalence of our desired result to some slightly simpler result, until we ground out in an induction proof we clean up in the same way, which splits into a base case and induction step.
T4.2.5.1 We want to show that
∀U:u1∼π′(Θ(π))(U)≥Θ(π′)(U)
We can unpack Θ(π) and Θ(π′), to get the equivalent proof target of:
u1∼π′(pr(A×O)ω∗(ψ⋉Kπ:∞))(U)≥pr(A×O)ω∗(ψ⋉Kπ′:∞)(U)
First, we expand the update, to get the equivalent proof target of:
We'd be able to prove this if we could prove the inequality for all the n individually, as then the inequality would carry over to the limit. So, our new proof target is, for arbitrary n,
We can observe something interesting here for the update. if ao1:n+1 is compatible with π′, then the inf can be modeled as always picking some continuation compatible with π′, because doing otherwise would make the function equal 1 (or infinity). In such a case, the inner function of the top would turn into infaon+2:∞U(ao1:n+1,aon+2:∞). If ao1:n+1 is incompatible with π′, then it doesn't matter the extension, any extension will always be incompatible with π′, and return 1 (or infinity). So, this can be rephrased as
Now, since these inner functions are identical, we'd be able to prove this proof target if we proved the following for arbitrary functions fn+1:(A×O×S)n+1→[0,1] (or R), and arbitrary n,π′,π,s0 as before.
u1∼π′(Kπ:n(s0))(fn+1)≥Kπ′:n(s0)(fn+1)
So, since we're trying to prove this result for arbitrary n, the obvious thing to do is an induction proof. If we can show this by induction, we hit our proof target and we're done.
We recall that for the iterated semidirect product, the :0 thing is just Kπ0, and Kπ0(s0)=δπ()⋉(λa.K(s0,a)). Making these substitutions, we get an equivalent proof target of
Now, looking at that gigantic first term... if ao1:n+1 is incompatible with π′, then the really big function is just going to turn into 1 (or infinity), which is assigned a value of 1 (or infinity) by K, from the usual constants increase/1-normalization argument.
If ao1:n+1∼π′, then we're back in the usual business except we don't need to worry about incompatibilities of the history with π′ within ao1:n+1, we only need to worry about those incompatiblities for the final action, π(ao1:n+1). So, we can rewrite that first term with indicator functions.
,we can just put the two inequalities together, and we'll attain our proof target and the induction will go through and we'll be done. So, our new proof target is
Splitting into two cases, it could be that π(ao1:n+1)≠π′(ao1:n+1). In this case that big inner function in the attempted upper bound turns into just 1, and then K(sn+1,π(ao1:n+1))(1)=1. Or, for the R type signature, we'd get K(sn+1,π(ao1:n+1))(∞)=∞.
In the second case where π(ao1:n+1)=π′(ao1:n+1), we'd get K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao1:n+1),o,s)) as the value of the function inside Kπ′:n(s0), due to the indicator function working out appropriately, and π′ matching up with what π does. This ability to break stuff up into an indicator function yields a new equivalent proof target of
We would be able to show this via inframeasure monotonicity (as the starting inframeasures are equal now) if, for all aos1:n+1, the first function was larger than the second one. Let aos1:n+1 be arbitrary, and our proof goal is now
(or infinity). Infinity beats everything, and the inframeasures in the [0,1] type signature can report a maximum value of 1, so this proof goal is just obviously true.
In the second subcase, where π(ao1:n+1)=π′(ao1:n+1), our proof goal would turn into:
But they're equal. We hit our proof goal, the induction goes through, and this hits the proof goal for establishing pseudocausality, which was the last condition we needed to show our result. We're done!
Theorem 4: Pseudocausal IPOMDP's: If an infra-POMDP has a transition kernel K:S×Aik→O×S which fulfills the niceness conditions, and there's a starting infradistribution ψ∈□S, unrolling it into an infrakernel via Θ(π):=pr(A×O)ω∗(ψ⋉Kπ:∞) has the infrakernel Θ being pseudocausal and fulfilling all belief function conditions except for normalization.
This takes a whole lot of work to show. The proof breaks down into two phases. The first is showing that the Kπn fulfill the niceness conditions (which splits into four subphases of showing 1-Lipschitzness, lower-semicontinuity, compact-shared CAS, and moving constants up/mapping 1 to 1). This is the easy part to do. The second phase is much longer. It splits into five pieces, where we show our list of conditions for Θ. Our list of conditions to show is:
1:The Θ(π) have a uniform upper bound on their Lipschitz constants.
2: π↦Θ(π)(U) is lower-semicontinuous for all U
3: Θ(π) is supported entirely on histories compatible with π.
4: All the Θ(π) agree on the value of 1 (or infinity).
5: Θ is pseudocausal.
Parts 1 and 4 can be done with little trouble, but parts 2, 3, and 5 split into additional pieces. To show pseudocausality, ie, condition 5, we have to start with our proof goal and keep rewriting it into an easier form, until we get something we can knock out with an induction proof, which splits into a base case and an induction step. To show sensible supports, ie, part 3, we have to split it into three parts, there's one induction proof for the finite Kπ:n, showing sensible supports for Kπ:∞, and then finally extending this to the result we actually want, which is fairly easy.
And then there's lower-semicontinuity, part 2, which is an absolute nightmare. This splits into three parts. One is explicitly constructing a sequence of compact sets to use in the Almost-Monotone Lemma for any policy whatsoever, and then there's trying to show that π↦(ψ⋉Kπ:∞)(f) is lower-semimicontinuous which requires a massive amount of messing around with integrals and limits, and then it's pretty easy to show lower-semicontinuity for Θ from there.
T4.1 Our first order of business is showing the five niceness conditions for all the Kπn infrakernels, so we can take the infinite semidirect product and show that Kπ:∞ inherits the five niceness conditions (from the proof of Theorem 1).
Remember that Kπn is defined as
Kπn(s0,aos1:n):=δπ(ao1:n)⋉(λa.K(sn,a))
Obviously it produces inframeasures, since K does.
T4.1.1 For lower-semicontinuity in the input, we have
liminfm→∞Kπn(sm0,aosm1:n)(f)
=liminfm→∞(δπ(aom1:n)⋉(λa.K(smn,a)))(f)
=liminfm→∞K(smn,π(aom1:n))(λo,s.f(π(aom1:n),o,s))
and then, as aosm1:n limits to aos∞1:n, we have aom1:n=ao∞1:n forever after some finite m, because ∏i=ni=1(A×O) is a finite space. So, it turns into
=liminfm→∞K(smn,π(ao∞1:n))(λo,s.f(π(ao∞1:n),o,s))
and then, by lower-semicontinuity for K (one of the niceness conditions)
≥K(s∞n,π(ao∞1:n))(λo,s.f(π(ao∞1:n),o,s))
and then we pack this back up as
=δπ(aom1:n)⋉(λa.K(s∞n,a))(f)=Kπn(s∞0,aos∞1:n)(f)
and lower-semicontinuity has been shown.
T4.1.2 For 1-Lipschitzness, observe that δπ(ao1:n)⋉(λa.K(sn,a)) is 1-Lipschitz because K is 1-Lipschitz, and this is the semidirect product of a 1-Lipschitz (all probability distributions are 1-Lipschitz, and this is a dirac-delta distribution) infradistribution with a 1-Lipschitz infrakernel.
T4.1.3 For compact-shared compact-almost-support, observe that (s0,aos1:n)↦(sn,π(ao1:n)) is a continuous function, so any compact subset of S×(A×O×S)n (input to Kπn) maps to a compact subset of S×A (corresponding inputs for K). Then, we can apply compact-shared compact-almost-support for K (one of the niceness conditions) to get a compact almost-support on O×S, and take the product of that subset with all of A itself, to get a shared-compact almost-support for Kπn.
T4.1.4 For mapping constants to above constants, we have
Kπn(s0,aos1:n)(c)=(δπ(ao1:n)⋉(λa.K(sn,a)))(c)
=δπ(ao1:n)(λa.K(sn,a)(c))=K(sn,π(ao1:n))(c)≥c
By increasing constants for K (one of the niceness conditions). This same proof works with equality when c=1, because of the "map 1 to 1" condition on the [0,1] type signature. So all the Kπn have the niceness properties, and we can form the infinite semidirect product at all, and the infinite semidirect product inherits these niceness conditions.
T4.2 Now we can get working on showing the belief function conditions.
T4.2.1 For showing a bounded Lipschitz constant, we have
|Θ(π)(U)−Θ(π)(U′)|=|pr(A×O)ω∗(ψ⋉Kπ:∞)(U)−pr(A×O)ω∗(ψ⋉Kπ:∞)(U′)|
=|ψ(λs0.Kπ:∞(s0)(λaos1:∞.U(ao1:∞)))−ψ(λs0.Kπ:∞(s0)(λaos1:∞.U′(ao1:∞)))|
and then since ψ is an infradistribution, it must have a finite Lipschitz constant, λ⊙. So we then get
≤λ⊙⋅sups0|Kπ:∞(s0)(λaos1:∞.U(ao1:∞))−Kπ:∞(s0)(λaos1:∞.U′(ao1:∞))|
and then, since all the Kπ:∞(s0) are 1-Lipschitz (infinite semidirect product has niceness conditions), we get
≤λ⊙supaos1:∞|U(ao1:∞)−U′(ao1:∞)|=λ⊙⋅d(U,U′)
and we're done, we showed a Lipschitz bound on the Θ(π) that is uniform in π.
T4.2.2 Now for lower-semicontinuity for Θ, which is the extremely hard one.
The way this proof will work is that we'll first show a variant of the Almost-Monotone Lemma that works for all the Kπ:∞ infrakernels simultaneously. Then, using that, we'll show that
π↦(ψ⋉Kπ:∞)(f) is lower-semicontinuous. And then, it's pretty easy to wrap up and get that
π↦Θ(π)(U) is lower-semicontinuous, from that.
T4.2.2.1 Anyways, our first goal is a variant of the Almost-Monotone Lemma. The particular variant we're searching for, specialized to our particular case, is:
∀CS∈K(S),ϵ>0:∃¯¯¯¯C∈∏∞i=1K(A×O×S):∀π∈Π,n,m∈N,s0∈CS,f∈CB((A×O×S)ω):
Kπ:n+m(s0)(λaos1:n+m+1.infaosn+m+2:∞∈∏∞i=n+m+2Cif(aos1:n+m+1,aosn+m+2:∞))
≥Kπ:n(s0)(λaos1:n+1.infaosn+2:∞∈∏∞i=n+2Cif(aos1:n+1,aosn+2:∞))−4ϵ||f||
We let our CS be an arbitrary compact subset of states, and ϵ be arbitrary.
Now, let's assume there's a sequence of compact sets C⊤i[CS,ϵ] where, regardless of n or π, C⊤1:n+1[CS,ϵ] is a shared ϵ(1−12n+1)-almost-support for all the Kπ:n(s0) inframeasures (where s0∈CS).
Then our proof goal becomes
∀π∈Π,n,m∈N,s0∈CS,f∈CB((A×O×S)ω):
Kπ:n+m(s0)(λaos1:n+m+1.infaosn+m+2:∞∈C⊤n+m+2:∞[CS,ϵ]f(aos1:n+m+1,aosn+m+2:∞))
≥Kπ:n(s0)(λaos1:n+1.infaosn+2:∞∈C⊤n+2:∞[CS,ϵ]f(aos1:n+1,aosn+2:∞))−4ϵ||f||
The proof of the Almost-Monotone Lemma goes through perfectly fine, the key part where our choice of compact sets matters is in the ability to make the assumption that C⊤1:n+m+1[CS,ϵ] is an ϵ(1−12n+m+1)-almost-support for all the Kπ:n+m(s0) with s0∈CS, as our s0 is. And we assumed our sequence of compact sets fulfilled that property.
So, where we're at is that our variant of the Almost-Monotone-Lemma works, as long as our assumption works out, that there's a sequence of compact sets C⊤i[CS,ϵ] where, regardless of n or π or s0∈CS, C⊤1:n+1[CS,ϵ] is a ϵ(1−12n+1)-almost-support for Kπ:n(s0).
In order to get our variant of the AML to go through, we really need to prove that there is such a sequence, for any ϵ and CS. For this, we'll be looking at a function
Ξ:(0,∞)×A×K(S)→K(O,S)
What this does is that Ξ(ϵ,a,CS) is defined to return a compact subset of O×S which is an ϵ-almost-support for all the K(a,s) inframeasures, when s∈CS. There are always compact subsets like this for any input because of shared almost-compact-support for the infrakernel K (one of the assumed niceness conditions), so there does indeed exist a function Ξ with these properties.
Then, we recursively define the C⊤i[CS,ϵ] as:
C⊤1[CS,ϵ]:=A×⋃aΞ(ϵ2,a,CS)
and
C⊤n+1[CS,ϵ]:=A×⋃aΞ(ϵ2n+1,a,prSn(C⊤1:n[CS,ϵ]))
All these sets are compact, by induction. For the first one, since Ξ always returns compact sets, we're taking a finite union of compact sets (is compact), and taking the product with a finite set (is compact). For induction up, since the product of compact sets is compact, C⊤1:n[CS,ϵ] is compact, and then projections of compact sets are compact, so then Ξ returns a compact set, and again, we take a finite union of compact sets and then the product with a compact set to make another compact set.
Now, let's get started on showing that C^{\top}_{1:n+1}[C^{S},\eps] is an \eps\left(1-\frac{1}{2^{n+1}}\right)-almost-support for K^{\pi}_{:n}(s_0), for arbitrary \pi and s_0\in C^{S}. This proof will proceed by induction.
For the base-case, we want that C⊤1[CS,ϵ] is a ϵ2-almost-support for Kπ:n(s0) with s0∈CS. Well, assume f,f′ are equal on C⊤1[CS,ϵ], and s0∈CS.
|Kπ:0(s0)(f)−Kπ:0(s0)(f′)|=|Kπ0(s0)(f)−Kπ0(s0)(f′)|
=|(δπ()⋉(λa.K(s0,a)))(λa,o,s.f(a,o,s))−(δπ()⋉(λa.K(s0,a)))(λa,o,s.f′(a,o,s))|
=|K(s0,π())(λo,s.f(π(),o,s))−K(s0,π())(λo,s.f′(π(),o,s))|
Now, Ξ(ϵ2,π(),CS) is a ϵ2-almost-support for K(s0,π()) because s0∈CS, so we can apply Lemma 2 from LBIT, and 1-Lipschitzness of K to split this up as
≤supo,s∈Ξ(ϵ2,π(),CS)|f(π(),o,s)−f′(π(),o,s)|+ϵ2supo,s|f(π(),o,s)−f′(π(),o,s)|
That second term can be upper bounded by d(f,f′), so we get an upper bound of
≤supo,s∈Ξ(ϵ2,π(),CS)|f(π(),o,s)−f′(π(),o,s)|+ϵ2d(f,f′)
and then we observe that since f=f′ on C⊤1[CS,ϵ], and that set is defined as A×⋃aΞ(ϵ2,a,CS) , we have that they agree on {π()}×Ξ(ϵ2,π(),CS)
So that term turns into 0, the two functions are equal on our set of interest, so our net upper bound is
≤ϵ2d(f,f′)
Showing that C⊤1[CS,ϵ] is a ϵ2-almost support for all the Kπ:0(s0), where π is arbitrary and s_0\in C^{S}.
Now for induction. We want to show that C⊤1:n+2[CS,ϵ] is a ϵ(1−12n+2)-almost-support for Kπ:n+1(s0) with s0∈CS, assuming that C⊤1:n+1[CS,ϵ] is a ϵ(1−12n+1)-almost-support for Kπ:n(s0) with s0∈CS.
To begin, assume f,f′ are equal on C⊤1:n+2[CS,ϵ] and s0∈CS. Then we can go:
|Kπ:n+1(s0)(f)−Kπ:n+1(s0)(f′)|
=|Kπ:n(s0)(λaos1:n+1.Kπn+1(s0,aos1:n+1)(λa,o,s.f(aos1:n+1,a,o,s)))
−Kπ:n(s0)(λaos1:n+1.Kπn+1(s0,aos1:n+1)(λa,o,s.f′(aos1:n+1,a,o,s)))|
and then unpack to get
=|Kπ:n(s0)(λaos1:n+1.(δπ(ao1:n+1)⋉(λa.K(sn+1,a)))(λa,o,s.f(aos1:n+1,a,o,s)))
−Kπ:n(s0)(λaos1:n+1.(δπ(ao1:n+1)⋉(λa.K(sn+1,a)))(λa,o,s.f′(aos1:n+1,a,o,s)))|
and unpack the semidirect product and substitute the dirac-delta value in to get
=|Kπ:n(s0)(λaos1:n+1.K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s)))
−Kπ:n(s0)(λaos1:n+1.K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s)))|
We apply a Lemma 2 from LBIT decomposition, with C⊤1:n+1[CS,ϵ] as an ϵ(1−12n+1)-almost support for Kπ:n(s0), which works by induction assumption. Since Kπ:n(s0) is also 1-Lipschitz (the niceness conditions we already know), the Lemma 2 upper bound breaks down as
≤supaos1:n+1∈C⊤1:n+1[CS,ϵ]|K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s))
−K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s))|
+ϵ(1−12n+1)supaos1:n+1|K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s))
−K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s))|
By 1-Lipschitzness of K, we get an upper-bound of
≤supaos1:n+1∈C⊤1:n+1[CS,ϵ]|K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s))
−K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s))|
+ϵ(1−12n+1)supaos1:n+1supo,s|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|
And then this is upper-boundable by d(f,f′), so we get
≤supaos1:n+1∈C⊤1:n+1[CS,ϵ]|K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s))
−K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s))|+ϵ(1−12n+1)d(f,f′)
Now, for that top term, we know that sn+1 comes from prSn+1(C⊤1:n+1[CS,ϵ]). We apply another Lemma 2 decomposition on K this time, with the compact set of interest being
Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))
which is a ϵ(12n+2)-almost-support for K(sn+1,π(ao1:n+1)). Pairing this with 1-Lipschitzness of K, we have an upper bound of
≤supaos1:n+1∈C⊤1:n+1[CS,ϵ](supo,s∈Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))
|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|
+ϵ2n+2supo,s|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|)
+ϵ(1−12n+1)d(f,f′)
That second term is upper-bounded by d(f,f′) no matter what, so we have
≤supaos1:n+1∈C⊤1:n+1[CS,ϵ](supo,s∈Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))
|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|
+ϵ2n+2d(f,f′))+ϵ(1−12n+1)d(f,f′)
Pull out the constant and combine, to get
=supaos1:n+1∈C⊤1:n+1[CS,ϵ](supo,s∈Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))
|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|)
+ϵ(1−12n+2)d(f,f′)
Now, f,f′ were assumed to agree on C⊤1:n+2[CS,ϵ], which factors as
C⊤1:n+1[CS,ϵ]×C⊤n+2[CS,ϵ]
and then expands as
C⊤1:n+1[CS,ϵ]×(A×⋃aΞ(ϵ2n+2,a,prSn+1(C⊤1:n+1[CS,ϵ])))
which all our aos1:n+1,π(ao1:n+1),o,s we're feeding in lie within. So those functions are identical for those inputs, and our upper-bound just reduces to
=ϵ(1−12n+2)d(f,f′)
Which witnesses that C⊤1:n+2[CS,ϵ] is a ϵ(1−12n+2)-almost-support for any Kπ:n+1(s0) (arbitrary π) as long as s0∈CS. So our induction step goes off without a hitch, and we now know that the C⊤i[CS,ϵ]sequence is suitable to get our modification of the Almost-Monotone-Lemma to work out. We have the result
∀π∈Π,n,m∈N,s0∈CS,f∈CB((A×O×S)ω):
Kπ:n+m(s0)(λaos1:n+m+1.infaosn+m+2:∞∈C⊤n+m+2:∞[CS,ϵ]f(aos1:n+m+1,aosn+m+2:∞))
≥Kπ:n(s0)(λaos1:n+1.infaosn+2:∞∈C⊤n+2:∞[CS,ϵ]f(aos1:n+1,aosn+2:∞))−4ϵ||f||
In particular, since any defining sequence for Kπ:∞(s0) has an actual limit, this can be trivially reshuffled to attain the result:
∀π∈Π,n∗∈N,s0∈CS,f∈CB((A×O×S)ω):
limn→∞Kπ:n(s0)(λaos1:n+1.infaosn+2:∞∈C⊤n+2:∞[CS,ϵ]f(aos1:n+1,aosn+2:∞))
≥Kπ:n∗(s0)(λaos1:n∗+1.infaosn∗+2:∞∈C⊤n∗+2:∞[CS,ϵ]f(aos1:n∗+1,aosn∗+2:∞))−4ϵ||f||
And remember, the sequence of compact sets only depends on CS and ϵ, not on the policy. This works for any policy. Now that we have this result in the bag, we can move on to phase 2.
T4.2.2.2 For phase 2, we'll show that π↦(ψ⋉Kπ:∞)(f) is lower-semicontinuous, and dear lord this is going to be hard and take a lot of integrals and limits to show. We start with
liminfm→∞(ψ⋉Kπm:∞)(f)
Which unpacks as
=liminfm→∞ψ(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))
If this liminf is infinity, we're just done, infinity beats everything. So we'll be assuming that it settles down to a finite value. We can unpack the ψ as a challenge of picking points in Ψ, so we get
=liminfm→∞(inf(m,b)∈Ψ(m(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))+b))
And, since we can pick amongst minimal points, we can rewrite this as
=liminfm→∞(inf(m,b)∈Ψmin(m(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))+b))
In the [0,1] case, the set of minimal points is precompact. In the R case, it may not be precompact, but the only way it fails to be precompact is having b values running off to infinity. Said λs0.Kπm:∞(s0)(f) function can't be unbounded below, because it is above Kπm:∞(s0)(−||f||), and infinite semidirect products map constants to greater than their usual value, and Kπm:∞(s0) fulfills this condition. Thus, if the b value of a selected minimal point is far higher than the finite liminf value plus λ⊙||f||, the value said minimal point assigns to the function of interest is far higher than the true minimal value. So, when minimizing, we can restrict to a compact subset of Ψ to attain the min, call it CΨ. With this, we get
=liminfm→∞(inf(m,b)∈CΨ(m(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))+b))
And then we unpack this as an integral.
=liminfm→∞(inf(m,b)∈CΨ(∫S(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dm+b))
Now, we can pass to two subsequences. First, we can pass to a subsequence of the m that actually limits to the liminf value. On this subsequence, the policies still limit to π∞. Also, for each member of this subsequence, we can find an true minimizing (m,b) pair for it since we're minimizing the lower-semicontinuous function
(m,b)↦∫S(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dm+b
over a compact set. Said function is lower-semicontinuous because
λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞))
is lower-semicontinuous, it's one of the niceness properties that the infinite infrakernel inherits.
Since we're selecting a-measures from the compact set CΨ, we can find a subsequence of our subsequence where the a-measures converge. Now we'll be using i as our index, and using it to index the (m,b) pairs. The limit a-measure is (m∞,b∞). So, we have a rewrite as
=limi→∞(∫S(λs0.Kπi:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dmi+bi)
Further, all numbers in this limit are finite, because we're operating in the case where the liminf is finite, and i indexes a subsequence which actually limits to the liminf. For said limit, bi limits to b∞, so we can pull that out of the limit, to get
=limi→∞(∫S(λs0.Kπi:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dmi)+b∞
At this point, since the mi limit to m∞, the set of the measure components {mi}i∈N⊔{∞} is compact in the space of measures on S. In particular, this implies that there is some compact set CS⊆S where all the mi only have ϵ or less measure outside said set (a necessary condition for precompactness of a set of measures)
Now, we split the measures mi into the component of measure on CS, and the component outside of it.
=limi→∞(∫S(λs0.Kπi:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dmi,CS
+∫S(λs0.Kπi:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dmi,¬CS)+b∞
Because CS is a compact subset of S, we can use it as a seed set for our C⊤n[CS,ϵ] sequence that attains almost-monotonicity. To economize on space for the lines to fit, we'll write
infaosn+2:∞∈C⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)
as just
infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)
We unpack the defining limit of the infinite semidirect product with our relevant sequence of compact sets to get
=limi→∞(∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))))dmi,CS
+∫S(λs0.limn→∞(Kπm:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))))dmi,¬CS)+b∞
We'll now be splitting up the second integral as two more integrals. Fix an arbitrary n∗.
=limi→∞(∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))))dmi,CS
+∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS
+∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS)+b∞
We're really in the weeds here, let's see if we can make any dent whatsoever. What we'll be showing is that, regardless of s0,
limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))≥−2||f||
It is exceptionally important to note here that we can't just throw our variant of the Almost-Monotone Lemma at this, because we don't have a promise that s0∈CS.
An easy way to show this is to show it for all n≥n∗. So let n≥n∗. Then, our proof goal is that
Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))≥−2||f||
First, observe that the functions
λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)
and
λaos1:n+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)
can only differ on the same input by at most 2||f||. Since the finite stages are all 1-Lipschitz or less, we have
Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))
≥Kπi:n(s0)(λaos1:n+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))−2||f||
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))
And then, by monotonicity of the infrakernel stages for functions which cut off at a finite point (it was from Theorem 1), we have
≥Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))−2||f||
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))
=−2||f||
And we're done, we have shown that, regardless of s0, we have
limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))≥−2||f||
Now, applying this fact to our giant list of integrals, we can take
limi→∞(∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))))dmi,CS
+∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS
+∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS)+b∞
and apply our lower-bound to get
≥liminfi→∞(∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))))dmi,CS
+∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS
+∫S(−2||f||)dmi,¬CS)+b∞
Also, since we picked CS so mi only has ϵ or less measure outside of it, regardless of i, we can impose a lower bound of:
≥liminfi→∞(∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))))dmi,CS
+∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS)−2ϵ||f||+b∞
Now, to break down the integral on the top. Since the behavior of the interior of that integral only depends on s0∈CS, we can apply our good old Almost-Monotone-Lemma variation, and lower-bound the limit by the n∗ finite stage, minus 4ϵ||f|| (independently of the choice of πi) yielding
≥liminfi→∞(∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))−4ϵ||f||)dmi,CS
+∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS)−2ϵ||f||+b∞
Now, we pull that constant piece out of the integral, and since both functions in the integral are now identical, we can stitch the measure components back together to get:
=liminfi→∞(∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi
+∫S(−4ϵ||f||)dmi,CS)−2ϵ||f||+b∞
Now, mi,CS is a fragment of mi, which came from a minimal point in Ψ or a limit of such, so the maximal amount of measure present would be λ⊙. This lets us get a lower bound of
≥liminfi→∞∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi
−4ϵλ⊙||f||−2ϵ||f||+b∞
Doing a little bit of regrouping, we have
=liminfi→∞∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi
−2ϵ||f||(2λ⊙+1)+b∞
Recapping what we've done so far, we showed that
liminfm→∞(ψ⋉Kπm:∞)(f)
≥liminfi→∞∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi
−2ϵ||f||(2λ⊙+1)+b∞
For all n∗. Regardless of n∗, this liminf value in i is finite because it's upper-bounded by the liminf in m, which is assumed to be finite (we disposed of the infinite case already). Because the sequence mi and the compact sets C⊤n[CS,ϵ] were fixed before we picked n∗, we can take a limit of the n∗ without worrying about it affecting any of the stuff on the inside of the integral, to get
liminfm→∞(ψ⋉Kπm:∞)(f)
≥liminfn→∞liminfi→∞∫S(λs0.Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))dmi
−2ϵ||f||(2λ⊙+1)+b∞
This liminf in i of the integral is finite for all n. At this point, we can notice something interesting. For fixed n, when we take the limit as i approaches infinity, eventually Kπi:n will become Kπ∞:n and never go back, because of convergence of the policies, and the fact that the infrakernel has a cutoff at time n, so the behavior of the policies before that time will stabilize on a partial policy. So, we can rewrite our lower bound as:
=liminfn→∞liminfi→∞∫S(λs0.Kπ∞:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))dmi
−2ϵ||f||(2λ⊙+1)+b∞
Again, for all n, this liminf in i quantity is finite. Now, the function
λs0.Kπ∞:n(s0)(λaos1:n+1.infCn+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))
is lower-semicontinuous in s0 (as all the finite stages of the infinite infrakernel have that niceness property). And m↦m(f∗) is lower-semicontinuous when f∗ is. Also, by now, the only thing i is controlling is the measure term. Therefore, that liminf of the integral (in i) is greater than what you'd get if you just passed to m∞. So, we get
≥liminfn→∞∫S(λs0.Kπ∞:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))dm∞
+b∞−2ϵ||f||(2λ⊙+1)
Again, for all n, the integral is finite. Now, since the functions
λs0.Kπ∞:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))
converge pointwise in n to the function
λs0.Kπ∞:∞(s0)(λaos1:∞.f(s0,aos1:∞))
(any sequence of compact sets works to define the infinite semidirect product) and for all n, the integrals are finite (this is why we kept restating it over and over!), we can apply Fatou's Lemma to move the liminf into the integral, to get
≥∫S(λs0.liminfn→∞Kπ∞:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))dm∞
+b∞−2ϵ||f||(2λ⊙+1)
And since we have pointwise limiting, this is
=∫S(λs0.limn→∞Kπ∞:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))dm∞
+b∞−2ϵ||f||(2λ⊙+1)
and then, packing it up,
=∫S(λs0.Kπ∞:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dm∞+b∞−2ϵ||f||(2λ⊙+1)
And then, pack it up as
=m∞(λs0.Kπ∞:∞(s0)(λaos1:∞.f(s0,aos1:∞)))+b∞−2ϵ||f||(2λ⊙+1)
and one last step. Since (m∞,b∞) lies in Ψ, we can get
≥inf(m,b)∈Ψ(m(λs0.Kπ∞:∞(s0)(λaos1:∞.f(s0,aos1:∞)))+b)−2ϵ||f||(2λ⊙+1)
which reexpresses as
=ψ(λs0.Kπ∞:∞(s0)(λaos1:∞.f(s0,aos1:∞)))−2ϵ||f||(2λ⊙+1)
=(ψ⋉Kπ∞:∞)(f)−2ϵ||f||(2λ⊙+1)
Our net result is
liminfm→∞(ψ⋉Kπm:∞)(f)≥(ψ⋉Kπ∞:∞)(f)−2ϵ||f||(2λ⊙+1)
But because ϵ was arbitrary and ||f|| and λ⊙ are finite, we have our desired net result of
liminfm→∞(ψ⋉Kπm:∞)(f)≥(ψ⋉Kπ∞:∞)(f)
And lower-semicontinuity is shown for π↦(ψ⋉Kπ:∞)(f)
T4.2.2.3 Now for our phase 3, getting lower-semicontinuity of π↦Θ(π)(U). Let πm limit to π∞. We have
liminfm→∞Θ(πm)(U)
and then this unpacks as
=liminfm→∞pr(A×O)ω∗(ψ⋉Kπm:∞)(U)
Reexpressing the projection, we have:
=liminfm→∞(ψ⋉Kπm:∞)(λs0,aos1:∞.U(ao1:∞))
And then, we can apply lower-semicontinuity of π↦(ψ⋉Kπ:∞)(f) which we've shown in phase 2, to get
≥(ψ⋉Kπ∞:∞)(λs0,aos1:∞.U(ao1:∞))
Which then packs up as
=pr(A×O)ω∗(ψ⋉Kπ∞:∞)(U)
and then
=Θ(π∞)(U)
and we have lower-semicontinuity for Θ! Finally!
T4.2.3 Time to return to easier fare. We need that Θ(π) is supported entirely on histories compatible with π. To do this, we proceed in three steps.
First, we do an induction proof that, for any given π and s0, Kπ:n(s0) is supported on the subset of (A×O×S)n+1 where the actions and observations so far are compatible with π.
Second, we use our first result to show that for any two continuous bounded utility functions U,U′ which only depend on the action/observation sequence, and agree on all histories compatible with π,
Kπ:∞(s0)(λaos1:∞.U(ao1:∞))=Kπ:∞(s0)(λaos1:∞.U′(ao1:∞))
Finally, we use this result to show that if U,U′ agree on all histories compatible with π, Θ(π)(U)=Θ(π)(U′), showing that Θ(π) is supported on histories compatible with π.
T4.2.3.1 Let's start with the induction proof that Kπ:n(s0) has the subset of (A×O×S)n+1 where the history of actions and observations is compatible with π as a support. The way to do that is to pick two functions that are identical on said subset, and show they have equal expectation values. For the base case of the induction proof, fix an f and f′ which are identical on {π()}×O×S (as that's the subset of A×O×S which is compatible with π), and observe:
Kπ:0(s0)(f)=Kπ0(s0)(λa,o,s.f(a,o,s))
and then by how Kπ0(s0) is defined, we have
=(δπ()⋉(λa.K(s0,a)))(λa′,o,s.f(a′,o,s))
Which, by unpacking the semidirect product and subsituting the dirac-delta value in, we have
=K(s0,π())(λo,s.f(π(),o,s))
and then since f=f′ when the initial action equals π(), we have
=K(s0,π())(λo,s.f′(π(),o,s))
and then this just wraps back up by our sequence of rewrite steps in reverse as
=Kπ:0(s0)(f′)
Now that we have the base case out of the way, we'll attempt to show the induction step. Let f and f′ be identical on the subset of (A×O×S)n+2 which is compatible with π, and we'll show they have equal expectation value. Our job is to unpack
|Kπ:n+1(s0)(f)−Kπ:n+1(s0)(f′)|
this unpacks, by how sequences of semidirect products are defined, as
=|Kπ:n(s0)(λaos1:n+1.Kπn+1(s0,aos1:n+1)(λa,o,s.f(aos1:n+1,a,o,s)))
−Kπ:n(s0)(λaos1:n+1.Kπn+1(s0,aos1:n+1)(λa,o,s.f′(aos1:n+1,a,o,s)))|
Now, since, by induction assumption, Kπ:n(s0) is supported on the subset of (A×O×S)n+1 where the history is compatible with π, we just need to show that the inner functions are identical on that subset to get that the difference is 0, so our proof goal is now
∀aos1:n+1∼π:Kπn+1(s0,aos1:n+1)(λa,o,s.f(aos1:n+1,a,o,s))
=Kπn+1(s0,aos1:n+1)(λa,o,s.f′(aos1:n+1,a,o,s))
We now unpack Kπn+1(s0,aos1:n+1) as δπ(ao1:n+1)⋉(λa.K(sn+1,a)) to get our new proof goal of:
∀aos1:n+1∼π:(δπ(ao1:n+1)⋉(λa.K(sn+1,a)))(λa′,o,s.f(aos1:n+1,a′,o,s))
=(δπ(ao1:n+1)⋉(λa.K(sn+1,a)))(λa′,o,s.f′(aos1:n+1,a′,o,s))
We unpack the semidirect product and substitute the dirac-delta value in to get a proof goal of
∀aos1:n+1∼π:K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s))
=K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s))
at this point, we realize that if aos1:n+1 is compatible with π, then regardless of o and s, aos1:n+1,π(ao1:n+1),o,s is compatible with π, so the functions
λo,s.f(aos1:n+1,π(ao1:n+1),o,s)
and
λo,s.f′(aos1:n+1,π(ao1:n+1),o,s)
are equal, and we get our desired proof goal. So the induction goes through, and the subset of (A×O×S)n+1 where the history is compatible with π will always be a support of Kπ:n for all n.
T4.2.3.2 Now that's in place, we'll show that any two continuous bounded functions U and U′ which only depend on the action/observation sequence and are identical on histories compatible with π will be assigned equal value by Kπ:∞(s0). Assume U and U′ only depend on action/observation sequences and are identical on histories compatible with π. We have a desired proof target of
Kπ:∞(s0)(λaos1:∞.U(ao1:∞))=Kπ:∞(s0)(λaos1:∞.U′(ao1:∞))
and, unpacking the limit, it turns into
limn→∞Kπ:n(s0)(λaos1:n+1.infaon+2:∞U(ao1:n+1,aon+2:∞))
=limn→∞Kπ:n(s0)(λaos1:n+1.infaon+2:∞U′(ao1:n+1,aon+2:∞))
The limits exist, so we'd be able to show that if we were able to show that
limsupn→∞|Kπ:n(s0)(λaos1:n+1.infaon+2:∞U(ao1:n+1,aon+2:∞))
−Kπ:n(s0)(λaos1:n+1.infaon+2:∞U′(ao1:n+1,aon+2:∞))|=0
Now, we can use the Lemma 2 from LBIT decomposition on Kπ:n(s0), since we know that the subset of (A×O×S)n+1 where the history is compatible with π is a support of Kπ:n(s0), to get an upper bound on
limsupn→∞|Kπ:n(s0)(λaos1:n+1.infaon+2:∞U(ao1:n+1,aon+2:∞))
−Kπ:n(s0)(λaos1:n+1.infaon+2:∞U′(ao1:n+1,aon+2:∞))|
of
≤limsupn→∞(1⋅supaos1:n+1∼π|infaon+2:∞U(ao1:n+1,aon+2:∞)
−infaon+2:∞U′(ao1:n+1,aon+2:∞)|+0⋅d(U,U′))
For the first part, the 1 is there because all the infrakernels are 1-Lipschitz, and we're assessing distance of the inner functions on the support. For the second part, it's there because we don't care how different the two functions are outside the support, they must be assigned equal value. So anyways, this upper bound reduces to
=limsupn→∞supaos1:n+1∼π|infaon+2:∞U(ao1:n+1,aon+2:∞)−infaon+2:∞U′(ao1:n+1,aon+2:∞)|
Now, for all choices of n and aos1:n+1∼π, fix an arbitrary extension ao∼πn+2:∞ which is compatible with π (which can always be done). Then use the triangle inequality twice to get an upper-bound of
≤limsupn→∞supaos1:n+1∼π(|infaon+2:∞U(ao1:n+1,aon+2:∞)−U(ao1:n+1,ao∼πn+2:∞)|
+|U(ao1:n+1,ao∼πn+2:∞)−U′(ao1:n+1,ao∼πn+2:∞)|
+|U′(ao1:n+1,ao∼πn+2:∞)−infaon+2:∞U′(ao1:n+1,aon+2:∞)|)
We can observe that, since U and U′ are equal on all histories compatible with π, and ao∼πn+2:∞ was picked specifically to have ao1:n+1,ao∼πn+2:∞∼π, that second absolute value term must always be 0, to get
=limsupn→∞supaos1:n+1∼π(|infaon+2:∞U(ao1:n+1,aon+2:∞)−U(ao1:n+1,ao∼πn+2:∞)|
+|U′(ao1:n+1,ao∼πn+2:∞)−infaon+2:∞U′(ao1:n+1,aon+2:∞)|)
and then distribute the sups and limsups into the addition to get
≤limsupn→∞(supaos1:n+1∼π|infaon+2:∞U(ao1:n+1,aon+2:∞)−U(ao1:n+1,ao∼πn+2:∞)|)
+limsupn→∞(supaos1:n+1∼π|U′(ao1:n+1,ao∼πn+2:∞)−infaon+2:∞U′(ao1:n+1,aon+2:∞)|)
Since this quantity is an upper bound on the
limsupn→∞|Kπ:n(s0)(λaos1:n+1.infaon+2:∞U(ao1:n+1,aon+2:∞))
−Kπ:n(s0)(λaos1:n+1.infaon+2:∞U′(ao1:n+1,aon+2:∞))|
quantity we wanted to show equals 0 to hit our proof target, we just need to show that it equals 0.
Here's what we do now. Since U and U′ are defined over (A×O)ω, a compact space, they must both be uniformly continuous. You can take any ϵ, and find some δ where two histories that only differ by δ are assigned values ϵ or less apart. And for any δ, there's some n where any two histories that agree on the first n steps only differ by δ. So, in the limit, both of these terms become 0, and our result that
Kπ:∞(s0)(λaos1:∞.U(ao1:∞))=Kπ:∞(s0)(λaos1:∞.U′(ao1:∞))
for any s0 and U,U′ which agree on all histories compatible with π follows since we hit our proof target.
T4.2.3.3 Finally, it's time to apply this result to get that Θ(π) is supported only on histories compatible with π. Fix any π, and U,U′ which are identical on histories compatible with π. We can start unpacking.
|Θ(π)(U)−Θ(π)(U′)|=|pr(A×O)ω∗(ψ⋉Kπ:∞)(U)−pr(A×O)ω∗(ψ⋉Kπ:∞)(U′)|
=|ψ(λs0.Kπ:∞(s0)(λaos1:∞.U(ao1:∞)))−ψ(λs0.Kπ:∞(s0)(λaos1:∞.U′(ao1:∞)))|
And then, for all s0,
Kπ:∞(s0)(λaos1:∞.U(ao1:∞))=Kπ:∞(s0)(λaos1:∞.U′(ao1:∞))
as we've already shown, since U and U′ agree on histories compatible with π. Thus the functions in ψ are identical, and we have
|Θ(π)(U)−Θ(π)(U′)|=0
and so we're done.
T4.2.4 The next belief function condition to check is agreement on max value. For the [0,1] type signature, fixing any π and π′
|Θ(π)(1)−Θ(π′)(1)|=|pr(A×O)ω∗(ψ⋉Kπ:∞)(1)−pr(A×O)ω∗(ψ⋉Kπ′:∞)(1)|
=|ψ(λs0.Kπ:∞(s0)(1))−ψ(λs0.Kπ′:∞(s0)(1))|
And for the [0,1] type signature, any infinite semidirect product fulfilling the niceness conditions (which is the case) must map 1 to 1, so this turns into
=|ψ(1)−ψ(1)|=0
And we're done, and in fact get the stronger result that Θ(π)(1)=1 regardless of π because ψ(1)=1 because we assumed ψ was an infradistribution. For the R type signature, fixing any π and π′, we have
|Θ(π)(∞)−Θ(π′)(∞)|=|pr(A×O)ω∗(ψ⋉Kπ:∞)(∞)−pr(A×O)ω∗(ψ⋉Kπ′:∞)(∞)|
=|ψ(λs0.Kπ:∞(s0)(∞))−ψ(λs0.Kπ′:∞(s0)(∞))|
Regardless of type signature, an infinite semidirect product fulfilling the niceness conditions (which we're assuming) must map constants to an equal or greater constant, and infinity can be regarded as the supremum of putting constants in, so we get
=|ψ(∞)−ψ(∞)|=0
Importantly, this does not mean that Θ(π)(∞) must be ∞.
T4.2.5 Now for the very last condition, checking that Θ is pseudocausal. This takes a whole lot of work (though less than lower-semicontinuity), and we'll do it by repeatedly showing equivalence of our desired result to some slightly simpler result, until we ground out in an induction proof we clean up in the same way, which splits into a base case and induction step.
T4.2.5.1 We want to show that
∀U:u1∼π′(Θ(π))(U)≥Θ(π′)(U)
We can unpack Θ(π) and Θ(π′), to get the equivalent proof target of:
u1∼π′(pr(A×O)ω∗(ψ⋉Kπ:∞))(U)≥pr(A×O)ω∗(ψ⋉Kπ′:∞)(U)
First, we expand the update, to get the equivalent proof target of:
pr(A×O)ω∗(ψ⋉Kπ:∞)(λao:∞.1ao1:∞∼π′U(ao1:∞)+1ao1:∞≁π′)
≥pr(A×O)ω∗(ψ⋉Kπ′:∞)(λao1:∞.U(ao1:∞))
Then we rewrite the projections to get the equivalent proof target of
(ψ⋉Kπ:∞)(λs0,aos1:∞.1ao1:∞∼π′U(ao1:∞)+1ao1:∞≁π′)≥(ψ⋉Kπ′:∞)(λs0,aos1:∞.U(ao1:∞))
Then we rewrite the semidirect products to get the equivalent proof target of
ψ(λs0.Kπ:∞(s0)(λaos:∞.1ao1:∞∼π′U(ao1:∞)+1ao1:∞≁π′))≥ψ(λs0.Kπ′:∞(s0)(λaos1:∞.U(ao1:∞)))
By monotonicity for ψ, we'd be able to prove this if, for any s0 initial state, we had
Kπ:∞(s0)(λaos1:∞.1ao1:∞∼π′U(ao1:∞)+1ao1:∞≁π′)≥Kπ′:∞(s0)(λaos1:∞.U(ao1:∞))
So we'll now try to prove this, let s0 be arbitrary. We can explicitly unpack the infinite semidirect product, and get a proof target of
limn→∞Kπ:n(s0)(λaos1:n+1.infaon+2:∞(1ao1:n+1,aon+2:∞∼π′U(ao1:n+1,aon+2:∞)+1ao1:n+1,aon+2:∞≁π′))
≥limn→∞Kπ′:n(s0)(λaos1:n+1.infaon+2:∞U(ao1:n+1,aon+2:∞))
We'd be able to prove this if we could prove the inequality for all the n individually, as then the inequality would carry over to the limit. So, our new proof target is, for arbitrary n,
Kπ:n(s0)(λaos1:n+1.infaon+2:∞(1ao1:n+1,aon+2:∞∼π′U(ao1:n+1,aon+2:∞)+1ao1:n+1,aon+2:∞≁π′))
≥Kπ′:n(s0)(λaos1:n+1.infaon+2:∞U(ao1:n+1,aon+2:∞))
We can observe something interesting here for the update. if ao1:n+1 is compatible with π′, then the inf can be modeled as always picking some continuation compatible with π′, because doing otherwise would make the function equal 1 (or infinity). In such a case, the inner function of the top would turn into infaon+2:∞U(ao1:n+1,aon+2:∞). If ao1:n+1 is incompatible with π′, then it doesn't matter the extension, any extension will always be incompatible with π′, and return 1 (or infinity). So, this can be rephrased as
Kπ:n(s0)(λaos1:n+1.1ao1:n+1∼π′infaon+2:∞U(ao1:n+1,aon+2:∞)+1ao1:n+1≁π′)
≥Kπ′:n(s0)(λaos1:n+1.infaon+2:∞U(ao1:n+1,aon+2:∞))
Now, we can pack up the fact that we're updating on "partial history is compatible with π′" to get an equivalent proof target of
u1∼π′(Kπ:n(s0))(λaos1:n+1.infaon+2:∞U(ao1:n+1,aon+2:∞))
≥Kπ′:n(s0)(λaos1:n+1.infaon+2:∞U(ao1:n+1,aon+2:∞))
Now, since these inner functions are identical, we'd be able to prove this proof target if we proved the following for arbitrary functions fn+1:(A×O×S)n+1→[0,1] (or R), and arbitrary n,π′,π,s0 as before.
u1∼π′(Kπ:n(s0))(fn+1)≥Kπ′:n(s0)(fn+1)
So, since we're trying to prove this result for arbitrary n, the obvious thing to do is an induction proof. If we can show this by induction, we hit our proof target and we're done.
T4.2.5.2 For our base case, we'd want to show
u1∼π′(Kπ:0(s0))(f1)≥Kπ′:0(s0)(f1)
We reexpress the update to get a proof target of
Kπ:0(s0)(λa,o,s.1a,o∼π′f1(a,o,s)+1a,o≁π′)≥Kπ′:0(s0)(λa,o,s.f1(a,o,s))
Now, the only way for a,o to be incompatible with π′ is if a≠π′(). Doing that reexpression, we get
Kπ:0(s0)(λa,o,s.1a=π′()f1(a,o,s)+1a≠π′())≥Kπ′:0(s0)(λa,o,s.f1(a,o,s))
We recall that for the iterated semidirect product, the :0 thing is just Kπ0, and Kπ0(s0)=δπ()⋉(λa.K(s0,a)). Making these substitutions, we get an equivalent proof target of
(δπ()⋉(λa.K(s0,a)))(λa,o,s.1a=π′()f1(a,o,s)+1a≠π′())
≥(δπ′()⋉(λa.K(s0,a)))(λa,o,s.f1(a,o,s))
Reexpressing the semidirect product and subsituting the dirac-delta value in, we get the equivalent
K(s0,π())(λo,s.1π()=π′()f1(π(),o,s)+1π()≠π′())≥K(s0,π′())(λo,s.f1(π′(),o,s))
We have two possible cases. In case 1, π()=π′(), and then our proof target turns into
K(s0,π′())(λo,s.f1(π′(),o,s))≥K(s0,π′())(λo,s.f1(π′(),o,s))
Which is just true, they're equal. In case 2, π()≠π′(), and we then get the proof target of
K(s0,π())(1)≥K(s0,π′())(λo,s.f1(π′(),o,s))
Which is true because
K(s0,π())(1)=1≥K(s0,π′())(λo,s.f1(π′(),o,s))
Because of mapping 1 to 1. Or, for the R type signature, we'd have
K(s0,π())(∞)=∞≥K(s0,π′())(λo,s.f1(π′(),o,s))
By Constants Increase for K. So, we've finally finished up our proof target, setting up the base case for the induction.
T4.2.5.3 Now for the induction step, which is all we need to finish up the proof. We need to somehow show that
u1∼π′(Kπ:n+1(s0))(fn+2)≥Kπ′:n+1(s0)(fn+2)
For all functions fn+2:(A×O×S)n+2→[0,1], assuming that
u1∼π′(Kπ:n(s0))(fn+1)≥Kπ′:n(s0)(fn+1)
holds for all functions fn+1:(A×O×S)n+1→[0,1]. Let's begin. Our proof target is
u1∼π′(Kπ:n+1(s0))(fn+2)≥Kπ′:n+1(s0)(fn+2)
We reexpress the update, to get the equivalent statement
Kπ:n+1(s0)(λaos1:n+2.1ao1:n+2∼π′fn+2(aos1:n+2)+1ao1:n+2≁π′)
≥Kπ′:n+1(s0)(λaos1:n+2.fn+2(aos1:n+2))
We unpack the definition of Kπ:n+1(s0) to get
Kπ:n(s0)(λaos1:n+1.Kπn+1(s0,aos1:n+1)
(λa,o,s.1ao1:n+1,a,o∼π′fn+2(aos1:n+1,a,o,s)+1ao1:n+1,a,o≁π′))
≥Kπ′:n(s0)(λaos1:n+1.Kπ′n+1(s0,aos:n+1)(λa,o,s.fn+2(aos1:n+1,a,o,s)))
And then substitute in the definition of Kπn+1 to get
Kπ:n(s0)(λaos1:n+1.(δπ(ao1:n+1)⋉(λa.K(sn+1,a)))
(λa,o,s.1ao1:n+1,a,o∼π′fn+2(aos1:n+1,a,o,s)+1ao1:n+1,a,o≁π′))
≥Kπ′:n(s0)(λaos1:n+1.(δπ′(ao1:n+1)⋉(λa.K(sn+1,a)))(λa,o,s.fn+2(aos1:n+1,a,o,s)))
And unpack the semidirect product and substitute in the dirac-delta value to get the proof target of
Kπ:n(s0)(λaos1:n+1.K(sn+1,π(ao1:n+1))
(λo,s.1ao1:n+1,π(ao1:n+1),o∼π′fn+2(aos1:n+1,π(ao1:n+1),o,s)+1ao1:n+1,π(ao1:n+1),o≁π′))
≥Kπ′:n(s0)(λaos1:n+1.K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao1:n+1),o,s)))
Now, looking at that gigantic first term... if ao1:n+1 is incompatible with π′, then the really big function is just going to turn into 1 (or infinity), which is assigned a value of 1 (or infinity) by K, from the usual constants increase/1-normalization argument.
If ao1:n+1∼π′, then we're back in the usual business except we don't need to worry about incompatibilities of the history with π′ within ao1:n+1, we only need to worry about those incompatiblities for the final action, π(ao1:n+1). So, we can rewrite that first term with indicator functions.
Kπ:n(s0)(λaos1:n+1.1ao1:n+1∼π′(K(sn+1,π(ao:n+1))
(λo,s.1π(ao:n+1)=π′(ao:n+1)fn+2(aos1:n+1,π(ao1:n+1),o,s)+1π(ao1:n+1)≠π′(ao1:n+1)))
+1ao1:n+1≁π′)
≥Kπ′:n(s0)(λaos1:n+1.K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao:n+1),o,s)))
And then rewrite this as just an update on Kπ:n(s0), getting the equivalent proof goal
u1∼π′(Kπ:n(s0))(λaos1:n+1.K(sn+1,π(ao:n+1))
(λo,s.1π(ao:n+1)=π′(ao:n+1)fn+2(aos1:n+1,π(ao1:n+1),o,s)+1π(ao1:n+1)≠π′(ao1:n+1)))
≥Kπ′:n(s0)(λaos1:n+1.K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao:n+1),o,s)))
Now, we can define our fn+1 as
λaos1:n+1.K(sn+1,π(ao:n+1))
(λo,s.1π(ao:n+1)=π′(ao:n+1)fn+2(aos1:n+1,π(ao1:n+1),o,s)+1π(ao1:n+1)≠π′(ao1:n+1))
To derive, from our induction assumption, that
u1∼π′(Kπ:n(s0))(λaos1:n+1.K(sn+1,π(ao:n+1))
(λo,s.1π(ao:n+1)=π′(ao:n+1)fn+2(aos1:n+1,π(ao1:n+1),o,s)+1π(ao1:n+1)≠π′(ao1:n+1)))
=u1∼π′(Kπ:n(s0))(fn+1)≥Kπ′:n(s0)(fn+1)
=Kπ′:n(s0)(λaos1:n+1.K(sn+1,π(ao:n+1))
(λo,s.1π(ao:n+1)=π′(ao:n+1)fn+2(aos1:n+1,π(ao1:n+1),o,s)+1π(ao1:n+1)≠π′(ao1:n+1)))
So, as long as we're able to show that this quantity still lies above
Kπ′:n(s0)(λaos1:n+1.K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao:n+1),o,s)))
,we can just put the two inequalities together, and we'll attain our proof target and the induction will go through and we'll be done. So, our new proof target is
Kπ′:n(s0)(λaos1:n+1.K(sn+1,π(ao:n+1))
(λo,s.1π(ao:n+1)=π′(ao:n+1)fn+2(aos1:n+1,π(ao1:n+1),o,s)+1π(ao1:n+1)≠π′(ao1:n+1)))
≥Kπ′:n(s0)(λaos1:n+1.K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao:n+1),o,s)))
Splitting into two cases, it could be that π(ao1:n+1)≠π′(ao1:n+1). In this case that big inner function in the attempted upper bound turns into just 1, and then K(sn+1,π(ao1:n+1))(1)=1. Or, for the R type signature, we'd get K(sn+1,π(ao1:n+1))(∞)=∞.
In the second case where π(ao1:n+1)=π′(ao1:n+1), we'd get
K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao1:n+1),o,s))
as the value of the function inside Kπ′:n(s0), due to the indicator function working out appropriately, and π′ matching up with what π does. This ability to break stuff up into an indicator function yields a new equivalent proof target of
Kπ′:n(s0)(λaos1:n+1.
1π(ao1:n+1)=π′(ao1:n+1)K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao1:n+1),o,s))
+1π(ao1:n+1)≠π′(ao1:n+1))
≥Kπ′:n(s0)(λaos1:n+1.K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao1:n+1),o,s)))
We would be able to show this via inframeasure monotonicity (as the starting inframeasures are equal now) if, for all aos1:n+1, the first function was larger than the second one. Let aos1:n+1 be arbitrary, and our proof goal is now
1π(ao1:n+1)=π′(ao1:n+1)K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao1:n+1),o,s))
+1π(ao1:n+1)≠π′(ao1:n+1)
≥K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao1:n+1),o,s))
In the first subcase, where π(ao1:n+1)≠π′(ao1:n+1), because of 1-normalization (or constants moving up), our goal would be that
1≥K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao1:n+1),o,s))
(or infinity). Infinity beats everything, and the inframeasures in the [0,1] type signature can report a maximum value of 1, so this proof goal is just obviously true.
In the second subcase, where π(ao1:n+1)=π′(ao1:n+1), our proof goal would turn into:
K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao1:n+1),o,s))
≥K(sn+1,π′(ao1:n+1))(λo,s.fn+2(aos1:n+1,π′(ao1:n+1),o,s))
But they're equal. We hit our proof goal, the induction goes through, and this hits the proof goal for establishing pseudocausality, which was the last condition we needed to show our result. We're done!