3

19 min read

3

Theorem 4: Pseudocausal IPOMDP's: If an infra-POMDP has a transition kernel  which fulfills the niceness conditions, and there's a starting infradistribution , unrolling it into an infrakernel via  has the infrakernel  being pseudocausal and fulfilling all belief function conditions except for normalization.

This takes a whole lot of work to show. The proof breaks down into two phases. The first is showing that the  fulfill the niceness conditions (which splits into four subphases of showing 1-Lipschitzness, lower-semicontinuity, compact-shared CAS, and moving constants up/mapping 1 to 1). This is the easy part to do. The second phase is much longer. It splits into five pieces, where we show our list of conditions for .  Our list of conditions to show is: 

1:The  have a uniform upper bound on their Lipschitz constants.

2:  is lower-semicontinuous for all 

3:  is supported entirely on histories compatible with .

4: All the  agree on the value of 1 (or infinity).

5:  is pseudocausal.

Parts 1 and 4 can be done with little trouble, but parts 2, 3, and 5 split into additional pieces. To show pseudocausality, ie, condition 5, we have to start with our proof goal and keep rewriting it into an easier form, until we get something we can knock out with an induction proof, which splits into a base case and an induction step. To show sensible supports, ie, part 3, we have to split it into three parts, there's one induction proof for the finite , showing sensible supports for , and then finally extending this to the result we actually want, which is fairly easy.

And then there's lower-semicontinuity, part 2, which is an absolute nightmare. This splits into three parts. One is explicitly constructing a sequence of compact sets to use in the Almost-Monotone Lemma for any policy whatsoever, and then there's trying to show that  is lower-semimicontinuous which requires a massive amount of messing around with integrals and limits, and then it's pretty easy to show lower-semicontinuity for  from there.

T4.1 Our first order of business is showing the five niceness conditions for all the  infrakernels, so we can take the infinite semidirect product and show that  inherits the five niceness conditions (from the proof of Theorem 1).

Remember that  is defined as

Obviously it produces inframeasures, since  does. 

T4.1.1 For lower-semicontinuity in the input, we have

and then, as  limits to , we have  forever after some finite , because  is a finite space. So, it turns into

and then, by lower-semicontinuity for K (one of the niceness conditions)

and then we pack this back up as

and lower-semicontinuity has been shown.

T4.1.2 For 1-Lipschitzness, observe that  is 1-Lipschitz because  is 1-Lipschitz, and this is the semidirect product of a 1-Lipschitz (all probability distributions are 1-Lipschitz, and this is a dirac-delta distribution) infradistribution with a 1-Lipschitz infrakernel.

T4.1.3 For compact-shared compact-almost-support, observe that  is a continuous function, so any compact subset of  (input to ) maps to a compact subset of  (corresponding inputs for ). Then, we can apply compact-shared compact-almost-support for  (one of the niceness conditions) to get a compact almost-support on , and take the product of that subset with all of  itself, to get a shared-compact almost-support for .

T4.1.4 For mapping constants to above constants, we have

By increasing constants for  (one of the niceness conditions). This same proof works with equality when , because of the "map 1 to 1" condition on the  type signature. So all the  have the niceness properties, and we can form the infinite semidirect product at all, and the infinite semidirect product inherits these niceness conditions.

T4.2 Now we can get working on showing the belief function conditions.

T4.2.1 For showing a bounded Lipschitz constant, we have

and then since  is an infradistribution, it must have a finite Lipschitz constant, . So we then get

and then, since all the  are 1-Lipschitz (infinite semidirect product has niceness conditions), we get

and we're done, we showed a Lipschitz bound on the  that is uniform in .

T4.2.2 Now for lower-semicontinuity for , which is the extremely hard one.

The way this proof will work is that we'll first show a variant of the Almost-Monotone Lemma that works for all the  infrakernels simultaneously. Then, using that, we'll show that

 is lower-semicontinuous. And then, it's pretty easy to wrap up and get that

 is lower-semicontinuous, from that.

T4.2.2.1 Anyways, our first goal is a variant of the Almost-Monotone Lemma. The particular variant we're searching for, specialized to our particular case, is:



We let our  be an arbitrary compact subset of states, and  be arbitrary.

Now, let's assume there's a sequence of compact sets C where, regardless of n or  is a shared -almost-support for all the  inframeasures (where ).

Then our proof goal becomes



The proof of the Almost-Monotone Lemma goes through perfectly fine, the key part where our choice of compact sets matters is in the ability to make the assumption that  is an -almost-support for all the  with , as our  is. And we assumed our sequence of compact sets fulfilled that property.

So, where we're at is that our variant of the Almost-Monotone-Lemma works, as long as our assumption works out, that there's a sequence of compact sets  where, regardless of n or  or  is a -almost-support for .

In order to get our variant of the AML to go through, we really need to prove that there is such a sequence, for any  and . For this, we'll be looking at a function

What this does is that  is defined to return a compact subset of  which is an -almost-support for all the  inframeasures, when . There are always compact subsets like this for any input because of shared almost-compact-support for the infrakernel  (one of the assumed niceness conditions), so there does indeed exist a function  with these properties.

Then, we recursively define the  as:

and

All these sets are compact, by induction. For the first one, since  always returns compact sets, we're taking a finite union of compact sets (is compact), and taking the product with a finite set (is compact). For induction up, since the product of compact sets is compact,  is compact, and then projections of compact sets are compact, so then  returns a compact set, and again, we take a finite union of compact sets and then the product with a compact set to make another compact set.

Now, let's get started on showing that C^{\top}_{1:n+1}[C^{S},\eps] is an \eps\left(1-\frac{1}{2^{n+1}}\right)-almost-support for K^{\pi}_{:n}(s_0), for arbitrary \pi and s_0\in C^{S}. This proof will proceed by induction.

For the base-case, we want that  is a -almost-support for  with . Well, assume  are equal on , and .

Now,  is a -almost-support for  because , so we can apply Lemma 2 from LBIT, and 1-Lipschitzness of  to split this up as

That second term can be upper bounded by , so we get an upper bound of

and then we observe that since  on , and that set is defined as  , we have that they agree on 

So that term turns into 0, the two functions are equal on our set of interest, so our net upper bound is

Showing that  is a -almost support for all the , where  is arbitrary and s_0\in C^{S}.

Now for induction. We want to show that  is a -almost-support for  with , assuming that  is a -almost-support for  with .

To begin, assume  are equal on  and . Then we can go:


and then unpack to get


and unpack the semidirect product and substitute the dirac-delta value in to get


We apply a Lemma 2 from LBIT decomposition, with  as an -almost support for , which works by induction assumption. Since  is also 1-Lipschitz (the niceness conditions we already know), the Lemma 2 upper bound breaks down as




By 1-Lipschitzness of , we get an upper-bound of



And then this is upper-boundable by , so we get


Now, for that top term, we know that  comes from . We apply another Lemma 2 decomposition on  this time, with the compact set of interest being 

which is a -almost-support for . Pairing this with 1-Lipschitzness of , we have an upper bound of




That second term is upper-bounded by  no matter what, so we have



Pull out the constant and combine, to get



Now,  were assumed to agree on , which factors as

and then expands as

which all our  we're feeding in lie within. So those functions are identical for those inputs, and our upper-bound just reduces to

Which witnesses that  is a -almost-support for any  (arbitrary ) as long as . So our induction step goes off without a hitch, and we now know that the sequence is suitable to get our modification of the Almost-Monotone-Lemma to work out. We have the result 



In particular, since any defining sequence for  has an actual limit, this can be trivially reshuffled to attain the result:



And remember, the sequence of compact sets only depends on  and , not on the policy. This works for any policy. Now that we have this result in the bag, we can move on to phase 2.

T4.2.2.2 For phase 2, we'll show that  is lower-semicontinuous, and dear lord this is going to be hard and take a lot of integrals and limits to show. We start with

Which unpacks as

If this liminf is infinity, we're just done, infinity beats everything. So we'll be assuming that it settles down to a finite value. We can unpack the  as a challenge of picking points in , so we get

And, since we can pick amongst minimal points, we can rewrite this as

In the  case, the set of minimal points is precompact. In the  case, it may not be precompact, but the only way it fails to be precompact is having b values running off to infinity. Said  function can't be unbounded below, because it is above , and infinite semidirect products map constants to greater than their usual value, and  fulfills this condition. Thus, if the b value of a selected minimal point is far higher than the finite liminf value plus , the value said minimal point assigns to the function of interest is far higher than the true minimal value. So, when minimizing, we can restrict to a compact subset of  to attain the min, call it . With this, we get

And then we unpack this as an integral.

Now, we can pass to two subsequences. First, we can pass to a subsequence of the m that actually limits to the liminf value. On this subsequence, the policies still limit to . Also, for each member of this subsequence, we can find an true minimizing  pair for it since we're minimizing the lower-semicontinuous function 

over a compact set. Said function is lower-semicontinuous because

is lower-semicontinuous, it's one of the niceness properties that the infinite infrakernel inherits.

Since we're selecting a-measures from the compact set , we can find a subsequence of our subsequence where the a-measures converge. Now we'll be using i as our index, and using it to index the  pairs. The limit a-measure is . So, we have a rewrite as

Further, all numbers in this limit are finite, because we're operating in the case where the liminf is finite, and i indexes a subsequence which actually limits to the liminf. For said limit,  limits to , so we can pull that out of the limit, to get

At this point, since the  limit to , the set of the measure components  is compact in the space of measures on . In particular, this implies that there is some compact set  where all the  only have  or less measure outside said set (a necessary condition for precompactness of a set of measures)

Now, we split the measures  into the component of measure on , and the component outside of it.


Because  is a compact subset of , we can use it as a seed set for our  sequence that attains almost-monotonicity. To economize on space for the lines to fit, we'll write 

as just

We unpack the defining limit of the infinite semidirect product with our relevant sequence of compact sets to get


We'll now be splitting up the second integral as two more integrals. Fix an arbitrary .




We're really in the weeds here, let's see if we can make any dent whatsoever. What we'll be showing is that, regardless of ,


It is exceptionally important to note here that we can't just throw our variant of the Almost-Monotone Lemma at this, because we don't have a promise that .

An easy way to show this is to show it for all . So let . Then, our proof goal is that


First, observe that the functions

and

can only differ on the same input by at most . Since the finite stages are all 1-Lipschitz or less, we have



And then, by monotonicity of the infrakernel stages for functions which cut off at a finite point (it was from Theorem 1), we have


And we're done, we have shown that, regardless of , we have


Now, applying this fact to our giant list of integrals, we can take




and apply our lower-bound to get



Also, since we picked  so  only has  or less measure outside of it, regardless of i, we can impose a lower bound of:


Now, to break down the integral on the top. Since the behavior of the interior of that integral only depends on , we can apply our good old Almost-Monotone-Lemma variation, and lower-bound the limit by the  finite stage, minus  (independently of the choice of ) yielding


Now, we pull that constant piece out of the integral, and since both functions in the integral are now identical, we can stitch the measure components back together to get:


Now,  is a fragment of , which came from a minimal point in  or a limit of such, so the maximal amount of measure present would be . This lets us get a lower bound of


Doing a little bit of regrouping, we have


Recapping what we've done so far, we showed that 


For all . Regardless of , this liminf value in i is finite because it's upper-bounded by the liminf in m, which is assumed to be finite (we disposed of the infinite case already). Because the sequence  and the compact sets  were fixed before we picked , we can take a limit of the  without worrying about it affecting any of the stuff on the inside of the integral, to get


This liminf in i of the integral is finite for all n. At this point, we can notice something interesting. For fixed n, when we take the limit as i approaches infinity, eventually  will become  and never go back, because of convergence of the policies, and the fact that the infrakernel has a cutoff at time n, so the behavior of the policies before that time will stabilize on a partial policy. So, we can rewrite our lower bound as:


Again, for all n, this liminf in i quantity is finite. Now, the function

is lower-semicontinuous in  (as all the finite stages of the infinite infrakernel have that niceness property). And  is lower-semicontinuous when  is. Also, by now, the only thing i is controlling is the measure term. Therefore, that liminf of the integral (in i) is greater than what you'd get if you just passed to . So, we get


Again, for all n, the integral is finite. Now, since the functions

converge pointwise in n to the function

(any sequence of compact sets works to define the infinite semidirect product) and for all n, the integrals are finite (this is why we kept restating it over and over!), we can apply Fatou's Lemma to move the liminf into the integral, to get


And since we have pointwise limiting, this is


and then, packing it up,

And then, pack it up as

and one last step. Since  lies in , we can get

which reexpresses as

Our net result is

But because  was arbitrary and  and  are finite, we have our desired net result of

And lower-semicontinuity is shown for 

T4.2.2.3 Now for our phase 3, getting lower-semicontinuity of . Let  limit to . We have

and then this unpacks as

Reexpressing the projection, we have:

And then, we can apply lower-semicontinuity of  which we've shown in phase 2, to get

Which then packs up as

and then

and we have lower-semicontinuity for ! Finally!

T4.2.3 Time to return to easier fare. We need that  is supported entirely on histories compatible with . To do this, we proceed in three steps.

First, we do an induction proof that, for any given  and  is supported on the subset of  where the actions and observations so far are compatible with .

Second, we use our first result to show that for any two continuous bounded utility functions  which only depend on the action/observation sequence, and agree on all histories compatible with 

Finally, we use this result to show that if  agree on all histories compatible with , showing that  is supported on histories compatible with .

T4.2.3.1 Let's start with the induction proof that  has the subset of  where the history of actions and observations is compatible with  as a support. The way to do that is to pick two functions that are identical on said subset, and show they have equal expectation values. For the base case of the induction proof, fix an  and  which are identical on  (as that's the subset of  which is compatible with ), and observe:

and then by how  is defined, we have

Which, by unpacking the semidirect product and subsituting the dirac-delta value in, we have

and then since  when the initial action equals , we have

and then this just wraps back up by our sequence of rewrite steps in reverse as

Now that we have the base case out of the way, we'll attempt to show the induction step. Let  and  be identical on the subset of  which is compatible with , and we'll show they have equal expectation value. Our job is to unpack

this unpacks, by how sequences of semidirect products are defined, as


Now, since, by induction assumption,  is supported on the subset of  where the history is compatible with , we just need to show that the inner functions are identical on that subset to get that the difference is 0, so our proof goal is now


We now unpack  as  to get our new proof goal of:


We unpack the semidirect product and substitute the dirac-delta value in to get a proof goal of


at this point, we realize that if  is compatible with , then regardless of  and  is compatible with , so the functions

and

are equal, and we get our desired proof goal. So the induction goes through, and the subset of  where the history is compatible with  will always be a support of  for all n.

T4.2.3.2 Now that's in place, we'll show that any two continuous bounded functions  and  which only depend on the action/observation sequence and are identical on histories compatible with  will be assigned equal value by . Assume  and  only depend on action/observation sequences and are identical on histories compatible with . We have a desired proof target of

and, unpacking the limit, it turns into


The limits exist, so we'd be able to show that if we were able to show that


Now, we can use the Lemma 2 from LBIT decomposition on , since we know that the subset of  where the history is compatible with  is a support of , to get an upper bound on


of


For the first part, the 1 is there because all the infrakernels are 1-Lipschitz, and we're assessing distance of the inner functions on the support. For the second part, it's there because we don't care how different the two functions are outside the support, they must be assigned equal value. So anyways, this upper bound reduces to

Now, for all choices of n and , fix an arbitrary extension  which is compatible with  (which can always be done). Then use the triangle inequality twice to get an upper-bound of



We can observe that, since  and  are equal on all histories compatible with , and  was picked specifically to have , that second absolute value term must always be 0, to get


and then distribute the sups and limsups into the addition to get


Since this quantity is an upper bound on the 


quantity we wanted to show equals 0 to hit our proof target, we just need to show that it equals 0.

Here's what we do now. Since  and  are defined over , a compact space, they must both be uniformly continuous. You can take any , and find some  where two histories that only differ by  are assigned values  or less apart. And for any , there's some n where any two histories that agree on the first n steps only differ by . So, in the limit, both of these terms become 0, and our result that

for any  and  which agree on all histories compatible with  follows since we hit our proof target. 

T4.2.3.3 Finally, it's time to apply this result to get that  is supported only on histories compatible with . Fix any , and  which are identical on histories compatible with . We can start unpacking.

And then, for all ,

as we've already shown, since  and  agree on histories compatible with . Thus the functions in  are identical, and we have

and so we're done.

T4.2.4 The next belief function condition to check is agreement on max value. For the  type signature, fixing any  and 

And for the  type signature, any infinite semidirect product fulfilling the niceness conditions (which is the case) must map 1 to 1, so this turns into

And we're done, and in fact get the stronger result that  regardless of  because  because we assumed  was an infradistribution. For the  type signature, fixing any  and , we have

Regardless of type signature, an infinite semidirect product fulfilling the niceness conditions (which we're assuming) must map constants to an equal or greater constant, and infinity can be regarded as the supremum of putting constants in, so we get

Importantly, this does not mean that  must be .

T4.2.5 Now for the very last condition, checking that  is pseudocausal. This takes a whole lot of work (though less than lower-semicontinuity), and we'll do it by repeatedly showing equivalence of our desired result to some slightly simpler result, until we ground out in an induction proof we clean up in the same way, which splits into a base case and induction step.

T4.2.5.1 We want to show that

We can unpack  and , to get the equivalent proof target of:

First, we expand the update, to get the equivalent proof target of:


Then we rewrite the projections to get the equivalent proof target of

Then we rewrite the semidirect products to get the equivalent proof target of

By monotonicity for , we'd be able to prove this if, for any  initial state, we had

So we'll now try to prove this, let  be arbitrary. We can explicitly unpack the infinite semidirect product, and get a proof target of


We'd be able to prove this if we could prove the inequality for all the n individually, as then the inequality would carry over to the limit. So, our new proof target is, for arbitrary n,


We can observe something interesting here for the update. if  is compatible with , then the inf can be modeled as always picking some continuation compatible with , because doing otherwise would make the function equal 1 (or infinity). In such a case, the inner function of the top would turn into . If  is incompatible with , then it doesn't matter the extension, any extension will always be incompatible with , and return 1 (or infinity). So, this can be rephrased as


Now, we can pack up the fact that we're updating on "partial history is compatible with " to get an equivalent proof target of


Now, since these inner functions are identical, we'd be able to prove this proof target if we proved the following for arbitrary functions  (or ), and arbitrary  as before.

So, since we're trying to prove this result for arbitrary n, the obvious thing to do is an induction proof. If we can show this by induction, we hit our proof target and we're done.

T4.2.5.2 For our base case, we'd want to show

We reexpress the update to get a proof target of

Now, the only way for  to be incompatible with  is if . Doing that reexpression, we get

We recall that for the iterated semidirect product, the  thing is just , and . Making these substitutions, we get an equivalent proof target of


Reexpressing the semidirect product and subsituting the dirac-delta value in, we get the equivalent

We have two possible cases. In case 1, , and then our proof target turns into

Which is just true, they're equal. In case 2, , and we then get the proof target of

Which is true because

Because of mapping 1 to 1. Or, for the  type signature, we'd have

By Constants Increase for . So, we've finally finished up our proof target, setting up the base case for the induction. 

T4.2.5.3 Now for the induction step, which is all we need to finish up the proof. We need to somehow show that

For all functions , assuming that

holds for all functions . Let's begin. Our proof target is

We reexpress the update, to get the equivalent statement


We unpack the definition of  to get



And then substitute in the definition of  to get



And unpack the semidirect product and substitute in the dirac-delta value to get the proof target of



Now, looking at that gigantic first term... if  is incompatible with , then the really big function is just going to turn into 1 (or infinity), which is assigned a value of 1 (or infinity) by , from the usual constants increase/1-normalization argument.

If , then we're back in the usual business except we don't need to worry about incompatibilities of the history with  within , we only need to worry about those incompatiblities for the final action, . So, we can rewrite that first term with indicator functions.




And then rewrite this as just an update on , getting the equivalent proof goal



Now, we can define our  as


To derive, from our induction assumption, that



So, as long as we're able to show that this quantity still lies above

,we can just put the two inequalities together, and we'll attain our proof target and the induction will go through and we'll be done. So, our new proof target is



Splitting into two cases, it could be that . In this case that big inner function in the attempted upper bound turns into just 1, and then . Or, for the  type signature, we'd get .

In the second case where , we'd get 

as the value of the function inside , due to the indicator function working out appropriately, and  matching up with what  does. This ability to break stuff up into an indicator function yields a new equivalent proof target of




We would be able to show this via inframeasure monotonicity (as the starting inframeasures are equal now) if, for all , the first function was larger than the second one. Let  be arbitrary, and our proof goal is now



In the first subcase, where , because of 1-normalization (or constants moving up), our goal would be that

(or infinity). Infinity beats everything, and the inframeasures in the  type signature can report a maximum value of 1, so this proof goal is just obviously true.

In the second subcase, where , our proof goal would turn into:


But they're equal. We hit our proof goal, the induction goes through, and this hits the proof goal for establishing pseudocausality, which was the last condition we needed to show our result. We're done!

New Comment