This post accompanies An Introduction to Credal Sets and Infra-Bayes Learnability.
We use $\Delta X$ to denote the space of probability distributions over a set $X$, which is assumed throughout to be a compact metric space. We use $\Box X$ to denote the set of credal sets over $X$.
Given and , let
Let $C(X, \mathbb{R})$ denote the space of continuous functions from $X$ to $\mathbb{R}$.
Lemma 1: If $A$ and $O$ are finite, the set $(A \times O)^\omega$ of countably infinite histories is a compact metric space under the metric $d(h_1, h_2) = 2^{-t(h_1, h_2)}$, where $h_1, h_2 \in (A \times O)^\omega$ and $t(h_1, h_2)$ is the time of first difference between $h_1$ and $h_2$.
Proof. The space is compact under the discrete topology since it is finite. Therefore, is compact under the product topology by Tychonoff's theorem. The stated metric induces a topology . By the definition of compactness, it is sufficient to show that all basis elements of are contained in
The basis elements of have the form
for and
Furthermore, sets of the form
are basis elements of where and
Given define . We will verify that if and only if Note that if then by construction, For the other direction, we have that if then since By the contrapositive, if then
Since if and only if for , this proves that and are equal as sets. Thus all basis elements of are contained in and is compact in the metric topology.
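To make the metric concrete, the following Python sketch computes distances between (prefixes of) histories. It assumes the standard choice $d(h_1, h_2) = 2^{-t}$ with $t$ the time of first difference; `history_distance` is a hypothetical helper, not notation from the post.

```python
def history_distance(h1, h2):
    """Distance between two histories, given here as finite prefixes.

    Assumes d(h1, h2) = 2**(-t), where t is the first index at which
    the histories differ -- a standard metric inducing the product
    topology on a countable product of finite discrete spaces.
    """
    for t, (a, b) in enumerate(zip(h1, h2)):
        if a != b:
            return 2.0 ** (-t)
    return 0.0  # no difference on the common prefix

# Histories agreeing up to time t are within 2**(-t) of each other:
print(history_distance([0, 1, 1, 0], [0, 1, 1, 1]))  # first difference at t=3 -> 0.125
print(history_distance([0, 1], [0, 1]))              # 0.0
```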
The following lemma is used in the proofs of both Proposition 1 and Proposition 2.
Lemma 2: The set of deterministic policies is first-countable under the product topology.
Proof: Let We aim to show that has a countable neighborhood basis. Consider the collection of open sets in of the form where for finitely many and otherwise There is a bijection between and the set of finite binary sequences, so this collection is countable.
Let be a neighborhood of By the definition of a basis, there exists a basis element for the product topology such that By definition of the product topology, the set can be written as where for finitely many Since an element of is contained in and thus in Therefore, is a countable neighborhood basis.
Proposition 1: Every crisp causal law is continuous with respect to the product topology on and the Hausdorff topology on
Proof: By Lemma 2, is first-countable under the product topology. Consequently, is continuous if given a convergent sequence such that , then Let such a convergent sequence be given.
By definition, there exists a set of environments that generates We will show convergence using the weak topology, so let be a 1-Lipschitz continuous function.
Since given any finite time horizon there exists such that for all and agree up to time Furthermore, the set of destinies can be partitioned into a disjoint union where is a set of destinies that are equal up to time and Note that by construction, if then
Then for every environment
For all By the 1-Lipschitz property of and the construction of
Therefore,
As can be taken arbitrarily large and tends to zero as tends to infinity. The Kantorovich-Rubinstein metric induces the weak topology on so for all Therefore, in the Hausdorff topology.
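For intuition about the conclusion, here is a toy sketch of the Hausdorff distance between finite sets. The helper `hausdorff` is hypothetical, and points on the real line stand in for distributions (with the absolute gap standing in for the KR-metric):

```python
def hausdorff(A, B, d):
    """Hausdorff distance between two finite non-empty sets under a
    ground metric d: the larger of the two one-sided sup-inf gaps."""
    sup_a = max(min(d(a, b) for b in B) for a in A)
    sup_b = max(min(d(a, b) for a in A) for b in B)
    return max(sup_a, sup_b)

# Toy stand-in: "distributions" as points on the line, d = absolute gap.
d = lambda x, y: abs(x - y)
print(hausdorff({0.0, 1.0}, {0.1, 1.0}, d))  # 0.1
```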
Corollary 1 below corresponds to Proposition 5 in this proof section of the original infra-Bayesianism sequence. The proof of that proposition was organized in three phases. Since we expand the ideas in more detail, we have several lemmas; "phase 1" is captured here in Lemma 5, and "phase 2" and "phase 3" are captured here in Proposition 2. The other lemmas cover prerequisite ideas that are used in the proof.
The following lemma is a special case of Theorem 6.9 of [1]. We provide a simplified proof under the assumption that $X$ is a compact metric space, using the fact that the set of all Lipschitz functions is dense in the space of continuous functions from $X$ to $\mathbb{R}$ [2].
Lemma 3: Let $X$ be a compact metric space, let $f: X \to \mathbb{R}$ be continuous, and define $E(\mu) = \mathbb{E}_{x \sim \mu}[f(x)]$. Then $E$ is continuous as a function from $\Delta X$ with the Kantorovich-Rubinstein (KR-) metric to $\mathbb{R}$.
Proof. Since is a metric space, is first-countable. Thus is continuous if implies . Let and By definition, there exists such that for all
where the supremum is taken over all 1-Lipschitz continuous functions
Suppose is 1-Lipschitz. Then since
Suppose $f$ is $K$-Lipschitz. Then $f/K$ is 1-Lipschitz. So, by the first case, there exists $N$ such that for all $n \geq N$,
By linearity of expectation, for all
Suppose is continuous. By the density of the set of all Lipschitz functions in [2], there exists a Lipschitz function such that By the second case, there exists such that for all Applying the triangle inequality, we obtain that for all
Here we have used the fact that and thus This shows that which completes the proof by the remark at the beginning.
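The Lipschitz step above can be checked numerically. In this hypothetical sketch, two distributions live on a finite support of reals, `w1_discrete` computes the KR (W1) distance via the one-dimensional CDF formula, and the duality bound $|\mathbb{E}_p[f] - \mathbb{E}_q[f]| \leq W_1(p, q)$ is verified for a 1-Lipschitz test function:

```python
def w1_discrete(support, p, q):
    """Kantorovich-Rubinstein (W1) distance between two distributions on
    a common sorted finite support of reals, via the CDF formula
    W1(p, q) = integral of |F_p - F_q|."""
    dist, cdf_gap = 0.0, 0.0
    for i in range(len(support) - 1):
        cdf_gap += p[i] - q[i]
        dist += abs(cdf_gap) * (support[i + 1] - support[i])
    return dist

def expect(support, p, f):
    return sum(pi * f(x) for pi, x in zip(p, support))

# Hypothetical distributions on {0, 1, 2, 3} and a 1-Lipschitz test function.
support = [0.0, 1.0, 2.0, 3.0]
p = [0.4, 0.3, 0.2, 0.1]
q = [0.1, 0.2, 0.3, 0.4]
f = lambda x: abs(x - 1.0)
gap = abs(expect(support, p, f) - expect(support, q, f))
print(gap <= w1_discrete(support, p, q))  # True: |E_p[f] - E_q[f]| <= W1(p, q)
```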
Lemma 4: Let be a credal set over a compact metric space Then for all continuous there exists such that .
Proof. By Lemma 3, the map is continuous. By definition, is compact. A continuous function over a compact set achieves a maximum over that set, which proves the result.
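Lemma 4 can be illustrated when the credal set is the convex hull of finitely many distributions: expectation is linear in the distribution, so the maximum is attained at a vertex. The helper `max_expectation` and the numbers below are hypothetical:

```python
def max_expectation(vertices, f, support):
    """For a credal set given as the convex hull of finitely many
    distributions (the vertices), E_p[f] is linear in p, so its
    maximum over the compact convex set is attained at a vertex."""
    def ev(p):
        return sum(pi * f(x) for pi, x in zip(p, support))
    best = max(vertices, key=ev)
    return ev(best), best

# Hypothetical credal set over {0, 1, 2}.
support = [0, 1, 2]
vertices = [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5]]
val, argmax_p = max_expectation(vertices, lambda x: x * x, support)
print(val)  # 2.3, attained at the second vertex
```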
The next lemma corresponds to "phase one" of the proof of Proposition 5 in this proof section of the original infra-Bayesianism sequence.
Lemma 5: Let be a crisp causal law, and let be continuous. Suppose a sequence of policies converges to in the KR-metric on Then
Proof. Let a convergent sequence and be given. We aim to show that there exists such that for all
By Lemma 1 and Lemma 4, for all , there exists such that . Similarly, there exists such that . Then
By definition, for all and This implies
By (2),
By applying (3) similarly, we obtain
These inequalities imply that
We will now consider how to bound each of the expressions in the set on the right hand side.
Since is compact (by Lemma 2 of An Introduction to Reinforcement Learning for Understanding Infra-Bayesianism) and is continuous, is uniformly continuous by the Heine-Cantor theorem. Recall that the metric on is the Chebyshev metric defined by By the definition of uniform continuity, there exists such that implies that
Since , there exists such that for all Then (with steps explained below),
Note that due to the linearity of expectation and the triangle inequality for integrals, which states that $\left| \int f \, d\mu \right| \leq \int |f| \, d\mu$ for any measure $\mu$ and integrable function $f$. By definition, By monotonicity of the Lebesgue integral,
By the fact that is a probability measure and thus
for all
By a similar argument, for all
Returning to equations (1) and (4), we obtain that for all ,
which completes the proof.
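The Chebyshev (sup) metric used in the proof above can be sketched for policies evaluated on finitely many histories; representing a policy as a map from histories to the probability of a fixed reference action is a simplifying assumption for this example:

```python
def chebyshev(pi1, pi2, histories):
    """Chebyshev (sup) metric between two policies, sketched as the
    largest pointwise gap over a finite list of histories; each policy
    maps a history to the probability of a fixed reference action."""
    return max(abs(pi1(h) - pi2(h)) for h in histories)

histories = [(), (0,), (0, 1)]
pi = lambda h: 0.5                           # hypothetical stationary policy
pi_n = lambda h: 0.5 + 0.25 / (len(h) + 1)   # hypothetical perturbation
print(chebyshev(pi, pi_n, histories))        # 0.25, attained at the empty history
```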
The next proposition is a special case of "phase two" and "phase three" of the proof of Proposition 5 in this proof section of the original infra-Bayesianism sequence. The ideas here are the same, although we provide a direct proof.
Proposition 2: Let be a crisp causal law, and let be continuous. Then the function is continuous.
Proof: By Lemma 2, is first-countable, and thus it is sufficient to show that for any convergent sequence By Lemma 5, it is furthermore sufficient to prove that
To that end, let We will show that and
By Lemma 1 and Lemma 4, there exists such that By the continuity of , there exists a convergent sequence such that and for all By assumption, is continuous, and thus Lemma 3 implies that By construction, Then
For the second argument, choose a subsequence such that
Invoking Lemma 1 and Lemma 4 again, for each , choose such that Since is continuous and is closed, there exists a convergent subsequence with a limit point
Then by (1), Lemma 3, and the definition of expectation with respect to a credal set,
Thus which completes the proof.
Corollary 1 then follows from Proposition 2 using a standard continuity argument.
Corollary 1: For all crisp causal laws and for all continuous functions , is a closed, non-empty set.
The following proposition corresponds to Proposition 12 in this proof section of the infra-Bayesian sequence. Equation (2) below appears there, and we explain it in more detail here. The remainder of the proof differs from the original version.
Proposition 3: For any non-dogmatic prior over a learnable collection of crisp causal laws , if a family of policies is infra-Bayes optimal with respect to , then learns
Proof: Assume that is learnable. We will proceed by contrapositive and show that if a family of policies does not learn then for any non-dogmatic prior over is not infra-Bayes optimal. By definition, there exists a family of policies such that for all
Equivalently, by the definition of infra-regret, for all
We preliminarily aim to show that
Let be given. Note that for all
Since is countable and , there exists such that Since for all the second sum above is bounded by . We will now bound the first sum.
Let Applying equation (1) for each we can obtain such that for all Let . Then holds simultaneously for all and Thus for proving Equation (2).
Since does not learn there exists such that We will show that this implies which when combined with (2) indicates that is not infra-Bayes optimal with respect to .
With steps explained in the paragraphs that follow, we have that
The first equality follows from the definition of infra-regret, and the second equality follows from the definition of expectation with respect to a measure on a countable set. The third equality is an application of the Lebesgue Dominated Convergence Theorem, using the constant one function as a dominating function. The inequalities follow from the fact that all terms in the sum are positive, and which is true by the assumption that is non-dogmatic.
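The role of a non-dogmatic prior in this bound can be illustrated with a finite, hypothetical collection of laws: every law receives positive weight, so a single law with non-vanishing regret bounds the expected regret away from zero:

```python
def expected_regret(prior, regret):
    """Expected regret under a prior zeta over a (here finite,
    hypothetical) collection of laws: sum of zeta(law) * regret(law)."""
    return sum(prior[law] * regret[law] for law in prior)

# A non-dogmatic prior assigns positive weight to every law, so one law
# with non-vanishing regret keeps the expectation bounded below:
prior = {"law_a": 0.7, "law_b": 0.2, "law_c": 0.1}   # hypothetical weights
regret = {"law_a": 0.0, "law_b": 0.0, "law_c": 0.5}  # policy fails on law_c
print(expected_regret(prior, regret))                # 0.05 = 0.1 * 0.5
print(expected_regret(prior, regret) >= prior["law_c"] * regret["law_c"])  # True
```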
We will now show that is not infra-Bayes optimal with respect to . There exists such that
Therefore,
By linearity,
By definition,
Simplifying and applying linearity of expectation, we obtain
Therefore, is not an infra-Bayes optimal family of policies.
Many thanks to Vanessa Kosoy and Marcus Ogren for their constructive feedback on the initial draft.
[1] Villani, Cédric, Optimal Transport: Old and New. Springer Berlin, Heidelberg, 2009.
[2] Carothers, N. L., "The Stone-Weierstrass Theorem." In Real Analysis. Cambridge University Press, 2012.