Fair upfront warning: This is not a particularly readable proof section (though much better than Section 2 about belief functions). There's dense notation, logical leaps due to illusion of transparency since I've spent a month getting fluent with these concepts, and a relative lack of editing since it's long. If you really want to read this, I'd suggest PM-ing me to get a link to MIRIxDiscord, where I'd be able to guide you through it and answer questions.
Proposition 1: If f∈C(X,[0,1]) then f+:(m,b)↦m(f)+b is a positive functional on Msa(X).
Proof Sketch: We just check three conditions. Linearity, being nonnegative on Msa(X), and continuity.
Linearity proof. Using a,a′ for constants,
f+(a(m,b)+a′(m′,b′))=f+(am+a′m′,ab+a′b′)=(am+a′m′)(f)+ab+a′b′
=a(m(f)+b)+a′(m′(f)+b′)=af+(m,b)+a′f+(m′,b′)
So we have verified that f+(aM+a′M′)=af+(M)+a′f+(M′), and we have linearity.
Positivity proof: An sa-measure M, writeable as (m,b), has m uniquely writeable as a sum of a finite measure m+ (all the positive regions) and a finite negative measure m− (all the negative regions) by the Jordan Decomposition Theorem, and b+m−(1)≥0. So,
f+(M)=m(f)+b=m+(f)+m−(f)+b≥0+m−(1)+b≥0
The first ≥ is by 1≥f≥0: m+(f)≥0, and m− is negative, so m−(f)≥m−(1) (taking the expectation of the bigger function 1 is more negative). The second ≥ is by the condition on how m− relates to b.
Continuity proof: Fix a sequence (mn,bn) converging to (m,b). Obviously the b part converges, so now we just need to show that mn(f) converges to m(f). The metric we have on the space of finite signed measures is the KR-metric, and convergence in it implies mn(f)→m(f) for continuous f. This only works for continuous f, not general f.
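To make Proposition 1 concrete, here's a quick numeric sketch on a hypothetical 3-point space, where a signed measure is just a vector of point masses and m(f) is a dot product. The numbers and names are illustrative assumptions, not from the post, and continuity is trivial in finite dimensions, so only linearity and nonnegativity are checked:

```python
import numpy as np

def jordan(m):
    """Split a signed measure (here a vector of point masses) into m_+ and m_-."""
    return np.maximum(m, 0), np.minimum(m, 0)

def is_sa_measure(m, b):
    """(m, b) is an sa-measure iff b >= 0 and b + m_-(1) >= 0."""
    _, m_neg = jordan(m)
    return b >= 0 and b + m_neg.sum() >= 0

def f_plus(f, m, b):
    """The functional induced by f: (m, b) -> m(f) + b."""
    return np.dot(m, f) + b

f = np.array([0.2, 0.9, 0.5])            # an f in C(X, [0,1])
m, b = np.array([0.4, -0.3, 0.7]), 0.5   # b + m_-(1) = 0.5 - 0.3 >= 0

assert is_sa_measure(m, b)
assert f_plus(f, m, b) >= 0              # nonnegativity, as in the proof

m2, b2 = np.array([0.1, 0.2, 0.0]), 0.1  # another sa-measure
lhs = f_plus(f, 2 * m + 3 * m2, 2 * b + 3 * b2)
rhs = 2 * f_plus(f, m, b) + 3 * f_plus(f, m2, b2)
assert abs(lhs - rhs) < 1e-12            # linearity
```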
Theorem 1: Every positive functional on Msa(X) can be written as (m,b)↦c(m(f)+b), where c≥0, and f∈C(X,[0,1])
Proof Sketch: The first part is showing that it's impossible to have a positive functional where the b term doesn't matter, without the positive functional being the one that maps everything to 0. The second part of the proof is recovering our f by applying the positive functional to Dirac-delta measures δx, to see what the function must be on point x.
Part 1: Let's say f+ isn't 0, ie there's some nonzero (m,b) pair where f+(m,b)>0, and yet f+(0,1)=0 (which, by linearity, means that f+(0,b)=0 for all b). We'll show that this situation is impossible.
Then, 0<f+(m,b)=f+(m+,0)+f+(m−,b) by our starting assumption, and Jordan decomposition of m, along with linearity of positive functionals. Now, f+(m−,b)+f+(−2(m−),0)=f+(−(m−),b) because positive functionals are linear, and everything in that above equation is an sa-measure (flipping a negative measure makes a positive measure, which doesn't impose restrictions on the b term except that it be ≥0). And so, by nonnegativity of positive functionals on sa-measures, f+(m−,b)≤f+(−(m−),b). Using this, we get
f+(m+,0)+f+(m−,b)≤f+(m+,0)+f+(−(m−),b)
=f+(m+,0)+f+(−(m−),0)+f+(0,b)=f+(m+,0)+f+(−(m−),0)
Another use of linearity was invoked for the first = in the second line, and then the second = made use of our assumption that f+(0,b)=0 for all b.
At this point, we have derived that 0<f+(m+,0)+f+(−(m−),0). Both of these are positive measures. So, there exists some positive measure m′ where f+(m′,0)>0.
Now, observe that, for all b, 0=f+(0,b)=f+(m′,0)+f+(−(m′),b)
Let b be sufficiently huge to make (−(m′),b) into an sa-measure. Also, since f+(m′,0)>0, f+(−(m′),b)<0, which is impossible because positive functionals are nonnegative on all sa-measures. Contradiction. Due to the contradiction, if there's a nonzero positive functional, it must assign f+(0,1)>0, so let f+(0,1) be our c term.
Proof part 2: Let's try to extract our f. Let f(x):=f+(δx,0)/f+(0,1). This is just recovering the value of the hypothesized f on x by feeding our positive functional the measure δx that assigns 1 value to x and nothing else, and scaling. Now, we just have to verify that this f is continuous and in [0,1].
For continuity, let xn limit to x. By the KR-metric we're using, (δxn,0) limits to (δx,0). By continuity of f+, f+(δxn,0) limits to f+(δx,0). Therefore, f(xn) limits to f(x) and we have continuity.
For a lower bound, f≥0, because f(x) is a ratio of two nonnegative numbers, and the denominator isn't 0.
Now we just have to show that f≤1. For contradiction, assume there's an x where f(x)>1. Then f+(δx,0)/f+(0,1)>1, so f+(δx,0)>f+(0,1), and in particular, f+(0,1)−f+(δx,0)<0.
But then, f+(−(δx),1)+f+(δx,0)=f+(0,1), so f+(−(δx),1)=f+(0,1)−f+(δx,0)<0
However, (−(δx),1) is an sa-measure, because −δx(1)+1=0, and must have nonnegative value, so we get a contradiction. Therefore, f∈C(X,[0,1]).
To wrap up, we can go:
f+(m,b)=f+(m,0)+f+(0,b)=(f+(0,1)/f+(0,1))(∫X(f+(δx,0))dm+f+(0,b))
=f+(0,1)(∫X(f+(δx,0)/f+(0,1))dm+f+(0,b)/f+(0,1))=c(∫Xf(x)dm+b)=c(m(f)+b)
And c≥0, and f∈C(X,[0,1]), so we're done.
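And here's the recovery step of Theorem 1 in the same toy finite-space setting: a functional that's secretly c(m(f)+b) gives back f pointwise when fed Dirac measures (one-hot vectors here) and rescaled by its value on (0,1). The hidden c and f are our own illustrative choices:

```python
import numpy as np

# Hidden parameters of the positive functional; the recovery should find f.
c, f = 2.0, np.array([0.2, 0.9, 0.5])
f_plus = lambda m, b: c * (np.dot(m, f) + b)

# f(x) := f+(delta_x, 0) / f+(0, 1), exactly as in the proof.
recovered = np.array([f_plus(np.eye(3)[x], 0.0) for x in range(3)]) / f_plus(np.zeros(3), 1.0)
assert np.allclose(recovered, f)
```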
Lemma 1: Compactness Lemma: Fixing some nonnegative constants λ◯ and b◯, the set of sa-measures where m+(1)∈[0,λ◯], b∈[0,b◯], is compact. Further, if a set lacks an upper bound on m+(1) or on b, it's not compact.
Proof Sketch: We fix an arbitrary sequence of sa-measures, and then use the fact that closed intervals are compact and the space ΔX is a compact complete metric space to isolate a suitable convergent subsequence. Since all sequences have a limit point, the set is compact. Then, we go in the other direction, and get a sequence with no limit points assuming either a lack of upper bounds on m+(1), or a lack of upper bounds on b.
Proof: Fix some arbitrary sequence Mn wandering about within this space, which breaks down into (m+n,0)+(m−n,bn), and then, since all measures are just a probability distribution scaled by the constant m(1), it further breaks down into (m+n(1)⋅μn,0)+(m−n(1)⋅μ′n,bn). Since bn+m−n(1)≥0, m−n(1) must be bounded in [−b◯,0].
Now, what we can do is extract a subsequence where bn, m+n(1), m−n(1), μn, and μ′n all converge, by Tychonoff's Theorem (finite product, no axiom of choice required). Our three number sequences are all confined to a bounded interval, and our two probability sequences are wandering around within ΔX, which is a compact complete metric space if X is. The limit of this subsequence is a limit point of the original sequence, since all its components are arbitrarily close to the components that make up Mn for large enough n in our subsequence.
The limiting value of m+(1) and b both obey their respective bounds, and the cone of sa-measures is closed, so the limit point is an sa-measure and respects the bounds too. Therefore the set is compact, because all sequences of points in it have a limit point.
In the other direction, assume a set B has unbounded b values. Then we can fix a sequence (mn,bn)∈B where bn increases without bound, so the sa-measures can't converge. The same applies to all subsequences, so there's no limit point, so B isn't compact.
Now, assume a set B has bounded b values, call the least upper bound b⊙, but the value of m+(1) is unbounded. Fix a sequence (mn,bn)∈B where m+n(1) is unbounded above. Assume a convergent subsequence exists. Since bn+m−n(1)≥0, m−n(1) must be bounded in [−b⊙,0]. Then, because mn(1)=m+n(1)+m−n(1)≥m+n(1)−b⊙, and b⊙ is finite while m+n(1) is unbounded above, mn(1) must be unbounded above. However, in order for the mn to limit to some m, limn→∞mn(1)=m(1), which results in a contradiction. Therefore, said convergent subsequence doesn't exist, and B is not compact.
Put together, we have a necessary-and-sufficient condition for a closed subset of Msa(X) to be compact: there must be upper bounds on b and on m+(1).
Lemma 2: The upper completion of a closed set of sa-measures is closed.
Proof sketch: We'll take a convergent sequence (mn,bn) in the upper completion of B that limits to (m,b), and show that, in order for it to converge, the same sorts of bounds as the Compactness Lemma uses must apply. Then, breaking down (mn,bn) into (mBn,bBn)+(m∗n,b∗n), where (mBn,bBn)∈B, and (m∗n,b∗n) is an sa-measure, we'll transfer these Compactness-Lemma-enabling bounds to the sequences (mBn,bBn) and (m∗n,b∗n), to get that they're both wandering around in a compact set. Then, we just take a convergent subsequence of both, add the two limit points together, and get our limit point (m,b), witnessing that it's in the upper completion of B.
Proof: Let (mn,bn)∈B+Msa(X) limit to some (m,b). A convergent sequence (plus its one limit point) is a compact set of points, so, by the Compactness Lemma, there must be a b◯ and λ◯ that are upper bounds on the bn and m+n(1) values, respectively.
Now, for all n, break down (mn,bn) as (mBn,bBn)+(m∗n,b∗n), where (mBn,bBn)∈B, and (m∗n,b∗n) is an sa-measure.
Because bBn+b∗n=bn≤b◯, we can bound the bBn and b∗n quantities by b◯. This transfers into a −b◯ lower bound on mB−n(1) and m∗−n(1), respectively.
Now, we can go:
mB+n(1)+mB−n(1)+m∗+n(1)+m∗−n(1)=mBn(1)+m∗n(1)=mn(1)
=m+n(1)+m−n(1)≤m+n(1)≤λ◯
Using worst-case values for mB−n(1) and m∗−n(1), we get:
mB+n(1)+m∗+n(1)−2b◯≤λ◯
mB+n(1)+m∗+n(1)≤λ◯+2b◯
So, we have an upper bound of λ◯+2b◯ on both mB+n(1) and m∗+n(1).
Due to the sequences (mBn,bBn) and (m∗n,b∗n) respecting bounds on b and m+(1) (b◯ and λ◯+2b◯ respectively), and wandering around within the closed sets B and Msa(X) respectively, we can use the Compactness Lemma and Tychonoff's theorem (finite product, no axiom of choice needed) to go "hey, there's a subsequence where both (mBn,bBn) and (m∗n,b∗n) converge, call the limit points (mB,bB) and (m∗,b∗). Since B and Msa(X) are closed, (mB,bB)∈B, and (m∗,b∗)∈Msa(X)."
Now, does (mB,bB)+(m∗,b∗)=(m,b)? Well, for any ϵ, there's some really large n where d((mBn,bBn),(mB,bB))<ϵ, d((m∗n,b∗n),(m∗,b∗))<ϵ, and d((mn,bn),(m,b))<ϵ. Then, we can go:
d((m,b),(mB,bB)+(m∗,b∗))≤d((m,b),(mn,bn))+d((mn,bn),(mB,bB)+(m∗,b∗))
=d((m,b),(mn,bn))+d((mBn,bBn)+(m∗n,b∗n),(mB,bB)+(m∗,b∗))
=d((m,b),(mn,bn))+||((mBn,bBn)+(m∗n,b∗n))−((mB,bB)+(m∗,b∗))||
=d((m,b),(mn,bn))+||((mBn,bBn)−(mB,bB))+((m∗n,b∗n)−(m∗,b∗))||
≤d((m,b),(mn,bn))+||(mBn,bBn)−(mB,bB)||+||(m∗n,b∗n)−(m∗,b∗)||
=d((m,b),(mn,bn))+d((mBn,bBn),(mB,bB))+d((m∗n,b∗n),(m∗,b∗))<3ϵ
So, regardless of ϵ, d((m,b),(mB,bB)+(m∗,b∗))<3ϵ, so (mB,bB)+(m∗,b∗)=(m,b). So, we've written (m,b) as a sum of an sa-measure in B and an sa-measure, certifying that (m,b)∈B+Msa(X), so B+Msa(X) is closed.
Proposition 2: For closed convex nonempty B, B+Msa(X)={M|∀f+∃M′∈B:f+(M)≥f+(M′)}
Proof sketch: Show both subset inclusion directions. One is very easy; then we assume the second direction is false, and invoke the Hahn-Banach theorem to separate a point in the latter set from the former set. Then we show that the separating functional is a positive functional, so we have a positive functional where the additional point underperforms everything in B+Msa(X), which is impossible by the definition of the latter set.
Easy direction: We will show that B+Msa(X)⊆{M|∀f+∃M′∈B:f+(M)≥f+(M′)}
This is because an M∈B+Msa(X) can be written as M=MB+M∗. Let MB be our M′ of interest. Then, it is indeed true that for all f+, f+(M)=f+(MB)+f+(M∗)≥f+(MB)
Hard direction: Assume by contradiction that
B+Msa(X)⊂{M|∀f+∃M′∈B:f+(M)≥f+(M′)}
Then there's some M where ∀f+∃M′∈B:f+(M)≥f+(M′) and M∉B+Msa(X). B+Msa(X) is the upper completion of a closed set, so by Lemma 2, it's closed, and since it's the Minkowski sum of convex sets, it's convex.
Now, we can use the variant of the Hahn-Banach theorem from the Wikipedia article on "Hahn-Banach theorem", in the "separation of a closed and compact set" section. Our single point M is compact, convex, nonempty, and disjoint from the closed convex set B+Msa(X). Banach spaces are locally convex, so we can invoke Hahn-Banach separation.
Therefore, there's some continuous linear functional ϕ s.t. ϕ(M)<infM′∈(B+Msa(X))ϕ(M′)
We will show that this linear functional is actually a positive functional!
Assume there's some sa-measure M∗ where ϕ(M∗)<0. Then we can pick a random MB∈B, and consider ϕ(MB+cM∗), where c is extremely large. MB+cM∗ lies in B+Msa(X), but it would also produce an extremely negative value for ϕ, which undershoots ϕ(M), which is impossible. So ϕ is a positive functional.
However, ϕ(M)<infM′∈(B+Msa(X))ϕ(M′), so ϕ(M)<infM′∈Bϕ(M′). But also, M fulfills the condition ∀f+∃M′∈B:f+(M)≥f+(M′), because of the set it came from. So, there must exist some M′∈B where ϕ(M)≥ϕ(M′). But, we have a contradiction, because ϕ(M)<infM′∈Bϕ(M′).
So, there cannot be any point in {M|∀f+∃M′∈B:f+(M)≥f+(M′)} that isn't in B+Msa(X). This establishes equality.
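Here's a toy numeric check of this equality on a hypothetical 2-point space, with B a single point (which is closed, convex, and nonempty) and the sup over f approximated by a grid; all names and numbers are ours. Membership in B+Msa(X) agrees with the "scores at least the infimum over B for every f" test:

```python
import itertools
import numpy as np

def is_sa(m, b):
    """(m, b) is an sa-measure iff b >= 0 and b + m_-(1) >= 0 (with float slack)."""
    return b >= -1e-12 and b + m[m < 0].sum() >= -1e-12

M0 = (np.array([0.5, 0.5]), 0.1)   # B = {M0}, a single closed convex point
grid = [np.array(f) for f in itertools.product(np.linspace(0, 1, 21), repeat=2)]

def dual_test(m, b):
    """For every f in the grid, m(f)+b should be at least the inf over B."""
    return all(np.dot(m, f) + b >= np.dot(M0[0], f) + M0[1] - 1e-9 for f in grid)

inside  = (np.array([0.4, 0.6]), 0.35)  # M0 + ((-0.1, 0.1), 0.25), an sa-measure
outside = (np.array([0.4, 0.6]), 0.0)   # b term too small to dominate M0
assert is_sa(inside[0] - M0[0], inside[1] - M0[1]) and dual_test(*inside)
assert not is_sa(outside[0] - M0[0], outside[1] - M0[1]) and not dual_test(*outside)
```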
Lemma 3: For any closed set B⊆Msa(X) and point M∈B, the set ({M}−Msa(X))∩B is nonempty and compact.
Proof: It's easy to verify nonemptiness, because M is in the set. Also, it's closed because it's the intersection of two closed sets. B was assumed closed, and the other part is the Minkowski sum of {M} and −Msa(X), which is closed if −Msa(X) is, because it's just a shift of −Msa(X) (via a single point). −Msa(X) is closed because it's -1 times a closed set.
We will establish a bound on the m+(1) and b values of anything in the set, which lets us invoke the Compactness Lemma to show compactness, because it's a closed subset of a compact set.
Note that if M′∈({M}−Msa(X))∩B, then M′=M−M∗, so M′+M∗=M. Rewrite this as (m′,b′)+(m∗,b∗)=(m,b)
Because b′+b∗=b, we can bound b′ and b∗ by b. This transfers into a −b lower bound on m′−(1) and m∗−(1). Now, we can go:
m′+(1)+m′−(1)+m∗+(1)+m∗−(1)=m′(1)+m∗(1)=m(1)
=m+(1)+m−(1)≤m+(1)
Using worst-case values for m′−(1) and m∗−(1), we get:
m′+(1)+m∗+(1)−2b≤m+(1)
m′+(1)≤m′+(1)+m∗+(1)≤m+(1)+2b
So, we have an upper bound of m+(1)+2b on m′+(1), and an upper bound of b on b′. Further, (m′,b′) was arbitrary in ({M}−Msa(X))∩B, so we have our bounds. This lets us invoke the Compactness Lemma, and conclude that said closed set is compact.
Lemma 4: If ≥ is a partial order on B where M′≥M iff there's some sa-measure M∗ where M=M′+M∗, then
∃M′>M↔(M∈B∧∃M′≠M:M′∈({M}−Msa(X))∩B)↔M is not minimal in B
Proof: ∃M′>M↔∃M′≠M:M′≥M
Also, M′≥M↔(M′,M∈B∧∃M∗:M=M′+M∗)
Also, ∃M∗:M=M′+M∗↔∃M∗:M−M∗=M′↔M′∈({M}−Msa(X))
Putting all this together, we get
(∃M′>M)↔(M∈B∧∃M′≠M:M′∈({M}−Msa(X))∩B)
And we're halfway there. Now for the second half.
M is not minimal in B↔M∈B∧(∃M′∈B:M′≠M∧(∃M∗:M=M′+M∗))
Also, ∃M∗:M=M′+M∗↔∃M∗:M−M∗=M′↔M′∈({M}−Msa(X))
Putting this together, we get
M is not minimal in B↔(M∈B∧∃M′≠M:M′∈({M}−Msa(X))∩B)
And the result has been proved.
Theorem 2: Given a nonempty closed set B, the set of minimal points Bmin is nonempty and all points in B are above a minimal point.
Proof sketch: First, we establish a partial order that's closely tied to the ordering on B, but flipped around, so minimal points in B are maximal elements. We show that it is indeed a partial order, letting us leverage Lemma 4 to translate between the partial order and the set B. Then, we show that every chain in the partial order has an upper bound via Lemma 3 and compactness arguments, letting us invoke Zorn's lemma to show that everything in the partial order is below a maximal element. Then, we just do one last translation to show that minimal points in B perfectly correspond to maximal elements in our partial order.
Proof: first, impose a partial order on B, where M′≥M iff there's some sa-measure M∗ where M=M′+M∗. Notice that this flips the order. If an sa-measure is "below" another sa-measure in the sa-measure addition sense, it's above that sa-measure in this ordering. So a minimal point in B would be maximal in the partial order. We will show that it's indeed a partial order.
Reflexivity is immediate. M=M+(0,0), so M≥M.
For transitivity, assume M′′≥M′≥M. Then there's some M∗ and M′∗ s.t. M=M′+M∗, and M′=M′′+M′∗. Putting these together, we get M=M′′+(M∗+M′∗), and adding sa-measures gets you an sa-measure, so M′′≥M.
For antisymmetry, assume M′≥M and M≥M′. Then M=M′+M∗, and M′=M+M′∗. By substitution, M=M+(M∗+M′∗), so M′∗=−M∗. For all positive functionals, f+(M′∗)=f+(−M∗)=−f+(M∗), and since positive functionals are always nonnegative on sa-measures, the only way this can happen is if M∗ and M′∗ are 0, showing that M=M′.
Anyways, since we've shown that it's a partial order, all we now have to do is show that every chain has an upper bound in order to invoke Zorn's lemma to show that every point in B lies below some maximal element.
Fix some ordinal-indexed chain Mγ, and associate each of them with the set Sγ=({Mγ}+(−Msa(X)))∩B, which is compact by Lemma 3 and always contains Mγ.
The collection of Sγ also has the finite intersection property, because, fixing finitely many of them, we can consider a maximal γ∗, and Mγ∗ is in every associated set by:
Case 1: Some other Mγ equals Mγ∗, so Sγ=Sγ∗ and Mγ∗∈Sγ∗=Sγ.
Case 2: Mγ∗>Mγ, and by Lemma 4, Mγ∗∈({Mγ}−Msa(X))∩B.
Anyways, since all the Sγ are compact, and have the finite intersection property, we can intersect them all and get a nonempty set containing some point M∞. M∞ lies in B, because all the sets we intersected were subsets of B. Also, because M∞∈(Mγ−Msa(X))∩B for all γ in our chain, then if M∞≠Mγ, Lemma 4 lets us get M∞>Mγ, and if M∞=Mγ, then M∞≥Mγ. Thus, M∞ is an upper bound for our chain.
By Zorn's Lemma, because every chain has an upper bound, there are maximal elements in B, and every point in B has a maximal element above it.
To finish up, use Lemma 4 to get: M is maximal↔¬∃M′>M↔M is minimal in B
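As a sanity check of what minimality means here, a sketch on a hypothetical 2-point space with a finite toy B: a point is non-minimal exactly when subtracting some other point of B leaves a nonzero sa-measure. The set B and all names are our own illustrative choices:

```python
import numpy as np

def is_sa(m, b):
    """(m, b) is an sa-measure iff b >= 0 and b + m_-(1) >= 0 (with float slack)."""
    return b >= -1e-12 and b + m[m < 0].sum() >= -1e-12

B = [(np.array([0.5, 0.5]), 0.0),
     (np.array([0.5, 0.5]), 0.3),   # = first point + (0, 0.3): not minimal
     (np.array([0.7, 0.6]), 0.1)]   # = first point + ([0.2, 0.1], 0.1): not minimal

def minimal(B):
    out = []
    for m, b in B:
        # M is non-minimal iff M = M' + (nonzero sa-measure) for some M' in B.
        dominated = any(is_sa(m - m2, b - b2) and (b != b2 or (m != m2).any())
                        for m2, b2 in B)
        if not dominated:
            out.append((m, b))
    return out

print(minimal(B))   # only the first point survives
```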
Proposition 3: Given an f∈C(X,[0,1]) and a nonempty closed B, inf(m,b)∈B(m(f)+b)=inf(m,b)∈Bmin(m(f)+b)
Direction 1: since Bmin is a subset of B, we get one direction easily, that
inf(m,b)∈B(m(f)+b)≤inf(m,b)∈Bmin(m(f)+b)
Direction 2: Take an M∈B. By Theorem 2, there is an Mmin∈Bmin s.t. M=Mmin+M∗. Applying our positive functional m(f)+b (a positive functional by Proposition 1), we get that m(f)+b≥mmin(f)+bmin. Because every point in B has a point in Bmin which scores as low or lower according to the positive functional,
inf(m,b)∈B(m(f)+b)≥inf(m,b)∈Bmin(m(f)+b)
And this gives us our desired equality.
Proposition 4: Given a nonempty closed convex B, Bmin=(Buc)min and (Bmin)uc=Buc
Proof: First, we'll show Bmin=(Buc)min. We'll use the characterization in terms of the partial order ≤ we used for the Zorn's Lemma proof of Theorem 2. If a point M is in Buc, then it can be written as M=MB+M∗, so M≤MB. Since all points added in Buc lie below a preexisting point in B (according to the partial order from Theorem 2) the set of maximals (ie, set of minimal points) is completely unchanged when we add all the new points to the partial order via upper completion, so Bmin=(Buc)min.
For the second part, one direction is immediate. Bmin⊆B, so (Bmin)uc⊆Buc. For the reverse direction, take a point M∈Buc. It can be decomposed as MB+M∗, and then by Theorem 2, MB can be decomposed as Mmin+M′∗, so M=Mmin+(M∗+M′∗), so it lies in (Bmin)uc, and we're done.
Theorem 3: If the nonempty closed convex sets A and B have Amin≠Bmin, then there is some f∈C(X,[0,1]) where EA(f)≠EB(f)
Proof sketch: We show that upper completion is idempotent, and then use that to show that the upper completions of A and B are different. Then, we can use Hahn-Banach to separate a point of A from Buc (or vice-versa), and show that the separating functional is a positive functional. Finally, we use Theorem 1 to translate from a separating positive functional to different expectation values of some f∈C(X,[0,1])
Proof: Phase 1 is showing that upper completion is idempotent. (Buc)uc=Buc. One direction of this is easy, Buc⊆(Buc)uc. In the other direction, let M∈(Buc)uc. Then we can decompose M into M′+M∗, where M′∈Buc, and decompose that into MB+M′∗ where MB∈B, so M=MB+(M∗+M′∗) and M∈Buc.
Now for phase 2, we'll show that the minimal points of one set aren't in the upper completion of the other set. Assume, for contradiction, that this is false, so Amin⊆Buc and Bmin⊆Auc. Then, by idempotence, Proposition 4, and our subset assumption,
Auc=(Amin)uc⊆(Buc)uc=Buc
Swapping the A and B, the same argument holds, so Auc=Buc, so (Buc)min=(Auc)min.
Now, using this and Proposition 4, Bmin=(Buc)min=(Auc)min=Amin.
But wait, we have a contradiction: we said that the minimal points of B and A weren't the same! Therefore, either Bmin⊈Auc, or vice-versa. Without loss of generality, assume that Bmin⊈Auc.
Now for phase 3, Hahn-Banach separation to get a positive functional with different inf values. Take a point MB in Bmin that lies outside Auc. Now, use the Hahn-Banach separation of {MB} and Auc used in the proof of Proposition 2, to get a linear functional ϕ (which can be demonstrated to be a positive functional by the same argument as the proof of Proposition 2) where: ϕ(MB)<infM∈Aucϕ(M). Thus, infM∈Bϕ(M)<infM∈Aϕ(M), so infM∈Bϕ(M)≠infM∈Aϕ(M)
Said positive functional can't be 0, otherwise both sides would be 0. Thus, by Theorem 1, ϕ((m,b))=a(m(f)+b) where a>0, and f∈C(X,[0,1]). Swapping this out, we get:
inf(m,b)∈Ba(m(f)+b)≠inf(m′,b′)∈Aa(m′(f)+b′)
inf(m,b)∈B(m(f)+b)≠inf(m′,b′)∈A(m′(f)+b′)
and then this is EB(f)≠EA(f). So, we have crafted our f∈C(X,[0,1]) which distinguishes the two sets, and we're done.
Corollary 1: If two nonempty closed convex upper-complete sets A and B are different, then there is some f∈C(X,[0,1]) where EA(f)≠EB(f)
Proof: Either Amin≠Bmin, in which case we can apply Theorem 3 to separate them, or their sets of minimal points are the same. In that case, by Proposition 4 and upper completion, A=Auc=(Amin)uc=(Bmin)uc=Buc=B and we have a contradiction because the two sets are different.
Theorem 4: If H is an infradistribution/bounded infradistribution, then h:f↦EH(f) is concave in f, monotone, uniformly continuous/Lipschitz, h(0)=0, h(1)=1, and if range(f)⊈[0,1], h(f)=−∞
Proof sketch: h(0)=0, h(1)=1 is trivial, as is uniform continuity from the weak bounded-minimal condition. For concavity and monotonicity, it's just some inequality shuffling, and for h(f)=−∞ if f∈C(X), f∉C(X,[0,1]), we use upper completion to have its worst-case value be arbitrarily negative. Lipschitzness is much more difficult, and comprises the bulk of the proof. We get a duality between minimal points and hyperplanes in C(X)⊕R, show that all the hyperplanes we got from minimal points have the same Lipschitz constant upper bound, and then show that the chunk of space below the graph of h itself is the same as the chunk of space below all the hyperplanes we got from minimal points. Thus, h has the same (or lesser) Lipschitz constant as all the hyperplanes chopping out stuff above the graph of h.
Proof: For normalization, h(1)=EH(1)=1 and h(0)=EH(0)=0 by normalization for H. Getting the uniform continuity condition from the weak-bounded-minimal condition on an infradistribution H is also trivial, because the condition just says f↦EH(f) is uniformly continuous, and that's just h itself.
Let's show that h is concave over C(X,[0,1]), first. We're shooting for h(pf+(1−p)f′)≥ph(f)+(1−p)h(f′). To show this,
h(pf+(1−p)f′)=inf(m,b)∈H(m(pf+(1−p)f′)+b)=inf(m,b)∈H(p(m(f)+b)+(1−p)(m(f′)+b))
≥p⋅inf(m,b)∈H(m(f)+b)+(1−p)⋅inf(m,b)∈H(m(f′)+b)=ph(f)+(1−p)h(f′)
For monotonicity, let f′≥f. By Proposition 3, we can compute expectations with the minimal points only, so
h(f′)=inf(m,b)∈Hmin(m(f′)+b)≥inf(m,b)∈Hmin(m(f)+b)=h(f)
And we're done. The critical inequality in the monotonicity argument came from all minimal points in an infradistribution having no negative component by positive-minimals, so swapping out a function for a greater function produces an increase in value.
Time for range(f)⊈[0,1]→h(f)=−∞. Let's say there exists an x s.t. f(x)>1. We can take an arbitrary sa-measure (m,b)∈H, and consider (m,b)+c(−δx,1), where δx is the point measure that's 1 on x, and c is extremely huge. The latter part is an sa-measure. But then, (m−cδx)(f)+(b+c)=m(f)+b+c(1−δx(f))=m(f)+b+c(1−f(x)). Since f(x)>1, and c is extremely huge, this is extremely negative. So, since there are sa-measures in H (by upper-completeness) that make the function as negative as we wish, inf(m,b)∈H(m(f)+b)=−∞. A very similar argument can be done if there's an x where f(x)<0; we just add in (cδx,0) to force arbitrarily negative values.
Now for Lipschitzness, which is by far the worst of all. A minimal point (m,b) induces an affine function hm,b (kinda like a hyperplane) of the form hm,b(f)=m(f)+b. Regardless of (m,b), as long as it came from a minimal point in H, hm,b≥h for functions with range in [0,1], because
hm,b(f)=m(f)+b≥inf(m,b)∈H(m(f)+b)=EH(f)=h(f)
Ok, so if a point is on-or-below the graph of h over C(X,[0,1]), then it's on-or-below the graph of hm,b for all (m,b)∈Hmin.
What about the other direction? Is it possible for a point (f,b′) to be strictly above the graph of h and yet ≤ all the graphs of hm,b? Well, no. Invoking Proposition 3,
b′>h(f)=inf(m,b)∈Hmin(m(f)+b)=inf(m,b)∈Hmin(hm,b(f))
So, there exists a minimal point (m,b)∈Hmin where b′>hm,b(f), so (f,b′) lies above the graph of hm,b.
Putting these two parts together, h's hypograph over C(X,[0,1]) is the same as the intersection of the hypographs of all these hm,b. If we can then show all the hm,b have a Lipschitz constant bounded above by some constant, then we get that h itself is Lipschitz with the same constant.
First, a minimal (m,b) must have m having no negative parts, so it can be written as λμ, and by bounded-minimals (since we have a bounded infradistribution), λ≤λ⊙. Now,
|hm,b(f)−hm,b(f′)|=|(m(f)+b)−(m(f′)+b)|=|m(f−f′)|=λ|μ(f−f′)|≤λ⊙supx∈X|f(x)−f′(x)|
So, we get that: |hm,b(f)−hm,b(f′)|/supx∈X|f(x)−f′(x)|≤(λ⊙supx∈X|f(x)−f′(x)|)/supx∈X|f(x)−f′(x)|=λ⊙
Note that supx∈X|f(x)−f′(x)| is our distance metric between functions in C(X). This establishes that regardless of which minimal point we picked, hm,b is Lipschitz with Lipschitz constant ≤λ⊙, and since h=inf(m,b)∈Hminhm,b, then h itself has the same bound on its Lipschitz constant.
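A quick numeric spot-check of this conclusion on a hypothetical 3-point space: if h(f) is the min over a finite set of a-measures of m(f)+b, then its Lipschitz constant (w.r.t. the sup metric) is bounded by the largest λ=m(1) that appears. The random toy set of a-measures is our own assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
H = [(rng.uniform(0, 1, 3), rng.uniform(0, 1)) for _ in range(5)]  # toy a-measures
lam_max = max(m.sum() for m, _ in H)   # largest lambda = m(1) in the toy set

def h(f):
    """h(f) = min over H of m(f) + b, the expectation functional."""
    return min(np.dot(m, f) + b for m, b in H)

for _ in range(1000):
    f, g = rng.uniform(0, 1, 3), rng.uniform(0, 1, 3)
    assert abs(h(f) - h(g)) <= lam_max * np.abs(f - g).max() + 1e-12
```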
Lemma 5: ∀m:inff∈C(X,[0,1])(m(f))=m−(1)
Proof sketch: We'll work in the Banach space L1(|m|) of L1 measurable functions w.r.t. the absolute value of the signed measure m. Then, we consider the discontinuous (but L1) function that's 1 everywhere where m is negative. Continuous functions are dense in L1 measurable functions, so we can fix a sequence of continuous functions limiting to said indicator function. Then we just have to check that f↦m(f) is a bounded linear functional, and we get that there's a sequence of continuous functions f′n where m(f′n) limits to the measure of the indicator function that's 1 where everything is negative. Which is the same as the measure of the "always 1" function, but only on the negative parts, and we're done.
Consider the Banach space L1(|m|) of measurable functions w.r.t. the absolute value of the signed measure m, ie, |m|=m+−m−, which is a measure. It has a norm given by ||f||=∫X|f|d|m|. To begin with, we can consider the L1 indicator function 1m− that's 1 where the measure is negative. Note that
m(1m−)=∫X1m−dm=∫X1m−dm++∫X1m−dm−
=∫X0dm++∫X1dm−=∫X1dm−=m−(1)
Because continuous functions are dense in L1, we can fix a sequence of continuous functions fn limiting to 1m−. Then, just clip those continuous functions to [0,1], making a continuous function f′n. They'll get closer to 1m− that way, so the sequence f′n of continuous functions X→[0,1] limits to 1m− too.
We'll take a detour and show that m is a bounded linear functional L1(|m|)→R, with a Lipschitz constant of 1 or less.
Fix an f with ||f||≤1. Then
|m(f)|=|∫Xfdm|≤∫X|f|d|m|=||f||≤1
So, m(f)∈[−1,1]. An f having a norm of 1 or less gets mapped to a number with a norm of 1 or less, so the Lipschitz constant of f↦m(f) is 1 or less. This implies continuity.
Now that we have all requisite components, fix some ϵ. There's some n where, for all greater n, d(1m−,f′n)<ϵ. Mapping them through f↦m(f), due to having a Lipschitz constant of 1 or less, then means that ϵ>|m(f′n)−m(1m−)|=m(f′n)−m(1m−)=m(f′n)−m−(1) because the value of 1-but-only-on-negative-parts is as-or-more negative than f′n on the measure, due to f′n being bounded in [0,1]. Summarizing, ϵ>m(f′n)−m−(1) for all n beyond a certain point, so, for all n beyond a certain point, m(f′n)<ϵ+m−(1)
So we have a sequence of functions in C(X,[0,1]) where m(f′n) limits to m−(1), and our signed measure was arbitrary. Therefore, we have our result that ∀m:inff∈C(X,[0,1])m(f)=m−(1).
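On a finite space, Lemma 5 needs no density argument, and the infimum is attained exactly by the indicator of the negative part; here's an illustrative sketch (the 4-point space and the numbers are ours):

```python
import numpy as np

m = np.array([0.4, -0.3, 0.7, -0.1])     # a toy signed measure
f_opt = (m < 0).astype(float)            # the indicator 1_{m-}
assert np.isclose(np.dot(m, f_opt), m[m < 0].sum())   # m(f_opt) = m_-(1) = -0.4

# And no f in [0,1]^X does better: each term m_i * f_i is at least min(m_i, 0).
rng = np.random.default_rng(0)
for _ in range(1000):
    f = rng.uniform(0, 1, 4)
    assert np.dot(m, f) >= m[m < 0].sum() - 1e-12
```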
Theorem 5: If h is a function C(X)→R that is concave, monotone, uniformly-continuous/Lipschitz, h(0)=0, h(1)=1, and range(f)⊈[0,1]→h(f)=−∞, then it specifies an infradistribution/bounded infradistribution by: {(m,b)|b≥(h′)∗(m)}, where h′ is the function given by h′(−f)=−h(f), and (h′)∗ is the convex conjugate of h′. Also, going from an infradistribution to an h and back recovers exactly the infradistribution, and going from an h to an infradistribution and back recovers exactly h.
Proof sketch: This is an extremely long one. Phases 1 and 2 are showing isomorphism. One direction is reshuffling the definition of H until we get the definition of the set built from h′ via convex conjugate, showing that going from H to h and back recovers your original set. In the other direction, we show that expectations w.r.t. the set we built from h match up with h exactly.
Phase 3 is cleanup of the easy conditions. Nonemptiness is pretty easy to show, the induced set being a set of sa-measures is harder to show and requires moderately fancier arguments, and closure and convexity require looking at basic properties of functions and the convex conjugate. Upper completeness takes some equation shuffling to show but isn't too bad. The weak-minimal bound property is immediate, and normalization is fairly easy.
That just leaves the positive-minimal property and the bounded-minimal properties, respectively, which are nightmares. A lesser nightmare and a greater nightmare. For phase 4 to lay the groundwork for these, we establish an isomorphism between points in H and hyperplanes which lie above the graph of h, as well as a way of certifying that a point in H isn't minimal by what its hyperplane does.
Phase 5 is, for showing positive-minimals, we can tell whether a hyperplane corresponds to an a-measure, and given any hyperplane above the graph of h, construct a lower one that corresponds to a lower point in H that does correspond to an a-measure.
Phase 6 is, for bounded-minimals, we take a hyperplane that may correspond to a minimal point, but which is too steep in certain directions. Then, we make an open set that fulfills the two roles of: if you enter it, you're too steep, or you overshoot the hyperplane of interest that you're trying to undershoot. Some fancy equation crunching and one application of Hahn-Banach later, we get a hyperplane that lies above h and doesn't enter our open set we crafted. So, in particular, it undershoots our hyperplane of interest, and isn't too steep. This certifies that our original "too steep" hyperplane didn't actually correspond to a minimal point, so all minimal points must have a bound on their λ values by the duality between hyperplanes above h and points in H.
Fix the convention that supf or inff is assumed to mean f∈C(X); we'll explicitly specify when f has bounds.
Phase 1: Let's show isomorphism. Our first direction is showing H to h and back is H exactly. By upper completion, and Proposition 2, we can also characterize H as
{M|∀f+∃M′∈H:f+(M)≥f+(M′)}
Using Theorem 1 to express all positive functionals as arising from an f∈C(X,[0,1]), and observing that the constant a in front doesn't change which stuff scores lower than which other stuff, so we might as well characterize everything in terms of f, H can also be expressed as
{(m,b)|∀f∈C(X,[0,1]):m(f)+b≥inf(m′,b′)∈H(m′(f)+b′)}
We can swap out C(X,[0,1]) for C(X), because, from the −∞ argument in Theorem 4, f going outside [0,1] means that inf(m′,b′)∈H(m′(f)+b′)=−∞. And then, our H can further be reexpressed as
{(m,b)|∀f:m(f)+b≥EH(f)}={(m,b)|∀f:b≥EH(f)−m(f)}
={(m,b)|b≥supf(EH(f)−m(f))}
Also, EH(f)=h(f)=−h′(−f), so we can rewrite this as:
{(m,b)|b≥supf(−h′(−f)−m(f))}={(m,b)|b≥supf(m(−f)−h′(−f))}
and, by the definition of the convex conjugate (sup characterization) and the space of finite signed measures being the dual space of C(X), and m(f) being a functional applied to an element, this is {(m,b)|b≥(h′)∗(m)}. So, our original set H is identical to the convex-conjugate set, when we go from H to h back to a set of sa-measures.
Proof Phase 2: In the reverse direction for isomorphism, assume that h fulfills the conditions. We want to show that E{(m,b)|b≥(h′)∗(m)}(f)=h(f), so let's begin.
Given an m, we have a natural candidate for minimizing the b, just set it equal to (h′)∗(m). So then we get infm(m(f)+(h′)∗(m))=infm((h′)∗(m)−m(−f))
And this is just... −(h′)∗∗(−f) (proof by Wikipedia article, check the inf characterization), and, because h is continuous over C(X,[0,1]), and concave, and −∞ everywhere outside the legit functions then h′ is continuous over C(X,[−1,0]), and convex, and ∞ everywhere outside the legit functions, so in particular, h′ is convex and lower-semicontinuous and proper, so h′=(h′)∗∗ by the Fenchel-Moreau Theorem. From that, we get
E{(m,b)|b≥(h′)∗(m)}(f)=−(h′)∗∗(−f)=−h′(−f)=h(f)
and we're done with isomorphism. Now that isomorphism has been established, let's show the relevant conditions hold. Namely, nonemptiness, closure, convexity, upper completion, normality, weak-bounded-minimals (phase 3) and positive-minimals (phase 5) and bounded-minimals (assuming h is Lipschitz) (phase 6) to finish off. The last two will be extremely hard.
Begin phase 3. Weak-bounded-minimals is easy by isomorphism. For our H′ we constructed, if f↦EH′(f) wasn't uniformly continuous, then because EH′(f) equals h(f), we'd get a failure of uniform continuity for h, contradicting our assumption that h is uniformly continuous.
By the way, the convex conjugate, (h′)∗(m), can be expressed as (by Wikipedia, sup characterization) supf(m(f)−h′(f))=supf(m(−f)−h′(−f))=supf(h(f)−m(f)). We can further restrict f to functions with range in [0,1], because if it was anything else, we'd get −∞. We'll be using (h′)∗(m)=supf∈C(X,[0,1])(h(f)−m(f)) (or the supf variant) repeatedly.
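As a toy numeric check of this sup characterization (a grid search over f∈[0,1]^X standing in for the true sup, on a hypothetical 2-point space with an H of our own choosing): every (m,b)∈H should satisfy b≥(h′)∗(m), since h(f)−m(f)≤b for every f.

```python
import itertools
import numpy as np

H = [(np.array([0.5, 0.5]), 0.0), (np.array([1.0, 0.0]), 0.2)]  # toy a-measures
grid = np.linspace(0, 1, 101)

def h(f):
    """h(f) = min over H of m(f) + b."""
    return min(np.dot(m, f) + b for m, b in H)

def conj(m):
    """(h')*(m) = sup over f in C(X,[0,1]) of h(f) - m(f), via grid search."""
    return max(h(np.array(f)) - np.dot(m, f)
               for f in itertools.product(grid, grid))

for m, b in H:
    assert b >= conj(m) - 1e-9   # every point of H lies in {(m,b) | b >= (h')*(m)}
```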
For nonemptiness, observe that (0,1) is present in H′ because, fixing an arbitrary f,
h(f)−0(f)=h(f)≤h(1)=1
This is from our format of the convex conjugate, and h being normalized and monotone, so the highest it can be is 1 and it attains that value. Therefore, 1≥(h′)∗(0), so (0,1) is in the H′ we constructed.
For showing that our constructed set H′ lies in Msa(X), we have that, for a random (m++m−,b)∈H′, it has (by our characterization of (h′)∗(m))
b+m−(1)≥supf∈C(X,[0,1])(h(f)−(m++m−)(f))+m−(1)
≥supf∈C(X,[0,1])(−(m++m−)(f))+m−(1)
=m−(1)−inff∈C(X,[0,1])((m++m−)(f))=m−(1)−m−(1)=0
This is by the lower bound on b being (h′)∗(m++m−) and unpacking the convex conjugate, h(f)≥h(0)=0 by monotonicity and normalization, a reexpression of sup, and Lemma 5, respectively. b+m−(1)≥0 so it's an sa-measure.
For closure and convexity, by monotonicity of h, we have 0=−h(0)≥−h(f)≥−h(1)=−1, and h is continuous on C(X,[0,1]), concave, and −∞ everywhere else by assumption, so h′ is proper, continuous on C(X,[−1,0]), convex, and lower-semicontinuous in general because of the ∞ everywhere else. So, by the Wikipedia page on "Closed Convex Function", h′ is a closed convex function, and then by the Wikipedia page on "Convex Conjugate", in the Properties section, (h′)∗ is convex and closed. From the Wikipedia page on "Closed Convex Function", this means that the epigraph of (h′)∗ is closed, and also the epigraph of a convex function is convex. This takes care of closure and convexity for our H′.
Time for upper-completeness. Assume that (m,b) lies in the epigraph. Our task now is to show that (m,b)+(m∗,b∗) lies in the epigraph. This is equivalent to showing that b+b∗≥(h′)∗(m+m∗). Note that b∗≥−m∗−(1), because (m∗,b∗) is an sa-measure. Let's begin.
(h′)∗(m+m∗)=supf∈C(X,[0,1])(h(f)−(m+m∗)(f))=supf∈C(X,[0,1])(h(f)−m(f)−m∗+(f)−m∗−(f))
≤supf∈C(X,[0,1])(h(f)−m(f)+0−m∗−(1))=(h′)∗(m)−m∗−(1)≤b+b∗
This was done by unpacking the convex conjugate, splitting up m∗ into m∗+ and m∗−, locking two of the components in the sup to be an upper bound (−m∗+(f)≤0 and −m∗−(f)≤−m∗−(1), which also gives the sup more flexibility on maximizing the other two components, so this is greater), packing up the convex conjugate, and using that b≥(h′)∗(m) because (m,b)∈H′ and b∗≥−m∗−(1) because (m∗,b∗) is an sa-measure.
Normalization of the resulting set is easy. Going from h to a (maybe)-inframeasure H′ back to h is identity as established earlier, so all we have to do is show that a failure of normalization in a (maybe)-inframeasure makes the resulting h not normalized. Thus, if our h is normalized, and it makes an H′ that isn't normalized, then going back makes a non-normalized h, which contradicts isomorphism. So, assume there's a failure of normalization in H′. Then EH′(0)≠0, or EH′(1)≠1, so either h(0)≠0 or h(1)≠1 and we get a failure of normalization for h which is impossible. So H′ must be normalized.
Begin phase 4. First, continuous affine functionals ϕ that lie above the graph of h perfectly correspond to sa-measures in H′. This is because the continuous dual space of C(X) is the space of finite signed measures, so we can interpret ϕ−ϕ(0) as a finite signed measure, and ϕ(0) as the b term. In one direction, given an (m,b)∈H′,
ϕ(f)=m(f)+b≥inf(m,b)∈H′(m(f)+b)=EH′(f)=h(f)
so every point in H′ induces a continuous affine functional C(X)→R whose graph is above h.
In the other direction, from earlier, we can describe H′ as: {(m,b)|b≥supf(h(f)−m(f))}
and then, for (ϕ−ϕ(0),ϕ(0)),
supf(h(f)−(ϕ−ϕ(0))(f))=supf(h(f)−ϕ(f)+ϕ(0))≤ϕ(0)
because ϕ(f)≥h(f). So continuous affine functionals whose graph lies above the graph of h correspond to points in H′.
So, we have a link between affine functionals that lie above the graph of h, and points in H′. What would a minimal point correspond to? Well, a non-minimal point corresponds to (m,b)+(m∗,b∗), where the latter component is nonzero. There's some f+ where f+((m,b)+(m∗,b∗))>f+(m,b) due to the latter component being nonzero, and for all f+, f+((m,b)+(m∗,b∗))≥f+(m,b). Using Theorem 1 to translate positive functionals to f, this means that the ϕ induced by (m,b) lies below the affine functional induced by (m,b)+(m∗,b∗) over the f∈C(X,[0,1]). So, if there's a different affine functional ψ s.t. ∀f∈C(X,[0,1]):h(f)≤ψ(f)≤ϕ(f), then ϕ must correspond to a nonminimal point.
Further, we can characterize whether ϕ corresponds to an a-measure or not. For a measure, if you increase the function you're feeding in, you increase the value you get back out: f′≥f→ϕ(f′)≥ϕ(f). For a signed measure with some negative component, Lemma 5 says we can find some f′∈C(X,[0,1]) that attains a negative value, m(f′)<0, so you can add one of those f′ to your f and get ϕ(f+f′)<ϕ(f). So, a ϕ corresponds to an a-measure exactly when it's monotone.
Phase 5: Proving positive-minimals. With these links in place, this means we just have to take any old point that's an sa-measure in H′, get a ϕ from it, it'll fulfill certain properties, and use those properties to find a ψ that lies below ϕ and above h on C(X,[0,1]) and is monotone, certifying that ψ corresponds to a point below our minimal-point of interest that's still in H′ but is an a-measure, so we have a contradiction.
To that end, fix a ϕ that corresponds to some point in H′ that's not an a-measure (in particular, it has a negative component); it lies above the graph of h.
Now, translate ϕ to a (mϕ,bϕ), where bϕ=ϕ(0), and mϕ(f)=ϕ(f)−ϕ(0). ϕ is minimized at some f. Since our ϕ corresponds to something that's not an a-measure, (mϕ)−(1)<0
Let our affine continuous functional ψ be defined as ψ(f)=(mϕ)+(f)+ϕ(0)+(mϕ)−(1).
In order to show that ψ corresponds to an a-measure below (mϕ,bϕ) in H′, we need three things. One is that ψ is monotone (is an a-measure), two is that it lies below ϕ over C(X,[0,1]) and three is that it lies above h. Take note of the fact that ϕ(0)+(mϕ)−(1)≥0, because ϕ(0)=bϕ.
For monotonicity of ψ, it's pretty easy. If f′≥f, then
ψ(f′)=ψ(f+(f′−f))=(mϕ)+(f+(f′−f))+ϕ(0)+(mϕ)−(1)
≥(mϕ)+(f)+ϕ(0)+(mϕ)−(1)=ψ(f)
and we're done with that part.
For being less than or equal to ϕ over C(X,[0,1]) (we know it's not the same as ϕ because ϕ isn't monotone and ψ is), use (mϕ)−(1)≤(mϕ)−(f) for f∈C(X,[0,1]) to get
ψ(f)=(mϕ)+(f)+ϕ(0)+(mϕ)−(1)≤(mϕ)+(f)+(mϕ)−(f)+ϕ(0)=mϕ(f)+ϕ(0)=ϕ(f)
For being ≥h over C(X,[0,1]) it takes a somewhat more sophisticated argument. By Lemma 5, regardless of ϵ, there exists an f′ where mϕ(f′)<(mϕ)−(1)+ϵ. Then, we can go:
ψ(f)+ϵ=(mϕ)+(f)+ϕ(0)+(mϕ)−(1)+ϵ>(mϕ)+(f)+ϕ(0)+mϕ(f′)=(mϕ)+(f)+(mϕ)+(f′)+(mϕ)−(f′)+ϕ(0)
≥(mϕ)+(max(f,f′))+(mϕ)−(max(f,f′))+ϕ(0)=mϕ(max(f,f′))+ϕ(0)=ϕ(max(f,f′))≥h(max(f,f′))≥h(f)
The last steps were done via the definition of ϕ, ϕ≥h, and h being monotonic (max(f,f′)≥f, and max(f,f′)∈C(X,[0,1])).
So, ψ(f)+ϵ>h(f) for all ϵ and all f∈C(X,[0,1]), getting ψ(f)≥h(f) for all f∈C(X,[0,1]), and hence for all f∈C(X) (because h is −∞ everywhere else)
Thus, ψ specifies an a-measure (ψ being monotone) that is below the sa-measure encoded by ϕ (by ϕ≥ψ over C(X,[0,1])), yet ψ≥h, so said point is in H′. This witnesses that there can be no minimal points in H′ that aren't a-measures. That just leaves getting the slope bound from Lipschitzness, the worst part of this whole proof.
Phase 6: Let λ⊙ be the Lipschitz constant for h. Fix a ϕ that corresponds to a minimal point with λ>λ⊙. This violates the Lipschitz bound when traveling from 0 to 1, so the Lipschitz bound is violated in some direction. Further, the graph of ϕ touches the graph of h at some point f∗∈C(X,[0,1]), because if it didn't, you could shift ϕ down further until it did touch, witnessing that the point ϕ came from wasn't minimal (you could sap more from the b term).
Now, if this point is minimal, it should be impossible to craft a ψ which is ≤ϕ over C(X,[0,1]), ≥h, and different from ϕ. We shall craft such a ψ, witnessing that said point isn't actually minimal. Further, said ψ won't violate the Lipschitz bound in any direction. Thus, all affine functionals corresponding to minimal points must obey the Lipschitz bound and be monotone, so they're a-measures with λ≤λ⊙.
In order to do this, we shall craft three sets in C(X)⊕R. A, B1, and B2.
Set A is {(f,b)|f∈C(X,[0,1]),b≤h(f)}. Pretty much, this set is the hypograph of h. It's obviously convex because h is concave, and the hypograph of a concave function is convex. It's closed because h is continuous.
Set B1 is {(f,b)|f∈C(X,(0,1)),b>ϕ(f)}. This could be thought of as the interior of the epigraph of ϕ restricted to C(X,[0,1]). Undershooting this means you never exceed ϕ over C(X,[0,1]). First, it's open. This is because, due to f being continuous over a compact set X, the maximum and minimum are attained, so any f∈C(X,(0,1)) is bounded below 1 and above 0, so we've got a little bit of room to freely wiggle f in any direction. Further, since ϕ−ϕ(0) is a continuous linear functional on C(X) which is a Banach space, it's a bounded linear functional and has some Lipschitz constant (though it may exceed λ⊙), so we have a little bit of room to freely wiggle b as well. So B1 is open.
Also, B1 is convex, because a mixture of f and f′ that are bounded away from 0 and 1 is also bounded away from 0 and 1, and pb+(1−p)b′>pϕ(f)+(1−p)ϕ(f′)=ϕ(pf+(1−p)f′).
Set B2 is {(f,b)|b>λ⊙d(f,f∗)+ϕ(f∗)}. This could be thought of as an open cone with a point (it's missing that exact point, though) at (f∗,ϕ(f∗)), that opens straight up, and certifies a failure of the λ⊙ bound on slope. If an affine function includes the point (f∗,ϕ(f∗)) in its graph, then if it increases faster than λ⊙ in any direction, it'll land in this set. It's open because, given a point in it, we can freely wiggle the f and b values around a little bit in any direction, and stay in the set. Now we'll show it's convex. Given an (f,b) and (f′,b′) in it, due to C(X) being a Banach space (so it has a norm), we want to check whether pb+(1−p)b′>λ⊙d(pf+(1−p)f′,f∗)+ϕ(f∗).
Observe that (using the defining axioms for a norm)
λ⊙d(pf+(1−p)f′,f∗)+ϕ(f∗)=λ⊙||p(f−f∗)+(1−p)(f′−f∗)||+ϕ(f∗)
≤λ⊙(p||f−f∗||+(1−p)||f′−f∗||)+ϕ(f∗)=p(λ⊙d(f,f∗)+ϕ(f∗))+(1−p)(λ⊙d(f′,f∗)+ϕ(f∗))<pb+(1−p)b′
So B2 is convex.
Ok, so we've got a convex closed set and two convex opens. Now, consider B:=c.h(B1∪B2). The convex hull of an open set is open. We will show that A∩B=∅.
Assume this is false, and that they overlap. The point where they overlap can then be written as a convex mixture of points from B1∪B2. However, B1 and B2 are both convex, so we can reduce it to the case where we're mixing one point (f,b) from B1 and one point (f′,b′) from B2, and (pf+(1−p)f′,pb+(1−p)b′)∈A.
If p=0, then we've just got a single point in B2. Also, ϕ(f∗)=h(f∗).
b′>λ⊙d(f′,f∗)+ϕ(f∗)=λ⊙d(f′,f∗)+h(f∗)≥h(f′)
This is because ϕ(f∗)=h(f∗) and h has a Lipschitz constant of λ⊙, so it can't increase as fast as we're demanding as we move from f∗ to f′, which stays in C(X,[0,1]). So (f′,b′)∉A.
If p=1, then we've just got a single point in B1. Then b>ϕ(f)≥h(f), so again, (f,b)∉A.
For the case where p isn't 0 or 1, we need a much more sophisticated argument. Remembering that (f,b)∈B1, and (f′,b′)∈B2, we will show that (pf+(1−p)f∗,pb+(1−p)ϕ(f∗)) lies strictly above the graph of h. Both f and f∗ lie in C(X,[0,1]), so their mix lies in the same set, so we don't have to worry about h being undefined there. Also, remember that ϕ≥h over C(X,[0,1]). Now,
pb+(1−p)ϕ(f∗)>pϕ(f)+(1−p)ϕ(f∗)=ϕ(pf+(1−p)f∗)≥h(pf+(1−p)f∗)
The critical > is by the definition of B1, and (f,b)∈B1. So, the b term is strictly too high for this point (different than the one we care about) to land on the graph of h.
With the aid of this, we will consider "what slope do we have as we travel from (pf+(1−p)f∗,pb+(1−p)ϕ(f∗)) to (pf+(1−p)f′,pb+(1−p)b′)"? Said slope is
((pb+(1−p)b′)−(pb+(1−p)ϕ(f∗)))/d(pf+(1−p)f′,pf+(1−p)f∗)=((1−p)(b′−ϕ(f∗)))/((1−p)d(f′,f∗))
=(b′−ϕ(f∗))/d(f′,f∗)>(λ⊙d(f′,f∗))/d(f′,f∗)=λ⊙
That critical > is by (f′,b′)∈B2 and the definition of B2.
So, if we start at (pf+(1−p)f∗,pb+(1−p)ϕ(f∗)) (and pf+(1−p)f∗ lies in C(X,[0,1])), we're above the graph of h. Then, we travel to (pf+(1−p)f′,pb+(1−p)b′), where pf+(1−p)f′∈C(X,[0,1]) by assumption that this point is in A, but while doing this, we ascend faster than λ⊙, the Lipschitz constant for h. So, our point of interest (pf+(1−p)f′,pb+(1−p)b′) lies above the graph of h and can't lie in A, and we have a contradiction.
Putting all this together, A∩B=∅. Since B is open, and they're both convex and nonempty, we can invoke Hahn-Banach (first version of the theorem in the "Separation of Sets" section) and conclude they're separated by some continuous linear functional ψL. Said linear functional must increase as b does, because (0,0)∈A, and (0,b) (for some sufficiently large b) lies in B2, thus in B. This means that given any f and a∈R to specify a level, we can find a unique b where ψL(f,b)=a.
So, any level set of this continuous linear functional we crafted can also be interpreted as an affine functional. There's a critical value of the level set that achieves the separation, ψL(f∗,ϕ(f∗)). This is because (f∗,ϕ(f∗))=(f∗,h(f∗))∈A, but (f∗,ϕ(f∗)+ϵ) is in B2, thus in B, for all ϵ. So we've uniquely pinned down which affine function ψ we're going for. Since the graph of ψ is a hyperplane separating A and B (It may touch the set A, just not cut into it, but it doesn't touch B), from looking at the definitions of A and B1 and B2, we can conclude:
From the definition of A, ψ(f)≥h(f), so ψ≥h over C(X,[0,1]).
From the definition of B1, ψ(f)≤ϕ(f) over C(X,(0,1)), and they're both continuous, so we can extend ψ(f)≤ϕ(f) to C(X,[0,1]) by continuity, so ψ≤ϕ over C(X,[0,1]).
Also, h(f∗)≤ψ(f∗)≤ϕ(f∗)=h(f∗), so ψ(f∗)=ϕ(f∗), and this, paired with the ability of B2 to detect whether an affine function exceeds the λ⊙ slope bound (as long as the graph of said function goes through (f∗,ϕ(f∗))), means that the graph of ψ not entering B2 certifies that its Lipschitz constant is λ⊙ or less. Since ϕ does enter B2 due to violating the Lipschitz constant bound, this also certifies that ϕ≠ψ.
Putting it all together, given a ϕ which corresponds to a minimal point and violates the Lipschitz bound, we can find a ψ below it that's also above h, so said minimal point isn't actually minimal.
Therefore, if you were to translate a minimal point in the induced H into an affine function above h, it'd have to A: not violate the Lipschitz bound (otherwise we could undershoot it) and B: be monotone (otherwise we could undershoot it). Being monotone certifies that it's an a-measure, and having a Lipschitz constant of λ⊙ or less certifies that the λ of the a-measure is λ⊙ or less. We're finally done!
Fair upfront warning: This is not a particularly readable proof section (though much better than Section 2 about belief functions). There's dense notation, logical leaps due to illusion of transparency since I've spent a month getting fluent with these concepts, and a relative lack of editing since it's long. If you really want to read this, I'd suggest PM-ing me to get a link to MIRIxDiscord, where I'd be able to guide you through it and answer questions.
Proposition 1: If f∈C(X,[0,1]) then f+:(m,b)↦m(f)+b is a positive functional on Msa(X).
Proof Sketch: We just check three conditions. Linearity, being nonnegative on Msa(X), and continuity.
Linearity proof. Using a,a′ for constants,
f+(a(m,b)+a′(m′,b′))=f+(am+a′m′,ab+ab′)=(am+a′m′)(f)+ab+a′b′
=a(m(f)+b)+a′(m′(f)+b′)=af+(m,b)+a′f+(m′,b′)
So we have verified that f+(aM+a′M′)=af+(M)+a′f+(M′) and we have linearity.
Positivity proof: An sa-measure M, writeable as (m,b) has m uniquely writeable as a pair of finite measures m+ (all the positive regions) and a m− (all the negative regions) by the Jordan Decomposition Theorem, and b+m−(1)≥0. So,
f+(M)=m(f)+b=m+(f)+m−(f)+b≥0+m−(1)+b≥0
The first ≥ by 1≥f≥0, so the expectation of f is positive and m− is negative so taking the expectation of 1 is more negative. The second ≥ is by the condition on how m− relates to b.
Continuity proof: Fix a sequence (mn,bn) converging to (m,b). Obviously the b part converges, so now we just need to show that mn(f) converges to m(f). The metric we have on the space of finite signed measures is the KR-metric, which implies the thing we want. This only works for continuous f, not general f.
Theorem 1: Every positive functional on Msa(X) can be written as (m,b)↦c(m(f)+b), where c≥0, and f∈C(X,[0,1])
Proof Sketch: The first part is showing that it's impossible to have a positive functional where the b term doesn't matter, without the positive functional being the one that maps everything to 0. The second part of the proof is recovering our f by applying the positive functional to Dirac-delta measures δx, to see what the function must be on point x.
Part 1: Let's say f+ isn't 0, ie there's some nonzero (m,b) pair where f+(m,b)>0, and yet f+(0,1)=0 (which, by linearity, means that f+(0,b)=0 for all b). We'll show that this situation is impossible.
Then, 0<f+(m,b)=f+(m+,0)+f+(m−,b) by our starting assumption, and Jordan decomposition of m, along with linearity of positive functionals. Now, f+(m−,b)+f+(−2(m−),0)=f+(−(m−),b) because positive functionals are linear, and everything in that above equation is an sa-measure (flipping a negative measure makes a positive measure, which doesn't impose restrictions on the b term except that it be ≥0). And so, by nonnegativity of positive functionals on sa-measures, f+(m−,b)≤f+(−(m−),b). Using this, we get
f+(m+,0)+f+(m−,b)≤f+(m+,0)+f+(−(m−),b)
=f+(m+,0)+f+(−(m−),0)+f+(0,b)=f+(m+,0)+f+(−(m−),0)
Another use of linearity was invoked for the first = in the second line, and then the second = made use of our assumption that f+(0,b)=0 for all b.
At this point, we have derived that 0<f+(m+,0)+f+(−(m−),0). Both of these are positive measures. So, there exists some positive measure m′ where f+(m′,0)>0.
Now, observe that, for all b, 0=f+(0,b)=f+(m′,0)+f+(−(m′),b)
Let b be sufficiently huge to make (−(m′),b) into an sa-measure. Also, since f+(m′,0)>0, f+(−(m′),b)<0, which is impossible because positive functionals are nonnegative on all sa-measures. Contradiction. Due to the contradiction, if there's a nonzero positive functional, it must assign f+(0,1)>0, so let f+(0,1) be our c term.
Proof part 2: Let's try to extract our f. Let f(x):=f+(δx,0)f+(0,1) This is just recovering the value of the hypothesized f on x by feeding our positive functional the measure δx that assigns 1 value to x and nothing else, and scaling. Now, we just have to verify that this f is continuous and in [0,1].
For continuity, let xn limit to x. By the KR-metric we're using, (δxn,0) limits to (δx,0). By continuity of f+, f+(δxn,0) limits to f+(δx,0). Therefore, f(xn) limits to f(x) and we have continuity.
For a lower bound, f≥0, because f(x) is a ratio of two nonnegative numbers, and the denominator isn't 0.
Now we just have to show that f≤1. For contradiction, assume there's an x where f(x)>1. Then f+(δx,0)f+(0,1)>1, so f+(δx,0)>f+(0,1), and in particular, f+(0,1)−f+(δx,0)<0.
But then, f+(−(δx),1)+f+(δx,0)=f+(0,1), so f+(−(δx),1)=f+(0,1)−f+(δx,0)<0
However, (−(δx),1) is an sa-measure, because δx(1)+1=0, and must have nonnegative value, so we get a contradiction. Therefore, f∈C(X,[0,1]).
To wrap up, we can go:
f+(m,b)=f+(m,0)+f+(0,b)=f+(0,1)f+(0,1)(∫X(f+(δx,0))dm+f+(0,b))
=f+(0,1)(∫Xf+(δx,0)f+(0,1)dm+f+(0,b)f+(0,1))=c(∫Xf(x)dm+b)=c(m(f)+b)
And c≥0, and f∈C(X,[0,1]), so we're done.
Lemma 1: Compactness Lemma: Fixing some nonnegative constants λ◯ and b◯, the set of sa-measures where m+(1)∈[0,λ◯], b∈[0,b◯], is compact. Further, if a set lacks an upper bound on m+(1) or on b, it's not compact.
Proof Sketch: We fix an arbitrary sequence of sa-measures, and then use the fact that closed intervals are compact-complete and the space ΔX is compact-complete to isolate a suitable convergent subsequence. Since all sequences have a limit point, the set is compact. Then, we go in the other direction, and get a sequence with no limit points assuming either a lack of upper bounds on m+(1), or a lack of upper bounds on b.
Proof: Fix some arbitrary sequence Mn wandering about within this space, which breaks down into (m+n,0)+(m−n,bn), and then, since all measures are just a probability distribution scaled by the constant m(1), it further breaks down into (m+n(1)⋅μn,0)+(m−n(1)⋅μ′n,bn). Since bn+m−n(1)≥0, m−n(1) must be bounded in [−b◯,0].
Now, what we can do is extract a subseqence where bn ,m+n(1), m−n(1), μn, and μ′n all converge, by Tychonoff's Theorem (finite product, no axiom of choice required) Our three number sequences are all confined to a bounded interval, and our two probability sequences are wandering around within ΔX which is a compact complete metric space if X is. The limit of this subsequence is a limit point of the original sequence, since all its components are arbitrarily close to the components that make up Mn for large enough n in our subsequence.
The limiting value of m+(1) and b both obey their respective bounds, and the cone of sa-measures is closed, so the limit point is an sa-measure and respects the bounds too. Therefore the set is compact, because all sequences of points in it have a limit point.
In the other direction, assume a set B has unbounded b values. Then we can fix a sequence (mn,bn)∈B where bn increases without bound, so the a-measures can't converge. The same applies to all subsequences, so there's no limit point, so B isn't compact.
Now, assume a set B has bounded b values, call the least upper bound b⊙, but the value of m+(1) is unbounded. Fix a sequence (mn,bn)∈B where m+n(1) is unbounded above. Assume a convergent subsequence exists. Since bn+m−n(1)≥0, m−n(1) must be bounded in [−b⊙,0]. Then because mn(1)=m+n(1)+m−n(1)≥m+n(1)−b⊙, and the latter quantity is finite, mn(1) must be unbounded above. However, in order for the mn to limit to some m, limn→∞mn(1)=m(1), which results in a contradiction. Therefore, said convergent subsequence doesn't exist, and B is not compact.
Put together, we have a necessary-and-sufficient condition for a closed subset of Msa(X) to be compact. There must be an upper bound on b and m+(1), respectively.
Lemma 2: The upper completion of a closed set of sa-measures is closed.
Proof sketch: We'll take a convergent sequence (mn,bn) in the upper completion of B that limits to (m,b), and show that, in order for it to converge, the same sorts of bounds as the Compactness Lemma uses must apply. Then, breaking down (mn,bn) into (mBn,bBn)+(m∗n,b∗n), where (mBn,bBn)∈B, and (m∗n,b∗n) is an sa-measure, we'll transfer these Compactness-Lemma-enabling bounds to the sequences (mBn,bBn) and (m∗n,b∗n), to get that they're both wandering around in a compact set. Then, we just take a convergent subsequence of both, add the two limit points together, and get our limit point (m,b), witnessing that it's in the upper completion of B.
Proof: Let (mn,bn)∈B+Msa(X) limit to some (m,b). A convergent sequence (plus its one limit point) is a compact set of points, so, by the Compactness Lemma, there must be a b◯ and λ◯ that are upper bounds on the bn and m+n(1) values, respectively.
Now, for all n, break down (mn,bn) as (mBn,bBn)+(m∗n,b∗n), where (mBn,bBn)∈B, and (m∗n,b∗n) is an sa-measure.
Because bBn+b∗n=bn≤b◯, we can bound the bBn and b∗n quantities by b◯. This transfers into a −b◯ lower bound on mB−n(1) and m∗−n(1), respectively.
Now, we can go:
mB+n(1)+mB−n(1)+m∗+n(1)+m∗−n(1)=mBn(1)+m∗n(1)=mn(1)
=m+n(1)+m−n(1)≤m+n(1)≤λ◯
Using worst-case values for mB−n(1) and m∗−n(1), we get:
mB+n(1)+m∗+n(1)−2b◯≤λ◯
mB+n(1)+m∗+n(1)≤λ◯+2b◯
So, we have upper bounds on mB+n(1) and m∗+n(1) of λ◯+2b◯, respectively.
Due to the sequences (mBn,bBn) and (m∗n,b∗n) respecting bounds on b and m+(1) (b◯ and λ◯+2b◯ respectively), and wandering around within the closed sets B and Msa(X) respectively, we can use the Compactness Lemma and Tychonoff's theorem (finite product, no axiom of choice needed) to go "hey, there's a subsequence where both (mBn,bBn) and (m∗n,b∗n) converge, call the limit points (mB,bB) and (m∗,b∗). Since B and Msa(X) are closed, (mB,bB)∈B, and (m∗,b∗)∈Msa(X)."
Now, does (mB,bB)+(m∗,b∗)=(m,b)? Well, for any ϵ, there's some really large n where d((mBn,bBn),(mB,bB))<ϵ, d((m∗n,b∗n),(m∗,b∗))<ϵ, and d((mn,bn),(m,b))<ϵ. Then, we can go:
d((m,b),(mB,bB)+(m∗,b∗))≤d((m,b),(mn,bn))+d((mn,bn),(mB,bB)+(m∗,b∗))
=d((m,b),(mn,bn))+d((mBn,bBn)+(m∗n,b∗n),(mB,bB)+(m∗,b∗))
=d((m,b),(mn,bn))+||((mBn,bBn)+(m∗n,b∗n))−((mB,bB)+(m∗,b∗))||
=d((m,b),(mn,bn))+||((mBn,bBn)−(mB,bB))+((m∗n,b∗n)−(m∗,b∗))||
≤d((m,b),(mn,bn))+||(mBn,bBn)−(mB,bB)||+||(m∗n,b∗n)−(m∗,b∗)||
=d((m,b),(mn,bn))+d((mBn,bBn),(mB,bB))+d((m∗n,b∗n),(m∗,b∗))<3ϵ
So, regardless of ϵ, d((m,b),(mB,bB)+(m∗,b∗))<3ϵ, so (mB,bB)+(m∗,b∗)=(m,b). So, we've written (m,b) as a sum of an sa-measure in B and an sa-measure, certifying that (m,b)∈B+Msa(X), so B+Msa(X) is closed.
Proposition 2: For closed convex nonempty B,B+Msa(X)={M|∀f+∃M′∈B:f+(M)≥f+(M′)}
Proof sketch: Show both subset inclusion directions. One is very easy, then we assume the second direction is false, and invoke the Hahn-Banach theorem to separate a point in the latter set from the former set. Then we show that the separating functional is a positive functional, so we have a positive functional where the additional point underperforms everything in B+Msa(X), which is impossible by the definition of the latter set.
Easy direction: We will show that B+Msa(X)⊆{M|∀f+∃M′∈B:f+(M)≥f+(M′)}
This is because a M∈(B+Msa(X)), can be written as M=MB+M∗. Let MB be our M′ of interest. Then, it is indeed true that for all f+, f+(M)=f+(MB)+f+(M∗)≥f+(MB)
Hard direction: Assume by contradiction that
B+Msa(X)⊂{M|∀f+∃M′∈B:f+(M)≥f+(M′)}
Then there's some M where ∀f+∃M′∈B:f+(M)≥f+(M′) and M∉B+Msa(X). B+Msa(X) is the upper completion of a closed set, so by Lemma 2, it's closed, and since it's the Minkowski sum of convex sets, it's convex.
Now, we can use the variant of the Hahn-Banach theorem from the Wikipedia article on "Hahn-Banach theorem", in the "separation of a closed and compact set" section. Our single point M is compact, convex, nonempty, and disjoint from the closed convex set B+Msa(X). Banach spaces are locally convex, so we can invoke Hahn-Banach separation.
Therefore, there's some continuous linear functional ϕ s.t. ϕ(M)<infM′∈(B+Msa(X))ϕ(M′)
We will show that this linear functional is actually a positive functional!
Assume there's some sa-measure M∗ where ϕ(M∗)<0. Then we can pick any MB∈B, and consider ϕ(MB+cM∗), where c is extremely large. MB+cM∗ lies in B+Msa(X), but it would also produce an extremely negative value for ϕ which undershoots ϕ(M), which is impossible. So ϕ is a positive functional.
However, ϕ(M)<infM′∈(B+Msa(X))ϕ(M′), so ϕ(M)<infM′∈Bϕ(M′). But also, M fulfills the condition ∀f+∃M′∈B:f+(M)≥f+(M′), because of the set it came from. So, there must exist some M′∈B where ϕ(M)≥ϕ(M′). But, we have a contradiction, because ϕ(M)<infM′∈Bϕ(M′).
So, there cannot be any point in {M|∀f+∃M′∈B:f+(M)≥f+(M′)} that isn't in B+Msa(X). This establishes equality.
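Here's a toy numeric check of the easy direction (a sketch of mine on a finite X; the set B and the decomposition are made up), where positive functionals are exactly f↦m⋅f+b for f∈[0,1]^n:

```python
# Easy direction of Proposition 2 on a finite X: for M = M_B + M_*, every
# positive functional f+(m, b) = m @ f + b has f+(M) >= min over B.
import numpy as np

rng = np.random.default_rng(1)
n = 3
B = [(rng.uniform(0, 1, n), rng.uniform(0.5, 1.0)) for _ in range(5)]

m_star = rng.uniform(-0.5, 1, n)                       # an sa-measure (m*, b*):
b_star = rng.uniform(-m_star[m_star < 0].sum(), 2.0)   # b* + m*_-(1) >= 0
mB, bB = B[2]
M = (mB + m_star, bB + b_star)                         # a point of B + Msa(X)

for _ in range(10000):
    f = rng.uniform(0, 1, n)
    assert M[0] @ f + M[1] >= min(m @ f + b for m, b in B) - 1e-9
print("f+(M) >= min over B of f+, for every sampled f")
```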
Lemma 3: For any closed set B⊆Msa(X) and point M∈B, the set ({M}−Msa(X))∩B is nonempty and compact.
Proof: It's easy to verify nonemptiness, because M is in the set. Also, it's closed because it's the intersection of two closed sets. B was assumed closed, and the other part is the Minkowski sum of {M} and −Msa(X), which is closed if −Msa(X) is, because it's just a shift of −Msa(X) (via a single point). −Msa(X) is closed because it's -1 times a closed set.
We will establish a bound on the m+(1) and b values of anything in the set, which lets us invoke the Compactness Lemma to show compactness, because it's a closed subset of a compact set.
Note that if M′∈({M}−Msa(X))∩B, then M′=M−M∗, so M′+M∗=M. Rewrite this as (m′,b′)+(m∗,b∗)=(m,b)
Because b′+b∗=b, we can bound b′ and b∗ by b. This transfers into a −b lower bound on m′−(1) and m∗−(1). Now, we can go:
m′+(1)+m′−(1)+m∗+(1)+m∗−(1)=m′(1)+m∗(1)=m(1)
=m+(1)+m−(1)≤m+(1)
Using worst-case values for m′−(1) and m∗−(1), we get:
m′+(1)+m∗+(1)−2b≤m+(1)
m′+(1)≤m′+(1)+m∗+(1)≤m+(1)+2b
So, we have an upper bound of m+(1)+2b on m′+(1), and an upper bound of b on b′. Further, (m′,b′) was arbitrary in ({M}−Msa(X))∩B, so we have our bounds. This lets us invoke the Compactness Lemma, and conclude that said closed set is compact.
Lemma 4: If ≥ is a partial order on B where M′≥M iff there's some sa-measure M∗ where M=M′+M∗, then
∃M′>M↔(M∈B∧∃M′≠M:M′∈({M}−Msa(X))∩B)↔M is not minimal in B
Proof: ∃M′>M↔∃M′≠M:M′≥M
Also, M′≥M↔(M′,M∈B∧∃M∗:M=M′+M∗)
Also, ∃M∗:M=M′+M∗↔∃M∗:M−M∗=M′↔M′∈({M}−Msa(X))
Putting all this together, we get
(∃M′>M)↔(M∈B∧∃M′≠M:M′∈({M}−Msa(X))∩B)
And we're halfway there. Now for the second half.
M is not minimal in B↔M∈B∧(∃M′∈B:M′≠M∧(∃M∗:M=M′+M∗))
Also, ∃M∗:M=M′+M∗↔∃M∗:M−M∗=M′↔M′∈({M}−Msa(X))
Putting this together, we get
M is not minimal in B↔(M∈B∧∃M′≠M:M′∈({M}−Msa(X))∩B)
And the result has been proved.
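On a finite X this order is easy to compute with, which makes the lemma concrete; the following is a small sketch of mine (the two points are made up):

```python
# Lemma 4's order on a finite X: M' >= M iff M - M' is an sa-measure.
import numpy as np

def is_sa(m, b, tol=1e-9):
    # (m, b) is an sa-measure iff b + m_-(1) >= 0 (this forces b >= 0 too)
    return b + m[m < 0].sum() >= -tol

def geq(Mp, M):
    # M' >= M in the flipped order: there's an sa-measure M* with M = M' + M*
    return is_sa(M[0] - Mp[0], M[1] - Mp[1])

M  = (np.array([0.5, 0.2]), 1.0)
Mp = (np.array([0.4, 0.3]), 0.7)   # M - Mp = ([0.1, -0.1], 0.3), an sa-measure
print(geq(Mp, M), geq(M, Mp))      # True False: Mp sits strictly below M
```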
Theorem 2: Given a nonempty closed set B, the set of minimal points Bmin is nonempty and all points in B are above a minimal point.
Proof sketch: First, we establish a partial order that's closely tied to the ordering on B, but flipped around, so minimal points in B are maximal elements. We show that it is indeed a partial order, letting us leverage Lemma 4 to translate between the partial order and the set B. Then, we show that every chain in the partial order has an upper bound via Lemma 3 and compactness arguments, letting us invoke Zorn's lemma to show that everything in the partial order is below a maximal element. Then, we just do one last translation to show that minimal points in B perfectly correspond to maximal elements in our partial order.
Proof: first, impose a partial order on B, where M′≥M iff there's some sa-measure M∗ where M=M′+M∗. Notice that this flips the order. If an sa-measure is "below" another sa-measure in the sa-measure addition sense, it's above that sa-measure in this ordering. So a minimal point in B would be maximal in the partial order. We will show that it's indeed a partial order.
Reflexivity is immediate. M=M+(0,0), so M≥M.
For transitivity, assume M′′≥M′≥M. Then there's some M∗ and M′∗ s.t. M=M′+M∗, and M′=M′′+M′∗. Putting these together, we get M=M′′+(M∗+M′∗), and adding sa-measures gets you an sa-measure, so M′′≥M.
For antisymmetry, assume M′≥M and M≥M′. Then M=M′+M∗, and M′=M+M′∗. By substitution, M=M+(M∗+M′∗), so M′∗=−M∗. For all positive functionals, f+(M′∗)=f+(−M∗)=−f+(M∗), and since positive functionals are always nonnegative on sa-measures, the only way this can happen is if M∗ and M′∗ are 0, showing that M=M′.
Anyways, since we've shown that it's a partial order, all we now have to do is show that every chain has an upper bound in order to invoke Zorn's lemma to show that every point in B lies below some maximal element.
Fix some ordinal-indexed chain Mγ, and associate each of them with the set Sγ=({Mγ}+(−Msa(X)))∩B, which is compact by Lemma 3 and always contains Mγ.
The collection of Sγ also has the finite intersection property, because, fixing finitely many of them, we can consider a maximal γ∗, and Mγ∗ is in every associated set by:
Case 1: Some other Mγ equals Mγ∗, so Sγ=Sγ∗ and Mγ∗∈Sγ∗=Sγ.
Case 2: Mγ∗>Mγ, and by Lemma 4, Mγ∗∈({Mγ}−Msa(X))∩B.
Anyways, since all the Sγ are compact, and have the finite intersection property, we can intersect them all and get a nonempty set containing some point M∞. M∞ lies in B, because all the sets we intersected were subsets of B. Also, because M∞∈({Mγ}−Msa(X))∩B for all γ in our chain, then if M∞≠Mγ, Lemma 4 lets us get M∞>Mγ, and if M∞=Mγ, then M∞≥Mγ. Thus, M∞ is an upper bound for our chain.
By Zorn's Lemma, because every chain has an upper bound, there are maximal elements in B, and every point in B has a maximal element above it.
To finish up, use Lemma 4 to get: M is maximal↔¬∃M′>M↔M is minimal in B
Proposition 3: Given an f∈C(X,[0,1]) and a nonempty closed B, inf(m,b)∈B(m(f)+b)=inf(m,b)∈Bmin(m(f)+b)
Direction 1: since Bmin is a subset of B, we get one direction easily, that
inf(m,b)∈B(m(f)+b)≤inf(m,b)∈Bmin(m(f)+b)
Direction 2: Take an M∈B. By Theorem 2, there is an Mmin∈Bmin s.t. M=Mmin+M∗. Applying the positive functional (m,b)↦m(f)+b (which is one by Proposition 1), we get that m(f)+b≥mmin(f)+bmin. Because every point in B has a point in Bmin which scores as low or lower according to the positive functional,
inf(m,b)∈B(m(f)+b)≥inf(m,b)∈Bmin(m(f)+b)
And this gives us our desired equality.
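A finite B is closed, so Proposition 3 applies to it, and we can spot-check the equality numerically. This sketch (mine; the set B is randomly generated, with one deliberately non-minimal point thrown in) finds Bmin by brute force and compares the two infs:

```python
# Finite-X spot check of Proposition 3: the inf of m(f) + b over B matches
# the inf over B's minimal points.
import numpy as np

rng = np.random.default_rng(2)
n = 3
B = [(rng.uniform(0, 1, n), rng.uniform(0, 1)) for _ in range(8)]
B.append((B[0][0] + 0.1, B[0][1] + 0.2))   # a deliberately non-minimal point

def is_sa(m, b): return b + m[m < 0].sum() >= -1e-9

def strictly_below(Mp, M):   # M = M' + (nonzero sa-measure)
    dm, db = M[0] - Mp[0], M[1] - Mp[1]
    return is_sa(dm, db) and np.abs(dm).sum() + abs(db) > 1e-9

Bmin = [M for M in B
        if not any(strictly_below(Mp, M) for Mp in B if Mp is not M)]
for _ in range(1000):
    f = rng.uniform(0, 1, n)
    assert abs(min(m @ f + b for m, b in B) -
               min(m @ f + b for m, b in Bmin)) < 1e-9
print(f"{len(Bmin)} of {len(B)} points are minimal; the two infs agree")
```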
Proposition 4: Given a nonempty closed convex B, Bmin=(Buc)min and (Bmin)uc=Buc
Proof: First, we'll show Bmin=(Buc)min. We'll use the characterization in terms of the partial order ≤ we used for the Zorn's Lemma proof of Theorem 2. If a point M is in Buc, then it can be written as M=MB+M∗, so M≤MB. Since all points added in Buc lie below a preexisting point in B (according to the partial order from Theorem 2) the set of maximals (ie, set of minimal points) is completely unchanged when we add all the new points to the partial order via upper completion, so Bmin=(Buc)min.
For the second part, one direction is immediate. Bmin⊆B, so (Bmin)uc⊆Buc. For the reverse direction, take a point M∈Buc. It can be decomposed as MB+M∗, and then by Theorem 2, MB can be decomposed as Mmin+M′∗, so M=Mmin+(M∗+M′∗), so it lies in (Bmin)uc, and we're done.
Theorem 3: If the nonempty closed convex sets A and B have Amin≠Bmin, then there is some f∈C(X,[0,1]) where EA(f)≠EB(f)
Proof sketch: We show that upper completion is idempotent, and then use that to show that the upper completions of A and B are different. Then, we can use Hahn-Banach to separate a point of A from Buc (or vice-versa), and show that the separating functional is a positive functional. Finally, we use Theorem 1 to translate from a separating positive functional to different expectation values of some f∈C(X,[0,1])
Proof: Phase 1 is showing that upper completion is idempotent. (Buc)uc=Buc. One direction of this is easy, Buc⊆(Buc)uc. In the other direction, let M∈(Buc)uc. Then we can decompose M into M′+M∗, where M′∈Buc, and decompose that into MB+M′∗ where MB∈B, so M=MB+(M∗+M′∗) and M∈Buc.
Now for phase 2, we'll show that the minimal points of one set aren't in the upper completion of the other set. Assume, for contradiction, that this is false, so Amin⊆Buc and Bmin⊆Auc. Then, by idempotence, Proposition 4, and our subset assumption,
Auc=(Amin)uc⊆(Buc)uc=Buc
Swapping the A and B, the same argument holds, so Auc=Buc, so (Buc)min=(Auc)min.
Now, using this and Proposition 4, Bmin=(Buc)min=(Auc)min=Amin.
But wait, we have a contradiction, we said that the minimal points of B and A weren't the same! Therefore, either Bmin⊈Auc, or vice-versa. Without loss of generality, assume that Bmin⊈Auc.
Now for phase 3, Hahn-Banach separation to get a positive functional with different inf values. Take a point MB in Bmin that lies outside Auc. Now, use the Hahn-Banach separation of {MB} and Auc used in the proof of Proposition 2, to get a linear functional ϕ (which can be demonstrated to be a positive functional by the same argument as the proof of Proposition 2) where: ϕ(MB)<infM∈Aucϕ(M). Thus, infM∈Bϕ(M)<infM∈Aϕ(M), so infM∈Bϕ(M)≠infM∈Aϕ(M)
Said positive functional can't be 0, otherwise both sides would be 0. Thus, by Theorem 1, ϕ((m,b))=a(m(f)+b) where a>0, and f∈C(X,[0,1]). Swapping this out, we get:
inf(m,b)∈Ba(m(f)+b)≠inf(m′,b′)∈Aa(m′(f)+b′)
inf(m,b)∈B(m(f)+b)≠inf(m′,b′)∈A(m′(f)+b′)
and then this is EB(f)≠EA(f). So, we have crafted our f∈C(X,[0,1]) which distinguishes the two sets, and we're done.
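As a degenerate example of what Theorem 3 buys us (my own toy case on a two-point X): two singleton sets with different minimal points get distinguished by a suitable f.

```python
# Two singleton sets of a-measures on a two-point X, with different minimal
# points; f = indicator of point 0 tells their expectations apart.
import numpy as np

A = [(np.array([1.0, 0.0]), 0.0)]
B = [(np.array([0.0, 1.0]), 0.0)]
f = np.array([1.0, 0.0])
EA = min(m @ f + b for m, b in A)
EB = min(m @ f + b for m, b in B)
print(EA, "!=", EB)                # 1.0 != 0.0
```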
Corollary 1: If two nonempty closed convex upper-complete sets A and B are different, then there is some f∈C(X,[0,1]) where EA(f)≠EB(f)
Proof: Either Amin≠Bmin, in which case we can apply Theorem 3 to separate them, or their sets of minimal points are the same. In that case, by Proposition 4 and upper completion, A=Auc=(Amin)uc=(Bmin)uc=Buc=B, and we have a contradiction, because the two sets are different.
Theorem 4: If H is an infradistribution/bounded infradistribution, then h:f↦EH(f) is concave in f, monotone, uniformly continuous/Lipschitz, h(0)=0,h(1)=1, and if range(f)⊈[0,1], h(f)=−∞
Proof sketch: h(0)=0,h(1)=1 is trivial, as is uniform continuity from the weak bounded-minimal condition. For concavity and monotonicity, it's just some inequality shuffling, and for h(f)=−∞ if f∈C(X),f∉C(X,[0,1]), we use upper completion to make the worst-case value arbitrarily negative. Lipschitzness is much more difficult, and comprises the bulk of the proof. We get a duality between minimal points and hyperplanes in C(X)⊕R, show that all the hyperplanes we got from minimal points have the same Lipschitz constant upper bound, and then show that the chunk of space below the graph of h itself is the same as the chunk of space below all the hyperplanes we got from minimal points. Thus, h has the same (or lesser) Lipschitz constant as all the hyperplanes chopping out stuff above the graph of h.
Proof: For normalization, h(1)=EH(1)=1 and h(0)=EH(0)=0 by normalization for H. Getting the uniform continuity condition from the weak-bounded-minimal condition on an infradistribution H is also trivial, because the condition just says f↦EH(f) is uniformly continuous, and that's just h itself.
Let's show that h is concave over C(X,[0,1]), first. We're shooting for h(pf+(1−p)f′)≥ph(f)+(1−p)h(f′). To show this,
h(pf+(1−p)f′)=EH(pf+(1−p)f′)=inf(m,b)∈H(m(pf+(1−p)f′)+b)
=inf(m,b)∈H(p(m(f)+b)+(1−p)(m(f′)+b))
≥pinf(m,b)∈H(m(f)+b)+(1−p)inf(m′,b′)∈H(m′(f′)+b′)
=pEH(f)+(1−p)EH(f′)=ph(f)+(1−p)h(f′)
And concavity has been proved.
Now for monotonicity. By Proposition 3 and Proposition 1,
∀f:inf(m,b)∈H(m(f)+b)=inf(m,b)∈Hmin(m(f)+b)
Now, let's say f′≥f. Then:
EH(f)=inf(m,b)∈H(m(f)+b)=inf(m,b)∈Hmin(m(f)+b)≤inf(m,b)∈Hmin(m(f′)+b)
=inf(m,b)∈H(m(f′)+b)=EH(f′)
And we're done. The critical inequality in the middle came from all minimal points in an infradistribution having no negative component by positive-minimals, so swapping out a function for a greater function produces an increase in value.
Time for range(f)⊈[0,1]→h(f)=−∞. Let's say there exists an x s.t. f(x)>1. We can take an arbitrary sa-measure (m,b)∈H, and consider (m,b)+c(−δx,1), where δx is the point measure that's 1 on x, and c is extremely huge. The latter part is an sa-measure. But then, (m−cδx)(f)+(b+c)=m(f)+b+c(1−δx(f))=m(f)+b+c(1−f(x)). Since f(x)>1, and c is extremely huge, this is extremely negative. So, since upper-completeness means H contains sa-measures that make the function as negative as we wish, inf(m,b)∈H(m(f)+b)=−∞. A very similar argument can be done if there's an x where f(x)<0, we just add in (cδx,0) to force arbitrarily negative values.
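Here's the c(−δx,1) trick numerically (a sketch of mine with made-up numbers; x is the first point of a three-point X):

```python
# Adding c * (-delta_x, 1) to a point of H drives m(f) + b to -infinity
# whenever f(x) > 1; here x = 0 and f(0) = 1.2.
import numpy as np

m, b = np.array([0.5, 0.5, 0.0]), 0.2    # some a-measure in H (made up)
f = np.array([1.2, 0.3, 0.7])
delta_x = np.eye(3)[0]                   # the Dirac measure at x = 0
for c in [0.0, 10.0, 100.0, 1000.0]:
    mc, bc = m - c * delta_x, b + c      # still an sa-measure for every c >= 0
    print(c, mc @ f + bc)                # equals m(f) + b + c(1 - f(x)), -> -inf
```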
Now for Lipschitzness, which is by far the worst of all. A minimal point (m,b) induces an affine function hm,b (kinda like a hyperplane) of the form hm,b(f)=m(f)+b. Regardless of (m,b), as long as it came from a minimal point in H, hm,b≥h for functions with range in [0,1], because
hm,b(f)=m(f)+b≥inf(m′,b′)∈H(m′(f)+b′)=EH(f)=h(f)
Ok, so if a point is on-or-below the graph of h over C(X,[0,1]), then it's on-or-below the graph of hm,b for all (m,b)∈Hmin.
What about the other direction? Is it possible for a point (f,b′) to be strictly above the graph of h and yet ≤ all the graphs of hm,b? Well, no. Invoking Proposition 3,
b′>h(f)=EH(f)=inf(m,b)∈H(m(f)+b)=inf(m,b)∈Hmin(m(f)+b)=inf(m,b)∈Hmin(hm,b(f))
So, there exists a minimal point (m,b)∈Hmin where b′>hm,b(f), so (f,b′) lies above the graph of hm,b.
Putting these two parts together, h's hypograph over C(X,[0,1]) is the same as the intersection of the hypographs of all these hm,b. If we can then show all the hm,b have a Lipschitz constant bounded above by some constant, then we get that h itself is Lipschitz with the same constant.
First, a minimal (m,b) must have m having no negative parts, so it can be written as λμ, and by bounded-minimals (since we have a bounded infradistribution), λ≤λ⊙. Now,
|hm,b(f)−hm,b(f′)|=|m(f)+b−m(f′)−b|=|m(f−f′)|≤m(|f−f′|)
=(λμ)(|f−f′|)=λμ(|f−f′|)≤λsupx∈X|f(x)−f′(x)|≤λ⊙supx∈X|f(x)−f′(x)|
So, we get that: |hm,b(f)−hm,b(f′)|/supx∈X|f(x)−f′(x)|≤(λ⊙supx∈X|f(x)−f′(x)|)/(supx∈X|f(x)−f′(x)|)=λ⊙
Note that supx∈X|f(x)−f′(x)| is our distance metric between functions in C(X). This establishes that regardless of which minimal point we picked, hm,b is Lipschitz with Lipschitz constant ≤λ⊙, and since h=inf(m,b)∈Hminhm,b, then h itself has the same bound on its Lipschitz constant.
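To see the Lipschitz bound in action, here's a sketch of mine on a finite X, where h is a minimum of finitely many affine pieces m(f)+b with nonnegative m (all made up), mirroring h=inf over minimal points of hm,b:

```python
# h(f) = min_i (m_i(f) + b_i) with nonnegative m_i; its Lipschitz constant in
# the sup metric is bounded by the largest total mass among the m_i.
import numpy as np

rng = np.random.default_rng(3)
n = 4
points = [(rng.uniform(0, 1, n), rng.uniform(0, 1)) for _ in range(6)]
lam_cap = max(m.sum() for m, _ in points)       # the role of the lambda bound

def h(f): return min(m @ f + b for m, b in points)

worst = 0.0
for _ in range(5000):
    f, g = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
    worst = max(worst, abs(h(f) - h(g)) / np.abs(f - g).max())
print(worst, "<=", lam_cap)                     # empirical ratio respects the cap
```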
Lemma 5: ∀m:inff∈C(X,[0,1])(m(f))=m−(1)
Proof sketch: We'll work in the Banach space L1(|m|) of L1 measurable functions w.r.t the absolute value of the signed measure m. Then, we consider the discontinuous (but L1) function that's 1 everywhere where m is negative. Continuous functions are dense in L1 measurable functions, so we can fix a sequence of continuous functions limiting to said indicator function. Then we just have to check that f↦m(f) is a bounded linear functional, and we get that there's a sequence of continuous functions f′n where m(f′n) limits to the measure of the indicator function that's 1 where everything is negative. Which is the same as the measure of the "always 1" function, but only on the negative parts, and we're done.
Consider the Banach space L1(|m|) of measurable functions w.r.t. the absolute value of the signed measure m, ie, |m|=m+−m−, which is a measure. It has a norm given by ||f||=∫X|f|d|m|. To begin with, we can consider the L1 indicator function 1m− that's 1 where the measure is negative. Note that
m(1m−)=∫X1m−dm=∫X1m−dm++∫X1m−dm−
=∫X0dm++∫X1dm−=∫X1dm−=m−(1)
Because continuous functions are dense in L1, we can fix a sequence of continuous functions fn limiting to 1m−. Then, just clip those continuous functions to [0,1], making a continuous function f′n. They'll get closer to 1m− that way, so the sequence f′n of continuous functions X→[0,1] limits to 1m− too.
We'll take a detour and show that m is a bounded linear functional L1(|m|)→R, with a Lipschitz constant of 1 or less.
First, m(af+a′f′)=am(f)+a′m(f′), trivially, establishing linearity.
As for the boundedness, if ||f||≤1, then ∫X|f|d|m|≤1, so:
1≥∫X|f|d|m|=∫Xsup(f,0)d|m|−∫Xinf(f,0)d|m|
=∫Xsup(f,0)dm++∫Xsup(f,0)d|m−|−∫Xinf(f,0)dm+−∫Xinf(f,0)d|m−|
=∣∣∫Xsup(f,0)dm+∣∣+∣∣−∫Xsup(f,0)d|m−|∣∣+∣∣∫Xinf(f,0)dm+∣∣+∣∣−∫Xinf(f,0)d|m−|∣∣
≥∣∣∫Xsup(f,0)dm+−∫Xsup(f,0)d|m−|+∫Xinf(f,0)dm+−∫Xinf(f,0)d|m−|∣∣
=∣∣∫Xsup(f,0)dm+∫Xinf(f,0)dm∣∣=∣∣∫Xfdm∣∣=|m(f)|
So, m(f)∈[−1,1]. An f having a norm of 1 or less gets mapped to a number with a norm of 1 or less, so the Lipschitz constant of f↦m(f) is 1 or less. This implies continuity.
Now that we have all requisite components, fix some ϵ. There's some N where, for all n>N, d(1m−,f′n)<ϵ. Mapping through f↦m(f), which has a Lipschitz constant of 1 or less, then means that ϵ>|m(f′n)−m(1m−)|=m(f′n)−m(1m−)=m(f′n)−m−(1), because m(1m−) is as-or-more negative than m(f′n), due to f′n being bounded in [0,1]. Summarizing, ϵ>m(f′n)−m−(1) for all n beyond a certain point, so, for all n beyond a certain point, m(f′n)<ϵ+m−(1)
So we have a sequence of functions in C(X,[0,1]) where m(f′n) limits to m−(1), and our signed measure was arbitrary. Therefore, we have our result that ∀m:inff∈C(X,[0,1])m(f)=m−(1).
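On a finite X the continuity subtlety disappears (the indicator 1m− is itself a valid f), so Lemma 5 can be checked directly; a small sketch of mine:

```python
# Lemma 5 on a finite X: the inf of m(f) over f in [0,1]^n is m_-(1), and the
# indicator of the negative region attains it exactly.
import numpy as np

rng = np.random.default_rng(4)
m = rng.uniform(-1, 1, 5)                 # an arbitrary signed measure
indicator = (m < 0).astype(float)         # the role of 1_{m_-} from the proof
sampled = min(m @ rng.uniform(0, 1, 5) for _ in range(20000))
assert np.isclose(m @ indicator, m[m < 0].sum())
assert sampled >= m[m < 0].sum() - 1e-9
print("inf of m(f) over [0,1]^n is m_-(1) =", m[m < 0].sum())
```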
Theorem 5: If h is a function C(X)→R that is concave, monotone, uniformly-continuous/Lipschitz, h(0)=0, h(1)=1, and range(f)⊈[0,1]→h(f)=−∞, then it specifies an infradistribution/bounded infradistribution by: {(m,b)|b≥(h′)∗(m)}, where h′ is the function given by h′(−f)=−h(f), and (h′)∗ is the convex conjugate of h′. Also, going from an infradistribution to an h and back recovers exactly the infradistribution, and going from an h to an infradistribution and back recovers exactly h.
Proof sketch: This is an extremely long one. Phases 1 and 2 show isomorphism. One direction is reshuffling the definition of H until we get the definition of the set built from h′ via convex conjugate, showing that going from H to h and back recovers your original set. In the other direction, we show that expectations w.r.t. the set we built from h match up with h exactly.
Phase 3 is cleanup of the easy conditions. Nonemptiness is pretty easy to show, the induced set being a set of sa-measures is harder to show and requires moderately fancier arguments, and closure and convexity require looking at basic properties of functions and the convex conjugate. Upper completeness takes some equation shuffling to show but isn't too bad. The weak-minimal bound property is immediate, and normalization is fairly easy.
That just leaves the positive-minimal property and the bounded-minimal properties, respectively, which are nightmares. A lesser nightmare and a greater nightmare. For phase 4 to lay the groundwork for these, we establish an isomorphism between points in H and hyperplanes which lie above the graph of h, as well as a way of certifying that a point in H isn't minimal by what its hyperplane does.
Phase 5 is, for showing positive-minimals: we can tell whether a hyperplane corresponds to an a-measure, and given any hyperplane above the graph of h that doesn't, we construct a lower hyperplane that corresponds to a lower point in H which does correspond to an a-measure.
Phase 6 is, for bounded-minimals, we take a hyperplane that may correspond to a minimal point, but which is too steep in certain directions. Then, we make an open set that fulfills the two roles of: if you enter it, you're too steep, or you overshoot the hyperplane of interest that you're trying to undershoot. Some fancy equation crunching and one application of Hahn-Banach later, we get a hyperplane that lies above h and doesn't enter our open set we crafted. So, in particular, it undershoots our hyperplane of interest, and isn't too steep. This certifies that our original "too steep" hyperplane didn't actually correspond to a minimal point, so all minimal points must have a bound on their λ values by the duality between hyperplanes above h and points in H.
Fix the convention that supf or inff is assumed to mean f∈C(X); we'll explicitly specify when f has bounds.
Phase 1: Let's show isomorphism. Our first direction is showing H to h and back is H exactly. By upper completion, and Proposition 2, we can also characterize H as
{M|∀f+∃M′∈H:f+(M)≥f+(M′)}
Using Theorem 1 to express all positive functionals as arising from an f∈C(X,[0,1]), and observing that the scaling constant c in front doesn't change which stuff scores lower than which other stuff, so we might as well characterize everything in terms of f, H can also be expressed as
{(m,b)|∀f∈C(X,[0,1]):m(f)+b≥inf(m′,b′)∈H(m′(f)+b′)}
We can swap out C(X,[0,1]) for C(X), because, from the −∞ argument in Theorem 4, f going outside [0,1] means that inf(m′,b′)∈H(m′(f)+b′)=−∞. And then, our H can further be reexpressed as
{(m,b)|∀f:m(f)+b≥EH(f)}={(m,b)|∀f:b≥EH(f)−m(f)}
={(m,b)|b≥supf(EH(f)−m(f))}
Also, EH(f)=h(f)=−h′(−f), so we can rewrite this as:
{(m,b)|b≥sup−f(m(−f)−h′(−f))}={(m,b)|b≥supf(m(f)−h′(f))}
and, by the definition of the convex conjugate (sup characterization), the space of finite signed measures being the dual space of C(X), and m(f) being a functional applied to an element, this is {(m,b)|b≥(h′)∗(m)}. So, our original set H is identical to the convex-conjugate set, when we go from H to h and back to a set of sa-measures.
Proof Phase 2: In the reverse direction for isomorphism, assume that h fulfills the conditions. We want to show that E{(m,b)|b≥(h′)∗(m)}(f)=h(f), so let's begin.
E{(m,b)|b≥(h′)∗(m)}(f)=inf(m,b):b≥(h′)∗(m)(m(f)+b)
Given an m, we have a natural candidate for minimizing the b, just set it equal to (h′)∗(m). So then we get infm(m(f)+(h′)∗(m))=infm((h′)∗(m)−m(−f))
And this is just... −(h′)∗∗(−f) (proof by Wikipedia article, check the inf characterization). Because h is continuous over C(X,[0,1]), and concave, and −∞ everywhere outside the legit functions, h′ is continuous over C(X,[−1,0]), and convex, and ∞ everywhere outside the legit functions. So in particular, h′ is convex and lower-semicontinuous and proper, so h′=(h′)∗∗ by the Fenchel-Moreau Theorem. From that, we get
E{(m,b)|b≥(h′)∗(m)}(f)=−(h′)∗∗(−f)=−h′(−f)=h(f)
and we're done with isomorphism. Now that isomorphism has been established, let's show the relevant conditions hold. Namely, nonemptiness, closure, convexity, upper completion, normality, weak-bounded-minimals (phase 3) and positive-minimals (phase 5) and bounded-minimals (assuming h is Lipschitz) (phase 6) to finish off. The last two will be extremely hard.
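Before moving on, here's the conjugate duality made concrete (a sketch of mine, assuming scipy is available; the generating points are made up). On a finite X with h a minimum of finitely many affine pieces, (h′)∗(m)=supf∈[0,1]^n(h(f)−m(f)) is a small linear program, and each generating point (mi,bi) indeed satisfies bi≥(h′)∗(mi), landing it in the constructed set:

```python
# (h')*(m) = sup over f in [0,1]^n of (h(f) - m(f)) as a linear program, for
# h(f) = min_i (m_i(f) + b_i); this h is normalized (h(0) = 0, h(1) = 1).
import numpy as np
from scipy.optimize import linprog

points = [(np.array([0.7, 0.3]), 0.0), (np.array([0.2, 0.5]), 0.3)]
n = 2

def conjugate(m):
    # maximize t - m @ f  subject to  t <= m_i @ f + b_i  and  0 <= f <= 1,
    # i.e. minimize m @ f - t over the variables (f, t)
    c = np.concatenate([m, [-1.0]])
    A_ub = np.array([np.concatenate([-mi, [1.0]]) for mi, _ in points])
    b_ub = np.array([bi for _, bi in points])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, 1)] * n + [(None, None)])
    return -res.fun

for mi, bi in points:
    print(bi, ">=", conjugate(mi))    # each generating point lands in H'
```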
Begin phase 3. Weak-bounded-minimals is easy by isomorphism. For the H′ we constructed, if f↦EH′(f) weren't uniformly continuous, then because EH′(f) equals h(f), we'd get a failure of uniform continuity for h, which was assumed.
By the way, the convex conjugate, (h′)∗(m), can be expressed (by Wikipedia, sup characterization) as supf(m(f)−h′(f))=supf(m(−f)−h′(−f))=supf(h(f)−m(f)). We can further restrict f to functions with range in [0,1], because any other f makes the h(f) term −∞. We'll be using (h′)∗(m)=supf∈C(X,[0,1])(h(f)−m(f)) (or the supf variant) repeatedly.
For nonemptiness, observe that (0,1) is present in H′, because
(h′)∗(0)=supf∈C(X,[0,1])(h(f)−0(f))=supf∈C(X,[0,1])h(f)=1
This is from our format of the convex conjugate, and h being normalized and monotone, so the highest it can be is 1 and it attains that value. Therefore, 1≥(h′)∗(0), so (0,1) is in the H′ we constructed.
For showing that our constructed set H′ lies in Msa(X), we have that an arbitrary (m++m−,b)∈H′ has (by our characterization of (h′)∗(m))
b+m−(1)≥supf∈C(X,[0,1])(h(f)−(m++m−)(f))+m−(1)
≥supf∈C(X,[0,1])(−(m++m−)(f))+m−(1)
=m−(1)−inff∈C(X,[0,1])((m++m−)(f))=m−(1)−m−(1)=0
This is by the lower bound on b being (h′)∗(m++m−) and unpacking the convex conjugate, h(f)≥h(0)=0 by monotonicity and normalization, a reexpression of sup, and Lemma 5, respectively. b+m−(1)≥0 so it's an sa-measure.
For closure and convexity, by monotonicity of h, we have 0=−h(0)≥−h(f)≥−h(1)=−1 and h is continuous on C(X,[0,1]), concave, and −∞ everywhere else by assumption, so h′ is proper, continuous on C(X,[−1,0]), convex, and lower-semicontinuous in general because of the ∞ everywhere else, so, by the Wikipedia page on "Closed Convex Function", h′ is a closed convex function, and then by the Wikipedia page on "Convex Conjugate" in the Properties section, (h′)∗ is convex and closed. From the Wikipedia page on "Closed Convex Function", this means that the epigraph of (h′)∗ is closed, and also the epigraph of a convex function is convex. This takes care of closure and convexity for our H′
Time for upper-completeness. Assume that (m,b) lies in the epigraph. Our task now is to show that (m,b)+(m∗,b∗) lies in the epigraph. This is equivalent to showing that b+b∗≥(h′)∗(m+m∗). Note that b∗≥−m∗−(1), because (m∗,b∗) is an sa-measure. Let's begin.
(h′)∗(m+m∗)=supf∈C(X,[0,1])(h(f)−(m+m∗)(f))
=supf∈C(X,[0,1])(h(f)−m(f)−m∗+(f)−m∗−(f))≤supf∈C(X,[0,1])(h(f)−m(f)+b∗)
=b∗+supf∈C(X,[0,1])(h(f)−m(f))=b∗+(h′)∗(m)≤b∗+b
This was done by unpacking the convex conjugate, splitting up m∗ into m∗+ and m∗−, locking two of the components in the sup to be an upper bound (which also gives the sup more flexibility on maximizing the other two components, so this is greater), packing up the convex conjugate, and using that b≥(h′)∗(m) because (m,b)∈H′
Normalization of the resulting set is easy. Going from h to a (maybe)-inframeasure H′ back to h is identity as established earlier, so all we have to do is show that a failure of normalization in a (maybe)-inframeasure makes the resulting h not normalized. Thus, if our h is normalized, and it makes an H′ that isn't normalized, then going back makes a non-normalized h, which contradicts isomorphism. So, assume there's a failure of normalization in H′. Then EH′(0)≠0, or EH′(1)≠1, so either h(0)≠0 or h(1)≠1 and we get a failure of normalization for h which is impossible. So H′ must be normalized.
Begin phase 4. First, continuous affine functionals ϕ that lie above the graph of h perfectly correspond to sa-measures in H′. This is because the continuous dual space of C(X) is the space of finite signed measures, so we can interpret ϕ−ϕ(0) as a finite signed measure, and ϕ(0) as the b term. In one direction, given an (m,b)∈H′,
ϕ(f)=m(f)+b≥inf(m′,b′)∈H′(m′(f)+b′)=EH′(f)=h(f)
so every point in H′ induces a continuous affine functional C(X)→R whose graph is above h.
In the other direction, from earlier, we can describe H′ as: {(m,b)|b≥supf(h(f)−m(f))}
and then, for (ϕ−ϕ(0),ϕ(0)),
supf(h(f)−(ϕ−ϕ(0))(f))=supf(h(f)−ϕ(f)+ϕ(0))≤ϕ(0)
because ϕ(f)≥h(f). So continuous affine functionals whose graph lies above the graph of h correspond to points in H′.
So, we have a link between affine functionals that lie above the graph of h, and points in H′. What would a minimal point correspond to? Well, a non-minimal point corresponds to (m,b)+(m∗,b∗), where the latter component is nonzero. There's some f+ where f+((m,b)+(m∗,b∗))>f+(m,b) due to the latter component being nonzero, and for all f+, f+((m,b)+(m∗,b∗))≥f+(m,b). Using Theorem 1 to translate positive functionals to f, this means that the ϕ induced by (m,b) lies below the affine functional induced by (m,b)+(m∗,b∗) over the f∈C(X,[0,1]). So, if there's a different affine functional ψ s.t. ∀f∈C(X,[0,1]):h(f)≤ψ(f)≤ϕ(f), then ϕ must correspond to a nonminimal point.
Further, we can characterize whether ϕ corresponds to an a-measure or not. For a measure, if you increase the function you're feeding in, you increase the value you get back out: f′≥f→ϕ(f′)≥ϕ(f). For a signed measure with some negative component, Lemma 5 says we can find some f′∈C(X,[0,1]) where m(f′)<0, so you can add one of those f′ to your f and get ϕ(f+f′)<ϕ(f). So, a ϕ corresponds to an a-measure exactly when it's monotone.
Phase 5: Proving positive-minimals. With these links in place, we just have to take any old point in H′ that's not an a-measure, get a ϕ from it (which will fulfill certain properties), and use those properties to find a ψ that lies below ϕ and above h on C(X,[0,1]) and is monotone. Such a ψ certifies that there's a point below our point of interest that's still in H′ but is an a-measure, so no minimal point can fail to be an a-measure.
To that end, fix a ϕ that corresponds to some point in H′ that's not an a-measure (in particular, its measure component has a negative part); it lies above the graph of h.
Now, translate ϕ to a (mϕ,bϕ), where bϕ=ϕ(0), and mϕ(f)=ϕ(f)−ϕ(0). Since our ϕ corresponds to something that's not an a-measure, (mϕ)−(1)<0
Let our affine continuous functional ψ be defined as ψ(f)=(mϕ)+(f)+ϕ(0)+(mϕ)−(1).
In order to show that ψ corresponds to an a-measure below (mϕ,bϕ) in H′, we need three things. One is that ψ is monotone (is an a-measure), two is that it lies below ϕ over C(X,[0,1]), and three is that it lies above h. Take note of the fact that ϕ(0)+(mϕ)−(1)≥0, because ϕ(0)=bϕ and (mϕ,bϕ) is an sa-measure.
For monotonicity of ψ, it's pretty easy. If f′≥f, then
ψ(f′)=ψ(f+(f′−f))=(mϕ)+(f+(f′−f))+ϕ(0)+(mϕ)−(1)
≥(mϕ)+(f)+ϕ(0)+(mϕ)−(1)=ψ(f)
and we're done with that part.
For being less than or equal to ϕ over C(X,[0,1]) (we know it's not the same as ϕ because ϕ isn't monotone and ψ is),
ψ(f)=(mϕ)+(f)+ϕ(0)+(mϕ)−(1)≤(mϕ)+(f)+ϕ(0)+(mϕ)−(f)
=mϕ(f)+ϕ(0)=ϕ(f)−ϕ(0)+ϕ(0)=ϕ(f)
For being ≥h over C(X,[0,1]), it takes a somewhat more sophisticated argument. By Lemma 5, regardless of ϵ, there exists an f′∈C(X,[0,1]) where mϕ(f′)<(mϕ)−(1)+ϵ. Then, we can go:
ψ(f)+ϵ>ψ(f)+mϕ(f′)−(mϕ)−(1)
=(mϕ)+(f)+ϕ(0)+(mϕ)−(1)+mϕ(f′)−(mϕ)−(1)=(mϕ)+(f)+ϕ(0)+mϕ(f′)
=(mϕ)+(f+f′)+ϕ(0)+(mϕ)−(f′)≥(mϕ)+(sup(f,f′))+ϕ(0)+(mϕ)−(sup(f,f′))
=mϕ(sup(f,f′))+ϕ(0)=ϕ(sup(f,f′))≥h(sup(f,f′))≥h(f)
The last steps were done via the definition of ϕ, ϕ≥h, and h being monotonic.
So, ψ(f)+ϵ>h(f) for all ϵ and all f∈C(X,[0,1]), getting ψ(f)≥h(f) for all f∈C(X) (because h is −∞ everywhere else)
Thus, ψ specifies an a-measure (ψ being monotone) that is below the sa-measure encoded by ϕ (by ϕ≥ψ over C(X,[0,1])), yet ψ≥h, so said point is in H′. This witnesses that there can be no minimal points in H′ that aren't a-measures. That just leaves getting the slope bound from Lipschitzness, the worst part of this whole proof.
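Here's the whole phase-5 construction on a finite X (a sketch of mine with made-up numbers): we build a ϕ with a negative component sitting above h, apply the ψ recipe, and check the three properties numerically:

```python
# Phase-5 recipe on a finite X: from a non-monotone phi above h, build
# psi(f) = (m_phi)_+(f) + phi(0) + (m_phi)_-(1) and verify h <= psi <= phi.
import numpy as np

rng = np.random.default_rng(5)
n = 3
points = [(rng.uniform(0.2, 1, n), rng.uniform(0, 0.5)) for _ in range(4)]
def h(f): return min(m @ f + b for m, b in points)

m0, b0 = points[0]
neg = m0[0] + 0.1                            # enough to flip coordinate 0
m_star = np.array([-neg, 0.4, 0.0])          # sa-measure to stack on (m0, b0):
b_star = neg + 0.1                           # b* + m*_-(1) = 0.1 >= 0
m_phi, b_phi = m0 + m_star, b0 + b_star      # phi has a negative component
def phi(f): return m_phi @ f + b_phi

m_psi = np.maximum(m_phi, 0.0)               # (m_phi)_+
b_psi = b_phi + m_phi[m_phi < 0].sum()       # phi(0) + (m_phi)_-(1)
def psi(f): return m_psi @ f + b_psi

for _ in range(5000):
    f = rng.uniform(0, 1, n)
    assert h(f) - 1e-9 <= psi(f) <= phi(f) + 1e-9
print("psi is monotone (an a-measure), below phi, and above h")
```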
Phase 6: Let λ⊙ be the Lipschitz constant for h. Fix a ϕ that corresponds to a minimal point with λ>λ⊙. This violates the Lipschitz bound when traveling from 0 to 1, so the Lipschitz bound is violated in some direction. Further, the graph of ϕ touches the graph of h at some point f∗∈C(X,[0,1]), because if it didn't, you could shift ϕ down further until it did touch, witnessing that the point ϕ came from wasn't minimal (you could sap more from the b term).
Now, if this point is minimal, it should be impossible to craft a ψ which is ≤ϕ over C(X,[0,1]), ≥h, and different from ϕ. We shall craft such a ψ, witnessing that said point isn't actually minimal. Further, said ψ won't violate the Lipschitz bound in any direction. Thus, all affine functionals corresponding to minimal points must obey the Lipschitz bound and be monotone, so they're a-measures with λ≤λ⊙.
In order to do this, we shall craft three sets in C(X)⊕R. A, B1, and B2.
Set A is {(f,b)|f∈C(X,[0,1]),b≤h(f)}. Pretty much, this set is the hypograph of h. It's obviously convex because h is concave, and the hypograph of a concave function is convex. It's closed because h is continuous.
Set B1 is {(f,b)|f∈C(X,(0,1)),b>ϕ(f)}. This could be thought of as the interior of the epigraph of ϕ restricted to C(X,[0,1]). Undershooting this means you never exceed ϕ over C(X,[0,1]). First, it's open. This is because, due to f being continuous over a compact set X, the maximum and minimum are attained, so any f∈C(X,(0,1)) is bounded below 1 and above 0, so we've got a little bit of room to freely wiggle f in any direction. Further, since ϕ−ϕ(0) is a continuous linear functional on C(X) which is a Banach space, it's a bounded linear functional and has some Lipschitz constant (though it may exceed λ⊙), so we have a little bit of room to freely wiggle b as well. So B1 is open.
Also, B1 is convex, because a mixture of f and f′ that are bounded away from 0 and 1 is also bounded away from 0 and 1, and pb+(1−p)b′>pϕ(f)+(1−p)ϕ(f′)=ϕ(pf+(1−p)f′).
Set B2 is {(f,b)|b>λ⊙d(f,f∗)+ϕ(f∗)}. This could be thought of as an open cone with a point (it's missing that exact point, though) at (f∗,ϕ(f∗)), that opens straight up, and certifies a failure of the λ⊙ bound on slope. If an affine function includes the point (f∗,ϕ(f∗)) in its graph, then if it increases faster than λ⊙ in any direction, it'll land in this set. It's open because, given a point in it, we can freely wiggle the f and b values around a little bit in any direction, and stay in the set. Now we'll show it's convex. Given an (f,b) and (f′,b′) in it, due to C(X) being a Banach space (so it has a norm), we want to check whether pb+(1−p)b′>λ⊙d(pf+(1−p)f′,f∗)+ϕ(f∗).
Observe that (using the defining axioms for a norm)
pb+(1−p)b′>p(λ⊙d(f,f∗)+ϕ(f∗))+(1−p)(λ⊙d(f′,f∗)+ϕ(f∗))
=λ⊙(pd(f,f∗)+(1−p)d(f′,f∗))+ϕ(f∗)=λ⊙(p||f−f∗||+(1−p)||f′−f∗||)+ϕ(f∗)
=λ⊙(||pf−pf∗||+||(1−p)f′−(1−p)f∗||)+ϕ(f∗)
≥λ⊙(||pf−pf∗+(1−p)f′−(1−p)f∗||)+ϕ(f∗)
=λ⊙(||pf+(1−p)f′−f∗||)+ϕ(f∗)=λ⊙d(pf+(1−p)f′,f∗)+ϕ(f∗)
So, B2 is convex.
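The convexity computation above is easy to spot-check numerically; a sketch of mine (λ⊙, ϕ(f∗), and f∗ are made up), using the sup metric for d:

```python
# B2 = {(f, b) | b > lam * d(f, f_star) + phi_fstar} is convex; mixtures of
# sampled points of B2 stay in B2 (d is the sup metric).
import numpy as np

rng = np.random.default_rng(6)
n, lam, phi_fstar = 4, 2.0, 0.7
f_star = rng.uniform(0, 1, n)

def in_B2(f, b):
    return b > lam * np.abs(f - f_star).max() + phi_fstar

for _ in range(5000):
    f1, f2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
    b1 = lam * np.abs(f1 - f_star).max() + phi_fstar + rng.uniform(0.01, 1)
    b2 = lam * np.abs(f2 - f_star).max() + phi_fstar + rng.uniform(0.01, 1)
    p = rng.uniform(0, 1)
    assert in_B2(p * f1 + (1 - p) * f2, p * b1 + (1 - p) * b2)
print("every sampled mixture of points of B2 stays in B2")
```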
Ok, so we've got a convex closed set and two convex opens. Now, consider B:=c.h(B1∪B2). The convex hull of an open set is open. We will show that A∩B=∅.
Assume this is false, and that they overlap. The point where they overlap can then be written as a convex mixture of points from B1∪B2. However, B1 and B2 are both convex, so we can reduce to the case where we're mixing one point (f,b) from B1 and one point (f′,b′) from B2, with (pf+(1−p)f′,pb+(1−p)b′)∈A.
If p=0, then we've just got a single point in B2. Recall that ϕ(f∗)=h(f∗), since the graph of ϕ touches the graph of h at f∗.
b′>λ⊙d(f′,f∗)+ϕ(f∗)=λ⊙d(f′,f∗)+h(f∗)≥h(f′)
This is because ϕ(f∗)=h(f∗) and h has a Lipschitz constant of λ⊙, so it can't increase as fast as we're demanding as we move from f∗ to f′, which stays in C(X,[0,1]). So (f′,b′)∉A.
If p=1, then we've just got a single point in B1. Then b>ϕ(f)≥h(f), so again, (f,b)∉A.
For the case where p isn't 0 or 1, we need a much more sophisticated argument. Remembering that (f,b)∈B1, and (f′,b′)∈B2, we will show that (pf+(1−p)f∗,pb+(1−p)ϕ(f∗)) lies strictly above the graph of h. Both f and f∗ lie in C(X,[0,1]), so their mix lies in the same set, so we don't have to worry about h being undefined there. Also, remember that ϕ≥h over C(X,[0,1]). Now,
pb+(1−p)ϕ(f∗)>pϕ(f)+(1−p)ϕ(f∗)=ϕ(pf+(1−p)f∗)≥h(pf+(1−p)f∗)
The critical > is by the definition of B1, and (f,b)∈B1. So, the b term is strictly too high for this point (different than the one we care about) to land on the graph of h.
With the aid of this, we will consider "what slope do we have as we travel from (pf+(1−p)f∗,pb+(1−p)ϕ(f∗)) to (pf+(1−p)f′,pb+(1−p)b′)"? Said slope is
((pb+(1−p)b′)−(pb+(1−p)ϕ(f∗)))/d(pf+(1−p)f′,pf+(1−p)f∗)=(1−p)(b′−ϕ(f∗))/||(pf+(1−p)f′)−(pf+(1−p)f∗)||
=(1−p)(b′−ϕ(f∗))/((1−p)||f′−f∗||)=(b′−ϕ(f∗))/d(f′,f∗)>(λ⊙d(f′,f∗)+ϕ(f∗)−ϕ(f∗))/d(f′,f∗)=λ⊙
That critical > is by (f′,b′)∈B2 and the definition of B2.
So, if we start at (pf+(1−p)f∗,pb+(1−p)ϕ(f∗)) (and pf+(1−p)f∗ lies in C(X,[0,1])), we're above the graph of h. Then, we travel to (pf+(1−p)f′,pb+(1−p)b′), where pf+(1−p)f′∈C(X,[0,1]) by assumption that this point is in A, but while doing this, we ascend faster than λ⊙, the Lipschitz constant for h. So, our point of interest (pf+(1−p)f′,pb+(1−p)b′) lies above the graph of h and can't lie in A, and we have a contradiction.
Putting all this together, A∩B=∅. Since B is open, and they're both convex and nonempty, we can invoke Hahn-Banach (first version of the theorem in the "Separation of Sets" section) and conclude they're separated by some continuous linear functional ψL. Said linear functional must increase as b does, because (0,0)∈A, and (0,b) (for some sufficiently large b) lies in B2, thus in B. This means that given any f and a∈R to specify a level, we can find a unique b where ψL(f,b)=a.
So, any level set of this continuous linear functional we crafted can also be interpreted as an affine functional. There's a critical value of the level set that achieves the separation, ψL(f∗,ϕ(f∗)). This is because (f∗,ϕ(f∗))=(f∗,h(f∗))∈A, but (f∗,ϕ(f∗)+ϵ) is in B2, thus in B, for all ϵ. So we've uniquely pinned down which affine function ψ we're going for. Since the graph of ψ is a hyperplane separating A and B (It may touch the set A, just not cut into it, but it doesn't touch B), from looking at the definitions of A and B1 and B2, we can conclude:
From the definition of A, ψ(f)≥h(f), so ψ≥h over C(X,[0,1]).
From the definition of B1, ψ(f)≤ϕ(f) over C(X,(0,1)), and they're both continuous, so we can extend ψ(f)≤ϕ(f) to C(X,[0,1]) by continuity, so ψ≤ϕ over C(X,[0,1]).
Also, h(f∗)≤ψ(f∗)≤ϕ(f∗)=h(f∗), so ψ(f∗)=ϕ(f∗), and this, paired with the ability of B2 to detect whether an affine function exceeds the λ⊙ slope bound (as long as the graph of said function goes through (f∗,ϕ(f∗))), means that the graph of ψ not entering B2 certifies that its Lipschitz constant is λ⊙ or less. Since ϕ does enter B2 due to violating the Lipschitz constant bound, this also certifies that ϕ≠ψ.
Putting it all together, given a ϕ which corresponds to a minimal point and violates the Lipschitz bound, we can find a ψ below it that's also above h, so said minimal point isn't actually minimal.
Therefore, if you were to translate a minimal point in the induced H into an affine function above h, it'd have to (a) not violate the Lipschitz bound (otherwise we could undershoot it) and (b) be monotone (otherwise we could undershoot it). Being monotone certifies that it's an a-measure, and having a Lipschitz constant of λ⊙ or less certifies that the λ of the a-measure is λ⊙ or less. We're finally done!
The next proofs are here.