A note to the reader: these proofs are not heavily edited, so it is advised to contact me if you wish to read through them.
Lemma 1: If X is a Polish space, C is a compact subset, f,f′∈CB(X), and sup_{x∈C}|f(x)−f′(x)| ≤ ϵ, then there is a third bounded continuous function f′′ which fulfills the following three properties. First, d(f,f′′) ≤ ϵ. Second, f′′↓C = f′↓C. Third, d(f′,f′′) ≤ d(f,f′).
To prove this, we will use the Michael selection theorem to craft a continuous function with these properties. Accordingly, let the set-valued function ψ:X→R be defined as follows: if x∈C, then ψ(x) := {f′(x)}, and if x∉C, then

ψ(x) := [f(x)−ϵ, f(x)+ϵ] ∩ [f′(x)−d(f,f′), f′(x)+d(f,f′)]

Assuming there is a continuous function f′′ with f′′(x)∈ψ(x) for all x, it'd get us our desired results. This is because:

d(f,f′′) = sup_x|f(x)−f′′(x)| = sup(sup_{x∈C}|f(x)−f′′(x)|, sup_{x∉C}|f(x)−f′′(x)|)

And then, because f′′ is a selection from ψ, we can only make these quantities bigger by taking the supremum over selections from ψ as well, to get:

≤ sup(sup_{x∈C,y∈ψ(x)}|f(x)−y|, sup_{x∉C,y∈ψ(x)}|f(x)−y|)

And then we can use the definition of ψ(x) in the two cases (the latter case is because y∈ψ(x) implies y∈[f(x)−ϵ, f(x)+ϵ]) to get:

= sup(sup_{x∈C}|f(x)−f′(x)|, ϵ)

And then we use the hypothesis that the supremum of the difference between f and f′ over C is at most ϵ to get:

≤ sup(ϵ,ϵ) = ϵ

So we'd have one of our three results, that d(f,f′′) ≤ ϵ. Our second desired result, that f′′ mimicks f′ on C, is trivial by the definition of ψ. And finally,

d(f′,f′′) = sup_x|f′(x)−f′′(x)| = sup(sup_{x∈C}|f′(x)−f′′(x)|, sup_{x∉C}|f′(x)−f′′(x)|)

And then we do the usual transition from f′′ to the ψ function that it's a selection of,

≤ sup(sup_{x∈C,y∈ψ(x)}|f′(x)−y|, sup_{x∉C,y∈ψ(x)}|f′(x)−y|)

And then, because ψ(x) is always {f′(x)} when x∈C, and y is always bounded in [f′(x)−d(f,f′), f′(x)+d(f,f′)] for the latter part, this turns into:

≤ sup(0, d(f,f′)) = d(f,f′)

So we have our desired result of d(f′,f′′) ≤ d(f,f′). As we have established that all three results follow from finding a continuous selection of ψ, we just have to do that now. For this, we will be using the Michael selection theorem. In order to invoke it, we need to check that X is paracompact (all Polish spaces are paracompact, being metrizable), that R is a Banach space (it is), and verify some conditions on the sets ψ(x).
We need that ψ(x) is nonempty for all x. This is trivial by the definition of ψ(x) for x∈C, but for the other case, we need to verify that [f(x)−ϵ, f(x)+ϵ] ∩ [f′(x)−d(f,f′), f′(x)+d(f,f′)] is a nonempty set. This is easy because f(x) itself witnesses nonemptiness: it lies in the first interval trivially, and in the second because |f(x)−f′(x)| ≤ d(f,f′). We need that ψ(x) is convex for all x. This is easy because, in the trickier case x∉C, it's the intersection of two convex sets. We also need that ψ(x) is closed for all x, which is true because it's either a single point or an intersection of two closed intervals.
All that we're missing in order to invoke the Michael Selection theorem to get a continuous function that works is verifying lower hemicontinuity for the function ψ.
Lower hemicontinuity says: if x_n limits to x and y∈ψ(x), then there is a subsequence x_m and points y_m∈ψ(x_m) such that y_m limits to y.
In order to show this, we can take three different cases.
The first possible case is where infinitely many of the xn lie in C, so the limit point x must lie in C as well. Then, xm can be the subsequence which lies in C, and ym can be the subsequence f′(xm), which is the only possible choice of value that lies in ψ(xm). Due to continuity of f′, this obviously converges to f′(x), which is the only possible choice of value for ψ(x) because x∈C.
The second possible case is where only finitely many of the x_n lie in C, and yet the limit point x lies in C, as when limiting to the boundary of a closed set from outside it. Let x_m be the subsequence obtained by discarding every term of the approximating sequence that lies in C. Notice that ψ(x_m), instead of being written as

[f(x_m)−ϵ, f(x_m)+ϵ] ∩ [f′(x_m)−d(f,f′), f′(x_m)+d(f,f′)]

can be written as the single interval

[sup(f(x_m)−ϵ, f′(x_m)−d(f,f′)), inf(f(x_m)+ϵ, f′(x_m)+d(f,f′))]

And now, let y_m be defined as:

sup(sup(f(x_m)−ϵ, f′(x_m)−d(f,f′)), inf(f′(x_m), inf(f(x_m)+ϵ, f′(x_m)+d(f,f′))))

ie, f′(x_m) clipped to the interval ψ(x_m). Due to continuity of all these functions, and x_m limiting to x, the limiting y value is:

sup(sup(f(x)−ϵ, f′(x)−d(f,f′)), inf(f′(x), inf(f(x)+ϵ, f′(x)+d(f,f′))))

Now, because x∈C, we have |f(x)−f′(x)| ≤ sup_{x∈C}|f(x)−f′(x)| ≤ ϵ, so f(x)+ϵ ≥ f′(x), and trivially f′(x)+d(f,f′) ≥ f′(x). Therefore

inf(f(x)+ϵ, f′(x)+d(f,f′)) ≥ f′(x)

and so

inf(f′(x), inf(f(x)+ϵ, f′(x)+d(f,f′))) = f′(x)

So our limiting y value reduces to:

sup(sup(f(x)−ϵ, f′(x)−d(f,f′)), f′(x))

Also, by similar arguments, f(x)−ϵ ≤ f′(x) and f′(x)−d(f,f′) ≤ f′(x), so

sup(f(x)−ϵ, f′(x)−d(f,f′)) ≤ f′(x)

So our limiting y value reduces to merely f′(x), ie, our point selected from ψ(x), and we've shown lower hemicontinuity in this case. That just leaves one last case left over, the case where x∉C.
Again, as before, in this case ψ(x) is the interval

[sup(f(x)−ϵ, f′(x)−d(f,f′)), inf(f(x)+ϵ, f′(x)+d(f,f′))]

Let p∈[0,1] measure how close the point y is to the bottom of this interval (p=1 at the bottom, p=0 at the top). For your sequence x_m limiting to x, discard the x_n which lie in C (there are only finitely many, since x∉C and C is closed), and let the y_m be given by:

p·sup(f(x_m)−ϵ, f′(x_m)−d(f,f′)) + (1−p)·inf(f(x_m)+ϵ, f′(x_m)+d(f,f′))

Because of the continuity of the functions sup(f−ϵ, f′−d(f,f′)) (supremum of two continuous functions) and inf(f+ϵ, f′+d(f,f′)) (infimum of two continuous functions), the y_m limit to:

p·sup(f(x)−ϵ, f′(x)−d(f,f′)) + (1−p)·inf(f(x)+ϵ, f′(x)+d(f,f′))

Which is just y. So we have lower hemicontinuity in this last case.
And therefore, we have lower hemicontinuity overall for ψ, and so ψ has a continuous selection function by the Michael Selection Theorem, and said selection function fulfills the requisite properties.
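Although the proof goes through the Michael selection theorem in general, the clipping formula from the lower-hemicontinuity argument already gives an explicit selection, which makes the three properties easy to sanity-check numerically. Below is a small illustration; the choice of X = [0,2], C = [0,1], and the particular f, f′ are my own toy example, not from the text:

```python
import numpy as np

# Toy instance: X = [0,2] (discretized), compact C = [0,1].
xs = np.linspace(0.0, 2.0, 2001)
in_C = xs <= 1.0

f  = np.sin(xs)
fp = np.sin(xs) + 0.1 * xs          # f', with sup_C |f - f'| = 0.1

eps = np.max(np.abs(f - fp)[in_C])  # epsilon from the lemma statement
d   = np.max(np.abs(f - fp))        # d(f, f') over all of X

# Explicit selection from psi: clip f' into [f-eps, f+eps] ∩ [f'-d, f'+d].
lo  = np.maximum(f - eps, fp - d)
hi  = np.minimum(f + eps, fp + d)
fpp = np.clip(fp, lo, hi)           # f''

prop1 = np.max(np.abs(f - fpp))         # should be <= eps
prop2 = np.max(np.abs(fp - fpp)[in_C])  # should be 0: f'' = f' on C
prop3 = np.max(np.abs(fp - fpp))        # should be <= d(f, f')
```

On C, f′(x) already lies inside ψ(x), so the clip leaves it unchanged; off C, the clip forces f′′ into both intervals at once, which is exactly the three properties.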
Lemma 2: For all f and f′, if a functional h:CB(X)→R has a set C as an ϵ1-almost-support, and sup_{x∈C}|f(x)−f′(x)| ≤ ϵ2, then |h(f)−h(f′)| ≤ λ⊙ϵ2 + ϵ1·d(f,f′), where λ⊙ is the Lipschitz constant of h.
Proof: Via Lemma 1 (applied with ϵ := ϵ2), there's a function f′′ with the properties that d(f,f′′) ≤ ϵ2, f′↓C = f′′↓C, and d(f′,f′′) ≤ d(f,f′). Now, we can go:

|h(f)−h(f′)| ≤ |h(f)−h(f′′)| + |h(f′′)−h(f′)| ≤ λ⊙·d(f,f′′) + ϵ1·d(f′′,f′) ≤ λ⊙ϵ2 + ϵ1·d(f,f′)

And we're done. The critical steps in the second inequality were due to the Lipschitzness of h, and the fact that f′′ and f′ agree on C, which is an ϵ1-almost-support for h, respectively.
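The bound |h(f)−h(f′)| ≤ λ⊙ϵ2 + ϵ1·d(f,f′) (with ϵ1 the almost-support parameter and ϵ2 the sup-distance on C) can be spot-checked in a finite toy model where X is a three-point space and h is an inf of a-measures; the particular numbers below are my own illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# X = {0,1,2}; C = {0,1}. h(f) = min_i (m_i · f + b_i), an inf of a-measures.
# Each m_i puts at most eps1 mass outside C, so C is an eps1-almost-support.
ms = np.array([[0.5, 0.4, 0.05],
               [0.3, 0.6, 0.10],
               [0.8, 0.1, 0.02]])
bs = np.array([0.0, 0.1, 0.05])

eps1 = ms[:, 2].max()          # mass outside C bounds the almost-support parameter
lam  = ms.sum(axis=1).max()    # Lipschitz constant of h: the max total mass

def h(f):
    return np.min(ms @ f + bs)

eps2 = 0.05
worst = float("-inf")          # max violation of the Lemma 2 bound
for _ in range(1000):
    f  = rng.uniform(-1, 1, size=3)
    fp = f.copy()
    fp[:2] += rng.uniform(-eps2, eps2, size=2)  # agree on C up to eps2
    fp[2]  += rng.uniform(-1, 1)                # wildly different off C
    dist = np.max(np.abs(f - fp))               # d(f, f')
    worst = max(worst, abs(h(f) - h(fp)) - (lam * eps2 + eps1 * dist))
```

Since |min_i a_i(f) − min_i a_i(f′)| ≤ max_i |a_i(f)−a_i(f′)|, the on-C mass contributes at most λ⊙ϵ2 and the off-C mass at most ϵ1·d(f,f′), so `worst` should never be positive.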
Proposition 1: For a continuous function L:X→[0,1], the metric d|L completely metrizes the set {x|L(x)>0} equipped with the subspace topology.
To do this, we'll need four parts. First, we need to show that d|L is even a metric. Second, we'll need to show that every open ball in the d|L metric fits an open ball in the original metric within it (so all the open sets induced by the d|L metric were present in the original subspace topology). Third, we'll need to show that every open ball in the original metric that lies within the support of L fits an open ball induced by the d|L metric within it (so all the open sets in the original subspace topology can be induced by the d|L metric); parts 2 and 3 let us show that the d|L metric induces the subspace topology on the support of L. Finally, part 4 is showing that any Cauchy sequence in the d|L metric is Cauchy in the original complete metric, so we can't have any missing limit points, and that all sequences in the support of L which are Cauchy according to the d|L metric have their limit point lying in the support of L, so limits never lead outside of the support of L. This then shows that d|L is a complete metrization of the support of L: it induces the same topology as the subspace topology, and all limit points lie in the space itself.
So, to begin with, if X is your original Polish space, pick a complete metrization of the space. Then ensure that the maximum distance is 1 (this can always be done, for instance by replacing the metric with its minimum with 1, while preserving the exact same Cauchy sequences and not affecting the topology in any way, and it's still a complete metric), and call that d.
Also, there's some continuous function L:X→[0,1] that you'll be updating. The set support(L) is defined as {x|L(x)>0}, and is clearly an open subset of X, because it is the preimage of the open set (0,2) under the continuous function L. On support(L), the metric d|L is defined as:

(d|L)(x,y) := d(x,y)·inf(1/L(x), 1/L(y)) + |1/L(x) − 1/L(y)|
To show it's a metric, we first need symmetry, which is obvious, because d, inf, and the absolute value of a difference are all symmetric. For identity of indiscernibles, in the forward direction, the fact that L is bounded above by 1 means that inf(1/L(x), 1/L(y)) ≥ 1 always, so for (d|L)(x,y) to be 0 we need d(x,y)=0, so x=y. For the reverse direction, (d|L)(x,x) must be 0, because the formula for the distance reduces to 0·(1/L(x)) + 0.
That just leaves the triangle inequality, and here we'll critically use the fact that the original metric was clipped so it's never above 1. Our goal is to show that

(d|L)(x,z) ≤ (d|L)(x,y) + (d|L)(y,z)

Without loss of generality, we can assume that 1/L(x) ≤ 1/L(z) (otherwise flip x and z), so we can split into three cases, which are:

1/L(x) ≤ 1/L(y) ≤ 1/L(z)
1/L(x) ≤ 1/L(z) ≤ 1/L(y)
1/L(y) ≤ 1/L(x) ≤ 1/L(z)

For the first case, we have:

(d|L)(x,z) = d(x,z)·inf(1/L(x), 1/L(z)) + |1/L(x) − 1/L(z)|
= d(x,z)·(1/L(x)) + |1/L(x) − 1/L(z)|
≤ d(x,z)·(1/L(x)) + |1/L(x) − 1/L(y)| + |1/L(y) − 1/L(z)|
≤ (d(x,y) + d(y,z))·(1/L(x)) + |1/L(x) − 1/L(y)| + |1/L(y) − 1/L(z)|
≤ d(x,y)·(1/L(x)) + d(y,z)·(1/L(y)) + |1/L(x) − 1/L(y)| + |1/L(y) − 1/L(z)|
= d(x,y)·inf(1/L(x), 1/L(y)) + |1/L(x) − 1/L(y)| + d(y,z)·inf(1/L(y), 1/L(z)) + |1/L(y) − 1/L(z)|
= (d|L)(x,y) + (d|L)(y,z)

For the second case, we can do the same exact argument, just replace the 1/L(y) multiplying d(y,z) with 1/L(z). The third case is the tricky one; we'll work backwards. Remember that our inequalities are 1/L(y) ≤ 1/L(x) ≤ 1/L(z). Now, let's proceed.

(d|L)(x,y) + (d|L)(y,z)
= d(x,y)·inf(1/L(x), 1/L(y)) + |1/L(x) − 1/L(y)| + d(y,z)·inf(1/L(y), 1/L(z)) + |1/L(y) − 1/L(z)|
= d(x,y)·(1/L(y)) + |1/L(x) − 1/L(y)| + d(y,z)·(1/L(y)) + |1/L(y) − 1/L(z)|
= d(x,y)·(1/L(y)) + 1/L(x) − 1/L(y) + d(y,z)·(1/L(y)) + 1/L(z) − 1/L(y)
= (1/L(y))·(d(x,y) + d(y,z)) + 1/L(x) + 1/L(z) − 2/L(y)
≥ (1/L(y))·d(x,z) + 1/L(x) + 1/L(z) − 2/L(y)
= (1/L(y) + 1/L(x) − 1/L(x))·d(x,z) + 1/L(x) + 1/L(z) − 2/L(y)

Then we do some regrouping, to yield:

= (1/L(x))·d(x,z) + (1/L(x) − 1/L(y))·(1 − d(x,z)) + 1/L(z) − 1/L(y)

At this point, we remember that d(x,z) ≤ 1 because we bounded our initial distance, and 1/L(x) ≥ 1/L(y) because of the problem case we're in, so the middle term is nonnegative, to get:

≥ (1/L(x))·d(x,z) + 1/L(z) − 1/L(y)

And then, we just remember that 1/L(y) ≤ 1/L(x) ≤ 1/L(z) in this problem case, so 1/L(z) − 1/L(y) ≥ 1/L(z) − 1/L(x), to get:

≥ (1/L(x))·d(x,z) + 1/L(z) − 1/L(x) = d(x,z)·inf(1/L(x), 1/L(z)) + |1/L(x) − 1/L(z)| = (d|L)(x,z)

And we're done with the triangle inequality, so d|L is indeed a metric.
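The metric axioms are also easy to test numerically. Here is a quick random check of the triangle inequality for d|L, with X = R, d = min(|x−y|, 1), and an arbitrary continuous L valued in (0,1]; the particular L is my own choice for illustration:

```python
import random

def L(x):
    return 1.0 / (1.0 + x * x)      # continuous, valued in (0, 1]

def d(x, y):
    return min(abs(x - y), 1.0)     # base metric clipped to be at most 1

def dL(x, y):
    return d(x, y) * min(1/L(x), 1/L(y)) + abs(1/L(x) - 1/L(y))

random.seed(0)
worst = float("-inf")               # max violation of the triangle inequality
for _ in range(10000):
    x, y, z = (random.uniform(-5, 5) for _ in range(3))
    worst = max(worst, dL(x, z) - (dL(x, y) + dL(y, z)))
```

The clipping of d to be at most 1 is essential here: the third-case argument above fails without it, and with an unclipped d the triangle inequality can genuinely break.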
Well, is it a complete metric for support(L)? By looking at the definition of d|L, and remembering that 1/L(x) is always 1 or more because the likelihood function L is bounded in [0,1], we have that (d|L)(x,y) ≥ d(x,y) when x and y are in the support of L, so any Cauchy sequence in d|L must also be Cauchy in d. Now, either such a Cauchy sequence has its limit point also lying in the support of L, in which case we're good, or it has its limit point lying on the edge of the support of L (and outside the set), where L(x)=0. However, in that case, the sequence cannot be Cauchy in d|L: by the continuity of L, 1/L(x_n) would diverge to infinity as x_n approaches the limit point where L is 0, so no point of the sequence can stay close to all the rest; the |1/L(x) − 1/L(y)| term in the distance forbids Cauchyness. So it's a complete metric for the support of L.
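To see this boundary behavior concretely: with the illustrative choice L(x) = min(x,1) on X = R (my own example), the sequence x_n = 1/n is Cauchy in d but approaches the edge of support(L) = (0,∞), and the consecutive d|L-distances never shrink below 1:

```python
def L(x):
    return min(x, 1.0) if x > 0 else 0.0   # support(L) = (0, inf)

def d(x, y):
    return min(abs(x - y), 1.0)

def dL(x, y):
    return d(x, y) * min(1/L(x), 1/L(y)) + abs(1/L(x) - 1/L(y))

# x_n = 1/n is Cauchy in d (it converges to 0), but 0 is outside support(L).
gaps_d  = [d(1/n, 1/(n + 1)) for n in range(1, 100)]
gaps_dL = [dL(1/n, 1/(n + 1)) for n in range(1, 100)]
```

Here 1/L(1/n) = n, so each consecutive d|L-gap contains the term |n − (n+1)| = 1; the divergence of 1/L is exactly what blocks Cauchyness in d|L.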
All that remains is to show that it induces the same topology as the subspace topology that the open set support(L) should have. Because the support of L is itself open, and the space X is metrized by the distance metric d, any open set in the support of L can be written as the intersection of an open set in X and the open set support(L), and so it's open in X, and can be written as the union of tiny open balls in the original distance metric.
So, we'll proceed by showing that any open ball (w.r.t. the d metric) within the support of L, can fit some open ball (w.r.t. the d|L metric) in it that engulfs the center point, and any open ball (w.r.t. the d|L metric) within the support of L, can fit some open ball (w.r.t. the d metric) in it that engulfs the center point.
If we can do it, then because any open set in the subspace topology can be written as the union of a bunch of open balls (according to the d metric) centered at each point of the open set, and each of those open balls has its center point engulfed by a smaller open ball according to the d|L metric, we'd be able to build the open set in the subspace topology out of a union of open sets in the d|L-induced topology.
And also, any open set in the d|L-induced topology can be written as a union of open balls centered at each relevant point, and because each of those open balls has its center point engulfed by a smaller open ball according to the d metric, we can build any open set in the d|L-induced topology out of a union of open sets in the original topology.
The net result is that the d|L induced topology and the subspace topology have the same open sets. So, let's get working on showing that we can fit the two sorts of balls inside each other.
In one direction, if you have a ball of size ϵ according to the metric d, then because (d|L)(x,y)≥d(x,y) always, a ball of size ϵ according to the metric d|L centered at the same point will fit entirely within the original ball, so we have one half of our topology argument done.
In the other direction, if you have a ball of size ϵ according to the metric d|L around some point x, then by the continuity of L (and L(x)>0), there's some δ distance around x within which the function 1/L only varies by ϵ/3.
At this point, we can then fit a ball of size min(δ, ϵL(x)/3) around the point x (according to the original d metric), and it will lie within the ball of size ϵ around the point x according to the d|L metric. The reason for this is:

(d|L)(x,y) = d(x,y)·inf(1/L(x), 1/L(y)) + |1/L(x) − 1/L(y)|

And then, because we selected our distance between x and y so that 1/L(x) and 1/L(y) only differ by ϵ/3 at most (because a ball of size δ suffices to accomplish this), and d(x,y) ≤ ϵL(x)/3, we have:

≤ (ϵL(x)/3)·(1/L(x) + ϵ/3) + ϵ/3 = ϵ/3 + ϵ²L(x)/9 + ϵ/3

And then, because L(x) ≤ 1 and we may assume ϵ ≤ 3, we have ϵ²L(x)/9 ≤ ϵ/3, so:

≤ ϵ/3 + ϵ/3 + ϵ/3 = ϵ

So this small a ball in the original distance metric suffices to slot entirely inside a ball of size ϵ centered at x in the d|L metric.
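This ball-nesting claim can be checked numerically. Below, with an illustrative L(x) = 1/(1+x²) and base metric d = min(|x−y|,1) (both my own choices), we center at x = 1 with ϵ = 0.6; the value δ = 0.09 is verified rather than derived:

```python
import numpy as np

def L(x):
    return 1.0 / (1.0 + x * x)

def d(x, y):
    return np.minimum(np.abs(x - y), 1.0)

def dL(x, y):
    return d(x, y) * np.minimum(1/L(x), 1/L(y)) + np.abs(1/L(x) - 1/L(y))

x0, eps = 1.0, 0.6
delta = 0.09                      # chosen so 1/L varies by <= eps/3 within delta of x0

ys = np.linspace(x0 - delta, x0 + delta, 10001)
vary = np.max(np.abs(1/L(x0) - 1/L(ys)))    # confirms the choice of delta

r = min(delta, eps * L(x0) / 3)   # radius of the d-ball from the argument
zs = np.linspace(x0 - r, x0 + r, 10001)
worst = np.max(dL(x0, zs))        # largest d|L-distance over the d-ball
```

`worst` staying below ϵ is exactly the statement that the small d-ball sits inside the ϵ-sized d|L-ball.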
And that's all we need!
Theorem 1:The set of infradistributions (set form) is isomorphic to the set of infradistributions (functional form). The H→h part of the isomorphism is given by h(f)=inf(m,b)∈Hm(f)+b, and the h→H part of the isomorphism is given by H={(m,b)|b≥(h′)∗(m)}, where h′(f)=−h(−f) and (h′)∗ is the convex conjugate of h′.
Our first order of business is establishing the isomorphism. Our first direction is that going from H to h and back yields H exactly. By upper completion, and reproved analogues of Proposition 2 and Theorem 1 from "Basic inframeasure theory" (which an interested party can reprove if they want to see it), we can characterize H as

{(m,b) | ∀f∈CB(X): m(f)+b ≥ inf_{(m′,b′)∈H}(m′(f)+b′)}

And then, our H can further be reexpressed as

{(m,b) | ∀f∈CB(X): m(f)+b ≥ E_H(f)}
{(m,b) | ∀f∈CB(X): b ≥ E_H(f) − m(f)}
{(m,b) | b ≥ sup_{f∈CB(X)}(E_H(f) − m(f))}

Also, E_H(f) = h(f) = −h′(−f), so we can rewrite this as:

{(m,b) | b ≥ sup_{−f∈CB(X)}(m(−f) − h′(−f))}

and, by the definition of the convex conjugate, with the space of finite signed measures being the dual space of CB(X), and m(−f) being a functional applied to an element, this is...

{(m,b) | b ≥ (h′)∗(m)}

So, our original set H is identical to the convex-conjugate set, when we go from H to h back to a set of sa-measures.
Proof Phase 2: In the reverse direction for isomorphism, assume that h fulfills the conditions (we'll really only need continuity and concavity). We want to show that

E_{{(m,b)|b≥(h′)∗(m)}}(f) = h(f)

Let's begin.

E_{{(m,b)|b≥(h′)∗(m)}}(f) = inf_{(m,b): b≥(h′)∗(m)}(m(f)+b)

Given an m, we have a natural candidate for minimizing the b: just set it equal to (h′)∗(m). So then we get

inf_m(m(f) + (h′)∗(m)) = inf_m((h′)∗(m) − m(−f))

And this is just... −(h′)∗∗(−f), and, because h is continuous over CB(X) and concave, h′ is continuous over CB(X) and convex, so h′ = (h′)∗∗ by the Fenchel–Moreau theorem. From that, we get

E_{{(m,b)|b≥(h′)∗(m)}}(f) = −(h′)∗∗(−f) = −h′(−f) = h(f)

and we're done with isomorphism.
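This h ↦ H ↦ h round trip can be checked numerically in a finite-dimensional toy model: take X with two points, so CB(X) ≅ R² and measures are vectors in R²≥0. The particular generating a-measures, candidate measures, and grids below are my own illustration; since the generators are included among the candidate measures and h is a min of affine functionals, the grid reconstruction is exact on the grid:

```python
import numpy as np

# h is the inf of finitely many a-measures (m_i, b_i) on a 2-point space.
ms = np.array([[0.5, 0.5], [1.0, 0.0], [0.2, 0.8]])
bs = np.array([0.0, 0.2, 0.1])

def h(f):
    return np.min(ms @ f + bs)

# Grid of test functions f in CB(X) = R^2.
grid = np.linspace(-2.0, 2.0, 41)
fs = np.array([[a, b] for a in grid for b in grid])

# (h')*(m) = sup_f (h(f) - m·f), approximated over the f-grid.
def conj(m):
    return np.max([h(f) - m @ f for f in fs])

# Candidate measures: the generators themselves plus a few extras.
cands = np.vstack([ms, [[0.4, 0.6], [0.7, 0.3]]])
conjs = np.array([conj(m) for m in cands])

# Round trip: E_H(f) = inf over {(m,b) | b >= (h')*(m)} of m·f + b,
# i.e. inf_m (m·f + (h')*(m)); compare against h(f) on a sample of grid points.
err = max(abs(np.min(cands @ f + conjs) - h(f)) for f in fs[::37])
```

For each generator m_i, the conjugate evaluates to exactly b_i, and any extra candidate only contributes values ≥ h(f), so the reconstruction matches h on the grid.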
So, in our first direction, we're going to derive the conditions on the functional from the condition on the set, so we can assume nonemptiness, closure, convexity, upper completion, projected-compactness, and normalization, and derive monotonicity, concavity, normalization, Lipschitzness, and compact almost-support (CAS) from that.
For monotonicity, remember that all points in the infradistribution set are a-measures, so if f′ ≥ f, then

h(f′) = inf_{(m,b)∈H}(m(f′)+b) ≥ inf_{(m,b)∈H}(m(f)+b) = h(f)
We could do that because all the measure components are actual measures.
For concavity,

h(pf+(1−p)f′) = inf_{(m,b)∈H}(m(pf+(1−p)f′)+b)
= inf_{(m,b)∈H}(p·m(f) + (1−p)·m(f′) + pb + (1−p)b)
≥ inf_{(m,b)∈H}(p·m(f)+pb) + inf_{(m,b)∈H}((1−p)·m(f′)+(1−p)b)
= p·inf_{(m,b)∈H}(m(f)+b) + (1−p)·inf_{(m,b)∈H}(m(f′)+b) = p·h(f) + (1−p)·h(f′)

And we're done with that. For normalization,

h(0) = inf_{(m,b)∈H}(m(0)+b) = inf_{(m,b)∈H}b = 0

And

h(1) = inf_{(m,b)∈H}(m(1)+b) = inf_{(λμ,b)∈H}(λμ(1)+b) = inf_{(λμ,b)∈H}(λ+b) = 1

So we have normalization.
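These derivations are easy to spot-check numerically for an h defined as an inf of a-measures on a finite space. The three-point X and the particular a-measures below are my own choices, picked so that min_i b_i = 0 and min_i (total mass + b_i) = 1, which is what normalization requires:

```python
import numpy as np

rng = np.random.default_rng(1)

# a-measures (m_i, b_i) on a 3-point space, chosen so h is normalized:
# min_i b_i = 0 gives h(0) = 0, and min_i (total mass + b_i) = 1 gives h(1) = 1.
ms = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.1, 0.8, 0.3]])
bs = np.array([0.0, 0.0, 0.2])

def h(f):
    return np.min(ms @ f + bs)

norm0, norm1 = h(np.zeros(3)), h(np.ones(3))

mono_slack = float("inf")   # min of h(f') - h(f) over random f' >= f
conc_slack = float("inf")   # min of h(pg+(1-p)g') - (p h(g) + (1-p) h(g'))
for _ in range(2000):
    f  = rng.uniform(-1, 1, 3)
    fp = f + rng.uniform(0, 1, 3)          # f' >= f pointwise
    mono_slack = min(mono_slack, h(fp) - h(f))
    g, gp, p = rng.uniform(-1, 1, 3), rng.uniform(-1, 1, 3), rng.uniform()
    conc_slack = min(conc_slack, h(p*g + (1-p)*gp) - (p*h(g) + (1-p)*h(gp)))
```

Monotonicity holds because the measure components are nonnegative, and concavity because a min of affine functionals is concave, mirroring the two derivations above.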
For Lipschitzness, we first observe that compact-projection (the minimal points, when projected down to their measure components, make a set with compact closure) enforces that there's an upper bound on the λ value of a minimal point (λμ,b)∈Hmin, because otherwise you could pick a sequence with unbounded λ, and it'd have no convergent subsequence of measures, which contradicts precompactness of the minimal points projected down to their measure components.
Then, we observe that points in H correspond perfectly to hyperplanes that lie above the graph of h, and a minimal point is "you shift your hyperplane down as much as you can until you can't shift it down any more without starting to cut into the function h". Further, for every function f∈CB(X), you can make a hyperplane tangent to the function h at that point by the Hahn-Banach theorem, which must correspond to a minimal point.
Putting it together, the hypograph of h is exactly the region below all its tangent hyperplanes. And we know all the tangent hyperplanes correspond to minimal points, and their Lipschitz constants correspond to the λ value of the minimal points. Which are bounded. So, Compact-Projection in H implies h is Lipschitz.
Finally, we'll want compact almost-support. A set of measures is precompact iff the amount of measure is upper-bounded and, for all ϵ, there is a compact set Cϵ⊆X such that all the measures m have <ϵ measure outside of Cϵ.
So, given that the set of measures corresponding to H is compact by the compact-projection property, we want to show that the functional h has compact almost-support. To do this, we'll observe that if h is the inf of a bunch of functions, and all functions think two different points are only a little ways apart in value, then h must think they're only a little distance apart in value. Keeping that in mind, we have:
|dh(f;f′)| = lim_{δ→0}|h(f+δf′)−h(f)|/δ = lim_{δ→0}|inf_{(m,b)∈H}(m(f+δf′)+b) − inf_{(m′,b′)∈H}(m′(f)+b′)|/δ

And then, we can think of each minimal point as corresponding to a hyperplane, with h as the inf of all of them, so to bound the distance between these two values, we just need to assess the maximum size of the gap between those values over all minimal points/tangent hyperplanes. Thus, we can get:

lim_{δ→0}|inf_{(m,b)∈H}(m(f+δf′)+b) − inf_{(m′,b′)∈H}(m′(f)+b′)|/δ ≤ lim_{δ→0}sup_{(m,b)∈H}|(m(f+δf′)+b) − (m(f)+b)|/δ

And then, we can do some canceling and get:

= lim_{δ→0}sup_{(m,b)∈H}|m(δf′)|/δ = lim_{δ→0}sup_{(m,b)∈H}|δ·m(f′)|/δ = sup_{(m,b)∈H}|m(f′)|

And then, because f′ was selected to be 0 on Cϵ, which makes up all but ϵ of the measure for all measures present in H, we can upper-bound |m(f′)| by ϵ||f′||, so we have that

f′↓Cϵ = 0 → ∀f: |dh(f;f′)| ≤ ϵ||f′||

And so, Cϵ is a compact ϵ-almost-support for h, and this argument works for all ϵ, so h is CAS, and that's the last condition we need. Thus, if H is an infradistribution (set form), the expectation functional h is an infradistribution (expectation form). Now for the other direction, where we assume monotonicity, concavity, normalization, Lipschitzness, and CAS on an infradistribution (expectation form), and show that the induced set fulfills nonemptiness, convexity, closure, upper completion, projection-compactness, normalization, and being a set of a-measures.
Remember, our specification of the corresponding set was: {(m,b)|b≥(h′)∗(m)} Where h′ is the function given by h′(−f)=−h(f), and (h′)∗ is the convex conjugate of h′.
First, being a nonempty set of a-measures. Because there's an isomorphism linking points of the set and hyperplanes above the graph of h, we just need to establish that no hyperplanes above the graph of h can slope down in the direction of a nonnegative function (as this certifies that the measure component must be an actual measure), and no hyperplanes above the graph of h can assign 0 a value below 0 (as this corresponds to the b term, and can be immediately shown by normalization).
What we do is go: assume there's a hyperplane ϕ above h where the linear functional corresponding to ϕ isn't a measure, ie, there's some nonnegative function f where ϕ(f)<ϕ(0). Well, because of monotonicity for h (one of the assumed properties), we have h(0) ≤ h(f) ≤ h(2f) ≤ h(3f)… And, because all affine functionals are made by taking a linear functional and displacing it, ϕ(0) > ϕ(f) > ϕ(2f) > ϕ(3f)… decreases at a linear rate, so eventually the hyperplane crosses below h, but ϕ was assumed to be above h always, so we have a contradiction.
Therefore, all hyperplanes above h must have their linear functional component corresponding to an actual measure, ie, being an a-measure. And we get nonemptiness from the concavity of h, so we can pick any function and use the Hahn-Banach theorem to make a tangent hyperplane to h that touches at that point, certifying nonemptiness.
By the way, the convex conjugate, (h′)∗(m), can be reexpressed as supf(h(f)−m(f)).
For closure and convexity: By monotonicity of h and normalization, 0=−h(0)≥−h(f)≥−h(1)=−1, and h is continuous (Lipschitz) on CB(X), and concave, so h′ is proper, continuous on CB(X), and convex, so, by the Wikipedia page on "Closed Convex Function", h′ is a closed convex function, and then by the Wikipedia page on "Convex Conjugate" in the Properties section, (h′)∗ is convex and closed. From the Wikipedia page on "Closed Convex Function", this means that the epigraph of (h′)∗ is closed, and also the epigraph of a convex function is convex. This takes care of closure and convexity for our H.
Time for upper-completeness. Assume that (m,b) lies in the epigraph. Our task now is to show that (m,b)+(0,b′) lies in the epigraph. This is equivalent to showing that b+b′≥(h′)∗(m). Let's begin. (h′)∗(m)≤b≤b+b′ And we're done.
Normalization of the resulting set is easy. Going from h to a (maybe)-inframeasure H back to h is identity as established earlier, so all we have to do is show that a failure of normalization in a (maybe)-inframeasure makes the resulting h not normalized. Thus, if our h is normalized, and it makes an H that isn't normalized, then going back makes a non-normalized h, which contradicts isomorphism. So, assume there's a failure of normalization in H. Then EH(0)≠0, or EH(1)≠1, so either h(0)≠0 or h(1)≠1 and we get a failure of normalization for h which is impossible. So H must be normalized.
That just leaves compact-projection. We know that a set of measures is precompact iff there's a bound on their λ values, and for all ϵ, there's a compact set Cϵ⊆X where all the measure components have <ϵ measure outside of that set.
First, we can observe that no hyperplane above h can have a Lipschitz constant above the maximal Lipschitz constant for the function h, because if it increased more steeply in some direction, you could go in the other direction to decrease as steeply as possible, and h would be constrained to decrease strictly less steeply in that direction, so if you went far enough in that direction, your hyperplane and h would intersect, which is impossible. Thus, Lipschitzness of h enforces that there can be no point in the set H with too much measure, which gives us one half of compact-projection for H.
For the other half, CAS for h ensures that for all ϵ, there is a compact set Cϵ where f′↓Cϵ=0→|dh(f;f′)|≤ϵ||f′|| What we'll do is establish that no hyperplane lying above h can have a slope more than ϵ in the direction of a function that's in [0,1] and is 0 on Cϵ. Let f′ be such a function fulfilling those properties that makes some hyperplane above h slope down too hard. Then dh(0;−f′)≥−ϵ Because going in the direction of a negative function decreases your value, so the Gateaux derivative would be negative, and −f′ is in [−1,0], and we have CAS on h.
Now, we can realize that as we travel from 0 to −f′ to −2f′ to −3f′, our vector of travel is always in the direction of −f′, which can't be too negative. Each additional −f′ added drops the value of h by at most ϵ||f′||. However, each additional −f′ added drops the value of ψ (our assumed functional that's sloping down too hard in the −f′ direction) by more than that quantity, so eventually ψ will cross over to be lower than h, so ψ can't correspond to an a-measure in H, and we have a contradiction.
Therefore, regardless of the point in H, its measure component must assign any function that's 0 on Cϵ and bounded in [0,1] a value of ϵ at most. We can then realize that this can only happen for a measure that assigns ϵ or less measure to the complement of Cϵ (otherwise you could pass from your continuous functions toward the discontinuous indicator function of a region outside Cϵ carrying more than ϵ of measure).
Thus, given our ϵ, we've found a compact set Cϵ where the measure component of all points in H assigns ϵ or less value to the outside of that set, and this can be done for all ϵ, certifying the last missing piece for compact-projection of H (because the projection is precompact iff the set of measures is bounded above in amount of measure present, and for all ϵ, there's a compact set Cϵ where all the measures assign ≤ϵ measure outside of that set).
And that's the last condition we need to conclude that the set form of an infradistribution (functional form) is an infradistribution (set form), and we're done.
Proposition 2:For any infradistribution h there is a unique closed set S which is the intersection of all supports for h, and is a support itself.
Proof sketch: The proof splits into three parts. First, we show that if B is a support and B′ is an ϵ-almost-support, then B∩B′ is an ϵ-almost-support, by picking two functions which agree on B∩B′ and using them to construct a third function which has similar expectations to both of them, due to agreeing with the first function on B and agreeing with the second function on B′; thus our arbitrary functions which agree on the intersection have similar expectation values. A support is precisely an ϵ-almost-support for all ϵ, so actual supports are preserved under finite intersection.
Second, we show that supports are preserved under countable intersections, which critically relies on having compact ϵ-almost-supports for all ϵ. Roughly, that part of the proof proceeds by using our compact almost-support in order to make a sequence of compact almost-supports nested in each other, and showing they converge in Hausdorff distance, so two functions which agree on the intersection of all the compact sets are very close to agreeing on some finite-stage almost-support, and so have similar expectation values; thus the intersection of the compact sets is an almost-support. The property of being an almost-support is hereditary upwards, and we can show our countable intersection of compact sets is a subset of the countable intersection we're interested in, so it's an almost-support. However, we can shrink ϵ to 0, so it's a support.
Finally, we show that supports are preserved under arbitrary intersections by using strong Lindelofness of X (because it's Polish) to reduce the arbitrary intersection case to the countable intersection case which has already been solved.
For the finite intersection case, let B be a support and B′ be an ϵ-almost support. Our task is to show that B∩B′ is an ϵ-almost support, by showing that two functions which agree on it can't be more than ϵ apart in expectation value.
Pick any two functions f and f′ where f↓(B∩B′) = f′↓(B∩B′). Let's define an f′′ on B∪B′ as follows. If x∈B, then f′′(x) := f(x). If x∈B′ and x∉B, then f′′(x) := f′(x). Our first task is to show that f′′ is continuous.
If xn limits to x, we can split into three cases. Our first case is where x∈B, and x∉B′. Then past a certain n, xn will always not be in B′ (complement of a closed set is open), and so its values are dictated by f, which is continuous, and we have continuity of f′′ at that point. The second case where x∈B′ and x∉B is symmetric, but with the values being dictated by f′ instead. Our third case where x∈B∩B′ is the only one which takes some care. As xn limits to x, then both f′(xn) and f(xn) (not the restrictions, the original functions!) limit to f(x)=f′(x) (because f and f′ agree on B∩B′) so no matter how our xn toggles back and forth between B and B′, we have continuity.
Now that we know f′′ is continuous on B∪B′, we can use the Tietze extension theorem to extend f′′−f′ to a continuous function on the entire space X, because the union of two closed sets is closed, and we can ensure that the extension stays in [−d(f,f′), d(f,f′)] as well (because f′′ is copying either f or f′, so |f′′−f′| ≤ d(f,f′) on B∪B′). Call this extension f∗.
Now, because (f′+f∗)↓B = (f′+(f′′−f′))↓B = f′′↓B = f↓B, and B is a support, then h(f′+f∗) = h(f). However, since (f′+f∗)↓B′ = (f′+f′′−f′)↓B′ = f′′↓B′ = f′↓B′ (because f′ and f agree on B∩B′), then

|h(f′+f∗) − h(f′)| ≤ ϵ·d(f′+f∗, f′) = ϵ·sup_x|f′(x)+f∗(x)−f′(x)| = ϵ·sup_x|f∗(x)| ≤ ϵ·d(f,f′)

The last inequality is because f∗ was selected to be bounded in the relevant range, and the first inequality is because f′+f∗ and f′ are identical on B′, which is an ϵ-almost-support.
Thus,

|h(f)−h(f′)| ≤ |h(f)−h(f′+f∗)| + |h(f′+f∗)−h(f′)| ≤ ϵ·d(f,f′)

The first term vanishes because f and f′+f∗ agree on B, which is a support, and for the latter term we already derived that inequality. Because f and f′ were arbitrary functions which agreed on B∩B′, and B was an arbitrary support and B′ was an arbitrary ϵ-almost-support, we have that the intersection of any support and ϵ-almost-support is an ϵ-almost-support. As a special case of this, the intersection of two supports is a support, so being a support is preserved under finite intersection.
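As a finite sanity check of this finite-intersection argument, take X a three-point space, B = {0,1} an exact support, and B′ = {1,2} an ϵ-almost-support; the particular a-measures below are my own illustration, chosen with zero mass outside B and at most ϵ mass outside B′:

```python
import numpy as np

rng = np.random.default_rng(2)

eps = 0.05
# a-measures on X = {0,1,2}: zero mass on point 2 (so B = {0,1} is a support),
# and at most eps mass on point 0 (so B' = {1,2} is an eps-almost-support).
ms = np.array([[0.05, 0.9, 0.0], [0.02, 0.7, 0.0], [0.04, 1.0, 0.0]])
bs = np.array([0.0, 0.15, 0.05])

def h(f):
    return np.min(ms @ f + bs)

worst = float("-inf")   # violation of |h(f) - h(f')| <= eps * d(f, f')
for _ in range(2000):
    f, fp = rng.uniform(-1, 1, 3), rng.uniform(-1, 1, 3)
    fp[1] = f[1]                        # agree on B ∩ B' = {1}
    dist = np.max(np.abs(f - fp))
    worst = max(worst, abs(h(f) - h(fp)) - eps * dist)
```

Functions agreeing only on the intersection {1} can differ wildly on both remaining points, yet the expectation gap stays within ϵ·d(f,f′), as the lemma predicts.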
Now for supports being preserved under countable intersection. Fix a countable family of supports, Bi. We will show that ⋂iBi is a support.
First, by our "compact almost-support" property, for all ϵ there's a compact set Cϵ where any two functions agreeing on Cϵ differ in expectation by at most ϵ·d(f,f′). Fix an arbitrary ϵ to use through the rest of the argument, which induces a compact set Cϵ.
Let Si be defined as Cϵ∩⋂_{j≤i}Bj, and Sω := ⋂i Si. Polish spaces are Hausdorff, so the intersection of a compact set with closed sets is compact. Therefore, the Si are compact and all nested in each other. All the Si are ϵ-almost-supports, because each is the intersection of finitely many supports (which, by the earlier result, is a support) with an ϵ-almost-support.
We will show that Si converge to Sω in Hausdorff-distance. Assume that they don't converge in Hausdorff-distance. Then there's some δ where all Si have a point more than δ distance away from the set Sω. We can pick a sequence of those points from the Si, and they're all in S0 which is compact, so we can isolate a convergent subsequence. Then, for all i, the subsequence enters the closed set Si and doesn't leave, so the limit point must be in all the Si and thus in Sω, contradicting that the sequence of points is always δ away from Sω. This shows that the Si converge to Sω in Hausdorff-distance.
At this point, fix f and f′ which agree with each other on Sω. f and f′ restricted to any Si (which is compact) must be uniformly continuous, and as we restrict to smaller and smaller Si, the modulus of uniform continuity (how close two points need to be for f and f′ to change little) only gets more tolerant.
Pick any δ≪ϵ you want. By uniform continuity of the restrictions of f and f′ to a compact set, we can find some γ>0 where points in S0 (a superset of all later Si) must be within γ of each other for f and f′ to each vary by at most δ/2 between the two points. And for any γ there's some i where Si is within γ Hausdorff-distance of Sω, because they limit to each other in Hausdorff-distance. Any point in Si is only γ away from Sω, where f and f′ agree, so f and f′ are only 2⋅(δ/2)=δ or less apart from each other on suitably late Si.
By Lemma 2, since f and f′ are only δ away from each other on Si which is an ϵ-almost support, we have: |h(f)−h(f′)|≤λ⊙δ+ϵd(f,f′) And, because δ was arbitrary, we can take the limit as it approaches 0 to get: |h(f)−h(f′)|≤ϵd(f,f′) Also, f and f′ were arbitrary functions which agreed on Sω, so Sω is an ϵ-almost-support.
Sω=⋂iSi=⋂i(Cϵ∩⋂j≤iBj)=Cϵ∩⋂iBi⊆⋂iBi Being an almost-support is hereditary upwards, because two functions which agree on a superset also agree on the subset, and are therefore close in expectation. So, ⋂iBi is an ϵ-almost-support. But ϵ was arbitrary, so it's a support, and we have that countable intersections of supports are supports.
Finally, we'll show that any uncountable intersection of supports is identical to a countable intersection of supports. Let's say you intersect ℵα closed sets which are supports. Biject that with the ordinal ωα by the well-ordering principle, so now all supports are indexed like Bγ with γ<ωα. Our task is now to write ⋂γ<ωαBγ as a countable intersection of closed supports.
Separable metric spaces are strongly Lindelof, which implies that every open set can be written as a countable union of open sets from a countable basis. By taking complements, we get the dual statement: every closed set can be written as a countable intersection of closed sets from a countable collection (the complements of the basis sets). Call these countably many closed sets Ai.
Now, define Fγ:={Ai|Bγ⊆Ai}. It is the family of basis closed sets which Bγ is a subset of. Because every closed set is the intersection of closed sets from the basis, Bγ=⋂Fγ.
Now, let's consider ⋂(⋃γ<ωαFγ) Ie, the set made by intersecting all basis closed sets which contain some Bγ within them. We will show that this equals ⋂γ<ωαBγ, our arbitrary intersection of choice.
In one direction, x∈⋂γ<ωαBγ→∀γ<ωα:x∈Bγ Now, assume ∃γ<ωα:Bγ⊆Ai. From this, we derive x∈Ai. The i was arbitrary, so we've derived: ∀i:(∃γ<ωα:Bγ⊆Ai)→x∈Ai Which is the same as: x∈⋂i:(∃γ<ωα:Bγ⊆Ai)Ai which is the same as: x∈⋂{Ai|∃γ<ωα:Bγ⊆Ai} which is the same as: x∈⋂(⋃γ<ωα{Ai|Bγ⊆Ai}) which is the same as: x∈⋂(⋃γ<ωαFγ) So, there's one direction done, we just showed that: ⋂γ<ωαBγ⊆⋂(⋃γ<ωαFγ) For the other direction, observe that x∉⋂γ<ωαBγ→∃γ<ωα:x∉Bγ Now, because Bγ=⋂Fγ, ∃γ<ωα:x∉⋂Fγ which is equivalent to: ∃γ<ωα:∃i:(Bγ⊆Ai∧x∉Ai) Swapping the quantifiers, we get: ∃i:((∃γ<ωα:Bγ⊆Ai)∧x∉Ai) Which can be reexpressed as: ¬∀i:((∃γ<ωα:Bγ⊆Ai)→x∈Ai) And by our work in the first direction of trying to establish equality between the two sets (specifically, all our equivalences towards the end), this is the same as: x∉⋂(⋃γ<ωαFγ) So, we have our bidirectional implication and equality, ⋂γ<ωαBγ=⋂(⋃γ<ωαFγ) We're trying to show the former set is a support. It can be written as the intersection of a bunch of Ai by our equality above, but how does that help us? Well, if an Ai is present in ⋃γ<ωαFγ, then it's present in some Fγ. Which means that it's a closed superset of the corresponding Bγ. Which, by assumption, is a support. So, all the Ai that we're intersecting are supports, since the property of being a support is closed upwards. There are countably many Ai, so we have rewritten our uncountable intersection of supports as a countable intersection of supports, which we know from earlier is a support.
Therefore, the intersection of all supports is a support.
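As a finite sanity check of the key set identity ⋂γBγ=⋂(⋃γFγ) used above (a toy example where subsets of a finite universe stand in for closed sets, and every subset counts as a basis set, so each Bγ is certainly an intersection of basis sets):

```python
from itertools import combinations

U = set(range(6))
Bs = [{0, 1, 2}, {1, 2, 3}, {1, 2, 4}]   # the family being intersected
# "Basis": all subsets of U.
basis = [set(c) for r in range(len(U) + 1) for c in combinations(sorted(U), r)]
# Union of the F_gamma: every basis set containing some B_gamma.
F_union = [A for A in basis if any(B <= A for B in Bs)]
lhs = set.intersection(*Bs)
rhs = set.intersection(*F_union)
assert lhs == rhs == {1, 2}
```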
Proposition 3: h(af)=ah(f) for all a≥0 iff all minimal points in H have b=0.
Proof sketch: From LF-duality, we recall our usual technique to prove things about minimal points. If you have a proposed minimal point violating a property, which corresponds to a hyperplane ϕ≥h, and we can find a hyperplane ψ where h≤ψ≤ϕ, and ψ isn't ϕ itself, then ϕ isn't an actual minimal point. We'll take a ϕ where ϕ(0)>0, and undershoot it. The reverse direction of the iff proof is much easier.
Proof: Assume that there's a minimal point in H with b>0. This corresponds to a minimal hyperplane ϕ where ϕ≥h and ϕ(0)>0. Now, consider the sets A and B which are defined as follows: A:={(1−p)(f,b)|f∈CB(X),b>ϕ(f),p∈[0,1)} B:={(f,b)|f∈CB(X),b≤h(f)} The set B is the hypograph of h, and the set A is like the convex hull of the region above ϕ and the point (0,0), but not the actual point (0,0) itself.
B is the hypograph of a concave continuous function, so it's obviously convex and closed. As for A, it is routine but rather tedious to verify that it is convex; this is left for the reader. Further, A is open. Given any point (1−p)(f,b) in it, we can make a tiny little open ball around (f,b) where everything in that ball has b>ϕ(f), and scale down the ball by multiplying by (1−p). Because p isn't 1, the ball doesn't collapse to a point when scaled, so this gets you a tiny little open ball around your arbitrary point in A, showing it's open.
Now, we must show that A and B are disjoint to invoke Hahn-Banach. Assume there's some point (1−p)(f,b) that lies in both of them at once. Then, (1−p)b≤h((1−p)f) by the defining condition of B, and p∈[0,1) and b>ϕ(f). And also, since the hyperplane corresponding to ϕ lies above the graph of h, ϕ(f)≥h(f). Putting these together, we get: (1−p)b>(1−p)ϕ(f)≥(1−p)h(f)=h((1−p)f)≥(1−p)b The first strict inequality is because p isn't 1 or higher, and b>ϕ(f). The ≥ is because ϕ≥h, the critical equality is because we're assuming that h(af)=ah(f). And the final ≥ is because of the defining condition of B. This is impossible, we just showed a number is above itself. So A and B are disjoint.
A and B are disjoint, both convex, and A is open, so we can invoke Hahn-Banach and separate them with a hyperplane ψ. At the start we assumed that ϕ was a minimal point, and ϕ(0)>0, so if we can show that h≤ψ≤ϕ over CB(X), and ψ≠ϕ, then ϕ isn't minimal and we have a contradiction, so all minimal points must map 0 to 0 (it can't be less because any hyperplane corresponding to a minimal point must lie above h and h(0)=0 by normalization)
Now, assume that there is an f∈CB(X) where ψ(f)>ϕ(f). Then (1−0)(f,ψ(f)) lies in A because 0∈[0,1) and ψ(f)>ϕ(f). Thus, ψ cuts into the set A, but it doesn't because it's a separating hyperplane, so we have a contradiction and ψ≤ϕ over CB(X). Also, since ψ separates A and B, it doesn't cut into the set B, which is the hypograph of h, so h≤ψ.
Now that we have h≤ψ≤ϕ, all we need is to show that ψ(0)=0 to wrap things up. First, ψ≥h and h(0)=0, so ψ(0)≥0. Second, there are points arbitrarily close to (0,0) that lie within the set A, by letting p be very very close to 1. So, if ψ(0)>0, then (0,0) would be strictly below ψ, which is incompatible with it having points arbitrarily close to it that lie within A. Thus, ψ(0)=0.
Hang on: ψ(0)=0 while ϕ(0)>0, so ψ≠ϕ. Also, h≤ψ≤ϕ. Thus, ϕ isn't minimal and we have a contradiction. The flaw must be with our very first assumption, that there's a minimal point with b>0. So all minimal points must have b=0, and we're done with the first half: h(af)=ah(f) implies all minimals have b=0.
In the other direction, assume all minimal points have b=0. Then, for a≥0 h(af)=EH(af)=inf(m,b)∈H(m(af)+b)=inf(m,b)∈Hmin(m(af)+b) =inf(m,b)∈Hminm(af)=inf(m,b)∈Hminam(f)=ainf(m,b)∈Hminm(f) =ainf(m,b)∈Hmin(m(f)+b)=ainf(m,b)∈H(m(f)+b)=aEH(f)=ah(f) And we've shown the equivalence.
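In the finite-state case this computation can be checked directly: take X={0,1,2}, so CB(X)=R³, and represent H by finitely many minimal points (a toy setup of my own, purely as a sanity check of the equality chain above).

```python
import numpy as np

rng = np.random.default_rng(0)
ms = rng.uniform(0, 1, size=(4, 3))   # four measure components on X = {0,1,2}
bs = np.zeros(4)                      # all minimal points have b = 0
f = rng.normal(size=3)

def h(g):
    # h(g) = inf over the (finitely many) minimal points of m(g) + b
    return min(m @ g + b for m, b in zip(ms, bs))

for a in [0.0, 0.5, 2.0, 7.3]:
    assert abs(h(a * f) - a * h(f)) < 1e-9   # homogeneity holds when b = 0
```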
Proposition 4: |h(f)−h(f′)|/d(f,f′)≤1 iff all minimal points in H have λ≤1.
We know from past proofs in Basic Inframeasure Theory that the Lipschitz constant of the functional is the maximum λ value of one of the minimal points, so this is immediate.
Proposition 5: h(1+af)=1−a+ah(1+f) iff all minimal points in H have λ+b=1.
Proof sketch: This is very similar to the proof of Proposition 3 on homogeneity; the same proof path occurs, except we assume there's a minimal point with λ+b>1 and derive a contradiction from that, and then it's easy to clean up the other direction from there.
Proof: Assume that there's a minimal point in H with λ+b>1. This corresponds to a minimal hyperplane ϕ where ϕ≥h and ϕ(1)>1. Now, consider the sets A and B which are defined as follows: A:={p(1,1)+(1−p)(f,b)|f∈CB(X),b>ϕ(f),p∈[0,1)} B:={(f,b)|f∈CB(X),b≤h(f)} The set B is the hypograph of h, and the set A is like the convex hull of the region above ϕ, and the point (1,1), but not the actual point (1,1) itself.
B is the hypograph of a concave continuous function, so it's obviously convex and closed. As for A, it is routine but rather tedious to verify that it is convex; this is left for the reader. Further, A is open. Given any point p(1,1)+(1−p)(f,b) in it, we can make a tiny little open ball around (f,b) where everything in that ball has b>ϕ(f), then scale down the ball by multiplying by (1−p) and shift it by adding p(1,1). Because p isn't 1, the ball doesn't collapse to a point when scaled, so this gets you a tiny little open ball around your arbitrary point in A, showing it's open.
Now, we must show that A and B are disjoint to invoke Hahn-Banach. Assume there's some point p(1,1)+(1−p)(f,b) that lies in both of them at once. Then, p+(1−p)b≤h(p+(1−p)f) by the defining condition of B, and p∈[0,1) and b>ϕ(f). And also, since the hyperplane corresponding to ϕ lies above the graph of h, ϕ(f)≥h(f). Putting these together, we get: p+(1−p)b>p+(1−p)ϕ(f)≥p+(1−p)h(f)=1−(1−p)+(1−p)h(1+(f−1)) =h(1+(1−p)(f−1))=h(p+(1−p)f)≥p+(1−p)b The first strict inequality is because p isn't 1 or higher, and b>ϕ(f). The ≥ is because ϕ≥h. The equality in the first line is just expanding accordingly. The critical equality that starts off the second line is because our infradistribution is cohomogenous, so 1−a+ah(1+f)=h(1+af), and we specialize this accordingly. Then the next equality is just expanding, canceling out like terms, and reexpressing, and the final ≥ is because of the defining condition of B. As a whole, this is impossible, we just showed a number is above itself. So A and B are disjoint.
A and B are disjoint, both convex, and A is open, so we can invoke Hahn-Banach and separate them with a hyperplane ψ. At the start we assumed that ϕ was a minimal point, and ϕ(1)>1, so if we can show that h≤ψ≤ϕ and ψ≠ϕ, then ϕ isn't minimal and we have a contradiction, so all minimal points must map 1 to 1 (it can't be less because any hyperplane corresponding to a minimal point must lie above h and h(1)=1)
Now, assume there is an f where ψ(f)>ϕ(f). Then 0(1,1)+(1−0)(f,ψ(f))=(f,ψ(f)) lies in A because 0∈[0,1) and ψ(f)>ϕ(f). Thus, ψ cuts into the set A, but it doesn't because it's a separating hyperplane, so we have a contradiction and ψ≤ϕ. Also, since ψ separates A and B, it doesn't cut into the set B, which is the hypograph of h, so h≤ψ.
Now that we have h≤ψ≤ϕ, all we need is to show that ψ(1)=1 to wrap things up. First, ψ≥h and h(1)=1, so ψ(1)≥1. Second, there are points arbitrarily close to (1,1) that lie within the set A, by letting p be very very close to 1. So, if ψ(1)>1, then (1,1) would be strictly below ψ, which is incompatible with it having points arbitrarily close to it that lie within A. Thus, ψ(1)=1.
Hang on: ψ(1)=1 while ϕ(1)>1, so ψ≠ϕ. Also, h≤ψ≤ϕ. Thus, ϕ isn't minimal and we have a contradiction. The flaw must be with our very first assumption, that there's a minimal point with λ+b>1. So all minimal points must have λ+b=1, and we're done with the first half: h(1+af)=1−a+ah(1+f) implies all minimals have λ+b=1.
In the other direction, assume all minimal points have λ+b=1. Then h(1+af)=EH(1+af)=inf(m,b)∈H(m(1+af)+b) =inf(λμ,b)∈Hmin(λμ(1+af)+b)=inf(λμ,b)∈Hmin(λ+aλμ(f)+b)=inf(λμ,b)∈Hmin(1+aλμ(f)) =inf(λμ,b)∈Hmin(1+aλμ(f)+a(λ+b−1))=inf(λμ,b)∈Hmin(1−a+aλ+aλμ(f)+ab) =1−a+ainf(λμ,b)∈Hmin(λ+λμ(f)+b)=1−a+ainf(λμ,b)∈Hmin(λμ(1+f)+b) =1−a+ainf(m,b)∈H(m(1+f)+b)=1−a+aEH(1+f)=1−a+ah(1+f) And we've shown the equivalence.
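The same kind of finite-state sanity check works here, now forcing λ+b=1 on every minimal point (again a toy setup of my own, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
mus = rng.dirichlet(np.ones(3), size=4)   # probability components on X = {0,1,2}
lams = rng.uniform(0.2, 1.0, size=4)
bs = 1 - lams                             # enforce lambda + b = 1

def h(g):
    return min(l * (mu @ g) + b for l, mu, b in zip(lams, mus, bs))

one = np.ones(3)
f = rng.normal(size=3)
for a in [0.0, 0.5, 2.0]:
    # cohomogeneity: h(1 + a f) = 1 - a + a h(1 + f)
    assert abs(h(one + a * f) - (1 - a + a * h(one + f))) < 1e-9
```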
Proposition 6: h(c+f)=c+h(f) iff all minimal points in H have λ=1 iff h(c)=c.
Proof sketch: To go from h(c)=c to "all minimal points have λ=1" requires a contrapositive proof where we assume that there are minimal points with λ≠1, and show it's incompatible with h(c)=c. Showing that λ=1 for all minimal points implies the full form of C-additivity only takes some equality shuffling, and then it's trivial that h(c+f)=c+h(f) implies h(c)=c, just let f be 0.
Proof: Assume there's a minimal point with λ>1. Then the corresponding hyperplane slopes down at a rate greater than 1 starting at 0 in the −c direction, while h only slopes down at a rate of 1 in that same direction due to h(c)=c. So, eventually, the hyperplane crosses the graph of h, witnessing that it can't actually be a hyperplane above the graph of h, and we have a contradiction. Similar arguments dispatch the existence of a minimal point with λ<1, because the hyperplane slopes up at a rate less than 1 in the +c direction starting at 0, while h slopes up at a rate of 1 in that same direction due to h(c)=c, so eventually the hyperplane crosses the graph of h and we have another contradiction. So, h(c)=c implies all minimal points have λ=1.
In the other direction, h(c+f)=EH(c+f)=inf(m,b)∈H(m(c+f)+b) =inf(m,b)∈Hmin(m(c+f)+b)=inf(m,b)∈Hmin(c+m(f)+b)=c+inf(m,b)∈Hmin(m(f)+b) =c+inf(m,b)∈H(m(f)+b)=c+EH(f)=c+h(f)
Finally, it's trivial that h(c+f)=c+h(f) implies h(c)=c.
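And a matching finite-state sanity check for C-additivity, with every component a probability distribution (λ=1) and arbitrary b (toy setup, illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
mus = rng.dirichlet(np.ones(3), size=4)   # lambda = 1: probability components
bs = rng.uniform(0, 1, size=4)

def h(g):
    return min(mu @ g + b for mu, b in zip(mus, bs))

f = rng.normal(size=3)
for c in [-1.0, 0.0, 2.5]:
    # constants pass straight through: h(c + f) = c + h(f)
    assert abs(h(c + f) - (c + h(f))) < 1e-9
```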
Proposition 7: h(c+af)=c+ah(f) iff all minimal points in H have λ=1 and b=0.
This proof will proceed by showing that h(c+af)=c+ah(f) iff both homogeneity and C-additivity hold. We know from earlier that homogeneity and C-additivity are equivalent to b=0 and λ=1 respectively, so if we can show "crispness iff homogeneity and C-additivity", we're done.
In one direction, h(c+af)=c+ah(f) implies both homogeneity and C-additivity, by taking c=0 and a=1 in the two respective cases. In the other direction, h(c+af)=c+h(af)=c+ah(f) by C-additivity and homogeneity, and we've derived crispness, so we're done.
Proposition 8: h(f)=infx∈Cf(x) iff the set of minimal points of H corresponds to the set of probability distributions supported on C.
In one direction, if the set of minimal points entirely consists of probability distributions supported on C and only those, then h(f)=EH(f)=inf(m,b)∈Hm(f)+b=inf(λμ,b)∈Hmin(λμ(f)+b) =infμ∈ΔCμ(f)=infx∈Cδx(f)=infx∈Cf(x) In the other direction, we can take any infradistribution that maps a function to its minimum value over C and use the above equalities to conclude that the infradistribution generated by all probability distributions supported over C perfectly duplicates the expectation values of the infradistribution, and so the two sets must be equivalent modulo closure, convex hull, and upper completion. The space of distributions over C is closed and convex, and so the same applies to the upper completion of it, and it equals the infradistribution set corresponding to H.
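A tiny numeric illustration of the first direction (toy finite X of my own choosing; a grid of mixtures stands in for ΔC, which suffices here because expectations are linear in the mixture weight, so the minimum lands on a dirac delta):

```python
import numpy as np

rng = np.random.default_rng(3)
f = rng.normal(size=5)                 # a function on X = {0,...,4}
C = [1, 3]                             # the compact set
# Minimal points: probability distributions supported on C, here mixtures
# p*delta_1 + (1-p)*delta_3 over a grid of p values.
h_f = min(p * f[1] + (1 - p) * f[3] for p in np.linspace(0, 1, 101))
assert abs(h_f - min(f[1], f[3])) < 1e-12   # h(f) = min of f over C
```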
Proposition 9: All sharp infradistributions are extreme points in the space of infradistributions.
Assume we can make a sharp infradistribution H corresponding to a compact set C as a probabilistic mix of crisp infradistributions. If one of the crisp infradistributions has a minimal point (probability distribution) which isn't supported on C, then you could pick that minimal point, and mix it with other minimal points/probability distributions to make a minimal point/probability distribution in the set Hmin which has some support outside of C, which is impossible, because Hmin is just ΔC.
Therefore, all crisp infradistributions which mix to make the sharp infradistribution H induced by C must have all their probability distribution minimal points supported entirely on C.
Now, pick any point x∈C. Consider the dirac-delta distribution on x, δx. It lies in ΔC, so (δx,0)∈Hmin. This set is a mix of the other sets corresponding to crisp infradistributions, so there must be a probability distribution from each of the crisp infradistributions that mix to make δx. This can only happen if every component in the mix is δx itself, because dirac-delta distributions are extreme points in the space of probability distributions.
Thus, given any x∈C, all the crisp infradistributions that mix to make the sharp infradistribution induced by C must include (δx,0) in them. Thus, all crisp infradistributions that mix to make a sharp infradistribution must contain all the dirac-deltas of points in C within them, and by closure and convexity, all the crisp infradistributions contain all of ΔC within their minimal points.
So, all these crisp infradistributions that mix to make our sharp H lack every probability distribution that isn't supported on C, and contain every probability distribution that's supported on C, so they're all equal to our sharp infradistribution H itself. Thus, H is extreme in the space of crisp infradistributions.
This argument works for any sharp infradistribution, so they're all extreme in the space of crisp infradistributions.
Lemma 1: If X is a Polish space and C is a compact subset and f,f′∈CB(X), and supx∈C|f(x)−f′(x)|≤ϵ, then there is a third bounded continuous function f′′ which fulfills the following three properties: First, d(f,f′′)≤ϵ. Second, f′′↓C=f′↓C. Third, d(f′,f′′)≤d(f,f′).
To prove this, we will use the Michael selection theorem to craft a continuous function with these properties. Accordingly, let the set-valued function ψ:X→R be defined as: if x∈C, ψ(x)={f′(x)}, and if x∉C, then:
ψ(x):=[f(x)−ϵ,f(x)+ϵ]∩[f′(x)−d(f,f′),f′(x)+d(f,f′)]
Assuming there was a continuous function f′′ where f′′(x)∈ψ(x), it'd get us our desired results. This is because:
d(f,f′′)=supx|f(x)−f′′(x)|=sup(supx∈C|f(x)−f′′(x)|,supx∉C|f(x)−f′′(x)|)
And then, because f′′ is a selection from ψ, we can make these quantities bigger by selecting from ψ in the supremum as well, to get:
≤sup(supx∈C,y∈ψ(x)|f(x)−y|,supx∉C,y∈ψ(x)|f(x)−y|)
And then we can use the definition of ψ(x) in the two cases (the latter case is because y∈ψ(x) implies y∈[f(x)−ϵ,f(x)+ϵ]) to get:
=sup(supx∈C|f(x)−f′(x)|,supx∉C,y∈ψ(x)ϵ)
And then we use the fact that we abbreviated the supremum of the difference between f and f′ as ϵ to get:
=sup(ϵ,ϵ)=ϵ
So, we'd have one of our three results, that d(f,f′′)≤ϵ. Our second desired result, that f′′ mimics f′ on C, is trivial by the definition of ψ. And finally,
d(f′,f′′)=supx|f′(x)−f′′(x)|=sup(supx∈C|f′(x)−f′′(x)|,supx∉C|f′(x)−f′′(x)|)
And then, we do the usual transition from f′′ to the ψ function that it's a selection of,
≤sup(supx∈C,y∈ψ(x)|f′(x)−y|,supx∉C,y∈ψ(x)|f′(x)−y|)
And then, because ψ(x) is always f′(x) when x∈C, and y is always bounded in [f′(x)−d(f,f′),f′(x)+d(f,f′)] for the latter part, this turns into:
≤sup(0,d(f,f′))=d(f,f′)
So we have our desired result of d(f′,f′′)≤d(f,f′). As we have established that all three results follow from finding a continuous selection of ψ, we just have to do that now. For this, we will be using the Michael selection theorem. In order to invoke it, we need to check that X is paracompact (all Polish spaces are paracompact), that R is a Banach space (it is), and verify some conditions on the sets produced.
We need that the function ψ(x) is nonempty for all x. This is trivial by the definition of ψ(x) for x∈C, but for the other case, we need to verify that
[f(x)−ϵ,f(x)+ϵ]∩[f′(x)−d(f,f′),f′(x)+d(f,f′)]
Is a nonempty set. This is easy because f(x) witnesses nonemptiness: f(x) is trivially within ϵ of itself, and |f(x)−f′(x)|≤d(f,f′). We need that the function ψ(x) is convex for all x. This is easy because it's the intersection of two convex sets when x∉C, which is the trickier case to check. We also need that the function ψ(x) is closed for all x, which is true because it's either a point or it's an intersection of two closed sets.
All that we're missing in order to invoke the Michael Selection theorem to get a continuous function that works is verifying lower hemicontinuity for the function ψ.
Lower hemicontinuity is: If xn limits to x, and y∈ψ(x), there's some subsequence xm and ym where ym∈ψ(xm), and ym limits to y.
In order to show this, we can take three different cases.
The first possible case is where infinitely many of the xn lie in C, so the limit point x must lie in C as well. Then, xm can be the subsequence which lies in C, and ym can be the subsequence f′(xm), which is the only possible choice of value that lies in ψ(xm). Due to continuity of f′, this obviously converges to f′(x), which is the only possible choice of value for ψ(x) because x∈C.
The second possible case is where only finitely many of the xn lie in C, but yet the limit point x lies in C as well, like limiting to the border of a closed set from outside the closed set. Let xm be the subsequence where you get rid of all the x's in the approximating sequence that lie in C. Notice that ψ(xm), instead of being written as
[f(xm)−ϵ,f(xm)+ϵ]∩[f′(xm)−d(f,f′),f′(xm)+d(f,f′)]
Can be written as the single interval
[sup(f(xm)−ϵ,f′(xm)−d(f,f′)),inf(f(xm)+ϵ,f′(xm)+d(f,f′))]
And now, let ym be defined as:
sup(sup(f(xm)−ϵ,f′(xm)−d(f,f′)),inf(f′(xm),inf(f(xm)+ϵ,f′(xm)+d(f,f′))))
Due to continuity of all these functions, and xm limiting to x, the limiting y value is:
sup(sup(f(x)−ϵ,f′(x)−d(f,f′)),inf(f′(x),inf(f(x)+ϵ,f′(x)+d(f,f′))))
Now, because
f(x)+ϵ=f(x)+supx∈C|f(x)−f′(x)|≥f′(x)
and
f′(x)+d(f,f′)≥f′(x) (because d(f,f′)≥0)
Therefore,
inf(f(x)+ϵ,f′(x)+d(f,f′))≥f′(x)
Therefore,
inf(f′(x),inf(f(x)+ϵ,f′(x)+d(f,f′)))=f′(x)
So our limiting y value reduces to:
sup(sup(f(x)−ϵ,f′(x)−d(f,f′)),f′(x))
Also, by similar arguments, we have:
f(x)−ϵ=f(x)−supx∈C|f(x)−f′(x)|≤f′(x)
and
f′(x)−d(f,f′)≤f′(x) (because d(f,f′)≥0)
Therefore
sup(f(x)−ϵ,f′(x)−d(f,f′))≤f′(x)
So, our limiting y value reduces to merely f′(x), i.e., our point selected from ψ(x), and we've shown lower hemicontinuity in this case. That just leaves one last case left over, the case where x∉C.
Again, as before, in this case, ψ(x) is the interval
[sup(f(x)−ϵ,f′(x)−d(f,f′)),inf(f(x)+ϵ,f′(x)+d(f,f′))]
Let p∈[0,1] be how close the point y is to the bottom of this interval (p=1 at the bottom endpoint, p=0 at the top). For your sequence xm limiting to x, discard the finitely many xn which lie in C (only finitely many can, since x∉C and C is closed), and let the ym be given by:
p(sup(f(xm)−ϵ,f′(xm)−d(f,f′)))+(1−p)(inf(f(xm)+ϵ,f′(xm)+d(f,f′)))
Because of the continuity of the functions sup(f−ϵ,f′−d(f,f′)) (supremum of two continuous functions) and inf(f+ϵ,f′+d(f,f′)) (inf of two continuous functions), the ym limit to:
p(sup(f(x)−ϵ,f′(x)−d(f,f′)))+(1−p)(inf(f(x)+ϵ,f′(x)+d(f,f′)))
Which is just y. So, we have lower hemicontinuity in this last case.
And therefore, we have lower hemicontinuity overall for ψ, and so ψ has a continuous selection function by the Michael Selection Theorem, and said selection function fulfills the requisite properties.
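On the real line, the formula used in the case analysis above is just clamping f′(x) into the interval defining ψ(x), so the selection can be written down explicitly and its three properties checked on a grid (an illustration with my own toy choices of X, C, f, f′; not the Michael-selection argument itself):

```python
import numpy as np

x = np.linspace(0, 2, 2001)
inC = x <= 1.0                         # C = [0, 1] inside X = [0, 2]
f = np.sin(3 * x)
fp = f + 0.3 * x                       # f': close to f on C, farther away off C
d = np.max(np.abs(f - fp))             # d(f, f')
eps = np.max(np.abs(f - fp)[inC])      # sup over C of |f - f'|

# f'' clamps f' into [sup(f - eps, f' - d), inf(f + eps, f' + d)]; on C the
# interval contains f'(x), so the clamp leaves f' untouched there.
lower = np.maximum(f - eps, fp - d)
upper = np.minimum(f + eps, fp + d)
fpp = np.clip(fp, lower, upper)

assert np.max(np.abs(f - fpp)) <= eps + 1e-12    # d(f, f'') <= eps
assert np.allclose(fpp[inC], fp[inC])            # f'' agrees with f' on C
assert np.max(np.abs(fp - fpp)) <= d + 1e-12     # d(f', f'') <= d(f, f')
```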
Lemma 2: For all f and f′, if a functional h:CB(X)→R has a set C as an ϵ2-almost-support, and has supx∈C|f(x)−f′(x)|≤ϵ1, then
|h(f)−h(f′)|≤λ⊙ϵ1+ϵ2⋅d(f,f′) where λ⊙ is the Lipschitz constant of h.
Proof: Via Lemma 1, there's a function f′′ with the properties that:
d(f,f′′)≤ϵ1
f′↓C=f′′↓C
d(f′,f′′)≤d(f,f′)
Now, we can go:
|h(f)−h(f′)|≤|h(f)−h(f′′)|+|h(f′′)−h(f′)|≤λ⊙d(f,f′′)+ϵ2d(f′,f′′)
≤λ⊙ϵ1+ϵ2d(f,f′)
And we're done. The critical steps in the second inequality were due to Lipschitzness of h, and the fact that f′′ and f′ agree on C which is an ϵ2-almost-support for h, respectively.
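A finite-state illustration of this bound (a toy setup of my own: with X={0,1,2} and C={0,1}, a Lipschitz constant and an almost-support parameter for h can be read off directly from the measure components):

```python
import numpy as np

rng = np.random.default_rng(5)
ms = rng.uniform(0, 1, size=(4, 3))     # X = {0,1,2}, C = {0,1}
bs = rng.uniform(0, 1, size=4)
lam = ms.sum(axis=1).max()              # Lipschitz constant of h (sup-norm on f)
eps_supp = ms[:, 2].max()               # C is an almost-support with this parameter

def h(g):
    return min(m @ g + b for m, b in zip(ms, bs))

for _ in range(200):
    f, fp = rng.normal(size=3), rng.normal(size=3)
    eps_agree = np.max(np.abs(f - fp)[:2])   # sup over C of |f - f'|
    dist = np.max(np.abs(f - fp))            # d(f, f')
    # the Lemma 2 shape: Lipschitz * (agreement on C) + (almost-support) * d(f, f')
    assert abs(h(f) - h(fp)) <= lam * eps_agree + eps_supp * dist + 1e-9
```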
Proposition 1: For a continuous function L:X→[0,1], the metric d|L completely metrizes the set {x|L(x)>0} equipped with the subspace topology.
To do this, we'll need four parts. First, we need to show that d|L is even a metric. Second, we'll need to show that for all open balls in the d|L metric, you can fit an open ball in the original metric within it (so all the open sets induced by the d|L metric were present in the original subspace topology). Third, we'll need to show that for all open balls in the original metric that lie within the support of L, we can fit an open ball induced by the d|L metric within it (so all the open sets in the original subspace topology can be induced by the d|L metric); parts 2 and 3 let us show that the d|L metric induces the subspace topology on the support of L. Finally, part 4 is showing that any Cauchy sequence in the d|L metric is Cauchy in the original complete metric, so we can't have any cases of missing limit points, and showing that all Cauchy sequences in the support of L, according to the d|L metric, must have their limit point lying in the support of L, so limits never lead outside of the support of L. This then shows that d|L is a complete metrization of the support of L: it induces the same topology as the subspace topology, and all limit points lie in the same space.
So, to begin with, if X is your original Polish space, pick a complete metrization of the space. Then, ensure that the maximum distance is 1, for instance by replacing the metric with min(d,1) (this preserves the exact same Cauchy sequences, doesn't affect the topology in any way, and it's still a metric), and call the result d.
Also, there's some continuous function L:X→[0,1] that you'll be updating. The set support(L) is defined as {x|L(x)>0}, and is clearly an open subset of X because the preimage of (0,2) (an open set) must be open due to the continuity of L.
Now, define
(d|L)(x,y):=d(x,y)⋅inf(1/L(x),1/L(y))+|1/L(x)−1/L(y)|
To show it's a metric, we first need symmetry (which is obvious, because d and inf and absolute value of difference are all symmetric). For identity of indiscernibles, in the forward direction, the fact that L is bounded above by 1 means that inf(1/L(x),1/L(y))≥1, always, so for (d|L)(x,y) to be 0, we need d(x,y)=0, so x=y. For the reverse direction, (d|L)(x,x) must be 0, because the equation for distance reduces to 0⋅(1/L(x))+0.
That just leaves the triangle inequality, and here, we'll critically use the fact that the original metric was clipped so it's never above 1. Our goal is to show that
(d|L)(x,z)≤(d|L)(x,y)+(d|L)(y,z)
Without loss of generality, we can assume that 1/L(x)≤1/L(z) (otherwise flip x and z), so we can split into three cases, which are:
1/L(x)≤1/L(y)≤1/L(z)
1/L(x)≤1/L(z)≤1/L(y)
1/L(y)≤1/L(x)≤1/L(z)
For the first case, we have:
(d|L)(x,z)=d(x,z)⋅inf(1/L(x),1/L(z))+|1/L(x)−1/L(z)|
=d(x,z)⋅1/L(x)+|1/L(x)−1/L(z)|≤d(x,z)⋅1/L(x)+|1/L(x)−1/L(y)|+|1/L(y)−1/L(z)|
≤(d(x,y)+d(y,z))⋅1/L(x)+|1/L(x)−1/L(y)|+|1/L(y)−1/L(z)|
≤d(x,y)⋅1/L(x)+d(y,z)⋅1/L(y)+|1/L(x)−1/L(y)|+|1/L(y)−1/L(z)|
=d(x,y)⋅inf(1/L(x),1/L(y))+|1/L(x)−1/L(y)|+d(y,z)⋅inf(1/L(y),1/L(z))+|1/L(y)−1/L(z)|
=(d|L)(x,y)+(d|L)(y,z)
For the second case, we can do the same exact argument, just replacing the 1/L(y) that multiplies d(y,z) with 1/L(z). The third case is the tricky one, we'll work backwards. Remember that our inequalities are:
1/L(y)≤1/L(x)≤1/L(z)
Now, let's proceed.
(d|L)(x,y)+(d|L)(y,z)
=d(x,y)⋅inf(1/L(x),1/L(y))+|1/L(x)−1/L(y)|+d(y,z)⋅inf(1/L(y),1/L(z))+|1/L(y)−1/L(z)|
=d(x,y)⋅1/L(y)+|1/L(x)−1/L(y)|+d(y,z)⋅1/L(y)+|1/L(y)−1/L(z)|
=d(x,y)⋅1/L(y)+1/L(x)−1/L(y)+d(y,z)⋅1/L(y)+1/L(z)−1/L(y)
=1/L(y)⋅(d(x,y)+d(y,z))+1/L(x)+1/L(z)−2/L(y)≥1/L(y)⋅d(x,z)+1/L(x)+1/L(z)−2/L(y)
=(1/L(y)+1/L(x)−1/L(x))⋅d(x,z)+1/L(x)+1/L(z)−2/L(y)
Then we do some regrouping, to yield:
=1/L(x)⋅d(x,z)+(1/L(x)−1/L(y))⋅(1−d(x,z))+1/L(z)−1/L(y)
At this point, we remember that d(x,z)≤1 because we bounded our initial distance, and 1/L(x)≥1/L(y) because of the problem case we're in, so the middle term is nonnegative and dropping it gives:
≥1/L(x)⋅d(x,z)+1/L(z)−1/L(y)
And then, we just remember that 1/L(y)≤1/L(x)≤1/L(z) in this problem case, so 1/L(z)−1/L(y)≥1/L(z)−1/L(x)=|1/L(x)−1/L(z)|, and inf(1/L(x),1/L(z))=1/L(x), to get:
≥d(x,z)⋅inf(1/L(x),1/L(z))+|1/L(x)−1/L(z)|=(d|L)(x,z)
And we're done with the triangle inequality, so d|L is indeed a metric.
Well, is it a complete metric for support(L)? By looking at the definition of d|L, and remembering that 1/L(x) is always 1 or more because the likelihood function is bounded in [0,1], we have that:
(d|L)(x,y)≥d(x,y)
When x and y are in the support of L, so any Cauchy sequence in d|L must also be Cauchy in d. Now, either a Cauchy sequence has its limit point also lying in the support of L, in which case we're good, or it has its limit point lying on the edge of the support of L (and outside the set), where L(x)=0. However, in that case, the sequence cannot be Cauchy in d|L, because 1/L(xn) would diverge to infinity due to the continuity of L and the limit point lying on the edge where L(x)=0, so no point can be close to all but finitely many of the xn; the absolute-value term in the distance forbids Cauchyness. So, it's a complete metric for the support of L.
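Both halves of this argument can be illustrated numerically with the toy choice X=[0,1], d(x,y)=min(|x−y|,1), L(x)=x, so support(L)=(0,1] (my own example, illustration only): the triangle inequality holds on random triples, and the d-Cauchy sequence xn=1/n fails to be d|L-Cauchy because the 1/L term blows up.

```python
import numpy as np

def L(t):
    return t                                  # likelihood function on X = [0, 1]

def dL(x, y):
    d = min(abs(x - y), 1.0)                  # base metric, clipped at 1
    return d * min(1 / L(x), 1 / L(y)) + abs(1 / L(x) - 1 / L(y))

# Triangle inequality on random triples from support(L):
rng = np.random.default_rng(4)
for x, y, z in rng.uniform(0.01, 1.0, size=(300, 3)):
    assert dL(x, z) <= dL(x, y) + dL(y, z) + 1e-9

# Boundary escape: x_n = 1/n converges to 0 in d, but consecutive d|L gaps
# are 1 + 1/(n+1) > 1, so the sequence is not d|L-Cauchy.
gaps = [dL(1 / n, 1 / (n + 1)) for n in range(1, 50)]
assert all(g > 1.0 for g in gaps)
```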
All that remains is to show that it induces the same topology as the subspace topology that the open set support(L) should have. Because the support of L is itself open, and the space X is metrized by the distance metric d, any open set in the support of L can be written as the intersection of an open set in X and the open set support(L), and so it's open in X, and can be written as the union of tiny open balls in the original distance metric.
So, we'll proceed by showing that any open ball (w.r.t. the d metric) within the support of L, can fit some open ball (w.r.t. the d|L metric) in it that engulfs the center point, and any open ball (w.r.t. the d|L metric) within the support of L, can fit some open ball (w.r.t. the d metric) in it that engulfs the center point.
If we can do it, then because any open set in the subspace topology can be written as the union of a bunch of open balls centered at each point in the open set according to the d metric, and each of those open balls according to the d metric has its center point engulfed by a smaller open ball according to the d|L metric, we'd be able to build the open set in the subspace topology out of a union of open sets in the d|L-induced topology.
And also, any open set in the d|L-induced topology can be written as a union of open balls centered at each relevant point, and because each of those open balls has its center point engulfed by a smaller open ball according to the d metric, we can build any open set in the d|L-induced topology out of a union of open sets in the original topology.
The net result is that the d|L induced topology and the subspace topology have the same open sets. So, let's get working on showing that we can fit the two sorts of balls inside each other.
In one direction, if you have a ball of size ϵ according to the metric d, then because (d|L)(x,y)≥d(x,y) always, a ball of size ϵ according to the metric d|L centered at the same point will fit entirely within the original ball, so we have one half of our topology argument done.
In the other direction, if you have a ball of size ϵ according to the metric d|L, around some point x, then there's some δ distance around x where the function 1/L only varies by ϵ/3.
At this point, we can then fit a ball of size min(δ,ϵL(x)/3) around the point x (according to the original d metric), and it will lie within the ball of size ϵ around the point x according to the d|L metric. The reason for this is:
(d|L)(x,y)=d(x,y)⋅inf(1/L(x),1/L(y))+|1/L(x)−1/L(y)|
And then, because we selected our distance between x and y so that 1/L(x) and 1/L(y) only differ by ϵ/3 at most (because a ball of size δ suffices to accomplish this), we have:
≤(ϵL(x)/3)⋅(1/L(x)+ϵ/3)+ϵ/3=ϵ/3+ϵ²L(x)/9+ϵ/3
And then, because L(x)≤1 and ϵ is very small, ϵ²L(x)/9≤ϵ/3, so we have
≤ϵ/3+ϵ/3+ϵ/3=ϵ
So, this small of a ball in the original distance metric suffices to slot entirely inside a ball of size ϵ centered at x in the d|L metric.
And that's all we need!
Theorem 1: The set of infradistributions (set form) is isomorphic to the set of infradistributions (functional form). The H→h part of the isomorphism is given by h(f)=inf(m,b)∈Hm(f)+b, and the h→H part of the isomorphism is given by H={(m,b)|b≥(h′)∗(m)}, where h′(f)=−h(−f) and (h′)∗ is the convex conjugate of h′.
Our first order of business is establishing the isomorphism. The first direction is showing that going from H to h and back recovers H exactly. By upper completion, and reproved analogues of Proposition 2 and Theorem 1 from "Basic inframeasure theory", which an interested party can reprove if they want to see it, we can characterize H as
{(m,b)|∀f∈CB(X):m(f)+b≥inf(m′,b′)∈H(m′(f)+b′)}
And then, our H can further be reexpressed as
{(m,b)|∀f∈CB(X):m(f)+b≥EH(f)}
{(m,b)|∀f∈CB(X):b≥EH(f)−m(f)}
{(m,b)|b≥supf∈CB(X)(EH(f)−m(f))}
Also, EH(f)=h(f)=−h′(−f), so we can rewrite this as:
{(m,b)|b≥sup(−f)∈CB(X)(m(−f)−h′(−f))}
and, by the definition of the convex conjugate, and the space of finite signed measures being the dual space of CB(X), and m(−f) being a functional applied to an element, this is...
{(m,b)|b≥(h′)∗(m)}
So, our original set H is identical to the convex-conjugate set, when we go from H to h back to a set of sa-measures.
Proof Phase 2: In the reverse direction for isomorphism, assume that h fulfills the conditions (we'll really only need continuity and concavity).
We want to show that
E{(m,b)|b≥(h′)∗(m)}(f)=h(f)
Let's begin.
E{(m,b)|b≥(h′)∗(m)}(f)=inf(m,b):b≥(h′)∗(m)(m(f)+b)
Given an m, we have a natural candidate for minimizing the b, just set it equal to (h′)∗(m). So then we get
infm(m(f)+(h′)∗(m))=infm((h′)∗(m)−m(−f))
And this is just... −(h′)∗∗(−f), and, because h is continuous over CB(X), and concave, then h′ is continuous over CB(X), and convex, so h′=(h′)∗∗. From that, we get
E{(m,b)|b≥(h′)∗(m)}(f)=−(h′)∗∗(−f)=−h′(−f)=h(f)
and we're done with isomorphism.
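The isomorphism can be sanity-checked numerically in a finite-dimensional analogue. Here X is a hypothetical two-point space, so measures are vectors in R², and the generating a-measures and the grid of sampled functions are illustrative choices, not from the text; the sampled supremum only gives a lower bound on the true convex conjugate.

```python
# Finite-dimensional sketch (illustrative, not from the text) of the duality
# h(f) = inf_{(m,b) in H} m(f) + b  and  b >= (h')*(m) = sup_f (h(f) - m(f)).

import itertools

# hypothetical generators of H: nonnegative measure components, b >= 0
H_gen = [((0.5, 0.5), 0.0), ((1.0, 0.2), 0.1)]

def h(f):
    return min(m[0] * f[0] + m[1] * f[1] + b for (m, b) in H_gen)

# sample a grid of functions f: X -> [-2, 2]
grid = [x / 2.0 for x in range(-4, 5)]
fs = list(itertools.product(grid, grid))

def conj_lower_bound(m):
    # finite-sample version of (h')*(m) = sup_f (h(f) - m(f))
    return max(h(f) - (m[0] * f[0] + m[1] * f[1]) for f in fs)

for (m, b) in H_gen:
    assert b >= conj_lower_bound(m) - 1e-9  # b >= (h')*(m), checked on the sample

assert abs(h((0.0, 0.0))) < 1e-9        # normalization h(0) = 0
assert abs(h((1.0, 1.0)) - 1.0) < 1e-9  # normalization h(1) = 1
print("duality bounds verified on the sample")
```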
So, in our first direction, we're going to derive the conditions on the functional from the condition on the set, so we can assume nonemptiness, closure, convexity, upper completion, projected-compactness, and normalization, and derive monotonicity, concavity, normalization, Lipschitzness, and compact almost-support (CAS) from that.
For monotonicity, remember that all points in the infradistribution set are a-measures, so if f′≥f, then
h(f′)=inf(m,b)∈Hm(f′)+b≥inf(m,b)∈Hm(f)+b=h(f)
We could do that because all the measure components are actual measures.
For concavity,
h(pf+(1−p)f′)=inf(m,b)∈Hm(pf+(1−p)f′)+b
=inf(m,b)∈H(pm(f)+(1−p)m(f′)+pb+(1−p)b)
≥inf(m,b)∈H(pm(f)+pb)+inf(m,b)∈H((1−p)m(f′)+(1−p)b)
=pinf(m,b)∈H(m(f)+b)+(1−p)inf(m,b)∈H(m(f′)+b)=ph(f)+(1−p)h(f′)
And we're done with that. For normalization,
h(0)=inf(m,b)∈Hm(0)+b=inf(m,b)∈Hb=0
And
h(1)=inf(m,b)∈Hm(1)+b=inf(λμ,b)∈Hλμ(1)+b=inf(λμ,b)∈Hλ+b=1
So we have normalization.
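The three conditions just derived can be spot-checked in the same kind of finite toy model; the two-point space and the particular a-measures are hypothetical, not from the text, and monotonicity holds here exactly because the measure components are nonnegative.

```python
# Quick numeric checks (illustrative two-point example, not from the text) that
# h(f) = min_{(m,b) in H} m(f) + b is monotone, concave, and normalized.

H_gen = [((0.5, 0.5), 0.0), ((1.0, 0.2), 0.1)]  # hypothetical a-measures, min b = 0

def h(f):
    return min(m[0] * f[0] + m[1] * f[1] + b for (m, b) in H_gen)

f, g = (0.3, 1.5), (0.8, 2.0)            # g >= f pointwise
assert h(g) >= h(f)                       # monotonicity

p = 0.25
mix = (p * f[0] + (1 - p) * g[0], p * f[1] + (1 - p) * g[1])
assert h(mix) >= p * h(f) + (1 - p) * h(g) - 1e-9   # concavity (min of affine maps)

assert h((0.0, 0.0)) == 0.0 and h((1.0, 1.0)) == 1.0  # normalization
print("monotonicity, concavity, normalization verified")
```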
For Lipschitzness, we first observe that compact-projection (the minimal points, when projected down to their measure components, make a set with compact closure) enforces that there's an upper bound on the λ value of a minimal point (λμ,b)∈Hmin, because otherwise you could pick a sequence with unbounded λ, and it'd have no convergent subsequence of measures, which contradicts precompactness of the minimal points projected down to their measure components.
Then, we observe that points in H correspond perfectly to hyperplanes that lie above the graph of h, and a minimal point is "you shift your hyperplane down as much as you can until you can't shift it down any more without starting to cut into the function h". Further, for every function f∈CB(X), you can make a hyperplane tangent to the function h at that point by the Hahn-Banach theorem, which must correspond to a minimal point.
Putting it together, the hypograph of h is exactly the region below all its tangent hyperplanes. And we know all the tangent hyperplanes correspond to minimal points, and their Lipschitz constants correspond to the λ value of the minimal points. Which are bounded. So, Compact-Projection in H implies h is Lipschitz.
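That Lipschitz bound (the Lipschitz constant of h is at most the largest λ among the points of H) is easy to check numerically in the same hypothetical two-point setup, using random pairs of functions.

```python
# Numeric check (illustrative two-point example, not from the text) that the
# Lipschitz constant of h is bounded by the largest lambda = m(1) in H.

import random

H_gen = [((0.5, 0.5), 0.0), ((1.0, 0.2), 0.1)]  # hypothetical a-measures
lam_max = max(m[0] + m[1] for (m, b) in H_gen)

def h(f):
    return min(m[0] * f[0] + m[1] * f[1] + b for (m, b) in H_gen)

random.seed(0)
for _ in range(1000):
    f = (random.uniform(-3, 3), random.uniform(-3, 3))
    g = (random.uniform(-3, 3), random.uniform(-3, 3))
    dist = max(abs(f[0] - g[0]), abs(f[1] - g[1]))  # sup-norm distance d(f,g)
    assert abs(h(f) - h(g)) <= lam_max * dist + 1e-9
print("Lipschitz bound verified on random samples")
```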
Finally, we'll want compact almost-support. A set of measures is precompact iff the amount of measure is upper-bounded, and, for all ϵ, there is a compact set Cϵ⊆X where all the measures m have <ϵ measure outside of Cϵ.
So, given that the set of measures corresponding to H is precompact by the compact-projection property, we want to show that the functional h has compact almost-support. To do this, we'll observe that if h is the inf of a bunch of functions, and all those functions think two different points are only a little ways apart in value, then h must think they're only a little distance apart in value. Keeping that in mind, we have:
|dh(f;f′)|=limδ→0 |h(f+δf′)−h(f)|/δ=limδ→0 |inf(m,b)∈H(m(f+δf′)+b)−inf(m′,b′)∈H(m′(f)+b′)|/δ
And then, we can think of each minimal point as corresponding to a hyperplane, and h is the inf of all of them, so to bound the distance between these two values, we just need to assess the maximum size of the gap between those values over all minimal points/tangent hyperplanes. Thus, we can get:
limδ→0 |inf(m,b)∈H(m(f+δf′)+b)−inf(m′,b′)∈H(m′(f)+b′)|/δ
≤limδ→0 sup(m,b)∈H |(m(f+δf′)+b)−(m(f)+b)|/δ
And then, we can do some canceling and get:
=limδ→0 sup(m,b)∈H |m(δf′)|/δ=limδ→0 sup(m,b)∈H |δm(f′)|/δ=limδ→0 sup(m,b)∈H |m(f′)|=sup(m,b)∈H |m(f′)|
And then, because f′ was selected to be 0 on Cϵ, and every measure in H has less than ϵ measure outside of Cϵ, we can upper-bound |m(f′)| by ϵ||f′||, so we have that
f′↓Cϵ=0→∀f:|dh(f;f′)|≤ϵ||f′||
And so, Cϵ is a compact ϵ-almost-support for h, and this argument works for all ϵ, so h is CAS, and that's the last condition we need. Thus, if H is an infradistribution (set form), the expectation functional h is an infradistribution (expectation form).
Now for the other direction, where we assume monotonicity, concavity, normalization, Lipschitzness, and CAS on an infradistribution (expectation form) and show that the induced form fulfills nonemptiness, convexity, closure, upper completion, projection-compactness, normalization, and being a set of a-measures.
Remember, our specification of the corresponding set was:
{(m,b)|b≥(h′)∗(m)}
Where h′ is the function given by h′(−f)=−h(f), and (h′)∗ is the convex conjugate of h′.
First, being a nonempty set of a-measures. Because there's an isomorphism linking points of the set and hyperplanes above the graph of h, we just need to establish that no hyperplanes above the graph of h can slope down in the direction of a nonnegative function (as this certifies that the measure component must be an actual measure), and no hyperplanes above the graph of h can assign 0 a value below 0 (as this corresponds to the b term, and can be immediately shown by normalization).
What we do is go "assume there's a ϕ where the linear functional corresponding to ϕ isn't a measure, ie, there's some nonnegative function f where ϕ(f)<ϕ(0)". Well, because of monotonicity for h (one of the assumed properties), we have h(0)≤h(f)≤h(2f)≤h(3f).... And, because all affine functionals are made by taking a linear functional and displacing it, ϕ(0)>ϕ(f)>ϕ(2f)>ϕ(3f)..., decreases at a linear rate, so eventually the hyperplane and h cross over, but ϕ was assumed to be above h always, so we have a contradiction.
Therefore, all hyperplanes above h must have their linear functional component corresponding to an actual measure, ie, being an a-measure. And we get nonemptiness from the concavity of h, so we can pick any function and use the Hahn-Banach theorem to make a tangent hyperplane to h that touches at that point, certifying nonemptiness.
By the way, the convex conjugate, (h′)∗(m), can be reexpressed as supf(h(f)−m(f)).
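Spelling out that reexpression, using the document's definition h′(−f)=−h(f) (equivalently h′(g)=−h(−g)) and the substitution f=−g:

```latex
(h')^{*}(m) = \sup_{g \in CB(X)} \left( m(g) - h'(g) \right)
            = \sup_{g \in CB(X)} \left( m(g) + h(-g) \right)
            = \sup_{f \in CB(X)} \left( h(f) - m(f) \right)
```

where the last step also uses m(−f)=−m(f), since m is a linear functional.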
For closure and convexity: By monotonicity of h and normalization, 0=−h(0)≥−h(f)≥−h(1)=−1, and h is continuous (Lipschitz) on CB(X), and concave, so h′ is proper, continuous on CB(X), and convex, so, by the Wikipedia page on "Closed Convex Function", h′ is a closed convex function, and then by the Wikipedia page on "Convex Conjugate" in the Properties section, (h′)∗ is convex and closed. From the Wikipedia page on "Closed Convex Function", this means that the epigraph of (h′)∗ is closed, and also the epigraph of a convex function is convex. This takes care of closure and convexity for our H.
Time for upper-completeness. Assume that (m,b) lies in the epigraph. Our task now is to show that (m,b)+(0,b′) lies in the epigraph. This is equivalent to showing that b+b′≥(h′)∗(m). Let's begin.
(h′)∗(m)≤b≤b+b′
And we're done.
Normalization of the resulting set is easy. Going from h to a (maybe)-inframeasure H back to h is identity as established earlier, so all we have to do is show that a failure of normalization in a (maybe)-inframeasure makes the resulting h not normalized. Thus, if our h is normalized, and it makes an H that isn't normalized, then going back makes a non-normalized h, which contradicts isomorphism. So, assume there's a failure of normalization in H. Then EH(0)≠0, or EH(1)≠1, so either h(0)≠0 or h(1)≠1 and we get a failure of normalization for h which is impossible. So H must be normalized.
That just leaves compact-projection. We know that a set of measures is precompact iff there's a bound on their λ values, and for all ϵ, there's a compact set Cϵ⊆X where all the measure components have <ϵ measure outside of that set.
First, we can observe that no hyperplane above h can have a Lipschitz constant above the maximal Lipschitz constant for the function h, because if it increased more steeply in some direction, you could go in the other direction to decrease as steeply as possible, and h would be constrained to decrease strictly less steeply in that direction, so if you went far enough in that direction, your hyperplane and h would intersect, which is impossible. Thus, Lipschitzness of h enforces that there can be no point in the set H with too much measure, which gives us one half of compact-projection for H.
For the other half, CAS for h ensures that for all ϵ, there is a compact set Cϵ where
f′↓Cϵ=0→|dh(f;f′)|≤ϵ||f′||
What we'll do is establish that no hyperplane lying above h can have a slope more than ϵ in the direction of a function that's in [0,1] and is 0 on Cϵ. Let f′ be such a function fulfilling those properties that makes some hyperplane above h slope down too hard. Then
dh(0;−f′)≥−ϵ
Because going in the direction of a negative function decreases your value, so the Gateaux derivative would be negative, and −f′ is in [−1,0], and we have CAS on h.
Now, we can realize that as we travel from 0 to −f′ to −2f′ to −3f′, our vector of travel is always in the direction of −f′, which can't be too negative. Each additional −f′ added drops the value of h by at most ϵ||f′||. However, each additional −f′ added drops the value of ψ (our assumed functional that's sloping down too hard in the −f′ direction) by more than that quantity, so eventually ψ will cross over to be lower than h, so ψ can't correspond to an a-measure in H, and we have a contradiction.
Therefore, regardless of the point in H, its measure component must assign any function that's 0 on Cϵ and bounded in [0,1] a value of ϵ at most. We can then realize that this can only happen for a measure that assigns ϵ or less measure to the complement of Cϵ (otherwise you could approximate the indicator function of the complement from below by such a continuous function, and it would be assigned more than ϵ).
Thus, given our ϵ, we've found a compact set Cϵ where the measure component of all points in H assign ϵ or less value to the outside of that set, and this can be done for all ϵ, certifying the last missing piece for compact-projection of H (because the projection is precompact iff the set of measures is bounded above in amount of measure present, and for all ϵ, there's a compact set Cϵ where all the measures assign ≤ϵ measure outside of that set).
And that's the last condition we need to conclude that the set form of an infradistribution (functional form) is an infradistribution (set form), and we're done.
Proposition 2: For any infradistribution h there is a unique closed set S which is the intersection of all supports for h, and is a support itself.
Proof sketch: The proof splits into three parts. First, we show that if B is a support and B′ is an ϵ-almost support, then B∩B′ is an ϵ-almost support, by picking two functions which agree on B∩B′ and using them to construct a third function which has similar expectations to both of them, due to agreeing with the first function on B and agreeing with the second function on B′, so our arbitrary functions which agree on the intersection have similar expectation values. A support is precisely an ϵ-almost support for all ϵ, so actual supports are preserved under finite intersection.
Second, we show that supports are preserved under countable intersections, which critically relies on compact ϵ-almost supports for all ϵ. Roughly, that part of the proof proceeds by using our compact almost-support in order to make a sequence of compact almost-supports nested in each other, and showing they converge in Hausdorff distance, so two functions which agree on the intersection of all the compact sets are very close to agreeing on some late compact almost-support, and so have similar expectation values, making the intersection of compact sets an almost-support. The property of being an almost-support is hereditary upwards, and we can show our countable intersection of compact sets is a subset of the countable intersection we're interested in, so it's an almost-support. However, we can shrink ϵ to 0 so it's a support.
Finally, we show that supports are preserved under arbitrary intersections by using strong Lindelofness of X (because it's Polish) to reduce the arbitrary intersection case to the countable intersection case which has already been solved.
For the finite intersection case, let B be a support and B′ be an ϵ-almost support. Our task is to show that B∩B′ is an ϵ-almost support, by showing that two functions which agree on it can't be more than ϵ apart in expectation value.
Pick any two functions f and f′ where f↓B∩B′=f′↓B∩B′. Let's define an f′′ on B∪B′ as follows. If x∈B, then f′′(x):=f(x). If x∈B′, and x∉B, then f′′(x):=f′(x). Our first task is to show that f′′ is continuous.
If xn limits to x, we can split into three cases. Our first case is where x∈B, and x∉B′. Then past a certain n, xn will always not be in B′ (complement of a closed set is open), and so its values are dictated by f, which is continuous, and we have continuity of f′′ at that point. The second case where x∈B′ and x∉B is symmetric, but with the values being dictated by f′ instead. Our third case where x∈B∩B′ is the only one which takes some care. As xn limits to x, then both f′(xn) and f(xn) (not the restrictions, the original functions!) limit to f(x)=f′(x) (because f and f′ agree on B∩B′) so no matter how our xn toggles back and forth between B and B′, we have continuity.
Now that we know f′′ is continuous on B∪B′, we can use the Tietze Extension Theorem to extend f′′−f′ to a continuous function on the entire space X, because the union of two closed sets is closed, and we can ensure that the extension stays in [−d(f,f′),d(f,f′)] as well (because f′′ is copying either f or f′). Call this extension f∗.
Now, because
(f′+f∗)↓B=(f′+(f′′−f′))↓B=f′′↓B=f↓B
and B is a support, then h(f′+f∗)=h(f). However, since
(f′+f∗)↓B′=(f′+f′′−f′)↓B′=f′′↓B′=f′↓B′
(because f′ and f agree on B′∩B), then
|h(f′+f∗)−h(f′)|<ϵd(f′+f∗,f′)=ϵsupx|f′(x)+f∗(x)−f′(x)|=ϵsupx|f∗(x)|≤ϵd(f,f′)
The last inequality is because f∗ was selected to be bounded in the relevant range, and the first inequality is because f′+f∗ and f′ are identical on B′ which is a ϵ-almost-support.
Thus,
|h(f)−h(f′)|≤|h(f)−h(f′+f∗)|+|h(f′+f∗)−h(f′)|<ϵd(f,f′)
That last inequality was because f and f+f∗ agree on B which is a support, and for the latter, we already derived that inequality. Because f and f′ were arbitrary functions which agreed on B∩B′, and B was an arbitrary support and B′ was an arbitrary ϵ-almost-support, we have that the intersection of any support and ϵ-almost-support is an ϵ-almost support. As a special case of this, the intersection of two supports is a support, so being a support is preserved under finite intersection.
Now for supports being preserved under countable intersection. Fix a countable family of supports, Bi. We will show that ⋂iBi is a support.
First, by our "compact almost-support" property, for all ϵ, there's a compact set Cϵ where any two functions agreeing on Cϵ will only differ in expectation by ϵd(f,f′). Fix an arbitrary ϵ to use through the rest of the argument, which induces a compact set Cϵ.
Let Si be defined as Cϵ∩⋂j≤iBj, and Sω:=⋂iSi. Polish spaces are Hausdorff, so the intersection of compact sets with closed sets is compact. Therefore, the Si are compact and all nested in each other. All the Si are ϵ-almost supports, because each is the intersection of finitely many supports (which by the earlier result is a support) with the ϵ-almost support Cϵ.
We will show that Si converge to Sω in Hausdorff-distance. Assume that they don't converge in Hausdorff-distance. Then there's some δ where all Si have a point more than δ distance away from the set Sω. We can pick a sequence of those points from the Si, and they're all in S0 which is compact, so we can isolate a convergent subsequence. Then, for all i, the subsequence enters the closed set Si and doesn't leave, so the limit point must be in all the Si and thus in Sω, contradicting that the sequence of points is always δ away from Sω. This shows that the Si converge to Sω in Hausdorff-distance.
At this point, fix an f and f′ which agree with each other on Sω. f and f′ restricted to any Si (which is compact) must be uniformly continuous, and as we restrict to smaller and smaller Si, the modulus of continuity (how close two points need to be for f and f′ to only change a little) gets more tolerant.
Pick any δ≪ϵ you want. By uniform continuity for the restrictions of f and f′ to a compact set, we can find some γ>0 where points in S0 (a superset of all later Si) must be that close for f and f′ to only vary by δ/2 between the two points. And for any γ there's some i where Si is within γ Hausdorff-distance of Sω, because they limit to each other in Hausdorff-distance. Any point in Si is only γ away from Sω, so f and f′ are only 2⋅(δ/2)=δ or less apart from each other on suitably late Si.
By Lemma 2, since f and f′ are only δ away from each other on Si which is an ϵ-almost support, we have:
|h(f)−h(f′)|≤λ⊙δ+ϵd(f,f′)
And, because δ was arbitrary, we can take the limit as it approaches 0 to get:
|h(f)−h(f′)|≤ϵd(f,f′)
Also, f and f′ were arbitrary functions which agreed on Sω, so Sω is an ϵ-almost-support.
Sω=⋂iSi=⋂i(Cϵ∩⋂j≤iBj)=Cϵ∩⋂iBi⊆⋂iBi
Being an almost-support is hereditary upwards, because two functions which agree on a superset agree on a subset and are therefore close. So, ⋂iBi is an ϵ-almost-support. But ϵ was arbitrary, so it's a support, and we have that countable intersections of supports are supports.
Finally, we'll show that any uncountable intersection of supports is identical to a countable intersection of supports. Let's say you intersect ℵα closed sets which are supports. Biject that with the ordinal ωα by the well-ordering principle, so now all supports are indexed like Bγ with γ<ωα. Our task is now to write ⋂γ<ωαBγ as a countable intersection of closed supports.
Separable metric spaces are strongly Lindelof, which implies that every open set can be written as a countable union of open sets from a countable basis. By taking complements, we get the dual statement: every closed set can be written as a countable intersection of closed sets from a countable family (the complements of the basis open sets). Call these countably many closed sets Ai.
Now, define Fγ:={Ai|Bγ⊆Ai}. It is the family of basis closed sets which Bγ is a subset of. Because every closed set is the intersection of closed sets from the basis, Bγ=⋂Fγ.
Now, let's consider ⋂(⋃γ<ωαFγ)
Ie, the set made by intersecting all basis closed sets which contain some Bγ within them. We will show that this equals ⋂γ<ωαBγ, our arbitrary intersection of choice.
In one direction,
x∈⋂γ<ωαBγ→∀γ<ωα:x∈Bγ
Now, assume ∃γ<ωα:Bγ⊆Ai. From this, we derive x∈Ai. The i was arbitrary, so we've derived:
∀i:(∃γ<ωα:Bγ⊆Ai)→x∈Ai
Which is the same as:
x∈⋂i:(∃γ<ωα:Bγ⊆Ai)Ai
which is the same as:
x∈⋂{Ai|∃γ<ωα:Bγ⊆Ai}
which is the same as:
x∈⋂(⋃γ<ωα{Ai|Bγ⊆Ai})
which is the same as:
x∈⋂(⋃γ<ωαFγ)
So, there's one direction done, we just showed that:
⋂γ<ωαBγ⊆⋂(⋃γ<ωαFγ)
For the other direction, observe that
x∉⋂γ<ωαBγ→∃γ<ωα:x∉Bγ
Now, because Bγ=⋂Fγ,
∃γ<ωα:x∉⋂Fγ
which is equivalent to:
∃γ<ωα:∃i:(Bγ⊆Ai∧x∉Ai)
Swapping the quantifiers, we get:
∃i:(∃γ<ωα:Bγ⊆Ai)∧x∉Ai
Which can be reexpressed as:
¬∀i:((∃γ<ωα:Bγ⊆Ai)→x∈Ai)
And by our work in the first direction of trying to establish equality between the two sets (specifically, all our equivalences towards the end), this is the same as:
x∉⋂(⋃γ<ωαFγ)
So, we have our bidirectional implication and equality,
⋂γ<ωαBγ=⋂(⋃γ<ωαFγ)
We're trying to show the former set is a support. It can be written as the intersection of a bunch of Ai by our equality above, but how does that help us? Well, if an Ai is present in ⋃γ<ωαFγ, then it's present in some Fγ. Which means that it's a closed superset of the corresponding Bγ. Which, by assumption, is a support. So, all our Ai that we're intersecting are supports, since the property of being a support is closed upwards. There are countably many Ai, so we have rewritten our uncountable intersection of supports as a countable intersection of supports, which we know from earlier is a support.
Therefore, the intersection of all supports is a support.
Proposition 3: h(af)=ah(f) for all a≥0 iff all minimal points in H have b=0.
Proof sketch: From LF-duality, we recall our usual technique to prove things about minimal points. If you have a proposed minimal point violating a property, which corresponds to a hyperplane ϕ≥h, and we can find a hyperplane ψ where h≤ψ≤ϕ, and ψ isn't ϕ itself, then ϕ isn't an actual minimal point. We'll take a ϕ where ϕ(0)>0, and undershoot it. The reverse direction of the iff proof is much easier.
Proof: Assume that there's a minimal point in H with b>0. This corresponds to a minimal hyperplane ϕ where ϕ≥h and ϕ(0)>0. Now, consider the sets A and B which are defined as follows:
A:={(1−p)(f,b)|f∈CB(X),b>ϕ(f),p∈[0,1)}
B:={(f,b)|f∈CB(X),b≤h(f)}
the set B is the hypograph of h, and the set A is like the convex hull of the region above ϕ, and the point (0,0), but not the actual point (0,0) itself.
B is the hypograph of a concave continuous function, so it's obviously convex and closed. As for A, it is routine but rather tedious to verify that it is convex; this is left for the reader. Further, A is open. Given any point (1−p)(f,b) in it, we can make a tiny little open ball around (f,b) where everything in that ball has b>ϕ(f), and scale down the ball by multiplying by (1−p). Because p isn't 1, the ball doesn't collapse to a point when scaled, so this gets you a tiny little open ball around your arbitrary point in A, showing it's open.
Now, we must show that A and B are disjoint to invoke Hahn-Banach. Assume there's some point (1−p)(f,b) that lies in both of them at once. Then, (1−p)b≤h((1−p)f) by the defining condition of B, and p∈[0,1) and b>ϕ(f). And also, since the hyperplane corresponding to ϕ lies above the graph of h, ϕ(f)≥h(f). Putting these together, we get:
(1−p)b>(1−p)ϕ(f)≥(1−p)h(f)=h((1−p)f)≥(1−p)b
The first strict inequality is because p isn't 1 or higher, and b>ϕ(f). The next ≥ is because ϕ≥h, the critical equality is because we're assuming that h(af)=ah(f), and the final ≥ is because of the defining condition of B. This is impossible, we just showed a number is strictly above itself. So A and B are disjoint.
A and B are disjoint, both convex, and A is open, so we can invoke Hahn-Banach and separate them with a hyperplane ψ. At the start we assumed that ϕ was a minimal point, and ϕ(0)>0, so if we can show that h≤ψ≤ϕ over CB(X), and ψ≠ϕ, then ϕ isn't minimal and we have a contradiction, so all minimal points must map 0 to 0 (it can't be less because any hyperplane corresponding to a minimal point must lie above h, and h(0)=0 by normalization).
Now, assume that there is an f∈CB(X) where ψ(f)>ϕ(f). Then (1−0)(f,ψ(f)) lies in A because 0∈[0,1) and ψ(f)>ϕ(f). Thus, ψ cuts into the set A, but it can't because it's a separating hyperplane, so we have a contradiction and ψ≤ϕ over CB(X). Also, since ψ separates A and B, it doesn't cut into the set B, which is the hypograph of h, so h≤ψ.
Now that we have h≤ψ≤ϕ, all we need is to show that ψ(0)=0 to wrap things up. First, ψ≥h and h(0)=0, so ψ(0)≥0. Second, there are points arbitrarily close to (0,0) that lie within the set A, by letting p be very very close to 1. So, if ψ(0)>0, then (0,0) would be strictly below ψ, which is incompatible with it having points arbitrarily close to it that lie within A. Thus, ψ(0)=0.
Hang on, ϕ is minimal and ϕ(0)>0, so ψ≠ϕ. Also, ψ≤ϕ. Thus, ϕ isn't minimal and we have a contradiction. The flaw must be with our very first assumption, that there's a minimal point with b>0. So all minimal points must have b=0, and we're done with the first half: h(af)=ah(f) implies all minimal points have b=0.
In the other direction, assume all minimal points have b=0. Then, for a≥0
h(af)=EH(af)=inf(m,b)∈H(m(af)+b)=inf(m,b)∈Hmin(m(af)+b)
=inf(m,b)∈Hminm(af)=inf(m,b)∈Hminam(f)=ainf(m,b)∈Hminm(f)
=ainf(m,b)∈Hmin(m(f)+b)=ainf(m,b)∈H(m(f)+b)=aEH(f)=ah(f)
And we've shown the equivalence.
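The easy direction of Proposition 3 can be spot-checked numerically; the two-point space and the particular b=0 generators below are hypothetical, not from the text.

```python
# Numeric check (illustrative two-point example, not from the text) of Proposition 3:
# when every generating a-measure has b = 0, h(a f) = a h(f) for a >= 0.

H_gen = [((0.5, 0.5), 0.0), ((1.2, 0.3), 0.0)]  # hypothetical generators, all b = 0

def h(f):
    return min(m[0] * f[0] + m[1] * f[1] + b for (m, b) in H_gen)

f = (0.7, -1.3)
for a in [0.0, 0.5, 1.0, 2.0, 10.0]:
    assert abs(h((a * f[0], a * f[1])) - a * h(f)) < 1e-9  # homogeneity
print("homogeneity verified")
```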
Proposition 4: |h(f)−h(f′)|/d(f,f′)≤1 iff all minimal points in H have λ≤1.
We know from past proofs in Basic Inframeasure Theory that the Lipschitz constant of the functional h is the maximum λ value among the minimal points, so this is immediate.
Proposition 5: h(1+af)=1−a+ah(1+f) iff all minimal points in H have λ+b=1.
Proof sketch: This is very similar to the proof of Proposition 3 (homogeneity); the same proof path occurs, except we assume there's a minimal point with λ+b>1 and derive a contradiction from that, and then it's easy to clean up the other direction from there.
Proof: Assume that there's a minimal point in H with λ+b>1. This corresponds to a minimal hyperplane ϕ where ϕ≥h and ϕ(1)>1. Now, consider the sets A and B which are defined as follows:
A:={p(1,1)+(1−p)(f,b)|f∈CB(X),b>ϕ(f),p∈[0,1)}
B:={(f,b)|f∈CB(X),b≤h(f)}
The set B is the hypograph of h, and the set A is like the convex hull of the region above ϕ, and the point (1,1), but not the actual point (1,1) itself.
B is the hypograph of a concave continuous function, so it's obviously convex and closed. As for A, it is routine but rather tedious to verify that it is convex; this is left for the reader. Further, A is open. Given any point p(1,1)+(1−p)(f,b) in it, we can make a tiny little open ball around (f,b) where everything in that ball has b>ϕ(f), scale down the ball by multiplying by (1−p), and add p(1,1), which shifts it. Because p isn't 1, the ball doesn't collapse to a point when scaled, so this gets you a tiny little open ball around your arbitrary point in A, showing it's open.
Now, we must show that A and B are disjoint to invoke Hahn-Banach. Assume there's some point p(1,1)+(1−p)(f,b) that lies in both of them at once. Then, p+(1−p)b≤h(p+(1−p)f) by the defining condition of B, and p∈[0,1) and b>ϕ(f). And also, since the hyperplane corresponding to ϕ lies above the graph of h, ϕ(f)≥h(f). Putting these together, we get:
p+(1−p)b>p+(1−p)ϕ(f)≥p+(1−p)h(f)=1−(1−p)+(1−p)h(1+(f−1))
=h(1+(1−p)(f−1))=h(p+(1−p)f)≥p+(1−p)b
The first strict inequality is because p isn't 1 or higher, and b>ϕ(f). The ≥ is because ϕ≥h. The equality in the first line is just expanding accordingly. The critical equality that starts off the second line is because our infradistribution is cohomogenous, so 1−a+ah(1+f)=h(1+af), and we specialize this accordingly. Then the next equality is just expanding, canceling out like terms, and reexpressing, and the final ≥ is because of the defining condition of B. As a whole, this is impossible, we just showed a number is strictly above itself. So A and B are disjoint.
A and B are disjoint, both convex, and A is open, so we can invoke Hahn-Banach and separate them with a hyperplane ψ. At the start we assumed that ϕ was a minimal point, and ϕ(1)>1, so if we can show that h≤ψ≤ϕ and ψ≠ϕ, then ϕ isn't minimal and we have a contradiction, so all minimal points must map 1 to 1 (it can't be less because any hyperplane corresponding to a minimal point must lie above h, and h(1)=1).
Now, assume there is an f where ψ(f)>ϕ(f). Then 0(1,1)+(1−0)(f,ψ(f))=(f,ψ(f)) lies in A because 0∈[0,1) and ψ(f)>ϕ(f). Thus, ψ cuts into the set A, but it can't because it's a separating hyperplane, so we have a contradiction and ψ≤ϕ. Also, since ψ separates A and B, it doesn't cut into the set B, which is the hypograph of h, so h≤ψ.
Now that we have h≤ψ≤ϕ, all we need is to show that ψ(1)=1 to wrap things up. First, ψ≥h and h(1)=1, so ψ(1)≥1. Second, there are points arbitrarily close to (1,1) that lie within the set A, by letting p be very very close to 1. So, if ψ(1)>1, then (1,1) would be strictly below ψ, which is incompatible with it having points arbitrarily close to it that lie within A. Thus, ψ(1)=1.
Hang on, ϕ is minimal and ϕ(1)>1, so ψ≠ϕ. Also, ψ≤ϕ. Thus, ϕ isn't minimal and we have a contradiction. The flaw must be with our very first assumption, that there's a minimal point with λ+b>1. So all minimal points must have λ+b=1, and we're done with the first half: h(1+af)=1−a+ah(1+f) implies all minimal points have λ+b=1.
In the other direction, assume all minimal points have λ+b=1. Then
h(1+af)=EH(1+af)=inf(m,b)∈H(m(1+af)+b)
=inf(λμ,b)∈Hmin(λμ(1+af)+b)=inf(λμ,b)∈Hmin(λ+aλμ(f)+b)=inf(λμ,b)∈Hmin(1+aλμ(f))
=inf(λμ,b)∈Hmin(1+aλμ(f)+a(λ+b−1))=inf(λμ,b)∈Hmin(1−a+aλ+aλμ(f)+ab)
=1−a+ainf(λμ,b)∈Hmin(λ+λμ(f)+b)=1−a+ainf(λμ,b)∈Hmin(λμ(1+f)+b)
=1−a+ainf(m,b)∈H(m(1+f)+b)=1−a+aEH(1+f)=1−a+ah(1+f)
And we've shown the equivalence.
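The easy direction of Proposition 5 can likewise be spot-checked; the two-point space and the particular generators with λ+b=1 are hypothetical, not from the text.

```python
# Numeric check (illustrative two-point example, not from the text) of Proposition 5:
# when every generating a-measure satisfies lambda + b = 1 (lambda = m(1)),
# h(1 + a f) = 1 - a + a h(1 + f) for a >= 0.

H_gen = [((0.8, 0.1), 0.1), ((0.5, 0.2), 0.3)]  # lambda + b = 1 for both generators

def h(f):
    return min(m[0] * f[0] + m[1] * f[1] + b for (m, b) in H_gen)

f = (0.4, -0.9)
for a in [0.0, 0.5, 1.0, 3.0]:
    lhs = h((1 + a * f[0], 1 + a * f[1]))
    rhs = 1 - a + a * h((1 + f[0], 1 + f[1]))
    assert abs(lhs - rhs) < 1e-9  # cohomogeneity
print("cohomogeneity verified")
```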
Proposition 6: h(c+f)=c+h(f) iff all minimal points in H have λ=1 iff h(c)=c.
Proof sketch: To go from h(c)=c to "all minimal points have λ=1" requires a contrapositive proof where we assume that there are minimal points with λ≠1, and show it's incompatible with h(c)=c. Showing that λ=1 for all minimal points implies the full form of C-additivity only takes some equality shuffling, and then it's trivial that h(c+f)=c+h(f) implies h(c)=c, just let f be 0.
Proof: Assume there's a minimal point with λ>1. Then the corresponding hyperplane slopes down at a rate of more than 1 starting at 0 in the −c direction, while h only slopes down at a rate of 1 in that same direction due to h(c)=c. So, eventually, the hyperplane crosses the graph of h, witnessing that it can't actually be a hyperplane above the graph of h, and we have a contradiction. Similar arguments dispatch the existence of a minimal point with λ<1, because the hyperplane slopes up at a rate of less than 1 in the +c direction starting at 0, while h slopes up at a rate of 1 in that same direction due to h(c)=c, so eventually the hyperplane crosses the graph of h and we have another contradiction. So, h(c)=c implies all minimal points have λ=1.
In the other direction,
h(c+f)=EH(c+f)=inf(m,b)∈H(m(c+f)+b)
=inf(m,b)∈Hmin(m(c+f)+b)=inf(m,b)∈Hmin(c+m(f)+b)=c+inf(m,b)∈Hmin(m(f)+b)
=c+inf(m,b)∈H(m(f)+b)=c+EH(f)=c+h(f)
Finally, it's trivial that h(c+f)=c+h(f) implies h(c)=c.
Proposition 7: h(c+af)=c+ah(f) iff all minimal points in H have λ=1 and b=0.
This proof will proceed by showing that h(c+af)=c+ah(f) iff both homogeneity and C-additivity hold. We know from earlier that homogeneity and C-additivity are equivalent to b=0 and λ=1 respectively, so if we can show "crispness iff homogeneity and C-additivity", we're done.
In one direction, h(c+af)=c+ah(f) implies both homogeneity and C-additivity by taking c=0 and a=1 in the two respective cases. In the other direction, h(c+af)=c+h(af)=c+ah(f) by C-additivity and homogeneity, and we've derived crispness, so we're done.
Proposition 8: h(f)=infx∈Cf(x) iff the set of minimal points of H corresponds to the set of probability distributions supported on C.
In one direction, if the set of minimal points entirely consists of probability distributions supported on C and only those, then
h(f)=EH(f)=inf(m,b)∈Hm(f)+b=inf(λμ,b)∈Hmin(λμ(f)+b)
=infμ∈ΔCμ(f)=infx∈Cδx(f)=infx∈Cf(x)
In the other direction, we can take any infradistribution h that maps a function to its minimum value over C and use the above equalities to conclude that the infradistribution generated by all probability distributions supported over C perfectly duplicates the expectation values of h, and so the two sets must be equivalent modulo closure, convex hull, and upper completion. The space of distributions over C is closed and convex, and so is its upper completion, which therefore equals the infradistribution set H corresponding to h.
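The first direction of Proposition 8 can be checked numerically: the minimum over ΔC of the expectation of f is attained at a Dirac delta, so it equals the minimum of f over C. The four-point space X and the choice C={0,2} below are hypothetical, not from the text.

```python
# Numeric check (illustrative finite example, not from the text) of Proposition 8:
# if the minimal points are the probability distributions on C (with b = 0),
# then h(f) = min_{x in C} f(x). Here X = {0,1,2,3} and C = {0,2}.

import random

C = [0, 2]

def h(f):
    # inf over Delta(C): sampled mixtures w*delta_0 + (1-w)*delta_2, w in [0,1];
    # the minimum of an affine function of w is attained at an endpoint (a Dirac delta)
    return min(w / 10.0 * f[0] + (1 - w / 10.0) * f[2] for w in range(11))

random.seed(1)
for _ in range(200):
    f = [random.uniform(-2, 2) for _ in range(4)]
    assert abs(h(f) - min(f[x] for x in C)) < 1e-9
print("sharp infradistribution verified")
```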
Proposition 9: All sharp infradistributions are extreme points in the space of infradistributions.
Assume we can make a sharp infradistribution H corresponding to a compact set C as a probabilistic mix of crisp infradistributions. If one of the crisp infradistributions has a minimal point (probability distribution) which isn't supported on C, then you could pick that minimal point, and mix it with other minimal points/probability distributions to make a minimal point/probability distribution in the set Hmin which has some support outside of C, which is impossible, because Hmin is just ΔC.
Therefore, all crisp infradistributions which mix to make the sharp infradistribution H induced by C must have all their probability distribution minimal points supported entirely on C.
Now, pick any point x∈C. Consider the dirac-delta distribution on x, δx. It lies in ΔC, so (δx,0)∈Hmin. This set is a mix of the other sets corresponding to crisp infradistributions, so there must be a probability distribution from each of the crisp infradistributions that mix to make δx. This can only happen if every component in the mix is δx itself.
Thus, given any x∈C, all the crisp infradistributions that mix to make the sharp infradistribution induced by C must include (δx,0) in them. Thus, all crisp infradistributions that mix to make a sharp infradistribution must contain all the dirac-deltas of points in C within them, and by closure and convexity, all the crisp infradistributions contain all of ΔC within their minimal points.
So, all these crisp infradistributions that mix to make our sharp H lack every probability distribution that isn't supported on C, and contain every probability distribution that's supported on C, so they're all equal to our sharp infradistribution H itself. Thus, H is extreme in the space of crisp infradistributions.
This argument works for any sharp infradistribution, so they're all extreme in the space of crisp infradistributions.