So, to be clear, the difference between an optimal predictor and a quasi-optimal predictor is as follows: is the amount by which some other polynomial-size circuit family is able to beat the current predictor. An optimal predictor cannot be beaten by any more than a such that for any , while a quasi-optimal predictor can only assert it cannot be beaten by any more than a such that . Yes?
Exactly. It's such a simple tweak that it's embarrassing I haven't noticed it in the first place. It's just that in average complexity theory negligible errors seem much more popular. The price to pay is that some theorems require stronger assumptions namely something that had to be bounded by a polynomial now has to be bounded by a constant. On the other hand Theorems 9 demonstrates that there is no chance optimal predictors cover all of (for example) whereas quasi-optimal predictors might work. Also, I think that in there is a universal construction for uniform quasi-optimal predictors (as opposed to optimal predictors) similar to Levin's universal search although I haven't fleshed out the details yet (anyhow such a construction would be theoretically valid but highly inefficient in practice).
In this post I define the concept of quasi-optimal predictors which is a weaker variant on the theme of optimal predictors. I explain the properties of quasi-optimal predictors that I currently understand (which are completely parallel to the properties of optimal predictors) and give an example where there is a quasi-optimal predictor but there is no optimal predictor.
All proofs are given in the appendix and are mostly analogous to proofs of corresponding theorems for optimal predictors.
Definition 1
Given (D,μ) a distributional decision problem, a quasi-optimal predictor for (D,μ) is a family of polynomial size Boolean circuits {Pk:suppμkcirc−−→[0,1]}k∈N s.t. for any family of polynomial size Boolean circuits {Qk:suppμkcirc−−→[0,1]}k∈N we have
Eμk[(Pk(x)−χD(x))2]≤Eμk[(Qk(x)−χD(x))2]+δ(k)
where limk→∞δ(k)=0.
Theorem 1
Consider (D,μ) a distributional decision problem and P a quasi-optimal predictor for (D,μ). Suppose {pk∈[0,1]}k∈N, {qk∈[0,1]}k∈N are s.t.
∃ϵ>0∀k:μk{x∈{0,1}∗∣pk≤Pk(x)≤qk}≥ϵ
Then:
limk→∞Eμk[Pk(x)−χD(x)∣pk≤Pk(x)≤qk]=0
Theorem 2
Consider μ a word ensemble and D1, D2 disjoint languages. Suppose P1 is a quasi-optimal predictor for (D1,μ) and P2 is a quasi-optimal predictor for (D2,μ). Then, P:=η(P1+P2) is a quasi-optimal predictor for (D1∪D2,μ).
Theorem 3
Consider μ a word ensemble and D1, D2 disjoint languages. Suppose P1 is a quasi-optimal predictor for (D1,μ) and P is a quasi-optimal predictor for (D1∪D2,μ). Then, P2:=η(P−P1) is a quasi-optimal predictor for (D2,μ).
Theorem 4
Consider (D1,μ1), (D2,μ2) distributional decision problems with respective quasi-optimal predictors P1 and P2. Define {Pk:suppμkcirc−−→[0,1]}k∈N as the family of circuits computing Pk((x1,x2)):=Pk1(x1)Pk2(x2). Then, P is a quasi-optimal predictor for (D1×D2,μ1×μ2).
Theorem 5
Consider C,D⊆{0,1}∗ and μ a word ensemble. Assume PD is a quasi-optimal predictor for (D,μ) and PC∣D is a quasi-optimal predictor for (C,μ∣D). Then PDPC∣D is a quasi-optimal predictor for (C∩D,μ).
Theorem 6
Consider C,D⊆{0,1}∗ and μ a word ensemble. Assume ∃ϵ>0∀k:μk(D)≥ϵ. Assume PD is a quasi-optimal predictor for (D,μ) and PC∩D is a quasi-optimal predictor for (C∩D,μ). Define PC∣D as the circuit family computing
PkC∣D(x):=⎧⎪⎨⎪⎩1if PkD(x)=0η(PkC∩D(x)PkD(x))rounded to k binary places if PkD(x)>0
Then, PC∣D is a quasi-optimal predictor for (C,μ∣D).
Definition 2
Consider μ a word ensemble and {Qk1,2:suppμkcirc−−→[0,1]}k∈N two circuit families. We say Q1 is quasisimilar to Q2 relative to μ (denoted Q1μ≈Q2) when limk→∞Eμk[(Qk1(x)−Qk2(x))2]=0.
Theorem 7
Consider (D,μ) a distributional decision problem, P a quasi-optimal predictor for (D,μ) and {Qk:suppμkcirc−−→[0,1]}k∈N a polynomial size family. Then, Q is a quasi-optimal predictor for (D,μ) if and only if Pμ≈Q.
Definition 3
Consider (C,μ), (D,ν) distributional decision problems, {fk:suppμkcirc−−→{0,1}∗}k∈N a polynomial size family of circuits. f is called a (non-uniform) strong pseudo-invertible reduction of C to D when there is a polynomial p:N→N s.t. the following conditions hold:
(i) ∀k∈N,x∈suppμk:χD(fk(x))=χC(x)
(ii) There is M∈R s.t.
∀k∈N,y∈{0,1}∗:μk((fk)−1(y))νp(k)(y)≤M
(iii) There is a polynomial q:N→N and a family of polynomial size circuits {gk:suppνp(k)×{0,1}q(k)circ−−→{0,1}∗}k∈N s.t.
∀y∈fk(suppμk),x∗∈{0,1}∗:PrUq(k)[gk(y,r)=x∗]=Prμk[x=x∗|fk(x)=y]
(iv) There are polynomial size circuits {Rk:suppνp(k)circ−−→Q≥0}k∈N s.t.
∀k∈N,y∈suppνp(k):Rk(y)=μk((fk)−1(y))νp(k)(y)
Theorem 8
Consider (C,μ), (D,ν) distributional decision problems, f a strong pseudo-invertible reduction of (C,μ) to (D,ν) and PD a quasi-optimal predictor for (D,ν). Define {PkC:suppμkcirc−−→[0,1]}k∈N as the family of circuits computing PkC(x):=Pp(k)D(fk(x)). Then, PC is a quasi-optimal predictor for (C,μ).
Theorem 9
Consider f:{0,1}∗→{0,1}∗ a one-to-one non-uniformly hard one-way function. Define ~μkf:=1k∑i<kμif. Then, Pf is a quasi-optimal predictor for (Df,~μf).
Appendix
Lemma 1
Consider (D,μ) a distributional decision problem and {Pk:suppμkcirc−−→[0,1]}k∈N a family of polynomial size. Then, P is a quasi-optimal predictor if and only if there is a function δ:N×N→[0,1] s.t.
(i) δ is non-decreasing in the second argument.
(ii) For any polynomial p:N→N:
limk→∞δ(k,p(k))=0
In the following, we will call functions satisfying conditions (i) and (ii) quasinegligible.
(iii) for any Q:suppμkcirc−−→[0,1] we have
Eμk[(Pk(x)−χD(x))2]≤Eμk[(Q(x)−χD(x))2]+δ(k,|Q|)
Proof of Lemma 1
Define
δ(k,q):=max|Q|≤q{Eμk[(Pk(x)−χD(x))2]−Eμk[(Q(x)−χD(x))2]}
Lemma 2
Consider (D,μ) a distributional decision problem and P a corresponding quasi-optimal predictor. Then, there is a function δ:N×N×N→[0,1] s.t.
(i) δ is non-decreasing in the second and third arguments.
(ii) For all polynomials p,q:N→N:
limk→∞δ(k,p(k),q(k))=0
(iii) for all k∈N, Q:suppμkcirc−−→[0,1] and w:suppμkcirc−−→Q≥0 we have
Eμk[w(x)(Pk(x)−χD(x))2]≤Eμk[w(x)(Q(x)−χD(x))2]+(maxw)δ(k,|Q|,|w|)
Proof of Lemma 2
Given t∈[0,maxw], denote
α(t):=min{s≥t∣∃x∈suppμk:w(x)=s}
Consider circuit Qt:suppμkcirc−−→[0,1] computing the following function:
Qt(x):={Q(x)if w(x)≥α(t)Pk(x)if w(x)<α(t)
There is a polynomial q s.t. |Qt|≤q(k,|Q|,|w|). By Lemma 1,
Eμk[(Pk(x)−χD(x))2]≤Eμk[(Qt(x)−χD(x))2]+δ(k,q(k,|Q|,|w|))
for δ quasinegligible.
Eμk[(Pk(x)−χD(x))2−(Qt(x)−χD(x))2]≤δ(k,q(k,|Q|,|w|))
Eμk[θ(w(x)−t)(Pk(x)−χD(x))2−(Q(x)−χD(x))2]≤δ(k,q(k,|Q|,|w|))
Integrating the inequality with respect to t from 0 to maxw, we get
Eμk[∫maxw0θ(w(x)−t)dt((Pk(x)−χD(x))2−(Q(x)−χD(x))2]≤(maxw)δ(k,q(k,|Q|,|w|))
Eμk[w(x)(Pk(x)−χD(x))2−(Q(x)−χD(x))2]≤(maxw)δ(k,q(k,|Q|,|w|))
Proof of Theorem 1
Define
ϕk:=Eμk[χD(x)−Pk(x)∣pk≤Pk(x)≤qk]
Assume to the contrary that there is ϵ>0 and an infinite set I⊆N s.t.
∀k∈I:|ϕk|≥ϵ
Define {wk:suppμkcirc−−→{0,1}}k∈N as the circuits computing
wk(x):=θ(Pk(x)−pk)θ(qk−Pk(x))
|wk| is bounded by a polynomial since Pk produces binary fractions of polynomial size therefore it is possible to compare them to the fixed numbers pk,qk using a polynomial size circuit even if the latter have infinite binary expansions.
We have
ϕk=Eμk[wk(x)(χD(x)−Pk(x))]Eμk[wk(x)]
Define ψk to be ϕk truncated to the first significant binary digit. Define {Qk:suppμkcirc−−→[0,1]}k∈N as the circuits computing
Qk(x):=η(Pk(x)+ψk)
By the assumption, ψk has binary notation of bounded size, therefore |Qk| is bounded by a polynomial.
Applying Lemma 2 we get
∀k∈I:Eμk[wk(x)(Pk(x)−χD(x))2]≤Eμk[wk(x)(Qk(x)−χD(x))2]+δ(k)
for δ vanishing at infinity.
∀k∈I:Eμk[wk(x)((Pk(x)−χD(x))2−(Qk(x)−χD(x))2)]≤δ(k)
∀k∈I:Eμk[wk(x)((Pk(x)−χD(x))2−(η(Pk(x)+ψk)−χD(x))2)]≤δ(k)
Obviously (η(Pk(x)+ψk)−χD(x))2≤(Pk(x)+ψk−χD(x))2, therefore
∀k∈I:Eμk[wk(x)((Pk(x)−χD(x))2−(Pk(x)+ψk−χD(x))2)]≤δ(k)
∀k∈I:ψkEμk[wk(x)(2(χD(x)−Pk(x))−ψk)]≤δ(k)
The expression on the left hand side is a quadratic polynomial in ψk which attains its maximum at ϕk and has roots at 0 and 2ϕk. ψk is between 0 and ϕk, but not closer to 0 than ϕk2. Therefore, the inequality is preserved if we replace ψk by ϕk2.
∀k∈I:ϕk2Eμk[wk(x)(2(χD(x)−Pk(x))−ϕk2)]≤δ(k)
Substituting the equation for ϕk we get
∀k∈I:12Eμk[wk(x)(χD(x)−Pk(x))]Eμk[wk(x)]Eμk[wk(x)(2(χD(x)−Pk(x))−12Eμk[wk(x)(χD(x)−Pk(x))]Eμk[wk(x)])]≤δ(k)
∀k∈I:34Eμk[wk(x)(χD(x)−Pk(x))]2Eμk[wk(x)]≤δ(k)
∀k∈I:34Eμk[wk(x)]ϕ2k≤δ(k)
∀k∈I:ϕ2k≤43Eμk[wk(x)]−1δ(k)
∀k∈I:ϕ2k≤43μk{x∈{0,1}∗∣pk≤Pk(x)≤qk}−1δ(k)
Thus ϕk vanishes at infinity on I, which is a contradiction.
Lemma 3
Consider (D,μ) a distributional decision problem. If {Pk:suppμkcirc−−→[0,1]}k∈N is a quasi-optimal predictor for (D,μ) then there are c1,c2∈R and a quasinegligible function δ∗ s.t. for any Q:suppμkcirc−−→Q we have
|Eμk[Q(x)(Pk(x)−χD(x))]|≤(c1+c2Eμk[Q(x)2])δ∗(k,|Q|)
Conversely, suppose M∈Q and {Pk:suppμkcirc−−→Q∩[−M,+M]}k∈N is a polynomial size family for which there is a quasinegligible function δ∗ s.t. for any Q:suppμkcirc−−→Q∩[−M−1,+M]}k∈N we have
|Eμk[Q(x)(Pk(x)−χD(x))]|≤δ∗(k,|Q|)
Define {~Pk:suppμkcirc−−→[0,1]}k∈N to be s.t. computing ~Pk(x) is equivalent to computing η(Pk(x)) rounded to k digits after the binary point. Then, ~P is a quasi-optimal predictor.
Proof of Lemma 3
Assume P is an optimal predictor. Consider Q:suppμkcirc−−→Q and t=σ2−a where σ∈{±1} and a∈N. The function η(Pk(x)+tQ(x)) can be approximated by a circuit of size p(k,|Q|) for some fixed polynomial p, within rounding error ϵk(x) s.t. ∀x∈suppμk:|ϵk(x)|≤2−k. By Lemma 1,
Eμk[(Pk(x)−χD(x))2]≤Eμk[(η(Pk(x)+tQ(x))+ϵk(x)−χD(x))2]+δ(k,|Q|)
where δ is quasinegligible. ϵ is bounded by a negligible function and therefore can be ignored by redefining δ. As in the proof of Theorem 1, η can be dropped.
Eμk[(Pk(x)−χD(x))2−(Pk(x)+tQ(x)−χD(x))2]≤δ(k,|Q|)
The expression on the left hand side is a quadratic polynomial in t. Explicitly:
−Eμk[Q(x)2]t2−2Eμk[Q(x)(Pk(x)−χD(x))]t≤δ(k,|Q|)
Moving Eμk[Q(x)2]t2 to the right hand side and dividing both sides by 2|t|=21−a we get
−Eμk[Q(x)(Pk(x)−χD(x))]σ≤2a−1δ(k,|Q|)+Eμk[Q(x)2]2−a−1
|Eμk[Q(x)(Pk(x)−χD(x))]|≤2a−1δ(k,|Q|)+Eμk[Q(x)2]2−a−1
Take a:=−12logδ(k,|Q|)+ϕ(k) where ϕ(k)∈[−12,+12] is the rounding error. We get
|Eμk[Q(x)(Pk(x)−χD(x))]|≤2ϕ(k)−1δ(k,|Q|)12+Eμk[Q(x)2]2−ϕ(k)−1δ(k,|Q|)12
Conversely, assume that for any R:suppμkcirc−−→Q∩[−M−1,+M]
|Eμk[R(x)(Pk(x)−χD(x))]|≤δ∗(k,|R|)
Consider Q:suppμkcirc−−→[0,1]. We have
Eμk[(Q(x)−χD(x))2]=Eμk[(Q(x)−Pk(x)+Pk(x)−χD(x))2]
Eμk[(Q(x)−χD(x))2]=Eμk[(Q(x)−Pk(x))2]+Eμk[(Pk(x)−χD(x))2]+2Eμk[(Q(x)−Pk(x))(Pk(x)−χD(x)]
2Eμk[(Pk(x)−Q(x))(Pk(x)−χD(x)]=Eμk[(Pk(x)−χD(x))2]−Eμk[(Q(x)−χD(x))2]+Eμk[(Q(x)−Pk(x))2]
Pk(x)−Q(x) can be computed by a circuit R of size polynomial in |Q| and k. Applying the assumption we get
Eμk[(Pk(x)−χD(x))2]−Eμk[(Q(x)−χD(x))2]+Eμk[(Q(x)−Pk(x))2]≤~δ(k,|Q|)
where ~δ is quasinegligible. Noting that Eμk[(Q(x)−Pk(x))2]≥0 and (η(Pk(x))−χD(x))2≤(Pk(x)−χD(x))2 we get
Eμk[(η(Pk(x))−χD(x))2]−Eμk[(Q(x)−χD(x))2]≤~δ(k,|Q|)
Observing that ~P−η(P) is bounded by a negligible function, we get the desired result.
Proof of Theorem 2
Consider Q:suppμkcirc−−→Q. We have
Eμk[Q(x)(Pk1(x)+Pk2(x)−χD1∪D2(x))]=Eμk[Q(x)(Pk1(x)−χD1(x))]+Eμk[Q(x)(Pk2(x)−χD2(x))]
Using Lemma 3:
|Eμk[Q(x)(Pk1(x)−χD1(x))]|≤(c11+c12Eμk[Q(x)2])δ1(k,|Q|)
|Eμk[Q(x)(Pk2(x)−χD2(x))]|≤(c21+c22Eμk[Q(x)2])δ2(k,|Q|)
Therefore
|Eμk[Q(x)(Pk1(x)+Pk2(x)−χD1∪D2(x))]|≤(c11+c21+(c12+c22)Eμk[Q(x)2])(δ1(k,|Q|)+δ2(k,|Q|))
Using Lemma 3 again we get the desired result.
Proof of Theorem 4
We have
Pk((x1,x2))−χD1×D2((x1,x2))=(Pk1(x1)−χD1(x1))χD2(x2)+Pk1(x1)(Pk2(x2)−χD2(x2))
Therefore, for any Q:supp(μ1×μ2)kcirc−−→Q∩[−1,+1]
|E(μ1×μ2)k[Q(x)(Pk(x)−χD1×D2(x))]|≤|Eμk1×μk2[Q((x1,x2))(Pk1(x1)−χD1(x1))χD2(x2)]|+|Eμk1×μk2[Q((x1,x2))Pk1(x1)(Pk2(x2)−χD2(x2))]|
By Lemma 3, it is sufficient to show an appropriate bound for each of the terms on the right hand side. For the first term, we have
|Eμk1×μk2[Q((x1,x2))(Pk1(x1)−χD1(x1))χD2(x2)]|≤Eμk2[|Eμk1[χD2(x2)Q((x1,x2))(Pk1(x1)−χD1(x1))]|]
For any given x2, χD2(x2)Q((x1,x2)) can be computed by a circuit with input x1 of size polynomial in |x2| and |Q|. Applying Lemma 3 to P1, we get
|Eμk1×μk2[Q((x1,x2))(Pk1(x1)−χD1(x1))χD2(x2)]|≤Eμk2[δ1(k,p1(|x2|,|Q|))]
where p1 is a polynomial and δ1 is quasinegligible. Since |x2| is bounded by a polynomial in k for x2∈suppμk2, we get the bound we need.
For the second term, we have
|Eμk1×μk2[Q((x1,x2))Pk1(x1)(Pk2(x2)−χD2(x2))]|≤Eμk1[|Eμk2[Q((x1,x2))Pk1(x1)(Pk2(x2)−χD2(x2))]|]
For any given x1, Q((x1,x2))Pk1(x1) can be computed by a circuit with input x1 of size polynomial in k, |x1| and |Q|. Applying Lemma 3 to P2, we get
|Eμk1×μk2[Q((x1,x2))Pk1(x1)(Pk2(x2)−χD2(x2))]|≤Eμk1[δ2(k,p2(k,|x1|,|Q|))]
Again, we got the required bound.
Proof of Theorem 7
Assume Q is a quasi-optimal predictor. Applying Lemma 3 to predictor P and circuits computing Pk−Qk, we get
|Eμk[(Pk(x)−Qk(x))(Pk(x)−χD(x))]|≤δ(k)
for some δ vanishing at infinity. Applying Lemma 3 to predictor Q and circuits computing Pk−Qk, we get
|Eμk[(Pk(x)−Qk(x))(Qk(x)−χD(x))]|≤ϵ(k)
for some ϵ vanishing at infinity. We have
Eμk[(Pk(x)−Qk(x))2]=Eμk[(Pk(x)−Qk(x))(Pk(x)−χD(x))]−Eμk[(Pk(x)−Qk(x))(Qk(x)−χD(x))]
Eμk[(Pk(x)−Qk(x))2]≤|Eμk[(Pk(x)−Qk(x))(Pk(x)−χD(x))]|+|Eμk[(Pk(x)−Qk(x))(Qk(x)−χD(x))]|
Eμk[(Pk(x)−Qk(x))2]≤δ(k)+ϵ(k)
Conversely, assume Pμ≈Q. Consider some R:suppμkcirc−−→[0,1]. We have
Eμk[R(x)(Qk(x)−χD(x))]=Eμk[R(x)(Qk(x)−Pk(x)+Pk(x)−χD(x))]
Eμk[R(x)(Qk(x)−χD(x))]=Eμk[R(x)(Qk(x)−Pk(x))]+Eμk[R(x)(Pk(x)−χD(x))]
|Eμk[R(x)(Qk(x)−Pk(x))]|≤Eμk[|Qk(x)−Pk(x)|]≤√Eμk[(Qk(x)−Pk(x))2]≤δ(k)
for some δ vanishing at infinity, since Pμ≈Q.
|Eμk[R(x)(Pk(x)−χD(x))]|≤δ∗(k,|R|)
for some quasinegligible δ∗, using Lemma 3. Combining both inequalities we get
|Eμk[R(x)(Qk(x)−χD(x))]|≤δ(k)+δ∗(k,|R|)
Using Lemma 3 again we conclude Q is a quasi-optimal predictor.
Lemma 4
Consider C⊆D⊆{0,1}∗ and μ a word ensemble. Assume PC is a quasi-optimal predictor for (C,μ) and PD is a quasi-optimal predictor for (D,μ). Define
ϵk(x):=θ(PkC(x)−PkD(x))(PkC(x)−PkD(x))
Then, limk→∞Eμk[ϵk(x)]=0.
Proof of Lemma 4
By Theorem 3 and Lemma 3 there is a quasinegligible function δ such that for any Q:suppμkcirc−−→Q∩[−2,+1] we have
|Eμk[Q(x)(PkD(x)−PkC(x)−χD∖C(x))]|≤δ(k,|Q|)
Take Q to be the circuit computing θ(PkC(x)−PkD(x)). Its size is polynomial in k therefore
|Eμk[θ(PkC(x)−PkD(x))(PkD(x)−PkC(x)−χD∖C(x))]|≤δ∗(k)
where δ∗ vanishes at infinity.
|Eμk[ϵk(x)]+Eμk[θ(PkC(x)−PkD(x))χD∖C(x)]|≤δ∗(k)
Since both terms inside the absolute value are non-negative we get the desired result.
Proof of Theorem 6
When PkD(x)>0 we have
PkC∣D(x)=min(PkC∩D(x),PkD(x))PkD(x)
Define ~PkC∩D to be the circuit computing min(PkC∩D(x),PkD(x)). Since C∩D⊆D, Lemma 4 implies that limk→∞Eμk[PkC∩D(x)−~PkC∩D(x)]=0. This implies limk→∞Eμk[(PkC∩D(x)−~PkC∩D(x))2]=0 and by Theorem 7 ~PC∩D is a quasi-optimal predictor for (C∩D,μ).
We have ~PkC∩D(x)=PkC∣D(x)PkD(x) (whether PkD(x)>0 or PkD(x)=0) and therefore
~PkC∩D(x)−χC∩D(x)=(PkC∣D(x)−χC(x))χD(x)+PkC∣D(x)(PkD(x)−χD(x))
(PkC∣D(x)−χC(x))χD(x)=~PkC∩D(x)−χC∩D(x)−PkC∣D(x)(PkD(x)−χD(x))
Consider Q:suppμkcirc−−→Q∩[−1,+1].
Eμk∣D[Q(x)(PkC∣D(x)−χC(x))]=μk(D)−1Eμk[Q(x)(PkC∣D(x)−χC(x))χD(x)]
By Lemma 3 it is sufficient to prove appropriate bounds on |Eμk[Q(x)(~PkC∩D(x)−χC∩D(x))]| and |Eμk[Q(x)PkC∣D(x)(PkD(x)−χD(x))]|. Both bounds follow from Lemma 3 using the facts ~PC∩D and PD are quasi-optimal predictors and |PkC∣D| is bounded by a polynomial.
Proof of Theorem 8
Consider k∈N, QC:suppμkcirc−−→[0,1]. Define QD:suppνp(k)×{0,1}q(k)circ−−→[0,1] to be the circuit computing QD(y,r):=QC(gk(y,r)). Applying Lemma 2, treating r as a constant and using R as the weight circuit, we get
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]≤Eνp(k)[Rk(y)(QD(y,r)−χD(y))2]+δ(k,|QC|)
where δ is quasinegligible. We used condition (ii) to get a constant bound on maxRk and condition (iv) to get a polynomial bound on |Rk|.
We take the expectation value of both sides with respect to the uniform measure over r:
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]≤Eνp(k)×Uq(k)[Rk(y)(QD(y,r)−χD(y))2]+δ(k,|QC|)
The left hand side can be rewritten as follows
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]=∑y∈{0,1}∗νp(k)(y)μk((fk)−1(y))νp(k)(y)(Pp(k)D(y)−χD(y))2
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]=∑y∈{0,1}∗μk((fk)−1(y))(Pp(k)D(y)−χD(y))2
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]=∑y∈{0,1}∗∑x∈suppμkfk(x)=yμk(x)(Pp(k)D(y)−χD(y))2
Grouping the sum by x, we get
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]=∑x∈suppμkμk(x)(PkC(x)−χC(x))2
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]=Eμk[(PkC(x)−χC(x))2]
The first term on the right hand side can be rewritten as
Eνp(k)×Uq(k)[Rk(y)(QD(y,r)−χD(y))2]=∑y∈{0,1}∗∑r∈{0,1}q(k)2−q(k)μk((fk)−1(y))(QD(y,r)−χD(y))2
Grouping the sum by x:=g(y,r) we get:
Eνp(k)×Uq(k)[Rk(y)(QD(y,r)−χD(y))2]=∑x∈{0,1}∗∑y∈{0,1}∗∑r∈{0,1}q(k)gk(y,r)=x2−q(k)μk((fk)−1(y))(QC(x)−χC(x))2
Condition (iii) tells us that ∑r∈{0,1}q(k)gk(y,r)=x2−q(k) is only non-vanishing when y=fk(x) and that in this case it equals μk(x)μk((fk)−1(y)). Therefore
Eνp(k)×Uq(k)[Rk(y)(QD(y,r)−χD(y))2]=∑x∈{0,1}∗μk(x)(QC(x)−χC(x))2
Eνp(k)×Uq(k)[Rk(y)(QD(y,r)−χD(y))2]=Eμk[(QC(x)−χC(x))2]
Putting everything together, we get
Eμk[(PkC(x)−χC(x))2]≤Eμk[(QC(x)−χC(x))2]+δ(k,|QC|)
Proof of Theorem 9
Assume to the contrary that Pf is not quasi-optimal. Then there is an infinite set I⊆N, a polynomial size family of circuits {Qk:supp~μkfcirc−−→[0,1]}k∈I and ϵ>0 s.t.
∀k∈I:E~μkf[(Pkf(x)−χDf(x)2)]≥E~μkf[(Qk(x)−χDf(x))2]+ϵ
∀k∈I:E~μkf[(Qk(x)−χDf(x))2]≤14−ϵ
Define the functions {qk:supp~μkf×[0,1]→{0,1}}k∈I by qk(x,t):=θ(Qk(x)−t). We have
∀k∈I,x∈supp~μkf:Qk(x)=∫10qk(x,t)dt
Substituting into the inequality above
∀k∈I:E~μkf[(∫10qk(x,t)dt−χDf(x))2]≤14−ϵ
∀k∈I:E~μkf[|∫10qk(x,t)dt−χDf(x)|]2≤14−ϵ
∀k∈I:E~μkf[|∫10(qk(x,t)−χDf(x))dt|]≤√14−ϵ
For every given x, qk(x,t)−χDf(x) is either non-negative for all t or non-positive for t. Hence we can move the absolute value inside the integral:
∀k∈I:E~μkf[∫10|qk(x,t)−χDf(x)|dt]≤√14−ϵ
∀k∈I:∫10E~μkf[|qk(x,t)−χDf(x)|]dt≤√14−ϵ
This implies that we can choose {tk∈Qk(supp~μkf)∪{0,1}}k∈I s.t.
∀k∈I:E~μkf[|qk(x,tk)−χDf(x)|]≤√14−ϵ
∀k∈I:Pr~μkf[qk(x,tk)≠χDf(x)]≤√14−ϵ
∀k∈I:Pr~μkf[qk(x,tk)=χDf(x)]≥1−√14−ϵ
Using the fact that the graph of the square root lies below its tangent at any point, this leads to
∀k∈I:Pr~μkf[qk(x,tk)=χDf(x)]≥12+ϵ
Define {gk:f({0,1}k)×{0,1}kcirc−−→{0,1}}k∈N as the circuits computing gk(y,r):=1−qk((y,r),tk). The definitions of qk and tk imply that |gk| is bounded by a polynomial. The inequality above and the definitions of Df and ~μf imply
∀k∈I:1k∑i<kPrUi×Ui[gi(f(x),r)=x⋅r]≥12+ϵ
But this contradicts the assumption on f.
Note that this argument doesn't show Pf is optimal since while the averaging over i preserves the property of vanishing at infinity, it doesn't preserve the property of negligibility. Moreover, it is possible to show that no optimal predictor for (Df,~μf) exists.