Quasi-optimal predictors

Vanessa Kosoy

In this post I define the concept of quasi-optimal predictors which is a weaker variant on the theme of optimal predictors. I explain the properties of quasi-optimal predictors that I currently understand (which are completely parallel to the properties of optimal predictors) and give an example where there is a quasi-optimal predictor but there is no optimal predictor.

All proofs are given in the appendix and are mostly analogous to proofs of corresponding theorems for optimal predictors.

Definition 1

Given $(D, μ)$ a distributional decision problem, a quasi-optimal predictor for $(D, μ)$ is a family of polynomial size Boolean circuits $\{P^{k} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]\}_{k \in \mathbb{N}}$ s.t. for any family of polynomial size Boolean circuits $\{Q^{k} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]\}_{k \in \mathbb{N}}$ we have

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] \leq E_{μ^{k}} [(Q^{k} (x) - χ_{D} (x))^{2}] + δ (k)$

where $\lim_{k \to \infty} δ (k) = 0$.
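As a concrete illustration of the quantities in Definition 1, the following Python sketch computes the expected squared error $E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}]$ exactly for a hypothetical finite distribution, together with the amount by which a competitor $Q$ beats $P$. The toy distribution, language and predictors are invented for illustration, and circuit size is not modelled at all: the sketch shows only the error term that $δ (k)$ bounds, not the quantification over polynomial size circuit families.

```python
from fractions import Fraction

def expected_sq_error(mu, chi, predictor):
    """Exact E_mu[(predictor(x) - chi(x))^2] for a finite distribution mu: dict x -> probability."""
    return sum(p * (predictor(x) - chi(x)) ** 2 for x, p in mu.items())

def advantage(mu, chi, P, Q):
    """How much Q beats P in expected squared error; Definition 1 requires this to be at most delta(k)."""
    return expected_sq_error(mu, chi, P) - expected_sq_error(mu, chi, Q)

# Hypothetical toy instance: uniform distribution on 3-bit strings,
# D = strings whose first bit is 1.
mu = {format(i, "03b"): Fraction(1, 8) for i in range(8)}
chi = lambda x: 1 if x[0] == "1" else 0
P = lambda x: Fraction(1, 2)    # constant predictor
Q = lambda x: Fraction(chi(x))  # perfect predictor for this toy D

print(expected_sq_error(mu, chi, P))  # 1/4
print(advantage(mu, chi, P, Q))       # 1/4
```

Here $P$ loses to $Q$ by the constant $\frac{1}{4}$, so a family behaving like this for all $k$ (with $Q$ of polynomial size) would not be quasi-optimal.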

Theorem 1

Consider $(D, μ)$ a distributional decision problem and $P$ a quasi-optimal predictor for $(D, μ)$. Suppose $\{p_{k} \in [0, 1]\}_{k \in \mathbb{N}}$, $\{q_{k} \in [0, 1]\}_{k \in \mathbb{N}}$ are s.t.

$\exists ϵ > 0 \, \forall k : μ^{k} \{x \in \{0, 1\}^{*} ∣ p_{k} \leq P^{k} (x) \leq q_{k}\} \geq ϵ$

Then:

$\lim_{k \to \infty} E_{μ^{k}} [P^{k} (x) - χ_{D} (x) ∣ p_{k} \leq P^{k} (x) \leq q_{k}] = 0$
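Theorem 1 is a calibration statement: on any band $p_{k} \leq P^{k} (x) \leq q_{k}$ whose probability is bounded below by a positive constant, the average of $P^{k} - χ_{D}$ tends to $0$. The sketch below computes this conditional bias exactly for a hypothetical finite distribution; `band_bias` and the toy data are illustrative names, not part of the original construction.

```python
from fractions import Fraction

def band_bias(mu, chi, P, p, q):
    """E_mu[P(x) - chi(x) | p <= P(x) <= q]; returns None if the band has probability 0."""
    in_band = [(x, w) for x, w in mu.items() if p <= P(x) <= q]
    mass = sum(w for _, w in in_band)
    if mass == 0:
        return None
    return sum(w * (P(x) - chi(x)) for x, w in in_band) / mass

# Hypothetical toy: uniform on 2-bit strings, D = {"11"}.
mu = {s: Fraction(1, 4) for s in ["00", "01", "10", "11"]}
chi = lambda x: 1 if x == "11" else 0

calibrated = lambda x: Fraction(1, 4)     # equals Pr[x in D]
overconfident = lambda x: Fraction(1, 2)

print(band_bias(mu, chi, calibrated, Fraction(0), Fraction(1, 2)))     # 0
print(band_bias(mu, chi, overconfident, Fraction(0), Fraction(1, 2)))  # 1/4
```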

Theorem 2

Consider $μ$ a word ensemble and $D_{1}$ , $D_{2}$ disjoint languages. Suppose $P_{1}$ is a quasi-optimal predictor for $(D_{1}, μ)$ and $P_{2}$ is a quasi-optimal predictor for $(D_{2}, μ)$ . Then, $P := η (P_{1} + P_{2})$ is a quasi-optimal predictor for $(D_{1} \cup D_{2}, μ)$ .

Theorem 3

Consider $μ$ a word ensemble and $D_{1}$ , $D_{2}$ disjoint languages. Suppose $P_{1}$ is a quasi-optimal predictor for $(D_{1}, μ)$ and $P$ is a quasi-optimal predictor for $(D_{1} \cup D_{2}, μ)$ . Then, $P_{2} := η (P - P_{1})$ is a quasi-optimal predictor for $(D_{2}, μ)$ .

Theorem 4

Consider $(D_{1}, μ_{1})$, $(D_{2}, μ_{2})$ distributional decision problems with respective quasi-optimal predictors $P_{1}$ and $P_{2}$. Define $\{P^{k} : \operatorname{supp} (μ_{1} \times μ_{2})^{k} \xrightarrow{\text{circ}} [0, 1]\}_{k \in \mathbb{N}}$ as the family of circuits computing $P^{k} ((x_{1}, x_{2})) := P_{1}^{k} (x_{1}) P_{2}^{k} (x_{2})$. Then, $P$ is a quasi-optimal predictor for $(D_{1} \times D_{2}, μ_{1} \times μ_{2})$.

Theorem 5

Consider $C, D \subseteq \{0, 1\}^{*}$ and $μ$ a word ensemble. Assume $P_{D}$ is a quasi-optimal predictor for $(D, μ)$ and $P_{C ∣ D}$ is a quasi-optimal predictor for $(C, μ ∣ D)$. Then $P_{D} P_{C ∣ D}$ is a quasi-optimal predictor for $(C \cap D, μ)$.

Theorem 6

Consider $C, D \subseteq \{0, 1\}^{*}$ and $μ$ a word ensemble. Assume $\exists ϵ > 0 \, \forall k : μ^{k} (D) \geq ϵ$. Assume $P_{D}$ is a quasi-optimal predictor for $(D, μ)$ and $P_{C \cap D}$ is a quasi-optimal predictor for $(C \cap D, μ)$. Define $P_{C ∣ D}$ as the circuit family computing

$P_{C ∣ D}^{k} (x) := \begin{cases} 1 & \text{if } P_{D}^{k} (x) = 0 \\ η \left(\frac{P_{C \cap D}^{k} (x)}{P_{D}^{k} (x)}\right) \text{ rounded to } k \text{ binary places} & \text{if } P_{D}^{k} (x) > 0 \end{cases}$

Then, $P_{C ∣ D}$ is a quasi-optimal predictor for $(C, μ ∣ D)$ .
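The case analysis defining $P_{C ∣ D}$ can be sketched in Python with exact rationals. This assumes that $η$ clamps its argument to $[0, 1]$ (which is how it is used in the proofs, e.g. $η (P^{k} (x) + ψ_{k})$ in the proof of Theorem 1) and that "rounded to $k$ binary places" means rounding half up; both readings, and the function names, are assumptions of this sketch.

```python
from fractions import Fraction
import math

def eta(t):
    """Clamp t to [0, 1] (assumed meaning of eta)."""
    return min(max(t, Fraction(0)), Fraction(1))

def round_binary(t, k):
    """Round t to k binary places, half up (the rounding rule is an assumption)."""
    scale = 2 ** k
    return Fraction(math.floor(t * scale + Fraction(1, 2)), scale)

def conditional_predictor(p_cap, p_d, k):
    """P_{C|D}(x) from p_cap = P_{C cap D}(x) and p_d = P_D(x), following the case split above."""
    if p_d == 0:
        return Fraction(1)
    return round_binary(eta(Fraction(p_cap) / Fraction(p_d)), k)

print(conditional_predictor(Fraction(1, 3), Fraction(1, 2), 3))  # 5/8 (2/3 rounded to 3 places)
print(conditional_predictor(Fraction(3, 4), Fraction(1, 2), 4))  # 1 (ratio 3/2 clamped)
```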

Definition 2

Consider $μ$ a word ensemble and $\{Q_{1, 2}^{k} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]\}_{k \in \mathbb{N}}$ two circuit families. We say $Q_{1}$ is quasisimilar to $Q_{2}$ relative to $μ$ (denoted $Q_{1} \overset{μ}{\approx} Q_{2}$) when $\lim_{k \to \infty} E_{μ^{k}} [(Q_{1}^{k} (x) - Q_{2}^{k} (x))^{2}] = 0$.

Theorem 7

Consider $(D, μ)$ a distributional decision problem, $P$ a quasi-optimal predictor for $(D, μ)$ and $\{Q^{k} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]\}_{k \in \mathbb{N}}$ a polynomial size family. Then, $Q$ is a quasi-optimal predictor for $(D, μ)$ if and only if $P \overset{μ}{\approx} Q$.

Definition 3

Consider $(C, μ)$, $(D, ν)$ distributional decision problems and $\{f^{k} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} \{0, 1\}^{*}\}_{k \in \mathbb{N}}$ a polynomial size family of circuits. $f$ is called a (non-uniform) strong pseudo-invertible reduction of $(C, μ)$ to $(D, ν)$ when there is a polynomial $p : \mathbb{N} \to \mathbb{N}$ s.t. the following conditions hold:

(i) $\forall k \in \mathbb{N}, x \in \operatorname{supp} μ^{k} : χ_{D} (f^{k} (x)) = χ_{C} (x)$

(ii) There is $M \in \mathbb{R}$ s.t.

$\forall k \in \mathbb{N}, y \in \{0, 1\}^{*} : \frac{μ^{k} ((f^{k})^{- 1} (y))}{ν^{p (k)} (y)} \leq M$

(iii) There is a polynomial $q : \mathbb{N} \to \mathbb{N}$ and a family of polynomial size circuits $\{g^{k} : \operatorname{supp} ν^{p (k)} \times \{0, 1\}^{q (k)} \xrightarrow{\text{circ}} \{0, 1\}^{*}\}_{k \in \mathbb{N}}$ s.t.

$\forall y \in f^{k} (\operatorname{supp} μ^{k}), x^{*} \in \{0, 1\}^{*} : \Pr_{r \sim U^{q (k)}} [g^{k} (y, r) = x^{*}] = \Pr_{x \sim μ^{k}} [x = x^{*} ∣ f^{k} (x) = y]$

(iv) There are polynomial size circuits $\{R^{k} : \operatorname{supp} ν^{p (k)} \xrightarrow{\text{circ}} \mathbb{Q}^{\geq 0}\}_{k \in \mathbb{N}}$ s.t.

$\forall k \in \mathbb{N}, y \in \operatorname{supp} ν^{p (k)} : R^{k} (y) = \frac{μ^{k} ((f^{k})^{- 1} (y))}{ν^{p (k)} (y)}$
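Conditions (ii) and (iv) concern the density ratio $μ^{k} ((f^{k})^{- 1} (y)) / ν^{p (k)} (y)$ between the pushforward of $μ^{k}$ along $f^{k}$ and $ν^{p (k)}$. The Python sketch below computes this ratio exactly for a hypothetical finite example (a map keeping the first bit of a 2-bit string); the example and helper names are illustrative only, and circuit size is not modelled.

```python
from fractions import Fraction
from collections import defaultdict

def pushforward(mu, f):
    """Return y -> mu(f^{-1}(y)) for every y in the image of f."""
    out = defaultdict(Fraction)
    for x, w in mu.items():
        out[f(x)] += w
    return dict(out)

def density_ratio(mu, nu, f):
    """R(y) = mu(f^{-1}(y)) / nu(y) on the image of f; assumes nu(y) > 0 there."""
    return {y: m / nu[y] for y, m in pushforward(mu, f).items()}

# Hypothetical example: mu uniform on 2-bit strings, f keeps the first bit.
mu = {s: Fraction(1, 4) for s in ["00", "01", "10", "11"]}
nu = {"0": Fraction(1, 4), "1": Fraction(3, 4)}
R = density_ratio(mu, nu, lambda x: x[0])
M = max(R.values())  # the best constant for condition (ii) in this example
print(R["0"], R["1"], M)  # 2 2/3 2
```

Note that $\sum_{y} ν (y) R (y) = 1$ always holds, since it is the total mass of the pushforward of $μ$.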

Theorem 8

Consider $(C, μ)$, $(D, ν)$ distributional decision problems, $f$ a strong pseudo-invertible reduction of $(C, μ)$ to $(D, ν)$ and $P_{D}$ a quasi-optimal predictor for $(D, ν)$. Define $\{P_{C}^{k} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]\}_{k \in \mathbb{N}}$ as the family of circuits computing $P_{C}^{k} (x) := P_{D}^{p (k)} (f^{k} (x))$. Then, $P_{C}$ is a quasi-optimal predictor for $(C, μ)$.

Theorem 9

Consider $f : \{0, 1\}^{*} \to \{0, 1\}^{*}$ a one-to-one non-uniformly hard one-way function. Define $\tilde{μ}_{f}^{k} := \frac{1}{k} \sum_{i < k} μ_{f}^{i}$. Then, $P_{f}$ is a quasi-optimal predictor for $(D_{f}, \tilde{μ}_{f})$.

Appendix

Lemma 1

Consider $(D, μ)$ a distributional decision problem and $\{P^{k} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]\}_{k \in \mathbb{N}}$ a family of polynomial size. Then, $P$ is a quasi-optimal predictor for $(D, μ)$ if and only if there is a function $δ : \mathbb{N} \times \mathbb{N} \to [0, 1]$ s.t.

(i) $δ$ is non-decreasing in the second argument.

(ii) For any polynomial $p : \mathbb{N} \to \mathbb{N}$:

$\lim_{k \to \infty} δ (k, p (k)) = 0$

In the following, we will call functions satisfying conditions (i) and (ii) quasinegligible.

(iii) for any $Q : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]$ we have

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] \leq E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}] + δ (k, | Q |)$

Proof of Lemma 1

Define

$δ (k, q) := \max_{| Q | \leq q} \{E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] - E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}]\}$

Lemma 2

Consider $(D, μ)$ a distributional decision problem and $P$ a corresponding quasi-optimal predictor. Then, there is a function $δ : \mathbb{N} \times \mathbb{N} \times \mathbb{N} \to [0, 1]$ s.t.

(i) $δ$ is non-decreasing in the second and third arguments.

(ii) For all polynomials $p, q : \mathbb{N} \to \mathbb{N}$:

$\lim_{k \to \infty} δ (k, p (k), q (k)) = 0$

(iii) for all $k \in \mathbb{N}$, $Q : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]$ and $w : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} \mathbb{Q}^{\geq 0}$ we have

$E_{μ^{k}} [w (x) (P^{k} (x) - χ_{D} (x))^{2}] \leq E_{μ^{k}} [w (x) (Q (x) - χ_{D} (x))^{2}] + (\max w) \, δ (k, | Q |, | w |)$

Proof of Lemma 2

Given $t \in [0, \max w]$, denote

$α (t) := \min \{s \geq t ∣ \exists x \in \operatorname{supp} μ^{k} : w (x) = s\}$

Consider the circuit $Q_{t} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]$ computing the following function:

$Q_{t} (x) := \begin{cases} Q (x) & \text{if } w (x) \geq α (t) \\ P^{k} (x) & \text{if } w (x) < α (t) \end{cases}$

There is a polynomial $q$ s.t. $| Q_{t} | \leq q (k, | Q |, | w |)$ . By Lemma 1,

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] \leq E_{μ^{k}} [(Q_{t} (x) - χ_{D} (x))^{2}] + δ (k, q (k, | Q |, | w |))$

for $δ$ quasinegligible.

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2} - (Q_{t} (x) - χ_{D} (x))^{2}] \leq δ (k, q (k, | Q |, | w |))$

Since $Q_{t}$ coincides with $P^{k}$ except where $w (x) \geq α (t)$, i.e. where $w (x) \geq t$, this is

$E_{μ^{k}} [θ (w (x) - t) ((P^{k} (x) - χ_{D} (x))^{2} - (Q (x) - χ_{D} (x))^{2})] \leq δ (k, q (k, | Q |, | w |))$

Integrating the inequality with respect to $t$ from $0$ to $\max w$, we get

$E_{μ^{k}} [\int_{0}^{\max w} θ (w (x) - t) \, d t \, ((P^{k} (x) - χ_{D} (x))^{2} - (Q (x) - χ_{D} (x))^{2})] \leq (\max w) \, δ (k, q (k, | Q |, | w |))$

The inner integral is the length of $[0, w (x)]$, namely $w (x)$, hence

$E_{μ^{k}} [w (x) ((P^{k} (x) - χ_{D} (x))^{2} - (Q (x) - χ_{D} (x))^{2})] \leq (\max w) \, δ (k, q (k, | Q |, | w |))$
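The integration step in the proof of Lemma 2 is the layer-cake identity $w (x) = \int_{0}^{\max w} θ (w (x) - t) \, d t$. Because $w$ takes finitely many values, the integral over $t$ reduces to a finite sum over the levels of $w$, which the sketch below uses to check $E [w \cdot g] = \int_{0}^{\max w} E [θ (w - t) \, g] \, d t$ exactly on a hypothetical toy distribution. Here $g$ stands in for the bracketed difference of squared errors; all names and data are illustrative.

```python
from fractions import Fraction

def weighted_direct(mu, w, g):
    """E_mu[w(x) * g(x)] computed directly."""
    return sum(p * w[x] * g[x] for x, p in mu.items())

def weighted_by_levels(mu, w, g):
    """The same quantity as the integral over t in [0, max w] of E_mu[theta(w(x) - t) * g(x)].

    For t in (previous level, s], theta(w(x) - t) = 1 exactly when w(x) >= s,
    so the integrand is constant on each such interval.
    """
    total, prev = Fraction(0), Fraction(0)
    for s in sorted(set(w.values())):
        if s > 0:
            slab = sum(p * g[x] for x, p in mu.items() if w[x] >= s)
            total += (s - prev) * slab
        prev = s
    return total

# Hypothetical toy data; w is non-negative as in Lemma 2.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
w = {"a": Fraction(3), "b": Fraction(0), "c": Fraction(1)}
g = {"a": Fraction(-1, 2), "b": Fraction(2), "c": Fraction(1, 3)}
print(weighted_direct(mu, w, g), weighted_by_levels(mu, w, g))  # -2/3 -2/3
```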

Proof of Theorem 1

Define

$ϕ_{k} := E_{μ^{k}} [χ_{D} (x) - P^{k} (x) ∣ p_{k} \leq P^{k} (x) \leq q_{k}]$

Assume to the contrary that there is $ϵ > 0$ and an infinite set $I \subseteq N$ s.t.

$\forall k \in I : | ϕ_{k} | \geq ϵ$

Define $\{w^{k} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} \{0, 1\}\}_{k \in \mathbb{N}}$ as the circuits computing

$w^{k} (x) := θ (P^{k} (x) - p_{k}) θ (q_{k} - P^{k} (x))$

$| w^{k} |$ is bounded by a polynomial: $P^{k}$ outputs binary fractions of polynomial size, so they can be compared to the fixed numbers $p_{k}, q_{k}$ by a polynomial size circuit even if the latter have infinite binary expansions.

We have

$ϕ_{k} = \frac{E_{μ^{k}} [w^{k} (x) (χ_{D} (x) - P^{k} (x))]}{E_{μ^{k}} [w^{k} (x)]}$

Define $ψ_{k}$ to be $ϕ_{k}$ truncated to its first significant binary digit. Define $\{Q^{k} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]\}_{k \in \mathbb{N}}$ as the circuits computing

$Q^{k} (x) := η (P^{k} (x) + ψ_{k})$

By the assumption, $ψ_{k}$ has binary notation of bounded size, therefore $| Q^{k} |$ is bounded by a polynomial.

Applying Lemma 2 we get

$\forall k \in I : E_{μ^{k}} [w^{k} (x) (P^{k} (x) - χ_{D} (x))^{2}] \leq E_{μ^{k}} [w^{k} (x) (Q^{k} (x) - χ_{D} (x))^{2}] + δ (k)$

for $δ$ vanishing at infinity.

$\forall k \in I : E_{μ^{k}} [w^{k} (x) ((P^{k} (x) - χ_{D} (x))^{2} - (Q^{k} (x) - χ_{D} (x))^{2})] \leq δ (k)$

$\forall k \in I : E_{μ^{k}} [w^{k} (x) ((P^{k} (x) - χ_{D} (x))^{2} - (η (P^{k} (x) + ψ_{k}) - χ_{D} (x))^{2})] \leq δ (k)$

Obviously $(η (P^{k} (x) + ψ_{k}) - χ_{D} (x))^{2} \leq (P^{k} (x) + ψ_{k} - χ_{D} (x))^{2}$ , therefore

$\forall k \in I : E_{μ^{k}} [w^{k} (x) ((P^{k} (x) - χ_{D} (x))^{2} - (P^{k} (x) + ψ_{k} - χ_{D} (x))^{2})] \leq δ (k)$

$\forall k \in I : ψ_{k} E_{μ^{k}} [w^{k} (x) (2 (χ_{D} (x) - P^{k} (x)) - ψ_{k})] \leq δ (k)$

The expression on the left hand side is a quadratic polynomial in $ψ_{k}$ which attains its maximum at $ϕ_{k}$ and has roots at $0$ and $2 ϕ_{k}$ . $ψ_{k}$ is between $0$ and $ϕ_{k}$ , but not closer to $0$ than $\frac{ϕ_{k}}{2}$ . Therefore, the inequality is preserved if we replace $ψ_{k}$ by $\frac{ϕ_{k}}{2}$ .

$\forall k \in I : \frac{ϕ_{k}}{2} E_{μ^{k}} [w^{k} (x) (2 (χ_{D} (x) - P^{k} (x)) - \frac{ϕ_{k}}{2})] \leq δ (k)$

Substituting the equation for $ϕ_{k}$ we get

$\forall k \in I : \frac{1}{2} \frac{E_{μ^{k}} [w^{k} (x) (χ_{D} (x) - P^{k} (x))]}{E_{μ^{k}} [w^{k} (x)]} E_{μ^{k}} [w^{k} (x) (2 (χ_{D} (x) - P^{k} (x)) - \frac{1}{2} \frac{E_{μ^{k}} [w^{k} (x) (χ_{D} (x) - P^{k} (x))]}{E_{μ^{k}} [w^{k} (x)]})] \leq δ (k)$

$\forall k \in I : \frac{3}{4} \frac{E_{μ^{k}} [w^{k} (x) (χ_{D} (x) - P^{k} (x))]^{2}}{E_{μ^{k}} [w^{k} (x)]} \leq δ (k)$

$\forall k \in I : \frac{3}{4} E_{μ^{k}} [w^{k} (x)] ϕ_{k}^{2} \leq δ (k)$

$\forall k \in I : ϕ_{k}^{2} \leq \frac{4}{3} E_{μ^{k}} [w^{k} (x)]^{- 1} δ (k)$

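$\forall k \in I : ϕ_{k}^{2} \leq \frac{4}{3} μ^{k} \{x \in \{0, 1\}^{*} ∣ p_{k} \leq P^{k} (x) \leq q_{k}\}^{- 1} δ (k)$

By the hypothesis of the theorem, the measure on the right hand side is bounded below by a positive constant. Thus $ϕ_{k}$ vanishes at infinity on $I$, which is a contradiction.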
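Two numerical facts carry the second half of the proof of Theorem 1: truncating $ϕ_{k}$ to its first significant binary digit gives $ψ_{k}$ with $\frac{| ϕ_{k} |}{2} < | ψ_{k} | \leq | ϕ_{k} |$, and on that range the quadratic $ψ (2 ϕ - ψ)$ (the left hand side of the inequality for $ψ_{k}$, divided by $E_{μ^{k}} [w^{k} (x)]$) is at least its value $\frac{3}{4} ϕ^{2}$ at $ψ = \frac{ϕ}{2}$. The Python sketch below checks both with exact rationals; the helper names are hypothetical.

```python
from fractions import Fraction

def leading_binary_digit(phi):
    """Truncate a nonzero rational phi to its first significant binary digit:
    sign(phi) * 2^e with 2^e <= |phi| < 2^(e+1)."""
    a = abs(Fraction(phi))
    e = a.numerator.bit_length() - a.denominator.bit_length()  # off by at most one
    while Fraction(2) ** e > a:
        e -= 1
    while Fraction(2) ** (e + 1) <= a:
        e += 1
    return (1 if phi > 0 else -1) * Fraction(2) ** e

def gain(psi, phi):
    """psi * (2 * phi - psi): the improvement from shifting P by psi, per unit of E[w]."""
    return psi * (2 * phi - psi)

for phi in [Fraction(3, 8), Fraction(-5, 3), Fraction(1, 1024), Fraction(7, 10)]:
    psi = leading_binary_digit(phi)
    assert abs(phi) / 2 < abs(psi) <= abs(phi)
    assert gain(psi, phi) >= gain(phi / 2, phi) == Fraction(3, 4) * phi ** 2
print(leading_binary_digit(Fraction(3, 8)))  # 1/4
```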

Lemma 3

Consider $(D, μ)$ a distributional decision problem. If $\{P^{k} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]\}_{k \in \mathbb{N}}$ is a quasi-optimal predictor for $(D, μ)$ then there are $c_{1}, c_{2} \in \mathbb{R}$ and a quasinegligible function $δ^{*}$ s.t. for any $Q : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} \mathbb{Q}$ we have

$| E_{μ^{k}} [Q (x) (P^{k} (x) - χ_{D} (x))] | \leq (c_{1} + c_{2} E_{μ^{k}} [Q (x)^{2}]) δ^{*} (k, | Q |)$

Conversely, suppose $M \in \mathbb{Q}$ and $\{P^{k} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} \mathbb{Q} \cap [- M, + M]\}_{k \in \mathbb{N}}$ is a polynomial size family for which there is a quasinegligible function $δ^{*}$ s.t. for any $Q : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} \mathbb{Q} \cap [- M - 1, + M]$ we have

$| E_{μ^{k}} [Q (x) (P^{k} (x) - χ_{D} (x))] | \leq δ^{*} (k, | Q |)$

Define $\{\tilde{P}^{k} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]\}_{k \in \mathbb{N}}$ to be s.t. computing $\tilde{P}^{k} (x)$ is equivalent to computing $η (P^{k} (x))$ rounded to $k$ digits after the binary point. Then, $\tilde{P}$ is a quasi-optimal predictor for $(D, μ)$.

Proof of Lemma 3

Assume $P$ is a quasi-optimal predictor. Consider $Q : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} \mathbb{Q}$ and $t = σ 2^{- a}$ where $σ \in \{\pm 1\}$ and $a \in \mathbb{N}$. The function $η (P^{k} (x) + t Q (x))$ can be approximated by a circuit of size $p (k, | Q |)$ for some fixed polynomial $p$, within rounding error $ϵ_{k} (x)$ s.t. $\forall x \in \operatorname{supp} μ^{k} : | ϵ_{k} (x) | \leq 2^{- k}$. By Lemma 1,

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] \leq E_{μ^{k}} [(η (P^{k} (x) + t Q (x)) + ϵ_{k} (x) - χ_{D} (x))^{2}] + δ (k, | Q |)$

where $δ$ is quasinegligible. $ϵ$ is bounded by a negligible function and therefore can be ignored by redefining $δ$ . As in the proof of Theorem 1, $η$ can be dropped.

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2} - (P^{k} (x) + t Q (x) - χ_{D} (x))^{2}] \leq δ (k, | Q |)$

The expression on the left hand side is a quadratic polynomial in $t$ . Explicitly:

$- E_{μ^{k}} [Q (x)^{2}] t^{2} - 2 E_{μ^{k}} [Q (x) (P^{k} (x) - χ_{D} (x))] t \leq δ (k, | Q |)$

Moving $E_{μ^{k}} [Q (x)^{2}] t^{2}$ to the right hand side and dividing both sides by $2 | t | = 2^{1 - a}$ we get

$- E_{μ^{k}} [Q (x) (P^{k} (x) - χ_{D} (x))] σ \leq 2^{a - 1} δ (k, | Q |) + E_{μ^{k}} [Q (x)^{2}] 2^{- a - 1}$

$| E_{μ^{k}} [Q (x) (P^{k} (x) - χ_{D} (x))] | \leq 2^{a - 1} δ (k, | Q |) + E_{μ^{k}} [Q (x)^{2}] 2^{- a - 1}$

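Take $a := - \frac{1}{2} \log_{2} δ (k, | Q |) + ϕ (k)$, where $ϕ (k) \in [- \frac{1}{2}, + \frac{1}{2}]$ is the rounding error needed to make $a$ an integer. We get

$| E_{μ^{k}} [Q (x) (P^{k} (x) - χ_{D} (x))] | \leq 2^{ϕ (k) - 1} δ (k, | Q |)^{\frac{1}{2}} + E_{μ^{k}} [Q (x)^{2}] 2^{- ϕ (k) - 1} δ (k, | Q |)^{\frac{1}{2}}$

Since $ϕ (k) \in [- \frac{1}{2}, + \frac{1}{2}]$, both coefficients are at most $2^{- \frac{1}{2}}$, so the claim holds with $c_{1} = c_{2} = 2^{- \frac{1}{2}}$ and $δ^{*} := δ^{\frac{1}{2}}$.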
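The choice of $a$ in the proof of Lemma 3 balances the two terms $2^{a - 1} δ$ and $E_{μ^{k}} [Q (x)^{2}] \, 2^{- a - 1}$. A quick floating-point check of the resulting bound is sketched below, assuming $0 < δ \leq 1$ so that $a$ is a non-negative integer; `lemma3_bound` is a hypothetical helper, and the constant $2^{- \frac{1}{2}}$ is the largest value of $2^{ϕ (k) - 1}$ and $2^{- ϕ (k) - 1}$ for $ϕ (k) \in [- \frac{1}{2}, + \frac{1}{2}]$.

```python
import math

def lemma3_bound(delta, second_moment):
    """Return a and 2^(a-1)*delta + second_moment*2^(-a-1) for a = round(-log2(delta)/2).

    Assumes 0 < delta <= 1, so a is a non-negative integer.
    """
    a = round(-0.5 * math.log2(delta))
    return a, 2.0 ** (a - 1) * delta + second_moment * 2.0 ** (-a - 1)

for delta in [1.0, 0.3, 1e-3, 1e-9]:
    for m in [0.0, 0.5, 1.0]:
        a, bound = lemma3_bound(delta, m)
        # phi(k) = a + log2(delta)/2 lies in [-1/2, 1/2], hence this bound:
        assert bound <= 2 ** -0.5 * (1 + m) * math.sqrt(delta) * (1 + 1e-12)
```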

Conversely, assume that for any $R : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} \mathbb{Q} \cap [- M - 1, + M]$

$| E_{μ^{k}} [R (x) (P^{k} (x) - χ_{D} (x))] | \leq δ^{*} (k, | R |)$

Consider $Q : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]$. We have

$E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}] = E_{μ^{k}} [(Q (x) - P^{k} (x) + P^{k} (x) - χ_{D} (x))^{2}]$

$E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}] = E_{μ^{k}} [(Q (x) - P^{k} (x))^{2}] + E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] + 2 E_{μ^{k}} [(Q (x) - P^{k} (x)) (P^{k} (x) - χ_{D} (x))]$

$2 E_{μ^{k}} [(P^{k} (x) - Q (x)) (P^{k} (x) - χ_{D} (x))] = E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] - E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}] + E_{μ^{k}} [(Q (x) - P^{k} (x))^{2}]$

$P^{k} (x) - Q (x)$ takes values in $[- M - 1, + M]$ and can be computed by a circuit $R$ of size polynomial in $| Q |$ and $k$. Applying the assumption we get

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] - E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}] + E_{μ^{k}} [(Q (x) - P^{k} (x))^{2}] \leq \tilde{δ} (k, | Q |)$

where $\tilde{δ}$ is quasinegligible. Noting that $E_{μ^{k}} [(Q (x) - P^{k} (x))^{2}] \geq 0$ and $(η (P^{k} (x)) - χ_{D} (x))^{2} \leq (P^{k} (x) - χ_{D} (x))^{2}$ we get

$E_{μ^{k}} [(η (P^{k} (x)) - χ_{D} (x))^{2}] - E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}] \leq \tilde{δ} (k, | Q |)$

Observing that $\tilde{P} - η (P)$ is bounded by a negligible function, we get the desired result.

Proof of Theorem 2

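Consider $Q : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} \mathbb{Q}$. Since $D_{1}$ and $D_{2}$ are disjoint, $χ_{D_{1} \cup D_{2}} = χ_{D_{1}} + χ_{D_{2}}$, and therefore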

$E_{μ^{k}} [Q (x) (P_{1}^{k} (x) + P_{2}^{k} (x) - χ_{D_{1} \cup D_{2}} (x))] = E_{μ^{k}} [Q (x) (P_{1}^{k} (x) - χ_{D_{1}} (x))] + E_{μ^{k}} [Q (x) (P_{2}^{k} (x) - χ_{D_{2}} (x))]$

Using Lemma 3:

$| E_{μ^{k}} [Q (x) (P_{1}^{k} (x) - χ_{D_{1}} (x))] | \leq (c_{11} + c_{12} E_{μ^{k}} [Q (x)^{2}]) δ_{1} (k, | Q |)$

$| E_{μ^{k}} [Q (x) (P_{2}^{k} (x) - χ_{D_{2}} (x))] | \leq (c_{21} + c_{22} E_{μ^{k}} [Q (x)^{2}]) δ_{2} (k, | Q |)$

Therefore

$| E_{μ^{k}} [Q (x) (P_{1}^{k} (x) + P_{2}^{k} (x) - χ_{D_{1} \cup D_{2}} (x))] | \leq (c_{11} + c_{21} + (c_{12} + c_{22}) E_{μ^{k}} [Q (x)^{2}]) (δ_{1} (k, | Q |) + δ_{2} (k, | Q |))$

Using Lemma 3 again we get the desired result.

Proof of Theorem 4

We have

$P^{k} ((x_{1}, x_{2})) - χ_{D_{1} \times D_{2}} ((x_{1}, x_{2})) = (P_{1}^{k} (x_{1}) - χ_{D_{1}} (x_{1})) χ_{D_{2}} (x_{2}) + P_{1}^{k} (x_{1}) (P_{2}^{k} (x_{2}) - χ_{D_{2}} (x_{2}))$

Therefore, for any $Q : \operatorname{supp} (μ_{1} \times μ_{2})^{k} \xrightarrow{\text{circ}} \mathbb{Q} \cap [- 1, + 1]$

$| E_{(μ_{1} \times μ_{2})^{k}} [Q (x) (P^{k} (x) - χ_{D_{1} \times D_{2}} (x))] | \leq | E_{μ_{1}^{k} \times μ_{2}^{k}} [Q ((x_{1}, x_{2})) (P_{1}^{k} (x_{1}) - χ_{D_{1}} (x_{1})) χ_{D_{2}} (x_{2})] | + | E_{μ_{1}^{k} \times μ_{2}^{k}} [Q ((x_{1}, x_{2})) P_{1}^{k} (x_{1}) (P_{2}^{k} (x_{2}) - χ_{D_{2}} (x_{2}))] |$

By Lemma 3, it is sufficient to show an appropriate bound for each of the terms on the right hand side. For the first term, we have

$| E_{μ_{1}^{k} \times μ_{2}^{k}} [Q ((x_{1}, x_{2})) (P_{1}^{k} (x_{1}) - χ_{D_{1}} (x_{1})) χ_{D_{2}} (x_{2})] | \leq E_{μ_{2}^{k}} [| E_{μ_{1}^{k}} [χ_{D_{2}} (x_{2}) Q ((x_{1}, x_{2})) (P_{1}^{k} (x_{1}) - χ_{D_{1}} (x_{1}))] |]$

For any given $x_{2}$ , $χ_{D_{2}} (x_{2}) Q ((x_{1}, x_{2}))$ can be computed by a circuit with input $x_{1}$ of size polynomial in $| x_{2} |$ and $| Q |$ . Applying Lemma 3 to $P_{1}$ , we get

$| E_{μ_{1}^{k} \times μ_{2}^{k}} [Q ((x_{1}, x_{2})) (P_{1}^{k} (x_{1}) - χ_{D_{1}} (x_{1})) χ_{D_{2}} (x_{2})] | \leq E_{μ_{2}^{k}} [δ_{1} (k, p_{1} (| x_{2} |, | Q |))]$

where $p_{1}$ is a polynomial and $δ_{1}$ is quasinegligible. Since $| x_{2} |$ is bounded by a polynomial in $k$ for $x_{2} \in supp μ_{2}^{k}$ , we get the bound we need.

For the second term, we have

$| E_{μ_{1}^{k} \times μ_{2}^{k}} [Q ((x_{1}, x_{2})) P_{1}^{k} (x_{1}) (P_{2}^{k} (x_{2}) - χ_{D_{2}} (x_{2}))] | \leq E_{μ_{1}^{k}} [| E_{μ_{2}^{k}} [Q ((x_{1}, x_{2})) P_{1}^{k} (x_{1}) (P_{2}^{k} (x_{2}) - χ_{D_{2}} (x_{2}))] |]$

For any given $x_{1}$, $Q ((x_{1}, x_{2})) P_{1}^{k} (x_{1})$ can be computed by a circuit with input $x_{2}$ of size polynomial in $k$, $| x_{1} |$ and $| Q |$. Applying Lemma 3 to $P_{2}$, we get

$| E_{μ_{1}^{k} \times μ_{2}^{k}} [Q ((x_{1}, x_{2})) P_{1}^{k} (x_{1}) (P_{2}^{k} (x_{2}) - χ_{D_{2}} (x_{2}))] | \leq E_{μ_{1}^{k}} [δ_{2} (k, p_{2} (k, | x_{1} |, | Q |))]$

Again, we got the required bound.
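The proof of Theorem 4 rests on the algebraic identity $p_{1} p_{2} - c_{1} c_{2} = (p_{1} - c_{1}) c_{2} + p_{1} (p_{2} - c_{2})$ with $p_{i} = P_{i}^{k} (x_{i})$ and $c_{i} = χ_{D_{i}} (x_{i})$. The short Python check below verifies it exhaustively over a small grid of rational predictions and both values of each indicator; it is only a sanity check of the algebra, not of the circuit-size bookkeeping, and `split_error` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

def split_error(p1, p2, c1, c2):
    """Return both sides of the product-error decomposition used for P^k = P_1^k * P_2^k."""
    lhs = p1 * p2 - c1 * c2
    rhs = (p1 - c1) * c2 + p1 * (p2 - c2)
    return lhs, rhs

grid = [Fraction(n, 4) for n in range(5)]  # predictions 0, 1/4, ..., 1
for p1, p2, c1, c2 in product(grid, grid, (0, 1), (0, 1)):
    lhs, rhs = split_error(p1, p2, c1, c2)
    assert lhs == rhs
```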

Proof of Theorem 7

Assume $Q$ is a quasi-optimal predictor. Applying Lemma 3 to predictor $P$ and circuits computing $P^{k} - Q^{k}$ , we get

$| E_{μ^{k}} [(P^{k} (x) - Q^{k} (x)) (P^{k} (x) - χ_{D} (x))] | \leq δ (k)$

for some $δ$ vanishing at infinity. Applying Lemma 3 to predictor $Q$ and circuits computing $P^{k} - Q^{k}$ , we get

$| E_{μ^{k}} [(P^{k} (x) - Q^{k} (x)) (Q^{k} (x) - χ_{D} (x))] | \leq ϵ (k)$

for some $ϵ$ vanishing at infinity. We have

$E_{μ^{k}} [(P^{k} (x) - Q^{k} (x))^{2}] = E_{μ^{k}} [(P^{k} (x) - Q^{k} (x)) (P^{k} (x) - χ_{D} (x))] - E_{μ^{k}} [(P^{k} (x) - Q^{k} (x)) (Q^{k} (x) - χ_{D} (x))]$

$E_{μ^{k}} [(P^{k} (x) - Q^{k} (x))^{2}] \leq | E_{μ^{k}} [(P^{k} (x) - Q^{k} (x)) (P^{k} (x) - χ_{D} (x))] | + | E_{μ^{k}} [(P^{k} (x) - Q^{k} (x)) (Q^{k} (x) - χ_{D} (x))] |$

$E_{μ^{k}} [(P^{k} (x) - Q^{k} (x))^{2}] \leq δ (k) + ϵ (k)$

Conversely, assume $P \overset{μ}{\approx} Q$. Consider some $R : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]$. We have

$E_{μ^{k}} [R (x) (Q^{k} (x) - χ_{D} (x))] = E_{μ^{k}} [R (x) (Q^{k} (x) - P^{k} (x) + P^{k} (x) - χ_{D} (x))]$

$E_{μ^{k}} [R (x) (Q^{k} (x) - χ_{D} (x))] = E_{μ^{k}} [R (x) (Q^{k} (x) - P^{k} (x))] + E_{μ^{k}} [R (x) (P^{k} (x) - χ_{D} (x))]$

$| E_{μ^{k}} [R (x) (Q^{k} (x) - P^{k} (x))] | \leq E_{μ^{k}} [| Q^{k} (x) - P^{k} (x) |] \leq \sqrt{E_{μ^{k}} [(Q^{k} (x) - P^{k} (x))^{2}]} \leq δ (k)$

for some $δ$ vanishing at infinity, since $P \overset{μ}{\approx} Q$.

$| E_{μ^{k}} [R (x) (P^{k} (x) - χ_{D} (x))] | \leq δ^{*} (k, | R |)$

for some quasinegligible $δ^{*}$ , using Lemma 3. Combining both inequalities we get

$| E_{μ^{k}} [R (x) (Q^{k} (x) - χ_{D} (x))] | \leq δ (k) + δ^{*} (k, | R |)$

Using Lemma 3 again we conclude $Q$ is a quasi-optimal predictor.

Lemma 4

Consider $C \subseteq D \subseteq \{0, 1\}^{*}$ and $μ$ a word ensemble. Assume $P_{C}$ is a quasi-optimal predictor for $(C, μ)$ and $P_{D}$ is a quasi-optimal predictor for $(D, μ)$. Define

$ϵ^{k} (x) := θ (P_{C}^{k} (x) - P_{D}^{k} (x)) (P_{C}^{k} (x) - P_{D}^{k} (x))$

Then, $\lim_{k \to \infty} E_{μ^{k}} [ϵ^{k} (x)] = 0$.

Proof of Lemma 4

By Theorem 3 and Lemma 3 there is a quasinegligible function $δ$ such that for any $Q : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} \mathbb{Q} \cap [- 2, + 1]$ we have

$| E_{μ^{k}} [Q (x) (P_{D}^{k} (x) - P_{C}^{k} (x) - χ_{D ∖ C} (x))] | \leq δ (k, | Q |)$

Take $Q$ to be the circuit computing $θ (P_{C}^{k} (x) - P_{D}^{k} (x))$. Its size is polynomial in $k$, therefore

$| E_{μ^{k}} [θ (P_{C}^{k} (x) - P_{D}^{k} (x)) (P_{D}^{k} (x) - P_{C}^{k} (x) - χ_{D ∖ C} (x))] | \leq δ^{*} (k)$

where $δ^{*}$ vanishes at infinity.

$| E_{μ^{k}} [ϵ^{k} (x)] + E_{μ^{k}} [θ (P_{C}^{k} (x) - P_{D}^{k} (x)) χ_{D ∖ C} (x)] | \leq δ^{*} (k)$

Since both terms inside the absolute value are non-negative we get the desired result.

Proof of Theorem 6

When $P_{D}^{k} (x) > 0$ we have

$P_{C ∣ D}^{k} (x) = \frac{\min (P_{C \cap D}^{k} (x), P_{D}^{k} (x))}{P_{D}^{k} (x)}$

Define $\tilde{P}_{C \cap D}^{k}$ to be the circuit computing $\min (P_{C \cap D}^{k} (x), P_{D}^{k} (x))$. Since $C \cap D \subseteq D$, Lemma 4 implies that $\lim_{k \to \infty} E_{μ^{k}} [P_{C \cap D}^{k} (x) - \tilde{P}_{C \cap D}^{k} (x)] = 0$. Since this difference takes values in $[0, 1]$, it follows that $\lim_{k \to \infty} E_{μ^{k}} [(P_{C \cap D}^{k} (x) - \tilde{P}_{C \cap D}^{k} (x))^{2}] = 0$, and by Theorem 7 $\tilde{P}_{C \cap D}$ is a quasi-optimal predictor for $(C \cap D, μ)$.

We have $\tilde{P}_{C \cap D}^{k} (x) = P_{C ∣ D}^{k} (x) P_{D}^{k} (x)$ (whether $P_{D}^{k} (x) > 0$ or $P_{D}^{k} (x) = 0$) and therefore

$\tilde{P}_{C \cap D}^{k} (x) - χ_{C \cap D} (x) = (P_{C ∣ D}^{k} (x) - χ_{C} (x)) χ_{D} (x) + P_{C ∣ D}^{k} (x) (P_{D}^{k} (x) - χ_{D} (x))$

$(P_{C ∣ D}^{k} (x) - χ_{C} (x)) χ_{D} (x) = \tilde{P}_{C \cap D}^{k} (x) - χ_{C \cap D} (x) - P_{C ∣ D}^{k} (x) (P_{D}^{k} (x) - χ_{D} (x))$

Consider $Q : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} \mathbb{Q} \cap [- 1, + 1]$.

$E_{μ^{k} ∣ D} [Q (x) (P_{C ∣ D}^{k} (x) - χ_{C} (x))] = μ^{k} (D)^{- 1} E_{μ^{k}} [Q (x) (P_{C ∣ D}^{k} (x) - χ_{C} (x)) χ_{D} (x)]$

By Lemma 3 it is sufficient to prove appropriate bounds on $| E_{μ^{k}} [Q (x) (\tilde{P}_{C \cap D}^{k} (x) - χ_{C \cap D} (x))] |$ and $| E_{μ^{k}} [Q (x) P_{C ∣ D}^{k} (x) (P_{D}^{k} (x) - χ_{D} (x))] |$. Both bounds follow from Lemma 3 using the facts that $\tilde{P}_{C \cap D}$ and $P_{D}$ are quasi-optimal predictors and that $| P_{C ∣ D}^{k} |$ is bounded by a polynomial.

Proof of Theorem 8

Consider $k \in \mathbb{N}$, $Q_{C} : \operatorname{supp} μ^{k} \xrightarrow{\text{circ}} [0, 1]$. Define $Q_{D} : \operatorname{supp} ν^{p (k)} \times \{0, 1\}^{q (k)} \xrightarrow{\text{circ}} [0, 1]$ to be the circuit computing $Q_{D} (y, r) := Q_{C} (g^{k} (y, r))$. Applying Lemma 2, treating $r$ as a constant and using $R^{k}$ as the weight circuit, we get

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] \leq E_{ν^{p (k)}} [R^{k} (y) (Q_{D} (y, r) - χ_{D} (y))^{2}] + δ (k, | Q_{C} |)$

where $δ$ is quasinegligible. We used condition (ii) to get a constant bound on $\max R^{k}$ and condition (iv) to get a polynomial bound on $| R^{k} |$.

We take the expectation value of both sides with respect to the uniform measure over $r$ :

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] \leq E_{ν^{p (k)} \times U^{q (k)}} [R^{k} (y) (Q_{D} (y, r) - χ_{D} (y))^{2}] + δ (k, | Q_{C} |)$

The left hand side can be rewritten as follows

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] = \sum_{y \in \{0, 1\}^{*}} ν^{p (k)} (y) \frac{μ^{k} ((f^{k})^{- 1} (y))}{ν^{p (k)} (y)} (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}$

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] = \sum_{y \in \{0, 1\}^{*}} μ^{k} ((f^{k})^{- 1} (y)) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}$

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] = \sum_{y \in \{0, 1\}^{*}} \sum_{\substack{x \in \operatorname{supp} μ^{k} \\ f^{k} (x) = y}} μ^{k} (x) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}$

Grouping the sum by $x$ and using condition (i), which gives $χ_{D} (f^{k} (x)) = χ_{C} (x)$, we get

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] = \sum_{x \in \operatorname{supp} μ^{k}} μ^{k} (x) (P_{C}^{k} (x) - χ_{C} (x))^{2}$

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] = E_{μ^{k}} [(P_{C}^{k} (x) - χ_{C} (x))^{2}]$

The first term on the right hand side can be rewritten as

$E_{ν^{p (k)} \times U^{q (k)}} [R^{k} (y) (Q_{D} (y, r) - χ_{D} (y))^{2}] = \sum_{y \in \{0, 1\}^{*}} \sum_{r \in \{0, 1\}^{q (k)}} 2^{- q (k)} μ^{k} ((f^{k})^{- 1} (y)) (Q_{D} (y, r) - χ_{D} (y))^{2}$

Grouping the sum by $x := g^{k} (y, r)$ we get:

$E_{ν^{p (k)} \times U^{q (k)}} [R^{k} (y) (Q_{D} (y, r) - χ_{D} (y))^{2}] = \sum_{x \in \{0, 1\}^{*}} \sum_{y \in \{0, 1\}^{*}} \sum_{\substack{r \in \{0, 1\}^{q (k)} \\ g^{k} (y, r) = x}} 2^{- q (k)} μ^{k} ((f^{k})^{- 1} (y)) (Q_{C} (x) - χ_{C} (x))^{2}$

Condition (iii) tells us that $\sum_{\substack{r \in \{0, 1\}^{q (k)} \\ g^{k} (y, r) = x}} 2^{- q (k)}$ is only non-vanishing when $y = f^{k} (x)$ and that in this case it equals $\frac{μ^{k} (x)}{μ^{k} ((f^{k})^{- 1} (y))}$. Therefore

$E_{ν^{p (k)} \times U^{q (k)}} [R^{k} (y) (Q_{D} (y, r) - χ_{D} (y))^{2}] = \sum_{x \in \{0, 1\}^{*}} μ^{k} (x) (Q_{C} (x) - χ_{C} (x))^{2}$

$E_{ν^{p (k)} \times U^{q (k)}} [R^{k} (y) (Q_{D} (y, r) - χ_{D} (y))^{2}] = E_{μ^{k}} [(Q_{C} (x) - χ_{C} (x))^{2}]$

Putting everything together, we get

$E_{μ^{k}} [(P_{C}^{k} (x) - χ_{C} (x))^{2}] \leq E_{μ^{k}} [(Q_{C} (x) - χ_{C} (x))^{2}] + δ (k, | Q_{C} |)$
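The two rewritings in the proof of Theorem 8 are exact identities that can be checked on a small example. The Python sketch below uses a hypothetical reduction $f$ that keeps the first bit of a 2-bit string, a hypothetical target distribution $ν$, and a hypothetical sampler $g (y, r)$ that reads the $q$ random bits $r$ as an integer and returns $x$ with probability $μ (x ∣ f (x) = y)$; this requires those conditional probabilities to be multiples of $2^{- q}$. It verifies that the weighted error over $ν$ equals the error over $μ$, both for $P_{D} \circ f$ and for a competitor $Q_{C}$ pulled back through $g$.

```python
from fractions import Fraction
from collections import defaultdict

def pushforward(mu, f):
    """y -> mu(f^{-1}(y)) on the image of f."""
    out = defaultdict(Fraction)
    for x, w in mu.items():
        out[f(x)] += w
    return dict(out)

def make_sampler(mu, f, q):
    """Hypothetical g(y, r): r in range(2**q) is read as q random bits; returns x with
    probability mu(x | f(x) = y). Needs each conditional probability to be a multiple of 2^-q."""
    table = {}
    for y, m in pushforward(mu, f).items():
        slots = []
        for x in sorted(z for z in mu if f(z) == y):
            share = mu[x] / m * 2 ** q
            if share.denominator != 1:
                raise ValueError("conditional probability is not a multiple of 2^-q")
            slots.extend([x] * int(share))
        table[y] = slots
    return lambda y, r: table[y][r]

# Hypothetical instance of a reduction satisfying condition (i): chi_D(f(x)) = chi_C(x).
mu = {"00": Fraction(1, 8), "01": Fraction(3, 8), "10": Fraction(1, 4), "11": Fraction(1, 4)}
nu = {"0": Fraction(1, 4), "1": Fraction(3, 4)}
f = lambda x: x[0]
chi_C = lambda x: int(x[0] == "1")
chi_D = lambda y: int(y == "1")
R = {y: m / nu[y] for y, m in pushforward(mu, f).items()}  # condition (iv)
q = 2
g = make_sampler(mu, f, q)

# Weighted error of P_D over nu equals the error of P_C = P_D o f over mu.
P_D = {"0": Fraction(1, 4), "1": Fraction(1, 2)}
lhs_nu = sum(nu[y] * R[y] * (P_D[y] - chi_D(y)) ** 2 for y in R)
lhs_mu = sum(w * (P_D[f(x)] - chi_C(x)) ** 2 for x, w in mu.items())

# Averaging Q_D(y, r) = Q_C(g(y, r)) over r recovers the error of Q_C over mu.
Q_C = {"00": Fraction(0), "01": Fraction(1, 2), "10": Fraction(1), "11": Fraction(1, 4)}
rhs_nu = sum(nu[y] * R[y] * Fraction(1, 2 ** q) * (Q_C[g(y, r)] - chi_D(y)) ** 2
             for y in R for r in range(2 ** q))
rhs_mu = sum(w * (Q_C[x] - chi_C(x)) ** 2 for x, w in mu.items())

print(lhs_nu, lhs_mu, rhs_nu, rhs_mu)  # 5/32 5/32 15/64 15/64
```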

Proof of Theorem 9

Assume to the contrary that $P_{f}$ is not quasi-optimal. Then there is an infinite set $I \subseteq \mathbb{N}$, a polynomial size family of circuits $\{Q^{k} : \operatorname{supp} \tilde{μ}_{f}^{k} \xrightarrow{\text{circ}} [0, 1]\}_{k \in I}$ and $ϵ > 0$ s.t.

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [(P_{f}^{k} (x) - χ_{D_{f}} (x))^{2}] \geq E_{\tilde{μ}_{f}^{k}} [(Q^{k} (x) - χ_{D_{f}} (x))^{2}] + ϵ$

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [(Q^{k} (x) - χ_{D_{f}} (x))^{2}] \leq \frac{1}{4} - ϵ$

Define the functions $\{q^{k} : \operatorname{supp} \tilde{μ}_{f}^{k} \times [0, 1] \to \{0, 1\}\}_{k \in I}$ by $q^{k} (x, t) := θ (Q^{k} (x) - t)$. We have

$\forall k \in I, x \in \operatorname{supp} \tilde{μ}_{f}^{k} : Q^{k} (x) = \int_{0}^{1} q^{k} (x, t) \, d t$

Substituting into the inequality above

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [(\int_{0}^{1} q^{k} (x, t) \, d t - χ_{D_{f}} (x))^{2}] \leq \frac{1}{4} - ϵ$

By Jensen's inequality,

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [| \int_{0}^{1} q^{k} (x, t) \, d t - χ_{D_{f}} (x) |]^{2} \leq \frac{1}{4} - ϵ$

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [| \int_{0}^{1} (q^{k} (x, t) - χ_{D_{f}} (x)) \, d t |] \leq \sqrt{\frac{1}{4} - ϵ}$

For every given $x$, $q^{k} (x, t) - χ_{D_{f}} (x)$ is either non-negative for all $t$ or non-positive for all $t$. Hence we can move the absolute value inside the integral:

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [\int_{0}^{1} | q^{k} (x, t) - χ_{D_{f}} (x) | \, d t] \leq \sqrt{\frac{1}{4} - ϵ}$

$\forall k \in I : \int_{0}^{1} E_{\tilde{μ}_{f}^{k}} [| q^{k} (x, t) - χ_{D_{f}} (x) |] \, d t \leq \sqrt{\frac{1}{4} - ϵ}$

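The integrand $E_{\tilde{μ}_{f}^{k}} [| q^{k} (x, t) - χ_{D_{f}} (x) |]$ is piecewise constant in $t$ and changes only at values of $Q^{k}$, so its minimum over the finite set $Q^{k} (\operatorname{supp} \tilde{μ}_{f}^{k}) \cup \{0, 1\}$ is at most its average over $t \in [0, 1]$. Hence we can choose $\{t_{k} \in Q^{k} (\operatorname{supp} \tilde{μ}_{f}^{k}) \cup \{0, 1\}\}_{k \in I}$ s.t.

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [| q^{k} (x, t_{k}) - χ_{D_{f}} (x) |] \leq \sqrt{\frac{1}{4} - ϵ}$

$\forall k \in I : \Pr_{\tilde{μ}_{f}^{k}} [q^{k} (x, t_{k}) \neq χ_{D_{f}} (x)] \leq \sqrt{\frac{1}{4} - ϵ}$

$\forall k \in I : \Pr_{\tilde{μ}_{f}^{k}} [q^{k} (x, t_{k}) = χ_{D_{f}} (x)] \geq 1 - \sqrt{\frac{1}{4} - ϵ}$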

Since the square root is concave, its graph lies below its tangent at $\frac{1}{4}$, so $\sqrt{\frac{1}{4} - ϵ} \leq \frac{1}{2} - ϵ$. This leads to

$\forall k \in I : \Pr_{\tilde{μ}_{f}^{k}} [q^{k} (x, t_{k}) = χ_{D_{f}} (x)] \geq \frac{1}{2} + ϵ$

Define $\{g^{k} : f (\{0, 1\}^{k}) \times \{0, 1\}^{k} \xrightarrow{\text{circ}} \{0, 1\}\}_{k \in \mathbb{N}}$ as the circuits computing $g^{k} (y, r) := 1 - q^{k} ((y, r), t_{k})$. The definitions of $q^{k}$ and $t_{k}$ imply that $| g^{k} |$ is bounded by a polynomial. The inequality above and the definitions of $D_{f}$ and $\tilde{μ}_{f}$ imply

$\forall k \in I : \frac{1}{k} \sum_{i < k} \Pr_{U^{i} \times U^{i}} [g^{i} (f (x), r) = x \cdot r] \geq \frac{1}{2} + ϵ$

But this contradicts the assumption on $f$ .
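The step producing $t_{k}$ in the proof of Theorem 9 uses two facts: for fixed $x$, $\int_{0}^{1} | θ (Q (x) - t) - χ_{D_{f}} (x) | \, d t = | Q (x) - χ_{D_{f}} (x) |$, and the error of the thresholded predictor is piecewise constant in $t$ with jumps only at values of $Q$, so some $t$ among the values of $Q$ together with $0$ and $1$ does at least as well as the average over $t$. The sketch below checks this on a hypothetical finite distribution, taking $θ (s) = 1$ for $s \geq 0$ as an assumed convention; all names and data are illustrative.

```python
from fractions import Fraction

def threshold_error(mu, chi, Q, t):
    """Pr_mu[theta(Q(x) - t) != chi(x)] with theta(s) = 1 for s >= 0 (assumed convention)."""
    return sum(w for x, w in mu.items() if (1 if Q[x] >= t else 0) != chi[x])

def mean_abs_error(mu, chi, Q):
    """E_mu[|Q(x) - chi(x)|], which equals the average of threshold_error over t in [0, 1]."""
    return sum(w * abs(Q[x] - chi[x]) for x, w in mu.items())

def best_threshold(mu, chi, Q):
    """Search only over the values of Q together with 0 and 1, as in the proof."""
    candidates = set(Q.values()) | {Fraction(0), Fraction(1)}
    return min(candidates, key=lambda t: threshold_error(mu, chi, Q, t))

# Hypothetical toy data.
mu = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
chi = {"a": 1, "b": 0, "c": 1}
Q = {"a": Fraction(3, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
t = best_threshold(mu, chi, Q)
print(t, threshold_error(mu, chi, Q, t), mean_abs_error(mu, chi, Q))  # 1/2 0 3/8
```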

Note that this argument does not show that $P_{f}$ is optimal: while averaging over $i$ preserves the property of vanishing at infinity, it does not preserve the property of negligibility. Moreover, it is possible to show that no optimal predictor for $(D_{f}, \tilde{μ}_{f})$ exists.

orthonormal

So, to be clear, the difference between an optimal predictor and a quasi-optimal predictor is as follows: $δ (k)$ is the amount by which some other polynomial-size circuit family is able to beat the current predictor. An optimal predictor cannot be beaten by any more than a $δ$ such that $k^{N} δ (k) \to 0$ for any $N$ , while a quasi-optimal predictor can only assert it cannot be beaten by any more than a $δ$ such that $δ (k) \to 0$ . Yes?

Vanessa Kosoy

Exactly. It's such a simple tweak that it's embarrassing I didn't notice it in the first place. It's just that in average-case complexity theory negligible errors seem much more popular. The price to pay is that some theorems require stronger assumptions: something that had to be bounded by a polynomial now has to be bounded by a constant. On the other hand, Theorem 9 demonstrates that there is no chance optimal predictors cover all of $\mathrm{VerNP}$ (for example), whereas quasi-optimal predictors might work. Also, I think that in $\mathrm{VerNP}$ there is a universal construction for uniform quasi-optimal predictors (as opposed to optimal predictors) similar to Levin's universal search, although I haven't fleshed out the details yet (in any case, such a construction would be theoretically valid but highly inefficient in practice).
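The distinction drawn in this exchange can be seen numerically: $δ (k) = \frac{1}{k}$ vanishes at infinity, which is all a quasi-optimal predictor needs, but $k^{2} δ (k) = k$ grows, so it is not negligible; $δ (k) = 2^{- k}$ is negligible because $k^{N} 2^{- k} \to 0$ for every fixed $N$. The sketch below simply tabulates $k^{N} δ (k)$ with exact rationals for these two illustrative choices of $δ$.

```python
from fractions import Fraction

def scaled(delta, N, ks):
    """Tabulate k^N * delta(k); negligibility asks that this tends to 0 for every fixed N."""
    return [k ** N * delta(k) for k in ks]

vanishing = lambda k: Fraction(1, k)       # tends to 0, but not negligible
negligible = lambda k: Fraction(1, 2 ** k)

print(scaled(vanishing, 2, [10, 100, 1000]))   # k^2 * (1/k) = k grows without bound
print(float(scaled(negligible, 3, [100])[0]))  # roughly 7.9e-25
```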