EDIT: Deleted examples of generatable problems since they are wrong. They are in fact examples of a weaker notion which I think also admits uniform OPS but this should be explored elsewhere.
We introduce a variant of optimal predictor schemes where optimality holds within the space of random algorithms with logarithmic advice. These objects are also guaranteed to exist for the error space $\Delta^{\mathrm{avg}}_2$. We introduce the class of generatable problems and construct a uniform universal predictor scheme for this class which is optimal in the new sense with respect to the $\Delta^{\mathrm{avg}}_2$ error space. This is achieved by a construction similar to Levin's universal search.
Results
New notation
Given $n \in \mathbb{N}$, $\mathrm{ev}_n : \mathbb{N} \times (\{0,1\}^*)^{n+1} \xrightarrow{\mathrm{alg}} \{0,1\}^*$ is the following algorithm. When $\mathrm{ev}^k_n(Q, x_1 \ldots x_n)$ is computed, $Q$ is interpreted as a program and $Q(x_1 \ldots x_n)$ is executed for time $k$. The resulting output is produced.
The notation $\mathrm{ev}^k(Q, x_1 \ldots x_n)$ means $\mathrm{ev}^k_n(Q, x_1 \ldots x_n)$.
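To make the time-bounded evaluation concrete, here is a toy Python model (my own illustration, not part of the formal setup): programs are represented as Python generator functions rather than binary words, yielding once per computation step, and a program that fails to halt within the budget produces the empty word (one possible convention; the definition above only says the resulting output is produced).

```python
def ev(k, program, *args):
    """Run `program` on `args` for at most k steps; return its output,
    or the empty word if it has not halted within the budget."""
    gen = program(*args)
    try:
        for _ in range(k):
            next(gen)                 # advance the simulation by one step
    except StopIteration as halt:
        return halt.value or ""      # program halted within the budget
    return ""                         # time budget exhausted

def copy_program(x):                  # example: copies input, one step per bit
    out = []
    for symbol in x:
        out.append(symbol)
        yield
    return "".join(out)

assert ev(10, copy_program, "0110") == "0110"   # halts within budget
assert ev(2, copy_program, "0110") == ""        # clipped after 2 steps
```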
$\beta : \{0,1\}^* \to [0,1]$ is the mapping from a binary expansion to the corresponding real number (e.g. $\beta(011) = 0 \cdot 2^{-1} + 1 \cdot 2^{-2} + 1 \cdot 2^{-3} = \frac{3}{8}$).
Given $\mu$ a word ensemble, $X$ a set and $Q : \{0,1\}^{*2} \xrightarrow{\mathrm{alg}} X$, $T^\mu_Q(k,s)$ stands for the maximal runtime of $Q(x,y)$ for $x \in \operatorname{supp} \mu^k$, $y \in \{0,1\}^s$.
Previous posts focused on prediction of distributional decision problems, which is the "computational uncertainty" analogue of probability. Here, we use the broader concept of predicting distributional estimation problems (functions), which is analogous to expectation value.
Definition 1
A distributional estimation problem is a pair $(f,\mu)$ where $f : \{0,1\}^* \to [0,1]$ is an arbitrary function (even irrational values are allowed) and $\mu$ is a word ensemble.
Definition 2
Given an appropriate set $X$, consider $P : \mathbb{N}^2 \times \{0,1\}^{*3} \xrightarrow{\mathrm{alg}} X$, $r : \mathbb{N}^2 \to \mathbb{N}$ a polynomial and $a : \mathbb{N}^2 \to \{0,1\}^*$. The triple $(P,r,a)$ is called an $X$-valued $(\mathrm{poly},\log)$-bischeme when
(i) The runtime of $P(k,j,x,y,z)$ is bounded by $p(k,j)$ for some polynomial $p$.
(ii) $|a(k,j)| \leq c_1 + c_2 \log(k+1) + c_3 \log(j+1)$ for some $c_1, c_2, c_3 \in \mathbb{N}$.
A $[0,1]$-valued $(\mathrm{poly},\log)$-bischeme will also be called a $(\mathrm{poly},\log)$-predictor scheme.
We think of $P$ as a random algorithm where the second word parameter represents its internal coin tosses. The third word parameter represents the advice, and we usually substitute $a$ there.
We will use the notations $P^{kj}(x,y,z) := P(k,j,x,y,z)$ and $a^{kj} := a(k,j)$.
Definition 3
Fix $\Delta$ an error space of rank 2 and $(f,\mu)$ a distributional estimation problem. Consider $(P,r,a)$ a $(\mathrm{poly},\log)$-predictor scheme. $(P,r,a)$ is called a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f,\mu)$ when for any $(\mathrm{poly},\log)$-predictor scheme $(Q,s,b)$, there is $\delta \in \Delta$ s.t.
$$\mathrm{E}_{\mu^k \times U^{r(k,j)}}[(P^{kj}(x,y,a^{kj}) - f(x))^2] \leq \mathrm{E}_{\mu^k \times U^{s(k,j)}}[(Q^{kj}(x,y,b^{kj}) - f(x))^2] + \delta(k,j)$$
Note 1
The notation $(\mathrm{poly},\log)$ is meant to remind us that we allow a polynomial quantity $r(k,j)$ of random bits and a logarithmic quantity $|a^{kj}|$ of advice. In fact, the definitions and some of the theorems can be generalized to other quantities of randomness and advice (see also Note B.1). Thus, predictor schemes from previous posts are $(\mathrm{poly},\mathrm{poly})$-predictor schemes, $(\mathrm{poly},O(1))$-predictor schemes are limited to $O(1)$ advice, $(\log,0)$-predictor schemes use a logarithmic number of random bits and no advice, and so on. As usual in complexity theory, it is redundant to consider more advice than randomness, since advice is strictly more powerful.
$\Delta(\mathrm{poly},\log)$-optimal predictor schemes satisfy properties analogous to those of $\Delta$-optimal predictor schemes. These properties are listed in Appendix A. The proofs of Theorems A.1 and A.4 are given in Appendix B. The other proofs are straightforward adaptations of the corresponding proofs with polynomial advice.
We also have the following existence result:
Theorem 1
Consider $(f,\mu)$ a distributional estimation problem. Define $\Upsilon : \mathbb{N}^2 \times \{0,1\}^{*3} \xrightarrow{\mathrm{alg}} [0,1]$ by
$$\Upsilon^{kj}(x,y,Q) := \beta(\mathrm{ev}^j(Q,x,y))$$
Define $\upsilon_{f,\mu} : \mathbb{N}^2 \to \{0,1\}^*$ by
$$\upsilon^{kj}_{f,\mu} := \operatorname*{arg\,min}_{|Q| \leq \log j} \mathrm{E}_{\mu^k \times U^j}[(\Upsilon^{kj}(x,y,Q) - f(x))^2]$$
Then $(\Upsilon, j, \upsilon_{f,\mu})$ is a $\Delta^{\mathrm{avg}}_2(\mathrm{poly},\log)$-optimal predictor scheme for $(f,\mu)$.
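As an illustration of the structure of this scheme, here is a hedged Python sketch (reusing `ev` from the sketch above; `mu_k_samples` and `short_programs` are hypothetical stand-ins): the predictor $\Upsilon$ just runs its advice as a program for $j$ steps, while the advice $\upsilon^{kj}_{f,\mu}$ is defined by an argmin that requires oracle access to $f$ and $\mu^k$, which is exactly what makes the scheme non-uniform.

```python
def beta(s):
    return sum(int(b) * 2 ** -(i + 1) for i, b in enumerate(s))

def Upsilon(k, j, x, y, Q):
    """The predictor: interpret the advice Q as a program, run it j steps."""
    return beta(ev(j, Q, x, y))

def best_advice(k, j, f, mu_k_samples, short_programs):
    """Caricature of the argmin defining the advice: the true expectation
    over mu^k x U^j is replaced by an empirical mean over given samples,
    and the programs of length <= log j are supplied explicitly."""
    def err(Q):
        return sum((Upsilon(k, j, x, y, Q) - f(x)) ** 2
                   for (x, y) in mu_k_samples) / len(mu_k_samples)
    return min(short_programs, key=err)
```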
Note 2
Consider a distributional decision problem $(D,\mu)$. Assume $(D,\mu)$ admits $n \in \mathbb{N}$, $A : \mathbb{N} \times \{0,1\}^{*3} \xrightarrow{\mathrm{alg}} \{0,1\}$, $a : \mathbb{N} \to \{0,1\}^*$ and a function $r : \mathbb{N} \to \mathbb{N}$ s.t.
(i) $A(k,x,y,z)$ runs in quasi-polynomial time ($O(2^{\log^n k})$).
(ii) $|a(k)| = O(\log^n k)$
(iii) $\lim_{k \to \infty} \Pr_{\mu^k \times U^{r(k)}}[A(k,x,y,a(k)) \neq \chi_D(x)] = 0$
Then it is easy to see we can construct a $(\mathrm{poly},\log)$-predictor scheme $P_A$ taking values in $\{0,1\}$ s.t. $\mathrm{E}[(P_A - \chi_D)^2] \in \Delta^{\mathrm{avg}}_2$. The implication doesn't work for larger time or advice budgets. Therefore, the uncertainty represented by $\Delta^{\mathrm{avg}}_2(\mathrm{poly},\log)$-optimal predictor schemes is associated with the resource gap between quasi-polynomial time plus $O(\log^n k)$ advice and the resources needed to (heuristically) solve the problem in question.
The proof of Theorem 1 is given in Appendix C: it is a straightforward adaptation of the corresponding proof for polynomial advice. Evidently, the above scheme is non-uniform. We will now describe a class of problems which admits uniform $\Delta^{\mathrm{avg}}_2(\mathrm{poly},\log)$-optimal predictor schemes.
Definition 4
Consider $\Delta_1$ an error space of rank 1. A word ensemble $\mu$ is called $\Delta_1(\log)$-sampleable when there is $S : \mathbb{N} \times \{0,1\}^{*2} \xrightarrow{\mathrm{alg}} \{0,1\}^*$ that runs in polynomial time in the first argument, $a_S : \mathbb{N} \to \{0,1\}^*$ of logarithmic size and $r_S : \mathbb{N} \to \mathbb{N}$ a polynomial such that
$$\sum_{x \in \{0,1\}^*} \left| \mu^k(x) - \Pr_{U^{r_S(k)}}[S^k(y, a_S(k)) = x] \right| \in \Delta_1$$
$(S, r_S, a_S)$ is called a $\Delta_1(\log)$-sampler for $\mu$.
Definition 5
Consider $\Delta_1$ an error space of rank 1. A distributional estimation problem $(f,\mu)$ is called $\Delta_1(\log)$-generatable when there are $S : \mathbb{N} \times \{0,1\}^{*2} \xrightarrow{\mathrm{alg}} \{0,1\}^*$ and $F : \mathbb{N} \times \{0,1\}^{*2} \xrightarrow{\mathrm{alg}} [0,1]$ that run in polynomial time in the first argument, $a_S : \mathbb{N} \to \{0,1\}^*$ of logarithmic size and $r_S : \mathbb{N} \to \mathbb{N}$ a polynomial such that
(i) $(S, r_S, a_S)$ is a $\Delta_1(\log)$-sampler for $\mu$.
(ii) $\mathrm{E}_{U^{r_S(k)}}[(F^k(y, a_S(k)) - f(S^k(y, a_S(k))))^2] \in \Delta_1$
$(S, F, r_S, a_S)$ is called a $\Delta_1(\log)$-generator for $(f,\mu)$.
When $a_S$ is the empty string, $(S, F, r_S)$ is called a $\Delta_1(0)$-generator for $(f,\mu)$. Such $(f,\mu)$ is called $\Delta_1(0)$-generatable.
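For concreteness, here is a minimal toy instance of Definition 5 (my own illustration, not one of the deleted examples from the post): if $f$ is itself polynomial-time computable and $\mu^k$ is uniform on $\{0,1\}^k$, the sampler can read the sample off its own coin tosses and $F$ can evaluate $f$ exactly, giving a $\Delta_1(0)$-generator with zero error.

```python
# A trivial Delta_1(0)-generator: mu^k = uniform on {0,1}^k, f = parity.
def r_S(k):              # polynomially many coins
    return k

def S(k, y):             # sampler: the first k coins are the sample itself
    return y[:k]

def f(x):                # target function: parity (polynomial-time computable)
    return x.count("1") % 2

def F(k, y):             # generator's value for f on S^k(y); here exact
    return f(S(k, y))

# F^k(y) = f(S^k(y)) for every y, so the sampling error and the
# generation error E[(F^k - f(S^k))^2] are both identically zero.
import random
y = "".join(random.choice("01") for _ in range(r_S(5)))
assert F(5, y) == f(S(5, y))
```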
Note 3
The class of $\Delta_1(0)$-generatable problems can be regarded as an average-case analogue of $\mathrm{NP} \cap \mathrm{coNP}$. If $f$ is a decision problem (i.e. its range is $\{0,1\}$), words $y \in \{0,1\}^{r_S(k)}$ s.t. $S^k(y) = x$, $F^k(y) = 1$ can be regarded as "proofs" of $f(x) = 1$, and words $y \in \{0,1\}^{r_S(k)}$ s.t. $S^k(y) = x$, $F^k(y) = 0$ can be regarded as "proofs" of $f(x) = 0$.
Theorem 2
There is an oracle machine $\Lambda$ that accepts an oracle of signature $SF : \mathbb{N} \times \{0,1\}^* \to \{0,1\}^* \times [0,1]$ and a polynomial $r : \mathbb{N} \to \mathbb{N}$, where the allowed oracle calls are $SF^k(x)$ for $|x| = r(k)$, and computes a function of signature $\mathbb{N}^2 \times \{0,1\}^{*2} \to [0,1]$, s.t. for any distributional estimation problem $(f,\mu)$ and corresponding $\Delta^0_1(\log)$-generator $G := (S, F, r_S, a_S)$, $\Lambda[G]$ is a $\Delta^{\mathrm{avg}}_2(\mathrm{poly},\log)$-optimal predictor scheme for $(f,\mu)$.
In particular, if $(f,\mu)$ is $\Delta^0_1(0)$-generatable, we get a uniform $\Delta^{\mathrm{avg}}_2(\mathrm{poly},\log)$-optimal predictor scheme.
The following is the description of $\Lambda$. Consider $SF : \mathbb{N} \times \{0,1\}^* \to \{0,1\}^* \times [0,1]$ and a polynomial $r : \mathbb{N} \to \mathbb{N}$. We describe the computation of $\Lambda[SF,r]^{kj}(x)$, where the extra argument of $\Lambda$ is regarded as internal coin tosses.
We loop over the first $j$ words in lexicographic order. Each word is interpreted as a program $Q : \{0,1\}^{*2} \xrightarrow{\mathrm{alg}} [0,1]$. We loop over $jk$ "test runs". At test run $i$, we generate $(x_i \in \{0,1\}^*, t_i \in [0,1])$ by evaluating $SF^k(y_i)$ for $y_i$ sampled from $U^{r(k)}$. We then sample $z_i$ from $U^j$ and compute $s_i := \mathrm{ev}^j(Q, x_i, z_i)$. At the end of the test runs, we compute the average error $\epsilon(Q) := \frac{1}{jk} \sum_i (s_i - t_i)^2$. At the end of the loop over programs, the program $Q^*$ with the lowest error is selected and the output $\mathrm{ev}^j(Q^*, x)$ is produced.
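A hedged Python sketch of this loop (reusing `ev` from the earlier sketch; `candidate_programs`, `random_word` and the oracle calling convention are my own glue code, and the output word of a candidate is read as a real number via $\beta$):

```python
import random

def beta(s):
    return sum(int(b) * 2 ** -(i + 1) for i, b in enumerate(s))

def random_word(n):
    return "".join(random.choice("01") for _ in range(n))

def Lambda(SF, r, k, j, x, candidate_programs):
    """candidate_programs: the first j words, each interpreted as a program."""
    best_Q, best_err = None, float("inf")
    for Q in candidate_programs:
        total = 0.0
        for _ in range(j * k):                  # j*k test runs per candidate
            x_i, t_i = SF(k, random_word(r(k))) # sampled instance and target
            z_i = random_word(j)                # coins for the candidate
            s_i = beta(ev(j, Q, x_i, z_i))      # candidate's guess in [0,1]
            total += (s_i - t_i) ** 2
        err = total / (j * k)                   # empirical squared error
        if err < best_err:
            best_Q, best_err = Q, err
    return beta(ev(j, best_Q, x, random_word(j)))  # answer with the winner
```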
The proof that this construction is $\Delta^{\mathrm{avg}}_2(\mathrm{poly},\log)$-optimal is given in Appendix C.
Appendix A
Fix $\Delta$ an error space of rank 2.
Theorem A.1
Suppose there is a polynomial $h : \mathbb{N}^2 \to \mathbb{N}$ s.t. $h^{-1} \in \Delta$. Consider $(f,\mu)$ a distributional estimation problem and $(P,r,a)$ a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f,\mu)$. Suppose $\{p_{kj} \in [0,1]\}_{k,j \in \mathbb{N}}$, $\{q_{kj} \in [0,1]\}_{k,j \in \mathbb{N}}$ are s.t.
$$\exists \epsilon > 0 \, \forall k,j : (\mu^k \times U^{r(k,j)})\{(x,y) \in \{0,1\}^{*2} \mid p_{kj} \leq P^{kj}(x,y,a^{kj}) \leq q_{kj}\} \geq \epsilon$$
Define
$$\phi_{kj} := \mathrm{E}_{\mu^k \times U^{r(k,j)}}[f(x) - P^{kj}(x,y,a^{kj}) \mid p_{kj} \leq P^{kj}(x,y,a^{kj}) \leq q_{kj}]$$
Assume that either $p_{kj}, q_{kj}$ have a number of digits logarithmically bounded in $k,j$, or $P^{kj}$ produces outputs with a number of digits logarithmically bounded in $k,j$ (by Theorem A.7, if any $\Delta(\mathrm{poly},\log)$-optimal predictor scheme exists for $(f,\mu)$ then a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme with this property exists as well). Then $|\phi| \in \Delta$.
Theorem A.2
Consider $\mu$ a word ensemble and $f_1, f_2 : \{0,1\}^* \to [0,1]$ s.t. $f_1 + f_2 \leq 1$. Suppose $(P_1, r_1, a_1)$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f_1,\mu)$ and $(P_2, r_2, a_2)$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f_2,\mu)$. Define $P : \mathbb{N}^2 \times \{0,1\}^{*3} \xrightarrow{\mathrm{alg}} [0,1]$ by $P^{kj}(x, y_1 y_2, (z_1, z_2)) := \eta(P_1^{kj}(x, y_1, z_1) + P_2^{kj}(x, y_2, z_2))$ for $|y_i| = r_i(k,j)$. Then $(P, r_1 + r_2, a_1 a_2)$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f_1 + f_2, \mu)$.
Theorem A.3
Consider $\mu$ a word ensemble and $f_1, f_2 : \{0,1\}^* \to [0,1]$ s.t. $f_1 + f_2 \leq 1$. Suppose $(P_1, r_1, a_1)$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f_1,\mu)$ and $(P, r_2, a_2)$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f_1 + f_2, \mu)$. Define $P_2 : \mathbb{N}^2 \times \{0,1\}^{*3} \xrightarrow{\mathrm{alg}} [0,1]$ by $P_2^{kj}(x, y_1 y_2, (z_1, z_2)) := \eta(P^{kj}(x, y_1, z_1) - P_1^{kj}(x, y_2, z_2))$ for $|y_i| = r_i(k,j)$. Then $(P_2, r_1 + r_2, a_1 a_2)$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f_2, \mu)$.
Theorem A.4
Fix $\Delta_1$ an error space of rank 1 s.t. given $\delta_1 \in \Delta_1$, the function $\delta(k,j) := \delta_1(k)$ lies in $\Delta$. Consider $(f_1,\mu_1)$, $(f_2,\mu_2)$ distributional estimation problems with respective $\Delta(\mathrm{poly},\log)$-optimal predictor schemes $(P_1,r_1,a_1)$ and $(P_2,r_2,a_2)$. Assume $\mu_1$ is $\Delta_1(\log)$-sampleable and $(f_2,\mu_2)$ is $\Delta_1(\log)$-generatable. Define $f_1 \times f_2 : \{0,1\}^* \to [0,1]$ by $(f_1 \times f_2)(x_1, x_2) = f_1(x_1) f_2(x_2)$ and $(f_1 \times f_2)(y) = 0$ for $y$ not of this form. Define $P : \mathbb{N}^2 \times \{0,1\}^{*3} \xrightarrow{\mathrm{alg}} [0,1]$ by $P^{kj}((x_1,x_2), y_1 y_2, (z_1,z_2)) := P_1^{kj}(x_1,y_1,z_1) P_2^{kj}(x_2,y_2,z_2)$ for $|y_i| = r_i(k,j)$. Then $(P, r_1 + r_2, (a_1,a_2))$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f_1 \times f_2, \mu_1 \times \mu_2)$.
Theorem A.5
Consider $f : \{0,1\}^* \to [0,1]$, $D \subseteq \{0,1\}^*$ and $\mu$ a word ensemble. Assume $(P_D, r_D, a_D)$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(D,\mu)$ and $(P_{f \mid D}, r_{f \mid D}, a_{f \mid D})$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f, \mu \mid D)$. Define $P : \mathbb{N}^2 \times \{0,1\}^{*3} \xrightarrow{\mathrm{alg}} [0,1]$ by $P^{kj}(x, y_1 y_2, (z_1, z_2)) := P_D^{kj}(x, y_1, z_1) P_{f \mid D}^{kj}(x, y_2, z_2)$ for $|y_i| = r_i(k,j)$. Then $(P, r_D + r_{f \mid D}, (a_D, a_{f \mid D}))$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(\chi_D f, \mu)$.
Theorem A.6
Fix $h$ a polynomial s.t. $2^{-h} \in \Delta$. Consider $f : \{0,1\}^* \to [0,1]$, $D \subseteq \{0,1\}^*$ and $\mu$ a word ensemble. Assume $\exists \epsilon > 0 \, \forall k : \mu^k(D) \geq \epsilon$. Assume $(P_D, r_D, a_D)$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(D,\mu)$ and $(P_{\chi_D f}, r_{\chi_D f}, a_{\chi_D f})$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(\chi_D f, \mu)$. Define $P_{f \mid D} : \mathbb{N}^2 \times \{0,1\}^{*3} \xrightarrow{\mathrm{alg}} [0,1]$ by
$$P_{f \mid D}^{kj}(x, y_1 y_2, (z_1, z_2)) := \begin{cases} 1 & \text{if } P_D^{kj}(x, y_2, z_2) = 0 \\ \eta\!\left(\frac{P_{\chi_D f}^{kj}(x, y_1, z_1)}{P_D^{kj}(x, y_2, z_2)}\right) \text{ rounded to } h(k,j) \text{ binary places} & \text{if } P_D^{kj}(x, y_2, z_2) > 0 \end{cases}$$
Then $(P_{f \mid D}, r_D + r_{\chi_D f}, (a_{\chi_D f}, a_D))$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f, \mu \mid D)$.
Definition A.1
Consider $\mu$ a word ensemble and $\hat{Q}_1 := (Q_1, s_1, b_1)$, $\hat{Q}_2 := (Q_2, s_2, b_2)$ $(\mathrm{poly},\log)$-predictor schemes. We say $\hat{Q}_1$ is $\Delta$-similar to $\hat{Q}_2$ relative to $\mu$ (denoted $\hat{Q}_1 \simeq^\mu_\Delta \hat{Q}_2$) when $\mathrm{E}_{\mu^k \times U^{s_1(k,j)} \times U^{s_2(k,j)}}[(Q_1^{kj}(x, y_1, b_1^{kj}) - Q_2^{kj}(x, y_2, b_2^{kj}))^2] \in \Delta$.
Theorem A.7
Consider $(f,\mu)$ a distributional estimation problem, $\hat{P}$ a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f,\mu)$ and $\hat{Q}$ a $(\mathrm{poly},\log)$-predictor scheme. Then $\hat{Q}$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f,\mu)$ if and only if $\hat{P} \simeq^\mu_\Delta \hat{Q}$.
Note A.1
$\Delta$-similarity is not an equivalence relation on the set of arbitrary $(\mathrm{poly},\log)$-predictor schemes. However, it is an equivalence relation on the set of $(\mathrm{poly},\log)$-predictor schemes $\hat{Q}$ satisfying $\hat{Q} \simeq^\mu_\Delta \hat{Q}$ (i.e. the $\mu$-expectation value of the intrinsic variance of $\hat{Q}$ is in $\Delta$). In particular, for any $f : \{0,1\}^* \to [0,1]$, any $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f,\mu)$ has this property.
Appendix B
Definition B.1
Given $n \in \mathbb{N}$, a function $\delta : \mathbb{N}^{2+n} \to \mathbb{R}^{\geq 0}$ is called $\Delta$-moderate when
(i) $\delta$ is non-decreasing in arguments $3$ to $2+n$.
(ii) For any collection of polynomials $\{p_i : \mathbb{N}^2 \to \mathbb{N}\}_{i < n}$, $\delta(k, j, p_0(k,j) \ldots p_{n-1}(k,j)) \in \Delta$
Lemma B.1
Fix $(f,\mu)$ a distributional estimation problem and $\hat{P} := (P,r,a)$ a $(\mathrm{poly},\log)$-predictor scheme. Then $\hat{P}$ is $\Delta(\mathrm{poly},\log)$-optimal iff there is a $\Delta$-moderate function $\delta : \mathbb{N}^4 \to [0,1]$ s.t. for any $k,j,s \in \mathbb{N}$ and $Q : \{0,1\}^{*2} \xrightarrow{\mathrm{alg}} [0,1]$
$$\mathrm{E}_{\mu^k \times U^{r(k,j)}}[(P^{kj}(x,y,a^{kj}) - f(x))^2] \leq \mathrm{E}_{\mu^k \times U^s}[(Q(x,y) - f(x))^2] + \delta(k,j, T^\mu_Q(k,s), 2^{|Q|})$$
Proof of Lemma B.1
Define
$$\delta(k,j,t,u) := \max_{\substack{T^\mu_Q(k,s) \leq t \\ |Q| \leq \log u}} \left\{ \mathrm{E}_{\mu^k \times U^{r(k,j)}}[(P^{kj}(x,y,a^{kj}) - f(x))^2] - \mathrm{E}_{\mu^k \times U^s}[(Q(x,y) - f(x))^2] \right\}$$
This $\delta$ is non-decreasing in $t$ and $u$ by construction; it is $\Delta$-moderate since, for any polynomial bounds on $t$ and $u$, a worst-case $Q$ of length at most $\log u$ can be supplied as advice to a $(\mathrm{poly},\log)$-predictor scheme against which $\hat{P}$ is optimal. The converse direction is immediate.
Note B.1
Lemma B.1 shows that the error bound for a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme is in some sense uniform with respect to $Q$. This doesn't generalize to e.g. $\Delta(\mathrm{poly},O(1))$-optimal predictor schemes. The latter still admit a weaker version of Theorem A.1 and direct analogues of Theorems A.2, A.3, A.5, A.6 and A.7. Theorem A.4 doesn't seem to generalize.
Lemma B.2
Suppose there is a polynomial $h : \mathbb{N}^2 \to \mathbb{N}$ s.t. $h^{-1} \in \Delta$. Fix $(f,\mu)$ a distributional estimation problem and $(P,r,a)$ a corresponding $\Delta(\mathrm{poly},\log)$-optimal predictor scheme. Consider $(Q,s,b)$ a $(\mathrm{poly},\log)$-predictor scheme, $M > 0$, $w : \mathbb{N}^2 \times \{0,1\}^{*3} \xrightarrow{\mathrm{alg}} \mathbb{Q} \cap [0,M]$ with runtime bounded by a polynomial in the first two arguments, and $u : \mathbb{N}^2 \to \{0,1\}^*$ of logarithmic size. Then there is $\delta \in \Delta$ s.t.
$$\mathrm{E}_{\mu^k \times U^{\max(r(k,j),s(k,j))}}[w^{kj}(x,y,u^{kj})(P^{kj}(x, y_{\leq r(k,j)}, a^{kj}) - f(x))^2] \leq \mathrm{E}_{\mu^k \times U^{\max(r(k,j),s(k,j))}}[w^{kj}(x,y,u^{kj})(Q^{kj}(x, y_{\leq s(k,j)}, b^{kj}) - f(x))^2] + \delta(k,j)$$
Proof of Lemma B.2
Given $t \in [0,M]$, define $\alpha_{kj}(t)$ to be $t$ rounded within error $h(k,j)^{-1}$. Thus, the number of digits in $\alpha_{kj}(t)$ is logarithmic in $k$ and $j$. Denote $q(k,j) := \max(r(k,j), s(k,j))$. Consider $\hat{Q}_t := (Q_t, r+s, b_t)$ the $(\mathrm{poly},\log)$-predictor scheme defined by
$$Q_t^{kj}(x, y, b_t^{kj}) := \begin{cases} Q^{kj}(x, y_{\leq s(k,j)}, b^{kj}) & \text{if } w^{kj}(x, y_{\leq q(k,j)}, u^{kj}) \geq \alpha_{kj}(t) \\ P^{kj}(x, y_{\leq r(k,j)}, a^{kj}) & \text{if } w^{kj}(x, y_{\leq q(k,j)}, u^{kj}) < \alpha_{kj}(t) \end{cases}$$
$\hat{Q}_t$ satisfies bounds on runtime and advice size uniform in $t$. Therefore, Lemma B.1 implies that there is $\delta \in \Delta$ s.t.
$$\mathrm{E}_{\mu^k \times U^{r(k,j)}}[(P^{kj}(x,y,a^{kj}) - f(x))^2] \leq \mathrm{E}_{\mu^k \times U^{r(k,j)+s(k,j)}}[(Q_t^{kj}(x,y,b_t^{kj}) - f(x))^2] + \delta(k,j)$$
$$\mathrm{E}_{\mu^k \times U^{r(k,j)+s(k,j)}}[(P^{kj}(x, y_{\leq r(k,j)}, a^{kj}) - f(x))^2 - (Q_t^{kj}(x,y,b_t^{kj}) - f(x))^2] \leq \delta(k,j)$$
$$\mathrm{E}_{\mu^k \times U^{q(k,j)}}[\theta(w^{kj}(x,y,u^{kj}) - \alpha_{kj}(t))((P^{kj}(x, y_{\leq r(k,j)}, a^{kj}) - f(x))^2 - (Q^{kj}(x, y_{\leq s(k,j)}, b^{kj}) - f(x))^2)] \leq \delta(k,j)$$
$$\mathrm{E}_{\mu^k \times U^{q(k,j)}}\left[\int_0^M \theta(w^{kj}(x,y,u^{kj}) - \alpha_{kj}(t))\, dt\, ((P^{kj}(x, y_{\leq r(k,j)}, a^{kj}) - f(x))^2 - (Q^{kj}(x, y_{\leq s(k,j)}, b^{kj}) - f(x))^2)\right] \leq M\delta(k,j)$$
$$\mathrm{E}_{\mu^k \times U^{q(k,j)}}[w^{kj}(x,y,u^{kj})((P^{kj}(x, y_{\leq r(k,j)}, a^{kj}) - f(x))^2 - (Q^{kj}(x, y_{\leq s(k,j)}, b^{kj}) - f(x))^2)] \leq M\delta(k,j) + h(k,j)^{-1}$$
In the following proofs we will use shorthand notation that omits most of the symbols that are clear from the context. That is, we will write $P$ for $P^{kj}(x,y,a^{kj})$, $f$ for $f(x)$, $\mathrm{E}[\ldots]$ for $\mathrm{E}_{\mu^k \times U^{r(k,j)}}[\ldots]$ etc.
Proof of Theorem A.1
Define $w : \mathbb{N}^2 \times \{0,1\}^{*3} \xrightarrow{\mathrm{alg}} \{0,1\}$ and $u : \mathbb{N}^2 \to \{0,1\}^*$ by
$$w := \theta(P - p)\theta(q - P)$$
Since $w$ is the indicator of the event $\{p_{kj} \leq P \leq q_{kj}\}$, we have
$$\phi = \frac{\mathrm{E}[w(f - P)]}{\mathrm{E}[w]}$$
Define $\psi$ to be $\phi$ truncated to the first significant binary digit. Denote $I \subseteq \mathbb{N}^2$ the set of $(k,j)$ for which $|\phi_{kj}| > h(k,j)^{-1}$. Consider $(Q,s,b)$ a $(\mathrm{poly},\log)$-predictor scheme satisfying
$$\forall (k,j) \in I : Q^{kj} = \eta(P^{kj} + \psi_{kj})$$
Such $Q$ exists since for $(k,j) \in I$, $\psi_{kj}$ has a binary representation of logarithmically bounded size.
Applying Lemma B.2 we get
$$\forall (k,j) \in I : \mathrm{E}[w^{kj}(P^{kj} - f)^2] \leq \mathrm{E}[w^{kj}(Q^{kj} - f)^2] + \delta(k,j)$$
for some $\delta \in \Delta$. Equivalently,
$$\forall (k,j) \in I : \mathrm{E}[w^{kj}((P^{kj} - f)^2 - (Q^{kj} - f)^2)] \leq \delta(k,j)$$
$$\forall (k,j) \in I : \mathrm{E}[w^{kj}((P^{kj} - f)^2 - (\eta(P^{kj} + \psi_{kj}) - f)^2)] \leq \delta(k,j)$$
Obviously $(\eta(P^{kj} + \psi_{kj}) - f)^2 \leq (P^{kj} + \psi_{kj} - f)^2$, therefore
$$\forall (k,j) \in I : \mathrm{E}[w^{kj}((P^{kj} - f)^2 - (P^{kj} + \psi_{kj} - f)^2)] \leq \delta(k,j)$$
$$\forall (k,j) \in I : \psi_{kj}\mathrm{E}[w^{kj}(2(f - P^{kj}) - \psi_{kj})] \leq \delta(k,j)$$
The expression on the left-hand side is a quadratic polynomial in $\psi_{kj}$ which attains its maximum at $\phi_{kj}$ and has roots at $0$ and $2\phi_{kj}$. $\psi_{kj}$ lies between $0$ and $\phi_{kj}$, but not closer to $0$ than $\frac{\phi_{kj}}{2}$. Therefore, the inequality is preserved if we replace $\psi_{kj}$ by $\frac{\phi_{kj}}{2}$.
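To make the algebra behind this step explicit (a routine expansion, spelled out for convenience): using $\phi_{kj} = \frac{\mathrm{E}[w^{kj}(f - P^{kj})]}{\mathrm{E}[w^{kj}]}$,
$$\psi_{kj}\mathrm{E}[w^{kj}(2(f - P^{kj}) - \psi_{kj})] = \mathrm{E}[w^{kj}](2\phi_{kj}\psi_{kj} - \psi_{kj}^2)$$
a downward parabola in $\psi_{kj}$ with roots $0$ and $2\phi_{kj}$ and vertex at $\phi_{kj}$; on the segment between $\frac{\phi_{kj}}{2}$ and $\phi_{kj}$ it is smallest at $\frac{\phi_{kj}}{2}$.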
$$\forall (k,j) \in I : \frac{\phi_{kj}}{2}\mathrm{E}\left[w^{kj}\left(2(f - P^{kj}) - \frac{\phi_{kj}}{2}\right)\right] \leq \delta(k,j)$$
Substituting the equation for $\phi_{kj}$ we get
$$\forall (k,j) \in I : \frac{1}{2}\frac{\mathrm{E}[w^{kj}(f - P^{kj})]}{\mathrm{E}[w^{kj}]}\mathrm{E}\left[w^{kj}\left(2(f - P^{kj}) - \frac{1}{2}\frac{\mathrm{E}[w^{kj}(f - P^{kj})]}{\mathrm{E}[w^{kj}]}\right)\right] \leq \delta(k,j)$$
$$\forall (k,j) \in I : \frac{3}{4}\frac{\mathrm{E}[w^{kj}(f - P^{kj})]^2}{\mathrm{E}[w^{kj}]} \leq \delta(k,j)$$
$$\forall (k,j) \in I : \frac{3}{4}\mathrm{E}[w^{kj}]\phi_{kj}^2 \leq \delta(k,j)$$
$$\forall (k,j) \in I : \phi_{kj}^2 \leq \frac{4}{3}\mathrm{E}[w^{kj}]^{-1}\delta(k,j)$$
$$\forall (k,j) \in I : \phi_{kj}^2 \leq \frac{4}{3}(\mu^k \times U^{r(k,j)})\{p_{kj} \leq P^{kj} \leq q_{kj}\}^{-1}\delta(k,j)$$
Thus for all $k,j \in \mathbb{N}$ we have
$$|\phi_{kj}| \leq h(k,j)^{-1} + \sqrt{\frac{4}{3}(\mu^k \times U^{r(k,j)})\{p_{kj} \leq P^{kj} \leq q_{kj}\}^{-1}\delta(k,j)}$$
In particular, $|\phi| \in \Delta$.
Lemma B.3
Consider $(f,\mu)$ a distributional estimation problem and $(P,r,a)$ a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f,\mu)$. Then there are $c_1, c_2 \in \mathbb{R}$ and a $\Delta$-moderate function $\delta : \mathbb{N}^4 \to [0,1]$ s.t. for any $k,j,s \in \mathbb{N}$ and $Q : \{0,1\}^{*2} \xrightarrow{\mathrm{alg}} \mathbb{Q}$
$$|\mathrm{E}_{\mu^k \times U^s \times U^{r(k,j)}}[Q(P^{kj} - f)]| \leq (c_1 + c_2\mathrm{E}_{\mu^k \times U^s}[Q^2])\,\delta(k,j, T^\mu_Q(k,s), 2^{|Q|})$$
Conversely, consider $M \in \mathbb{Q}$ and $(P,r,a)$ a $\mathbb{Q} \cap [-M,+M]$-valued $(\mathrm{poly},\log)$-bischeme. Suppose that for any $\mathbb{Q} \cap [-M-1,+M]$-valued $(\mathrm{poly},\log)$-bischeme $(Q,s,b)$ we have $|\mathrm{E}[Q(P - f)]| \in \Delta$.
Define $\tilde{P}$ to be s.t. computing $\tilde{P}^{kj}$ is equivalent to computing $\eta(P^{kj})$ rounded to $h(k,j)$ digits after the binary point, where $2^{-h} \in \Delta$. Then $\tilde{P}$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme for $(f,\mu)$.
Proof of Lemma B.3
Assume $P$ is a $\Delta(\mathrm{poly},\log)$-optimal predictor scheme. Consider $k,j,s \in \mathbb{N}$ and $Q : \{0,1\}^{*2} \xrightarrow{\mathrm{alg}} \mathbb{Q}$. Define $t := \sigma 2^{-a}$ where $\sigma \in \{\pm 1\}$ and $a \in \mathbb{N}$. Define $R : \{0,1\}^{*2} \xrightarrow{\mathrm{alg}} [0,1]$ to compute $\eta(P + tQ)$ rounded within error $2^{-h}$. By Lemma B.1
$$\mathrm{E}_{\mu^k \times U^{r(k,j)}}[(P^{kj} - f)^2] \leq \mathrm{E}_{\mu^k \times U^{r(k,j)} \times U^s}[(R - f)^2] + \tilde{\delta}(k,j, T^\mu_R(k, r(k,j)+s), 2^{|R|})$$
where $\tilde{\delta}$ is $\Delta$-moderate. It follows that
$$\mathrm{E}_{\mu^k \times U^{r(k,j)}}[(P^{kj} - f)^2] \leq \mathrm{E}_{\mu^k \times U^{r(k,j)} \times U^s}[(\eta(P + tQ) - f)^2] + \delta(k,j, T^\mu_Q(k,s), 2^{|Q|})$$
where $\delta$ is $\Delta$-moderate ($a$ doesn't enter the error bound because of the $2^{-h}$ rounding). As in the proof of Theorem A.1, $\eta$ can be dropped:
$$\mathrm{E}_{\mu^k \times U^{r(k,j)} \times U^s}[(P^{kj} - f)^2 - (P^{kj} + tQ - f)^2] \leq \delta(k,j, T^\mu_Q(k,s), 2^{|Q|})$$
The expression on the left-hand side is a quadratic polynomial in $t$. Explicitly:
$$-\mathrm{E}_{\mu^k \times U^s}[Q^2]\, t^2 - 2\mathrm{E}_{\mu^k \times U^{r(k,j)} \times U^s}[Q(P^{kj} - f)]\, t \leq \delta(k,j, T^\mu_Q(k,s), 2^{|Q|})$$
Moving $\mathrm{E}_{\mu^k \times U^s}[Q^2]\, t^2$ to the right-hand side and dividing both sides by $2|t| = 2^{1-a}$ we get
$$-\mathrm{E}_{\mu^k \times U^{r(k,j)} \times U^s}[Q(P^{kj} - f)]\, \sigma \leq 2^{a-1}\delta(k,j, T^\mu_Q(k,s), 2^{|Q|}) + \mathrm{E}_{\mu^k \times U^s}[Q^2]\, 2^{-a-1}$$
Since $\sigma$ can take either sign,
$$|\mathrm{E}_{\mu^k \times U^{r(k,j)} \times U^s}[Q(P^{kj} - f)]| \leq 2^{a-1}\delta(k,j, T^\mu_Q(k,s), 2^{|Q|}) + \mathrm{E}_{\mu^k \times U^s}[Q^2]\, 2^{-a-1}$$
Take $a := -\frac{1}{2}\log \delta(k,j, T^\mu_Q(k,s), 2^{|Q|}) + \phi(k,j)$ where $\phi(k,j) \in [-\frac{1}{2}, +\frac{1}{2}]$ is the rounding error needed to make $a$ an integer. We get
$$|\mathrm{E}_{\mu^k \times U^{r(k,j)} \times U^s}[Q(P^{kj} - f)]| \leq 2^{\phi(k,j)-1}\delta(k,j, T^\mu_Q(k,s), 2^{|Q|})^{\frac{1}{2}} + \mathrm{E}_{\mu^k \times U^s}[Q^2]\, 2^{-\phi(k,j)-1}\delta(k,j, T^\mu_Q(k,s), 2^{|Q|})^{\frac{1}{2}}$$
Since $\phi(k,j) \in [-\frac{1}{2}, +\frac{1}{2}]$, both coefficients $2^{\phi(k,j)-1}$ and $2^{-\phi(k,j)-1}$ are at most $1$, which gives the required bound.
Conversely, assume that for any $\mathbb{Q} \cap [-M-1,+M]$-valued $(\mathrm{poly},\log)$-bischeme $(R,t,c)$
$$|\mathrm{E}[R(P - f)]| \leq \delta$$
Consider $(Q,s,b)$ a $(\mathrm{poly},\log)$-predictor scheme. We have
$$\mathrm{E}[(Q - f)^2] = \mathrm{E}[(Q - P + P - f)^2]$$
$$\mathrm{E}[(Q - f)^2] = \mathrm{E}[(Q - P)^2] + \mathrm{E}[(P - f)^2] + 2\mathrm{E}[(Q - P)(P - f)]$$
$$2\mathrm{E}[(P - Q)(P - f)] = \mathrm{E}[(P - f)^2] - \mathrm{E}[(Q - f)^2] + \mathrm{E}[(Q - P)^2]$$
Taking $R$ to be $P - Q$ we get
$$\mathrm{E}[(P - f)^2] - \mathrm{E}[(Q - f)^2] + \mathrm{E}[(Q - P)^2] \leq \delta$$
where $\delta \in \Delta$. Noting that $\mathrm{E}[(Q - P)^2] \geq 0$ and $(\eta(P) - f)^2 \leq (P - f)^2$ we get
$$\mathrm{E}[(\eta(P) - f)^2] - \mathrm{E}[(Q - f)^2] \leq \delta$$
Observing that $\tilde{P} - \eta(P)$ is bounded by a function in $\Delta$, we get the desired result.
Theorems A.2 and A.3 follow trivially from Lemma B.3, and we omit the proofs.
Proof of Theorem A.4
We have
$$P(x_1, x_2) - (f_1 \times f_2)(x_1, x_2) = (P_1(x_1) - f_1(x_1))f_2(x_2) + P_1(x_1)(P_2(x_2) - f_2(x_2))$$
Therefore, for any $\mathbb{Q} \cap [-1,+1]$-valued $(\mathrm{poly},\log)$-bischeme $(Q,s,b)$
$$|\mathrm{E}[Q(P - f_1 \times f_2)]| \leq |\mathrm{E}[Q(x_1,x_2)(P_1(x_1) - f_1(x_1))f_2(x_2)]| + |\mathrm{E}[Q(x_1,x_2)P_1(x_1)(P_2(x_2) - f_2(x_2))]|$$
By Lemma B.3, it is sufficient to show an appropriate bound for each of the terms on the right-hand side. Suppose $(S_2, F_2, r_{S_2}, a_{S_2})$ is a $\Delta_1(\log)$-generator for $(f_2, \mu_2)$. For the first term, we have
$$|\mathrm{E}_{\mu_1^k \times \mu_2^k \times U^{s(k,j)+r_1(k,j)}}[Q^{kj}(x_1, x_2)(P_1^{kj}(x_1) - f_1(x_1))f_2(x_2)]| \leq |\mathrm{E}_{\mu_1^k \times U^{r_{S_2}(k)} \times U^{s(k,j)+r_1(k,j)}}[Q^{kj}(x_1, S_2^k)(P_1^{kj}(x_1) - f_1(x_1))F_2^k]| + \delta_{12}(k)$$
where $\delta_{12} \in \Delta_1$. Applying Lemma B.3 for $P_1$, we get
$$|\mathrm{E}_{\mu_1^k \times \mu_2^k \times U^{s(k,j)+r_1(k,j)}}[Q^{kj}(x_1, x_2)(P_1^{kj}(x_1) - f_1(x_1))f_2(x_2)]| \leq \delta_1(k,j) + \delta_{12}(k)$$
where $\delta_1 \in \Delta$.
Suppose $(S_1, r_{S_1}, a_{S_1})$ is a $\Delta_1(\log)$-sampler for $\mu_1$. For the second term, we have
$$|\mathrm{E}_{\mu_1^k \times \mu_2^k \times U^{s(k,j)+r_1(k,j)}}[Q^{kj}(x_1, x_2)P_1(x_1)(P_2^{kj}(x_2) - f_2(x_2))]| \leq |\mathrm{E}_{U^{r_{S_1}(k)} \times \mu_2^k \times U^{s(k,j)+r_1(k,j)}}[Q^{kj}(S_1^k, x_2)P_1(S_1^k)(P_2^{kj}(x_2) - f_2(x_2))]| + \delta_{11}(k)$$
where $\delta_{11} \in \Delta_1$. Applying Lemma B.3 for $P_2$, we get
$$|\mathrm{E}_{\mu_1^k \times \mu_2^k \times U^{s(k,j)+r_1(k,j)}}[Q^{kj}(x_1, x_2)P_1(x_1)(P_2^{kj}(x_2) - f_2(x_2))]| \leq \delta_2(k,j) + \delta_{11}(k)$$
where $\delta_2 \in \Delta$. Again, we get the required bound.
Appendix C
Proposition C.1
Consider a polynomial $q : \mathbb{N}^2 \to \mathbb{N}$. There is a function $\lambda_q : \mathbb{N}^3 \to [0,1]$ s.t.
(i) $\forall k,j \in \mathbb{N} : \sum_{i \in \mathbb{N}} \lambda_q(k,j,i) = 1$
(ii) For any function $\epsilon : \mathbb{N}^2 \to [0,1]$ we have
$$\epsilon(k,j) - \sum_{i \in \mathbb{N}} \lambda_q(k,j,i)\, \epsilon(k, q(k,j) + i) \in \Delta^{\mathrm{avg}}_2$$
Proof of Proposition C.1
Given functions $q_1, q_2 : \mathbb{N}^2 \to \mathbb{N}$ s.t. $q_1(k,j) \geq q_2(k,j)$ for $k,j \gg 0$, the proposition for $q_1$ implies the proposition for $q_2$ by setting
$$\lambda_{q_2}(k,j,i) := \begin{cases} \lambda_{q_1}(k,j,\, i - q_1(k,j) + q_2(k,j)) & \text{if } i - q_1(k,j) + q_2(k,j) \geq 0 \\ 0 & \text{if } i - q_1(k,j) + q_2(k,j) < 0 \end{cases}$$
Therefore, it is enough to prove the proposition for functions of the form $q(k,j) = j^{m + n\frac{\log k}{\log 3}}$ for $m > 0$.
Consider $F : \mathbb{N} \to \mathbb{N}$ s.t.
$$\lim_{k \to \infty} \frac{\log\log k}{\log\log F(k)} = 0$$
Observe that
$$\lim_{k \to \infty} \frac{\log\left(m + n\frac{\log k}{\log 3}\right)}{\log\log F(k) - \log\log 3} = 0$$
$$\lim_{k \to \infty} \frac{\int_{x=3}^{3^{m + n\frac{\log k}{\log 3}}} d(\log\log x)}{\log\log F(k) - \log\log 3} = 0$$
Since $\epsilon$ takes values in $[0,1]$,
$$\lim_{k \to \infty} \frac{\int_{x=3}^{3^{m + n\frac{\log k}{\log 3}}} \epsilon(k, \lfloor x \rfloor)\, d(\log\log x)}{\log\log F(k) - \log\log 3} = 0$$
Similarly
$$\lim_{k \to \infty} \frac{\int_{x=F(k)}^{F(k)^{m + n\frac{\log k}{\log 3}}} \epsilon(k, \lfloor x \rfloor)\, d(\log\log x)}{\log\log F(k) - \log\log 3} = 0$$
The last two equations imply that
$$\lim_{k \to \infty} \frac{\int_{x=3}^{F(k)} \epsilon(k, \lfloor x \rfloor)\, d(\log\log x) - \int_{x=3^{m+n\frac{\log k}{\log 3}}}^{F(k)^{m+n\frac{\log k}{\log 3}}} \epsilon(k, \lfloor x \rfloor)\, d(\log\log x)}{\log\log F(k) - \log\log 3} = 0$$
Raising $x$ to a power is equivalent to adding a constant to $\log\log x$ (indeed, $\log\log x^c = \log c + \log\log x$), therefore
$$\lim_{k \to \infty} \frac{\int_{x=3}^{F(k)} \epsilon(k, \lfloor x \rfloor)\, d(\log\log x) - \int_{x=3}^{F(k)} \epsilon(k, \lfloor x^{m+n\frac{\log k}{\log 3}} \rfloor)\, d(\log\log x)}{\log\log F(k) - \log\log 3} = 0$$
$$\lim_{k \to \infty} \frac{\int_{x=3}^{F(k)} \left(\epsilon(k, \lfloor x \rfloor) - \epsilon(k, \lfloor x^{m+n\frac{\log k}{\log 3}} \rfloor)\right) d(\log\log x)}{\log\log F(k) - \log\log 3} = 0$$
Since $\lfloor x^{m+n\frac{\log k}{\log 3}} \rfloor \geq \lfloor x \rfloor^{m+n\frac{\log k}{\log 3}}$, we can choose $\lambda_q$ satisfying condition (i) so that
$$\int_{x=j}^{j+1} \epsilon(k, \lfloor x^{m+n\frac{\log k}{\log 3}} \rfloor)\, d(\log\log x) = (\log\log(j+1) - \log\log j) \sum_i \lambda_q(k,j,i)\, \epsilon(k, j^{m+n\frac{\log k}{\log 3}} + i)$$
It follows that
$$\int_{x=j}^{j+1} \epsilon(k, \lfloor x^{m+n\frac{\log k}{\log 3}} \rfloor)\, d(\log\log x) = \int_{x=j}^{j+1} \sum_i \lambda_q(k, \lfloor x \rfloor, i)\, \epsilon(k, \lfloor x \rfloor^{m+n\frac{\log k}{\log 3}} + i)\, d(\log\log x)$$
$$\lim_{k \to \infty} \frac{\int_{x=3}^{F(k)} \left(\epsilon(k, \lfloor x \rfloor) - \sum_i \lambda_q(k, \lfloor x \rfloor, i)\, \epsilon(k, \lfloor x \rfloor^{m+n\frac{\log k}{\log 3}} + i)\right) d(\log\log x)}{\log\log F(k) - \log\log 3} = 0$$
$$\lim_{k \to \infty} \frac{\sum_{j=3}^{F(k)-1} (\log\log(j+1) - \log\log j) \left(\epsilon(k,j) - \sum_i \lambda_q(k,j,i)\, \epsilon(k, j^{m+n\frac{\log k}{\log 3}} + i)\right)}{\log\log F(k) - \log\log 3} = 0$$
$$\epsilon(k,j) - \sum_{i \in \mathbb{N}} \lambda_q(k,j,i)\, \epsilon(k, q(k,j) + i) \in \Delta^{\mathrm{avg}}_2$$
Lemma C.1
Consider $(f,\mu)$ a distributional estimation problem and $(P,r,a)$, $(Q,s,b)$ $(\mathrm{poly},\log)$-predictor schemes. Suppose $p : \mathbb{N}^2 \to \mathbb{N}$ a polynomial and $\delta \in \Delta^{\mathrm{avg}}_2$ are s.t.
$$\forall i,k,j \in \mathbb{N} : \mathrm{E}[(P^{k,p(k,j)+i} - f)^2] \leq \mathrm{E}[(Q^{kj} - f)^2] + \delta(k,j)$$
Then there is $\delta' \in \Delta^{\mathrm{avg}}_2$ s.t.
$$\mathrm{E}[(P^{kj} - f)^2] \leq \mathrm{E}[(Q^{kj} - f)^2] + \delta'(k,j)$$
Proof of Lemma C.1
By Proposition C.1 we have
$$\tilde{\delta}(k,j) := \mathrm{E}[(P^{kj} - f)^2] - \sum_i \lambda_p(k,j,i)\,\mathrm{E}[(P^{k,p(k,j)+i} - f)^2] \in \Delta^{\mathrm{avg}}_2$$
$$\mathrm{E}[(P^{kj} - f)^2] = \sum_i \lambda_p(k,j,i)\,\mathrm{E}[(P^{k,p(k,j)+i} - f)^2] + \tilde{\delta}(k,j)$$
$$\mathrm{E}[(P^{kj} - f)^2] \leq \sum_i \lambda_p(k,j,i)\,(\mathrm{E}[(Q^{kj} - f)^2] + \delta(k,j)) + \tilde{\delta}(k,j)$$
$$\mathrm{E}[(P^{kj} - f)^2] \leq \mathrm{E}[(Q^{kj} - f)^2] + \delta(k,j) + \tilde{\delta}(k,j)$$
Proof of Theorem 1
Define $\epsilon(k,j)$ by
$$\epsilon(k,j) := \mathrm{E}_{\mu^k \times U^j}[(\Upsilon^{kj}(x, y, \upsilon^{kj}_{f,\mu}) - f(x))^2]$$
It is easily seen that
$$\epsilon(k,j) \leq \min_{\substack{|Q| \leq \log j \\ T^\mu_Q(k,j) \leq j}} \mathrm{E}_{\mu^k \times U^j}[(Q(x,y) - f(x))^2]$$
Therefore, there is a polynomial $p : \mathbb{N}^3 \to \mathbb{N}$ s.t. for any $(\mathrm{poly},\log)$-predictor scheme $(Q,s,b)$
$$\forall i,j,k \in \mathbb{N} : \epsilon(k, p(s(k,j), T^\mu_{Q^{kj}}(k, s(k,j)), 2^{|Q|+|b^{kj}|}) + i) \leq \mathrm{E}_{\mu^k \times U^{s(k,j)}}[(Q^{kj} - f)^2]$$
Applying Lemma C.1, we get the desired result.
Proof of Theorem 2
Consider $(P,r,a)$ a $(\mathrm{poly},\log)$-predictor scheme. Choose $p : \mathbb{N}^2 \to \mathbb{N}$ a polynomial s.t. evaluating $\Lambda[G]^{k,p(k,j)}$ involves running $P^{kj}$ until it halts "naturally" (such $p$ exists because $P$ runs in at most polynomial time and has at most logarithmic advice). Given $i,j,k \in \mathbb{N}$, consider the execution of $\Lambda[G]^{k,p(k,j)+i}$. The standard deviation of $\epsilon(P^{kj})$ with respect to the internal coin tosses of $\Lambda$ is at most $((p(k,j)+i)k)^{-\frac{1}{2}}$. The expectation value is $\mathrm{E}[(P^{kj} - f)^2] + \gamma_P$ where $|\gamma_P| \leq \delta(k)$ for $\delta \in \Delta^0_1$ which doesn't depend on $i,k,j,P$. By Chebyshev's inequality,
$$\Pr[\epsilon(P^{kj}) \geq \mathrm{E}[(P^{kj} - f)^2] + \delta(k) + ((p(k,j)+i)k)^{-\frac{1}{4}}] \leq ((p(k,j)+i)k)^{-\frac{1}{2}}$$
Hence
$$\Pr[\epsilon(Q^*) \geq \mathrm{E}[(P^{kj} - f)^2] + \delta(k) + ((p(k,j)+i)k)^{-\frac{1}{4}}] \leq ((p(k,j)+i)k)^{-\frac{1}{2}}$$
The standard deviation of $\epsilon(Q)$ for any $Q$ is also at most $((p(k,j)+i)k)^{-\frac{1}{2}}$. The expectation value is $\mathrm{E}[(\mathrm{ev}^{p(k,j)+i}(Q) - f)^2] + \gamma_Q$ where $|\gamma_Q| \leq \delta(k)$. Therefore
$$\Pr[\exists Q < p(k,j)+i : \epsilon(Q) \leq \mathrm{E}[(\mathrm{ev}^{p(k,j)+i}(Q) - f)^2] - \delta(k) - k^{-\frac{1}{4}}] \leq (p(k,j)+i) \cdot (p(k,j)+i)^{-1} k^{-\frac{1}{2}} = k^{-\frac{1}{2}}$$
The extra $p(k,j)+i$ factor comes from summing probabilities over $p(k,j)+i$ programs. Combining, we get
$$\Pr[\mathrm{E}[(\mathrm{ev}^{p(k,j)+i}(Q^*) - f)^2] \geq \mathrm{E}[(P^{kj} - f)^2] + 2\delta(k) + ((p(k,j)+i)^{-\frac{1}{4}} + 1)k^{-\frac{1}{4}}] \leq ((p(k,j)+i)^{-\frac{1}{2}} + 1)k^{-\frac{1}{2}}$$
$$\mathrm{E}[(\Lambda[G]^{k,p(k,j)+i} - f)^2] \leq \mathrm{E}[(P^{kj} - f)^2] + 2\delta(k) + ((p(k,j)+i)^{-\frac{1}{4}} + 1)k^{-\frac{1}{4}} + ((p(k,j)+i)^{-\frac{1}{2}} + 1)k^{-\frac{1}{2}}$$
$$\mathrm{E}[(\Lambda[G]^{k,p(k,j)+i} - f)^2] \leq \mathrm{E}[(P^{kj} - f)^2] + 2\delta(k) + (p(k,j)^{-\frac{1}{4}} + 1)k^{-\frac{1}{4}} + (p(k,j)^{-\frac{1}{2}} + 1)k^{-\frac{1}{2}}$$
Applying Lemma C.1 we get the desired result.