harfe - AI Alignment Forum

Debate, Oracles, and Obfuscated Arguments

This can easily be done in the cryptographic example above: B can sample a new number , and then present $y$ to a fresh copy of A that has not seen the transcript for $x$ so far.

I don't understand how this is supposed to help. I guess the point is to somehow catch a fresh copy of A in a lie about a problem that is different from the original problem, and conclude that A is the dishonest debater?

But couldn't A just answer "I don't know"?

Even if it is a fresh copy, it would notice that it does not know the secret factors, so it could display different behavior than in the $x$ case where A knows the secret factors $p, q$ .

Vanessa Kosoy's Shortform

harfe2mo92

I think there are some subtleties with the (non-infra) bayesian VNM version, which come down to the difference between "extreme point" and "exposed point" of . If a point is an extreme point that is not an exposed point, then it cannot be the unique expected utility maximizer under a utility function (but it can be a non-unique maximizer).

For extreme points it might still work with uniqueness, if, instead of a VNM-decision-maker, we require a slightly weaker decision maker whose preferences satisfy the VNM axioms except continuity.

Vanessa Kosoy's Shortform

harfe2mo82

For any , if $Θ^{*} = Φ \lor Ψ$ then either $Φ \subseteq Ψ$ or $Ψ \subseteq Φ$ .

I think this condition might be too weak and the conjecture is not true under this definition.

If $Φ_{1} \subseteq Φ_{2}$ , then we have $E_{y \sim ξ} {min}_{μ \in Φ_{2}} E_{x \sim μ} u (x, y) \leq E_{y \sim ξ} {min}_{μ \in Φ_{1}} E_{x \sim μ} u (x, y)$ (because a minimum over a larger set is smaller). Thus, $Φ_{2}$ can only be the unique argmax if $Φ_{1} = Φ_{2}$ .

Consider the example $^D={[0,x]:x∈[0,1]}$ . Then $^D$ is closed. And $Θ^{*} = [0, 1]$ satisfies $Θ^{*} = Φ \lor Ψ ⟹ Φ \subseteq Ψ \lor Ψ \subseteq Φ$ . But per the above it cannot be a unique maximizer.

Maybe the issue can be fixed if we strengthen the condition so that $Φ^{*}$ has to be also minimal with respect to $\subseteq$ .

The Learning-Theoretic Agenda: Status 2023

harfe1y30

Regarding direction 17: There might be some potential drawbacks to ADAM. I think its possible that some very agentic programs have relatively low score. This is due to explicit optimization algorithms being low complexity.

(Disclaimer: the following argument is not a proof, and appeals to some heuristics/etc. We fix $M = M_{0}$ for these considerations too.) Consider an utility function $^U$ . Further, consider a computable approximation of the optimal policy (AIXI that explicitly optimizes for $^U$ ) and has an approximation parameter n (this could be AIXI-tl, plus some approximation of $^U$ ; higher $n$ is better approximation). We will call this approximation of the optimal policy $π_{n}^{^U}$ . This approximation algorithm has complexity $K (π_{n}^{^U}) = C + K (^U) + K (n)$ , where $C > 0$ is a constant needed to describe the general algorithm (this should not be too large).

We can get better approximation by using a quickly growing function, such as the Ackermann function with $n = A (k, k)$ . Then we have $K (π_{A (k, k)}^{^U}) = C + K (^U) + K (A (k, k)) \leq C + K (^U) + log (k)$ .

What is the $g$ score of this policy? We have $g (π_{A (k, k)}^{^U}) = {max}_{U} ({min}_{π^{'} : \dots} K (π^{'}) - K (U))$ . Let $¯ U$ be maximal in this expression. If $K (¯ U) \geq K (^U) - C$ , then $g (π_{A (k, k)}^{^U}) = min π^{'} : E_{ζ_{M_{0}} π^{'}} (¯ U) \geq E_{ζ_{M_{0}} π_{A (k, k)}^{^U}} (¯ U) K (π^{'}) - K (¯ U) \leq K (π_{A (k, k)}^{^U}) - K (^U) + C \leq 2 C log (k)$ .

For the other case, let us assume that if $K (¯ U) < K (^U) - C$ , the policy $π_{A (k, k)}^{¯ U}$ is at least as good at maximizing $¯ U$ than $π_{A (k, k)}^{^U)}$ . Then, we have $g (π_{A (k, k)}^{^U}) = min π^{'} : E_{ζ_{M_{0}} π^{'}} (¯ U) \geq E_{ζ_{M_{0}} π_{A (k, k)}^{^U}} (¯ U) K (π^{'}) - K (¯ U) \leq K (π_{A (k, k)}^{¯ U}) - K (¯ U)) \leq C + log (k)$ .

I don't think that the assumption ( $(π_{A (k, k)}^{¯ U}$ maximizes $b a r U$ better than $(π_{A (k, k)}^{^U}$ ) is true for all $^U$ and $k$ , but plausibly we can select $^U$ such that this is the case (exceptions, if they exist, would be a bit weird, and if ADAM working well due to these weird exceptions feels a bit disappointing to me). A thing that is not captured by approximations such as AIXI-tl are programs that halt but have insane runtime (longer than $A (k, k)$ ). Again, it would feel weird to me if ADAM sort of works because of low-complexity extremely-long-running halting programs.

To summarize, maybe there exist policies $π_{A (k, k)}^{^U}$ which strongly optimize a non-trivial utility function $^U$ with approximation parameter $n = A (k, k)$ , but where $g (π_{A (k, k)}^{^U}) \leq 2 C + log (k)$ is relatively small.

Buck's Shortform

harfe1y*20

I am going to assume that in the code, when calculating p_alice_win_given_not_caught, we do not divide the term by two (since this is not that consistent with the description. I am also assuming that $0.25 - t_{2}$ is a typo and $0.5 - t_{2}$ is meant, which would also be more consistent with other stuff). So I am going to assume assume a symmetrical version.

Here, P(Alice wins) is $e^{- λ_{1} t_{1} - λ_{2} t_{2}} - e^{- λ_{1} / 2 - λ_{2} / 2} =: f (λ_{1}, λ_{2}, t_{1}, t_{2})$ . Wlog we can assume $0 < t < 1$ (otherwise Bob will run everything or nothing in shielded mode).

We claim that $((λ_{1}^{*}, λ_{2}^{*}), (t_{1}^{*}, t_{2}^{*})) := ((μ^{*}, μ^{*}), (t / 2, t / 2)$ is a (pure) Nash equilibrium, where $μ^{*} = log t / (t - 1)$ .

To verify, lets first show that Alice cannot make a better choice if Bob plays $(t_{1}^{*}, t_{2}^{*}) = (t / 2, t / 2)$ . We have $f (λ_{1}, λ_{2}, t_{1}^{*}, t_{2}^{*}) = e^{- (λ_{1} + λ_{2}) t / 2} - e^{- (λ_{1} + λ_{2}) / 2}$ . Since this only depends on the sum, we can make the substitution $μ = (λ_{1} + λ_{2}) / 2$ . Thus, we want to maximize $~ f (μ, t) = e^{- μ t} - e^{- μ}$ . We have ${~ f}^{'} (μ, t) = - t e^{- μ t} + e^{- μ} = 0$ . Rearranging, we get $t e^{- μ t} = e^{- μ}$ . Taking logs, we get $log t - μ t = - μ$ . Rearranging, we get $μ = log t / (t - 1)$ . Thus, $μ = μ^{*}$ is the optimal choice. This means, that if Bob sticks to his strategy, Alice cannot do better than $f (μ^{*}, μ^{*}, t / 2, t / 2)$ .

Now, lets show that Bob cannot do better. We have $f (λ_{1}^{*}, λ_{2}^{*}, t_{1}, t_{2}) = e^{- μ^{*} t_{1} - μ^{*} t_{2}} - e^{- μ^{*} / 2 - μ^{*} / 2} = e^{- μ^{*} t} - e^{- μ^{*}}$ . This does not depend on $t_{1}$ and $t_{2}$ anymore, so any choice of $t_{1}$ and $t_{2}$ is optimal if Alice plays $(μ^{*}, μ^{*})$ .

(If I picked the wrong version of the question, and you actually want some symmetry: I suspect that the solution will have similarities, or that in some cases the solution can be obtained by rescaling the problem back into a more symmetric form.)

Thoughts on hardware / compute requirements for AGI

harfe2y31

Nanotech industry-rebuilding comes earlier than von Neumann level? I doubt that. A lot of existing people are close to von Neumann level.

Maybe your argument is that there will be so many AGIs, that they can do Nanotech industry rebuilding while individually being very dumb. But I would then argue that the collective already exceeds von Neumann or large groups of humans in intelligence.

AI ALIGNMENT FORUM
AF

Posts

Wikitag Contributions

Comments