User Comment Replies — AI Alignment Forum

Debate, Oracles, and Obfuscated Arguments

2mo21

This can easily be done in the cryptographic example above: B can sample a new number $y = p^{'} \cdot q^{'}$ , and then present $y$ to a fresh copy of A that has not seen the transcript for $x$ so far.

I don't understand how this is supposed to help. I guess the point is to somehow catch a fresh copy of A in a lie about a problem that is different from the original problem, and conclude that A is the dishonest debater?

But couldn't A just answer "I don't know"?

Even if it is a fresh copy, it would notice that it does not know the secret factors, so it could display different ... (read more)

2Geoffrey Irving2mo

You'd need some coupling argument to know that the problems have related difficulty, so that if A is constantly saying "I don't know" to other similar problems it counts as evidence that A can't reliably know the answer to this one. But to be clear, we don't know how to make this particular protocol go through, since we don't know how to formalise that kind of similarity assumption in a plausibly useful way. We do know a different protocol with better properties (coming soon).

Vanessa Kosoy's Shortform

harfe

2mo92

I think there are some subtleties with the (non-infra) bayesian VNM version, which come down to the difference between "extreme point" and "exposed point" of $D$ . If a point is an extreme point that is not an exposed point, then it cannot be the unique expected utility maximizer under a utility function (but it can be a non-unique maximizer).

For extreme points it might still work with uniqueness, if, instead of a VNM-decision-maker, we require a slightly weaker decision maker whose preferences satisfy the VNM axioms except continuity.

4Vanessa Kosoy2mo

Another excellent catch, kudos. I've really been sloppy with this shortform. I corrected it to say that we can approximate the system arbitrarily well by VNM decision-makers. Although, I think it's also possible to argue that a system that selects a non-exposed point is not quite maximally influential, because it's selecting somethings that's very close to delegating some decision power to chance. Also, maybe this cannot happen when D is the inverse limit of finite sets? (As is the case in sequential decision making with finite action/observation spaces). I'm not sure.

Vanessa Kosoy's Shortform

harfe

2mo82

For any $Φ, Ψ \in^D$ , if $Θ^{*} = Φ \lor Ψ$ then either $Φ \subseteq Ψ$ or $Ψ \subseteq Φ$ .

I think this condition might be too weak and the conjecture is not true under this definition.

If $Φ_{1} \subseteq Φ_{2}$ , then we have $E_{y \sim ξ} {min}_{μ \in Φ_{2}} E_{x \sim μ} u (x, y) \leq E_{y \sim ξ} {min}_{μ \in Φ_{1}} E_{x \sim μ} u (x, y)$ (because a minimum over a larger set is smaller). Thus, $Φ_{2}$ can only be the unique argmax if $Φ_{1} = Φ_{2}$ .

Consider the example $^D={[0,x]:x∈[0,1]}$ . Then $^D$ is closed. And $Θ^{*} = [0, 1]$ satisfies $Θ^{*} = Φ \lor Ψ ⟹ Φ \subseteq Ψ \lor Ψ \subseteq Φ$ . But per the above it cannot be a unique maximizer.

Maybe the issue can be fixed if we strengthen the condition so that $Φ^{*}$ has to be also minimal with res... (read more)

3Vanessa Kosoy2mo

You're absolutely right, good job! I fixed the OP.

The Learning-Theoretic Agenda: Status 2023

harfe

1y30

Regarding direction 17: There might be some potential drawbacks to ADAM. I think its possible that some very agentic programs have relatively low $g$ score. This is due to explicit optimization algorithms being low complexity.

(Disclaimer: the following argument is not a proof, and appeals to some heuristics/etc. We fix $M = M_{0}$ for these considerations too.) Consider an utility function $^U$ . Further, consider a computable approximation of the optimal policy (AIXI that explicitly optimizes for $^U$ ) and has an approximation parameter n (this could be AIXI-tl, plus s... (read more)

3Vanessa Kosoy1y

Yes, this is an important point, of which I am well aware. This is why I expect unbounded-ADAM to only be a toy model. A more realistic ADAM would use a complexity measure that takes computational complexity into account instead of K. For example, you can look at the measure C I defined here. More realistically, this measure should be based on the frugal universal prior.

Buck's Shortform

harfe

1y*20

I am going to assume that in the code, when calculating p_alice_win_given_not_caught, we do not divide the $λ_{2}$ term by two (since this is not that consistent with the description. I am also assuming that $0.25 - t_{2}$ is a typo and $0.5 - t_{2}$ is meant, which would also be more consistent with other stuff). So I am going to assume assume a symmetrical version.

Here, P(Alice wins) is $e^{- λ_{1} t_{1} - λ_{2} t_{2}} - e^{- λ_{1} / 2 - λ_{2} / 2} =: f (λ_{1}, λ_{2}, t_{1}, t_{2})$ . Wlog we can assume $0 < t < 1$ (otherwise Bob will run everything or nothing in shielded mode).

We claim that $((λ_{1}^{*}, λ_{2}^{*}), (t_{1}^{*}, t_{2}^{*})) := ((μ^{*}, μ^{*}), (t / 2, t /$ ... (read more)

2Buck Shlegeris1y

Thanks heaps! I wanted the asymmetric version but concurrently with your comment I figured out how to convert from the asymmetric version to the symmetric version. I'll credit you when I post the writeup that this is a part of, and I might run a draft by you in case you want to point out any errors. :)

Thoughts on hardware / compute requirements for AGI

harfe

2y31

Nanotech industry-rebuilding comes earlier than von Neumann level? I doubt that. A lot of existing people are close to von Neumann level.

Maybe your argument is that there will be so many AGIs, that they can do Nanotech industry rebuilding while individually being very dumb. But I would then argue that the collective already exceeds von Neumann or large groups of humans in intelligence.

1Vladimir Nesov2y

The argument is that once there is an AGI at IQ 130-150 level (not "very dumb", but hardly von Neumann), that's sufficient to autonomously accelerate research using the fact that AGIs have much higher serial speed than humans. This can continue for a long enough time to access research from very distant future, including nanotech for building much better AGI hardware at scale. There is no need for stronger intelligence in order to get there. The motivation for this to happen is the AI safety concern with allowing cognition that's more dangerous than necessary, and any non-straightforward improvements to how AGI thinks create such danger. For LLM-based AGIs, anchoring to human level that's available in the training corpus seems more plausible than for other kinds of AGIs (so that improvement in capability would become less than absolutely straightforward specifically at human level). If AGIs have an opportunity to prevent this AI safety risk, they might be motivated to take that opportinity, which would result in intentional significant delay in further improvement of AGI capabilities. I'm not saying that this is an intuitively self-evident claim, there is a specific reason I'm giving for why I see it as plausible. Even when there is a technical capability to build giant AGIs the size of cities, there is still the necessary intermediate of motive in bridging the gap from capability to actuality.

AI ALIGNMENT FORUM
AF

All of harfe's Comments + Replies