Just this guy, you know?
The distinction between "accidental" and "negligent" is always a bit political. It's a question of assignment of credit/blame for hypothetical worlds, which is pretty much impossible in any real-world causality model.
I do agree that in most discussions, "accident" often implies a single unexpected outcome, rather than a repeated risk profile and multiple moves toward the bad outcome. Even so, if it doesn't reach the level of negligence for any one actor, Eliezer's term "inadequate equilibrium" may be more accurate.
Which means that using a different word will be correctly identified as a desire to shift responsibility from "it's a risk that might happen" to "these entities are bringing that risk on all of us".
Interesting take, but I'll note that these are not acausal, just indirect-causal. Voting is a good example - counts are public, so future voters KNOW how many of their fellow citizens take it seriously enough to participate.
In all of these examples there is a signaling path to future impact, which humans are perhaps over-evolved to focus on.
I really wish you'd included the outside-of-game considerations. The example of what to eat for dinner is OVERWHELMINGLY about the future relationship between the diners, not about the result itself. This is true of all real-world bargaining (where you're making commitments and compromises) - you're giving up some immediate value in order to make future interactions way better.
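A toy sketch of that point, with made-up numbers rather than anything from the post: once you discount-and-sum the future interactions, a strategy that concedes some immediate value can easily beat the one that maximizes tonight's outcome.

```python
# Toy illustration (my own invented numbers): immediate payoff plus a
# discounted stream of future-relationship payoffs.

def total_value(immediate, future_per_round, discount, rounds):
    """Immediate payoff plus the discounted value of future interactions."""
    return immediate + sum(future_per_round * discount**t for t in range(1, rounds + 1))

# "Win" the dinner argument: best result tonight, strained future interactions.
hardball = total_value(immediate=10, future_per_round=2, discount=0.9, rounds=20)

# Concede a bit tonight: worse result now, much smoother future bargaining.
concede = total_value(immediate=6, future_per_round=5, discount=0.9, rounds=20)

print(hardball, concede)  # conceding dominates once the future relationship is priced in
```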
Is there an ELI5 doc about what's "normal" for Oracles, and why they're constrained in that way? The examples I see confuse me because they explore what seem like edge cases, and I'm missing the underlying model that makes those cases critical.
Specifically, when you say "It's only guaranteed to be correct on the actual decision", why does the agent not know what "correct" means for the decision?
I don't follow the half-universe argument. Are you somehow sending the AGI outside of your light-cone? Or have you crafted the AGI's utility function, and altered your own, to not care about the other half? I don't get the model of utility that works for "The only information you have about the other half is your utility."
My conception of utility is that it's a synthetic calculation from observations about the state of the universe, not that it's a thing on its own which can carry information.
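A minimal sketch of that conception (hypothetical names, just to pin down what I mean): utility is computed from observed features of the world, so anything that never shows up in the observations can't move the number.

```python
# Sketch of utility as a synthetic calculation over observed world-state,
# not a free-standing channel that carries information about unobserved regions.
from typing import Mapping

def utility(observed_state: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum over observed features; anything unobserved contributes nothing."""
    return sum(weights[k] * v for k, v in observed_state.items() if k in weights)

# If the other half of the universe never appears in observed_state, it can't
# change this number -- so "your utility tells you about the other half"
# seems to need a different model of utility than this one.
print(utility({"paperclips_here": 3.0}, {"paperclips_here": 1.0, "paperclips_there": 1.0}))
```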
Sorry, I didn't mean to be accusatory in that, only descriptive in a way that I hope will let me understand what you're trying to model/measure as "alignment", with the prerequisite understanding of what the payout matrix indicates. http://cs.brown.edu/courses/cs1951k/lectures/2020/chapters1and2.pdf is one reference, but I'll admit it's baked into my understanding to the point that I don't know where I first saw it. I can't find any references to the other interpretation (that the payouts are something other than a ranking of preferences by each player).
So the questions are: "what DO these payout numbers represent?" and "what other factors go into an agent's decision of which row/column to choose?"
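For concreteness, here's the interpretation I'm working from (a sketch using a standard Prisoner's Dilemma, not your example): each cell holds (row player's payoff, column player's payoff), the numbers matter only as a ranking of outcomes for that player, and a choice of row/column is a best response given those rankings.

```python
# Payoff matrix read as preference rankings: higher number = more preferred
# outcome for that player, nothing more.

# Prisoner's Dilemma, cells are (row payoff, column payoff):
payoffs = {
    ("cooperate", "cooperate"): (2, 2),
    ("cooperate", "defect"):    (0, 3),
    ("defect",    "cooperate"): (3, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_response(column_move: str) -> str:
    """Row player's best move against a fixed column move, using only the ranking."""
    return max(["cooperate", "defect"], key=lambda row: payoffs[(row, column_move)][0])

print(best_response("cooperate"), best_response("defect"))  # defect, defect
```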
Thanks for this - I'm in a more peripheral part of the industry (consumer/industrial LLM usage, not directly at an AI lab), and my timelines are somewhat longer (5 years for 50% chance), but I may be using a different criterion for "automate virtually all remote workers". It'll be a fair bit of time (in AI frame - a year or ten) between "labs show generality sufficient to automate most remote work" and "most remote work is actually performed by AI".