The solution comes in the next post! Feel free to discuss amongst yourselves.

Reminder: Your sentence should explain impact from all of the perspectives we discussed (from XYZ to humans).

I set a fifteen minute timer, and wrote down my thoughts:

Okay, the main thought I have so far is that the examples mostly seem to separate “Affects personal goals” from “Affects convergent instrumental goals”.

1. The pebbles being changed mostly affects the pebble sorters’ personal goals, and otherwise has little impact, in the scheme of things, on how able everyone is to achieve their goals. In the long term, it doesn’t even affect the pebble sorters’ ability to achieve their goals (there is basically a constant amount of resources in the universe to turn into pebbles and sort, and the amount on their planet is minuscule).

2. The badness of the traffic jam is actually mostly determined by it being bad for most agents’ goals in that situation (constrained to travel by cars and such). I might personally care more if I was in a medical emergency or something, but this wouldn’t be true from the perspective of convergent instrumental goals.

3. An asteroid hitting Earth damages any agent’s ability to affect the world. We care more because we’re on the planet, but overall its impact is determined by convergent instrumental goals.

4. The star exploding is the same case as the asteroid hitting Earth.

5. I’m not sure what the last one, regarding epistemic state, is getting at.

I have a sense that TurnTrout is suggesting we build an AI that will optimise your personal goals while attempting to make no changes on the level of convergent instrumental goals - kill no agents, change no balances of power, move no money, etc.

However, this runs into the standard problem of being useless. For example, you could tell an agent to cure cancer, and then it would cure all the people who have cancer. But then, in order to make sure it changes no balance of power or agent lives or anything, it would make sure to kill all the people who would’ve died, and make sure that other humans do not find out the AI has a cure for cancer. This is so that they don’t start acting very differently (e.g. turning off the AI and then taking the cancer cure, which *would* disturb the balance of power).

Hmm. I do think it would be somewhat useful. Like, this AI would be happy to help you get your goal as long as it didn’t change your power. For example, if you really wanted a good tv show to watch, it would happily create it for you, because it doesn’t change your or other agents’ abilities to affect the universe. (Though I think there are arguments that tv shows do affect humans that way, because the interaction between human values and motivations is a weird mess.) But, for example, I think the pebble sorters could build an AI that is happy to produce things they find valuable, as long as it doesn’t upset broader power balances. Which is potentially quite neat, and if this works out, a valuable insight.

But it doesn’t do the most important thing, which is to ensure that the future of the universe is also valuable. Because that’s necessarily a power thing - privileging a certain set of values over convergent values. And if you tried to combine such power-averse AIs to do a powerful thing, then they would stop doing it, because they’re superintelligent and would understand that they would be upsetting the power balance.

Okay, that’s my fifteen minutes. I don’t think I got what TurnTrout was trying to guide me to getting, even though I think he previously told me in person and I've read the next post. (Though maybe I did get it and just didn't manage to put it in a pithy sentence.)

Extra: I forgot to analyse the humans being tortured, so I'll add it here: Again, not a big deal from the perspective of any agent, though it is low in our personal utility function. I think that a power-averse AI would be happy to find a plan to stop the humans being tortured, as long as it didn't severely incapacitate whatever agents were doing the torturing.

Great responses.

What you're inferring is impressively close to where the sequence is leading in some ways, but the final destination is more indirect and avoids the issues you rightly point out (with the exception of the "ensuring the future is valuable" issue; I really don't think we can or should build eg low-impact yet ambitious singletons - more on that later).

Starting assumptions: impact is measured on a per-belief basis, depends on scale, and is a measurement relative to prior expectation. (This is how I am interpreting the three reminders at the end of the post.)

To me, this sounds like a percent difference: measure the change between the new value observed and the old value expected (whether based on actual experience or imagined, i.e. accounting for some personal bias), then divide by the original quantity to determine the magnitude of the difference relative to the original expectation.
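A minimal sketch of that percent-difference reading in Python (the function name and the numbers are mine, purely for illustration, not anything from the post):

```python
def percent_impact(expected: float, observed: float) -> float:
    """Relative size of the gap between what you expected and what you observed."""
    if expected == 0:
        raise ValueError("relative impact is undefined when the prior expectation is zero")
    return abs(observed - expected) / abs(expected)

# Expecting to sort 80 pebbles but observing 100 sorted:
print(percent_impact(80, 100))  # 0.25, i.e. a 25% surprise
```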

My sentence: You can tell that something is a big deal to you by how surprising it feels.

While I agree that using percentages would make impact more comparable between agents and timesteps, it also leads to counterintuitive results (at least counterintuitive to me).

Consider a sequence of utilities at times 0, 1, 2 with u_0 = 100, u_1 = 1, and u_2 = 0.

Now the drop from u_1 to u_2 would be more dramatic (decrease by 100%) compared to the drop from u_0 to u_1 (decrease by 99%) if we were using percentages. But I think the agent should 'care more' about the larger drop in absolute utility (i.e. spend more resources to prevent it from happening), and I suppose we might want to let impact correspond to something like 'how much we care about this event happening'.
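To make the counterexample concrete, here is a quick numeric check in Python (the utility values are the reconstructed ones above, chosen only to match the stated percentages):

```python
def relative_drop(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before

def absolute_drop(before: float, after: float) -> float:
    """Raw decrease in utility."""
    return before - after

u0, u1, u2 = 100, 1, 0

print(relative_drop(u0, u1), absolute_drop(u0, u1))  # 0.99, 99 -> a 99% drop, but huge in absolute terms
print(relative_drop(u1, u2), absolute_drop(u1, u2))  # 1.0, 1   -> a 100% drop, but tiny in absolute terms
```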

Vzcnpg vf gur nzbhag V zhfg qb guvatf qvssreragyl gb ernpu zl tbnyf
Ngyrnfg guerr ovt fgebat vaghvgvbaf. N guvat gung unccraf vs vg gheaf gur erfhygf bs zl pheerag npgvbaf gb or jnl jbefr vf ovt vzcnpg. N guvat gung unccraf vs gur srnfvovyvgl be hgvyvgl bs npgvba ng zl qvfcbfny vf punatrq n ybg gura gung vf n ovt qrny (juvpu bsgra zrnaf gung npgvba zhfg or cresbezrq be zhfg abg or cresbezrq). Vs gurer vf n ybg bs fhecevfr ohg gur jnl gb birepbzr gur fhecevfrf vf gb pneel ba rknpgyl nf V jnf nyernql qbvat vf ybj gb ab vzcnpg.

For ease of reference, I'm going to translate any ROT13 comments into normal spoilers.

Impact is the amount I must do things differently to reach my goals.

At least three big strong intuitions. A thing that happens that turns the results of my current actions way worse is big impact. If the feasibility or utility of an action at my disposal is changed a lot, then that is a big deal (which often means that action must be performed or must not be performed). If there is a lot of surprise, but the way to overcome the surprises is to carry on exactly as I was already doing, that is low to no impact.

Rot13:

Gur vzcnpg bs na rirag ba lbh vf gur qvssrerapr orgjrra gur rkcrpgrq inyhr bs lbhe hgvyvgl shapgvba tvira pregnvagl gung gur rirag jvyy unccra, naq gur pheerag rkcrpgrq inyhr bs lbhe hgvyvgl shapgvba.

Zber sbeznyyl, jr fnl gung gur rkcrpgrq inyhr bs lbhe hgvyvgl shapgvba vf gur fhz, bire nyy cbffvoyr jbeyqfgngrf K, bs C(K)*H(K), juvyr gur rkcrpgrq inyhr bs lbhe hgvyvgl shapgvba tvira pregnvagl gung n fgngrzrag R nobhg gur jbeyq vf gehr vf gur fhz bire nyy cbffvoyr jbeyqfgngrf K bs C(K|R)*H(K). Gur vzcnpg bs R orvat gehr, gura, vf gur nofbyhgr inyhr bs gur qvssrerapr bs gubfr gjb dhnagvgvrf.

Translation to normal spoiler text:

The impact of an event on you is the difference between the expected value of your utility function given certainty that the event will happen, and the current expected value of your utility function.

More formally, we say that the expected value of your utility function is the sum, over all possible worldstates X, of P(X)*U(X), while the expected value of your utility function given certainty that a statement E about the world is true is the sum over all possible worldstates X of P(X|E)*U(X). The impact of E being true, then, is the absolute value of the difference of those two quantities.
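Here's a minimal sketch of that definition in Python (the worldstates, probabilities, and utilities below are made up purely for illustration):

```python
def expected_utility(probs: dict, utils: dict) -> float:
    """Sum over all worldstates X of P(X) * U(X)."""
    return sum(p * utils[x] for x, p in probs.items())

def impact(prior: dict, posterior: dict, utils: dict) -> float:
    """|E[U | E is certain] - E[U]|, with `posterior` standing in for P(X | E)."""
    return abs(expected_utility(posterior, utils) - expected_utility(prior, utils))

# Toy example with two worldstates:
utils = {"asteroid_hits": 0.0, "no_asteroid": 10.0}
prior = {"asteroid_hits": 0.01, "no_asteroid": 0.99}   # P(X)
given_E = {"asteroid_hits": 0.9, "no_asteroid": 0.1}   # P(X | E), e.g. E = "a large asteroid was detected"

print(impact(prior, given_E, utils))  # 8.9
```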

I'll take a crack at this.

To a first order approximation, something is a "big deal" to an agent if it causes a "large" swing in its expected utility.

Rot13'd because I might have misformatted

V guvax V zvtug or fcbvyrerq sebz ernqvat gur bevtvany cncre, ohg zl thrff vf "Gur vzcnpg gb fbzrbar bs na rirag vf ubj zhpu vg punatrf gurve novyvgl gb trg jung jr jnag". Uhznaf pner nobhg Vaveba rkvfgvat orpnhfr vg znxrf vg uneqre gb erqhpr fhssrevat naq vapernfr unccvarff. (Abg fher ubj gb fdhner guvf qrsvavgvba bs vzcnpg jvgu svaqvat bhg arj vasb gung jnf nyernql gurer, nf va gur pnfr bs Vaveba, vg unq nyernql rkvfgrq, jr whfg sbhaq bhg nobhg vg.) Crooyvgrf pner nobhg nyy gurve crooyrf orpbzvat bofvqvna orpnhfr vg punatrf gurve novyvgl gb fgnpx crooyrf. Obgu uhznaf naq crooyvgrf pner nobhg orvat uvg ol na nfgrebvq orpnhfr vg'f uneqre gb chefhr bar'f inyhrf vs bar vf xvyyrq ol na nfgrebvq.

Translation to normal spoiler text:

I think I might be spoilered from reading the original paper, but my guess is "The impact to someone of an event is how much it changes their ability to get what we want". Humans care about Iniron existing because it makes it harder to reduce suffering and increase happiness. (Not sure how to square this definition of impact with finding out new info that was already there, as in the case of Iniron, it had already existed, we just found out about it.) Pebblites care about all their pebbles becoming obsidian because it changes their ability to stack pebbles. Both humans and pebblites care about being hit by an asteroid because it's harder to pursue one's values if one is killed by an asteroid.

In draft.js, you have to start a new line. Like a quote. In Markdown, you do spoilers differently.

Check out the FAQ.

Speaking as someone who hit "Edit" on his post over 10 times before checking the FAQ: if you haven't messed with your profile settings about handling comments/posts yet, save yourself some time and just check the FAQ before trying to add spoiler text. The right formatting wasn't as obvious as I expected, although it was simple.

If I'm already somewhat familiar with your work and ideas, do you still recommend these exercises?

Probably, unless you already deeply get the thing the exercise is pointing at. I wrote this sequence in part because my past writing didn't do a great job imparting the important insights. Since I don't a priori know who already does and doesn't get each idea, you might as well follow along (note that usually the exercises are much shorter than this post's).

The spoiler seems to be empty?

So secret that even a spoiler tag wasn't good enough.