All of Diffractor's Comments + Replies

That original post lays out UDT1.0; I don't see anything about precomputing the optimal policy within it. The UDT1.1 fix of optimizing the global policy, instead of figuring out the best thing to do on the fly, was first presented here; note that the 1.1 post I linked came chronologically after the post you linked.

1Jessica Taylor
Ok, I misunderstood. (See also my post on the relation between local and global optimality, and another post on coordinating local decisions using MCMC)
3Wei Dai
I gave this explanation at the start of the UDT1.1 post:

I think I have a contender for something which evades the conditional-threat issue stated at the end, as well as obvious variants and strengthenings of it, and which would be threat-resistant in a dramatically stronger sense than ROSE.

There's still a lot of things to check about it that I haven't done yet. And I'm unsure how to generalize to the n-player case. And it still feels unpleasantly hacky, according to my mathematical taste.

But the task at least feels possible, now.

EDIT: it turns out it was still susceptible to the conditional-threat issue, but t... (read more)

For 1, it's just intrinsically mathematically appealing (continuity is always really nice when you can get it), and also because of an intuition that if your foe experiences a tiny preference perturbation, you should be able to use small conditional payments to replicate their original preferences/incentive structure and start negotiating with that, instead.

I should also note that nowhere in the visual proof of the ROSE value for the toy case is continuity used. Continuity just happens to appear.

For 2, yes, it's part of game setup. The buttons are of whate... (read more)

2Daniel Kokotajlo
OK, thanks! Continuity does seem appealing to me but it seems negotiable; if you can find an even more threat-resistant bargaining solution (or an equally threat-resistant one that has some other nice property) I'd prefer it to this one even if it lacked continuity.

My preferred way of resolving it is treating the process of "arguing over which equilibrium to move to" as a bargaining game, and just find a ROSE point from that bargaining game. If there's multiple ROSE points, well, fire up another round of bargaining. This repeated process should very rapidly have the disagreement points close in on the Pareto frontier, until everyone is just arguing over very tiny slices of utility.

This is imperfectly specified, though, because I'm not entirely sure what the disagreement points would be, because I'm not sure how the "... (read more)

Yeah, "transferrable utility games" are those where there is a resource, and the utilities of all players are linear in that resource (in order to redenominate everyone's utilities as being denominated in that resource modulo a shift factor). I believe the post mentioned this.

Agreed. The bargaining solution for the entire game can be very different from adding up the bargaining solutions for the subgames. If there's a subgame where Alice cares very much about victory in that subgame (interior decorating choices) and Bob doesn't care much, and another subgame where Bob cares very much about it (food choice) and Alice doesn't care much, then the bargaining solution of the entire relationship game will end up being something like "Alice and Bob get some relative weights on how important their preferences are, and in all the subgam... (read more)
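To make that concrete, here's a toy computation (using the Nash bargaining solution for simplicity rather than the ROSE value, and with made-up payoff numbers), showing the whole-game bargain differing from the sum of the subgame bargains:

```python
# Toy demo: bargaining over the whole relationship at once differs from
# summing per-subgame bargains. Nash solution used for concreteness.
import itertools

def nash_point(outcomes, steps=2001):
    """Best mixture over (u_alice, u_bob) outcomes, maximizing the Nash
    product relative to disagreement point (0, 0)."""
    best, best_prod = None, -1.0
    # Mixtures over pairs of outcomes suffice for two players here.
    for (a1, b1), (a2, b2) in itertools.combinations_with_replacement(outcomes, 2):
        for i in range(steps):
            t = i / (steps - 1)
            ua, ub = t * a1 + (1 - t) * a2, t * b1 + (1 - t) * b2
            if ua * ub > best_prod:
                best_prod, best = ua * ub, (ua, ub)
    return best

decor = [(10, 0), (0, 1)]   # Alice cares a lot about decorating, Bob barely
food  = [(1, 0), (0, 10)]   # Bob cares a lot about food, Alice barely

# Bargaining each subgame separately, then adding:
d = nash_point(decor)   # ~(5.0, 0.5)
f = nash_point(food)    # ~(0.5, 5.0)
print("sum of subgame bargains:", (d[0] + f[0], d[1] + f[1]))  # ~(5.5, 5.5)

# Bargaining over the whole game (joint outcomes are sums across subgames):
whole = [(da + fa, db + fb) for (da, db) in decor for (fa, fb) in food]
print("whole-game bargain:", nash_point(whole))  # ~(10, 10)
```

The whole-game solution lands on (10, 10): each side concedes the subgame they care less about, doing strictly better than the (5.5, 5.5) you get from adding up the subgame solutions.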

Actually, they apply anyways in all circumstances, not just after the rescaling and shifting is done! Scale-and-shift invariance means that no matter how you stretch and shift the two axes, the bargaining solution always hits the same probability-distribution over outcomes,  so monotonicity means "if you increase the payoff numbers you assign for some or all of the outcomes, the Pareto frontier point you hit will give you an increased number for your utility score over what it'd be otherwise" (no matter how you scale-and-shift). And independence of ir... (read more)
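Stated symbolically (my notation; $B$ maps a pair of utility functions to the solution's distribution over outcomes, and the claims are just the properties described above):

$$B(a_1 u_1 + b_1,\ a_2 u_2 + b_2) = B(u_1, u_2) \quad \text{for all } a_1, a_2 > 0 \text{ and } b_1, b_2 \in \mathbb{R}$$

Monotonicity then says: if $u_1' \geq u_1$ on every outcome, then $\mathbb{E}_{B(u_1', u_2)}[u_1'] \geq \mathbb{E}_{B(u_1, u_2)}[u_1]$, no matter how the axes are scaled and shifted.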

If you're looking for curriculum materials, I believe that the most useful reference would probably be my "Infra-exercises", a sequence of posts containing all the math exercises you need to reinvent a good chunk of the theory yourself. Basically, it's the textbook's exercise section, and working through interesting math problems and proofs on one's own has a much better learning feedback loop and retention of material than slogging through the old posts. The exercises are short on motivation and philosophy compared to the posts, though, much like how a fu... (read more)

So, if you make Nirvana infinite utility, yes, the fairness criterion becomes "if you're mispredicted, you have any probability at all of entering the situation where you're mispredicted" instead of "have a significant probability of entering the situation where you're mispredicted", so a lot more decision-theory problems can be captured if you take Nirvana as infinite utility. But, I talk in another post in this sequence (I think it was "the many faces of infra-beliefs") about why you want to do Nirvana as 1 utility instead of infinite utility.

Parfit's Hi... (read more)

So, the flaw in your reasoning: after updating on being in the city, the agent doesn't go "logically impossible, infinite utility". We just go "alright, off-history measure gets converted to 0 utility", a perfectly standard update. So the hypothesis updates to (0,0) (ie, there's 0 probability I'm in this situation in the first place, and my expected utility for not getting into this situation in the first place is 0, because of probably dying in the desert).

As for the proper way to do this analysis, it's a bit finicky. There's something called "acausal form... (read more)
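A sketch of the update being used above (my rendering of the rule; check "Basic Inframeasure Theory" for the official form): updating an a-measure $(m, b)$ on an observation $o$, where $f$ is the utility assigned off-history, gives

$$(m, b) \mapsto \left( m|_{o},\ b + \mathbb{E}_{m|_{\neg o}}[f] \right)$$

In Parfit's Hitchhiker, the relevant hypothesis puts essentially all its measure on "die in the desert", which has utility 0, so updating on "I'm in the city" gives $m|_o = 0$ and $b = 0 + 0 = 0$: the $(0,0)$ above.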

0Thomas Larsen
Thank you so much for your detailed reply. I'm still thinking this through, but this is awesome. A couple things:
1. I don't see the problem at the bottom. I thought we were operating in the setting where Nirvana meant infinite reward? It seems like of course if N is small, we will get weird behavior because the agent will sometimes reason over logically impossible worlds.
2. Is Parfit's Hitchhiker with a perfect predictor unsalvageable because it violates this fairness criterion?
3. The fairness criterion in your comment is the pseudocausality condition, right?

Said actions or lack thereof cause a fairly low utility differential compared to the actions in other, non-doomy hypotheses. Also I want to draw a critical distinction between "full knightian uncertainty over meteor presence or absence", where your analysis is correct, and "ordinary probabilistic uncertainty between a high-knightian-uncertainty hypotheses, and a low-knightian uncertainty one that says the meteor almost certainly won't happen" (where the meteor hypothesis will be ignored unless there's a meteor-inspired modification to what you do that's al... (read more)

Something analogous to what you are suggesting occurs. Specifically, let's say you assign 95% probability to the bandit game behaving as normal, and 5% to "oh no, anything could happen, including the meteor". As it turns out, this behaves similarly to the ordinary bandit game being guaranteed, as the "maybe meteor" hypothesis assigns all your possible actions a score of "you're dead" so it drops out of consideration.

The important aspect which a hypothesis needs, in order for you to ignore it, is that no matter what you do you get the same outcome, whether ... (read more)
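A toy numeric sketch of that drop-out effect (all numbers hypothetical):

```python
# 95% credence in the normal bandit, 5% in a hypothesis whose worst case
# is "you're dead" (utility 0 here) regardless of action.
P_NORMAL, P_METEOR = 0.95, 0.05
DEAD = 0.0

bandit_payoff = {"pull_left": 1.0, "pull_right": 0.3}

def worst_case_score(action):
    # Worst case within each hypothesis, then mix by credence:
    return P_NORMAL * bandit_payoff[action] + P_METEOR * DEAD

best = max(bandit_payoff, key=worst_case_score)
print(best)  # "pull_left": the meteor term shifts every action's score
# by the same amount, so it never changes which action wins.
```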

2Charlie Steiner
The meteor doesn't have to really flatten things out, there might be some actions that we think remain valuable (e.g. hedonism, saying tearful goodbyes). And so if we have Knightian uncertainty about the meteor, maximin (as in Vanessa's link) means we'll spend a lot of time on tearful goodbyes.

Well, taking worst-case uncertainty is what infradistributions do. Did you have anything in mind that can be done with Knightian uncertainty besides taking the worst-case (or best-case)?

And if you were dealing with best-case uncertainty instead, then the corresponding analogue would be assuming that you go to hell if you're mispredicted (and then, since best-case things happen to you, the predictor must accurately predict you).

1Charlie Steiner
What if you assumed the stuff you had the hypothesis about was independent of the stuff you have Knightian uncertainty about (until proven otherwise)? E.g. if you're making hypotheses about a multi-armed bandit and the world also contains a meteor that might smash through your ceiling and kill you at any time, you might want to just say "okay, ignore the meteor, pretend my utility has a term for gambling wins that doesn't depend on the meteor at all."

The reason I want to consider stuff more like this is because I don't like having to evaluate my utility function over all possibilities to do either an argmax or an argmin - I want to be lazy.

The weird thing about this is now whether this counts as argmax or argmin (or something else) depends on what my utility function looks like when I do include the meteor. If getting hit by the meteor only makes things worse (though potentially the meteor can still depend on which arm of the bandit I pull!) then ignoring it is like being optimistic. If it only makes things better (like maybe the world I'm ignoring isn't a meteor, it's a big space full of other games I could be playing) then ignoring it is like being pessimistic.

This post is still endorsed, it still feels like a continually fruitful line of research. A notable aspect of it is that, as time goes on, I keep finding more connections and crisper ways of viewing things which means that for many of the further linked posts about inframeasure theory, I think I could explain them from scratch better than the existing work does. One striking example is that the "Nirvana trick" stated in this intro (to encode nonstandard decision-theory problems), has transitioned from "weird hack that happens to work" to "pops straight out... (read more)

3Richard Ngo
I'm feeling very excited about this agenda. Is there currently a publicly-viewable version of the living textbook? Or any more formal writeup which I can include in my curriculum? (If not I'll include this post, but I expect many people would appreciate a more polished writeup.)
1Charlie Steiner
I'm confused about the Nirvana trick then. (Maybe here's not the best place, but oh well...) Shouldn't it break the instant you do anything with your Knightian uncertainty other than taking the worst-case?

Availability: Almost all times between 10 AM and PM, California time, regardless of day. Highly flexible hours. Text over voice is preferred, I'm easiest to reach on Discord. The LW Walled Garden can also be nice.

A note to clarify for confused readers of the proof. We started out by assuming , and . We conclude  by how the agent works. But the step from there to  (ie, inconsistency of PA) isn't entirely spelled out in this post.

Pretty much, that follows from a proof by contradiction. Assume con(PA) ie , and it happens to be a con(PA) theorem that the agent can't prove in advance what it will do, ie, . (I can spell this out in more detail if anyone wants) However, com... (read more)
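Spelling that out (a sketch in my notation, with $a$ standing for the action in question):

$$PA \vdash \mathrm{con}(PA) \to \neg\square(A = a) \qquad \text{(the agent can't prove its action in advance)}$$

Contraposing gives $\square(A = a) \to \neg\mathrm{con}(PA)$, and we already concluded $\square(A = a)$, so $\neg\mathrm{con}(PA)$, i.e. $\square\bot$: PA is inconsistent.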

In the proof of Lemma 3, it should be 

"Finally, since , we have that .

Thus,  and  are both equal to .

instead.

3Scott Garrabrant
Fixed, Thanks.

Any idea of how well this would generalize to stuff like Chicken or games with more than 2-players, 2-moves?

I don't know; we're hunting for it. Relaxations of dynamic consistency would be extremely interesting if found, and I'll let you know if we turn up anything nifty.

2Stuart Armstrong
Hum... how about seeing enforcement of dynamic consistency as having a complexity/computation cost, and Dutch books (by other agents or by the environment) providing incentives to pay the cost? And hence the absence of these Dutch books meaning there is little incentive to pay that cost?

Looks good. 

Re: the dispute over normal bayesianism: For me, "environment" denotes "thingy that can freely interact with any policy in order to produce a probability distribution over histories". This is a different type signature than a probability distribution over histories, which doesn't have a degree of freedom corresponding to which policy you pick.

But for infra-bayes, we can associate a classical environment with the set of probability distributions over histories (for various possible choices of policy), and then the two distinct notions becom... (read more)
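A minimal typed sketch of that distinction (the type names here are my own, purely for illustration):

```python
from typing import Callable, Dict, Tuple

History = Tuple[str, ...]            # alternating observations and actions
Dist = Dict[History, float]          # probability distribution over histories
Policy = Callable[[History], str]    # observations so far -> next action

# Classical environment: a free slot for the policy. A different type
# signature than a bare distribution over histories.
Environment = Callable[[Policy], Dist]

# The infra-Bayes move: identify an environment with the collection of
# history-distributions it induces, one per policy.
def as_belief(env: Environment, policies: list) -> Dict[int, Dist]:
    return {i: env(pi) for i, pi in enumerate(policies)}
```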

2Rohin Shah
Ah right, that makes sense. That was a mistake on my part, my bad.

I'd say this is mostly accurate, but I'd amend number 3. There's still a sort of non-causal influence going on in pseudocausal problems; you can easily formalize counterfactual mugging and XOR blackmail as pseudocausal problems (you need acausal specifically for transparent Newcomb, not vanilla Newcomb). But it's specifically a sort of influence that's like "reality will adjust itself so contradictions don't happen, and there may be correlations between what happened in the past, or other branches, and what your action is now, so you can exploit this by ac... (read more)

2Rohin Shah
Thanks for checking! I've changed point 3 to: Re: What I meant was that if you define a Bayesian belief over world-histories (oa)*, that is equivalent to having a Bayesian belief over environments E, which I think you agree with. I've edited slightly to make this clearer.

Sounds like a special case of crisp infradistributions (ie, all partial probability distributions have a unique associated crisp infradistribution)

Given some partial probability distribution $\theta$, we can consider the (nonempty) set of probability distributions equal to $\theta$ wherever $\theta$ is defined. This set is convex (clearly, a mixture of two probability distributions which agree with $\theta$ about the probability of an event will also agree with $\theta$ about the probability of that event).

Convex (compact) sets of probability distributions = crisp infradistributions.... (read more)
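A finite worked sketch (hypothetical outcome names and numbers): since the set of distributions agreeing with a partial assignment is a convex polytope, the associated crisp-infradistribution expectation is a small linear program.

```python
# Partial probability distribution -> crisp infradistribution, with the
# worst-case expectation computed as a linear program.
from scipy.optimize import linprog

outcomes = ["rain", "sun", "snow"]
f = [0.0, 1.0, 0.5]                # function we want the expectation of

# Partial assignment theta: P({rain}) = 0.3, everything else unspecified.
A_eq = [[1.0, 1.0, 1.0],           # total probability is 1
        [1.0, 0.0, 0.0]]           # mu(rain) = 0.3
b_eq = [1.0, 0.3]

# inf over the convex set = the crisp-infradistribution expectation:
res = linprog(c=f, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 3)
print(res.fun)  # 0.35: the free 0.7 mass lands on snow, the worst outcome for f
```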

You're completely right that hypotheses with unconstrained Murphy get ignored because you're doomed no matter what you do, so you might as well optimize for just the other hypotheses where what you do matters. Your "-1,000,000 vs -999,999 is the same sort of problem as 0 vs 1" reasoning is good.

Again, you are making the serious mistake of trying to think about Murphy verbally, rather than thinking of Murphy as the personification of the "inf" part of the $\inf_{\mu \in \Psi} \mathbb{E}_\mu[U]$ definition of expected value, and writing actual equations. ... (read more)
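Concretely, with $\Psi_\pi$ the set of distributions a hypothesis allows under policy $\pi$ (my notation for the restatement):

$$\mathbb{E}_\Psi[U] := \inf_{\mu \in \Psi} \mathbb{E}_\mu[U], \qquad \pi^* := \arg\max_\pi \inf_{\mu \in \Psi_\pi} \mathbb{E}_\mu[U]$$

"Murphy" is nothing more than the $\inf$ in these expressions.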

There's actually an upcoming post going into more detail on what the deal is with pseudocausal and acausal belief functions, among several other things, I can send you a draft if you want. "Belief Functions and Decision Theory" is a post that hasn't held up nearly as well to time as "Basic Inframeasure Theory".

1DanielFilan
Thanks for the offer, but I don't think I have room for that right now.

If you use the Anti-Nirvana trick, your agent just goes "nothing matters at all, the foe will mispredict and I'll get -infinity reward" and rolls over and cries since all policies are optimal. Don't do that one, it's a bad idea.

For the concave expectation functionals: Well, there's another constraint or two, like monotonicity, but yeah, LF duality basically says that you can turn any (monotone) concave expectation functional into an inframeasure. Ie, all risk aversion can be interpreted as having radical uncertainty over some aspects of how the environment... (read more)
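A hedged statement of the duality being invoked (my phrasing; see "Basic Inframeasure Theory" for the exact continuity and normalization conditions): any monotone, concave, suitably continuous functional $h$ on bounded functions $f : X \to \mathbb{R}$ has a representation

$$h(f) = \inf_{(m, b) \in H} \left( m(f) + b \right)$$

for some closed, convex, upper-complete set $H$ of sa-measures, i.e. $h$ is the expectation functional of an inframeasure.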

2Rohin Shah
Sorry, I meant the combination of best-case reasoning (sup instead of inf) and the anti-Nirvana trick. In that case the agent goes "Murphy won't mispredict, since then I'd get -infinity reward which can't be the best that I do". Hmm, that makes sense, I think? Perhaps I just haven't really internalized the learning aspect of all of this.

Maximin, actually. You're maximizing your worst-case result.

It's probably worth mentioning that "Murphy" isn't an actual foe where it makes sense to talk about destroying resources lest Murphy use them; it's just a personification of the fact that we have a set of options, any of which could be picked, and we want the highest lower bound on utility we can get for that set of options, so, for intuition, we assume we're playing against an adversary with a perfectly opposite utility function. For that last paragraph, translating it back out from the "Murphy" t... (read more)

0awenonian
I'm glad to hear that the question of what hypotheses produce actionable behavior is on people's minds.

I modeled Murphy as an actual agent, because I figured a hypothesis like "A cloaked superintelligence is operating in the area and will react to your decision to do X by doing Y" is always on the table, and is basically a template for allowing Murphy to perform arbitrary action Y.

I feel like I didn't quite grasp what you meant by "a constraint on Murphy is picked according to this probability distribution/prior, then Murphy chooses from the available options of the hypothesis they picked". But based on your explanation after, it sounds like you essentially ignore hypotheses that don't constrain Murphy, because they act as an expected utility drop on all states, so it just means you're comparing -1,000,000 and -999,999, instead of 0 and 1. For example, there's a whole host of hypotheses of the form "A cloaked superintelligence converts all local usable energy into a hellscape if you do X", and since that's a possibility for every X, no action X is graded lower than the others by its existence.

That example is what got me thinking, in the first place, though. Such hypotheses don't lower everything equally, because, given other Laws of Physics, the superintelligence would need energy to hell-ify things. So arbitrarily consuming energy would reduce how bad the outcomes could be if a perfectly misaligned superintelligence was operating in the area. And, given that I am positing it as a perfectly misaligned superintelligence, we should both expect it to exist in the environment Murphy chooses (what could be worse?) and expect any reduction of its actions to be as positive of changes as a perfectly aligned superintelligence's actions could be, since preventing a maximally detrimental action should match, in terms of Utility, enabling a maximally beneficial action. Therefore, entropy-bombs.

Thinking about it more, assuming I'm not still making a mistake, this might ju

So, first off, I should probably say that a lot of the formalism overhead involved in this post in particular feels like the sort of thing that will get a whole lot more elegant as we work more things out, but "Basic inframeasure theory" still looks pretty good at this point and worth reading, and the basic results (ability to translate from pseudocausal to causal, dynamic consistency, capturing most of UDT, definition of learning) will still hold up.

Yes, your current understanding is correct, it's rebuilding probability theory in more generality to be sui... (read more)

2Alex Flint
Ah this is helpful, thank you. So let's say I'm estimating the position of a train on a straight section of track as a single real number and I want to do an update each time I receive a noisy measurement of the train's position. Under the theory you're laying out here I might have, say, three Gaussians N(0, 1), N(1, 10), N(4, 6), and rather than updating a single pdf over the position of the train, I'm updating measures associated with each of these three pdf. Is that roughly correct? (I realize this isn't exactly a great example of how to use this theory since train positions are perfectly realizable, but I just wanted to start somewhere familiar to me.) Do you by chance have any worked examples where you go through the update procedure for some concrete prior and observation? If not, do you have any suggestions for what would be a good toy problem where I could work through an update at a very concrete level?

So, we've also got an analogue of KL-divergence for crisp infradistributions. 

We'll be using  and  for crisp infradistributions, and  and  for probability distributions associated with them.  will be used for the KL-divergence of infradistributions, and  will be used for the KL-divergence of probability distributions. For crisp infradistributions, the KL-divergence is defined as

I'm not entirely sure why it's like this, but it has the basic properties yo... (read more)

Potential counterargument: Second-strike capabilities are still relevant in the interstellar setting. You could build a bunch of hidden ships in the Oort cloud to ram the foe and do equal devastation if the other party does it first, deterring a first strike even with tensions and an absence of communication. Further, while the "ram with high-relativistic objects" idea works pretty well for preemptively ending a civilization confined to a handful of planets, AIs would be able to colonize a bunch of little asteroids and KBOs and comets in the Oort cloud, and the higher level of dispersal would lead to preemptive total elimination being less viable.

1John Maxwell
That's possible, but I'm guessing that it's not hard for a superintelligent AI to suddenly swallow an entire system using something like gray goo.

I will be hosting a readthrough of this sequence on MIRIxDiscord again, PM for a link.

So, here are some considerations (not an actual policy)

It's instructive to look at the case of nuclear weapons, and the key analogies or disanalogies to math work. For nuclear weapons, the basic theory is pretty simple and building the hardware is the hard part, while for AI, the situation seems reversed. The hard part there is knowing what to do in the first place, not scrounging up the hardware to do it.

First, a chunk from Wikipedia

Most of the current ideas of the Teller–Ulam design came into public awareness after the DOE attempted to censor a magazine ar

... (read more)
2Ofer
Publishing under a pseudonym may end up being counterproductive due to the Streisand effect. Identities behind many pseudonyms may suddenly be publicly revealed following a publication on some novel method for detecting similarities in writing style between texts.
1Vanessa Kosoy
Regarding making a policy ahead of time, I think we can have an evolving model of what ingredients are missing to get transformative AI, and some rule of thumb that says how dangerous your result is, given how much progress it makes towards each ingredient (relevant but clearly insufficient < might or might not be sufficient < plausibly a full solution), how concrete/actionable it is (abstract idea < impractical method < practical method) and how original/surprising it is (synthesis of ideas in the field < improvement on idea in the field < application of idea outside the field < completely out of the blue).

One problem is, the model itself might be an infohazard. This consideration pushes towards making the guidelines secret in themselves, but that would make it much harder to debate and disseminate them. Also, the new result might have major implications for the model. So, yes, certainly there is no replacement for the inside view, but I still feel that we can have guidelines that help focus on the right considerations.
0David Manheim
OpenAI's phased release of GPT2 seems like a clear example of exactly this. And there is a forthcoming paper looking at the internal deliberations around this from Toby Shevlane, in addition to his extant work on the question of how disclosure potentially affects misuse.

Maximin over outcomes would lead to the agent devoting all its efforts towards avoiding the worst outcomes, sacrificing overall utility, while maximin over expected value pushes towards policies that do acceptably on average in all of the environments that it may find itself in.
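In symbols (my notation, with $\Psi_\pi$ the set of distributions the hypotheses allow under policy $\pi$), the contrast is

$$\max_\pi \inf_{\mu \in \Psi_\pi} \min_{h \in \mathrm{supp}\, \mu} U(h) \qquad \text{versus} \qquad \max_\pi \inf_{\mu \in \Psi_\pi} \mathbb{E}_{h \sim \mu}[U(h)]$$

The first guards against the worst outcome even within a fixed environment; the second only guards against the worst environment, and is the maximin described above.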

Regarding "why listen to past me", I guess to answer this question I'd need to ask about your intuitions on Counterfactual mugging. What would you do if it's one-shot? What would you do if it's repeated? If you were told about the problem beforehand, would you pay money for a commitment mechanism to make future-you pay up the money if asked? (for +EV)

Yeah, looking back, I should probably fix the m- part and have the signs be consistent with the usual usage, where it's a measure minus another one instead of the addition of two signed measures (one a measure and one a negative measure). May be a bit of a pain to fix, though; the proof pages are extremely laggy to edit.

Wikipedia's definition can be matched up with our definition by fixing a partial order where $x \preceq y$ iff there's an sa-measure $s$ s.t. $y = x + s$, and this generalizes to any closed c... (read more)

We go to the trouble of sa-measures because it's possible to add a sa-measure to an a-measure, and get another a-measure where the expectation values of all the functions went up, while the new a-measure we landed at would be impossible to make by adding an a-measure to an a-measure.

Basically, we've gotta use sa-measures for a clean formulation of "we added all the points we possibly could to this set", getting the canonical set in your equivalence class.

Admittedly, you could intersect with the cone of a-measures again at the end (as we do in the next post... (read more)
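For reference, the two definitions in play, paraphrased from "Basic Inframeasure Theory" (double-check the exact conditions there):

$$\text{sa-measure: a pair } (m, b), \ m \text{ a signed measure}, \ b \in \mathbb{R}_{\geq 0}, \ b \geq m_-(1)$$
$$\text{a-measure: an sa-measure where } m \text{ is an actual (nonnegative) measure}$$

The condition $b \geq m_-(1)$ (the total mass of the negative part of $m$) is what guarantees $m(f) + b \geq 0$ for every $f : X \to [0,1]$.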

I found a paper about this exact sort of thing. Escardó and Oliva call that type signature a "selection functional", and the type signature $(X \to R) \to R$ is called a "quantification functional", and there's several interesting things you can do with them, like combining multiple selection functionals into one in a way that looks reminiscent of game theory. (ie, if $\varepsilon$ has type signature $(X \to R) \to X$, and $\delta$ has type signature $(Y \to R) \to Y$, then $\varepsilon \otimes \delta$ has type signature $((X \times Y) \to R) \to (X \times Y)$... (read more)
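A toy implementation of the construction (my own sketch; the payoffs are hypothetical, and both choosers here maximize the same objective, which is the simplest case of the product):

```python
# Selection functionals: type (X -> R) -> X. The product below combines two
# of them into one of type ((X, Y) -> R) -> (X, Y), which is where the
# game-theoretic flavor (backward induction) comes from.

def argmax_selection(xs):
    """The canonical selection functional over a finite set: given a
    'payoff' function k, return the element of xs maximizing it."""
    return lambda k: max(xs, key=k)

def tensor(eps, delta):
    """Product of eps : (X -> R) -> X and delta : (Y -> R) -> Y,
    yielding a selection functional of type ((X, Y) -> R) -> (X, Y)."""
    def combined(k):
        # Best response in Y to each candidate x:
        b = lambda x: delta(lambda y: k((x, y)))
        # Pick x assuming the Y-chooser responds with b(x):
        a = eps(lambda x: k((x, b(x))))
        return (a, b(a))
    return combined

# Hypothetical two-move sequential game, with a shared payoff function:
payoff = {("hawk", "hawk"): 0, ("hawk", "dove"): 3,
          ("dove", "hawk"): 1, ("dove", "dove"): 2}
eps = argmax_selection(["hawk", "dove"])
delta = argmax_selection(["hawk", "dove"])
print(tensor(eps, delta)(lambda xy: payoff[xy]))  # ('hawk', 'dove')
```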

Oh, I see what the issue is. "Propositional tautology given A" means that the statement follows from A by boolean logic alone, not that it's provable from A in the full theory. So yeah, when A is a boolean that is equivalent to $\bot$ via boolean logic alone, we can't use that A for the exact reason you said, but if A isn't equivalent to $\bot$ via boolean logic alone (although it may be possible to infer $\bot$ by other means), then the denominator isn't necessarily small.

Yup, a monoid, because $\top \wedge \phi = \phi$ and $\phi \wedge \top = \phi$, so it acts as an identity element, and we don't care about the order. Nice catch.

You're also correct about what propositional tautology given A means.

1Gurkenglas
Then that minimum does not make a good denominator because it's always extremely small. It will pick phi to be as powerful as possible to make L small, aka set phi to bottom. (If the denominator before that version is defined at all, bottom is a propositional tautology given A.)

(lightly edited restatement of email comment)

Let's see what happens when we adapt this to the canonical instance of "no, really, counterfactuals aren't conditionals and should have different probabilities": the cosmic ray problem, where the agent has a choice between two paths; it slightly prefers taking the left path, but its conditional on taking the right path is a tiny slice of probability mass that's mostly composed of stuff like "I took the suboptimal action because I got hit by a cosmic ray".

There will be 0 utili... (read more)

1Vivek Hebbar
If the agent follows EDT, it seems like you are giving it epistemically unsound credences. In particular, the premise is that it's very confident it will go left, and the consequence is that it in fact goes right. This was the world model's fault, not EDT's fault. (It is notable though that EDT introduces this loopiness into the world model's job.)
1Abram Demski
(lightly edited version of my original email reply to above comment; note that Diffractor was originally replying to a version of the Dutch-book which didn't yet call out the fact that it required an assumption of nonzero probability on actions.)

I agree that this Dutch-book argument won't touch probability zero actions, but my thinking is that it really should apply in general to actions whose probability is bounded away from zero (in some fairly broad setting). I'm happy to require an epsilon-exploration assumption to get the conclusion.

Your thought experiment raises the issue of how to ensure in general that adding bets to a decision problem doesn't change the decisions made. One thought I had was to make the bets always smaller than the difference in utilities. Perhaps smaller Dutch-books are in some sense less concerning, but as long as they don't vanish to infinitesimal, seems legit. A bet that's desirable at one scale is desirable at another. But scaling down bets may not suffice in general. Perhaps a bet-balancing scheme to ensure that nothing changes the comparative desirability of actions as the decision is made?

For your cosmic ray problem, what about: You didn't specify the probability of a cosmic ray. I suppose it should have probability higher than the probability of exploration. Let's say 1/million for cosmic ray, 1/billion for exploration.

Before the agent makes the decision, it can be given the option to lose .01 util if it goes right, in exchange for +.02 utils if it goes right & cosmic ray. This will be accepted (by either a CDT agent or EDT agent), because it is worth approximately +.01 util conditioned on going right, since cosmic ray is almost certain in that case.

Then, while making the decision, cosmic ray conditioned on going right looks very unlikely in terms of CDT's causal expectations. We give the agent the option of getting .001 util if it goes right, if it also agrees to lose .02 conditioned on going right & cosmic ray. CDT ag

It actually is a weakening. Because all changes can be interpreted as making some player worse off if we just use standard Pareto optimality, the second condition means that more changes count as improvements, as you correctly state. The third condition cuts down on which changes count as improvements, but the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto Optimality.

The definition of an almost stratified Pareto optimum was adapted from this, and was... (read more)

1Vanessa Kosoy
"the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto Optimality." Why? Condition 3 implies that U_{RO,j} \leq U_{RO',j}. So, together with condition 2, we get that U_{RO,j} \leq U_{RO',j} for any j. That precisely means that this is a Pareto improvement in the usual sense.

My initial inclination is to introduce as the space of events on turn , and define and then you can express it as .

The notation for the sum operator is unclear. I'd advise writing the sum as  and using an $i$ subscript inside the sum so it's clearer what is being substituted where.

1Nisan
The sum isn't over i, though, it's over all possible tuples of length n−1. Any ideas for how to make that more clear?

Wasn't there a fairness/continuity condition in the original ADT paper that if there were two "agents" that converged to always taking the same action, then the embedder would assign them the same value? (more specifically, if , then ) This would mean that it'd be impossible to have one be low while the other is high, so the argument still goes through.

Although, after this whole line of discussion, I'm realizing that there are enough substantial differences between the ori... (read more)

1Jessica Taylor
Yes, the continuity condition on embedders in the ADT paper would eliminate the embedder I meant. Which means the answer might depend on whether ADT considers discontinuous embedders. (The importance of the continuity condition is that it is used in the optimality proof; the optimality proof can't apply to chicken for this reason).
in the ADT paper, the asymptotic dominance argument is about the limit of the agent's action as epsilon goes to 0. This limit is not necessarily computable, so the embedder can't contain the agent, since it doesn't know epsilon. So the evil problem doesn't work.

Agreed that the evil problem doesn't work for the original ADT paper. In the original ADT paper, the agents are allowed to output distributions over moves. I didn't like this because it implicitly assumes that it's possible for the agent to perfectly randomize, an... (read more)

2Jessica Taylor
The fact that we take the limit as epsilon goes to 0 means the evil problem can't be constructed, even if randomization is not allowed. (The proof in the ADT paper doesn't work, but that doesn't mean something like it couldn't possibly work.) You're right, this is an error in the proof, good catch.

Re chicken: The interpretation of the embedder that I meant is "opponent only uses the embedder where it is up against [whatever policy you plugged in]". This embedder does not get knocked down by the reality filter. Let Et be the embedder. The logical inductor expects Ut to equal the crash/crash utility, and it also expects Et(⌈ADTϵ⌉) to equal the crash/crash utility. The expressions Ut and Et(⌈ADTϵ⌉) are provably equal, so of course the logical inductor expects them to be the same, and the reality check passes.

The error in your argument is that you are embedding actions rather than agents. The fact that NeverSwerveBot and ADT both provably always take the straight action does not mean the embedder assigns them equal utilities.

I got an improved reality-filter that blocks a certain class of environments that cause conjecture 1 to fail, although it isn't enough to deal with the provided chicken example, and doesn't lead to a proof of conjecture 1. (the subscripts will be suppressed for clarity)

Instead of the reality-filter for being

it is now

This doesn't just check whether reality is recovered on average, it also checks whether all the "plausible conditionals" line up as well. Some of the con... (read more)

I figured out what feels slightly off about this solution. For events like "I have a long memory and accidentally dropped a magnet on it", it intuitively feels like describing your spot in the environment and the rules of your environment is much lower K-complexity than finding a Turing machine/environment that starts by giving you the exact (long) scrambled sequence of memories that you have, and then resumes normal operating.

Although this also feels like something nearby is actually desired behavior. If you rewrite the tape to be describing some other simple environment, you would intuitively expect the AIXI to act as if it's in the simple environment for a brief time before gaining enough information to conclude that things have changed and rederive the new rules of where it is.

1interstice
Well, it COULD be the case that the K-complexity of the memory-erased AIXI environment is lower, even when it learns that this happened. The reason for this is that there could be many possible past AIXIs who have their memory erased/altered and end up in the same subjective situation. Then the memory-erasure hypothesis can use the lowest-K-complexity AIXI who ends up with these memories. As the AIXI learns more, it can gradually piece together which of the potential past AIXIs it actually was, and the K-complexity will go back up again.

EDIT: Oh, I see you were talking about actually having a RANDOM memory in the sense of a random sequence of 1s and 0s. Yeah, but this is no different than AIXI thinking that any random process is high K-complexity. In general, and discounting merging, the memory-altering subroutine will increase the complexity of the environment by a constant plus the complexity of whatever transformation you want to apply to the memories.

Not quite. If taking bet 9 is a prerequisite to taking bet 10, then AIXI won't take bet 9, but if bet 10 gets offered whether or not bet 9 is accepted, then AIXI will be like "ah, future me will take the bet, and wind up with 10+ in the heads world and -20+2 in the tails world. This is just a given. I'll take this +15/-15 bet as it has net positive expected value, and the loss in the heads world is more than counterbalanced by the reduction in the magnitude of loss for the tails world"

Something else feels slightly off, but I can'... (read more)


Yup, I meant counterfactual mugging. Fixed.

I think I remember the original ADT paper showing up on the Agent Foundations forum before a writeup on logical EDT with exploration, and my impression of which came first was affected by that. Also, the "this is detailed in this post" was referring to logical EDT with exploration. I'll edit for clarity.

3Jessica Taylor
OK, I helped invent ADT so I know it conceptually came after. (I don't think it was "shortly after"; logical EDT was invented very shortly after logical inductors, in early 2016, and ADT was in late 2016). I think you should link to the ADT paper in the intro section so people know what you're talking about.

I actually hadn't read that post or seen the idea anywhere before writing this up. It's a pretty natural resolution, so I'd be unsurprised if it was independently discovered before. Sorry about being unable to assist.

The extra penalty to describe where you are in the universe corresponds to requiring sense data to pin down *which* star you are near, out of the many stars, even if you know the laws of physics, so it seems to recover the desired behavior.
