All of Diffractor's Comments + Replies

That original post lays out UDT1.0; I don't see anything about precomputing the optimal policy within it. The UDT1.1 fix of optimizing the global policy, instead of figuring out the best thing to do on the fly, was first presented here; note that the 1.1 post I linked came chronologically after the post you linked.

1Jessica Taylor
Ok, I misunderstood. (See also my post on the relation between local and global optimality, and another post on coordinating local decisions using MCMC)
3Wei Dai
I gave this explanation at the start of the UDT1.1 post:

I think I have a contender for something which evades the conditional-threat issue stated at the end, as well as obvious variants and strengthenings of it, and which would be threat-resistant in a dramatically stronger sense than ROSE.

There's still a lot of things to check about it that I haven't done yet. And I'm unsure how to generalize to the n-player case. And it still feels unpleasantly hacky, according to my mathematical taste.

But the task at least feels possible, now.

EDIT: it turns out it was still susceptible to the conditional-threat issue, but t... (read more)

For 1, it's just intrinsically mathematically appealing (continuity is always really nice when you can get it), and also because of an intuition that if your foe experiences a tiny preference perturbation, you should be able to use small conditional payments to replicate their original preferences/incentive structure and start negotiating with that, instead.

I should also note that nowhere in the visual proof of the ROSE value for the toy case is continuity used. Continuity just happens to appear.

For 2, yes, it's part of game setup. The buttons are of whate... (read more)

2Daniel Kokotajlo
OK, thanks! Continuity does seem appealing to me but it seems negotiable; if you can find an even more threat-resistant bargaining solution (or an equally threat-resistant one that has some other nice property) I'd prefer it to this one even if it lacked continuity.

My preferred way of resolving it is treating the process of "arguing over which equilibrium to move to" as a bargaining game, and just find a ROSE point from that bargaining game. If there's multiple ROSE points, well, fire up another round of bargaining. This repeated process should very rapidly have the disagreement points close in on the Pareto frontier, until everyone is just arguing over very tiny slices of utility.

This is imperfectly specified, though, because I'm not entirely sure what the disagreement points would be, because I'm not sure how the "... (read more)

Yeah, "transferrable utility games" are those where there is a resource, and the utilities of all players are linear in that resource (in order to redenominate everyone's utilities as being denominated in that resource modulo a shift factor). I believe the post mentioned this.

Agreed. The bargaining solution for the entire game can be very different from adding up the bargaining solutions for the subgames. If there's a subgame where Alice cares very much about victory in that subgame (interior decorating choices) and Bob doesn't care much, and another subgame where Bob cares very much about it (food choice) and Alice doesn't care much, then the bargaining solution of the entire relationship game will end up being something like "Alice and Bob get some relative weights on how important their preferences are, and in all the subgam... (read more)
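To make that concrete, here's a toy computation (using the Nash bargaining solution for simplicity rather than the ROSE value, and with made-up payoff numbers), showing the whole-game bargain differing from the sum of the subgame bargains:

```python
# Toy demo: bargaining over the whole relationship at once differs from
# summing per-subgame bargains. Nash solution used for concreteness.
import itertools

def nash_point(outcomes, steps=2001):
    """Best mixture over (u_alice, u_bob) outcomes, maximizing the Nash
    product relative to disagreement point (0, 0)."""
    best, best_prod = None, -1.0
    # Mixtures over pairs of outcomes suffice for two players here.
    for (a1, b1), (a2, b2) in itertools.combinations_with_replacement(outcomes, 2):
        for i in range(steps):
            t = i / (steps - 1)
            ua, ub = t * a1 + (1 - t) * a2, t * b1 + (1 - t) * b2
            if ua * ub > best_prod:
                best_prod, best = ua * ub, (ua, ub)
    return best

decor = [(10, 0), (0, 1)]   # Alice cares a lot about decorating, Bob barely
food  = [(1, 0), (0, 10)]   # Bob cares a lot about food, Alice barely

# Bargaining each subgame separately, then adding:
d = nash_point(decor)   # ~(5.0, 0.5)
f = nash_point(food)    # ~(0.5, 5.0)
print("sum of subgame bargains:", (d[0] + f[0], d[1] + f[1]))  # ~(5.5, 5.5)

# Bargaining over the whole game (joint outcomes are sums across subgames):
whole = [(da + fa, db + fb) for (da, db) in decor for (fa, fb) in food]
print("whole-game bargain:", nash_point(whole))  # ~(10, 10)
```

The whole-game solution lands on (10, 10): each side concedes the subgame they care less about, doing strictly better than the (5.5, 5.5) you get from adding up the subgame solutions.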

Actually, they apply anyways in all circumstances, not just after the rescaling and shifting is done! Scale-and-shift invariance means that no matter how you stretch and shift the two axes, the bargaining solution always hits the same probability-distribution over outcomes,  so monotonicity means "if you increase the payoff numbers you assign for some or all of the outcomes, the Pareto frontier point you hit will give you an increased number for your utility score over what it'd be otherwise" (no matter how you scale-and-shift). And independence of ir... (read more)
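Stated symbolically (my notation; $B$ maps a pair of utility functions to the solution's distribution over outcomes, and the claims are just the properties described above):

$$B(a_1 u_1 + b_1,\ a_2 u_2 + b_2) = B(u_1, u_2) \quad \text{for all } a_1, a_2 > 0 \text{ and } b_1, b_2 \in \mathbb{R}$$

Monotonicity then says: if $u_1' \geq u_1$ on every outcome, then $\mathbb{E}_{B(u_1', u_2)}[u_1'] \geq \mathbb{E}_{B(u_1, u_2)}[u_1]$, no matter how the axes are scaled and shifted.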

If you're looking for curriculum materials, I believe that the most useful reference would probably be my "Infra-exercises", a sequence of posts containing all the math exercises you need to reinvent a good chunk of the theory yourself. Basically, it's the textbook's exercise section, and working through interesting math problems and proofs on one's own has a much better learning feedback loop and retention of material than slogging through the old posts. The exercises are short on motivation and philosophy compared to the posts, though, much like how a fu... (read more)

So, if you make Nirvana infinite utility, yes, the fairness criterion becomes "if you're mispredicted, you have any probability at all of entering the situation where you're mispredicted" instead of "have a significant probability of entering the situation where you're mispredicted", so a lot more decision-theory problems can be captured if you take Nirvana as infinite utility. But, I talk in another post in this sequence (I think it was "the many faces of infra-beliefs") about why you want to do Nirvana as 1 utility instead of infinite utility.

Parfit's Hi... (read more)

So, the flaw in your reasoning: after updating on being in the city, the agent doesn't go "logically impossible, infinite utility". We just go "alright, off-history measure gets converted to 0 utility", a perfectly standard update. So the hypothesis updates to (0,0) (ie, there's 0 probability I'm in this situation in the first place, and my expected utility for not getting into this situation in the first place is 0, because of probably dying in the desert).

As for the proper way to do this analysis, it's a bit finicky. There's something called "acausal form... (read more)
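A sketch of the update being used above (my rendering of the rule; check "Basic Inframeasure Theory" for the official form): updating an a-measure $(m, b)$ on an observation $o$, where $f$ is the utility assigned off-history, gives

$$(m, b) \mapsto \left( m|_{o},\ b + \mathbb{E}_{m|_{\neg o}}[f] \right)$$

In Parfit's Hitchhiker, the relevant hypothesis puts essentially all its measure on "die in the desert", which has utility 0, so updating on "I'm in the city" gives $m|_o = 0$ and $b = 0 + 0 = 0$: the $(0,0)$ above.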

0Thomas Larsen
Thank you so much for your detailed reply. I'm still thinking this through, but this is awesome. A couple things:
1. I don't see the problem at the bottom. I thought we were operating in the setting where Nirvana meant infinite reward? It seems like of course if N is small, we will get weird behavior because the agent will sometimes reason over logically impossible worlds.
2. Is Parfit's Hitchhiker with a perfect predictor unsalvageable because it violates this fairness criterion?
3. The fairness criterion in your comment is the pseudocausality condition, right?

Said actions or lack thereof cause a fairly low utility differential compared to the actions in other, non-doomy hypotheses. Also I want to draw a critical distinction between "full knightian uncertainty over meteor presence or absence", where your analysis is correct, and "ordinary probabilistic uncertainty between a high-knightian-uncertainty hypotheses, and a low-knightian uncertainty one that says the meteor almost certainly won't happen" (where the meteor hypothesis will be ignored unless there's a meteor-inspired modification to what you do that's al... (read more)

Something analogous to what you are suggesting occurs. Specifically, let's say you assign 95% probability to the bandit game behaving as normal, and 5% to "oh no, anything could happen, including the meteor". As it turns out, this behaves similarly to the ordinary bandit game being guaranteed, as the "maybe meteor" hypothesis assigns all your possible actions a score of "you're dead" so it drops out of consideration.

The important aspect which a hypothesis needs, in order for you to ignore it, is that no matter what you do you get the same outcome, whether ... (read more)
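A toy numeric sketch of that drop-out effect (all numbers hypothetical):

```python
# 95% credence in the normal bandit, 5% in a hypothesis whose worst case
# is "you're dead" (utility 0 here) regardless of action.
P_NORMAL, P_METEOR = 0.95, 0.05
DEAD = 0.0

bandit_payoff = {"pull_left": 1.0, "pull_right": 0.3}

def worst_case_score(action):
    # Worst case within each hypothesis, then mix by credence:
    return P_NORMAL * bandit_payoff[action] + P_METEOR * DEAD

best = max(bandit_payoff, key=worst_case_score)
print(best)  # "pull_left": the meteor term shifts every action's score
# by the same amount, so it never changes which action wins.
```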

2Charlie Steiner
The meteor doesn't have to really flatten things out, there might be some actions that we think remain valuable (e.g. hedonism, saying tearful goodbyes). And so if we have Knightian uncertainty about the meteor, maximin (as in Vanessa's link) means we'll spend a lot of time on tearful goodbyes.

Well, taking worst-case uncertainty is what infradistributions do. Did you have anything in mind that can be done with Knightian uncertainty besides taking the worst-case (or best-case)?

And if you were dealing with best-case uncertainty instead, then the corresponding analogue would be assuming that you go to hell if you're mispredicted (and then, since best-case things happen to you, the predictor must accurately predict you).

1Charlie Steiner
What if you assumed the stuff you had the hypothesis about was independent of the stuff you have Knightian uncertainty about (until proven otherwise)? E.g. if you're making hypotheses about a multi-armed bandit and the world also contains a meteor that might smash through your ceiling and kill you at any time, you might want to just say "okay, ignore the meteor, pretend my utility has a term for gambling wins that doesn't depend on the meteor at all."

The reason I want to consider stuff more like this is because I don't like having to evaluate my utility function over all possibilities to do either an argmax or an argmin - I want to be lazy.

The weird thing about this is now whether this counts as argmax or argmin (or something else) depends on what my utility function looks like when I do include the meteor. If getting hit by the meteor only makes things worse (though potentially the meteor can still depend on which arm of the bandit I pull!) then ignoring it is like being optimistic. If it only makes things better (like maybe the world I'm ignoring isn't a meteor, it's a big space full of other games I could be playing) then ignoring it is like being pessimistic.

This post is still endorsed, it still feels like a continually fruitful line of research. A notable aspect of it is that, as time goes on, I keep finding more connections and crisper ways of viewing things which means that for many of the further linked posts about inframeasure theory, I think I could explain them from scratch better than the existing work does. One striking example is that the "Nirvana trick" stated in this intro (to encode nonstandard decision-theory problems), has transitioned from "weird hack that happens to work" to "pops straight out... (read more)

3Richard Ngo
I'm feeling very excited about this agenda. Is there currently a publicly-viewable version of the living textbook? Or any more formal writeup which I can include in my curriculum? (If not I'll include this post, but I expect many people would appreciate a more polished writeup.)
1Charlie Steiner
I'm confused about the Nirvana trick then. (Maybe here's not the best place, but oh well...) Shouldn't it break the instant you do anything with your Knightian uncertainty other than taking the worst-case?

Availability: Almost all times between 10 AM and PM, California time, regardless of day. Highly flexible hours. Text over voice is preferred, I'm easiest to reach on Discord. The LW Walled Garden can also be nice.

A note to clarify for confused readers of the proof. We started out by assuming , and . We conclude  by how the agent works. But the step from there to  (ie, inconsistency of PA) isn't entirely spelled out in this post.

Pretty much, that follows from a proof by contradiction. Assume con(PA) ie , and it happens to be a con(PA) theorem that the agent can't prove in advance what it will do, ie, . (I can spell this out in more detail if anyone wants) However, com... (read more)
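Spelling that out (a sketch in my notation, with $a$ standing for the action in question):

$$PA \vdash \mathrm{con}(PA) \to \neg\square(A = a) \qquad \text{(the agent can't prove its action in advance)}$$

Contraposing gives $\square(A = a) \to \neg\mathrm{con}(PA)$, and we already concluded $\square(A = a)$, so $\neg\mathrm{con}(PA)$, i.e. $\square\bot$: PA is inconsistent.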

In the proof of Lemma 3, it should be 

"Finally, since , we have that .

Thus,  and  are both equal to .

instead.

3Scott Garrabrant
Fixed, Thanks.

Any idea of how well this would generalize to stuff like Chicken or games with more than 2-players, 2-moves?

I don't know; we're hunting for it. Relaxations of dynamic consistency would be extremely interesting if found, and I'll let you know if we turn up anything nifty.

2Stuart Armstrong
Hum... how about seeing enforcement of dynamic consistency as having a complexity/computation cost, and Dutch books (by other agents or by the environment) providing incentives to pay the cost? And hence the absence of these Dutch books meaning there is little incentive to pay that cost?

Looks good. 

Re: the dispute over normal bayesianism: For me, "environment" denotes "thingy that can freely interact with any policy in order to produce a probability distribution over histories". This is a different type signature than a probability distribution over histories, which doesn't have a degree of freedom corresponding to which policy you pick.

But for infra-bayes, we can associate a classical environment with the set of probability distributions over histories (for various possible choices of policy), and then the two distinct notions becom... (read more)
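A minimal typed sketch of that distinction (the type names here are my own, purely for illustration):

```python
from typing import Callable, Dict, Tuple

History = Tuple[str, ...]            # alternating observations and actions
Dist = Dict[History, float]          # probability distribution over histories
Policy = Callable[[History], str]    # observations so far -> next action

# Classical environment: a free slot for the policy. A different type
# signature than a bare distribution over histories.
Environment = Callable[[Policy], Dist]

# The infra-Bayes move: identify an environment with the collection of
# history-distributions it induces, one per policy.
def as_belief(env: Environment, policies: list) -> Dict[int, Dist]:
    return {i: env(pi) for i, pi in enumerate(policies)}
```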

2Rohin Shah
Ah right, that makes sense. That was a mistake on my part, my bad.

I'd say this is mostly accurate, but I'd amend number 3. There's still a sort of non-causal influence going on in pseudocausal problems; you can easily formalize counterfactual mugging and XOR blackmail as pseudocausal problems (you need acausal specifically for transparent Newcomb, not vanilla Newcomb). But it's specifically a sort of influence that's like "reality will adjust itself so contradictions don't happen, and there may be correlations between what happened in the past, or other branches, and what your action is now, so you can exploit this by ac... (read more)

2Rohin Shah
Thanks for checking! I've changed point 3 to: Re: What I meant was that if you define a Bayesian belief over world-histories (oa)*, that is equivalent to having a Bayesian belief over environments E, which I think you agree with. I've edited slightly to make this clearer.

Sounds like a special case of crisp infradistributions (ie, all partial probability distributions have a unique associated crisp infradistribution)

Given some partial probability distribution $\theta$, we can consider the (nonempty) set of probability distributions equal to $\theta$ wherever $\theta$ is defined. This set is convex (clearly, a mixture of two probability distributions which agree with $\theta$ about the probability of an event will also agree with $\theta$ about the probability of that event).

Convex (compact) sets of probability distributions = crisp infradistributions.... (read more)
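A finite worked sketch (hypothetical outcome names and numbers): since the set of distributions agreeing with a partial assignment is a convex polytope, the associated crisp-infradistribution expectation is a small linear program.

```python
# Partial probability distribution -> crisp infradistribution, with the
# worst-case expectation computed as a linear program.
from scipy.optimize import linprog

outcomes = ["rain", "sun", "snow"]
f = [0.0, 1.0, 0.5]                # function we want the expectation of

# Partial assignment theta: P({rain}) = 0.3, everything else unspecified.
A_eq = [[1.0, 1.0, 1.0],           # total probability is 1
        [1.0, 0.0, 0.0]]           # mu(rain) = 0.3
b_eq = [1.0, 0.3]

# inf over the convex set = the crisp-infradistribution expectation:
res = linprog(c=f, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 3)
print(res.fun)  # 0.35: the free 0.7 mass lands on snow, the worst outcome for f
```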

You're completely right that hypotheses with unconstrained Murphy get ignored because you're doomed no matter what you do, so you might as well optimize for just the other hypotheses where what you do matters. Your "-1,000,000 vs -999,999 is the same sort of problem as 0 vs 1" reasoning is good.

Again, you are making the serious mistake of trying to think about Murphy verbally, rather than thinking of Murphy as the personification of the "inf" part of the $\inf_{\mu \in \Psi} \mathbb{E}_\mu[U]$ definition of expected value, and writing actual equations. ... (read more)
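Concretely, with $\Psi_\pi$ the set of distributions a hypothesis allows under policy $\pi$ (my notation for the restatement):

$$\mathbb{E}_\Psi[U] := \inf_{\mu \in \Psi} \mathbb{E}_\mu[U], \qquad \pi^* := \arg\max_\pi \inf_{\mu \in \Psi_\pi} \mathbb{E}_\mu[U]$$

"Murphy" is nothing more than the $\inf$ in these expressions.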

There's actually an upcoming post going into more detail on what the deal is with pseudocausal and acausal belief functions, among several other things, I can send you a draft if you want. "Belief Functions and Decision Theory" is a post that hasn't held up nearly as well to time as "Basic Inframeasure Theory".

1DanielFilan
Thanks for the offer, but I don't think I have room for that right now.

If you use the Anti-Nirvana trick, your agent just goes "nothing matters at all, the foe will mispredict and I'll get -infinity reward" and rolls over and cries since all policies are optimal. Don't do that one, it's a bad idea.

For the concave expectation functionals: Well, there's another constraint or two, like monotonicity, but yeah, LF duality basically says that you can turn any (monotone) concave expectation functional into an inframeasure. Ie, all risk aversion can be interpreted as having radical uncertainty over some aspects of how the environment... (read more)
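A hedged statement of the duality being invoked (my phrasing; see "Basic Inframeasure Theory" for the exact continuity and normalization conditions): any monotone, concave, suitably continuous functional $h$ on bounded functions $f : X \to \mathbb{R}$ has a representation

$$h(f) = \inf_{(m, b) \in H} \left( m(f) + b \right)$$

for some closed, convex, upper-complete set $H$ of sa-measures, i.e. $h$ is the expectation functional of an inframeasure.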

2Rohin Shah
Sorry, I meant the combination of best-case reasoning (sup instead of inf) and the anti-Nirvana trick. In that case the agent goes "Murphy won't mispredict, since then I'd get -infinity reward which can't be the best that I do". Hmm, that makes sense, I think? Perhaps I just haven't really internalized the learning aspect of all of this.

Maximin, actually. You're maximizing your worst-case result.

It's probably worth mentioning that "Murphy" isn't an actual foe where it makes sense to talk about destroying resources lest Murphy use them; it's just a personification of the fact that we have a set of options, any of which could be picked, and we want the highest lower bound on utility we can get for that set of options, so, for intuition, we assume we're playing against an adversary with a perfectly opposite utility function. For that last paragraph, translating it back out from the "Murphy" t... (read more)

0awenonian
I'm glad to hear that the question of what hypotheses produce actionable behavior is on people's minds.

I modeled Murphy as an actual agent, because I figured a hypothesis like "A cloaked superintelligence is operating in the area and will react to your decision to do X by doing Y" is always on the table, and is basically a template for allowing Murphy to perform arbitrary action Y.

I feel like I didn't quite grasp what you meant by "a constraint on Murphy is picked according to this probability distribution/prior, then Murphy chooses from the available options of the hypothesis they picked". But based on your explanation after, it sounds like you essentially ignore hypotheses that don't constrain Murphy, because they act as an expected utility drop on all states, so it just means you're comparing -1,000,000 and -999,999, instead of 0 and 1. For example, there's a whole host of hypotheses of the form "A cloaked superintelligence converts all local usable energy into a hellscape if you do X", and since that's a possibility for every X, no action X is graded lower than the others by its existence.

That example is what got me thinking, in the first place, though. Such hypotheses don't lower everything equally, because, given other Laws of Physics, the superintelligence would need energy to hell-ify things. So arbitrarily consuming energy would reduce how bad the outcomes could be if a perfectly misaligned superintelligence was operating in the area. And, given that I am positing it as a perfectly misaligned superintelligence, we should both expect it to exist in the environment Murphy chooses (what could be worse?) and expect any reduction of its actions to be as positive of changes as a perfectly aligned superintelligence's actions could be, since preventing a maximally detrimental action should match, in terms of Utility, enabling a maximally beneficial action. Therefore, entropy-bombs.

Thinking about it more, assuming I'm not still making a mistake, this might ju

So, first off, I should probably say that a lot of the formalism overhead involved in this post in particular feels like the sort of thing that will get a whole lot more elegant as we work more things out, but "Basic inframeasure theory" still looks pretty good at this point and worth reading, and the basic results (ability to translate from pseudocausal to causal, dynamic consistency, capturing most of UDT, definition of learning) will still hold up.

Yes, your current understanding is correct, it's rebuilding probability theory in more generality to be sui... (read more)

2Alex Flint
Ah this is helpful, thank you. So let's say I'm estimating the position of a train on a straight section of track as a single real number and I want to do an update each time I receive a noisy measurement of the train's position. Under the theory you're laying out here I might have, say, three Gaussians N(0, 1), N(1, 10), N(4, 6), and rather than updating a single pdf over the position of the train, I'm updating measures associated with each of these three pdf. Is that roughly correct? (I realize this isn't exactly a great example of how to use this theory since train positions are perfectly realizable, but I just wanted to start somewhere familiar to me.) Do you by chance have any worked examples where you go through the update procedure for some concrete prior and observation? If not, do you have any suggestions for what would be a good toy problem where I could work through an update at a very concrete level?

So, we've also got an analogue of KL-divergence for crisp infradistributions. 

We'll be using  and  for crisp infradistributions, and  and  for probability distributions associated with them.  will be used for the KL-divergence of infradistributions, and  will be used for the KL-divergence of probability distributions. For crisp infradistributions, the KL-divergence is defined as

I'm not entirely sure why it's like this, but it has the basic properties yo... (read more)

Potential counterargument: Second-strike capabilities are still relevant in the interstellar setting. You could build a bunch of hidden ships in the Oort cloud to ram the foe and do equal devastation if the other party does it first, deterring a first strike even with tensions and an absence of communication. Further, while the "ram with high-relativistic objects" idea works pretty well for preemptively ending a civilization confined to a handful of planets, AIs would be able to colonize a bunch of little asteroids and KBOs and comets in the Oort cloud, and the higher level of dispersal would lead to preemptive total elimination being less viable.

1John Maxwell
That's possible, but I'm guessing that it's not hard for a superintelligent AI to suddenly swallow an entire system using something like gray goo.

I will be hosting a readthrough of this sequence on MIRIxDiscord again, PM for a link.

So, here are some considerations (not an actual policy)

It's instructive to look at the case of nuclear weapons, and the key analogies or disanalogies to math work. For nuclear weapons, the basic theory is pretty simple and building the hardware is the hard part, while for AI, the situation seems reversed. The hard part there is knowing what to do in the first place, not scrounging up the hardware to do it.

First, a chunk from Wikipedia

Most of the current ideas of the Teller–Ulam design came into public awareness after the DOE attempted to censor a magazine ar

... (read more)
2Ofer
Publishing under a pseudonym may end up being counterproductive due to the Streisand effect. Identities behind many pseudonyms may suddenly be publicly revealed following a publication on some novel method for detecting similarities in writing style between texts.
1Vanessa Kosoy
Regarding making a policy ahead of time, I think we can have an evolving model of what ingredients are missing to get transformative AI, and some rule of thumb that says how dangerous your result is, given how much progress it makes towards each ingredient (relevant but clearly insufficient < might or might not be sufficient < plausibly a full solution), how concrete/actionable it is (abstract idea < impractical method < practical method) and how original/surprising it is (synthesis of ideas in the field < improvement on idea in the field < application of idea outside the field < completely out of the blue).

One problem is, the model itself might be an infohazard. This consideration pushes towards making the guidelines secret in themselves, but that would make it much harder to debate and disseminate them. Also, the new result might have major implications for the model. So, yes, certainly there is no replacement for the inside view, but I still feel that we can have guidelines that help focus on the right considerations.
0David Manheim
OpenAI's phased release of GPT2 seems like a clear example of exactly this. And there is a forthcoming paper looking at the internal deliberations around this from Toby Shevlane, in addition to his extant work on the question of how disclosure potentially affects misuse.

Maximin over outcomes would lead to the agent devoting all its efforts towards avoiding the worst outcomes, sacrificing overall utility, while maximin over expected value pushes towards policies that do acceptably on average in all of the environments that it may find itself in.
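In symbols (my notation, with $\Psi_\pi$ the set of distributions the hypotheses allow under policy $\pi$), the contrast is

$$\max_\pi \inf_{\mu \in \Psi_\pi} \min_{h \in \mathrm{supp}\, \mu} U(h) \qquad \text{versus} \qquad \max_\pi \inf_{\mu \in \Psi_\pi} \mathbb{E}_{h \sim \mu}[U(h)]$$

The first guards against the worst outcome even within a fixed environment; the second only guards against the worst environment, and is the maximin described above.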

Regarding "why listen to past me", I guess to answer this question I'd need to ask about your intuitions on Counterfactual mugging. What would you do if it's one-shot? What would you do if it's repeated? If you were told about the problem beforehand, would you pay money for a commitment mechanism to make future-you pay up the money if asked? (for +EV)

Yeah, looking back, I should probably fix the m- part and have the signs be consistent with the usual usage, where it's a measure minus another one instead of the addition of two signed measures (one a measure and one a negative measure). May be a bit of a pain to fix, though; the proof pages are extremely laggy to edit.

Wikipedia's definition can be matched up with our definition by fixing a partial order where $x \preceq y$ iff there's an sa-measure $s$ s.t. $y = x + s$, and this generalizes to any closed c... (read more)

We go to the trouble of sa-measures because it's possible to add a sa-measure to an a-measure, and get another a-measure where the expectation values of all the functions went up, while the new a-measure we landed at would be impossible to make by adding an a-measure to an a-measure.

Basically, we've gotta use sa-measures for a clean formulation of "we added all the points we possibly could to this set", getting the canonical set in your equivalence class.

Admittedly, you could intersect with the cone of a-measures again at the end (as we do in the next post... (read more)
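For reference, the two definitions in play, paraphrased from "Basic Inframeasure Theory" (double-check the exact conditions there):

$$\text{sa-measure: a pair } (m, b), \ m \text{ a signed measure}, \ b \in \mathbb{R}_{\geq 0}, \ b \geq m_-(1)$$
$$\text{a-measure: an sa-measure where } m \text{ is an actual (nonnegative) measure}$$

The condition $b \geq m_-(1)$ (the total mass of the negative part of $m$) is what guarantees $m(f) + b \geq 0$ for every $f : X \to [0,1]$.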

I found a paper about this exact sort of thing. Escardó and Oliva call that type signature a "selection functional", and the type signature $(X \to R) \to R$ is called a "quantification functional", and there's several interesting things you can do with them, like combining multiple selection functionals into one in a way that looks reminiscent of game theory. (ie, if $\varepsilon$ has type signature $(X \to R) \to X$, and $\delta$ has type signature $(Y \to R) \to Y$, then $\varepsilon \otimes \delta$ has type signature $((X \times Y) \to R) \to (X \times Y)$... (read more)
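A toy implementation of the construction (my own sketch; the payoffs are hypothetical, and both choosers here maximize the same objective, which is the simplest case of the product):

```python
# Selection functionals: type (X -> R) -> X. The product below combines two
# of them into one of type ((X, Y) -> R) -> (X, Y), which is where the
# game-theoretic flavor (backward induction) comes from.

def argmax_selection(xs):
    """The canonical selection functional over a finite set: given a
    'payoff' function k, return the element of xs maximizing it."""
    return lambda k: max(xs, key=k)

def tensor(eps, delta):
    """Product of eps : (X -> R) -> X and delta : (Y -> R) -> Y,
    yielding a selection functional of type ((X, Y) -> R) -> (X, Y)."""
    def combined(k):
        # Best response in Y to each candidate x:
        b = lambda x: delta(lambda y: k((x, y)))
        # Pick x assuming the Y-chooser responds with b(x):
        a = eps(lambda x: k((x, b(x))))
        return (a, b(a))
    return combined

# Hypothetical two-move sequential game, with a shared payoff function:
payoff = {("hawk", "hawk"): 0, ("hawk", "dove"): 3,
          ("dove", "hawk"): 1, ("dove", "dove"): 2}
eps = argmax_selection(["hawk", "dove"])
delta = argmax_selection(["hawk", "dove"])
print(tensor(eps, delta)(lambda xy: payoff[xy]))  # ('hawk', 'dove')
```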

Oh, I see what the issue is. "Propositional tautology given A" means that the statement follows from A by boolean logic alone, not that it's provable from A in the full theory. So yeah, when A is a boolean that is equivalent to $\bot$ via boolean logic alone, we can't use that A for the exact reason you said, but if A isn't equivalent to $\bot$ via boolean logic alone (although it may be possible to infer $\bot$ by other means), then the denominator isn't necessarily small.

Yup, a monoid, because $\top \wedge \phi = \phi$ and $\phi \wedge \top = \phi$, so it acts as an identity element, and we don't care about the order. Nice catch.

You're also correct about what propositional tautology given A means.

1Gurkenglas
Then that minimum does not make a good denominator because it's always extremely small. It will pick phi to be as powerful as possible to make L small, aka set phi to bottom. (If the denominator before that version is defined at all, bottom is a propositional tautology given A.)

(lightly edited restatement of email comment)

Let's see what happens when we adapt this to the canonical instance of "no, really, counterfactuals aren't conditionals and should have different probabilities": the cosmic ray problem, where the agent has a choice between two paths; it slightly prefers taking the left path, but its conditional on taking the right path is a tiny slice of probability mass that's mostly composed of stuff like "I took the suboptimal action because I got hit by a cosmic ray".

There will be 0 utili... (read more)

1Vivek Hebbar
If the agent follows EDT, it seems like you are giving it epistemically unsound credences. In particular, the premise is that it's very confident it will go left, and the consequence is that it in fact goes right. This was the world model's fault, not EDT's fault. (It is notable though that EDT introduces this loopiness into the world model's job.)
1Abram Demski
(lightly edited version of my original email reply to above comment; note that Diffractor was originally replying to a version of the Dutch-book which didn't yet call out the fact that it required an assumption of nonzero probability on actions.)

I agree that this Dutch-book argument won't touch probability zero actions, but my thinking is that it really should apply in general to actions whose probability is bounded away from zero (in some fairly broad setting). I'm happy to require an epsilon-exploration assumption to get the conclusion.

Your thought experiment raises the issue of how to ensure in general that adding bets to a decision problem doesn't change the decisions made. One thought I had was to make the bets always smaller than the difference in utilities. Perhaps smaller Dutch-books are in some sense less concerning, but as long as they don't vanish to infinitesimal, seems legit. A bet that's desirable at one scale is desirable at another. But scaling down bets may not suffice in general. Perhaps a bet-balancing scheme to ensure that nothing changes the comparative desirability of actions as the decision is made?

For your cosmic ray problem, what about: You didn't specify the probability of a cosmic ray. I suppose it should have probability higher than the probability of exploration. Let's say 1/million for cosmic ray, 1/billion for exploration.

Before the agent makes the decision, it can be given the option to lose .01 util if it goes right, in exchange for +.02 utils if it goes right & cosmic ray. This will be accepted (by either a CDT agent or EDT agent), because it is worth approximately +.01 util conditioned on going right, since cosmic ray is almost certain in that case.

Then, while making the decision, cosmic ray conditioned on going right looks very unlikely in terms of CDT's causal expectations. We give the agent the option of getting .001 util if it goes right, if it also agrees to lose .02 conditioned on going right & cosmic ray. CDT ag

It actually is a weakening. Because all changes can be interpreted as making some player worse off if we just use standard Pareto optimality, the second condition means that more changes count as improvements, as you correctly state. The third condition cuts down on which changes count as improvements, but the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto Optimality.

The definition of an almost stratified Pareto optimum was adapted from this, and was... (read more)

1Vanessa Kosoy
"the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto Optimality." Why? Condition 3 implies that U_{RO,j} \leq U_{RO',j}. So, together with condition 2, we get that U_{RO,j} \leq U_{RO',j} for any j. That precisely means that this is a Pareto improvement in the usual sense.

My initial inclination is to introduce as the space of events on turn , and define and then you can express it as .

The notation for the sum operator is unclear. I'd advise writing the sum as  and using an $i$ subscript inside the sum so it's clearer what is being substituted where.

1Nisan
The sum isn't over i, though, it's over all possible tuples of length n−1. Any ideas for how to make that more clear?

Wasn't there a fairness/continuity condition in the original ADT paper that if there were two "agents" that converged to always taking the same action, then the embedder would assign them the same value? (more specifically, if , then ) This would mean that it'd be impossible to have one be low while the other is high, so the argument still goes through.

Although, after this whole line of discussion, I'm realizing that there are enough substantial differences between the ori... (read more)

1Jessica Taylor
Yes, the continuity condition on embedders in the ADT paper would eliminate the embedder I meant. Which means the answer might depend on whether ADT considers discontinuous embedders. (The importance of the continuity condition is that it is used in the optimality proof; the optimality proof can't apply to chicken for this reason).
in the ADT paper, the asymptotic dominance argument is about the limit of the agent's action as epsilon goes to 0. This limit is not necessarily computable, so the embedder can't contain the agent, since it doesn't know epsilon. So the evil problem doesn't work.

Agreed that the evil problem doesn't work for the original ADT paper. In the original ADT paper, the agents are allowed to output distributions over moves. I didn't like this because it implicitly assumes that it's possible for the agent to perfectly randomize, an... (read more)

2Jessica Taylor
The fact that we take the limit as epsilon goes to 0 means the evil problem can't be constructed, even if randomization is not allowed. (The proof in the ADT paper doesn't work, but that doesn't mean something like it couldn't possibly work.) You're right, this is an error in the proof, good catch.

Re chicken: The interpretation of the embedder that I meant is "opponent only uses the embedder where it is up against [whatever policy you plugged in]". This embedder does not get knocked down by the reality filter. Let Et be the embedder. The logical inductor expects Ut to equal the crash/crash utility, and it also expects Et(⌈ADTϵ⌉) to equal the crash/crash utility. The expressions Ut and Et(⌈ADTϵ⌉) are provably equal, so of course the logical inductor expects them to be the same, and the reality check passes.

The error in your argument is that you are embedding actions rather than agents. The fact that NeverSwerveBot and ADT both provably always take the straight action does not mean the embedder assigns them equal utilities.

I got an improved reality-filter that blocks a certain class of environments that cause conjecture 1 to fail, although it isn't enough to deal with the provided chicken example, and doesn't lead to a proof of conjecture 1. (the subscripts will be suppressed for clarity)

Instead of the reality-filter for being

it is now

This doesn't just check whether reality is recovered on average, it also checks whether all the "plausible conditionals" line up as well. Some of the con... (read more)

I figured out what feels slightly off about this solution. For events like "I have a long memory and accidentally dropped a magnet on it", it intuitively feels like describing your spot in the environment and the rules of your environment is much lower K-complexity than finding a Turing machine/environment that starts by giving you the exact (long) scrambled sequence of memories that you have, and then resumes normal operating.

Although this also feels like something nearby is actually desired behavior. If you rewrite the tape to be describing some other simple environment, you would intuitively expect the AIXI to act as if it's in the simple environment for a brief time before gaining enough information to conclude that things have changed and rederive the new rules of where it is.

1interstice
Well, it COULD be the case that the K-complexity of the memory-erased AIXI environment is lower, even when it learns that this happened. The reason for this is that there could be many possible past AIXIs who have their memory erased/altered and end up in the same subjective situation. Then the memory-erasure hypothesis can use the lowest-K-complexity AIXI who ends up with these memories. As the AIXI learns more, it can gradually piece together which of the potential past AIXIs it actually was, and the K-complexity will go back up again.

EDIT: Oh, I see you were talking about actually having a RANDOM memory in the sense of a random sequence of 1s and 0s. Yeah, but this is no different than AIXI thinking that any random process is high K-complexity. In general, and discounting merging, the memory-altering subroutine will increase the complexity of the environment by a constant plus the complexity of whatever transformation you want to apply to the memories.

Not quite. If taking bet 9 is a prerequisite to taking bet 10, then AIXI won't take bet 9, but if bet 10 gets offered whether or not bet 9 is accepted, then AIXI will be like "ah, future me will take the bet, and wind up with 10+ in the heads world and -20+2 in the tails world. This is just a given. I'll take this +15/-15 bet as it has net positive expected value, and the loss in the heads world is more than counterbalanced by the reduction in the magnitude of loss for the tails world"

Something else feels slightly off, but I can'... (read more)


Yup, I meant counterfactual mugging. Fixed.

I think I remember the original ADT paper showing up on the Agent Foundations forum before a writeup on logical EDT with exploration, and my impression of which came first was affected by that. Also, the "this is detailed in this post" was referring to logical EDT with exploration. I'll edit for clarity.

3Jessica Taylor
OK, I helped invent ADT so I know it conceptually came after. (I don't think it was "shortly after"; logical EDT was invented very shortly after logical inductors, in early 2016, and ADT was in late 2016). I think you should link to the ADT paper in the intro section so people know what you're talking about.

I actually hadn't read that post or seen the idea anywhere before writing this up. It's a pretty natural resolution, so I'd be unsurprised if it was independently discovered before. Sorry about being unable to assist.

The extra penalty to describe where you are in the universe corresponds to requiring sense data to pin down *which* star you are near, out of the many stars, even if you know the laws of physics, so it seems to recover the desired behavior.
