One of the main arguments given against Evidential Decision Theory (EDT) is that it would “one-box” in medical Newcomb problems. Whether this is the winning action has been a hotly debated issue on LessWrong. A majority, including experts in the area such as Eliezer Yudkowsky and Wei Dai, seem to think that one should two-box (See e.g. Yudkowsky 2010, p.67). Others have tried to argue in favor of EDT by claiming that the winning action would be to one-box, or by offering reasons why EDT would in some cases two-box after all. In this blog post, I want to argue that EDT gets it right: one-boxing is the correct action in medical Newcomb problems. I introduce a new thought experiment, the Coin Flip Creation problem, in which I believe the winning move is to one-box. This new problem is structurally similar to other medical Newcomb problems such as the Smoking Lesion, though it might elicit the intuition to one-box even in people who would two-box in some of the other problems. I discuss both how EDT and other decision theories would reason in the problem and why people’s intuitions might diverge in different formulations of medical Newcomb problems.

Two kinds of Newcomblike problems

There are two different kinds of Newcomblike problems. In Newcomb’s original paradox, both EDT and Logical Decision Theories (LDT), such as Timeless Decision Theory (TDT), would one-box and therefore, unlike Causal Decision Theory (CDT), win $1 million. In medical Newcomb problems, EDT’s and LDT’s decisions diverge. This is because in the latter, a (physical) causal node that isn’t itself a decision algorithm influences both the current world state and our decisions – resulting in a correlation between action and environment but, unlike in the original Newcomb problem, no “logical” causation.

It’s often unclear exactly how a causal node can exert influence on our decisions. Does it change our decision theory, utility function, or the information available to us? In the case of the Smoking Lesion problem, it seems plausible that it’s our utility function that is being influenced. But then it seems that as soon as we observe our utility function (“notice a tickle”; see Eells 1982), we lose “evidential power” (Almond 2010a, p.39), i.e. there’s nothing new to learn about our health by acting a certain way if we already know our utility function. In any case, as long as we don’t know and therefore still have the evidential power, I believe we should use it.

The Coin Flip Creation problem is an adaptation of Caspar Oesterheld’s “Two-Boxing Gene” problem and, like the latter, attempts to take Newcomb’s original problem and turn it into a medical Newcomb problem that still triggers the intuition that we should one-box. In Oesterheld’s Two-Boxing Gene, it’s stated that a certain gene correlates with our decision to one-box or two-box in Newcomb’s problem, and that Omega, instead of simulating our decision algorithm, just looks at this gene.

Unfortunately, it’s not specified how the correlation between two-boxing and the gene arises, casting doubt on whether it’s a medical Newcomb problem at all, and whether other decision algorithms would disagree with one-boxing. Wei Dai argues that in the Two-Boxing Gene, if Omega conducts a study to find out which genes correlate with which decision algorithm, then Updateless Decision Theory (UDT) could just commit to one-boxing and thereby determine that all the genes UDT agents have will always correlate with one-boxing. So in some sense, UDT’s genes will still indirectly constitute a “simulation” of UDT’s algorithm, and there is a logical influence between the decision to one-box and Omega’s decision to put $1 million in box A. Similar considerations could apply for other LDTs.

The Coin Flip Creation problem is intended as an example of a problem in which EDT would give the right answer, but all causal and logical decision theories would fail. It works explicitly through a causal influence on the decision theory itself, thus reducing ambiguity about the origin of the correlation.

The Coin Flip Creation problem

One day, while pondering the merits and demerits of different acausal decision theories, you’re visited by Omega, a being assumed to possess flawless powers of prediction and absolute trustworthiness. You’re presented with Newcomb’s paradox, but with one additional caveat: Omega informs you that you weren’t born like a normal human being, but were instead created by Omega. On the day you were born, Omega flipped a coin: If it came up heads, Omega created you in such a way that you would one-box when presented with the Coin Flip Creation problem, and it put $1 million in box A. If the coin came up tails, you were created such that you’d two-box, and Omega didn’t put any money in box A. We don’t know how Omega made sure what your decision would be. For all we know, it may have inserted either CDT or EDT into your source code, or even just added one hard-coded decision rule on top of your messy human brain. Do you choose both boxes, or only box A?
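To make the setup concrete, here is a minimal simulation sketch. It assumes a fair coin, the standard Newcomb prize of $1,000,000 in box A when it is filled, and a $1,000 side prize in box B, which the problem statement above doesn’t specify; the function name is just illustrative.

```python
import random

def coin_flip_creation():
    """One run of the Coin Flip Creation: the coin fixes both the agent's
    hard-coded choice and the contents of box A."""
    heads = random.random() < 0.5               # assumed fair coin
    box_a = 1_000_000 if heads else 0           # Omega fills box A only on heads
    action = "one-box" if heads else "two-box"  # Omega built the agent to act this way
    box_b = 1_000                               # assumed side prize, not stated in the post
    payoff = box_a if action == "one-box" else box_a + box_b
    return action, payoff

# Every heads-world agent one-boxes and receives $1,000,000;
# every tails-world agent two-boxes and receives $1,000.
print([coin_flip_creation() for _ in range(5)])
```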

It seems like EDT gets it right: one-boxing is the winning action here. There’s a correlation between our decision to one-box, the coin flip, and Omega’s decision to put money in box A. Conditional on us one-boxing, the probability that there is money in box A increases, and we “receive the good news” – that is, we discover that the coin must have come up heads, and we thus get the million dollars. In fact, we can be absolutely certain of the better outcome if we one-box. The argument also goes through if the correlation between our actions and the content of box A isn’t perfect: as long as the correlation is high enough, it is better to one-box.
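To spell out the evidential calculation, here is a rough sketch of EDT’s expected-utility comparison. The 0.99/0.01 conditional probabilities are placeholders for the “high enough” imperfect correlation just mentioned (in the problem as stated they are 1 and 0), and the $1,000 in box B is again an assumption.

```python
def edt_expected_utility(action,
                         p_full_given_onebox=0.99,   # illustrative, not from the post
                         p_full_given_twobox=0.01,
                         prize=1_000_000,
                         box_b=1_000):
    """EDT conditions the probability that box A is full on the action taken."""
    p_full = p_full_given_onebox if action == "one-box" else p_full_given_twobox
    side = 0 if action == "one-box" else box_b
    return p_full * prize + side

print(edt_expected_utility("one-box"))   # 990000.0
print(edt_expected_utility("two-box"))   # 11000.0 -- one-boxing wins
```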

Nevertheless, neither causal nor logical counterfactuals seem to imply that we can determine whether there is money in box A. The coin flip isn’t a decision algorithm itself, so we can’t determine its outcome. The logical uncertainty about our own decision output doesn’t seem to coincide with the empirical uncertainty about the outcome of the coin flip. In the absence of a causal or logical link between their decision and the content of box A, CDT and TDT would two-box.
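For contrast, a sketch of the causal calculation under the same assumed payoffs: because the coin flip is treated as independent of the action, CDT holds the probability that box A is full fixed, and two-boxing dominates.

```python
def cdt_expected_utility(action, p_full=0.5, prize=1_000_000, box_b=1_000):
    """CDT keeps P(box A is full) at the 0.5 prior, whatever the action."""
    side = 0 if action == "one-box" else box_b
    return p_full * prize + side

print(cdt_expected_utility("one-box"))   # 500000.0
print(cdt_expected_utility("two-box"))   # 501000.0 -- two-boxing dominates causally
```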

Updateless Decision Theory

As far as I understand, UDT would come to a similar conclusion. AlephNeil writes in a post about UDT:

In the Smoking Lesion problem, the presence of a 'lesion' is somehow supposed to cause Player's to choose to smoke (without altering their utility function), which can only mean that in some sense the Player's source code is 'partially written' before the Player can exercise any control over it. However, UDT wants to 'wipe the slate clean' and delete whatever half-written nonsense is there before deciding what code to write.

Ultimately this means that when UDT encounters the Smoking Lesion, it simply throws away the supposed correlation between the lesion and the decision and acts as though that were never a part of the problem.

This approach seems wrong to me. If we use an algorithm that changes our own source code, then this change, too, has been physically determined and can therefore correlate with events that aren’t copies of our own decision algorithm. If UDT reasons as though it could just rewrite its own source code and discard the correlation with the coin flip altogether, then UDT two-boxes and thus by definition ends up in the world where there is no money in box A.

Note that updatelessness seemingly makes no difference in this problem, since it involves no a priori decision: Before the coin flip, there’s a 50% chance of becoming either a one-boxing or a two-boxing agent. In any case, we can’t do anything about the coin flip, and therefore also can’t influence whether box A contains any money.

I am uncertain how UDT works, though, and would be curious about other people’s thoughts. Maybe UDT reasons that by one-boxing, it becomes a decision theory of the sort that would never be installed into an agent in a tails world, thus rendering impossible all hypothetical tails worlds with UDT agents in them. But if so, why wouldn’t UDT “one-box” in the Smoking Lesion? As far as the thought experiments are specified, the causal connection between coin flip and two-boxing in the Coin Flip Creation appears to be no different from the connection between gene and smoking in the Smoking Lesion.

More adaptations and different formalizations of LDTs exist, e.g. Proof-Based Decision Theory. I could very well imagine that some of those might one-box in the thought experiment I presented. If so, then I’m once again curious as to where the benefits of such decision theories lie in comparison to plain EDT (aside from updatelessness – see Concluding thoughts).

Coin Flip Creation, Version 2

Let’s assume UDT would two-box in the Coin Flip Creation. We could alter our thought experiment a bit so that UDT would probably one-box after all:

The situation is identical to the Coin Flip Creation, with one key difference: After Omega flips the coin and creates you with the altered decision algorithm, it actually simulates your decision, just as in Newcomb’s original paradox. Only after Omega has determined your decision via simulation does it decide whether to put money in box A, conditional on your decision. Do you choose both boxes, or only box A?
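As a rough sketch of the structural difference (again assuming a $1,000 side prize in box B): in Version 2, box A is filled according to a simulation of the already-created agent, rather than by looking at the coin directly.

```python
def coin_flip_creation_v2(agent):
    """Version 2: Omega first creates the agent as before, then fills box A
    based on a simulation of that agent's decision."""
    predicted = agent()                      # Omega's simulation of the decision
    box_a = 1_000_000 if predicted == "one-box" else 0
    actual = agent()                         # the real decision, same algorithm
    return box_a if actual == "one-box" else box_a + 1_000  # assumed $1,000 in box B

print(coin_flip_creation_v2(lambda: "one-box"))   # 1000000
print(coin_flip_creation_v2(lambda: "two-box"))   # 1000
```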

Here is a causal graph for the first and second version of the Coin Flip Creation problem. In the first version, a coin flip determines whether there is money in box A. In the second one, a simulation of your decision algorithm decides:

Since in Version 2 there’s a simulation involved, UDT would probably one-box. I find this to be a curious conclusion. From the agent’s perspective, the situation remains exactly the same – we can rule out any changes in the correlation between our decision and our payoff. It seems confusing to me, then, that the optimal decision should be a different one.

Copy-altruism and multi-worlds

The Coin Flip Creation problem assumes a single world and an egoistic agent. In the following, I want to include a short discussion of how the Coin Flip Creation would play out in a multi-world environment.

Suppose Omega’s coin is based on a quantum number generator and produces 50% heads worlds and 50% tails worlds. If we’re copy-egoists, EDT still recommends one-boxing, since doing so would reveal to us that we’re in one of the branches in which the coin came up heads. If we’re copy-altruists, then in practice, we’d probably care a bit less about copies whose decision algorithms have been tampered with, since they would make less effective use of the resources they gain than we ourselves would (i.e. their decision algorithm sometimes behaves differently from ours). But in theory, if we care about all the copies equally, we should be indifferent between one-boxing and two-boxing, since there will always be 50% of us in each of the worlds no matter what we do. The two groups always take the opposite action. The only thing we can change is whether our own copy belongs to the tails or the heads group.
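A toy comparison of the two perspectives, with the same assumed payoffs as above: the copy-egoist’s expectation depends on which branch her action reveals her to be in, while the copy-altruist’s total across both branches is fixed.

```python
heads_payoff = 1_000_000   # the copy built to one-box
tails_payoff = 0 + 1_000   # the copy built to two-box (assumed $1,000 in box B)

# Copy-egoist: one-boxing is evidence of being the heads-branch copy.
print("egoist, conditional on one-boxing:", heads_payoff)   # 1000000
print("egoist, conditional on two-boxing:", tails_payoff)   # 1000

# Copy-altruist weighing both branches equally: the total never changes.
print("altruist, total across branches:", heads_payoff + tails_payoff)  # 1001000
```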

To summarize, UDT and EDT would both be indifferent in the altruistic multi-world case, but UDT would (presumably) two-box, and EDT would one-box, in both the copy-egoistic multi-world case and the single-world case.

“But I don’t have a choice”

There seems to be an especially strong intuition of “absence of free will” inherent to the Coin Flip Creation problem. When presented with the problem, many respond that if someone had created their source code, they didn’t have any choice to begin with. But that’s the exact situation in which we all find ourselves at all times! Our decision architecture and choices are determined by physics, just like a hypothetical AI’s source code, and all of our choices will thus be determined by our “creator.” When we’re confronted with the two boxes, we know that our decisions are predetermined, just like every word of this blogpost has been predetermined. But that knowledge alone won’t help us make any decision. As far as I’m aware, even an agent with complete knowledge of its own source code would have to treat its own decision outputs as uncertain, or it would fail to implement a decision algorithm that takes counterfactuals into account.

Note that our decision in the Coin Flip Creation is also no less determined than in Newcomb’s paradox. In both cases, the prediction has been made, and physics will guide our thoughts and our decision in a deterministic and predictable manner. Nevertheless, we can still assume that we have a choice until we make our decision, at which point we merely “find out” what has been our destiny all along.

Concluding thoughts

I hope that the Coin Flip Creation motivates some people to reconsider EDT’s answers in Newcomblike problems. A thought experiment somewhat similar to the Coin Flip Creation can be found in Ahmed 2014.

Of course, the particular setup of the Coin Flip Creation means it isn’t directly relevant to the question of which decision theory we should program into an AI. We obviously wouldn’t flip a coin before creating an AI. Also, the situation doesn’t really look like a decision problem from the outside; an impartial observer would just see Omega forcing you to take either one box or both. Still, the example demonstrates that from the inside view, evidence from the actions we take can help us achieve our goals better. Why shouldn’t we use this information? And if evidential knowledge can help us, why shouldn’t we allow a future AI to take it into account? In any case, I’m not overly confident in my analysis and would be glad to have any mistakes pointed out to me.

Medical Newcomb problems are also not the only class of problems that challenge EDT. Evidential blackmail is a different example, in which an adversary extracts money from EDT agents by giving them access to specific compromising information. That problem attacks EDT from a different angle, though: namely by exploiting its lack of updatelessness, similar to the challenges in Transparent Newcomb, Parfit’s Hitchhiker, Counterfactual Mugging, and the Absent-Minded Driver. I plan to address questions related to updatelessness at a later point, e.g. whether it makes sense to give in to evidential blackmail if you already have access to the information and haven’t precommitted not to give in.

Acknowledgement

I wrote this post while working for the Foundational Research Institute, which is now the Center on Long-Term Risk.
Comments

There seems to be an especially strong intuition of “absence of free will” inherent to the Coin Flip Creation problem. When presented with the problem, many respond that if someone had created their source code, they didn’t have any choice to begin with. But that’s the exact situation in which we all find ourselves at all times!

I think this is missing the point of the objection.

Consider the three different decision theories, CDT, EDT, and LDT; suppose there are three gurus who teach those decision theories to any orphans left in their care. And suppose Omega does the coin flip six times, ends up with three heads children and three tails children, and gives a matched pair to each of the gurus.

When the day comes, the first set of children reason that they can't change the coin flip because of the lack of causal dependence, and try to take both boxes. One succeeds, and the other discovers that, mysteriously, they one-boxed instead, and got the million.

The second set of children reason that taking one box is correlated with having the million, and so they try to take just the one box. One succeeds, and the other discovers that, mysteriously, they two-boxed instead, and only got the thousand.

The third set, you know the drill. One one-boxes, the other two-boxes.

The point of decision theories is not that they let you reach from beyond the Matrix and change reality in violation of physics; it's that you predictably act in ways that optimize for various criteria. But this is a decision problem where your action has been divorced from your intended action, and so attributing the victory of heads children to EDT is mistaken, because of the tails child with EDT who wanted to two-box but couldn't.


(Also, Betteridge's Law.)

"because of the tails child with EDT who wanted to two-box but couldn't."

This is also a very common situation in the real world: deciding to do something and then going and doing something else instead, like when you decide to do your work and then waste your time instead.

The point of decision theories is not that they let you reach from beyond the Matrix and change reality in violation of physics; it's that you predictably act in ways that optimize for various criteria.

I agree with this. But I would argue that causal counterfactuals somehow assume that we can "reach from beyond the Matrix and change reality in violation of physics". They work by comparing what would happen if we detached our “action node” from its ancestor nodes and manipulated it in different ways. So causal thinking in some way seems to violate the deterministic way the world works. Needless to say, all decision theories somehow have to reason through counterfactuals, so they all have to form “impossible” hypotheses. My point is that if we assume that we can have a causal influence on the future, then this is already a kind of violation of determinism, and I would reason that assuming that we can also have a retro-causal one on the past doesn’t necessarily make things worse. In some sense, it might even be more in line with how the world works: the future is as fixed as the past, and the EDT approach is to merely “find out” which respective past and future are true.

But this is a decision problem where your action has been divorced from your intended action, and so attributing the victory of heads children to EDT is mistaken, because of the tails child with EDT who wanted to two-box but couldn't.

Hmm, I'm not sure. It seems as though in your setup, the gurus have to change the children's decision algorithms, in which case of course the correlation would vanish. Or the children use a meta decision theory like "think about the topic and consider what the guru tells you and then try to somehow do whatever winning means". But if Omega created you with the intention of making you one-box or two-box, it could easily just have added some rule or changed the meta theory so that you would end up just not being convinced of the "wrong" theory. You would have magically ended up doing (and thinking) the right thing, without ever "wanting to but not being able to". I mean, I am trying to convince you of some decision theory right now, and you already have some knowledge and meta decision theory that ultimately will lead you to either adopt or reject it. Maybe the fact that you're not yet convinced shows that you're living in the tails world? ;) Maybe Omega's trick is to make the tails people think about guru cases in order to get them to reject EDT?

One could maybe even object to Newcomb's original problem on similar grounds. Imagine the prediction was made 10 years ago. You learned about decision theories and went to one of the gurus in the meantime, and are now confronted with the problem. Are you now free to choose, or does the prediction mess with your new, intended action, so that you can't choose the way you want? I don't believe so – you'll feel just as free to choose as if the prediction had happened 10 minutes ago. Only after deciding freely do you find out that you were determined to decide this way from the beginning, because Omega of course also accounted for the guru.

In general, I tend to think that adding some "outside influence" to a Newcomb's problem either makes it a different decision problem, or it's irrelevant and just confuses things.

So causal thinking in some way seems to violate the deterministic way the world works.

I agree there's a point here that lots of decision theories / models of agents / etc. are dualistic instead of naturalistic, but I think that's orthogonal to EDT vs. CDT vs. LDT; all of them assume that you could decide to take any of the actions that are available to you.

My point is that if we assume that we can have a causal influence on the future, then this is already a kind of violation of determinism

I suspect this is a confusion about free will. To be concrete, I think that a thermostat has a causal influence on the future, and does not violate determinism. It deterministically observes a sensor, and either turns on a heater or a cooler based on that sensor, in a way that does not flow backwards--turning on the heater manually will not affect the thermostat's attempted actions except indirectly through the eventual effect on the sensor.

One could maybe even object to Newcomb's original problem on similar grounds. Imagine the prediction has already been made 10 years ago. You learned about decision theories and went to one of the gurus in the meantime, and are now confronted with the problem. Are you now free to choose or does the prediction mess with your new, intended action, so that you can't choose the way you want?

This depends on the formulation of Newcomb's problem. If it says "Omega predicts you with 99% accuracy" or "Omega always predicts you correctly" (because, say, Omega is Laplace's Demon), then Omega knew that you would learn about decision theory in the way that you did, and there's still a logical dependence between the you looking at the boxes in reality and the you looking at the boxes in Omega's imagination. (This assumes that the 99% fact is known of you in particular, rather than 99% accuracy being something true of humans in general; this gets rid of the case that 99% of the time people's decision theories don't change, but 1% of the time they do, and you might be in that camp.)

If instead the formulation is "Omega observed the you of 10 years ago, and was able to determine whether or not you then would have one-boxed or two-boxed on traditional Newcomb's with perfect accuracy. The boxes just showed up now, and you have to decide whether to take one or both," then the logical dependence is shattered, and two-boxing becomes the correct move.

If instead the formulation is "Omega observed the you of 10 years ago, and was able to determine whether or not you then would have one-boxed or two-boxed on this version of Newcomb's with perfect accuracy. The boxes just showed up now, and you have to decide whether to take one or both," then the logical dependence is still there, and one-boxing is the correct move.

(Why? Because how can you tell whether you're the actual you looking at the real boxes, or the you in Omega's imagination, looking at simulated boxes?)

I suspect this is a confusion about free will. To be concrete, I think that a thermostat has a causal influence on the future, and does not violate determinism. It deterministically observes a sensor, and either turns on a heater or a cooler based on that sensor, in a way that does not flow backwards--turning on the heater manually will not affect the thermostat's attempted actions except indirectly through the eventual effect on the sensor.

Fair point :) What I meant was that for every world history, there is only one causal influence I could possibly have on the future. But CDT reasons through counterfactuals that are physically impossible (e.g. two-boxing in a world where there is money in box A), because it combines world states with actions it wouldn't take in those worlds. EDT just assumes that it's choosing between different histories, which is kind of "magical", but at least all those histories are internally consistent. Interestingly, e.g. Proof-Based DT would probably amount to the same kind of reasoning? Anyway, it's probably a weak point, if it's one at all, and I fully agree that the issue is orthogonal to the DT question!

I basically agree with everything else you write, and I don't think it contradicts my main points.

This is similar to the ASP problem, an unusual anthropic use case. The issue with UDT is that it's underspecified for such cases, but I think some of its concepts are still clearer than the classical probability/causality language.

UDT can be reframed in the following way. There is an abstract agent that's not part of any real world of interest, which is just a process that runs according to its program and can't be disrupted with an anvil dropped on its head. It covers all possibilities, so it includes more than one history. Worlds can "incarnate" parts of this process, either directly, by straightforward interpretation of its program with particular observations fed to it, or indirectly, by reasoning about it. As a result, certain events in certain worlds are controlled by the abstract process through such incarnations. (This imagery doesn't apply to PD though, where the controlled thing is not an event in a world; this restriction puts it closer to what TDT does, whereas proof-based UDT is more general.)

The normal way of describing UDT's algorithm (in this restricted form) is that there are three phases. In the first phase, usually screened off by the problem statement, the agent identifies the events in the worlds of interest that it controls. Then, in the second phase, it examines the consequences of the possible action strategies, and selects a strategy. In the third phase, it enacts the strategy, selecting a concrete action depending on observations.

The problem with this in anthropic problems, such as ASP and your Coin Flip Creation problem, is that strategy-selection and action-selection can affect which events are influenced by incarnations of the agent. Some of the computations that could be performed on any of the phases make it impossible to incarnate the agent in some of the situations where it would otherwise get to be incarnated, so the results of the first phase can depend on how the agent is thinking on the subsequent phases. For example, if the agent is just simulated to completion, then it loses access to the action if it takes too long to complete. This also applies to abstract reasoning about the agent, where it can diagonalize that reasoning to make it impossible.

So an agent should sometimes decide how to think, in a way that doesn't discourage too many situations in the worlds where it's thinking that. This creates additional problems (different agents that have to think differently, unlike the unified UDT), but that's outside the scope of this post. For ASP, the trick is to notice how simple its thinking has to be to retain control over Predictor's prediction, and to make the decision within that constraint.

For Coin Flip Creation, an agent that decides counter to its gene doesn't get to inhabit the world with that gene, since there is no difference between the decision making setups in the two worlds other than the agents who are making the decision. The agent will be "eliminated" by Omega from the world whose gene is different from the agent's decision (i.e. not allowed to reach the decision making setup, via an arrangement of the initial conditions), and instead a different agent will be put in control in that world. So one-boxing makes the two-box gene world inaccessible to the agent, and conversely. Since I assume randomizing is impossible or punished in some way, the choice is really between which world the agent will inhabit, in which case the one-box world seems a bit better (the other world will be inhabited by an agent with a different decision theory, possibly a crazier one, less capable of putting money to good use). If the agent is "altruistic" and doesn't expect much difference in how its counterpart will manage its funds, the choice doesn't matter. On the other hand, if the agent were told its gene, then it should just go with it (act according to the gene), since that will give it access to both worlds (in this case, it doesn't matter at all what's in the boxes).

Thanks for your comment! I find your line of reasoning in the ASP problem and the Coin Flip Creation plausible. So your point is that, in both cases, by choosing a decision algorithm, one also gets to choose where this algorithm is being instantiated? I would say that in the CFC, choosing the right action is sufficient, while in the ASP you also have to choose the whole UDP program so as to be instantiated in a beneficial way (similar to the distinction of how TDT iterates over acts and UDT iterates over policies).

Would you agree that the Coin Flip Creation is similar to e.g. the Smoking Lesion? I could also imagine that by not smoking, UDT would become more likely to be instantiated in a world where the UDT agent doesn't have the gene (or that the gene would eliminate (some of) the UDT agents from the worlds where they have cancer). Otherwise there couldn't be a study showing a correlation between UDT agents' genes and their smoking habits. If the participants of the study used a different decision theory or, unlike us, didn't have knowledge of the results of the study, UDT would probably smoke. But in this case I would argue that EDT would do so as well, since conditioning on all of this information puts it out of the reference class of the people in the study.

One could probably generalize this kind of "likelihood of being instantiated" reasoning. My guess would be that a UDT version that takes it into account might behave according to conditional probabilities like EDT. Take e.g. the example from this post by Nate Soares. If there isn't a principled difference to the Coin Flip Case that I've overlooked, then UDT might reason that if it takes "green", it will become very likely that it will be instantiated only in a world where gamma rays hit the UDT agent (since apparently, UDT agents that choose green are "eliminated" from worlds without gamma rays – or at least that's what I have to assume if I don't know any additional facts). Therefore our specified version of UDT takes the red box. The main argument I'm trying to make is that if you solve the problem like this, then UDT would (at least here, and possibly in all cases) become equivalent to updateless EDT. Which as far as I know would be a relief, since (u)EDT seems easier to formalize?

So your point is that, in both cases, by choosing a decision algorithm, one also gets to choose where this algorithm is being instantiated?

To clarify, it's the algorithm itself that chooses how it behaves. So I'm not talking about how the algorithm's instantiation depends on the way the programmer chooses to write it; instead I'm talking about how the algorithm's instantiation depends on the choices that the algorithm itself makes, where we are talking about a particular algorithm that's already written. Less mysteriously, the idea of the algorithm's decisions influencing things describes a step in the algorithm; it's how the algorithm operates, by figuring out something we could call "how the algorithm's decisions influence outcomes". The algorithm then takes that thing and does further computations that depend on it.

My thoughts:

1) "Copy-egoistic" and "copy-altruistic" seems misleading, because Omega creates different agents in the heads and tails case. Plain "egoistic" and "altruistic" would work though.

2) Multiple worlds vs single world should be irrelevant to UDT.

3) I think UDT would one-box if it's egoistic, and be indifferent if it's altruistic.

Here's why I think egoistic UDT would one-box. From the problem setup it's provable that one-boxing implies finding money in box A. That's exactly the information that UDT requires for decision making ("logical counterfactual"). It doesn't need to deduce unconditionally that there's money in box A or that it will one-box.

I agree with points 1) and 2). Regarding point 3), that's interesting! Do you think one could also prove that if you don't smoke, you can't (or are less likely to) have the gene in the Smoking Lesion? (See also my response to Vladimir Nesov's comment.)

I can only give a clear-cut answer if you reformulate the smoking lesion problem in terms of Omega and specify the UDT agent's egoism or altruism :-)

That's what I was trying to do with the Coin Flip Creation :) My guess: once you specify the Smoking Lesion and make it unambiguous, it ceases to be an argument against EDT.

What exactly do you think we need to specify in the Smoking Lesion?