I've written before that different theories of anthropic probability are really answers to different questions. In this post I'll try to be as clear as possible on what that means, and explore the implications.
Introduction
One of Nick Bostrom's early anthropic examples involved different numbers of cars in different lanes. Here is a modification of that example:
You're driving along, when you turn into a dark tunnel and are automatically shunted into the left or the right lane. You can't see whether there are any other cars in your dark lane, but the car radio announces "there are 99 cars in the right lane and 1 in the left lane".
Given that, what is your probability of being in the left lane?
That probability is obviously 1%. More interesting than that answer is that there are multiple ways of reaching it. And each of these ways corresponds to answering a slightly different question. And this leads to my ultimate answer about anthropic probability:
Each theory of anthropic probability corresponds to answering a specific, different question about proportions. These questions are equivalent in non-anthropic settings, so each of them feels potentially like a "true" extension of probability to anthropics. Paradoxes and confusion in anthropics result from confusing one question with another.
So if I'm asked "what's the 'real' anthropic probability of X?", my answer is: tell me what you mean by probability, and I'll tell you what the answer is.
0. The questions
If X is a feature that you might or might not have (like being in a left lane), here are several questions that might encode the probability of X:
What proportion of potential observers have X?
What proportion of potential observers exactly like you have X?
What is the average proportion of potential observers with X?
What is the average proportion of potential observers exactly like you with X?
We'll look at each of these questions in turn[1], and see what they imply in anthropic and non-anthropic situations.
1. Proportion of potential observers: SIA
We're trying to answer "Given that, what is your probability of being in the left lane?" The "that" means being in the tunnel in the situation above, so we're actually looking for a conditional probability, best expressed as:
What proportion of the potential observers, who are in the tunnel in the situation above, are also in the left lane?
The answer for that is an immediate "one in a hundred", or 1%, since we know there are 100 drivers in the tunnel, and 1 of them is in the left lane. There may be millions of different tunnels, in trillions of different potential universes; but, assuming we don't need to worry about infinity[2], we can count 100 observers in the tunnel in that situation for each observer in the left lane.
1.1 Anthropic variant
Let's now see how this approach generalises to anthropic problems. Here is an anthropic version of the tunnel problem, based on the incubator version of the Sleeping Beauty problem:
A godly AI creates a tunnel, then flips a fair coin. If the coin comes out heads, it will create one person in the tunnel; if it was tails, it creates 99 people.
You've just woken up in this tunnel; what is the probability that the coin was heads?
So, we want to answer:
What proportion of the potential observers, who are in the tunnel, are also in a world where the coin was heads?
We can't just count off observers within the same universe here, since the 99 and the 1 observers don't exist in the same universe. But we can pair up universes here: for each universe where the coin flip goes heads (1 observer), there is another universe of equal probability where the coin flip goes tails (99 observers).
So the answer to the proportion of potential observers question remains 1%, just as in the non-anthropic situation.
This is exactly the "self-indication assumption" (SIA) version of probability, which counts observers in other potential universes as if they existed in a larger multiverse of potential universes[3].
2. Proportion of potential observers exactly like you: SIA again
Let's now look at the second question:
What proportion of the potential observers exactly like you, who are in the tunnel in the situation above, are also in the left lane?
The phrase "exactly like you" is underdefined - do you require that the other yous be made of exactly the same material, in the same location, etc... I'll cash out the phrase as meaning "has had the same subjective experience as you". So we can cash out the left-lane probability as:
What proportion of the potential observers, with the same subjective experiences as you, who are in the tunnel in the situation above, are also in the left lane?
We can't count off observers within the same universe for this, as the chance of having multiple observers with the same subjective experience in the same universe is very low, unless there are huge numbers of observers.
Instead, assume that one in Ω observers in the tunnel have the same subjective experiences as you. This proportion[4] must be equal for an observer in the left and right lanes. If it weren't, you could deduce information about which lane you were in just from your experiences - so the proportion being equal is the same thing as the lane and your subjective experiences being independent. For any given little ω, this gives the following proportions (where "Right 1 not you" is short for "the same world as 'Right 1 you,' apart from the first person on the right, who is replaced with a non-you observer"):
So the proportion of observers in the right/left lane with your subjective experience is 1/Ω times the proportion of observers in the right/left lane. When comparing those two proportions, the two factors of 1/Ω cancel out, and we get 1%, as before.
2.1 Anthropic variant
Ask the anthropic version of the question:
What proportion of the potential observers who are in the tunnel, with the same subjective experiences as you, are also in a world where the coin was heads?
Then the same argument as above shows that this is also 1% (where "Tails 1 not you" is short for "the same world as 'Tails 1 you,' apart from the first tails person, who is replaced with a non-you observer"):
This is still SIA, and reflects the fact that, for SIA, the reference class doesn't matter - as long as it includes the observers subjectively indistinguishable from you. So questions about you are the same whether we talk about "observers" or "observers with the same subjective experiences as you".
3. Average proportions of observers: SSA
We now turn to the next question:
What is the average proportion of potential observers in the left lane, relative to the average proportion of potential observers in the tunnel?
Within a given world, say there are N observers not in the tunnel and t tunnels, so N + 100t observers in total.
The proportion of observers in the left lane is t/(N + 100t), while the proportion of observers in the tunnel is 100t/(N + 100t). The ratio of these proportions is 1:100.
Then notice that if a and b are in a 1:100 proportion in every possible world, the averages of a and b are in a 1:100 proportion as well[5], giving the standard probability of 1%.
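As a quick numeric check (the sample values of N and t below are arbitrary), here is a short sketch confirming that the left-lane and in-tunnel proportions stay in a 1:100 ratio whatever the rest of the world looks like:

```python
# With N observers outside the tunnel and t tunnels of 100 drivers each,
# the left-lane and in-tunnel proportions are always in a 1:100 ratio.
for N in (0, 10, 1_000_000):
    for t in (1, 7, 500):
        left_lane = t / (N + 100 * t)
        in_tunnel = 100 * t / (N + 100 * t)
        assert abs(left_lane / in_tunnel - 1 / 100) < 1e-12
print("left-lane : in-tunnel is 1:100 for every (N, t) tried")
```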
3.1 Anthropic variant
The anthropic variant of the question is then:
What is the average proportion of potential observers in a world where the coin was heads, relative to the average proportion of potential observers in the tunnel?
Within a given world, ignoring the coin, say there are N observers not in the tunnel, and t tunnels. Let's focus on the case with one tunnel, t=1. Then the coin toss splits this world into two equally probable worlds, the heads world, WH, with N+1 observers, and the tails world, WT with N+99 observers:
The proportion of observers in tunnels in WH is 1/(N+1). The proportion of observers in tunnels in WT is 99/(N+99). Hence, across these two worlds, the average proportion of observers in tunnels is the average of these two, specifically
(1/2)(1/(N+1) + 99/(N+99)) = (50N+99)/((N+1)(N+99)).
If N is zero, this is 99/99=1; this is intuitive, since N=0 means that all observers are in tunnels, so the average proportion of observers in tunnels is 1.
What about the proportion of observers in the tunnels in the heads worlds? Well, this is 1/(N+1) in the heads world, and 0 in the tails world, so the average proportion is:
(1/2)(1/(N+1) + 0) = 1/(2(N+1)).
If N is zero, this is 1/2 - the average of 1, the proportion in WH for N=0 (where all observers are heads-world observers in tunnels), and 0, the proportion of heads-world observers in the tails world WT.
Taking the ratio (1/2)/1 = 1/2, the answer to that question is 1/2. This is the answer given by the "self-sampling assumption" (SSA), which gives the 1/2 response in the Sleeping Beauty problem (of which this is a variant).
In general, the ratio would be:
1/(2(N+1)) ÷ (50N+99)/((N+1)(N+99)) = (N+99)/(100N+198).
If N is very large, this is approximately 1/100, i.e. the same answer as SIA would give. This shows that, for SSA, the reference class of observers is important: N, the number of observers not in the tunnel, determines the probability estimate. So how we define observers will determine our probability[6].
So, for a given pair of equally likely worlds, WH and WT, the answer to question 3. varies between 1/2 and 1/100. This holds true for multiple tunnels as well. And it's not hard to see that this implies that, averaging across all worlds, we also get a ratio between 1/2 (all observers in the reference class are in tunnels) and 1/100 (almost no observers in the reference class are in tunnels).
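Here is a short sketch that checks the algebra above numerically (the exact-fraction arithmetic and the sample values of N are my own choices, not part of the original argument):

```python
from fractions import Fraction

# The SSA ratio derived above: the average heads-in-tunnel proportion divided
# by the average in-tunnel proportion, as a function of the number N of
# reference-class observers outside the tunnel.
def ssa_ratio(N: int) -> Fraction:
    avg_in_tunnel = Fraction(1, 2) * (Fraction(1, N + 1) + Fraction(99, N + 99))
    avg_heads_in_tunnel = Fraction(1, 2) * Fraction(1, N + 1)
    return avg_heads_in_tunnel / avg_in_tunnel

print(ssa_ratio(0))             # 1/2: the SSA answer when everyone is in a tunnel
print(float(ssa_ratio(10**9)))  # ~0.01: approaches the SIA answer for a huge reference class
assert ssa_ratio(5) == Fraction(5 + 99, 100 * 5 + 198)  # matches (N+99)/(100N+198)
```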
4. Average proportions of observers exactly like you: FNC
Almost there! We have a last question to ask:
What is the average proportion of potential observers in the left lane, with the same subjective experiences as you, relative to the average proportion of potential observers in the tunnel, with the same subjective experiences as you?
I'll spare you the proof that this gives 1% again, and turn directly to the anthropic variant:
What is the average proportion of potential observers in a world where the coin was heads, with the same subjective experiences as you, relative to the average proportion of potential observers in the tunnel, with the same subjective experiences as you?
By the previous section, this is the SSA probability with the reference class of "observers with the same subjective experiences as you". This turns out to be full non-indexical conditioning (FNC), which involves conditioning on any observation you've made, no matter how irrelevant. It's known that if all the observers have made the same observations, this reproduces SSA, but that as the number of unique observations increases, this tends to SIA.
That's because FNC is inconsistent - the odds of heads to tails change based on irrelevant observations which change your subjective experience. Here we can see what's going on: FNC is SSA with the reference class of observers with the same subjective experiences as you. But this reference class is variable: as you observe more, the size of the reference class changes, decreasing[7] because others in the reference class will observe something different to what you do.
But SSA is not consistent across reference class changes! So FNC is not stable across new observations, even if those observations are irrelevant to the probability being estimated.
For example, imagine that we started, in the tails world, with all 99 copies exactly identical to you, and then you make a complex observation. Then that world will split into many worlds where there are no exact copies of you (since none of them made exactly the same observation as you), a few worlds where there is one copy of you (that made the same observation as you), and many fewer worlds where there are more than one copy of you:
In the heads world, we only have no exact copies and one exact copy. We can ignore the worlds without observers exactly like us, and concentrate on the worlds with a single observer like us (this represents the vast majority of the probability mass). Then, since there are 99 possible locations in the tails world and 1 in the heads world, we get a ratio of roughly 99:1 for tails over heads:
This gives a ratio of roughly 100:1 for "any coin result" over heads, and shows why FNC converges to SIA.
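Here is a rough numerical sketch of that convergence, under the simplifying assumption (mine, not spelled out above) that each created observer independently matches your exact experiences with some small probability eps:

```python
# FNC conditions on "someone with exactly my experiences exists". If each
# observer independently matches with probability eps, the tails:heads odds
# are (1 - (1-eps)**99) : eps, which tends to 99:1 as eps shrinks.
for eps in (0.5, 0.1, 1e-3, 1e-9):
    p_match_heads = eps                  # one observer in the heads world
    p_match_tails = 1 - (1 - eps) ** 99  # 99 independent chances in the tails world
    print(f"eps={eps:g}: tails:heads odds ~ {p_match_tails / p_match_heads:.2f}:1")
```

With broad observations (large eps) the odds stay near even, reproducing the SSA-like answer; with highly specific observations (tiny eps) they approach 99:1, the SIA answer.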
5. What decision to make: ADT
There's a fifth question you could ask:
What is the best action I can take, given what I know about the observers, our decision algorithms, and my utility function?
This transforms the probability question into a decision-theoretic question. I've posted at length on Anthropic Decision Theory, which is the answer to that question. Since I've done a lot of work on that already, I won't be repeating that work here. I'll just point out that "what's the best decision" is something that can be computed independently of the various versions of "what's the probability".
5.1 How right do you want to be?
An alternate characterisation of the SIA and SSA questions could be to ask, "If I said 'I have X', would I want most of my copies to be correct (SIA) or my copies to be correct in most universes (SSA)?"
These can be seen as having two different utility functions (one linear in copies that are correct, one that gives rewards in universes where my copies are correct), and acting to maximise them. See the post here for more details.
6. Some "paradoxes" of anthropic reasoning
Given the above, let's look again at some of the paradoxes of anthropic reasoning. I'll choose three: the Doomsday argument, the presumptuous philosopher, and Robin Hanson's take on grabby aliens.
6.1 Doomsday argument
The Doomsday argument claims that the end of humanity is likely to be at hand - or at least more likely than we might think.
To see how the argument goes, we could ask "what proportion of humans will be in the last 90% of all humans who have ever lived in their universe?" The answer to that is, tautologically[8], 90%.
The simplest Doomsday argument would then reason from that, saying "with 90% probability, we are in the last 90% of humans in our universe, so, with 90% probability, humanity will end in this universe before it reaches 100 times the human population to date."
What went wrong there? The use of the term "probability", without qualifiers. The sentence slipped from using probability in terms of ratios within universes (the SSA version) to ratios of which universes we find ourselves in (the SIA version).
As an illustration, imagine that the godly AI creates either world W0 (with 0 humans), W10 (with 10 humans), W100 (with 100 humans), or W1,000 (with 1,000 humans). Each option has probability 1/4. These humans are created in numbered rooms, in order, starting at room 1.
Then we might ask:
A. What proportion of humans are in the last 90% of all humans created in their universe?
That proportion is undefined for W0. But for the other worlds, the proportion is 90% (e.g. humans 2 through 10 for W10, humans 11 through 100 for W100 etc...). Ignoring the undefined world, the average proportion is also 90%.
Now suppose we are created in one of those rooms, and we notice that it is room number 100. This rules out worlds W0 and W10; but the average proportion remains 90%.
But we might ask instead:
B. What proportion of humans in room 100 are in the last 90% of all humans created in their universe?
As before, humans being in room 100 eliminates worlds W0 and W10. The worlds W100 and W1,000 are equally likely, and each have one human in room 100. In W100, we are in the last 90% of humans; in W1,000, we are not. So the answer to question B is 50%.
Thus the answer to A is 90%, the answer to B is 50%, and there is no contradiction between these.
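For concreteness, here is a short sketch that enumerates the four equally likely worlds and recovers both answers (it assumes, as in footnote [8], that each population is divisible by 10):

```python
# Enumerate the four equally likely worlds W0, W10, W100, W1000.
worlds = [0, 10, 100, 1000]   # number of humans created in each world

# A: within each (non-empty) world, what proportion of humans are in the
# last 90% of all humans created in that world?
props_A = [(n - n // 10) / n for n in worlds if n > 0]
print(sum(props_A) / len(props_A))   # 0.9

# B: among worlds with a human in room 100, how often is that human in the
# last 90% of their world's humans? (Room 100 is in the last 90% iff 100 > n/10.)
in_last_90 = [100 > n / 10 for n in worlds if n >= 100]
print(sum(in_last_90) / len(in_last_90))   # 0.5
```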
Another way of thinking of this: suppose you play a game where you invest a certain amount of coins. With probability 0.9, your money is multiplied by ten; with probability 0.1, you lose everything. You continue re-investing the money until you lose. This is illustrated by the following diagram (with the initial investment indicated by green coins):
Then, as the simulation sketch after this list also illustrates, it is simultaneously true that:
90% of all the coins you earnt were lost the very first time you invested them, and
You have only 10% chance of losing any given investment.
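Here is a minimal simulation sketch of that game (the 10-coin initial stake and the batch bookkeeping are illustrative assumptions of mine); it checks both bullet points at once:

```python
import random

# Each investment multiplies your coins by 10 with probability 0.9,
# and loses everything with probability 0.1.
def play(initial: int = 10):
    batches = [initial]        # coins grouped by the round on which they first appeared
    held, investments = initial, 0
    while True:
        investments += 1
        if random.random() < 0.1:
            # The losing investment: everything held is lost, and the newest
            # batch of coins was being invested for the very first time.
            return batches[-1] / sum(batches), investments
        batches.append(9 * held)   # a win adds 9x the invested amount in new coins
        held *= 10

random.seed(0)
games = [play() for _ in range(100_000)]
shares = [s for s, _ in games if s < 1]  # games with at least one winning round
print(sum(shares) / len(shares))              # 0.9: share of coins lost on their first investment
print(len(games) / sum(n for _, n in games))  # ~0.1: chance any given investment loses
```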
So being more precise about what is meant by "probability" dissolves the Doomsday argument.
6.2 Presumptuous philosopher
Nick Bostrom introduced the presumptuous philosopher thought experiment to illustrate a paradox of SIA:
It is the year 2100 and physicists have narrowed down the search for a theory of everything to only two remaining plausible candidate theories: T1 and T2 (using considerations from super-duper symmetry). According to T1 the world is very, very big but finite and there are a total of a trillion trillion observers in the cosmos. According to T2, the world is very, very, very big but finite and there are a trillion trillion trillion observers. The super-duper symmetry considerations are indifferent between these two theories. Physicists are preparing a simple experiment that will falsify one of the theories. Enter the presumptuous philosopher: “Hey guys, it is completely unnecessary for you to do the experiment, because I can already show you that T2 is about a trillion times more likely to be true than T1!”
The first thing to note is that the presumptuous philosopher (PP) may not even be right under SIA. We could ask:
A. What proportion of the observers exactly like the PP are in the T1 universes relative to the T2 universes?
Recall that SIA is independent of reference class, so adding "exactly like the PP" doesn't change this. So, what is the answer to A.?
Now, T2 universes have a trillion times more observers than the T1 universes, but that doesn't necessarily mean that the PP is more likely to be in them. Suppose that everyone in these universes knows their rank of birth; for the PP it's the number 24601:
Then, since all universes have more than 24601 inhabitants, the PP is equally likely to exist in T1 universes as in T2 universes; the proportion is therefore 50% (interpreting "the super-duper symmetry considerations are indifferent between these two theories" as meaning "the two theories are equally likely").
Suppose, however, that the PP does not know their rank, and the T2 universes are akin to a trillion independent copies of the T1 universes, each of which has an independent chance of generating an exact copy of the PP:
Then SIA would indeed shift the odds by a factor of a trillion, giving a proportion of 1/(10^12+1) for T1. But this is not so much a paradox, as the PP is correctly thinking "if all the exact copies of me in the multiverse of possibilities were to guess we were in T2 universes, only one in a trillion of them would be wrong".
But if instead we were to ask:
What is the average proportion of PPs among other observers, in T1 versus T2 universes?
Then we would get the SSA answer. If the PPs know their birth rank, this is a proportion of 10^12:1 in favour of T1 universes. That's because there is just one PP in each universe, and a trillion times more people in the T2 universes, which dilutes the proportion.
If the PP doesn't know their birth rank, then this proportion is the same[9] in the T1 and T2 universes. In probability terms, this would mean a "probability" of 50% for T1 and T2.
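A small arithmetic sketch of these proportions (the observer counts come straight from Bostrom's description; treating the two theories as equally likely is the stated assumption):

```python
from fractions import Fraction

trillion = 10**12
observers_T1 = 10**24   # "a trillion trillion observers"
observers_T2 = 10**36   # "a trillion trillion trillion observers"

# SIA-style question, birth rank known: one exact PP-copy per equally likely
# universe, so the proportion of PP-copies in T1 universes is 1/2.
print(Fraction(1, 1 + 1))          # 1/2

# SIA-style question, T2 as a trillion independent copies of T1: a trillion
# times as many exact PP-copies live in T2 universes.
print(Fraction(1, 1 + trillion))   # 1/(10^12 + 1) for T1

# SSA-style question, birth rank known: the PP is a trillion times larger a
# share of a T1 universe's observers than of a T2 universe's.
print(Fraction(1, observers_T1) / Fraction(1, observers_T2))   # 10^12, i.e. 10^12:1 for T1
```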
6.3 Anthropics and grabby aliens
The other paradoxes of anthropic reasoning can be treated similarly to the above. Now let's look at a more recent use of anthropics, due to Robin Hanson, Daniel Martin, Calvin McCarter, and Jonathan Paulson.
The basic scenario is one in which a certain number of alien species are "grabby": they will expand across the universe, at almost the speed of light, and prevent any other species of intelligent life from evolving independently within their expanding zone of influence[10].
Humanity has not noticed any grabby aliens in the cosmos; so we are not within their zone of influence. If they had started nearby and some time ago - say within the Milky Way and half a million years ago - then they would be here by now.
What if grabby aliens recently evolved a few billion light years away? Well, we wouldn't see them until a few billion years have passed. So we're fine. But if humans had instead evolved several billion years in the future, then we wouldn't be fine: the grabby aliens would have reached this location before then, and prevented us evolving, or at least would have affected us.
Robin Hanson sees this as an anthropic solution to a puzzle: why did humanity evolve early, i.e. only 13.8 billion years after the Big Bang? We didn't evolve as early as we possibly could - the Earth is a latecomer among Earth-like planets. But the smaller stars will last for trillions of years. Most habitable epochs in the history of the galaxy will be on planets around these small stars, way into the future.
One possible solution to this puzzle is grabby aliens. If grabby aliens are likely (but not too likely), then we could only have evolved in this brief window before they reached us. I mentioned that SIA doesn't work for this (for the same reason that it doesn't care about the Doomsday argument). Robin Hanson then responded:
If your theory of the universe says that what actually happened is way out in the tails of the distribution of what could happen, you should be especially eager to find alternate theories in which what happened is not so far into the tails. And more willing to believe those alternate theories because of that fact.
That is essentially Bayesian reasoning. If you have two theories, T1 and T2, and your observations are very unlikely given T1 but more likely given T2, then this gives extra weight to T2.
Here we could have three theories:
T0: "There are grabby aliens nearby"
T1: "There are grabby aliens a moderate distance away"
T2: "Any grabby aliens are very far away"
Theory T0 can be ruled out by the fact that we exist. Theory T1 posits that humans could not have evolved much later than we did (or else the grabby aliens would have stopped us). Theory T2 allows for the possibility that humans evolved much later than we did. So, from T2's perspective, it is "surprising" that we evolved so early; from T1's perspective, it isn't, as this is the only possible window.
But by "theory of the universe", Robin Hanson meant not only the theory of how the physical universe was, but the anthropic probability theory. The main candidates are SIA and SSA. SIA is indifferent between T1 and T2. But SSA prefers T1 (after updating on the time of our evolution). So we are more surprised under SIA than under SSA, which, in Bayesian/Robin reasoning, means that SSA is more likely to be correct.
But let's not talk about anthropic probability theories; let's instead see what questions are being answered. SIA is equivalent with asking the question:
What proportion of universes with humans exactly like us have moderately close grabby aliens (T1) versus very distant grabby aliens (T2)?
Or, perhaps more relevant to our future:
In what proportion of universes with humans exactly like us would those humans, upon expanding in the universe, encounter grabby aliens (T1) or not encounter them (T2)?
In contrast, the question SSA is asking is:
What is the average proportion of humans among all observers, in universes where there are nearby grabby aliens (T1) versus very distant grabby aliens (T2)?
If we were launching an interstellar exploration mission, and were asking ourselves what "the probability" of encountering grabby alien life was, then question 1. seems a closer phrasing of that than question 2. is.
And question 2. has the usual reference class problems. I said "observers", but I could have defined this narrowly as "human observers"; in which case it would have given a more SIA-like answer. Or I could have defined it expansively as "all observers, including those that might have been created by grabby aliens"; in that case SSA ceases to prioritise T1 theories and may prioritise T2 ones instead. In that case, humans are indeed "way out in the tails", given T2: we are the very rare observers that have not seen or been created by grabby aliens.
In fact, the same reasoning that prefers SSA in the first place would have preferences over the reference class. The narrowest reference classes are the least surprising - given that we are humans in the 21st century with this history, how surprising is it that we are humans in the 21st century with this history? - so they would be "preferred" by this argument.
But the real response is that Robin is making a category error. If we substitute "question" for "theory", we can transform his point into:
If your question about the universe gets a very surprising answer, you should be especially eager to ask alternate questions with less surprising answers. And more willing to believe those alternate questions.
We could ask some variants of questions 3. and 4., by maybe counting causally disconnected segments of universes as different universes (this doesn't change questions 1. and 2.). We'll ignore this possibility in this post. ↩︎
And also assuming that the radio's description of the situation is correct! ↩︎
Notice here that I've counted off observers with other observers that have exactly the same probability of existing. To be technical, the question which gives SIA probabilities should be "what proportion of potential observers, weighted by their probability of existing, have X?" ↩︎
More accurately: probability-weighted proportion. ↩︎
Let 𝒲 be a set of worlds and p a probability distribution over 𝒲. Then the expectation of a is E(a) = Σ_{W∈𝒲} p(W)·a_W = Σ_{W∈𝒲} p(W)·b_W/100 = (1/100)·Σ_{W∈𝒲} p(W)·b_W = (1/100)·E(b), which is 1/100 times the expectation of b. ↩︎
If we replace "observers" with "observer moments", then this question is equivalent with the probability generated by the Strong Self-Sampling Assumption (SSSA). ↩︎
If you forget some observations, your reference class can increase, as previously different copies become indistinguishable. ↩︎
Assuming the population is divisible by 10. ↩︎
As usual with SSA and this kind of question, this depends on how you define the reference class of "other observers", and who counts as a PP. ↩︎
This doesn't mean they will sterilise planets or kill other species; just that any being evolving within their control will be affected by them and know that they're around. Hence grabby aliens are, by definition, not hidden from view. ↩︎